[Bf-committers] OMP to BLI_task: About threading and lockfree operations

Wed Jan 27 10:03:44 CET 2016

Sure answer is: we have to be thread-safe, otherwise it's just waiting for
nasty heisenbug to happen.

Now, our atomic_ops.h is based on Jemalloc from 3 years ago. There are some
interesting twists happened in the upstream's atomic.h supporting C11
atomics and such. We can totally:

1. Optionally update our atomic ops
2. Switch away from OSAtomic on OSX and use different implementation

On Tue, Jan 26, 2016 at 11:49 PM, Brecht Van Lommel <
brechtvanlommel at pandora.be> wrote:

> It's fine to use __sync_fetch_and_and on OS X, no need to use
> OSAtomic* functions. Using 8 bit __sync_fetch_and_and on OS X compiles
> fine for me.
>
> OpenImageIO removed OSAtomic* almost 3 years ago, and we've had no
> problem with that:
>
> https://github.com/OpenImageIO/oiio/commit/14d5d8e572322e5e0decdd1b7d3f4d0484819473
>
> On Tue, Jan 26, 2016 at 9:57 PM, Bastien Montagne <montagne29 at wanadoo.fr>
> wrote:
> > Hi devs,
> >
> > So, while working on switching from OMP to BLI_task, I've hit today an
> > rather hairy issue in pbvh_update_normals() (in BKE's pbvh.c).
> >
> > Current OMP-based code of that function executes in (roughly) 28ms.
> > However, reading it, it's obvious it's lacking a `#pragma omp critical`
> > section
> > around main part of the second OMP loop, since affected MVert may be
> > used by several nodes, and hence suffer concurrency here.
> >
> > Now, if I add that critical section, code now takes over 100ms to run!
> >
> > Changing to BLI_task parrallelized code, with same correct protection
> > (using a spinlock) I can get about 70ms. Using some gcc atomic op
> > (__sync_fetch_and_and), since the atomic ME_VERT_PBVH_UPDATE flag
> > checking and clearing is enough to prevent concurrency here,
> > it can go down to 20ms.
> >
> > Sad part of the story: because MVert->flag is and 8bit var, there is no
> > clean way to do the same under OSX, we can only hack around
> > OSAtomicTestAndClear but it's far from nice and clean (need to retrieve
> > bit number from bitmask eg.).
> >
> > Now comes the funny question: since the probability of thread
> > concurrency here is very low (most vertices are used by only one bvhnode,
> > and the 'fragile' part of the code is *very* short, two bitflag
> > operations), do we want to care about absolute safety of computed
> > normals here?
> > Missing 'critical section' in second loop never hurted us so far it
> seems...
> >
> > Need your thoughts here, before I loose more time on this hell. ;)
> > _______________________________________________
> > Bf-committers mailing list
> > Bf-committers at blender.org
> > http://lists.blender.org/mailman/listinfo/bf-committers
> _______________________________________________
> Bf-committers mailing list
> Bf-committers at blender.org
> http://lists.blender.org/mailman/listinfo/bf-committers
>

-- 
With best regards, Sergey Sharybin