[Bf-committers] OMP to BLI_task: About threading and lockfree operations

Wed Jan 27 12:13:00 CET 2016

Thanks guys, thats globally good news. :)

For now will just add needed bit to our atomic_ops.h, but yes, think we 
should definitively think about updating it...

Le 27/01/2016 10:03, Sergey Sharybin a écrit :
> Sure answer is: we have to be thread-safe, otherwise it's just waiting for
> nasty heisenbug to happen.
>
> Now, our atomic_ops.h is based on Jemalloc from 3 years ago. There are some
> interesting twists happened in the upstream's atomic.h supporting C11
> atomics and such. We can totally:
>
> 1. Optionally update our atomic ops
> 2. Switch away from OSAtomic on OSX and use different implementation
>
> On Tue, Jan 26, 2016 at 11:49 PM, Brecht Van Lommel <
> brechtvanlommel at pandora.be> wrote:
>
>> It's fine to use __sync_fetch_and_and on OS X, no need to use
>> OSAtomic* functions. Using 8 bit __sync_fetch_and_and on OS X compiles
>> fine for me.
>>
>> OpenImageIO removed OSAtomic* almost 3 years ago, and we've had no
>> problem with that:
>>
>> https://github.com/OpenImageIO/oiio/commit/14d5d8e572322e5e0decdd1b7d3f4d0484819473
>>
>> On Tue, Jan 26, 2016 at 9:57 PM, Bastien Montagne <montagne29 at wanadoo.fr>
>> wrote:
>>> Hi devs,
>>>
>>> So, while working on switching from OMP to BLI_task, I've hit today an
>>> rather hairy issue in pbvh_update_normals() (in BKE's pbvh.c).
>>>
>>> Current OMP-based code of that function executes in (roughly) 28ms.
>>> However, reading it, it's obvious it's lacking a `#pragma omp critical`
>>> section
>>> around main part of the second OMP loop, since affected MVert may be
>>> used by several nodes, and hence suffer concurrency here.
>>>
>>> Now, if I add that critical section, code now takes over 100ms to run!
>>>
>>> Changing to BLI_task parrallelized code, with same correct protection
>>> (using a spinlock) I can get about 70ms. Using some gcc atomic op
>>> (__sync_fetch_and_and), since the atomic ME_VERT_PBVH_UPDATE flag
>>> checking and clearing is enough to prevent concurrency here,
>>> it can go down to 20ms.
>>>
>>> Sad part of the story: because MVert->flag is and 8bit var, there is no
>>> clean way to do the same under OSX, we can only hack around
>>> OSAtomicTestAndClear but it's far from nice and clean (need to retrieve
>>> bit number from bitmask eg.).
>>>
>>> Now comes the funny question: since the probability of thread
>>> concurrency here is very low (most vertices are used by only one bvhnode,
>>> and the 'fragile' part of the code is *very* short, two bitflag
>>> operations), do we want to care about absolute safety of computed
>>> normals here?
>>> Missing 'critical section' in second loop never hurted us so far it
>> seems...
>>> Need your thoughts here, before I loose more time on this hell. ;)
>>> _______________________________________________
>>> Bf-committers mailing list
>>> Bf-committers at blender.org
>>> http://lists.blender.org/mailman/listinfo/bf-committers
>> _______________________________________________
>> Bf-committers mailing list
>> Bf-committers at blender.org
>> http://lists.blender.org/mailman/listinfo/bf-committers
>>
>
>