[Bf-committers] OMP to BLI_task: About threading and lockfree operations

Tue Jan 26 21:57:34 CET 2016

Hi devs,

So, while working on switching from OMP to BLI_task, today I hit a
rather hairy issue in pbvh_update_normals() (in BKE's pbvh.c).

The current OMP-based code of that function executes in (roughly) 28ms.
However, reading it, it is obviously missing a `#pragma omp critical`
section around the main part of the second OMP loop, since a given MVert
may be used by several nodes, and hence be accessed concurrently here.

Now, if I add that critical section, the code takes over 100ms to run!

Changing to BLI_task parallelized code, with the same correct protection
(using a spinlock), I get about 70ms. Using a gcc atomic op
(__sync_fetch_and_and), since atomically checking and clearing the
ME_VERT_PBVH_UPDATE flag is enough to prevent concurrency here,
it goes down to 20ms.
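
To make the atomic variant concrete, here is a minimal sketch of the
"check and clear the flag in one atomic step" idea. It is not the actual
pbvh.c code: the Vert struct, VERT_UPDATE_FLAG and its value, and
vert_claim_update() are hypothetical stand-ins for MVert,
ME_VERT_PBVH_UPDATE and the body of the second loop. Only
__sync_fetch_and_and is the real gcc builtin.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ME_VERT_PBVH_UPDATE (value chosen for the example). */
#define VERT_UPDATE_FLAG ((char)(1 << 5))

/* Hypothetical stand-in for MVert, keeping only the 8bit flag and the normal. */
typedef struct Vert {
    float no[3];
    char flag;
} Vert;

/* Atomically clear the update flag and report whether it was set before.
 * When several threads reach the same vertex, exactly one of them sees the
 * flag as set, so only that thread goes on to finalize the normal. */
static bool vert_claim_update(Vert *v)
{
    /* gcc builtin: stores (flag & mask) and returns the previous value. */
    const char prev = __sync_fetch_and_and(&v->flag, (char)~VERT_UPDATE_FLAG);
    return (prev & VERT_UPDATE_FLAG) != 0;
}
```

Each thread would call vert_claim_update() on a vertex and only
normalize it when the call returns true, so no lock is held while the
actual work is done.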

Sad part of the story: because MVert->flag is an 8bit var, there is no
clean way to do the same under OSX. We can only hack around
OSAtomicTestAndClear, but it's far from nice and clean (we would need to
retrieve the bit number from the bitmask, e.g.).
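
For reference, the "bit number from bitmask" step could look roughly
like the sketch below. bit_index_from_mask() is a hypothetical helper,
not existing Blender code, and it counts bits from the least significant
one. The numbering the OSX call actually expects would have to be
checked against its man page and mapped accordingly.

```c
/* Return the index (0 = least significant bit) of the single bit set in an
 * 8bit mask, or -1 if the mask does not have exactly one bit set. */
static int bit_index_from_mask(unsigned char mask)
{
    /* Reject zero and masks with more than one bit set. */
    if (mask == 0 || (mask & (mask - 1)) != 0) {
        return -1;
    }

    int index = 0;
    /* Shift right until the set bit falls off, counting the steps. */
    while ((mask >>= 1) != 0) {
        index++;
    }
    return index;
}
```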

Now comes the funny question: since the probability of thread
concurrency here is very low (most vertices are used by only one bvhnode,
and the 'fragile' part of the code is *very* short, two bitflag
operations), do we want to care about absolute safety of computed
normals here?
The missing 'critical section' in the second loop has never hurt us so
far, it seems...

Need your thoughts here, before I lose more time on this hell. ;)