[Bf-committers] Blender's implementation of Spinlock in Windows (copy)

Thu May 24 21:55:10 CEST 2018

 Hey all,

I originally posted a topic over at devtalk
(https://devtalk.blender.org/t/threading-question-why-no-mm-pause-yield-in-the-spinlock-implementation/466),
but it has only gotten 25 views after multiple days. So I'm trying this mailing
list in an attempt to get a few more eyeballs on this. Also: the first time
I submitted this email, I forgot to subscribe to this list. So my first
email is stuck somewhere in the moderator queue. That's why I'm calling
this a (copy).

I noticed that the Windows implementation of SpinLock seems to be missing
a call to the "YieldProcessor" macro. The spinlock implementation is in
"source/blender/blenlib/intern/threads.c".

void BLI_spin_lock(SpinLock *spin)
{
#if defined(__APPLE__)
  OSSpinLockLock(spin);
#elif defined(_MSC_VER)
  while (InterlockedExchangeAcquire(spin, 1)) {
    while (*spin) {
      /* pass */
    }
  }
#else
  pthread_spin_lock(spin);
#endif
}

I propose adding the "YieldProcessor()" macro to the inner while(*spin)
loop.

"YieldProcessor" is a Windows macro that, on x86, expands to the _mm_pause
intrinsic (the "pause" assembly instruction). The Windows documentation
(https://msdn.microsoft.com/en-us/library/windows/desktop/ms687419(v=vs.85).aspx)
suggests that this simply gives processor resources to the hyperthread
sibling.

Intel's documentation of _mm_pause goes even further, suggesting that the
pause asm instruction allows for the processor to come out of a spinlock
more quickly. ( https://software.intel.com/en-us/node/524249 )

"pause" is in fact a special NOP instruction: it is encoded as "rep nop",
so x86 processors that predate it simply execute it as a plain NOP, and it
therefore runs on every x86 processor. It is an ideal instruction here: it
has broad compatibility and improves lock performance on modern processors.

In short, I'm suggesting the following one-line diff:

    while (*spin) {
        YieldProcessor(); /* special "NOP" hint for hyperthreads and spinlocks */
    }

I would expect that this diff would improve performance.
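
For anyone who wants to try the idea outside of Blender, here is a minimal standalone sketch of the same test-and-test-and-set loop with the pause hint added. It is not Blender's code: it assumes C11 atomics instead of InterlockedExchangeAcquire, and the names DemoSpinLock, CPU_RELAX and demo_spin_* are made up for illustration. On non-x86 targets the hint falls back to a no-op.

```c
#include <assert.h>
#include <stdatomic.h>

/* CPU_RELAX is a hypothetical name for the "pause" hint. */
#if defined(_MSC_VER)
#  include <windows.h>
#  define CPU_RELAX() YieldProcessor()
#elif defined(__x86_64__) || defined(__i386__)
#  include <immintrin.h>
#  define CPU_RELAX() _mm_pause()
#else
#  define CPU_RELAX() ((void)0) /* no hint available in this sketch */
#endif

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef atomic_int DemoSpinLock;

static void demo_spin_init(DemoSpinLock *spin)
{
  atomic_init(spin, 0);
}

static void demo_spin_lock(DemoSpinLock *spin)
{
  /* Outer loop: try to take the lock with an atomic exchange. */
  while (atomic_exchange_explicit(spin, 1, memory_order_acquire)) {
    /* Inner loop: spin on plain reads until the lock looks free,
     * issuing the pause hint on every iteration. */
    while (atomic_load_explicit(spin, memory_order_relaxed)) {
      CPU_RELAX();
    }
  }
}

static void demo_spin_unlock(DemoSpinLock *spin)
{
  atomic_store_explicit(spin, 0, memory_order_release);
}

/* Uncontended round trip; returns 1 if the lock word behaves as expected. */
static int demo_spin_selfcheck(void)
{
  DemoSpinLock spin;
  demo_spin_init(&spin);
  demo_spin_lock(&spin);
  int held = atomic_load(&spin);
  assert(held == 1);
  demo_spin_unlock(&spin);
  return held == 1 && atomic_load(&spin) == 0;
}
```

The self-check only exercises the uncontended path. Any benefit from the pause hint would only show up when several threads contend for the same lock, so it would need a proper multi-threaded benchmark to confirm.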

Note: This "pause" may be related to: https://developer.blender.org/T53068.
I noticed that the Windows implementation of pthreads does not have a
_mm_pause() in it, so that may have been causing poor performance a few
builds ago.

-- Percy