I noticed that the Windows implementation of SpinLock seems to be missing
a "YieldProcessor" call. The relevant part of the implementation reads:

void BLI_spin_lock(SpinLock *spin)
#if defined(__APPLE__)
#elif defined(_MSC_VER)
  while (InterlockedExchangeAcquire(spin, 1)) {
    while (*spin) {
      /* pass */

I propose adding the "YieldProcessor()" macro to the inner while (*spin)
loop.

"YieldProcessor" is a Windows macro that, on x86 and x64, compiles into the
_mm_pause intrinsic (the assembly instruction "pause"). The Windows
documentation ( ms687419(v=vs.85).aspx ) suggests that this simply gives
processor resources to the hyperthread sibling.

Intel's documentation of _mm_pause goes even further, suggesting that the
pause instruction lets the processor detect the release of the lock, and so
leave the spin-wait loop, more quickly.
( https://software.intel.com/en-us/node/524249 )

"pause" is in fact a special NOP instruction: it is encoded as "rep nop",
so x86 processors that predate it simply execute it as a plain NOP. That
makes it an ideal choice here: it has broad compatibility and improves
lock performance on modern processors.

In short, I'm suggesting the following one-line diff:

    while (*spin) {
        YieldProcessor(); /* Special "NOP" hint to the processor for
                           * hyperthreads and spinlocks. */

I would expect that this diff would improve performance.
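
For illustration, here is a minimal, portable sketch of the same
test-and-test-and-set pattern with the pause hint in the inner loop. This is
not Blender's code: it assumes a C11 compiler that provides <stdatomic.h>
(recent MSVC versions may need extra flags for that), and the names
demo_spin_t, demo_spin_lock, demo_spin_unlock and cpu_relax are hypothetical,
chosen only for this example.

```c
#include <stdatomic.h>

/* Pick a "relax" hint for the spin loop. cpu_relax is a made-up name. */
#if defined(_MSC_VER)
#  include <windows.h>
#  define cpu_relax() YieldProcessor() /* "pause" on x86/x64 */
#elif defined(__x86_64__) || defined(__i386__)
#  include <immintrin.h>
#  define cpu_relax() _mm_pause() /* "pause", encoded as "rep nop" */
#else
#  define cpu_relax() ((void)0) /* no hint on other architectures */
#endif

/* 0 means unlocked, 1 means locked. */
typedef atomic_int demo_spin_t;

static void demo_spin_lock(demo_spin_t *spin)
{
  /* Outer loop: try to take the lock with an atomic exchange. */
  while (atomic_exchange_explicit(spin, 1, memory_order_acquire)) {
    /* Inner loop: wait with plain loads until the lock looks free,
     * issuing the pause hint on every iteration. */
    while (atomic_load_explicit(spin, memory_order_relaxed)) {
      cpu_relax();
    }
  }
}

static void demo_spin_unlock(demo_spin_t *spin)
{
  atomic_store_explicit(spin, 0, memory_order_release);
}
```

A caller would declare "demo_spin_t lock = 0;" and wrap the critical section
in demo_spin_lock(&lock) / demo_spin_unlock(&lock). The hint only matters
while another thread holds the lock; an uncontended acquire never reaches
the inner loop.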

Note: This "pause" may be related to: https://developer.blender.org/T53068.
I noticed that the Windows implementation of pthreads does not have a
_mm_pause() in it, so that may have been causing poor performance a few
builds ago.

