[Bf-blender-cvs] [f6c8a78ac68] master: Cycles: Fix bvh2 gen on Apple Silicon and use it to speed up renders

Thu Jan 20 16:37:51 CET 2022

Commit: f6c8a78ac684242ba067499511a0db2fa64657fe
Author: Michael Jones
Date:   Thu Jan 20 10:11:58 2022 +0000
Branches: master
https://developer.blender.org/rBf6c8a78ac684242ba067499511a0db2fa64657fe

Cycles: Fix bvh2 gen on Apple Silicon and use it to speed up renders

This patch fixes a correctness issue in the `int4 select(...)` function on Apple Silicon machines that caused bad bvh2 builds. Although the generated bvh2s render correctly, the resulting runtime performance is terrible. With the fix, we can switch over to bvh2 on Apple Silicon, giving a significant performance uplift for many of the standard benchmarking assets. It also fixes some unit test failures stemming from the use of MetalRT, and trivially enables the new pointclo [...]
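
As an illustration only, the following scalar sketch models one lane of the two `select` variants touched by this patch. It is not code from the Cycles tree: the helper names `float_bits`, `select_lane_old`, `select_lane_new` and `check_select_lanes` are hypothetical, and the sketch assumes IEEE 754 single-precision floats. It shows one reading of the `math_int4.h` change: converting the integer mask to float by value (as `_mm_cvtepi32_ps` does) turns an all-ones mask (-1) into the bit pattern of -1.0f, which is no longer all ones, so the bitwise blend stops selecting `a` cleanly. The sketch does not attempt to explain why the effect surfaced specifically on Apple Silicon.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a float as a 32-bit unsigned integer.
static uint32_t float_bits(float f)
{
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

// One lane of the removed path (scalar model): the mask is converted to float
// by value, and the resulting bit pattern is used for the bitwise blend.
static uint32_t select_lane_old(int32_t mask, uint32_t a, uint32_t b)
{
  const uint32_t m = float_bits(static_cast<float>(mask));
  return (m & a) | (~m & b);
}

// One lane of the added path (scalar model): the integer mask bits are used
// directly, so an all-ones mask picks a and an all-zeros mask picks b.
static uint32_t select_lane_new(int32_t mask, uint32_t a, uint32_t b)
{
  const uint32_t m = static_cast<uint32_t>(mask);
  return (m & a) | (~m & b);
}

// Small self-check comparing both variants for the two canonical mask values.
static void check_select_lanes()
{
  // -1 converted to float is -1.0f, whose bit pattern is not all ones.
  assert(float_bits(static_cast<float>(-1)) == 0xBF800000u);

  // A zero mask behaves the same in both variants: b is selected.
  assert(select_lane_old(0, 7u, 9u) == 9u);
  assert(select_lane_new(0, 7u, 9u) == 9u);

  // An all-ones mask should select a; only the new variant does.
  assert(select_lane_new(-1, 7u, 9u) == 7u);
  assert(select_lane_old(-1, 7u, 9u) != 7u);
}
```

Calling `check_select_lanes()` from a test or from `main` exercises both variants. Note that the purely bitwise form expects each mask lane to be all zeros or all ones, whereas the non-SSE fallback in the same function treats any nonzero lane as true; the two agree for masks produced by vector comparisons.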

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13877

===================================================================

M	intern/cycles/device/metal/device_impl.mm
M	intern/cycles/util/math_int4.h

===================================================================

diff --git a/intern/cycles/device/metal/device_impl.mm b/intern/cycles/device/metal/device_impl.mm
index 5906da3680b..17acb2c94e4 100644
--- a/intern/cycles/device/metal/device_impl.mm
+++ b/intern/cycles/device/metal/device_impl.mm
@@ -87,17 +87,14 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
     default:
       break;
     case METAL_GPU_INTEL: {
-      use_metalrt = false;
       max_threads_per_threadgroup = 64;
       break;
     }
     case METAL_GPU_AMD: {
-      use_metalrt = false;
       max_threads_per_threadgroup = 128;
       break;
     }
     case METAL_GPU_APPLE: {
-      use_metalrt = true;
       max_threads_per_threadgroup = 512;
       break;
     }
diff --git a/intern/cycles/util/math_int4.h b/intern/cycles/util/math_int4.h
index 9e3f001efc2..eaa9be73b63 100644
--- a/intern/cycles/util/math_int4.h
+++ b/intern/cycles/util/math_int4.h
@@ -131,10 +131,7 @@ ccl_device_inline int4 clamp(const int4 &a, const int4 &mn, const int4 &mx)
 ccl_device_inline int4 select(const int4 &mask, const int4 &a, const int4 &b)
 {
 #  ifdef __KERNEL_SSE__
-  const __m128 m = _mm_cvtepi32_ps(mask);
-  /* TODO(sergey): avoid cvt. */
-  return int4(_mm_castps_si128(
-      _mm_or_ps(_mm_and_ps(m, _mm_castsi128_ps(a)), _mm_andnot_ps(m, _mm_castsi128_ps(b)))));
+  return int4(_mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b)));
 #  else
   return make_int4(
       (mask.x) ? a.x : b.x, (mask.y) ? a.y : b.y, (mask.z) ? a.z : b.z, (mask.w) ? a.w : b.w);