[Bf-blender-cvs] [e0716af1a4f] cycles-x: Cycles X: Align kernels of existing and new paths
Sergey Sharybin
noreply at git.blender.org
Fri May 21 20:04:55 CEST 2021
Commit: e0716af1a4f43bc3bf9238556dcd44d35e830ed9
Author: Sergey Sharybin
Date: Fri May 21 14:31:50 2021 +0200
Branches: cycles-x
https://developer.blender.org/rBe0716af1a4f43bc3bf9238556dcd44d35e830ed9
Cycles X: Align kernels of existing and new paths
Only enqueue new kernels when the existing wavefront is at the
intersect closest stage. This seems to have a positive effect on
coherency, improving performance:
```
                           new      cycles-x(1)  megakernel(2)
bmw27.blend                10.198   10.6995      10.4269
classroom.blend            16.7821  17.2352      16.6609
pabellon.blend             9.39898  9.65984      9.14966
monster.blend              10.5923  10.5799      12.0106
barbershop_interior.blend  11.777   11.8852      12.5769
junkshop.blend             16.085   16.2971      16.5213
pvt_flat.blend             16.5704  16.3189      17.4047

(1) cycles-x branch hash ad81074fab1
(2) cycles-x branch hash ef6ce4fa8ca (right before disabling megakernel)
```
While pvt_flat (with adaptive sampling) is roughly 1.5% slower than (1),
several other scenes recover almost all of the performance that
Cycles-X had before the megakernel was removed.
Note that the coherency explanation is a hypothesis. The performance gain
might also be caused by fewer calculations of the active paths array.
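The relative differences discussed above can be checked with a small helper. This is only a sketch: it assumes the table values are render times where lower is better, and `relative_change_percent` is a hypothetical name that does not exist in the Cycles sources.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of Cycles.
// Returns the change of `new_time` relative to `reference_time` in percent.
// Assuming the inputs are render times, a positive result means slower and
// a negative result means faster.
double relative_change_percent(double new_time, double reference_time)
{
  return (new_time - reference_time) / reference_time * 100.0;
}
```

With the numbers from the table, `relative_change_percent(16.5704, 16.3189)` gives about +1.5 for pvt_flat.blend against (1), and `relative_change_percent(10.198, 10.6995)` gives about -4.7 for bmw27.blend against (1).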
===================================================================
M intern/cycles/integrator/path_trace_work_gpu.cpp
M intern/cycles/integrator/path_trace_work_gpu.h
===================================================================
diff --git a/intern/cycles/integrator/path_trace_work_gpu.cpp b/intern/cycles/integrator/path_trace_work_gpu.cpp
index 615832dd443..6a50feab497 100644
--- a/intern/cycles/integrator/path_trace_work_gpu.cpp
+++ b/intern/cycles/integrator/path_trace_work_gpu.cpp
@@ -193,6 +193,23 @@ void PathTraceWorkGPU::render_samples(int start_sample, int samples_num)
}
}
+DeviceKernel PathTraceWorkGPU::get_most_queued_kernel() const
+{
+ const IntegratorQueueCounter *queue_counter = integrator_queue_counter_.data();
+
+ int max_num_queued = 0;
+ DeviceKernel kernel = DEVICE_KERNEL_NUM;
+
+ for (int i = 0; i < DEVICE_KERNEL_INTEGRATOR_NUM; i++) {
+ if (queue_counter->num_queued[i] > max_num_queued) {
+ kernel = (DeviceKernel)i;
+ max_num_queued = queue_counter->num_queued[i];
+ }
+ }
+
+ return kernel;
+}
+
void PathTraceWorkGPU::enqueue_reset()
{
const int num_keys = integrator_sort_key_counter_.size();
@@ -210,7 +227,7 @@ void PathTraceWorkGPU::enqueue_reset()
bool PathTraceWorkGPU::enqueue_path_iteration()
{
/* Find kernel to execute, with max number of queued paths. */
- IntegratorQueueCounter *queue_counter = integrator_queue_counter_.data();
+ const IntegratorQueueCounter *queue_counter = integrator_queue_counter_.data();
int num_paths = 0;
for (int i = 0; i < DEVICE_KERNEL_INTEGRATOR_NUM; i++) {
@@ -222,17 +239,8 @@ bool PathTraceWorkGPU::enqueue_path_iteration()
}
/* Find kernel to execute, with max number of queued paths. */
- int max_num_queued = 0;
- DeviceKernel kernel = DEVICE_KERNEL_NUM;
-
- for (int i = 0; i < DEVICE_KERNEL_INTEGRATOR_NUM; i++) {
- if (queue_counter->num_queued[i] > max_num_queued) {
- kernel = (DeviceKernel)i;
- max_num_queued = queue_counter->num_queued[i];
- }
- }
-
- if (max_num_queued == 0) {
+ const DeviceKernel kernel = get_most_queued_kernel();
+ if (kernel == DEVICE_KERNEL_NUM) {
return false;
}
@@ -390,6 +398,15 @@ void PathTraceWorkGPU::compute_queued_paths(DeviceKernel kernel, int queued_kern
bool PathTraceWorkGPU::enqueue_work_tiles(bool &finished)
{
+ /* If there are existing paths, wait for them to reach the intersect closest kernel, which
+ * will align the wavefront of the existing and newly added paths. */
+ /* TODO: Check whether counting new intersection kernels here will have a positive effect on
+ * the performance. */
+ const DeviceKernel kernel = get_most_queued_kernel();
+ if (kernel != DEVICE_KERNEL_NUM && kernel != DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST) {
+ return false;
+ }
+
const float regenerate_threshold = 0.5f;
int num_paths = get_num_active_paths();
diff --git a/intern/cycles/integrator/path_trace_work_gpu.h b/intern/cycles/integrator/path_trace_work_gpu.h
index e3b67c08cac..3cd193e606f 100644
--- a/intern/cycles/integrator/path_trace_work_gpu.h
+++ b/intern/cycles/integrator/path_trace_work_gpu.h
@@ -54,6 +54,9 @@ class PathTraceWorkGPU : public PathTraceWork {
void alloc_integrator_queue();
void alloc_integrator_sorting();
+ /* Returns DEVICE_KERNEL_NUM if there are no scheduled kernels. */
+ DeviceKernel get_most_queued_kernel() const;
+
void enqueue_reset();
bool enqueue_work_tiles(bool &finished);
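The scheduling rule introduced by this diff can be tried out in isolation with a reduced model. The sketch below is not the Cycles implementation: the enum is cut down to a few placeholder entries, `IntegratorQueueCounter` keeps only the `num_queued` array, the free-function form of `get_most_queued_kernel` takes the counter as an argument, and `can_enqueue_work_tiles` is a hypothetical name for the early-out check added to `enqueue_work_tiles()`.

```cpp
#include <cassert>

// Reduced stand-in for the real DeviceKernel enum (assumption: the real enum
// has many more entries; only the ordering relative to the *_NUM markers
// matters for this sketch).
enum DeviceKernel {
  DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST = 0,
  DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE, /* Placeholder for "any other kernel". */
  DEVICE_KERNEL_INTEGRATOR_NUM,
  DEVICE_KERNEL_NUM,
};

// Reduced stand-in: only the per-kernel queue sizes are modelled.
struct IntegratorQueueCounter {
  int num_queued[DEVICE_KERNEL_INTEGRATOR_NUM];
};

// Returns DEVICE_KERNEL_NUM if there are no scheduled kernels.
DeviceKernel get_most_queued_kernel(const IntegratorQueueCounter &queue_counter)
{
  int max_num_queued = 0;
  DeviceKernel kernel = DEVICE_KERNEL_NUM;

  for (int i = 0; i < DEVICE_KERNEL_INTEGRATOR_NUM; i++) {
    if (queue_counter.num_queued[i] > max_num_queued) {
      kernel = static_cast<DeviceKernel>(i);
      max_num_queued = queue_counter.num_queued[i];
    }
  }

  return kernel;
}

// Hypothetical helper mirroring the new early-out: new work tiles are only
// scheduled when nothing is queued, or when the busiest queue is the
// intersect closest kernel.
bool can_enqueue_work_tiles(const IntegratorQueueCounter &queue_counter)
{
  const DeviceKernel kernel = get_most_queued_kernel(queue_counter);
  return kernel == DEVICE_KERNEL_NUM || kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST;
}
```

In this model an empty counter allows scheduling, a counter dominated by the shading placeholder blocks it, and scheduling is allowed again once the intersect closest queue becomes the largest one.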