[Bf-cycles] split kernel and CUDA
Stefan Werner
swerner at smithmicro.com
Tue May 17 08:47:55 CEST 2016
Hi,
Has anyone experimented with building a split kernel for CUDA? It seems to me that this could lift some of the limitations on Nvidia hardware, such as the high memory requirements on cards with many CUDA cores or the driver timeout. I just tried taking the shared ShaderData (KernelGlobals.sd_input) from the split kernel into the CUDA kernel, instead of creating separate ShaderData structs on the stack, and it does have an impact: the stack frame of kernel_cuda_branched_path_trace shrinks from 68416 to 58976 bytes, while spill stores and loads go up slightly:
before:
ptxas info : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
ptxas info : Function properties for kernel_cuda_branched_path_trace
68416 bytes stack frame, 1188 bytes spill stores, 3532 bytes spill loads
after:
ptxas info : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
ptxas info : Function properties for kernel_cuda_branched_path_trace
58976 bytes stack frame, 1256 bytes spill stores, 3676 bytes spill loads
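
To make the change concrete, here is a minimal sketch in plain host-side C++ of the two layouts being compared. It is not the Cycles code: the reduced ShaderData struct and the function names shade_with_local and shade_with_pool are hypothetical, and the pool only stands in for a buffer like KernelGlobals.sd_input. The assumption is that a large local struct in a CUDA kernel usually ends up in the per-thread stack frame, while a preallocated pool leaves only a reference in the function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, much-reduced stand-in for Cycles' ShaderData.
struct ShaderData {
    float P[3];         // shading position
    float N[3];         // shading normal
    float closure[64];  // placeholder for closure storage
};

// Variant A: ShaderData is a local variable. In a CUDA kernel, a struct
// like this would typically count toward the per-thread stack frame.
float shade_with_local(int thread_id) {
    ShaderData sd{};
    sd.closure[0] = static_cast<float>(thread_id);
    return sd.closure[0];
}

// Variant B: ShaderData comes from a pool allocated once by the caller,
// one slot per thread, so the function itself holds only a reference.
float shade_with_pool(std::vector<ShaderData>& pool, int thread_id) {
    ShaderData& sd = pool[static_cast<std::size_t>(thread_id)];
    sd.closure[0] = static_cast<float>(thread_id);
    return sd.closure[0];
}
```

In variant B the caller would create the pool once, for example std::vector<ShaderData> pool(num_threads), and pass it to every invocation. The memory is still needed, but it is allocated up front with a known size rather than reserved as stack for every thread.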
-Stefan