[Bf-cycles] split kernel and CUDA

Tue May 17 08:47:55 CEST 2016

Hi,

Has anyone experimented with building a split kernel for CUDA? It seems to me that this could lift some of the limitations on Nvidia hardware, such as the high memory requirements on cards with many CUDA cores or the driver timeout. I just tried using the shared ShaderData (KernelGlobals.sd_input) from the split kernel in the CUDA kernel, instead of creating separate ShaderData structs on the stack, and it has a measurable impact: the stack frame shrinks from 68416 to 58976 bytes (about 14%), while spill stores and loads go up slightly:

before:
ptxas info    : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
ptxas info    : Function properties for kernel_cuda_branched_path_trace
    68416 bytes stack frame, 1188 bytes spill stores, 3532 bytes spill loads

after:
ptxas info    : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
ptxas info    : Function properties for kernel_cuda_branched_path_trace
    58976 bytes stack frame, 1256 bytes spill stores, 3676 bytes spill loads
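
To make the change concrete, here is a minimal sketch of the two layouts in plain host-side C++, not the actual Cycles code. ShaderDataSketch, KernelGlobalsSketch, shade_on_stack and shade_shared are hypothetical names, and the real ShaderData is much larger; only the idea of a preallocated sd_input slot comes from the experiment above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Cycles' ShaderData; the real struct is far larger.
struct ShaderDataSketch {
    float P[3];        // shading position
    float N[3];        // shading normal
    float closure[64]; // placeholder for closure storage
};

// Hypothetical stand-in for KernelGlobals, holding one preallocated
// ShaderData slot per thread (the role sd_input plays above).
struct KernelGlobalsSketch {
    std::vector<ShaderDataSketch> sd_input;
};

// Variant A: each call creates its own ShaderData as a local variable,
// so the whole struct counts toward the function's stack frame.
float shade_on_stack(float x) {
    ShaderDataSketch sd{};
    sd.P[0] = x;
    sd.closure[0] = 2.0f * x;
    return sd.P[0] + sd.closure[0];
}

// Variant B: the call reuses the slot owned by the globals, so only a
// pointer to it lives in the function's frame.
float shade_shared(KernelGlobalsSketch &kg, std::size_t thread_id, float x) {
    ShaderDataSketch *sd = &kg.sd_input[thread_id];
    sd->P[0] = x;
    sd->closure[0] = 2.0f * x;
    return sd->P[0] + sd->closure[0];
}
```

Both variants compute the same result; the difference is only where the struct is stored. On a GPU the shared slot would sit in global memory, so the smaller stack frame is traded against slower access to the struct, and whether that is a net win would have to be measured.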

-Stefan