[Bf-cycles] split kernel and CUDA

Stefan Werner swerner at smithmicro.com
Tue May 17 08:47:55 CEST 2016


Hi,

Has anyone experimented with building a split kernel for CUDA? It seems to me that this could lift some of the limitations on Nvidia hardware, such as the high memory requirements on cards with many CUDA cores, or the driver timeout. I just tried what happens when I bring the shared ShaderData (KernelGlobals.sd_input) from the split kernel over to the CUDA kernel, instead of creating separate ShaderData structs on the stack. It does make a difference: the stack frame shrinks by 9440 bytes, while spill stores and loads go up slightly:

before:
ptxas info    : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
ptxas info    : Function properties for kernel_cuda_branched_path_trace
    68416 bytes stack frame, 1188 bytes spill stores, 3532 bytes spill loads

after:
ptxas info    : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
ptxas info    : Function properties for kernel_cuda_branched_path_trace
    58976 bytes stack frame, 1256 bytes spill stores, 3676 bytes spill loads
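To put the stack frame numbers in terms of device memory, here is a rough back-of-the-envelope sketch. It assumes the driver reserves the full per-thread stack frame for every thread that can be resident on the GPU at once. The GPU configuration (16 SMs, 2048 resident threads per SM) is a hypothetical Maxwell-class example, not a card I measured.

```python
# Rough estimate of device memory set aside for per-thread stack frames
# (CUDA local memory). Assumption: the driver reserves the full stack
# frame for every thread that can be resident on the GPU at the same time.

MIB = 1024 * 1024


def reserved_local_memory(stack_frame_bytes, threads_per_sm, sm_count):
    """Bytes reserved if every resident thread gets its own stack frame."""
    return stack_frame_bytes * threads_per_sm * sm_count


# Hypothetical Maxwell-class GPU: 2048 resident threads per SM (the
# compute capability 5.x limit) and 16 SMs. Not taken from the ptxas output.
THREADS_PER_SM = 2048
SM_COUNT = 16

# Stack frame sizes reported by ptxas above.
before = reserved_local_memory(68416, THREADS_PER_SM, SM_COUNT)
after = reserved_local_memory(58976, THREADS_PER_SM, SM_COUNT)

print(f"before: {before // MIB} MiB")
print(f"after:  {after // MIB} MiB")
print(f"saved:  {(before - after) // MIB} MiB")
```

Under those assumptions this prints roughly 2138 MiB before, 1843 MiB after, and 295 MiB saved, which scales linearly with the number of SMs.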

-Stefan


