[Bf-cycles] split kernel and CUDA
Thomas Dinges
blender at dingto.org
Tue May 17 16:45:59 CEST 2016
That sounds promising, feel free to submit a patch for this and we can
check. :)
On 17.05.2016 at 16:40, Stefan Werner wrote:
>
> The patch is surprisingly clean. It removes some of the #ifdef
> __SPLIT_KERNEL__ blocks and unifies CPU, OpenCL and CUDA a bit more. I
> didn’t run a speed benchmark, and I wouldn’t even make speed the
> ultimate top priority: Right now, the problem we see in the field is
> that people are unable to use high-end gaming GPUs because the VRAM is
> so full of geometry and textures that the CUDA runtime doesn’t have
> room for kernel memory any more. On my 1664-core M4000 card, I see a
> simple kernel launch already taking ~1600MB of VRAM with almost empty
> scenes.
>
>
>
> It looks to me like the CUDA compiler reserves room for every stack
> instance of ShaderData (or other structs) in advance, and that sharing
> that memory instead of instantiating it separately is an easy way to
> reduce VRAM requirements without changing the code much.
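The pattern described above can be sketched on the host side as follows. This is only an illustration, not the actual patch: `ShaderDataSketch`, `setup`, `eval`, `trace_separate` and `trace_shared` are hypothetical names, and the real ShaderData is far larger. The first variant declares one struct per use, the second reuses a single caller-provided instance, in the spirit of KernelGlobals.sd_input.

```cpp
#include <cassert>

// Hypothetical stand-in for Cycles' ShaderData (the real struct is much bigger).
struct ShaderDataSketch {
    float closure_weight[64];
    int num_closure;
};

// Fill the scratch struct for one shading step (illustrative only).
static void setup(ShaderDataSketch *sd, float weight) {
    sd->num_closure = 1;
    sd->closure_weight[0] = weight;
}

// Sum the active closure weights.
static float eval(const ShaderDataSketch *sd) {
    float sum = 0.0f;
    for (int i = 0; i < sd->num_closure; ++i) {
        sum += sd->closure_weight[i];
    }
    return sum;
}

// Variant 1: every use gets its own instance, so the frame must hold both.
float trace_separate(float x) {
    ShaderDataSketch surface_sd{};
    ShaderDataSketch shadow_sd{};
    setup(&surface_sd, x * 0.5f);
    float surface = eval(&surface_sd);
    setup(&shadow_sd, x * 0.25f);
    float shadow = eval(&shadow_sd);
    return surface + shadow;
}

// Variant 2: one shared instance is reused once the previous use is finished.
float trace_shared(ShaderDataSketch *sd, float x) {
    setup(sd, x * 0.5f);
    float surface = eval(sd);
    setup(sd, x * 0.25f);
    float shadow = eval(sd);
    return surface + shadow;
}

// Both variants must compute the same value; only the storage differs.
void check_variants_agree() {
    ShaderDataSketch shared{};
    assert(trace_shared(&shared, 2.0f) == trace_separate(2.0f));
}
```

Sharing is only safe when the two uses never need to be live at the same time, which is the property that would have to be checked case by case in the real kernel.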
>
>
>
> -Stefan
>
>
>
> *From: *<bf-cycles-bounces at blender.org> on behalf of Sergey Sharybin
> <sergey.vfx at gmail.com>
> *Reply-To: *Discussion list to assist Cycles render engine developers
> <bf-cycles at blender.org>
> *Date: *Tuesday, May 17, 2016 at 9:20 AM
> *To: *Discussion list to assist Cycles render engine developers
> <bf-cycles at blender.org>
> *Subject: *Re: [Bf-cycles] split kernel and CUDA
>
>
>
> hi,
>
>
>
> Lukas Stockner was doing experiments with a CUDA split kernel. With
> the current design of the split it was actually taking more VRAM,
> AFAIR. Hopefully he'll read this mail and reply in more detail.
>
>
>
> Would be cool to have this front moving forward, but i fear we'll
> have to step back and reconsider some things about how split
> kernel works together with a regular one.
>
>
>
> There are interesting results on the stack memory! I can see the
> number of spill loads go up though, did you measure whether it gives
> a measurable render time slowdown? And how messy is the patch, I
> wonder :)
>
>
>
> On Tue, May 17, 2016 at 8:47 AM, Stefan Werner
> <swerner at smithmicro.com> wrote:
>
> Hi,
>
> Has anyone experimented with building a split kernel for CUDA?
> It seems to me that this could lift some of the limitations on
> Nvidia hardware, such as the high memory requirements on cards
> with many CUDA cores or the driver timeout. I just tried out
> what happens when I take the shared ShaderData
> (KernelGlobals.sd_input) from the split kernel into the CUDA
> kernel, as opposed to creating separate ShaderData structs on
> the stack, and it looks like it has an impact:
>
> before:
> ptxas info : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
> ptxas info : Function properties for kernel_cuda_branched_path_trace
>     68416 bytes stack frame, 1188 bytes spill stores, 3532 bytes spill loads
>
> after:
> ptxas info : Compiling entry function 'kernel_cuda_branched_path_trace' for 'sm_50'
> ptxas info : Function properties for kernel_cuda_branched_path_trace
>     58976 bytes stack frame, 1256 bytes spill stores, 3676 bytes spill loads
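To put the two stack frame figures in perspective, here is a rough back-of-the-envelope estimate. It is a sketch under assumptions, not a statement of what the driver really does: it assumes local memory is reserved as stack frame size times the maximum number of resident threads, that the 1664-core M4000 mentioned in this thread has 128 cores per multiprocessor, and that each multiprocessor can hold 2048 resident threads. The helper names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Per-thread stack frame sizes quoted from the ptxas output above (bytes).
constexpr std::uint64_t kStackBefore = 68416;
constexpr std::uint64_t kStackAfter = 58976;

// 1664 cores is from the thread; the other two values are ASSUMPTIONS
// about the hardware, not something stated in the discussion.
constexpr std::uint64_t kCudaCores = 1664;
constexpr std::uint64_t kCoresPerSM = 128;        // assumed
constexpr std::uint64_t kMaxThreadsPerSM = 2048;  // assumed

// Maximum number of threads resident on the whole device.
constexpr std::uint64_t resident_threads() {
    return kCudaCores / kCoresPerSM * kMaxThreadsPerSM;
}

// ASSUMED reservation model: one full stack frame per resident thread.
constexpr std::uint64_t reserved_bytes(std::uint64_t stack_frame_bytes) {
    return stack_frame_bytes * resident_threads();
}

constexpr double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

static_assert(resident_threads() == 26624, "13 SMs * 2048 threads");
static_assert(kStackBefore - kStackAfter == 9440, "bytes saved per thread");

// Runtime sanity check of the totals under the assumed model.
void check_estimates() {
    assert(reserved_bytes(kStackBefore) == 1821507584ULL);  // about 1737 MiB
    assert(reserved_bytes(kStackAfter) == 1570177024ULL);   // about 1497 MiB
}
```

Under these assumptions the change would free roughly 240 MiB, and the "before" total is in the same range as the ~1600MB observed for an almost empty scene. The spill traffic moves the other way, by 68 bytes of stores and 144 bytes of loads per thread.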
>
> -Stefan
>
> _______________________________________________
> Bf-cycles mailing list
> Bf-cycles at blender.org
> https://lists.blender.org/mailman/listinfo/bf-cycles
>
>
>
>
>
> --
>
> With best regards, Sergey Sharybin
>
>
>