[Bf-cycles] Non-Progressive integrator on GPU

Matthew Heimlich matt.heimlich at gmail.com
Sun May 19 19:57:03 CEST 2013


If anyone's interested in a Non-Progressive GPU build with hair enabled,
I could probably provide one, unless you think compiling it would take
more than 24GB of RAM. If people express interest I'll go ahead and do it.

On Sun, May 19, 2013 at 9:10 AM, Thomas Dinges <blender at dingto.org> wrote:
> Hi Brecht,
> thank you for the memmove alternative, works fine. :)
>
> I agree with you that this is nothing for Trunk, we can re-evaluate this
> when we switch to a new Toolkit.
> Toolkit 5.0 already works better with our big kernel, but brings a
> slowdown. Fingers crossed for 5.x, hope NVIDIA will release that soon.
>
> In the meantime, everyone interested in GPU Non-Progressive rendering
> can use this:
>
> Patch: http://blender.dingto.org/patches/non_progressive_gpu.diff
> Build (Windows x64):
> http://blender.dingto.org/win64_r56913_GPU_Non-Progressive.7z
>
> This still comes with Hair support disabled on the GPU, so it is basically
> the Blender 2.67 feature set, just with Non-Progressive on the GPU.
>
> Best regards,
> Thomas
>
> On 19.05.2013 06:23, Brecht Van Lommel wrote:
>> Hi,
>>
>> On Sun, May 19, 2013 at 1:50 AM, Thomas Dinges <blender at dingto.org> wrote:
>>> Hi Brecht,
>>> I looked into enabling the Non-Progressive integrator on GPU and want to
>>> share my findings.
>>>
>>> As far as I can tell there is 1 problem (maybe 2).
>>>
>>> 1) CUDA does not know memmove(), called from within
>>> shader_merge_closures() in kernel_shaders.h.
>>> I could not find a direct alternative, but it seems there are
>>> workarounds for it.
>>> https://devtalk.nvidia.com/default/topic/394123/moving-memory-cudamemmove-/
>> We don't need memmove, it can just be replaced with:
>>
>> for(int k = 0; k < size; k++)
>>      scj[k] = scj[k+1];
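A minimal sketch of why the loop above can stand in for memmove here, written as plain C++ so it can be checked on the host. The Closure struct, remove_at() and demo_remove_at() are hypothetical names for illustration and are not taken from the Cycles source. The shift moves elements toward lower indices, so copying front to back never overwrites an element before it has been read.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Cycles' ShaderClosure (assumption: the real
// struct has more fields, but any plain copyable struct behaves the same).
struct Closure {
    int type;
    float weight;
};

// Remove element j from arr[0..count) by shifting the tail one slot left.
// Each destination index is lower than its source index, so a forward
// element-by-element copy is safe for this overlapping range.
// Returns the new element count.
int remove_at(Closure *arr, int count, int j)
{
    Closure *scj = arr + j;
    int size = count - (j + 1);  // number of elements after position j
    for (int k = 0; k < size; k++)
        scj[k] = scj[k + 1];
    return count - 1;
}

// Compare the loop against std::memmove on identical input.
void demo_remove_at()
{
    Closure a[4] = {{0, 0.5f}, {1, 1.0f}, {2, 2.0f}, {3, 3.0f}};
    Closure b[4];
    std::memcpy(b, a, sizeof(a));

    int n = remove_at(a, 4, 1);                        // loop version
    std::memmove(b + 1, b + 2, 2 * sizeof(Closure));   // memmove version

    assert(n == 3);
    for (int i = 0; i < n; i++)
        assert(a[i].type == b[i].type && a[i].weight == b[i].weight);
}
```

The same loop body should compile unchanged inside a CUDA device function, since it only uses struct assignment and no library call.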
>>
>>> 2) With Non-Progressive integrator enabled, the CUDA compiler takes a
>>> lot of memory. I had to disable __HAIR__ in order to keep my RAM alive,
>>> but even then it took 4.5 GB (just the compiler process, peak) to
>>> compile the sm_21 kernel.
>> To reduce memory you could try replacing __device with
>> __device_noinline for some big functions called from the
>> non-progressive integrator code. It might reduce performance for the
>> progressive integrator but might also not, needs testing.
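A rough sketch of the inline versus noinline idea, under stated assumptions. KERNEL_DEVICE and KERNEL_DEVICE_NOINLINE are hypothetical macro names chosen for this example (the thread's __device and __device_noinline are Cycles' own macros, whose exact definitions are not shown here), and shade_sample() is a made-up stand-in for a large function. Under nvcc the macros expand to CUDA qualifiers, otherwise to host compiler equivalents so the snippet also builds as ordinary C++.

```cpp
#include <cassert>

// Hypothetical macros for illustration only.
#if defined(__CUDACC__)
#  define KERNEL_DEVICE          static __host__ __device__ __inline__
#  define KERNEL_DEVICE_NOINLINE static __host__ __device__ __noinline__
#elif defined(_MSC_VER)
#  define KERNEL_DEVICE          static inline
#  define KERNEL_DEVICE_NOINLINE static __declspec(noinline)
#else
#  define KERNEL_DEVICE          static inline
#  define KERNEL_DEVICE_NOINLINE static __attribute__((noinline))
#endif

// Small helper that is cheap to inline at every call site.
KERNEL_DEVICE float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

// Stand-in for a big function called from several places in an integrator.
// The noinline qualifier asks the compiler to keep it as a real call
// instead of duplicating its body into each caller.
KERNEL_DEVICE_NOINLINE float shade_sample(float a, float b)
{
    return clamp01(a * b);
}

// Host-side check that the qualifier does not change the results.
void check_shade_sample()
{
    assert(shade_sample(0.5f, 0.5f) == 0.25f);
    assert(shade_sample(0.5f, 4.0f) == 1.0f);
    assert(shade_sample(-1.0f, 1.0f) == 0.0f);
}
```

Whether this actually lowers compiler memory use or costs render speed depends on the kernel and toolkit version, so it would need measuring as described above.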
>>
>> Still not sure we want to have a kernel that pushes against memory
>> limits again, we should keep it manageable so that things don't break
>> on every feature added.
>>
>> Brecht.
>> _______________________________________________
>> Bf-cycles mailing list
>> Bf-cycles at blender.org
>> http://lists.blender.org/mailman/listinfo/bf-cycles
>
>
> --
> Thomas Dinges
> Blender Developer, Artist and Musician
>
> www.dingto.org
>

