[Bf-cycles] Non-Progressive integrator on GPU

Sun May 19 20:08:52 CEST 2013

I don't think it would be that much, more like 10 GB or so, but it's hard to say. :)

I tried building it with Toolkit 5.0 as well, and it uses much less RAM,
so I really hope the upcoming Toolkit will handle the code well and give
us the same performance as Toolkit 4.2.

On 19.05.2013 19:57, Matthew Heimlich wrote:
> If anyone's interested in a non-prog hair GPU build I could probably
> provide one, unless you think the build would take more than 24GB of
> RAM. If people express interest I'll go ahead and do it.
>
> On Sun, May 19, 2013 at 9:10 AM, Thomas Dinges <blender at dingto.org> wrote:
>> Hi Brecht,
>> thank you for the memmove alternative, works fine. :)
>>
>> I agree with you that this is nothing for Trunk, we can re-evaluate this
>> when we switch to a new Toolkit.
>> Toolkit 5.0 already works better with our big kernel, but brings a
>> slowdown. Fingers crossed for 5.x, hope NVIDIA will release that soon.
>>
>> In the meantime, everyone interested in GPU Non-Progressive rendering
>> can use this:
>>
>> Patch: http://blender.dingto.org/patches/non_progressive_gpu.diff
>> Build (Windows x64):
>> http://blender.dingto.org/win64_r56913_GPU_Non-Progressive.7z
>>
>> Hair support on the GPU is still disabled in this build, so it is
>> basically the Blender 2.67 feature set, just with Non-Progressive on the GPU.
>>
>> Best regards,
>> Thomas
>>
>> On 19.05.2013 06:23, Brecht Van Lommel wrote:
>>> Hi,
>>>
>>> On Sun, May 19, 2013 at 1:50 AM, Thomas Dinges <blender at dingto.org> wrote:
>>>> Hi Brecht,
>>>> I looked into enabling the Non-Progressive integrator on GPU and want to
>>>> share my findings.
>>>>
>>>> As far as I can tell there is 1 problem (maybe 2).
>>>>
>>>> 1) CUDA does not know memmove(), called from within
>>>> shader_merge_closures() in kernel_shaders.h.
>>>> I could not find a direct alternative, but it seems there are
>>>> workarounds for it.
>>>> https://devtalk.nvidia.com/default/topic/394123/moving-memory-cudamemmove-/
>>> We don't need memmove, can just be replaced with:
>>>
>>> for(int k = 0; k < size; k++)
>>>       scj[k] = scj[k+1];
>>>
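As a rough illustration of the replacement loop quoted above, here is a minimal C sketch. It assumes an array of `count` elements from which the entry at index `j` is dropped by shifting the tail down one slot. The `Closure` struct and the `remove_closure` helper are hypothetical stand-ins, not the actual Cycles types or functions; only the shape of the loop comes from the thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for a shader closure; the real Cycles struct differs. */
typedef struct Closure {
    int type;
    float weight;
} Closure;

/* Remove element j from arr[0..count-1] by shifting the tail down one slot.
 * Equivalent to memmove(&arr[j], &arr[j + 1], size * sizeof(Closure)),
 * written as a plain loop so it needs no library call.
 * Returns the new element count. */
static int remove_closure(Closure *arr, int count, int j)
{
    Closure *scj = &arr[j];
    int size = count - (j + 1);   /* number of elements after index j */

    /* Copying front to back is safe here because the destination
     * always sits one slot below the source. */
    for (int k = 0; k < size; k++)
        scj[k] = scj[k + 1];

    return count - 1;
}
```

Calling `remove_closure(c, 4, 1)` on four closures would leave the former elements 0, 2 and 3 in the first three slots. The loop only works in this direction; shifting elements upward would need to iterate from the back.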
>>>> (2) With the Non-Progressive integrator enabled, the CUDA compiler takes a
>>>> lot of memory. I had to disable __HAIR__ in order to keep my RAM alive,
>>>> but even then it took 4.5 GB (just the compiler process, peak) to
>>>> compile the sm_21 kernel.
>>> To reduce memory you could try replacing __device with
>>> __device_noinline for some big functions called from the
>>> non-progressive integrator code. It might reduce performance for the
>>> progressive integrator, or it might not; this needs testing.
>>>
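The inline versus noinline trade-off mentioned above can be sketched on the CPU side in plain C. This is only an analogy: the `KERNEL_INLINE` and `KERNEL_NOINLINE` macros and both functions are hypothetical, and the attribute spelling assumes GCC or Clang. In CUDA device code the corresponding qualifier is `__noinline__`; how the Cycles macros map onto it is not shown in this thread.

```c
/* Hypothetical macros, assuming GCC/Clang attribute syntax. */
#define KERNEL_INLINE   static inline __attribute__((always_inline))
#define KERNEL_NOINLINE static __attribute__((noinline))

/* Small helper: cheap to duplicate at every call site. */
KERNEL_INLINE float mix(float a, float b, float t)
{
    return a + (b - a) * t;
}

/* Stand-in for a "big" function. Marked noinline so the compiler keeps
 * one out-of-line copy instead of expanding the body into each caller,
 * which keeps the callers smaller at the cost of a call per use. */
KERNEL_NOINLINE float shade_sample(const float *weights, int n, float t)
{
    float sum = 0.0f;
    for (int i = 0; i + 1 < n; i++)
        sum += mix(weights[i], weights[i + 1], t);
    return sum;
}
```

Whether this lowers compiler memory use or costs runtime performance for a given kernel has to be measured, as noted in the thread.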
>>> Still not sure we want to have a kernel that pushes against memory
>>> limits again, we should keep it manageable so that things don't break
>>> on every feature added.
>>>
>>> Brecht.
>>> _______________________________________________
>>> Bf-cycles mailing list
>>> Bf-cycles at blender.org
>>> http://lists.blender.org/mailman/listinfo/bf-cycles
>>
>> --
>> Thomas Dinges
>> Blender Developer, Artist and Musician
>>
>> www.dingto.org
>>

-- 
Thomas Dinges
Blender Developer, Artist and Musician

www.dingto.org