[Bf-cycles] CUDA performance tests

Brecht Van Lommel brechtvanlommel at pandora.be
Mon Apr 28 14:56:14 CEST 2014


I forgot to mention that the post __launch_bounds__ test was compiled
with this change, which matches the parameters you used:
https://developer.blender.org/rB6dec2b1a2be513718ed7544c8bb14ce620857279
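
To compare the two ptxas listings quoted below without reading them by
eye, a small script along these lines could pull out the numbers. This
is only a sketch: the regular expressions assume the ptxas message
format shown in this thread, and the names parse_ptxas, PRE and POST
are made up for illustration.

```python
import re

# Patterns for the three ptxas lines of interest (format assumed from this thread).
ENTRY_RE = re.compile(r"Compiling entry function '(?P<name>[^']+)' for '(?P<arch>[^']+)'")
STACK_RE = re.compile(
    r"(?P<stack>\d+) bytes stack frame, "
    r"(?P<stores>\d+) bytes spill stores, "
    r"(?P<loads>\d+) bytes spill loads"
)
REGS_RE = re.compile(r"Used (?P<regs>\d+) registers")


def parse_ptxas(text):
    """Return one dict per kernel entry found in ptxas verbose output."""
    kernels = []
    current = None
    for line in text.splitlines():
        m = ENTRY_RE.search(line)
        if m:
            # A new kernel entry starts; later lines belong to it.
            current = {"name": m["name"], "arch": m["arch"]}
            kernels.append(current)
            continue
        if current is None:
            continue
        m = STACK_RE.search(line)
        if m:
            current["stack"] = int(m["stack"])
            current["spill_stores"] = int(m["stores"])
            current["spill_loads"] = int(m["loads"])
        m = REGS_RE.search(line)
        if m:
            current["registers"] = int(m["regs"])
    return kernels


# The two listings from this thread, with the mail line wrapping undone.
PRE = """\
ptxas info    : Compiling entry function 'kernel_cuda_path_trace' for 'sm_50'
ptxas info    : Function properties for kernel_cuda_path_trace
    9248 bytes stack frame, 3764 bytes spill stores, 3676 bytes spill loads
ptxas info    : Used 32 registers, 764 bytes cmem[0], 392 bytes cmem[2], 100 textures
"""

POST = """\
ptxas info    : Compiling entry function 'kernel_cuda_path_trace' for 'sm_50'
ptxas info    : Function properties for kernel_cuda_path_trace
    9248 bytes stack frame, 3776 bytes spill stores, 3668 bytes spill loads
ptxas info    : Used 32 registers, 764 bytes cmem[0], 392 bytes cmem[2], 100 textures
"""

pre = parse_ptxas(PRE)[0]
post = parse_ptxas(POST)[0]

# Print each value before and after, plus the difference.
for key in ("registers", "stack", "spill_stores", "spill_loads"):
    print(f"{key:13s} pre={pre[key]:5d} post={post[key]:5d} diff={post[key] - pre[key]:+d}")
```

For the two listings in this thread the register count and stack frame
are identical, and only the spill traffic differs by a few bytes, so
the compiler output alone does not explain the regression.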

On Mon, Apr 28, 2014 at 2:49 PM, Brecht Van Lommel
<brechtvanlommel at pandora.be> wrote:
> To Rolf, thanks for the tests, I hadn't looked into sm_50 yet. It
> seems there is a big performance regression after the
> __launch_bounds__ change, the other performance differences are more
> or less as expected.
>
> I'm not sure why there is a difference there. You correctly changed
> CUDA_MULTIPROCESSOR_MAX_BLOCKS to 32 for sm_50, so that seems right. I
> also checked the ptxas output to see if it was indeed still using max
> 32 registers and it seems to be the case:
>
> pre __launch_bounds__:
>
> ptxas info    : Compiling entry function 'kernel_cuda_path_trace' for 'sm_50'
> ptxas info    : Function properties for kernel_cuda_path_trace
>     9248 bytes stack frame, 3764 bytes spill stores, 3676 bytes spill loads
> ptxas info    : Used 32 registers, 764 bytes cmem[0], 392 bytes cmem[2], 100 textures
>
> post __launch_bounds__:
>
> ptxas info    : Compiling entry function 'kernel_cuda_path_trace' for 'sm_50'
> ptxas info    : Function properties for kernel_cuda_path_trace
>     9248 bytes stack frame, 3776 bytes spill stores, 3668 bytes spill loads
> ptxas info    : Used 32 registers, 764 bytes cmem[0], 392 bytes cmem[2], 100 textures
>
> If possible, could you show this ptxas output for your build, to
> verify if you are getting the right values? It prints to the console
> by default during builds.
>
> Thanks,
> Brecht.
>
> On Sat, Apr 26, 2014 at 3:33 PM, Brecht Van Lommel
> <brechtvanlommel at pandora.be> wrote:
>> For the two cards I have here, it seems that after tweaking the code,
>> CUDA 6.0 performance is similar to previous releases with 5.0. There's
>> no improvement and even some small slowdown, but we actually had to do
>> quite a bit of work since 5.5 was released to even get it to that point,
>> and some features have been added in the meantime. Overall I think
>> it's acceptable if other cards show similar results.
>>
>> The patches are from an earlier test I did with the CUDA 6.0 beta,
>> based on feedback from NVIDIA. They help on sm_30 and sm_35, but not
>> sm_20 and sm_21 (these changes were also tested on GTX 480 and Tesla
>> K20m). So it seems we should enable them for sm_3x if we switch to
>> CUDA 6.0.
>>
>> We need to test more cards and platforms, comparing latest master +
>> CUDA 6.0 to the 2.69 and 2.70a releases, before we can make a
>> decision.
>>
>> Brecht.
>>
>> On Fri, Apr 25, 2014 at 5:44 PM, Brecht Van Lommel
>> <brechtvanlommel at pandora.be> wrote:
>>> Hi all,
>>>
>>> We're having some trouble with CUDA performance at the moment, with
>>> a performance regression on Titan cards compared to 2.69, and unclear
>>> effects from recent changes. Further there's the new CUDA 6.0 toolkit,
>>> which we ideally could move to, as sticking to 5.0 is a problem with
>>> newer compilers.
>>>
>>> We could use some help testing all this, especially for Titan cards as
>>> I don't have one of those.
>>>
>>>
>>> There's a google doc here:
>>> https://docs.google.com/spreadsheets/d/1IKvO6s7h_0oVGbGE3giowx2fBvyXzIfdM3qCSfROJfI
>>>
>>> For now I'm assuming someone with a Titan card can make their own
>>> builds to test these git revisions. If not, we can build and upload
>>> binaries to test.
>>>
>>> The test is:
>>> * download: http://www.pasteall.org/blend/28679
>>> * run: ./blender -b modified_bmw.blend -f 1
>>>
>>> If you've got a different .blend file that is giving performance
>>> regressions, or you find different behavior when rendering in the UI
>>> rather than command line, then results from that are welcome too.
>>>
>>>
>>> The information that I'm looking for from this is especially:
>>> * Which revision caused the Titan performance regressions?
>>> * Is CUDA 6.0 performance acceptable to us for the next release?
>>> * Were there significant performance regressions due to recent changes?
>>> * What effect do the two provided patches have when building with CUDA 6.0?
>>>
>>> Thanks,
>>> Brecht.
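
The background render test from the original message can be timed with
a small wrapper like the one below. It is a rough sketch, not part of
the original test instructions: time_command and RENDER_CMD are
hypothetical names, and the blender binary and the .blend file are
assumed to sit in the current directory.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


# The command from the test description: background render of frame 1.
RENDER_CMD = ["./blender", "-b", "modified_bmw.blend", "-f", "1"]
```

Calling time_command(RENDER_CMD) in each build directory gives one list
of timings per revision. Wall-clock time includes Blender startup and
scene loading, so it is only a coarse comparison; taking the minimum of
a few runs reduces noise from other processes.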

