[Bf-cycles] Cuda launch bounds for Pascal

Stefan Werner stewreo at gmail.com
Wed Nov 15 12:49:54 CET 2017


Wow, those results are almost the complete opposite of what I'm seeing. I
re-ran the tests on Linux:

Nvidia 1080Ti, driver 384.90, installed as secondary GPU (no display
attached)
Xubuntu 17.04, CUDA 9.0.176, gcc 6.3.0
master branch, 556b13f03e561b54d4f0186e207f080c786f8b66

48 registers:
BMW: 1m28s
Classroom: 3m12s
Fishy Cat: 3m07s
Koro: 5m40s
Pavillion: 6m52s
Victor: 15m01s

64 registers:
BMW: 1m11s
Classroom: 2m59s
Fishy Cat: 2m51s
Koro: 4m39s
Pavillion: 5m32s
Victor: 12m19s

(Victor had a tile size of 32, all others were the *_gpu.blend files with
the default 256 tile size)
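
To put the two lists above side by side, here is a small sketch that turns the timings into speedup factors. The helper names (to_seconds, Timing, speedup, print_speedups) are made up for illustration and are not part of Cycles; the numbers are just the Linux results quoted above.

```cpp
#include <cassert>
#include <cstdio>

// Convert a timing such as 1m28s into seconds.
constexpr int to_seconds(int minutes, int seconds) {
    return minutes * 60 + seconds;
}

struct Timing {
    const char *scene;
    int regs48;  // render time in seconds with 48 registers
    int regs64;  // render time in seconds with 64 registers
};

// Linux / 1080Ti timings copied from the two lists above.
static const Timing kLinuxRuns[] = {
    {"BMW",       to_seconds(1, 28),  to_seconds(1, 11)},
    {"Classroom", to_seconds(3, 12),  to_seconds(2, 59)},
    {"Fishy Cat", to_seconds(3, 7),   to_seconds(2, 51)},
    {"Koro",      to_seconds(5, 40),  to_seconds(4, 39)},
    {"Pavillion", to_seconds(6, 52),  to_seconds(5, 32)},
    {"Victor",    to_seconds(15, 1),  to_seconds(12, 19)},
};

// Ratio of the 48-register time to the 64-register time;
// values above 1.0 mean the 64-register build rendered faster.
inline double speedup(const Timing &t) {
    assert(t.regs64 > 0);
    return static_cast<double>(t.regs48) / t.regs64;
}

// Print one line per scene with both timings and the speedup factor.
inline void print_speedups() {
    for (const Timing &t : kLinuxRuns) {
        std::printf("%-10s %4ds -> %4ds  (x%.2f)\n",
                    t.scene, t.regs48, t.regs64, speedup(t));
    }
}
```

Calling print_speedups() from a main() prints one line per scene; every scene in this run comes out above 1.0.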

On Windows, all GTX cards are treated as display cards, regardless of
whether a monitor is plugged in or not. Only Quadro, Tesla and Titan cards
can be set to TCC; that mode is not available for my GTX.

I wonder what's behind the difference we're seeing? The GPUs themselves
shouldn't be that different: both are based on GP102, and the 1080Ti only
has two SMs disabled.
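
As a rough back-of-the-envelope for the register counts discussed in this thread, the sketch below estimates how many blocks can be resident on one SM for a given per-thread register cap. It assumes 65536 registers and 2048 threads per SM (the documented limits for compute capability 6.1) and a 16x16 = 256 thread block, which is my assumption about the kernel's block size. It ignores register allocation granularity, shared memory and other limits, so treat it as an upper bound only. The function name is hypothetical.

```cpp
#include <cassert>

// Assumed per-SM limits for compute capability 6.1 (GP102).
constexpr int kRegistersPerSM  = 65536;  // 32-bit registers per SM
constexpr int kMaxThreadsPerSM = 2048;   // resident threads per SM

// Assumption: the kernel is launched with 16x16 threads per block.
constexpr int kThreadsPerBlock = 16 * 16;

// Upper bound on resident blocks per SM when every thread may use
// regs_per_thread registers. Takes the smaller of the register-limited
// and the thread-limited block count.
inline int max_resident_blocks(int regs_per_thread) {
    assert(regs_per_thread > 0);
    const int by_registers = kRegistersPerSM / (regs_per_thread * kThreadsPerBlock);
    const int by_threads   = kMaxThreadsPerSM / kThreadsPerBlock;
    return by_registers < by_threads ? by_registers : by_threads;
}
```

Under these assumptions, max_resident_blocks(48) gives 5 blocks per SM, while both 63 and 64 give 4, so going from 63 to 64 registers should not cost any occupancy by this estimate.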

-Stefan

On Wed, Nov 15, 2017 at 1:35 AM, Brecht Van Lommel <
brechtvanlommel at pandora.be> wrote:

> Hi,
>
> The registers were set based on benchmarks with a GTX 1080 on Linux, when
> we first optimized the code for Pascal. But that was more than a year ago.
> Going from 63 to 64 registers should be fine if it's faster.
>
> Here are benchmarks with a Titan Xp, Linux, driver 384.90. Results are not
> so good there:
> CUDA 8.0.61: https://developer.blender.org/F1137606
> CUDA 9.0.102: https://developer.blender.org/F1137502
>
> Which driver and CUDA version are you using?
>
> One difference between Windows and Linux is the compute preemption
> support. It might be useful to test if that min_blocks *= 8 helps on
> Windows, if your GTX 1080Ti is used for display.
> https://developer.blender.org/rBe360d003e
>
> Regards,
> Brecht.
>
>
> On Tue, Nov 14, 2017 at 11:48 PM, Stefan Werner <stewreo at gmail.com> wrote:
>
>> Hello,
>>
>> currently the CUDA kernel uses the same launch bounds for Pascal (SM 6.x)
>> as for Maxwell (SM 5.x) hardware, that is, 63 registers for branched path
>> tracing and 48 registers for path tracing. Are all of those derived from
>> benchmarks or is the value for Pascal just being carried over from Maxwell?
>>
>> The reason I'm asking is that I'm observing a performance increase on
>> Pascal when I increase the number of registers to 64 for path tracing. Here
>> are before/after benchmarks from a GTX 1080Ti/Win10:
>>
>> 48 registers (as is):
>> BMW: 1m52s
>> Classroom: 3m31s
>> Fishy Cat: 4m33s
>> Koro: 8m30s
>> Pavillion: 7m39s
>>
>> 64 registers:
>> BMW: 1m36s
>> Classroom: 3m34s
>> Fishy Cat: 3m57s
>> Koro: 6m45s
>> Pavillion: 6m39s
>>
>> With the exception of the classroom scene, all benchmarks show
>> significantly better performance. If there are no objections, I'd like to
>> commit that register increase for SM 6.x to master.
>>
>> Running the same test on a Quadro M4000 (Maxwell) shows much smaller
>> differences, so I'd leave SM 5.x as is:
>>
>> 48 registers (as is):
>> BMW: 4m38s
>> Classroom: 12m32s
>> Fishy Cat: 11m18s
>> Koro: 20m38s
>> Pavillion: 21m12s
>>
>> 64 registers:
>> BMW: 4m38s
>> Classroom: 13m07s
>> Fishy Cat: 10m52s
>> Koro: 18m51s
>> Pavillion: 21m32s
>>
>> Another note: 63 registers was a hard limit for SM 2.x hardware. Is the
>> limit of 63 rather than 64 registers for kernels on SM 3.x and higher just
>> carried over, or is there a reason not to go to 64 registers?
>>
>> -Stefan
>> PS: I'd love it if someone would sacrifice the time to run 48/64 register
>> comparison benchmarks on other Pascal hardware and/or on Linux.
>>
>> _______________________________________________
>> Bf-cycles mailing list
>> Bf-cycles at blender.org
>> https://lists.blender.org/mailman/listinfo/bf-cycles
>>
>>