[Bf-cycles] Cuda launch bounds for Pascal

Stefan Werner stewreo at gmail.com
Wed Nov 15 12:49:54 CET 2017

Wow, those results are almost the complete opposite of what I'm seeing. I
re-ran the tests on Linux:

Nvidia 1080Ti, driver 384.90, installed as secondary GPU (no display)
Xubuntu 17.04, CUDA 9.0.176, gcc 6.3.0
master branch, 556b13f03e561b54d4f0186e207f080c786f8b66

48 registers:
BMW: 1m28s
Classroom: 3m12s
Fishy Cat: 3m07s
Koro: 5m40s
Pavillion: 6m52s
Victor: 15m01s

64 registers:
BMW: 1m11s
Classroom: 2m59s
Fishy Cat: 2m51s
Koro: 4m39s
Pavillion: 5m32s
Victor: 12m19s

(Victor had a tile size of 32, all others were the *_gpu.blend files with
the default 256 tile size)
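
To make the Linux numbers above easier to compare, here is a minimal sketch that converts the timings to seconds and prints the share of render time saved at 64 registers. The helper names (to_seconds, percent_saved) are made up for this illustration; the timings are copied from the two lists above.

```python
import re

def to_seconds(text):
    """Convert a duration written like '1m28s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)

def percent_saved(before, after):
    """Percentage of render time saved when going from `before` to `after`."""
    b, a = to_seconds(before), to_seconds(after)
    return 100.0 * (b - a) / b

# (48 registers, 64 registers) on the 1080Ti under Linux, from the lists above.
LINUX_1080TI = {
    "BMW": ("1m28s", "1m11s"),
    "Classroom": ("3m12s", "2m59s"),
    "Fishy Cat": ("3m07s", "2m51s"),
    "Koro": ("5m40s", "4m39s"),
    "Pavillion": ("6m52s", "5m32s"),
    "Victor": ("15m01s", "12m19s"),
}

for scene, (regs48, regs64) in LINUX_1080TI.items():
    saved = percent_saved(regs48, regs64)
    print(f"{scene:10s} {saved:5.1f}% less render time at 64 registers")
```

Every scene in this run comes out faster at 64 registers, including Classroom.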

On Windows, all GTX cards are treated as display cards, regardless of
whether a monitor is plugged in or not. Only Quadro, Tesla and Titan cards
can be set to TCC; that mode is not available for my GTX.

I wonder what's behind the difference we're seeing? The GPUs themselves
shouldn't be that different: both are based on GP102, and the 1080Ti only
has two SM units disabled.
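
As background for the 48/63/64 register limits discussed in this thread, here is a rough sketch of how a per-thread register cap bounds the number of warps resident on one multiprocessor. It assumes 65536 32-bit registers and at most 64 resident warps per multiprocessor (the documented limits for SM 5.x and 6.x), and a 256-thread block, which is a hypothetical value chosen for the example. It ignores register allocation granularity, shared memory and per-block limits, so treat it as an estimate rather than as what the kernel actually does.

```python
REGISTERS_PER_SM = 65536  # 32-bit registers per multiprocessor (SM 5.x / 6.x)
MAX_WARPS_PER_SM = 64     # resident warp limit per multiprocessor (SM 5.x / 6.x)
WARP_SIZE = 32            # threads per warp

def resident_warps(registers_per_thread):
    """Upper bound on resident warps when only registers are the limit."""
    by_registers = REGISTERS_PER_SM // (registers_per_thread * WARP_SIZE)
    return min(by_registers, MAX_WARPS_PER_SM)

def min_blocks(registers_per_thread, threads_per_block=256):
    """Blocks per multiprocessor that fit in the register file.

    threads_per_block=256 is a hypothetical block size for illustration.
    """
    return REGISTERS_PER_SM // (threads_per_block * registers_per_thread)

for regs in (48, 63, 64):
    warps = resident_warps(regs)
    occupancy = 100.0 * warps / MAX_WARPS_PER_SM
    print(f"{regs} registers: {warps} warps ({occupancy:.0f}% occupancy), "
          f"{min_blocks(regs)} blocks of 256 threads")
```

Under these assumptions, 48 registers allow 42 resident warps, while both 63 and 64 registers allow 32. That fits the remark that going from 63 to 64 should cost nothing, and it shows the trade-off being benchmarked: more registers per thread against fewer threads in flight.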


On Wed, Nov 15, 2017 at 1:35 AM, Brecht Van Lommel <
brechtvanlommel at pandora.be> wrote:

> Hi,
> The registers were set based on benchmarks with a GTX 1080 on Linux, when
> we first optimized the code for Pascal. But that was more than a year ago.
> Going from 63 to 64 registers should be fine if it's faster.
> Here are benchmarks with a Titan Xp, Linux, driver 384.90. Results are not
> so good there:
> CUDA 8.0.61: https://developer.blender.org/F1137606
> CUDA 9.0.102: https://developer.blender.org/F1137502
> Which driver and CUDA version are you using?
> One difference between Windows and Linux is the compute preemption
> support. It might be useful to test if that min_blocks *= 8 helps on
> Windows, if your GTX 1080Ti is used for display.
> https://developer.blender.org/rBe360d003e
> Regards,
> Brecht.
> On Tue, Nov 14, 2017 at 11:48 PM, Stefan Werner <stewreo at gmail.com> wrote:
>> Hello,
>> currently the Cuda kernel uses the same launch bounds for Pascal (SM 6.x)
>> as for Maxwell (SM 5.x) hardware, that is 63 registers for branched path
>> tracing and 48 registers for path tracing. Are all of those derived from
>> benchmarks or is the value for Pascal just being carried over from Maxwell?
>> The reason I'm asking is that I'm observing a performance increase on
>> Pascal when I increase the number of registers to 64 for path tracing. Here
>> are before/after benchmarks from a GTX 1080Ti/Win10:
>> 48 registers (as is):
>> BMW: 1m52s
>> Classroom: 3m31s
>> Fishy Cat: 4m33s
>> Koro: 8m30s
>> Pavillion: 7m39s
>> 64 registers:
>> BMW: 1m36s
>> Classroom: 3m34s
>> Fishy Cat: 3m57s
>> Koro: 6m45s
>> Pavillion: 6m39s
>> With the exception of the classroom scene, all benchmarks show
>> significantly better performance. If there are no objections, I'd like to
>> commit that register increase for SM 6.x to master.
>> Running the same test on a Quadro M4000 (Maxwell) shows much smaller
>> differences, so I'd leave SM 5.x as is:
>> 48 registers (as is):
>> BMW: 4m38s
>> Classroom: 12m32s
>> Fishy Cat: 11m18s
>> Koro: 20m38s
>> Pavillion: 21m12s
>> 64 registers:
>> BMW: 4m38s
>> Classroom: 13m07s
>> Fishy Cat: 10m52s
>> Koro: 18m51s
>> Pavillion: 21m32s
>> Another note: 63 registers was a hard limit for SM 2.x hardware. Is 63
>> instead of 64 as register limit for kernels SM 3.x and higher just carried
>> over or is there a reason to not go to 64 registers?
>> -Stefan
>> PS: I'd love it if someone would sacrifice the time to run 48/64 register
>> comparison benchmarks on other Pascal hardware and/or on Linux.
>> _______________________________________________
>> Bf-cycles mailing list
>> Bf-cycles at blender.org
>> https://lists.blender.org/mailman/listinfo/bf-cycles