[Bf-cycles] Bottleneck of gpu render

doug65536 . doug65536 at gmail.com
Thu Apr 3 22:10:12 CEST 2014


Memory bandwidth and divergence *should* be the main factors limiting the
performance of the Cycles GPU code. GPUs work by executing the same
instruction across multiple threads - all of the threads in a warp must
follow the same code path. Whenever threads in the same warp follow
different paths, the GPU must repeat the instructions for each path,
suppressing commitment of results for the pipelines that are not on the
current path during each repeat.
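
To illustrate (this is just a toy CUDA kernel for the sake of the example,
not anything from the Cycles sources), a branch like the one below costs
the warp the sum of both paths whenever its lanes disagree on the
condition:

    // Toy kernel: when lanes of a warp disagree on the condition, the
    // hardware runs both paths and masks off the inactive lanes, so the
    // warp pays for path A plus path B.
    __global__ void divergent_kernel(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;

        if (in[i] > 0.0f)
            out[i] = sqrtf(in[i]);    // path A: some lanes active
        else
            out[i] = -in[i] * in[i];  // path B: replayed for the other lanes
    }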

Memory locality is another significant factor: when all of the threads in a
warp fetch consecutive values, the GPU can coalesce the loads into one big,
efficient load from memory. If threads in a warp are fetching from disjoint
memory locations, the GPU needs more cycles to gather all of the loaded
values using multiple memory accesses.
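
As a concrete (again toy, not Cycles) example, compare a kernel whose warp
reads consecutive addresses with one that reads addresses a stride apart:

    // The first copy is serviced by one wide memory transaction per warp;
    // the second needs several transactions, because neighbouring lanes
    // touch addresses far apart.
    __global__ void coalesced_copy(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];           // lane k reads element k: coalesced
    }

    __global__ void strided_copy(const float *in, float *out, int n, int stride)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i * stride];  // lane k reads element k*stride: scattered
                                      // (the caller must allocate n*stride inputs)
    }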

In summary, there will be moments when the GPU is massively parallel,
moments when only some of the threads are doing useful work, and
pathological moments when only a few threads, or even a single thread, in a
warp are taking a given code path.

One GPU thread is nowhere near as fast as one CPU thread. GPUs are clocked
a lot slower than CPUs, and CPUs are optimized for very low latencies. GPUs
sacrifice latency to get maximum bandwidth. GPUs rely on parallelism to
beat CPUs.

CUDA makes it appear that the CPU is doing a lot of work because it uses a
polling spin-loop to wait for completion of a kernel execution, to keep the
latency of waiting for kernel completion as low as possible. It is possible
to configure CUDA not to do this (likely using an interrupt for
notification instead). I tried it, and CPU usage dropped a lot, but, as
expected, kernel completion latency increased. I also spent a *lot* of time
(at least a week) experimenting with overlapped and asynchronous memory
transfers, and my finding was that memory copies to and from the GPU are
definitely not the bottleneck.
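
For reference, the switch I mean is the device scheduling flag; a minimal
host-side sketch (written from memory, not taken from the Cycles code, so
treat the details as an assumption) looks like this:

    #include <cuda_runtime.h>

    int main(void)
    {
        // Must be set before the CUDA context for the device is created.
        // The default scheduling policy typically busy-waits on the CPU;
        // cudaDeviceScheduleBlockingSync puts the calling thread to sleep
        // until the GPU signals completion, at the cost of wake-up latency.
        cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
        cudaSetDevice(0);

        /* ... launch kernels here ... */

        // With the flag above this call blocks on an OS primitive instead
        // of spinning, so CPU usage drops but completion latency goes up.
        cudaDeviceSynchronize();
        return 0;
    }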



On Sun, Mar 30, 2014 at 2:12 PM, GeKo <geko.pua at gmail.com> wrote:

> I was reading about the memory hierarchy on CUDA and it has quite a lot
> of levels. The ray casting is probably executed mainly out of main (global)
> memory (a requirement of the algorithm), so the bottleneck could be in the
> memory access, but that is only a supposition.
>
>
> 2014-03-29 19:37 GMT+01:00 Мукаев Виктор <vitos1g at gmail.com>:
>
>> I heard that in the 6xx series each CUDA core was 3 times less powerful
>> compared to the 5xx series cores, though they were much better in terms of
>> power consumption, which was the main feature of the 6xx series (no links,
>> just word of mouth, so I can't be sure). So I think the CUDA core count
>> nowadays isn't the right value for measuring performance; it's important,
>> but not the main one. We should stick to the total transistor count per
>> GPU :)
>>
>> On Fri, Mar 28, 2014 at 11:50 PM, GeKo <geko.pua at gmail.com> wrote:
>>
>>> But I really mean the way that Cycles processes the model. I don't have
>>> any knowledge about raytracing renderers, but I am shocked that the new
>>> hardware does not improve the rendering that much, even when it has 5
>>> times more cores.
>>> On 28 Mar 2014 23:11, "Brecht Van Lommel" <brechtvanlommel at pandora.be>
>>> wrote:
>>>
>>> > There are various possibilities:
>>> >
>>> > * Memory and texture memory read/write bottlenecks
>>> > * Different MHz for cores
>>> > * Different GPU architecture
>>> > * Not all cores occupied due to divergence or mismatched batch sizes
>>> > * CPU Overhead
>>> >
>>> > There are too many interacting parts to be sure without closely
>>> > investigating.
>>> >
>>> > Brecht.
>>> >
>>> > On Fri, Mar 28, 2014 at 9:30 PM, GeKo <geko.pua at gmail.com> wrote:
>>> > > Short question: why doesn't the execution time decrease "linearly"
>>> > > according to the number of CUDA cores?
>>> > >