[Bf-cycles] OpenCL and AMD GPUs

Martijn Berger martijn.berger at gmail.com
Mon Oct 27 22:43:06 CET 2014

Hi George,

First, welcome again and I must say I am very happy that you want to do

I assume your approach will be mostly in line with the split as suggested
in the "*Megakernels Considered Harmful*: Wavefront Path Tracing on GPUs"
paper. I am not sure how well this will map unaltered onto our shading
approach, but that is something that only trying it will actually answer.
I also want to draw your attention to our developer.blender.org site, as it
allows you to publish and maintain your own set of changes in a so-called
"differential". It is a very convenient way to publish a stack of patches
for code review.

best regards,

Martijn Berger

On Mon, Oct 27, 2014 at 9:34 PM, Kyriazis, George <George.Kyriazis at amd.com> wrote:

> Greetings bf-cycles,
> I work for AMD, and we have been thinking about working in the OpenCL
> kernel (read: have started working on it).  It is, I presume, a well-known
> fact that the OpenCL implementation has "issues" on AMD.
> Our current approach is to split up the OpenCL kernel into multiple
> (smaller) kernels, in order to get better utilization of the GPU.  I've had
> brief discussions with Martijn, Brecht and Ton, and they all seem eager to
> finally "fix" (for lack of a better term) OpenCL, which is a good sign.
> Technical details have not been discussed yet, but an open forum like
> bf-cycles is a better place for that.
> As a starting point for discussion, I'd like to comment on the main
> motivation for the kernel split.  As is well known, the AMD OpenCL
> implementation has some problems compiling the current OpenCL kernel.  This
> has been mainly attributed to the length of the kernel, and problems with
> register allocation.  Although the above is correct, those causes fail to
> address the main issue, which is the fact that a huge kernel (like cycles)
> that is a straightforward port of CPU code has a lot of code
> divergence.  Code divergence causes a lot of work-items to go idle during
> kernel execution, which is not a good thing.
> Splitting the kernel allows for each (sub)-kernel to have better GPU
> utilization, and hence better performance.  As a side-effect, it decreases
> the size of each kernel, and makes things easier for the register
> allocator.  So, the current problems that the AMD OpenCL implementation has
> will not express themselves in a split kernel.  Our current thought is to
> have those individual kernels communicate via queues.
> Any questions / comments / etc. about our approach are welcome, of course.
> Appreciated,
> George Kyriazis
> _______________________________________________
> Bf-cycles mailing list
> Bf-cycles at blender.org
> http://lists.blender.org/mailman/listinfo/bf-cycles
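
To make the divergence argument in the quoted message concrete, here is a
minimal sketch in Python (not OpenCL, and not Cycles code). It only counts
lane slots under a simplified cost model. Everything in it is an assumption
for illustration: the material names, the tiny wavefront width of 4, and the
idea that every shading branch costs the same. It compares one monolithic
kernel, where a wavefront runs every branch that any of its work-items needs,
with split kernels fed by one queue per branch.

```python
from collections import defaultdict

# Hypothetical SIMD width, kept tiny so the numbers are easy to follow.
WAVEFRONT = 4


def lane_slots_megakernel(materials):
    """Lane slots used by one monolithic kernel.

    Each wavefront executes every branch that at least one of its
    work-items needs; lanes that do not need a branch sit idle.
    """
    slots = 0
    for start in range(0, len(materials), WAVEFRONT):
        group = materials[start:start + WAVEFRONT]
        slots += len(set(group)) * WAVEFRONT
    return slots


def lane_slots_split(materials):
    """Lane slots used by split kernels that communicate via queues.

    Work-items are pushed to a queue per material, and each queue is
    processed by its own kernel launch, so a wavefront runs one branch.
    """
    queues = defaultdict(list)
    for ray_id, material in enumerate(materials):
        queues[material].append(ray_id)
    slots = 0
    for queue in queues.values():
        wavefronts = -(-len(queue) // WAVEFRONT)  # ceiling division
        slots += wavefronts * WAVEFRONT
    return slots


# Hypothetical material hit by each of eight rays.
materials = ["diffuse", "glossy", "glass", "diffuse",
             "glossy", "diffuse", "glass", "glossy"]

mega = lane_slots_megakernel(materials)
split = lane_slots_split(materials)
print(f"megakernel utilization:   {len(materials) / mega:.2f}")
print(f"split-kernel utilization: {len(materials) / split:.2f}")
```

Under these assumptions the monolithic kernel spends 24 lane slots on 8
useful work-items, while the split version spends 12. The model ignores the
cost of writing and reading the queues and of the extra kernel launches, so
it says nothing about real speedups on any particular GPU.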