[Bf-cycles] OpenCL and AMD GPUs
martijn.berger at gmail.com
Mon Oct 27 22:43:06 CET 2014
First, welcome again. I must say I am very happy that you want to do this.
I assume your approach will be mostly in line with the split suggested
in the "*Megakernels Considered Harmful*: Wavefront Path Tracing on GPUs"
paper. I am not sure how well this will map, unaltered, onto our shading
approach, but that is something only trying it will actually answer.
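A minimal sketch of that wavefront structure, for illustration only: it is a sequential Python simulation, not Cycles or AMD code, and every name in it (`PathState`, `scene_hit`, the `*_kernel` functions, the constant albedo and background) is hypothetical. Each function stands in for one small kernel launched over its whole input queue. On a GPU the queues would live in device memory, and appending to them would typically go through an atomic counter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PathState:
    """Per-path state kept between kernel launches (global memory on a GPU)."""
    pixel: int
    bounce: int = 0
    throughput: float = 1.0


def scene_hit(pixel: int, bounce: int) -> bool:
    # Hypothetical stand-in for ray/scene intersection, deterministic so the
    # example is reproducible. A real renderer would traverse a BVH here.
    return (pixel + bounce) % 3 != 0


def generate_kernel(num_pixels: int, intersect_q: deque) -> None:
    # One camera path per pixel is pushed onto the intersection queue.
    for pixel in range(num_pixels):
        intersect_q.append(PathState(pixel))


def intersect_kernel(intersect_q: deque, shade_q: deque, miss_q: deque) -> None:
    # Every work-item runs the same code; the result only selects the next queue.
    while intersect_q:
        state = intersect_q.popleft()
        (shade_q if scene_hit(state.pixel, state.bounce) else miss_q).append(state)


def shade_kernel(shade_q: deque, intersect_q: deque, max_bounces: int) -> None:
    # Hypothetical material with a constant albedo of 0.5.
    while shade_q:
        state = shade_q.popleft()
        state.throughput *= 0.5
        state.bounce += 1
        if state.bounce < max_bounces:
            intersect_q.append(state)  # path continues
        # otherwise the path is terminated and contributes nothing


def miss_kernel(miss_q: deque, film: list) -> None:
    # Background radiance of 1.0: a path's contribution is its throughput.
    while miss_q:
        state = miss_q.popleft()
        film[state.pixel] += state.throughput


def render(num_pixels: int, max_bounces: int = 4) -> list:
    film = [0.0] * num_pixels
    intersect_q, shade_q, miss_q = deque(), deque(), deque()
    generate_kernel(num_pixels, intersect_q)
    # Host-side loop: "launch" each small kernel over its whole queue
    # until no live paths remain.
    while intersect_q:
        intersect_kernel(intersect_q, shade_q, miss_q)
        shade_kernel(shade_q, intersect_q, max_bounces)
        miss_kernel(miss_q, film)
    return film
```

With this toy scene, `render(6)` returns `[1.0, 0.25, 0.5, 1.0, 0.25, 0.5]`. The point is structural: within each launch every work-item executes the same stage, so divergence is reduced to deciding which queue a path is written to next.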
I also want to draw your attention to our developer.blender.org site, as it
allows you to publish and maintain your own set of changes in a so-called
"differential". It is a very convenient way to publish a stack of patches
for code review.
On Mon, Oct 27, 2014 at 9:34 PM, Kyriazis, George <George.Kyriazis at amd.com> wrote:
> Greetings bf-cycles,
> I work for AMD, and we have been thinking about working on the OpenCL
> kernel (read: have started working on it). It is, I presume, a well-known
> fact that the OpenCL implementation has "issues" on AMD.
> Our current approach is to split up the OpenCL kernel into multiple
> (smaller) kernels, in order to get better utilization of the GPU. I've had
> brief discussions with Martijn, Brecht and Ton, and they all seem eager to
> finally "fix" (for lack of a better term) OpenCL, which is a good sign.
> Technical details have not been discussed yet, but an open forum like
> bf-cycles is a better place for that.
> As a starting point for discussion, I'd like to comment on the main
> motivation for the kernel split. As is well known, the AMD OpenCL
> implementation has some problems compiling the current OpenCL kernel. This
> has mainly been attributed to the length of the kernel and to problems with
> register allocation. Although that is correct, it misses the main issue:
> a huge kernel (like Cycles) that is a straightforward port of CPU code
> has a lot of code divergence. Code divergence causes many work-items to go
> idle during kernel execution, which wastes GPU throughput.
> Splitting the kernel allows each (sub-)kernel to achieve better GPU
> utilization, and hence better performance. As a side effect, it decreases
> the size of each kernel, which makes things easier for the register
> allocator. So the problems the AMD OpenCL implementation currently has
> will not manifest themselves in a split kernel. Our current thought is to
> have those individual kernels communicate via queues.
> Any questions / comments / etc. about our approach are welcome, of course.
> George Kyriazis
> Bf-cycles mailing list
> Bf-cycles at blender.org
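The divergence argument in the quoted message can be made concrete with a simple utilization model. This is a sketch under stated assumptions (lanes in a wavefront run in lockstep, a loop runs until its slowest lane finishes, and every branch taken by any lane is executed serially with inactive lanes masked off), not a measurement of Cycles or of any AMD hardware; the function names are hypothetical.

```python
def megakernel_utilization(bounces_per_lane: list[int]) -> float:
    """Fraction of lane-iterations doing useful work in one wavefront.

    Assumption: lanes that finish their path early sit idle until the
    longest-running lane in the same wavefront is done.
    """
    if not bounces_per_lane:
        raise ValueError("wavefront must contain at least one lane")
    width = len(bounces_per_lane)
    steps = max(bounces_per_lane)  # the wavefront runs until its slowest lane finishes
    if steps == 0:
        return 1.0
    useful = sum(bounces_per_lane)  # iterations in which a lane is actually active
    return useful / (width * steps)


def branch_utilization(branch_per_lane: list[str], cost: dict[str, int]) -> float:
    """Fraction of lane-cycles doing useful work when lanes take different branches.

    Assumption: each branch taken by at least one lane is executed once for
    the whole wavefront, with non-participating lanes masked off.
    """
    if not branch_per_lane:
        raise ValueError("wavefront must contain at least one lane")
    width = len(branch_per_lane)
    total = width * sum(cost[b] for b in set(branch_per_lane))
    useful = sum(cost[b] for b in branch_per_lane)
    return useful / total
```

For a 64-wide wavefront where 63 paths terminate after one bounce and one path takes eight, `megakernel_utilization` gives 71/512, about 14%. Two equally common materials with shading costs 10 and 30 give a `branch_utilization` of 0.5.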