[Bf-cycles] OpenCL and AMD GPUs

Mon Oct 27 23:26:04 CET 2014

Hi Martijn,

yes, it's similar to the approach described in that paper.  I agree that trying something first will show us what potential exists.

Thanks for the pointer to developer.blender.org <http://developer.blender.org>.  We'll take a look.

George

On Oct 27, 2014, at 4:43 PM, Martijn Berger wrote:

Hi George,

First, welcome again and I must say I am very happy that you want to do this.

I assume your approach will be mostly in line with the split suggested in the "Megakernels Considered Harmful: Wavefront Path Tracing on GPUs" paper. I am not sure how well this will map, unaltered, onto our shading approach, but only trying it will actually answer that.
I also want to draw your attention to our developer.blender.org <http://developer.blender.org/> site, as it allows you to publish and maintain your own set of changes in a so-called "differential". It is a very convenient way to publish a stack of patches for code review.

best regards,

Martijn Berger

On Mon, Oct 27, 2014 at 9:34 PM, Kyriazis, George <George.Kyriazis at amd.com> wrote:
Greetings bf-cycles,

I work for AMD, and we have been thinking about working on the OpenCL kernel (read: have started working on it).  It is, I presume, a well-known fact that the OpenCL implementation has "issues" on AMD.

Our current approach is to split up the OpenCL kernel into multiple (smaller) kernels, in order to get better utilization of the GPU.  I've had brief discussions with Martijn, Brecht and Ton, and they all seem eager to finally "fix" (for lack of a better term) OpenCL, which is a good sign.

Technical details have not been discussed yet, but an open forum like bf-cycles is a better place for that.

As a starting point for discussion, I'd like to comment on the main motivation for the kernel split.  As is well known, the AMD OpenCL implementation has some problems compiling the current OpenCL kernel.  This has mainly been attributed to the length of the kernel and to problems with register allocation.  Although that is correct, those causes fail to address the main issue: a huge kernel (like Cycles) that is a straightforward port of CPU code has a lot of code divergence.  Code divergence causes a lot of work-items to go idle during kernel execution, which is not a good thing.
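
To make the divergence point concrete, here is a rough sketch in Python (not Cycles code). It assumes a 64-lane wavefront, as on AMD GCN hardware, that runs every branch taken by at least one lane, one branch after another, with the other lanes masked off. The branch names and the cycle costs are made up for illustration.

```python
from collections import Counter

WAVEFRONT_SIZE = 64  # lanes per wavefront on AMD GCN hardware

def lane_utilization(branch_per_lane, cost_per_branch):
    """Fraction of lane-cycles doing useful work when a wavefront runs
    every branch taken by at least one lane, one branch after another."""
    lanes = len(branch_per_lane)
    taken = Counter(branch_per_lane)
    # The whole wavefront spends the branch cost on each branch any lane took.
    total_cycles = sum(cost_per_branch[b] for b in taken)
    # Only the lanes that chose a branch do useful work while it runs.
    useful = sum(count * cost_per_branch[b] for b, count in taken.items())
    return useful / (lanes * total_cycles)

# Hypothetical branches and costs, for illustration only.
cost = {"diffuse": 10, "glossy": 10, "transmission": 10, "emission": 10}
names = list(cost)

coherent = ["diffuse"] * WAVEFRONT_SIZE
divergent = [names[i % len(names)] for i in range(WAVEFRONT_SIZE)]

print(lane_utilization(coherent, cost))   # 1.0
print(lane_utilization(divergent, cost))  # 0.25
```

With four equally expensive branches spread evenly over the lanes, three quarters of the lane-cycles are idle in this model, regardless of how well the register allocator does.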

Splitting the kernel allows each (sub-)kernel to have better GPU utilization, and hence better performance.  As a side effect, it decreases the size of each kernel and makes things easier for the register allocator.  So, the current problems that the AMD OpenCL implementation has should not show up in a split kernel.  Our current thought is to have those individual kernels communicate via queues.
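
As a rough illustration of kernels communicating via queues, here is a host-side simulation in Python. It is only a sketch under assumptions: the stage names ("intersect", "diffuse", "glossy"), the path record, and the rule for picking a shader are all hypothetical, since the actual split has not been discussed yet. The point it shows is that each launch drains one queue, so every work-item in that launch runs the same code.

```python
from collections import deque

# Hypothetical stage names; the real split of the kernel is not decided here.
SHADERS = ("diffuse", "glossy")

def intersect(path):
    """Stand-in for scene intersection: choose the next stage for a path."""
    if path["bounce"] >= path["max_bounce"]:
        return "done"
    return SHADERS[(path["id"] + path["bounce"]) % len(SHADERS)]

def shade(path, shader):
    """Stand-in for one shading kernel: record the work, advance one bounce."""
    path["history"].append(shader)
    path["bounce"] += 1

def render(num_paths, max_bounce):
    queues = {name: deque() for name in ("intersect",) + SHADERS}
    finished = []
    launches = []  # (kernel name, number of work-items) for every launch
    for i in range(num_paths):
        queues["intersect"].append(
            {"id": i, "bounce": 0, "max_bounce": max_bounce, "history": []})
    while any(queues.values()):
        for name, queue in queues.items():
            if not queue:
                continue
            # One "kernel launch": drain this queue as a single uniform batch.
            batch = list(queue)
            queue.clear()
            launches.append((name, len(batch)))
            for path in batch:
                if name == "intersect":
                    target = intersect(path)
                    if target == "done":
                        finished.append(path)
                    else:
                        queues[target].append(path)
                else:
                    shade(path, name)
                    queues["intersect"].append(path)
    return finished, launches

finished, launches = render(num_paths=8, max_bounce=3)
for name, size in launches:
    print(name, size)
```

Each entry in `launches` corresponds to one uniform batch, which is the property the split is after. A real implementation would keep the queues in device memory and would have to deal with small batches, which this sketch ignores.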

Any questions / comments / etc. about our approach are welcome, of course.

Appreciated,

George Kyriazis

_______________________________________________
Bf-cycles mailing list
Bf-cycles at blender.org
http://lists.blender.org/mailman/listinfo/bf-cycles

