[Bf-cycles] OpenCL and AMD GPUs

Tue Oct 28 02:44:49 CET 2014

I think it's great to see AMD getting involved like this.

Splitting the kernel is certainly a challenge: the Cycles megakernel
is quite complicated and, to be honest, it has been pushed further than
it should have been. We've had to do lots of workarounds and tweaks to
make it run on NVidia, and even there it just barely works.

If the kernel splitting works out, it is likely to improve Cycles GPU
rendering for everyone, not just for AMD users. It should also make it
easier to support features like SSS or volumetrics, and reduce problems
with display responsiveness. So this work is very welcome.

Brecht.
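
The quoted message below argues that code divergence, not just kernel length, is the main problem, and proposes smaller kernels that communicate via queues. A toy model can make that utilization argument concrete. What follows is a minimal Python sketch under loudly stated assumptions: an 8-wide SIMD group, hypothetical stage names ("intersect", "diffuse", "sss", "volume") with made-up per-stage costs (not measured Cycles or AMD numbers), and a simplified lockstep model of divergence. It ignores queue memory traffic, kernel-launch overhead and host transfers, so it illustrates only the lane-utilization idea, not real performance.

```python
from collections import deque
from typing import Dict, List

# Hypothetical per-stage costs in arbitrary cycle units; NOT measured Cycles numbers.
STAGE_COST: Dict[str, int] = {"intersect": 4, "diffuse": 6, "volume": 20, "sss": 30}


def megakernel_utilization(paths: List[List[str]], width: int) -> float:
    """Lockstep SIMD model of one big kernel: each group of `width` work-items
    executes every branch taken by any lane, one after another, while lanes
    that did not take that branch sit masked off (idle)."""
    useful = issued = 0
    for start in range(0, len(paths), width):
        group = paths[start:start + width]
        for step in range(max(len(p) for p in group)):
            # Distinct branches taken at this step by lanes that are still running.
            taken = sorted({p[step] for p in group if step < len(p)})
            for stage in taken:
                lanes = sum(1 for p in group if step < len(p) and p[step] == stage)
                useful += lanes * STAGE_COST[stage]
                issued += width * STAGE_COST[stage]  # the whole group is occupied
    return useful / issued


def split_kernel_utilization(paths: List[List[str]], width: int, order: List[str]) -> float:
    """Split-kernel model: one small kernel per stage, each draining its own queue.
    Work-items are regrouped by stage, so only a partially filled last group
    wastes lanes. Queue memory traffic and launch overhead are ignored."""
    queues: Dict[str, deque] = {stage: deque() for stage in order}
    for item, path in enumerate(paths):
        queues[path[0]].append((item, 0))  # (work-item id, position in its path)
    useful = issued = 0
    while any(queues.values()):
        for stage in order:  # one "kernel launch" per non-empty queue
            batch = list(queues[stage])
            queues[stage].clear()
            if not batch:
                continue
            groups = -(-len(batch) // width)  # ceiling division
            useful += len(batch) * STAGE_COST[stage]
            issued += groups * width * STAGE_COST[stage]
            for item, step in batch:
                if step + 1 < len(paths[item]):
                    # Hand the work-item to the queue of its next stage.
                    queues[paths[item][step + 1]].append((item, step + 1))
    return useful / issued


# Hypothetical workload: 32 work-items; every 4th one takes the expensive
# SSS/volume branches, the rest take the cheap diffuse branch.
PATHS = [["intersect", "sss", "intersect", "volume"] if i % 4 == 0
         else ["intersect", "diffuse", "intersect", "diffuse"] for i in range(32)]
ORDER = ["intersect", "diffuse", "sss", "volume"]

if __name__ == "__main__":
    print(f"megakernel:   {megakernel_utilization(PATHS, 8):.2f}")
    print(f"split kernel: {split_kernel_utilization(PATHS, 8, ORDER):.2f}")
```

In this toy run the megakernel keeps the lanes busy for 236 of every 560 lane-cycles (about 42%), because the two lanes taking the SSS and volume branches stall the other six. The queue-based version reaches 100% only because each queue happens to hold a multiple of 8 work-items; real results depend on queue sizes, memory bandwidth and launch overhead, none of which this model captures.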

On Tue, Oct 28, 2014 at 1:50 AM, Thomas Dinges <blender at dingto.org> wrote:
> Storm, that mail is totally inappropriate.
>
> AMD is offering their help, and whatever you might think about it, you should treat people with respect. Any other behavior is not welcome here and will be stopped.
>
> Thomas
>
> On 28.10.2014 at 01:34, storm <kartochka22 at yandex.ru> wrote:
>
>> On Mon, 27/10/2014 at 20:34 +0000, Kyriazis, George wrote:
>>> Greetings bf-cycles,
>>>
>>> I work for AMD, and we have been thinking about working on the OpenCL kernel (read: have started working on it). It is, I presume, a well-known fact that the OpenCL implementation has "issues" on AMD.
>>>
>>> Our current approach is to split up the OpenCL kernel into multiple (smaller) kernels, in order to get better utilization of the GPU. I've had brief discussions with Martijn, Brecht and Ton, and they all seem eager to finally "fix" (for lack of a better term) OpenCL, which is a good sign.
>>>
>>> Technical details have not been discussed yet, but an open forum like bf-cycles is a better place for that.
>>>
>>> As a starting point for discussion, I'd like to comment on the main motivation for the kernel split. As is well known, the AMD OpenCL implementation has some problems compiling the current OpenCL kernel. This has mainly been attributed to the length of the kernel and to problems with register allocation. Although that is correct, those causes miss the main issue: a huge kernel like Cycles, which is a straightforward port of CPU code, has a lot of code divergence. Code divergence causes many work-items to go idle during kernel execution, which is not a good thing.
>>>
>>> Splitting the kernel allows each (sub-)kernel to achieve better GPU utilization, and hence better performance. As a side effect, it decreases the size of each kernel and makes things easier for the register allocator. So the problems that the current AMD OpenCL implementation has will not show up in a split kernel. Our current thought is to have those individual kernels communicate via queues.
>>>
>>> Any questions / comments / etc. about our approach are welcome, of course.
>>>
>>> Appreciated,
>>>
>>> George Kyriazis
>>>
>>> _______________________________________________
>>> Bf-cycles mailing list
>>> Bf-cycles at blender.org
>>> http://lists.blender.org/mailman/listinfo/bf-cycles
>>
>> Enough is enough.
>>
>> Some may believe that 3+ year BS, but not me.
>>
>> Compiler theory was not discovered 5 years ago, but you and everyone
>> else at ATI/AMD pretend that it was (again). You even have a group that
>> has mastered it in some sense, having built an Open64 compiler fork.
>>
>> Your company is very well aware of all the issues, especially the dirty
>> internal workarounds caused by hardware design complexity, deadlines,
>> marketing department pressure, etc.
>>
>> I very much doubt any "lawyer talk" sentences like the ones above, after
>> you made hardware that is not Turing complete and can run only programs
>> of extremely limited size because of how you made it (intentionally, as
>> is now obvious), while at the same time _claiming that it is fully
>> complete and respects the standards_. It was not, you are very well
>> aware of that, and you still say the opposite.
>>
>> I believed that for too many years and kept shuffling code lines
>> around, like a 1970s kid optimising his first BASIC program: shortening
>> variable names, replacing "case" with "if" chains, manually unrolling
>> floatX. Damn, I cannot remember all the absurd shit I tried for _YEARS_
>> just to get your half-assed compiler through without a segfault. I am
>> not even talking about execution time here.
>>
>> All those years, the NVidia people were just laughing in our faces.
>>
>> I wonder what happens when you fix at least 2-3 of the current big bugs
>> in your own OpenCL compiler and we hit another bug (still undiscovered
>> because we cannot even reach that corner case yet), and it hits again in
>> the short, split, slow kernels (slow because of the obvious extra PCI
>> transfer cost). What mantra will you use then? Not "split the kernel",
>> but ... "you must make every line contain a Fibonacci number of UTF-8
>> chars", or maybe "you must interleave every conditional move with an
>> fmad instruction, especially during a full moon"?
>>
>> Your company is too greedy, pissing on every loyal customer, even ones
>> like me, and you deserve to lose.
>>
>> You should climb Mount Fujiyama once in your life. Climb it twice and
>> you're a fool. (Japanese Proverb)
>>
>