[Bf-committers] Blender and OpenCL

Wed Aug 25 13:36:42 CEST 2010

On 08/25/2010 01:07 PM, Arturo José Pérez Verdú wrote:
> I'm also playing with OpenCL and I think it is a good thing to start exploring.
>
> I started playing with it recently and I consider that my knowledge in this area is still small, but I will follow your discussion very closely.
>
> I also want to make some observations on what has already been said. Most of them come from my limited knowledge of OpenCL, so don't be afraid to correct me.
>
> When you talk about the fallback to normal CPU execution, do you mean an optimized CPU execution? As I understand OpenCL, there is always at least one OpenCL-capable device: the CPU. So the same code that was written for the GPU version will run (maybe not optimized for it) even if the user doesn't have a GPU with OpenCL features. So when you talk about falling back to the CPU, do you mean providing special code for it instead of letting the GPU-prepared kernels execute on the CPU?
>    
CPU fallback, IMO, should always be CPU-optimized code. I come to this 
conclusion because OpenCL is not always available (meaning the machine 
has no OpenCL installed), and some OpenCL implementations do not 
provide a CPU device at all. Optimizing for the CPU is also different, 
as you perform the calculation serially:

For example (this is not a real example, but it gives an idea of the 
differences), take the formula E(n) = E(n-1) * n with E(1) = 1 (the 
factorial). A GPU is faster with the following implementation, where 
each E[n] is calculated independently:
result = 1;
for (int i = 1; i <= n; i++) result = result * i;
E[n] = result;

A CPU is faster with the recursive form, because when the calculation 
is done serially you know for certain that the previous answer is 
already present:
if (n == 1) E[n] = 1;
else E[n] = E[n-1] * n;

Also, a small amount of work is always faster on the CPU: starting 
OpenCL kernel tasks takes some time (initialization, memory copies, 
etc.) before real speed-ups can be realized.

> I don't know if I like this approach. The maintenance cost will rise, and we don't even know whether the GPU-prepared kernels will be slower on the CPU; keep in mind that we will have OpenCL multithreaded code that benefits from all the compute units available in the CPU. Also, once we are in the OpenCL world, the OpenCL context may have been created grouping more than one CPU, or a CPU plus a small, limited GPU.
>    
I agree about the maintenance cost, but I have not found any good 
solution that is suitable for both worlds.
> The idea that appeals to me most is to have standard OpenCL code that asks for a desired configuration based on memory needs, whether the OpenCL device supports images, etc., and some kind of selector or context-builder functions that try to build the best context for the calculation that is going to be performed. In this way, depending on the hardware, a machine may end up with a context of GPU, CPU + GPU, GPU1 + GPU2, CPU, etc. If the "ideal" context can be built... perfect; if it cannot be built... at least the code will work and behave in the same way. Our work would be to provide a single way of submitting work to our "OpenCL calculus server" and to distribute that work efficiently across all the queues created for the context. Different parts of Blender may want different ideal contexts, so the context-builder functions could change the OpenCL context and the way the queues are managed. If this idea is not clear enough and you think it could be interesting, we can discuss it in future emails. If you think all this is going in the wrong direction, please say so :-).
>
>    
This depends on the function you are performing. I think it should be 
the responsibility of the caller; only the caller can decide what the 
best option is. Based on the dataset, the framework can give some advice 
on the possible scenarios, but the caller should always decide.

I will take this into account! Thanks for your remarks.

Jeroen