[Bf-committers] Blender OpenCL compositor

Thu Jan 20 04:45:20 CET 2011

> Hi Ian,

> ----- Original Message -----

> > because it supports multiple device architectures, a code optimized
> > for the GPU won't run fast on the CPU.

> I thought you could write kernels optimised for various architectures, and
> choose the best one at run time. So each node could have one kernel for the
> GPU and one for the CPU. But an OpenCL GPU kernel will at least run on the
> CPU, even if it does so sub-optimally; and vice versa.

Yes, ideally one would write different kernels optimized for different
architectures, and this is the goal of OpenCL. The main issue is that when
you have a hammer everything looks like a nail, so we must be careful not to
think OpenCL is a magic bullet, but rather a really nice tool for some
situations. The most dramatic speedups will come at first from highly
data-parallel algorithms which can be moved to the GPU, with slower but
still accelerated CPU versions taking advantage of multiple cores.
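
To make the "one kernel per architecture, chosen at run time" idea concrete,
here is a rough sketch. It is plain Python used only for brevity, the kernels
are stand-in functions rather than real OpenCL, and every name in it
(invert_gpu, invert_generic, INVERT_KERNELS, pick_kernel) is made up for
illustration and is not part of Blender or OpenCL.

```python
from typing import Callable, Dict, List

Kernel = Callable[[List[float]], List[float]]


def invert_generic(pixels: List[float]) -> List[float]:
    """Stand-in for a kernel that runs on any device, though not optimally."""
    return [1.0 - p for p in pixels]


def invert_gpu(pixels: List[float]) -> List[float]:
    """Stand-in for a GPU-tuned variant: same result, different tuning."""
    return [1.0 - p for p in pixels]


# One table per node (hypothetical): device type -> kernel tuned for it.
# "any" is the generic kernel that every device can run.
INVERT_KERNELS: Dict[str, Kernel] = {"gpu": invert_gpu, "any": invert_generic}


def pick_kernel(kernels: Dict[str, Kernel], device_type: str) -> Kernel:
    """Prefer a kernel tuned for device_type, else use the generic one."""
    return kernels.get(device_type, kernels["any"])
```

A node could start with only the "any" entry and gain tuned entries later
without its callers changing.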

>
> > Then there is the question of users having the hardware to even run
> > it, necessitating a CPU-only fall-back.

> Do you mean two entirely separate codes? Or could we have one
> implementation that uses OpenCL, but with a CPU kernel (also written in
> OpenCL) to fall back on? That would seem ideal, since you can develop just
> one kernel to start with, and add architecture-specific kernels at a later
> time.

> Is the problem that there are no free OpenCL libraries (e.g. for use
> without a GPU)?

I do mean two entirely separate codes, at least for a while. Keep in mind
that just about everything is already implemented on the CPU, with some
multiprocessor support from OpenMP. So if we want to accelerate some feature
it doesn't make sense to throw away the existing code; we should just switch
to OpenCL if it's available. This is especially true since OpenCL
implementations are not yet ubiquitous (they are free, from NVIDIA, ATI,
Intel and Apple to name a few), so we don't want to disadvantage any users
who don't have one yet.
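
The "keep the existing code, switch to OpenCL if it's available" approach
could look roughly like the sketch below. Again this is plain Python for
brevity, and all names (invert_cpu, run_node, opencl_impl) are hypothetical.
The OpenCL path is modelled as an optional callable that raises RuntimeError
when no usable device is found, which is an assumption of this sketch and not
how any particular OpenCL binding reports errors.

```python
from typing import Callable, List, Optional, Tuple

Impl = Callable[[List[float]], List[float]]


def invert_cpu(pixels: List[float]) -> List[float]:
    """Stand-in for the existing CPU implementation, which is kept."""
    return [1.0 - p for p in pixels]


def run_node(pixels: List[float],
             opencl_impl: Optional[Impl] = None) -> Tuple[List[float], str]:
    """Use the OpenCL path when one is available, else the existing CPU code.

    Returns the result together with the name of the path that produced it.
    """
    if opencl_impl is not None:
        try:
            return opencl_impl(pixels), "opencl"
        except RuntimeError:
            # Assumed failure mode: no usable device at run time.
            pass
    return invert_cpu(pixels), "cpu"
```

Users without an OpenCL implementation simply take the last line and get
exactly the behaviour they have today.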

In the future, if and when OpenCL is everywhere, it would make sense to just
code a CPU kernel and a GPU kernel to switch between (or whatever kind of
kernel you make for a CPU+GPU chip like NVIDIA's Project Denver, AMD's
Fusion or Intel's Sandy Bridge). Until then we should provide a solid
infrastructure for acceleration but not throw out the baby with the
bathwater.

> Cheers,
> Alex

-- 
Ian Johnson
http://enja.org