[Bf-committers] Blender OpenCL compositor

Aurel W. aurel.w at gmail.com
Thu Jan 20 09:11:06 CET 2011


I guess for render farms we will need at least CPU-only support for
the next couple of years; not being able to do compositing directly
when rendering the shot could be annoying.

I currently have some concerns about how OpenCL 'could' be integrated
into the compositor and what mistakes could be made. Though the
overall concept of the compositor fits GPGPU very well, the existing
design, architecture and implementation doesn't. So I hope the OpenCL
code won't be bound too tightly to the current implementation.

Many of the performance problems of the current compositor are not due
to the fact that everything is done on the CPU; they exist because
some nodes are implemented ridiculously inefficiently. If more
performance is the target, this is one of the main problems that has
to be tackled.

On the other hand, it's still important to be able to mix nodes
implemented in CPU code with nodes implemented in OpenCL. The main
issue here is that this capability shouldn't cause much overhead when
there are only OpenCL nodes. It is very important to really do the
entire evaluation of the compositing graph on the GPU, and not just to
offload some computations. So buffers shouldn't be copied from main
RAM to VRAM and vice versa every time a node is executed. Previews,
outputs and viewer nodes should also be converted to a framebuffer
object and displayed directly rather than copied back to main RAM.
This is really necessary to get full-blown performance: as I said,
little overhead in the pure-OpenCL case, while still being able to
integrate CPU nodes and only then do VRAM <-> RAM transfers.
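
To illustrate: with CL/GL interop the viewer's texture can be written
by a kernel and drawn directly, so the result never leaves VRAM. Just
a rough sketch (the names are made up, error handling is omitted, and
it assumes a CL context created with GL sharing enabled):

    #include <CL/cl.h>
    #include <CL/cl_gl.h>
    #include <GL/gl.h>

    /* Wrap the GL texture backing a viewer node as a CL image, so
     * kernels can write into it with no read-back to main RAM. */
    static cl_mem wrap_viewer_texture(cl_context ctx, GLuint tex)
    {
        cl_int err;
        return clCreateFromGLTexture2D(ctx, CL_MEM_WRITE_ONLY,
                                       GL_TEXTURE_2D, 0, tex, &err);
    }

    static void run_viewer_node(cl_command_queue q, cl_kernel k,
                                cl_mem cl_tex, size_t w, size_t h)
    {
        size_t global[2] = { w, h };
        /* GL must not touch the texture while CL owns it. */
        clEnqueueAcquireGLObjects(q, 1, &cl_tex, 0, NULL, NULL);
        clSetKernelArg(k, 0, sizeof(cl_mem), &cl_tex);
        clEnqueueNDRangeKernel(q, k, 2, NULL, global, NULL,
                               0, NULL, NULL);
        clEnqueueReleaseGLObjects(q, 1, &cl_tex, 0, NULL, NULL);
        clFinish(q);
        /* The texture can now be drawn as the preview directly. */
    }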

So to sum it up: if the target is reasonably fast compositing of 4K
footage in the future, a lot of things have to be considered; it is
not enough to just reimplement the current nodes as OpenCL kernels.

aurel

On 20 January 2011 04:45, Ian Johnson <enjalot at gmail.com> wrote:
>> Hi Ian,
>
>> > because it supports multiple device architectures, code optimized
>> > for the GPU won't run fast on the CPU.
>
>> I thought you could write kernels optimised for various architectures, and
>> choose the best one at run time. So each node could have one kernel for the
>> GPU and one for the CPU. But an OpenCL GPU kernel will at least run on the
>> CPU, even if it does so sub-optimally; and vice versa.
>
> Yes, ideally one would write different kernels optimized for
> different architectures, and this is the goal of OpenCL. The main
> issue is that when you have a hammer everything looks like a nail,
> so we must be careful not to think of OpenCL as a magic bullet, but
> rather as a really nice tool for some situations. The most dramatic
> speedups will be had at first with highly data-parallel algorithms
> which can be moved to the GPU, with slower but still accelerated CPU
> versions taking advantage of multiple cores.
>
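For what it's worth, that kind of run-time switch is cheap on the host
side. A rough sketch (the source strings and the function name here
are made up, error handling omitted):

    #include <CL/cl.h>

    /* Pick the kernel variant that suits the device we actually got:
     * same entry point, different source tuned per architecture. */
    static cl_program build_for_device(cl_context ctx, cl_device_id dev,
                                       const char *gpu_src,
                                       const char *cpu_src)
    {
        cl_device_type type;
        clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof(type), &type, NULL);

        const char *src = (type & CL_DEVICE_TYPE_GPU) ? gpu_src
                                                      : cpu_src;
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src,
                                                    NULL, NULL);
        clBuildProgram(prog, 1, &dev, "", NULL, NULL);
        return prog;
    }
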
>> > Then there is the question of users having the hardware to even
>> > run it, necessitating a CPU-only fall-back.
>
>> Do you mean two entirely separate codes? Or could we have one
>> implementation that uses OpenCL, but with a CPU kernel (also written in
>> OpenCL) to fall back on? That would seem ideal, since you can develop just
>> one kernel to start with, and add architecture-specific kernels at a later
>> time.
>
>> Is the problem that there are no free OpenCL libraries (e.g. for use
>> without a GPU)?
>
> I do mean two entirely separate codes, at least for a while. Keep in
> mind that just about everything is already implemented on the CPU,
> with some multiprocessor support from OpenMP. So if we want to
> accelerate some feature it doesn't make sense to throw away the
> existing code; we should just switch to OpenCL if it's available.
> This is especially true since OpenCL implementations are not yet
> ubiquitous (they are free, from NVIDIA, ATI, Intel and Apple to name
> a few), so we don't want to disadvantage any users who don't have
> one yet.
>
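The fall-back can then be a run-time probe rather than a build-time
switch; something as small as this would do (a sketch, the function
name is made up):

    #include <CL/cl.h>

    /* Returns non-zero if at least one OpenCL platform is installed;
     * the compositor keeps using the existing CPU path otherwise. */
    static int compositor_has_opencl(void)
    {
        cl_uint n = 0;
        if (clGetPlatformIDs(0, NULL, &n) != CL_SUCCESS)
            return 0;
        return n > 0;
    }
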
> In the future, if and when OpenCL is everywhere, it would make sense
> to just code a CPU kernel and a GPU kernel to switch between (or
> whatever kind of kernel you make for a CPU+GPU chip like NVIDIA's
> Project Denver, AMD's Fusion or Intel's Sandy Bridge). Until then we
> should provide a solid infrastructure for acceleration, but not
> throw out the baby with the bath water.
>
>> Cheers,
>> Alex
>
> --
> Ian Johnson
> http://enja.org

