[Bf-committers] Blender and OpenCL

Sun Aug 29 11:12:18 CEST 2010

Hi Lukas,

Your explanation is a good one. It had not occurred to me to write it 
down that way.
The memory issue during compositing comes from the way the node editor 
works. When you change a node value (like the degree of a rotate node), 
only that rotate node and all nodes that depend on it are recalculated. 
The input image is not recalculated; it stays in memory. This is a good 
optimization during editing, since only part of the node system has to 
be re-evaluated, but in complex node systems I think it will not work 
for OpenCL because of the memory it needs.
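
To make the partial re-evaluation concrete, here is a minimal sketch in 
Python. It is not the Blender compositor code; the node names and the 
function nodes_to_recompute are made up for illustration. It marks the 
changed node and everything downstream of it as dirty, and treats all 
other nodes as results that have to stay cached in memory.

```python
from collections import defaultdict, deque


def nodes_to_recompute(links, changed):
    """Return the changed node plus every node downstream of it.

    links:   iterable of (source, target) pairs (hypothetical node names).
    changed: name of the node whose value was edited.
    """
    downstream = defaultdict(list)
    for src, dst in links:
        downstream[src].append(dst)

    dirty = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in downstream[node]:
            if nxt not in dirty:
                dirty.add(nxt)
                queue.append(nxt)
    return dirty


# Made-up tree: Image -> Rotate -> Blur -> Viewer
links = [("Image", "Rotate"), ("Rotate", "Blur"), ("Blur", "Viewer")]
dirty = nodes_to_recompute(links, "Rotate")
cached = {name for link in links for name in link} - dirty

print(sorted(dirty))   # ['Blur', 'Rotate', 'Viewer']
print(sorted(cached))  # ['Image']
```

Everything in cached is a buffer that must be kept around between edits, 
which is where the memory cost comes from.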

I am looking for a solution that works well both during editing (short 
feedback time for the end user) and during rendering (overall 
performance of the system), but I have not found a good one yet.

At the moment I am evaluating two options:
a. Per viewer and compositor node, an OpenCL kernel/program is 
generated and executed.
b. Per node, a program and kernel are created, and evaluation is done 
as in the current situation.

A question back: have you seen any speed-up? On my system (a 
three-year-old dual-core (Core 2) laptop at 2000 MHz, with 16 NVIDIA 
cores at 400 MHz and an 800 MHz bus) I was not able to see big 
differences. I think that a desktop system with a faster bus and more, 
and more powerful, GPU cores would get much better performance.

Regards,
Jeroen

On 08/28/2010 09:40 PM, Lukas Tönne wrote:
> I have tried out your patch, nice work :)
>
> Here are some more thoughts on how to process data in the node tree. I
> hope I'm not getting too verbose or telling you guys obvious stuff ;)
>
> Basically, when talking about data in the tree I see two different
> types of dependency:
> 1. Inter-node dependency ("vertical"):
> A node can only be executed (be it for a single pixel or the whole
> image) when all its inputs are done. This dependency _always_ exists
> in node trees to a certain degree.
> 2. Inter-element dependency ("horizontal"):
> An element (pixel, sample, particle, vertex, etc.) depends on the
> state of other elements (neighbouring pixels, particles in a certain
> radius, connected vertices).
>
> Vertical dependency does not depend on the tree type, but only on the
> connectivity of the nodes (complexity of the tree). Here's a made-up
> example with strong connectivity in the middle part:
> http://www.pasteall.org/pic/5405
>
> Horizontal (inter-element dependency) on the other hand chiefly
> depends on the type of tree you're looking at:
> * Shader- and texture trees have _no_ horizontal dependency at all,
> the color of a material or texture sample does not depend on other
> samples. This is why shader trees can be evaluated per sample and do
> not need to store large amounts of data.
> * Compositor trees are the other extreme: while some nodes, such as
> Mix, operate per-pixel, others like Blur and Defocus heavily depend on
> neighbouring or even _all_ other pixels of the input images
> respectively.
> * Particles are not as extreme as compo trees (fewer neighbours to take
> into account), but they lack the inherent ordering of image pixels and
> need kd trees for finding neighbours.
>
> One relatively simple thing one could probably do to decrease memory
> usage is removing data that is not needed any more (I am not sure if
> the current compositors do something like this already, if so, just
> skip this section). As soon as all nodes that use a certain socket
> for input have been processed, that socket's data can be freed from
> memory. This of course only works as long as connectivity is
> relatively low and node relations are "local". In the example above
> the result of the Blur node would have to be kept in memory until all
> the mix nodes are finished, whereas the initial renderlayer node could
> free its buffer right after Blur is done. It might even be an option
> to bite the bullet, if memory usage gets dangerously high, and discard
> intermediate results used very late in the tree and recalculate them
> later.
>
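
The buffer-freeing idea above can be sketched as a simple consumer count 
per output. This is only an illustration in Python, not how the current 
compositor works; the tree below and the function free_points are 
hypothetical, and the tree is not the one from the linked picture.

```python
from collections import Counter


def free_points(order, inputs):
    """For each node, list the buffers that can be freed once it has run.

    order:  node names in a valid execution order (inputs before users).
    inputs: dict mapping a node name to the nodes it reads from.
    """
    # Number of nodes that still need to read each buffer.
    remaining = Counter(
        src for node in order for src in set(inputs.get(node, []))
    )
    freed_after = {}
    for node in order:
        freed = []
        for src in set(inputs.get(node, [])):
            remaining[src] -= 1
            if remaining[src] == 0:  # last consumer has finished
                freed.append(src)
        freed_after[node] = sorted(freed)
    return freed_after


# Made-up tree: Blur feeds two Mix nodes, so its buffer lives longer.
order = ["RenderLayer", "Blur", "Mix1", "Mix2", "Composite"]
inputs = {
    "Blur": ["RenderLayer"],
    "Mix1": ["Blur"],
    "Mix2": ["Blur", "Mix1"],
    "Composite": ["Mix2"],
}
for node, freed in free_points(order, inputs).items():
    print(node, "->", freed)
```

Here the RenderLayer buffer can go right after Blur, while the Blur 
buffer has to stay until Mix2 is done.
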
> Another improvement I currently use in the simulation trees is
> splitting the large data blocks into smaller parts ("batches"). This
> has the advantage of making better use of available processing power,
> especially when some nodes need significantly more time than others.
> In the compositor nodes one thread processes the full image for one
> node at a time, which can lead to threads idly waiting for the result
> of another (iirc Brecht recently coded internal multithreading for
> the especially heavy Defocus node though). At the same time, staying
> with one node for a range of elements instead of processing them
> one-by-one avoids the overhead of switching between nodes. Afaik this
> is basically the same concept as OpenCL's "work groups"; I have to
> read up on that again.
>
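
A rough sketch of the batching idea in Python, assuming a node without 
horizontal dependency (each element can be computed on its own). The 
names split_into_batches and run_node_batched are made up for this 
example, and the thread pool only stands in for whatever scheduler the 
real tree would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(count, batch_size):
    """Yield (start, stop) index ranges that together cover range(count)."""
    for start in range(0, count, batch_size):
        yield start, min(start + batch_size, count)


def run_node_batched(node_func, data, batch_size, workers=4):
    """Apply node_func to data one batch at a time, batches in parallel.

    node_func takes a list of elements and returns a list of the same
    length. Results are returned in the original element order.
    """
    ranges = list(split_into_batches(len(data), batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: node_func(data[r[0]:r[1]]), ranges)
        return [value for part in parts for value in part]


def double(batch):
    """Stand-in for a per-element node such as a brightness change."""
    return [value * 2 for value in batch]


print(list(split_into_batches(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
print(run_node_batched(double, list(range(10)), batch_size=4))
```

A node like Blur would need the neighbouring elements as well, so its 
batches could not be cut this naively.
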
> Cheers
> Lukas
>
> On Tue, Aug 24, 2010 at 7:18 PM, Jeroen Bakker <j.bakker at atmind.nl> wrote:
>
>> Hi all
>>
>> I have been experimenting with OpenCL and am planning a basic framework
>> to support it in Blender.
>>
>> The main features are:
>>   * OpenCL is disabled by default; a CPU fall-back must ALWAYS be
>> available. OpenCL can be enabled with a command-line parameter.
>>   * A compiler directive to completely disable OpenCL in Blender.
>>   * A basic implementation to access and use GPU devices.
>>   * I am not targeting the blender-render, but other time-consuming
>> processes (fluids, node systems etc.)
>>
>> I think this matches the basic blender principles:
>>   * can work on standard home PCs
>>   * installing Blender is just unzipping a zip
>>
>> Are other people also busy with this subject?
>>
>> Best regards,
>> Jeroen
>>
>> http://wiki.blender.org/index.php/User_talk:Jbakker
>> _______________________________________________
>> Bf-committers mailing list
>> Bf-committers at blender.org
>> http://lists.blender.org/mailman/listinfo/bf-committers