[Bf-committers] Blender and OpenCL

Wed Sep 1 22:25:59 CEST 2010

Hi,

I have created an OpenCL implementation of the bokeh blur. I got a 
speedup (2 times faster) on my old hardware, but also some stability 
issues. I think some of these issues come from my old NVIDIA + OpenCL 
drivers, but I would really like to know how other hardware setups do.

I get random out-of-resources errors that I cannot influence from code, 
and the UI freezes during long calculations.

The current patch can be found at 
http://sicg.atmind.nl/media/patches/patch-opencl-bokeh.txt

Regards,
Jeroen

On 08/29/2010 01:10 PM, Vilem Novak wrote:
> Hello, maybe focusing on the performance-heavy nodes would make sense?
> In my experience the performance-heavy ones are quality blurs (especially defocus), UV remap,
> and bilateral blur.
> With these the advantages would be visible even with the bus problems.
> With regards,
> Vilem Novak
>
>    
>> ------------ Original message ------------
>> From: Jeroen Bakker<j.bakker at atmind.nl>
>> Subject: Re: [Bf-committers] Blender and OpenCL
>> Date: 29.8.2010 11:19:15
>> ----------------------------------------
>> Hi Lukas,
>>
>> Your explanation is a good one. It didn't occur to me to write it down
>> that way. The issue with memory during compositing is the way the node
>> editor works. When changing a node value (like degree), only the rotate
>> node and all dependent nodes are recalculated. The input image is not
>> recalculated; it is still in memory. This is a good optimization during
>> editing, since you only need to re-evaluate part of the node system, but
>> in complex node systems I think this will not work for OpenCL due to the
>> memory needed.
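
A minimal sketch of the partial re-evaluation described above, assuming a hypothetical node graph stored as a plain dictionary (the node names and the `nodes_to_recalculate` helper are illustrative and not taken from Blender or the patch). It only shows which nodes become dirty when one value changes, and which results therefore have to stay cached in memory:

```python
from collections import deque

# Hypothetical node graph: each node maps to the nodes that consume its output.
consumers = {
    "image": ["rotate"],
    "rotate": ["blur"],
    "blur": ["viewer"],
    "viewer": [],
}

def nodes_to_recalculate(changed, consumers):
    """Return the changed node plus every node downstream of it."""
    dirty = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in consumers[node]:
            if nxt not in dirty:
                dirty.add(nxt)
                queue.append(nxt)
    return dirty

# Changing a value on the rotate node leaves the input image untouched.
dirty = nodes_to_recalculate("rotate", consumers)
cached = set(consumers) - dirty  # results that must stay in memory
print(sorted(dirty), sorted(cached))
```

Everything in `cached` is what has to be kept around between edits, which is where the memory pressure mentioned above would come from.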
>>
>> I am looking for a solution that is good both during editing (decreasing
>> the feedback time for the end user) and during rendering (overall
>> performance of the system), but I haven't found a good one yet.
>>
>> At the moment I am evaluating 2 things:
>> a. per viewer and compositor node, an OpenCL kernel/program is
>> generated and executed.
>> b. per node, a program and kernel are created, and evaluation is done
>> as in the current situation.
>>
>> A question back: have you seen any speed-up? My system (a three-year-old
>> dual core 2 laptop at 2000 MHz with 16 NVIDIA cores at 400 MHz and a bus
>> of 800 MHz) did not show big differences. I think that a desktop
>> system with a faster bus and more, and more powerful, GPU cores would get
>> much better performance.
>>
>> Regards,
>> Jeroen
>>
>> On 08/28/2010 09:40 PM, Lukas Tönne wrote:
>>      
>>> I have tried out your patch, nice work :)
>>>
>>> Here are some more thoughts on how to process data in the node tree. I
>>> hope I'm not getting too verbose or telling you guys obvious stuff ;)
>>>
>>> Basically, when talking about data in the tree I see two different
>>> types of dependency:
>>> 1. Inter-node dependency ("vertical"):
>>> A node can only be executed (be it for a single pixel or the whole
>>> image) when all its inputs are done. This dependency _always_ exists
>>> in node trees to a certain degree.
>>> 2. Inter-element dependency ("horizontal"):
>>> An element (pixel, sample, particle, vertex, etc.) depends on the
>>> state of other elements (neighbouring pixels, particles in a certain
>>> radius, connected vertices).
>>>
>>> Vertical dependency does not depend on the tree type, but only on the
>>> connectivity of the nodes (complexity of the tree). Here's a made-up
>>> example with strong connectivity in the middle part:
>>> http://www.pasteall.org/pic/5405
>>>
>>> Horizontal (inter-element dependency) on the other hand chiefly
>>> depends on the type of tree you're looking at:
>>> * Shader- and texture trees have _no_ horizontal dependency at all,
>>> the color of a material or texture sample does not depend on other
>>> samples. This is why shader trees can be evaluated per sample and do
>>> not need to store large amounts of data.
>>> * Compositor trees are the other extreme: while some nodes, such as
>>> Mix, operate per-pixel, others like Blur and Defocus heavily depend on
>>> neighbouring or even _all_ other pixels of the input images
>>> respectively.
>>> * Particles are not as extreme as compo trees (fewer neighbours to take
>>> into account), but they lack the inherent ordering of image pixels and
>>> need kd trees for finding neighbours.
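
To make the per-pixel versus neighbourhood distinction concrete, here is a small sketch on a single row of pixels. The `mix` and `box_blur` functions are simplified stand-ins written for this illustration; they are not Blender's Mix or Blur nodes:

```python
def mix(a, b, factor):
    # Per-pixel: output[i] depends only on a[i] and b[i].
    return [(1.0 - factor) * x + factor * y for x, y in zip(a, b)]

def box_blur(pixels, radius):
    # Neighbourhood: output[i] depends on pixels[i - radius .. i + radius],
    # clamped at the borders of the row.
    out = []
    for i in range(len(pixels)):
        lo = max(0, i - radius)
        hi = min(len(pixels), i + radius + 1)
        window = pixels[lo:hi]
        out.append(sum(window) / len(window))
    return out

row = [0.0, 0.0, 1.0, 0.0, 0.0]
mixed = mix(row, [1.0] * 5, 0.5)
blurred = box_blur(row, 1)
```

In `mix`, any single output pixel can be computed without looking at the rest of the row, so no intermediate image has to be stored. In `box_blur`, the bright pixel at index 2 leaks into indices 1 and 3, so the whole input (or at least a window of it) must be available before an output pixel can be produced.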
>>>
>>> One relatively simple thing one could probably do to decrease memory
>>> usage is removing data that is not needed any more (I am not sure if
>>> the current compositors do something like this already, if so, just
>>> skip this section). As soon as all nodes which use a certain socket
>>> for input have been processed, that socket's data can be freed from
>>> memory. This of course only works as long as connectivity is
>>> relatively low and node relations are "local". In the example above
>>> the result of the Blur node would have to be kept in memory until all
>>> the mix nodes are finished, whereas the initial renderlayer node could
>>> free its buffer right after Blur is done. It might even be an option
>>> to bite the bullet, if memory usage gets dangerously high, and discard
>>> intermediate results used very late in the tree and recalculate them
>>> later.
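
The buffer-freeing idea above can be sketched with a simple consumer count per output. This is only an illustration under stated assumptions: the tree, the node names and the `execute` function are hypothetical, the dictionary is assumed to already be in a valid execution order, and strings stand in for image buffers:

```python
# Hypothetical tree: node -> list of input nodes, listed in execution order
# (Python dictionaries preserve insertion order).
inputs = {
    "render_layer": [],
    "blur": ["render_layer"],
    "mix1": ["blur"],
    "mix2": ["blur", "mix1"],
    "composite": ["mix2"],
}

def execute(inputs):
    # Count how many nodes still need each node's output.
    pending = {name: 0 for name in inputs}
    for srcs in inputs.values():
        for src in srcs:
            pending[src] += 1

    buffers = {}       # results currently held in memory
    freed_order = []   # order in which results were released
    peak = 0           # largest number of results held at once
    for node, srcs in inputs.items():
        buffers[node] = f"result of {node}"  # stand-in for an image buffer
        peak = max(peak, len(buffers))
        for src in srcs:
            pending[src] -= 1
            if pending[src] == 0:
                # Every consumer of this output has run: release it.
                del buffers[src]
                freed_order.append(src)
    return freed_order, peak, buffers

freed_order, peak, remaining = execute(inputs)
```

Here the render layer result is released right after the blur has run, while the blur result has to survive until the last mix node has consumed it, matching the behaviour described in the paragraph above.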
>>>
>>> Another improvement I currently use in the simulation trees is
>>> splitting the large data blocks into smaller parts ("batches"). This
>>> has the advantage of making better use of available processing power,
>>> especially when some nodes need significantly more time than others.
>>> In the compositor nodes one thread processes the full image for one
>>> node at a time, which can lead to threads idly waiting for the result
>>> of one other (iirc Brecht recently coded internal multithreading for
>>> the especially heavy Defocus node though). At the same time, staying
>>> with one node for a range of elements instead of processing them
>>> one by one avoids the overhead of switching between nodes. Afaik this
>>> is basically the same concept as OpenCL's "work groups"; I have to read
>>> up on that again.
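
A rough sketch of the batching idea, assuming a node without inter-element dependency so that batches can be processed independently. The helper names and the batch size are made up for this example, and a Python thread pool is used only as a stand-in for whatever scheduler the simulation trees actually use:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_batches(n_elements, batch_size):
    """Return (start, stop) index ranges that together cover n_elements."""
    return [(start, min(start + batch_size, n_elements))
            for start in range(0, n_elements, batch_size)]

def run_node(node_fn, data, batch_size, workers=4):
    """Apply a per-element node function batch by batch on a thread pool."""
    batches = split_into_batches(len(data), batch_size)

    def process(bounds):
        start, stop = bounds
        # Stay with one node for the whole range of elements.
        return [node_fn(x) for x in data[start:stop]]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of the submitted batches.
        parts = list(pool.map(process, batches))
    return [value for part in parts for value in part]

doubled = run_node(lambda x: x * 2, list(range(10)), batch_size=4)
```

A node with neighbourhood dependency, such as a blur, would additionally need each batch to see a border of elements from the adjacent batches, which this sketch does not handle.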
>>>
>>> Cheers
>>> Lukas
>>>
>>> On Tue, Aug 24, 2010 at 7:18 PM, Jeroen Bakker<j.bakker at atmind.nl>   wrote:
>>>
>>>        
>>>> Hi all
>>>>
>>>> I have been experimenting with OpenCL and am planning a basic framework
>>>> to support it in Blender.
>>>>
>>>> Main features are:
>>>>    * OpenCL is disabled by default; a CPU fall-back must ALWAYS be
>>>> available. OpenCL can be enabled with a command-line parameter
>>>>    * Compiler directive to completely disable OpenCL in Blender.
>>>>    * Basic implementation to access and use GPU devices
>>>>    * I am not targeting the Blender render, but other time-consuming
>>>> processes (fluids, node systems etc.)
>>>>
>>>> I think this matches the basic blender principles:
>>>>    * can work on standard home PCs
>>>>    * Blender installation is unzipping a zip
>>>>
>>>> Are other people also busy with this subject?
>>>>
>>>> Best regards,
>>>> Jeroen
>>>>
>>>> http://wiki.blender.org/index.php/User_talk:Jbakker
>>>> _______________________________________________
>>>> Bf-committers mailing list
>>>> Bf-committers at blender.org
>>>> http://lists.blender.org/mailman/listinfo/bf-committers