[Bf-committers] Compositor speedup proposal [1/2]

Mon Feb 4 09:42:36 CET 2013

Hi,

there was an error in the design I proposed. Letting the system add 
buffers automatically did not work well for a series of distort nodes.
Distort nodes in series work because they aren't buffered. We implemented 
a fix so that the output of group nodes is buffered.
This way the user can control buffering by grouping nodes that belong 
together. It also means that the speed increase only applies to users who 
organize their tree in groups (hint :) )

As this was a fix for several bugs, we already committed it last night. 
Users, please test this!

Regards,
Jeroen

On 01/30/2013 12:08 AM, David wrote:
> Hi,
>
> thanks for taking a look at this. If I understand the problem correctly,
> the slowdown is mostly due to simple nodes not caching at all. That
> means that for a fanout>1 on a socket the node has to be calculated
> at least that many times. For simple nodes that is usually not a problem
> until you string a lot of them together with multiple fanouts in between;
> then it very quickly adds up. I would say breaking these execution groups
> into pieces is likely to be most successful at points where nodes have
> a large fanout; this should reduce the number of calculations significantly.
> The longer I think about it the more I feel there is an interesting graph
> theory problem/algorithm here that might help...
>
> (Maybe this is all very obvious to you already, in which case
> disregard my input and do what you think is best ;)
>
> till then, David.
>
>
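The fanout argument above can be illustrated with a small counting sketch. It is not compositor code: the node names, the `diamond_chain` helper and the idea of marking whole nodes as buffered are assumptions made for illustration only. It counts how often each node is calculated when nothing is cached, and again when the nodes with fanout 2 keep their result in a buffer.

```python
def count_calculations(inputs, output, buffered=frozenset()):
    """Count how often each node is calculated when `output` is requested.

    `inputs` maps a node name to the names of the nodes it reads from.
    A node in `buffered` is calculated once and then read back from its
    buffer; every other node is recalculated for each consumer.
    """
    counts = {}

    def visit(node):
        if node in buffered and node in counts:
            return  # already calculated, result comes from the buffer
        counts[node] = counts.get(node, 0) + 1
        for source in inputs[node]:
            visit(source)

    visit(output)
    return counts


def diamond_chain(stages):
    """Hypothetical tree: every stage reads the previous result twice."""
    inputs = {"src": []}
    previous = "src"
    for i in range(1, stages + 1):
        inputs[f"a{i}"] = [previous]
        inputs[f"b{i}"] = [previous]
        inputs[f"mix{i}"] = [f"a{i}", f"b{i}"]
        previous = f"mix{i}"
    return inputs, previous


tree, out = diamond_chain(3)
unbuffered = count_calculations(tree, out)
# Buffer only the nodes that feed two consumers.
buffered = count_calculations(tree, out, buffered={"src", "mix1", "mix2"})
print(sum(unbuffered.values()), sum(buffered.values()))
```

In this toy tree the number of calculations of `src` doubles with every stage when nothing is buffered, while buffering at the fanout points keeps every node at one calculation.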
> On Jan 29, 2013, at 12:08 PM, Jeroen Bakker wrote:
>> This is a proposal to solve speed issues of the compositor. It should
>> not be considered the final solution, but it should help with the most
>> common issues.
>>
>> Problem statement.
>> The compositor works best with a good mixture of simple and
>> complex nodes. If you have a lot of simple nodes, the system is not able
>> to find a good balance when converting them to execution groups
>> (subprograms that are each scheduled to a core of the CPU). The result is
>> a few execution groups with many simple operations and a small number of
>> buffers that store intermediate results. This slows down the system a
>> lot
>> [http://projects.blender.org/tracker/?func=detail&aid=33785&group_id=9&atid=498].
>> A workaround for this slowdown was to add a complex node (that doesn't
>> do anything, like blur 0) in the setup.
>>
>> First tests show that good results depend on the node tree setup and the
>> available memory of the system. We propose to split up execution groups
>> into smaller ones if they get too big. The split will depend on two
>> variables:
>> 1. amount of memory in the system (not free memory)
>> 2. number of operations in an execution group
>>
>> As this mechanism makes a lot of guesses, the user should be able to
>> manually control the number of cuts.
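The size limit described above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes an execution group is a simple ordered chain of operations (real groups are graphs), the function name and the `max_operations` parameter are hypothetical, and the memory variable from the proposal is not modeled at all.

```python
def split_execution_group(operations, max_operations):
    """Cut one execution group into pieces of at most `max_operations`.

    Every cut between two pieces stands for an extra buffer that stores
    the intermediate result handed from one piece to the next.
    """
    if max_operations < 1:
        raise ValueError("max_operations must be at least 1")
    return [
        operations[start:start + max_operations]
        for start in range(0, len(operations), max_operations)
    ]


# Hypothetical group of 25 simple operations, limited to 10 per group.
operations = [f"op{i}" for i in range(25)]
groups = split_execution_group(operations, 10)
sizes = [len(group) for group in groups]
extra_buffers = len(groups) - 1
print(sizes, extra_buffers)
```

A smaller limit produces more pieces and therefore more intermediate buffers, which matches the trade-off between memory and speed that the proposal describes.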
>>
>> During tests we saw the following results:
>> Used file: file attached to issue #33785
>> Used system: Intel(R) Core(TM) i5 CPU M 580 @ 2.67GHz, with 8GB of
>> memory, Ubuntu 12.04 64-bit:
>>   - Baseline (no changes to code): 861MB, 47.49 seconds
>>   - Limit execution group size to 10: 3424MB, 7.267 seconds
>>   - Limit execution group size to 15: 3289MB, 7.607 seconds
>>   - Limit execution group size to 20: 2884MB, 9.393 seconds
>>   - Limit execution group size to 25: 2884MB, 11.987 seconds
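The trade-off in these measurements is easier to read as ratios against the baseline. The short calculation below uses only the numbers listed above; the variable names are chosen for illustration.

```python
# Measurements from the list above: (memory in MB, time in seconds).
baseline_mb, baseline_s = 861, 47.49
limited = {
    10: (3424, 7.267),
    15: (3289, 7.607),
    20: (2884, 9.393),
    25: (2884, 11.987),
}

speedup = {}
memory_factor = {}
for limit, (memory_mb, seconds) in limited.items():
    speedup[limit] = baseline_s / seconds          # how many times faster
    memory_factor[limit] = memory_mb / baseline_mb  # how many times more memory
    print(f"limit {limit}: {speedup[limit]:.2f}x faster, "
          f"{memory_factor[limit]:.2f}x memory")
```

With a limit of 10 the render is roughly 6.5 times faster than the baseline while using roughly 4 times the memory; with a limit of 25 it is still roughly 4 times faster.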
>>
>> Best regards,
>> Jeroen & Monique
>>   - At Mind -
>> _______________________________________________
>> Bf-committers mailing list
>> Bf-committers at blender.org
>> http://lists.blender.org/mailman/listinfo/bf-committers