[Bf-committers] Proposal: Blender OpenCL compositor

GSR gsr.b3d at infernal-iceberg.com
Sun Jan 23 06:43:28 CET 2011


Hi,
j.bakker at atmind.nl (2011-01-22 at 0952.44 +0100):
> image. The highest/lowest value is calculated once (not parallelized) 
> the pixel processor is parallelized.

Not a very good example ;] as this search problem is nearly as
parallelizable as the pixel processor would be. Split the work among N
workers, each one getting total_pixels/N (or tiles or whatever) and
looking for its local max and min. Then scan the N maxes to get the
final max, and do the same with the N mins. Even if you have a system
with 1024 "workers", that is only an extra non-parallel pass of 1024
checks (assuming you do not parallelize it again, for example by having
four workers check 256 each and finally comparing the four results).
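
A minimal sketch of that split-then-combine search, in Python just to
show the structure. The names (local_min_max, parallel_min_max,
n_workers) are made up for illustration and are not Blender or OpenCL
code. Threads in standard CPython will not really speed this up; the
point is only the shape of the two passes.

```python
from concurrent.futures import ThreadPoolExecutor


def local_min_max(chunk):
    """Each worker scans only its own slice of pixels."""
    return min(chunk), max(chunk)


def parallel_min_max(pixels, n_workers=4):
    """Return (min, max) of a non-empty sequence of pixel values."""
    # Split the pixels into at most n_workers chunks of roughly equal size.
    size = max(1, -(-len(pixels) // n_workers))  # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]

    # Parallel pass: every worker finds its local min and max.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(local_min_max, chunks))

    # Non-parallel pass: at most n_workers checks for each extreme.
    return min(lo for lo, _ in partial), max(hi for _, hi in partial)
```

With 8 values and 4 workers, each worker checks 2 values and the final
pass only compares 4 local results per extreme.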

So the question is whether you want to process in pixel stacks (compute
the final result for pixel X,Y before X+1,Y is known) or in buffers
(work on one set of tiles and never look at them again unless something
down the node tree changes). If you want the final full image, you will
do the full work in both cases anyway. Exceptions aside, you probably
want the buffers approach (with tiled internal organization, that is
fine), because that way the code cache gets lots and lots of hits, and
the data cache probably too. The other way you are thrashing all the
caches.
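
To make the two evaluation orders concrete, here is a rough sketch. The
three "nodes" (brighten, clamp, invert) and both function names are
invented for the example, and the image is a flat list of values rather
than real tiles. Python will not show the cache behaviour; the sketch
only shows that the loops are nested the other way round while the
result stays the same.

```python
def brighten(v):
    return v * 1.5


def clamp(v):
    return min(v, 1.0)


def invert(v):
    return 1.0 - v


NODES = [brighten, clamp, invert]  # hypothetical node chain


def eval_pixel_stack(image):
    """Pixel stacks: run every node on one pixel, then move to the next."""
    out = []
    for v in image:
        for node in NODES:  # code for all nodes is touched for each pixel
            v = node(v)
        out.append(v)
    return out


def eval_buffers(image):
    """Buffers: run one node over the whole buffer, then the next node."""
    buf = list(image)
    for node in NODES:  # the same node code stays hot for the full pass
        buf = [node(v) for v in buf]
    return buf
```

Both functions do the same total amount of work for a full image; only
the order in which code and data are visited differs.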

GSR
 

