[Bf-cycles] How to mitigate Cycles final render buffer split tiles feature?

Thu Sep 12 23:14:54 CEST 2013

Please provide a way to write a pixel color to an arbitrary pixel from
the kernel, even one outside the current tile, and maybe in a tile that
is not even allocated yet.

I just fixed the last nasty bug in MIS in the bidirectional integrator:
MIS was turned off for the first direct and light pair when the path
started from a light, because the path was tagged as PATH_RAY_MIS_SKIP.
Now noise clears up dramatically faster (compared with the old patch
code, not trunk Cycles). It makes the image so clean (in the area very
close to a light in fog) that for the last few days I was under the
impression that it was simply bugged and one of the contributions had
become zero. But a lot of later testing shows it is correct and
converges to the same image.

Anyway, I now have a renderer that renders many indoor scenes slightly
faster than trunk Cycles (nothing new for people who use Luxrender in
pure bidir mode w/o MLT, it is almost the same). Next I am going to
refactor it into a more readable state and hunt for fireflies (there are
plenty; I think they appear where MIS weights go close to 0 and +inf, so
I am going to set debug traps and track the history of very bright
pixels). Many unimplemented features still need work: asymmetric BSDF
(smooth interpolated normals and bump mapping go mad w/o it), DOF for
paths started from a light, visibility flags, etc. After that I will
make a binary and call for wider testing.

One interesting bidir feature, the one responsible for caustics and
fast noise cleaning near a light in fog, I call "direct light tracing"
(maybe wrongly named). It works like a photon tracer: every light bounce
tries to connect to the camera and measure radiance. Unfortunately, it
is the reason that final render will segfault. You can turn it off and
get a safe render, but then you lose maybe the most powerful bidir
feature, and there is almost no point in using bidir in that case.
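
To illustrate why this step ends up writing outside the current tile,
here is a minimal C++ sketch, not code from the patch. It assumes a
simple pinhole camera and a light-path vertex already given in camera
space; Pixel, Tile, project_to_pixel and tile_contains are hypothetical
names, not Cycles API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types for illustration; these are not Cycles structures.
struct Pixel { int x, y; };
struct Tile { int x, y, w, h; };

// Project a camera-space point (z > 0 means in front of the camera)
// through a simple pinhole model with the focal length given in pixels.
// Returns false if the point is behind the camera or outside the image.
bool project_to_pixel(float px, float py, float pz, float focal_px,
                      int width, int height, Pixel *out)
{
    assert(out != nullptr);
    if (pz <= 0.0f) return false;
    const int x = static_cast<int>(std::floor(width * 0.5f + focal_px * px / pz));
    const int y = static_cast<int>(std::floor(height * 0.5f + focal_px * py / pz));
    if (x < 0 || y < 0 || x >= width || y >= height) return false;
    out->x = x;
    out->y = y;
    return true;
}

// True if the pixel lies inside the tile's rectangle.
bool tile_contains(const Tile &t, Pixel p)
{
    return p.x >= t.x && p.x < t.x + t.w && p.y >= t.y && p.y < t.y + t.h;
}
```

The target pixel depends only on where the light bounce happened, not on
which tile the kernel is working on, so in general tile_contains() will
be false for the tile being rendered at that moment.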

In short, I need to write to any pixel from the inner kernel loop, maybe
from a GPU that has no access to main memory. But Cycles uses tiles,
allocates them on demand and writes to the output on demand, so in a
final render there is no single big picture in memory. The preview
window is allocated as one buffer, so it is safe; I use it and recommend
the same to anyone who tries the patch.

One possible solution may be to allocate a list of delayed pixels,
(x, y, color), as one extra kernel parameter, and process it after
kernel exit. But that is lame: it will take huge memory, as in theory we
can have up to width*height*bounces samples. So instead of saving memory
with tiles we would multiply it by at least the bounce count.
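
A rough C++ sketch of that delayed list and its worst-case size, only to
make the estimate concrete. The Splat record and all function names are
hypothetical, the bound assumes one splat per bounce for every pixel in
one sample pass, and the target image is a plain full-frame RGBA float
array just to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one delayed pixel write; not a Cycles type.
struct Splat {
    int x, y;       // target pixel in full-image coordinates
    float r, g, b;  // radiance to add
};

// Upper bound on the list size for one sample pass, assuming every
// pixel's path emits one splat per bounce (width*height*bounces).
std::size_t worst_case_splat_bytes(std::size_t width, std::size_t height,
                                   std::size_t bounces)
{
    return width * height * bounces * sizeof(Splat);
}

// Size of one full-frame RGBA float buffer, for comparison.
std::size_t full_frame_bytes(std::size_t width, std::size_t height)
{
    return width * height * 4 * sizeof(float);
}

// Host-side pass run after the kernel exits: add each queued splat into
// the image, skipping anything that falls outside it.
void apply_splats(const std::vector<Splat> &splats, std::vector<float> &rgba,
                  int width, int height)
{
    assert(rgba.size() == static_cast<std::size_t>(width) * height * 4);
    for (const Splat &s : splats) {
        if (s.x < 0 || s.y < 0 || s.x >= width || s.y >= height) continue;
        const std::size_t i = (static_cast<std::size_t>(s.y) * width + s.x) * 4;
        rgba[i + 0] += s.r;
        rgba[i + 1] += s.g;
        rgba[i + 2] += s.b;
    }
}
```

With a 20-byte Splat, a 1920x1080 frame and 8 bounces, the bound is
1920*1080*8*20 = 331,776,000 bytes, against 33,177,600 bytes for the
whole RGBA float frame.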

I think the best solution would be a single image allocated on the
host, with a pointer to it passed to OpenCL (as memory shared between
host and GPU? I am not sure what it is called exactly; recent GPUs with
unified VM support can do it). I am sure that even a slow PCIe bus is
enough to serve that, as the pixel producing rate will be low, at least
for low cost GPUs with a small number of threads.
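
For the CPU only case, the single-image idea could look roughly like the
C++ sketch below. It is an assumption about one possible design, not
how Cycles stores its buffers: SharedFrame and its methods are
hypothetical names, and the atomic add is built from a compare-exchange
loop because std::atomic<float> has no fetch_add before C++20. A GPU
version would need the device's own atomics instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical full-frame RGB accumulation buffer shared by all render
// threads; not the Cycles RenderBuffers class.
class SharedFrame {
public:
    SharedFrame(int width, int height)
        : width_(width), height_(height),
          data_(static_cast<std::size_t>(width) * height * 3)
    {
        for (auto &c : data_) c.store(0.0f);
    }

    // Add a color to any pixel of the image; out-of-range writes are dropped.
    void add(int x, int y, float r, float g, float b)
    {
        if (x < 0 || y < 0 || x >= width_ || y >= height_) return;
        const std::size_t i = index(x, y);
        atomic_add(data_[i + 0], r);
        atomic_add(data_[i + 1], g);
        atomic_add(data_[i + 2], b);
    }

    // Read one channel (0 = R, 1 = G, 2 = B) of a pixel.
    float get(int x, int y, int channel) const
    {
        return data_[index(x, y) + channel].load();
    }

private:
    std::size_t index(int x, int y) const
    {
        return (static_cast<std::size_t>(y) * width_ + x) * 3;
    }

    // Retry until no other thread changed the value between read and write.
    static void atomic_add(std::atomic<float> &a, float v)
    {
        float old = a.load(std::memory_order_relaxed);
        while (!a.compare_exchange_weak(old, old + v, std::memory_order_relaxed)) {
        }
    }

    int width_, height_;
    std::vector<std::atomic<float>> data_;
};
```

Each tile thread would call add() with full-image coordinates, so a
light bounce can land on any pixel no matter which tile produced it.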

But looking at the Cycles session code and tile manager, I simply get
lost. It looks designed around the idea of saving memory with tiles, and
I have not found a simple way to force it to keep everything in one
chunk. I can set the tile size very high, but then we get a single
threaded render.

Brecht, or maybe someone else who knows how core Blender and the Cycles
callbacks relate, can you please reconsider the tiling and add an option
for a unified buffer? I am sure that many other external renderers will
need the same if they want an interactive preview during final render.
Maybe not a "proper", GPU friendly one, but for the CPU only case first,
just to stop the crashes when someone hits F12.

The fixed patch is attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blender_cycles_volume_mlt_spectral_60072.patch.bz2
Type: application/x-bzip
Size: 75873 bytes
Desc: not available
Url : http://lists.blender.org/pipermail/bf-cycles/attachments/20130913/7297d7c2/attachment.bin