[Bf-cycles] How to mitigate Cycles final render buffer split tiles feature?

Sat Sep 14 14:38:25 CEST 2013

Hi,

It's certainly possible to implement something like this. A viewport
render already uses a single big buffer while still rendering in
tiles.
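
To illustrate the idea of tiles that are only rectangles inside one
full-frame buffer, here is a minimal sketch. It is not taken from
Cycles or from the attached patch; `Tile`, `split_into_tiles` and
`add_to_tile` are hypothetical names, and the buffer holds one float
per pixel to keep things short.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile description: a rectangle inside the full frame.
struct Tile {
    int x, y, w, h;
};

// Cut a width x height frame into tiles of at most tile_size pixels per side.
std::vector<Tile> split_into_tiles(int width, int height, int tile_size)
{
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tile_size) {
        for (int x = 0; x < width; x += tile_size) {
            Tile t;
            t.x = x;
            t.y = y;
            t.w = std::min(tile_size, width - x);
            t.h = std::min(tile_size, height - y);
            tiles.push_back(t);
        }
    }
    return tiles;
}

// "Render" one tile by adding a value to its pixels. The tile has no
// storage of its own: it indexes straight into the full-frame buffer.
void add_to_tile(std::vector<float> &frame, int frame_width, const Tile &t, float value)
{
    for (int y = t.y; y < t.y + t.h; y++) {
        for (int x = t.x; x < t.x + t.w; x++) {
            frame[std::size_t(y) * frame_width + x] += value;
        }
    }
}
```

Since every tile addresses the same allocation, a write that lands
outside the current tile is still a valid write into the frame.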

I attached a patch that quickly hacks this in. It needs progressive
refine to be enabled, though, and it only works on the CPU. Also, in
principle, threads that work on the same buffer need to use atomic
operations to add floats together. In practice you wouldn't notice
much if this is not thread safe, because two threads rarely end up
writing to the same pixel at the same time.
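
As a rough sketch of what an atomic float add could look like on the
CPU (this is not what the attached patch does, and `atomic_add_float`
and `accumulate_shared_pixel` are hypothetical names), a
compare-and-swap loop on `std::atomic<float>` is one portable option:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: atomically add `value` to `target`.
// If another thread changed `target` in between, compare_exchange_weak
// fails, reloads the current value into `expected`, and we retry.
void atomic_add_float(std::atomic<float> &target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
        // `expected` now holds the latest value; loop and try again.
    }
}

// Hypothetical demo: several threads all add 1.0f to the same pixel.
float accumulate_shared_pixel(int num_threads, int adds_per_thread)
{
    assert(num_threads > 0 && adds_per_thread >= 0);
    std::atomic<float> pixel(0.0f);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; i++) {
        threads.emplace_back([&pixel, adds_per_thread]() {
            for (int j = 0; j < adds_per_thread; j++) {
                atomic_add_float(pixel, 1.0f);
            }
        });
    }
    for (std::thread &t : threads) {
        t.join();
    }
    return pixel.load();
}
```

With a plain `float` and `+=` instead, some of the additions can be
lost when two threads hit the same pixel at once, which is the race
described above.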

For GPU (or network render), I guess we'd need to merge the render
buffers together. Accessing the render buffer over PCIe would be slow
because you'd have to read the value, add to it, and write it back,
and PCIe has a long latency. I guess the GPU could stream samples to
the CPU somehow; either way, some function needs to merge it all
together. The choice between a list of samples and a single buffer is
then mostly a choice of how you store things in memory to submit to
that function: one is more memory efficient for large resolution
renders, and the other is easier to implement.
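
To make the two storage options concrete, here is a minimal sketch of
such a merge function in both forms. None of these names come from
Cycles: `Splat`, `FrameBuffer`, `merge_splats` and `merge_buffer` are
assumptions for illustration, and the buffer is plain RGB floats in
host memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "list of samples" entry: one contribution to one pixel.
struct Splat {
    int x, y;
    float r, g, b;
};

// Hypothetical full-frame RGB buffer in host memory.
struct FrameBuffer {
    int width, height;
    std::vector<float> rgb;  // width * height * 3 floats

    FrameBuffer(int w, int h)
        : width(w), height(h), rgb(std::size_t(w) * h * 3, 0.0f) {}

    float *pixel(int x, int y)
    {
        return &rgb[(std::size_t(y) * width + x) * 3];
    }
};

// Option 1: the device hands back a list of samples and the host adds
// each one to its pixel. Samples outside the frame are skipped.
void merge_splats(FrameBuffer &dst, const std::vector<Splat> &splats)
{
    for (const Splat &s : splats) {
        if (s.x < 0 || s.y < 0 || s.x >= dst.width || s.y >= dst.height) {
            continue;
        }
        float *p = dst.pixel(s.x, s.y);
        p[0] += s.r;
        p[1] += s.g;
        p[2] += s.b;
    }
}

// Option 2: the device hands back a whole buffer of the same size and
// the host adds it element by element.
void merge_buffer(FrameBuffer &dst, const FrameBuffer &src)
{
    assert(dst.width == src.width && dst.height == src.height);
    for (std::size_t i = 0; i < dst.rgb.size(); i++) {
        dst.rgb[i] += src.rgb[i];
    }
}
```

In both cases the read, add, write happens on the host side, so the
device only has to send data one way.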

Brecht.

On Thu, Sep 12, 2013 at 11:14 PM, storm <kartochka22 at yandex.ru> wrote:
> Please provide the ability to write a pixel color to an arbitrary pixel
> from the kernel, even one outside the current tile and maybe not even
> allocated.
>
>
> I just fixed the last nasty bug in MIS in the bidirectional integrator:
> MIS was turned off for the first direct and light pair because the path
> started from the light. The path was tagged as PATH_RAY_MIS_SKIP. Now the
> noise cleaning speed has skyrocketed (in comparison with the old patch
> code, not trunk Cycles). It makes the image so clean (in the area very
> close to a light in fog) that for the last few days I was under the
> impression that it was just bugged and one of the contributions had
> become zero. But a lot of later testing shows it is correct and converges
> to the same image.
>
> Anyway, I now have a renderer that can render many indoor scenes slightly
> faster than trunk Cycles (nothing new for people who use Luxrender in pure
> bidir mode w/o MLT, it is almost the same). I am going to refactor it into
> a more readable state and hunt for fireflies (there are plenty of them; I
> think they appear in cases where MIS weights go close to 0 and +inf, so I
> am going to set debug traps and track the history of very bright pixels).
> Many unimplemented features still need work, such as asymmetric BSDF
> (smooth interpolated normals and bump mapping go mad w/o it), DOF for
> paths started from the light, visibility flags, etc. After that I will
> make a binary and call for wider testing.
>
> One interesting bidir feature, which is responsible for caustics and fast
> noise cleaning near a light in fog, I call "direct light tracing" (maybe
> wrongly named). It works like a photon tracer: every light bounce tries to
> connect to the camera and measure radiance. Unfortunately, it is the
> reason that the final render will segfault. You can turn it off and get a
> safe render, but then you lose maybe the most powerful bidir feature, and
> there is almost no point in using bidir in that case.
>
> In short, I need to write to any pixel from the inner kernel loop, maybe
> from a GPU that has no access to main memory. But Cycles uses tiles,
> allocates them on demand, and writes them to the output on demand, so in
> the case of a final render there is no single big picture in memory. The
> preview window is allocated as one buffer, so it is safe; I use it and
> recommend the same to anyone who tries that patch.
>
> One possible solution may be to allocate a list of delayed pixels
> (x, y, color) as one extra kernel parameter, and process them after the
> kernel exits. But it is lame, it will take huge memory: in theory we can
> have at most width*height*bounces samples. So instead of saving memory by
> using tiles, we would multiply the memory use by at least the bounce
> count.
>
> I think the best solution would be a single image allocated in main host
> memory and, from the OpenCL POV, a pointer to it passed as (shared memory
> between host and GPU? not sure what it is called exactly, recent GPUs with
> unified VM support can do it). I am sure that even a slow PCIE bus is
> enough to serve that, as the pixel producing rate will be low, at least
> for low-cost GPUs with a small number of threads.
>
> But looking at the Cycles session code and tile manager, I simply get
> lost. It looks designed around the idea of saving memory by using tiles,
> and I have not found a simple way to force it to keep everything in one
> chunk. I can set the tile size very high, but then we get a
> single-threaded render.
>
> Brecht, or maybe someone else who knows the relation between core Blender
> and the Cycles callbacks, can you please reconsider the tiling and add an
> option for a unified buffer? I am sure that many other external renderers
> will need the same if they want an interactive preview during the final
> render. Maybe not a "proper", GPU-friendly one, but for the CPU-only case
> first, just to stop the crashes when someone hits F12.
>
> Fixed patch in attachment.
>
> _______________________________________________
> Bf-cycles mailing list
> Bf-cycles at blender.org
> http://lists.blender.org/mailman/listinfo/bf-cycles
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: one_big_buffer.patch
Type: application/octet-stream
Size: 2993 bytes
Desc: not available
Url : http://lists.blender.org/pipermail/bf-cycles/attachments/20130914/fdbd564c/attachment.obj