[Soc-2009-dev] Soc-2009-dev Digest, Vol 5, Issue 5

Mon Aug 10 16:09:43 CEST 2009

Hi Yves,

I only started to care about SIMD last week, because I can't even be sure
that SSE will be enabled on official builds.

So far, the optimization tricks used during the tree build make creating a
QBVH (or another structure with grouped children) less than straightforward.

The SAH build produces a 2-way tree, but the tree optimizations that follow
(done in O(N) and O(N log N), and which reduce the expected number of
BB tests) transform it into a tree where each node can have any number of
children. This "any number of children" makes it harder to implement grouped
children.
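
To illustrate the grouping problem, here is a rough sketch of one way a node
with an arbitrary number of children could be split into fixed-width groups
of 4, padding the last group. This is not what the SoC code does; ChildGroup,
EMPTY_SLOT, groups_needed() and pack_children() are hypothetical names, and
children are assumed to be identified by plain int indices.

```c
#include <assert.h>

#define GROUP_WIDTH 4
#define EMPTY_SLOT (-1) /* hypothetical marker for an unused lane */

/* Hypothetical fixed-width group of child indices (one SIMD-sized batch). */
typedef struct ChildGroup {
    int child[GROUP_WIDTH];
} ChildGroup;

/* Number of 4-wide groups needed to hold nchildren children (rounds up). */
int groups_needed(int nchildren)
{
    assert(nchildren >= 0);
    return (nchildren + GROUP_WIDTH - 1) / GROUP_WIDTH;
}

/* Copies the child indices into consecutive groups and pads the unused
 * lanes of the last group with EMPTY_SLOT. Returns the number of groups. */
int pack_children(const int *children, int nchildren,
                  ChildGroup *out, int out_capacity)
{
    int ngroups = groups_needed(nchildren);
    int g, s;

    assert(out_capacity >= ngroups);

    for (g = 0; g < ngroups; g++) {
        for (s = 0; s < GROUP_WIDTH; s++) {
            int i = g * GROUP_WIDTH + s;
            out[g].child[s] = (i < nchildren) ? children[i] : EMPTY_SLOT;
        }
    }
    return ngroups;
}
```

A node with 6 children would take 2 groups, with 2 padded lanes in the second
one. Those wasted lanes are part of why an arbitrary child count fits a 4-wide
layout poorly.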

E.g., some sample data from a subset of the overlap2 scene (without tree
optimization / with tree optimization):
BB tests per ray:   96.979 / 76.210
BB hits per ray:    64.598 / 33.964

About ray coherency, I decided to exploit it only by using hints to build
tree cuts (like an LCTS), as any other type of approach would be too
troublesome and a waste of time, considering that the BI code involved will
probably be modified soon.

So my initial approach to SIMD was the simplest possible: 1 ray against 4 BVH
nodes (1ray-4bvhs). The current SoC tree structures were not organized for it,
so an initial attempt was implemented by popping 4 nodes at a time from the
DFS stack.
(This allowed me to test the SIMD ray-BB code, improve BLI_arena to do
16-byte aligned mallocs, and get used to SIMD code.)
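
The 1-ray-against-4-boxes test can be sketched roughly as below. This is not
the SoC code: the Box4 and Ray structs and ray_box4_test() are hypothetical
names, and the boxes are assumed to be stored as a structure of arrays. The
code is plain portable C in which each 4-lane inner loop stands for what a
single SSE instruction would do on a 16-byte aligned float[4]. It also
assumes that no component of the ray direction is zero.

```c
#include <assert.h>

/* Hypothetical layout: four boxes stored as structure of arrays, so each
 * float[4] row could be loaded into one 16-byte SSE register. */
typedef struct Box4 {
    float min[3][4]; /* min[axis][lane] */
    float max[3][4]; /* max[axis][lane] */
} Box4;

/* Hypothetical ray: origin, precomputed 1/direction, valid t interval. */
typedef struct Ray {
    float org[3];
    float inv_dir[3];
    float tmin, tmax;
} Ray;

/* Slab test of one ray against four boxes at once.
 * Returns a 4-bit mask; bit i is set when the ray hits box i. */
int ray_box4_test(const Ray *ray, const Box4 *b)
{
    float tnear[4], tfar[4];
    int axis, lane, mask = 0;

    assert(ray && b);

    for (lane = 0; lane < 4; lane++) {
        tnear[lane] = ray->tmin;
        tfar[lane] = ray->tmax;
    }

    for (axis = 0; axis < 3; axis++) {
        /* Each pass over the 4 lanes maps to sub/mul/min/max on 4 floats. */
        for (lane = 0; lane < 4; lane++) {
            float t0 = (b->min[axis][lane] - ray->org[axis]) * ray->inv_dir[axis];
            float t1 = (b->max[axis][lane] - ray->org[axis]) * ray->inv_dir[axis];
            float lo = (t0 < t1) ? t0 : t1;
            float hi = (t0 < t1) ? t1 : t0;
            if (lo > tnear[lane]) tnear[lane] = lo;
            if (hi < tfar[lane]) tfar[lane] = hi;
        }
    }

    /* A box is hit when its entry distance does not exceed its exit distance. */
    for (lane = 0; lane < 4; lane++) {
        if (tnear[lane] <= tfar[lane])
            mask |= 1 << lane;
    }
    return mask;
}
```

When the 4 boxes come from nodes popped off the DFS stack, their bounds have
to be copied into this layout on every step, which is extra work a tree
already stored in groups of 4 would avoid.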

During this week I plan to optimize the data structure, not only for grouping
the children of BVH nodes, but also to yield better memory access for
non-SIMD builds.

So I think I am going in a good direction!
Thanks for the feedback (nice to know people read what I do :)

André

2009/8/8 Yves Poissant <ypoissant2 at videotron.ca>

> > Date: Fri, 7 Aug 2009 20:03:57 +0100
> > From: André Pinto
>
> > *Tested some memory organization and started some SIMD stuff.
> > (SIMD recursion: 4 nodes are popped from the stack and their BBs are
> > tested at the same time. This doesn't seem to scale that well, probably
> > due to memory reorganization time and somewhat bad assembly code.) So I
> > have tried some compile optimization flags to try to make it worth it.
>
> I'm not sure how you are doing the SSE data organization. Your description
> is not detailed enough.
> N-ary BVH traversal with SSE needs a priori data organization at tree build
> time, not on-demand data organization when traversing. Your 4 nodes' BBox
> data should already be organized as a structure of arrays. The leaf nodes
> should also have 4 triangles, also organized as a structure of arrays.
>
> Papers discussing this technique (Ernst & Greiner; Dammertz, Hanika &
> Keller; and Wald, Benthin & Boulos) have reported traversal speed
> improvements of between 1.5 and 2.5 times using this technique with 4-ary
> BVHs. It is less efficient than bundle traversal for coherent rays but much
> more efficient than bundle traversal for very incoherent rays. Current BI
> does not make much use of incoherent rays, except for AO and, to a lesser
> extent, for soft reflections.
>
> Another technique that adapts well to BVH and n-ary BVH for coherent rays
> is the Multi-Level Ray Tracing Algorithm (MLRTA) by Reshetov, where you can
> push a bundle of coherent rays inside the tree before starting to test the
> individual rays.
>
> Yves