[Soc-2009-dev] Soc-2009-dev Digest, Vol 5, Issue 5
ypoissant2 at videotron.ca
Sat Aug 8 14:52:52 CEST 2009
> Date: Fri, 7 Aug 2009 20:03:57 +0100
> From: André Pinto
> *Tested some memory organization and started some SIMD stuff.
> (SIMD recursion: 4 nodes are popped from the stack and their BBs are tested
> at the same time. This doesn't seem to scale that well, probably due to
> memory reorganization time and somewhat bad assembly code.) So I have tried
> compiler optimization flags to try to make it worthwhile.
I'm not sure how you are doing the SSE data organization. Your description
is not detailed enough.
N-ary BVH traversal with SSE needs the data organized a priori, at tree build
time, not on demand while traversing. The BBox data of your 4 nodes should
already be laid out as a structure of arrays. The leaf nodes should also hold
4 triangles, likewise laid out as a structure of arrays.
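To make the layout concrete, here is a minimal sketch in C of what a
structure-of-arrays 4-ary node could look like, together with a slab test of
one ray against its 4 child boxes. The names QBVHNode, Ray and
qbvh_intersect4 are made up for illustration; they are not taken from the
Blender sources or from the papers. Only the SSE intrinsics from xmmintrin.h
are real. I assume the inverse ray direction is precomputed per ray and is
finite on every axis.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical 4-ary BVH node. The bounds of the 4 children are stored as
   structure of arrays: one row of 4 floats per axis, for min and for max.
   This layout is meant to be filled in once, at tree build time. */
typedef struct QBVHNode {
    float bb_min[3][4];  /* bb_min[axis][child] */
    float bb_max[3][4];  /* bb_max[axis][child] */
    int   child[4];      /* child or leaf reference (unused in this sketch) */
} QBVHNode;

/* Hypothetical ray with a precomputed inverse direction. */
typedef struct Ray {
    float org[3];
    float inv_dir[3];    /* 1 / direction, assumed finite on every axis */
    float tmin, tmax;
} Ray;

/* Slab test of one ray against the 4 child boxes at once.
   Returns a 4-bit mask: bit i is set when child i is hit. */
int qbvh_intersect4(const QBVHNode *n, const Ray *r)
{
    __m128 tnear = _mm_set1_ps(r->tmin);
    __m128 tfar  = _mm_set1_ps(r->tmax);

    for (int a = 0; a < 3; a++) {
        __m128 org = _mm_set1_ps(r->org[a]);
        __m128 inv = _mm_set1_ps(r->inv_dir[a]);

        /* Each row is loaded directly; no shuffling at traversal time. */
        __m128 t0 = _mm_mul_ps(_mm_sub_ps(_mm_loadu_ps(n->bb_min[a]), org), inv);
        __m128 t1 = _mm_mul_ps(_mm_sub_ps(_mm_loadu_ps(n->bb_max[a]), org), inv);

        /* min/max handles negative directions without branching. */
        tnear = _mm_max_ps(tnear, _mm_min_ps(t0, t1));
        tfar  = _mm_min_ps(tfar,  _mm_max_ps(t0, t1));
    }

    /* A lane is a hit when its entry distance does not exceed its exit. */
    return _mm_movemask_ps(_mm_cmple_ps(tnear, tfar));
}
```

The point of the sketch is that the traversal loop only does loads and
arithmetic; any reorganization cost is paid when the tree is built. A leaf
holding 4 triangles can follow the same pattern, with one row of 4 floats per
vertex coordinate.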
Papers discussing this technique (Ernst & Greiner; Dammertz, Hanika & Keller;
and Wald, Benthin & Boulos) have reported traversal speedups of between 1.5x
and 2.5x with 4-ary BVHs. It is less efficient than bundle traversal for
coherent rays but much more efficient than bundle traversal for very
incoherent rays. The current BI makes little use of incoherent rays, except
for AO and, to a lesser extent, for soft reflections.
Another technique that adapts well to BVH and n-ary BVH for coherent rays is
the Multi-Level Ray Tracing Algorithm (MLRTA) by Reshetov, where you can push
a bundle of coherent rays inside the tree before starting to test the