[Soc-2009-dev] Soc-2009-dev Digest, Vol 5, Issue 5

Sat Aug 8 14:52:52 CEST 2009

> Date: Fri, 7 Aug 2009 20:03:57 +0100
> From: André Pinto

> *Tested some memory organization and started some SIMD stuff.
> (SIMD recursion: 4 nodes are popped from the stack and their BBs are tested
> at the same time. This doesn't seem to scale that well, probably due to
> memory reorganization time and somewhat bad assembly code.) So I have tried
> some compiler optimization flags to try to make it worth it.

I'm not sure how you are doing the SSE data organization; your description 
is not detailed enough.
N-ary BVH traversal with SSE needs the data to be organized a priori, at tree 
build time, not on demand while traversing. The BBox data of your 4 nodes 
should already be stored as a structure of arrays. The leaf nodes should 
also hold 4 triangles, likewise stored as a structure of arrays.
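
To make the layout concrete, here is a minimal sketch of what I mean, not 
the actual Blender code. The names Node4, Ray, intersect4 and example_mask 
are made up for illustration, and I assume an x86 compiler with SSE. The 
bounds of the 4 children are stored per axis at build time, so one ray can 
be slab-tested against all 4 boxes with plain aligned loads and no shuffling 
during traversal:

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical 4-ary BVH node. The 4 child boxes are stored as a structure
// of arrays: each array holds one coordinate of all 4 children.
struct alignas(16) Node4 {
    float min_x[4], min_y[4], min_z[4];
    float max_x[4], max_y[4], max_z[4];
};

// Hypothetical ray type with the reciprocal direction precomputed.
struct Ray {
    float org[3];
    float inv_dir[3];  // 1 / direction, per axis
    float tmin, tmax;  // valid parametric interval
};

// Slab test of one ray against the 4 child boxes at once.
// Returns a 4-bit mask: bit i is set if the ray hits child i.
static int intersect4(const Node4& n, const Ray& r) {
    __m128 tnear = _mm_set1_ps(r.tmin);
    __m128 tfar  = _mm_set1_ps(r.tmax);
    const float* mins[3] = {n.min_x, n.min_y, n.min_z};
    const float* maxs[3] = {n.max_x, n.max_y, n.max_z};
    for (int a = 0; a < 3; ++a) {
        __m128 o  = _mm_set1_ps(r.org[a]);
        __m128 id = _mm_set1_ps(r.inv_dir[a]);
        // Aligned loads work directly because the data is already SoA.
        __m128 t0 = _mm_mul_ps(_mm_sub_ps(_mm_load_ps(mins[a]), o), id);
        __m128 t1 = _mm_mul_ps(_mm_sub_ps(_mm_load_ps(maxs[a]), o), id);
        tnear = _mm_max_ps(tnear, _mm_min_ps(t0, t1));
        tfar  = _mm_min_ps(tfar,  _mm_max_ps(t0, t1));
    }
    return _mm_movemask_ps(_mm_cmple_ps(tnear, tfar));
}

// Small example: children 0 and 2 lie on the ray's path, child 1 is off to
// the side and child 3 is behind the origin.
static int example_mask() {
    Node4 n = {
        {-1.f,  5.f, -1.f,  -1.f},  // min_x
        {-1.f, -1.f, -1.f,  -1.f},  // min_y
        {-1.f, -1.f, 10.f, -10.f},  // min_z
        { 1.f,  6.f,  1.f,   1.f},  // max_x
        { 1.f,  1.f,  1.f,   1.f},  // max_y
        { 1.f,  1.f, 12.f,  -8.f},  // max_z
    };
    // Direction (0.01, 0.01, 1), so inv_dir = (100, 100, 1).
    Ray r = {{0.f, 0.f, -5.f}, {100.f, 100.f, 1.f}, 0.f, 100.f};
    return intersect4(n, r);
}
```

The same idea applies to the leaves: store the vertex coordinates of the 4 
triangles per component so the triangle test can use the same kind of 
loads. Note that this sketch does not handle direction components that are 
exactly zero, where the slab test can produce NaNs.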

Papers discussing this technique (Ernst & Greiner, Dammertz, Hanika & Keller 
and Wald, Benthin & Boulos) have reported traversal speedups of 1.5x to 2.5x 
using this technique with 4-ary BVHs. It is less efficient than bundle 
traversal for coherent rays, but much more efficient than bundle traversal 
for very incoherent rays. Current BI does not make much use of incoherent 
rays, except for AO and, to a lesser extent, for soft reflections.

Another technique that adapts well to BVH and n-ary BVH for coherent rays is 
the Multi-Level Ray Tracing Algorithm (MLRTA) by Reshetov, where you push a 
bundle of coherent rays down the tree before starting to test the 
individual rays.
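
A rough sketch of the entry point idea follows. This is not Reshetov's 
algorithm itself: the published method uses a cheap frustum test for the 
whole bundle, while this stand-in simply loops over the rays. All names 
(BNode, BRay, find_entry_point, demo_tree) and the flat binary tree layout 
are assumptions for illustration. The bundle descends as long as only one 
child can be reached by any of its rays, and the individual rays then start 
traversal from the node that is returned:

```cpp
#include <algorithm>
#include <vector>

struct Box  { float lo[3], hi[3]; };
struct BRay { float org[3], inv_dir[3]; };  // inv_dir = 1 / direction

// Hypothetical binary BVH node in a flat array; left < 0 marks a leaf.
struct BNode { Box box; int left, right; };

// Standard slab test for t >= 0.
static bool hits(const Box& b, const BRay& r) {
    float tnear = 0.0f, tfar = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b.lo[a] - r.org[a]) * r.inv_dir[a];
        float t1 = (b.hi[a] - r.org[a]) * r.inv_dir[a];
        tnear = std::max(tnear, std::min(t0, t1));
        tfar  = std::min(tfar,  std::max(t0, t1));
    }
    return tnear <= tfar;
}

// True if at least one ray of the bundle hits the box.
static bool any_hit(const Box& b, const std::vector<BRay>& rays) {
    for (const BRay& r : rays)
        if (hits(b, r)) return true;
    return false;
}

// Descend from the root while the whole bundle can reach only one child.
// Returns the index of the node where per-ray traversal should start,
// or -1 if no ray of the bundle can hit anything.
static int find_entry_point(const std::vector<BNode>& nodes,
                            const std::vector<BRay>& rays) {
    if (nodes.empty() || !any_hit(nodes[0].box, rays)) return -1;
    int cur = 0;
    while (nodes[cur].left >= 0) {
        bool l = any_hit(nodes[nodes[cur].left].box, rays);
        bool r = any_hit(nodes[nodes[cur].right].box, rays);
        if (l && r) break;        // the bundle splits here: stop descending
        if (!l && !r) return -1;  // neither child is reachable
        cur = l ? nodes[cur].left : nodes[cur].right;
    }
    return cur;
}

// Tiny example tree: the root splits along x, and the right child splits
// again into two leaves.
static std::vector<BNode> demo_tree() {
    return {
        {{{-10.f, -1.f, -1.f}, {10.f, 1.f, 1.f}},  1,  2},  // 0: root
        {{{-10.f, -1.f, -1.f}, {-1.f, 1.f, 1.f}}, -1, -1},  // 1: leaf
        {{{  1.f, -1.f, -1.f}, {10.f, 1.f, 1.f}},  3,  4},  // 2: inner
        {{{  1.f, -1.f, -1.f}, { 4.f, 1.f, 1.f}}, -1, -1},  // 3: leaf
        {{{  6.f, -1.f, -1.f}, {10.f, 1.f, 1.f}}, -1, -1},  // 4: leaf
    };
}
```

For a tight bundle the entry point ends up deep in the tree, so each ray 
skips the upper levels. As soon as the bundle straddles two children, the 
search stops and the gain shrinks, which is why this only pays off for 
coherent rays.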

Yves