<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=utf-8">

<META content="MSHTML 6.00.6000.16890" name=GENERATOR>

<STYLE></STYLE>

</HEAD>

<BODY bgColor=#ffffff>

<DIV><FONT face=Arial size=2>Hi André</FONT></DIV>

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">

  <DIV>The SAH build produces a 2-way tree, but the tree optimizations 

  (done in O(N) and O(N log N)), which reduce the expected number of BB tests, 

  then transform it into a tree where each node can have any number of 

  children.<BR>This "any number of children" makes it harder to implement grouped 

  children.</DIV></BLOCKQUOTE>

<DIV><FONT face=Arial size=2>I'd guess so. One thing I learned with SSE is that 

if the data is not already organized as SoA when it is needed, then any extra 

operation to reorganize it as SoA on demand will wipe out whatever small gain 

SIMD processing brings.</FONT></DIV>
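<DIV><FONT face=Arial size=2>To make that cost concrete, here is a minimal sketch in plain C (no intrinsics, so it compiles anywhere). The type and function names are hypothetical and not taken from Blender. The function box4_from_aos is the kind of on-demand reshuffle meant above: 24 scalar copies per group of four boxes, which is roughly the same order of work as the 4-wide box test it would feed. Storing Box4SoA directly at build time avoids paying that on every visit.</FONT></DIV>

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structures: one box per struct, the natural scalar layout. */
typedef struct BoxAoS {
    float min[3];
    float max[3];
} BoxAoS;

/* Structure-of-arrays (hypothetical type): four boxes interleaved so that
 * lane i of every array belongs to box i. Each float[4] row is what one
 * 4-wide SSE register would load in a single read. */
typedef struct Box4SoA {
    float min[3][4];
    float max[3][4];
    int   count;        /* number of valid lanes, 1..4 */
} Box4SoA;

/* On-demand AoS -> SoA conversion. Running this inside the traversal loop
 * is the overhead discussed above; running it once at build time is not. */
static void box4_from_aos(Box4SoA *dst, const BoxAoS *src, size_t count)
{
    assert(count >= 1 && count <= 4);
    dst->count = (int)count;

    for (size_t lane = 0; lane < 4; lane++) {
        for (int axis = 0; axis < 3; axis++) {
            if (lane < count) {
                dst->min[axis][lane] = src[lane].min[axis];
                dst->max[axis][lane] = src[lane].max[axis];
            }
            else {
                /* Unused lanes are zeroed; callers must ignore lanes >= count. */
                dst->min[axis][lane] = 0.0f;
                dst->max[axis][lane] = 0.0f;
            }
        }
    }
}
```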

<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>

<DIV><FONT face=Arial size=2>If your tree build algorithm does not let you 

systematically pack 4 nodes as SoA, then I'd say it is not worth pursuing a SIMD 

implementation with this tree build algorithm for speed improvement alone. For 

the experience, though, it is always good to try that sort of 

thing.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>

<DIV><FONT face=Arial size=2>Both Dammertz, Hanika &amp;&nbsp;Keller, and Ernst 

&amp;&nbsp;Greiner build the 4-ary SoA&nbsp;layout a priori, systematically. Wald, 

Benthin&nbsp;&amp; Boulos are not so systematic, but they use 16-wide SIMD (we 

can guess it is Larrabee) and they never get nodes larger than 16 anyway. 

Because their SIMD is so wide, they nevertheless get significant 

gains.</FONT></DIV>
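<DIV><FONT face=Arial size=2>For illustration only, a 4-ary node with its child boxes stored as SoA, and the one-ray-against-four-boxes slab test it enables, could be sketched as below. This is not the layout of any of the papers above; Node4, Ray and node4_intersect are hypothetical names. The 4-lane loops are written in plain C so the sketch runs anywhere; with SSE each inner loop body would map to a single 4-wide instruction.</FONT></DIV>

```c
#include <assert.h>

#define LANES 4

/* Hypothetical 4-ary BVH node: child bounds stored SoA, one lane per child. */
typedef struct Node4 {
    float bb_min[3][LANES];
    float bb_max[3][LANES];
    int   child[LANES];     /* child index, or -1 for an empty lane */
} Node4;

typedef struct Ray {
    float org[3];
    float inv_dir[3];       /* 1 / direction, precomputed once per ray */
    float t_max;            /* far limit of the ray interval [0, t_max] */
} Ray;

static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Slab test of one ray against the four child boxes of a node.
 * Returns a bit mask: bit i set means the ray overlaps child i. */
static unsigned node4_intersect(const Node4 *node, const Ray *ray)
{
    float t_near[LANES], t_far[LANES];
    unsigned mask = 0;

    assert(ray->t_max >= 0.0f);

    for (int i = 0; i < LANES; i++) {
        t_near[i] = 0.0f;
        t_far[i] = ray->t_max;
    }

    /* Clip the ray interval against the x, y and z slabs of all four boxes. */
    for (int axis = 0; axis < 3; axis++) {
        for (int i = 0; i < LANES; i++) {
            float t0 = (node->bb_min[axis][i] - ray->org[axis]) * ray->inv_dir[axis];
            float t1 = (node->bb_max[axis][i] - ray->org[axis]) * ray->inv_dir[axis];
            t_near[i] = maxf(t_near[i], minf(t0, t1));
            t_far[i]  = minf(t_far[i],  maxf(t0, t1));
        }
    }

    /* A lane is hit if its interval is non-empty and the lane holds a child. */
    for (int i = 0; i < LANES; i++) {
        if (node->child[i] >= 0 && t_near[i] <= t_far[i]) {
            mask |= 1u << i;
        }
    }
    return mask;
}
```

<DIV><FONT face=Arial size=2>The caller pushes the children whose bit is set onto the traversal stack. No data is shuffled at traversal time, which is the whole point of building the SoA groups a priori.</FONT></DIV>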

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px"><FONT 

  face=Arial size=2></FONT>

  <DIV>E.g., some sample data from a subset of the overlap2 scene (without tree 

  optimization / with tree optimization):<BR>BB tests per ray:&nbsp;&nbsp; 

  96.979 / 76.210<BR>BB hits per ray:&nbsp;&nbsp;&nbsp;&nbsp; 64.598 / 

  33.964</DIV></BLOCKQUOTE>

<DIV><FONT face=Arial size=2>Those are very interesting results. Did you 

implement an algorithm that you picked from a paper, or is it an algorithm 

you developed yourself? Is it based on SAH, an extension of it, or something 

else? Can you share some details? (I'd understand if you are keeping it for a 

paper, though.)</FONT></DIV>

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px"><FONT 

  face=Arial size=2></FONT>

  <DIV>About ray coherency, I decided to exploit it only through hints used to 

  build tree cuts (like an LCTS), as any other type of approach would be too 

  troublesome and a waste of time, considering that it would mean changing BI 

  code that will probably be modified soon.</DIV></BLOCKQUOTE>

<DIV><FONT face=Arial size=2>When I checked the ray generation code in Blender, 

it seemed difficult to restructure the code so that it generates bundles of rays. 

I agree with you there.</FONT></DIV><FONT face=Arial size=2></FONT>

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">

  <DIV>So my initial approach to SIMD was the simplest one possible, 

  1ray-4bvhs.<BR>The current SoC tree structures were not organized for it, 

  so a first attempt was implemented by popping 4 nodes at a time from the DFS 

  stack.<BR>(This allowed me to test the SIMD ray-BB code, improve BLI_arena to do 

  16-byte-aligned mallocs, and get used to SIMD code.)</DIV></BLOCKQUOTE>

<DIV><FONT face=Arial size=2>Well, that is good experience anyway. And the 

addition to BLI_arena is cool too.</FONT></DIV>
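<DIV><FONT face=Arial size=2>For readers wondering what a 16-byte-aligned arena allocation involves, here is a minimal bump-allocator sketch. It is not the BLI_arena code and does not use its API; Arena, arena_init, arena_alloc_aligned and arena_free are hypothetical names. The idea is simply to round the current pointer up to the next multiple of the alignment before handing it out, so that aligned SSE loads and stores on the returned memory are legal.</FONT></DIV>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical single-block bump arena. */
typedef struct Arena {
    unsigned char *base;    /* start of the malloc'ed block */
    size_t size;            /* total bytes in the block */
    size_t used;            /* bytes handed out so far, padding included */
} Arena;

/* Returns 1 on success, 0 if the backing malloc failed. */
static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Returns `bytes` of storage whose address is a multiple of `align`
 * (a power of two), or NULL if the arena is exhausted. */
static void *arena_alloc_aligned(Arena *a, size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t start   = (uintptr_t)a->base;
    uintptr_t cur     = start + a->used;
    /* Round up to the next multiple of align. */
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t new_used   = (size_t)(aligned - start) + bytes;

    if (a->base == NULL || new_used > a->size) {
        return NULL;
    }
    a->used = new_used;
    return (void *)aligned;
}

static void arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```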

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px"><FONT 

  face=Arial size=2></FONT>

  <DIV>This week I plan to optimize the data structure, not only for 

  grouping the children of BVH nodes, but also to yield better memory access for 

  non-SIMD builds.<BR><BR>So I think I am going in a good direction!</DIV></BLOCKQUOTE>

<DIV><FONT face=Arial size=2>Definitely</FONT>.<BR></DIV>

<DIV><FONT face=Arial size=2>Yves</FONT></DIV></BODY></HTML>