[Bf-committers] BIH progress update
Yves Poissant
ypoissant2 at videotron.ca
Sun Mar 23 05:59:30 CET 2008
> I've not read about SAH BVH (just glanced at BVH once a while back). I'll
> have to read up about it.
Nowadays, discussions about the Surface Area Heuristic BVH assume the reader
already knows what it is all about. You should read Kalpathi Raman
Subramanian's PhD thesis, "Adapting Search Structures to Scene
Characteristics for Ray Tracing", published in 1990 but still available as a
PDF on the web. He is the one who developed the whole SAH theory.
There are a few obvious heuristics for partitioning a hierarchy. The most
commonly used is to take the longest axis and split it right in the middle.
Another is to split at the median, which produces balanced trees. The SAH is
based on the idea that the probability of any ray hitting an object is
proportional to the object's surface area: intuitively, a large object has a
better chance of being hit by a ray than a small one. So the tree is split
in a way that keeps the surface areas mostly equal on both sides. This is
very efficient for traversal but a bit slow at build time.
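To make the surface-area idea concrete, here is a minimal sketch of the SAH
cost of one candidate split. The function names and the unit traversal and
intersection costs are illustrative, not from any particular renderer:

```python
def surface_area(bmin, bmax):
    # Surface area of an axis-aligned bounding box given as (min, max) corners.
    dx = bmax[0] - bmin[0]
    dy = bmax[1] - bmin[1]
    dz = bmax[2] - bmin[2]
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, n_left, right, n_right,
             traversal_cost=1.0, intersect_cost=1.0):
    # Expected cost of a split: the probability that a ray entering the
    # parent also enters a child is approximated by the ratio of their
    # surface areas; each child then costs one intersection per primitive.
    pa = surface_area(*parent)
    p_left = surface_area(*left) / pa
    p_right = surface_area(*right) / pa
    return traversal_cost + intersect_cost * (p_left * n_left +
                                              p_right * n_right)
```

A builder evaluates this cost for many candidate split planes and keeps the
cheapest one, which is exactly what makes SAH builds slower than a simple
spatial-median split.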
> Also, there are some common things about spatial trees that are often
> missed in implementations. A common technique in any sort of search tree
> is links between branches; however this isn't done all that often in
> spatial trees (at least as far as I could tell). In the case of spatial
> trees these links are basically pointers between adjacent nodes. The
> advantages of these "neighboring" links is they reduce the time spent
> traversing the tree. The tree essentially becomes a cyclic graph, thus
> once you've identified the starting node of a ray you can trace that ray
> through the graph fairly quickly. I once had the idea of even using a 3d
> grid to speed up finding the starting node of a ray, but never tried it
> (the idea being that each grid cell would have a reference to the smallest
> node that contained the entire cell).
When you traverse a tree with a ray, you can already determine which
partitions need to be visited. All you need to do is push the unvisited
branches on a stack for a later visit. Skipping tree branches is more
efficient with kd-trees than with BVHs because the kd-tree branches are
ordered. But building a kd-tree can take very long and consume a lot of
memory because there is no clear termination criterion. Primitives must be
split to fit in each partition, and they can thus be visited multiple times,
which requires managing a mailboxing mechanism. And multithreading the
mailboxing mechanism is no fun at all.
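The ordered, stack-based traversal described above can be sketched like
this. The node layout is hypothetical and the sketch deliberately skips the
t-interval clipping a real kd-tree traversal would do, to show only the
near/far ordering and the stack:

```python
class KDNode:
    def __init__(self, axis=None, split=None, left=None, right=None, prims=None):
        self.axis, self.split = axis, split
        self.left, self.right = left, right
        self.prims = prims  # a non-None primitive list marks a leaf

def traverse(root, origin, direction):
    # Yield leaves roughly front-to-back along the ray: at each inner node
    # the child on the ray's near side is visited first, and the far child
    # is pushed on an explicit stack for a later visit.
    stack = [root]
    while stack:
        node = stack.pop()
        if node.prims is not None:
            yield node.prims
            continue
        if direction[node.axis] >= 0.0:
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        stack.append(far)   # visited later
        stack.append(near)  # visited first
```

Because the children are ordered by the ray direction, traversal can stop at
the first confirmed hit; this ordering is what the BVH only gets back with
extra per-node flags.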
BVHs are more difficult to prune because the nodes overlap, sometimes by a
lot, but the nodes can still be ordered if flags are kept, so the ray
direction can still be used as a heuristic. The BVH has a clearly bounded
memory usage. Empty space pruning is handled automatically. And intersecting
a bounding box can be SIMD-vectorized, although the current technique is to
intersect 4 rays at once per traversal step.
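The per-node test being vectorized is the standard ray/AABB slab test. Here
is a plain scalar version of it (names are illustrative); a SIMD variant
would run the same arithmetic over the three axes at once, or over a packet
of 4 rays as mentioned above:

```python
def ray_aabb(origin, inv_dir, bmin, bmax):
    # Slab test: intersect the ray with the three pairs of axis-aligned
    # planes and keep the overlap of the three parameter intervals.
    # inv_dir is 1/direction, precomputed once per ray to avoid divisions.
    tmin, tmax = 0.0, float("inf")
    for a in range(3):
        t1 = (bmin[a] - origin[a]) * inv_dir[a]
        t2 = (bmax[a] - origin[a]) * inv_dir[a]
        if t1 > t2:
            t1, t2 = t2, t1
        tmin = max(tmin, t1)
        tmax = min(tmax, t2)
    return tmin <= tmax  # non-empty overlap means the box is hit
```

Since the three axes are independent, the loop maps directly onto SSE-style
min/max instructions, which is why this test is so cheap in practice.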
The BIH has the advantage of being ordered like the kd-tree, without having
to split the primitives, and with a clearly bounded memory usage like the
BVH. But the BIH needs to prune empty space to be more efficient, and that
is done by adding pruning nodes. In the limit, we end up with the same
number of axis-aligned plane intersections as the BVH, but with the added
cost of traversing nodes in order to prune every dimension. That may explain
why Jacco Bikker measured a 26% improvement with a BVH compared to a BIH.
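What distinguishes a BIH node is that it carries two planes on one axis
instead of the kd-tree's single split plane: the far edge of the left child
and the near edge of the right child. A hypothetical sketch (names are
illustrative) shows why the gap between the planes is pruned for free on
that axis, while empty space on the other two axes needs extra nodes:

```python
class BIHNode:
    def __init__(self, axis, left_max, right_min, left, right):
        self.axis = axis
        self.left_max = left_max    # left child ends at this plane
        self.right_min = right_min  # right child starts at this plane
        self.left, self.right = left, right

def children_overlapping(node, lo, hi):
    # Which children can contain geometry inside the interval [lo, hi]
    # along node.axis? An interval falling entirely inside the gap
    # (left_max, right_min) matches neither child, so that empty space
    # is skipped without any dedicated node.
    out = []
    if lo <= node.left_max:
        out.append(node.left)
    if hi >= node.right_min:
        out.append(node.right)
    return out
```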
All those structures have trouble with bunches of long, thin primitives like
polygon-modeled hair, though.
For efficiency, we usually don't want to bloat the nodes with data that is
not absolutely necessary, so inter-branch links are not used. The main
reason for avoiding them is that we want to cram as many nodes as possible
into the memory cache, and we want to organize the nodes so that a bunch of
tree traversal steps can be executed within the same cache block. So nodes
must be as small as possible, and tricks are used to achieve that, such as
using bits here and there to store flags without adding bytes of data to the
nodes. For example, if the nodes are 8 bytes long, then we can use the lower
bits of the child address to store the X, Y, Z and leaf flags. Vlastimil
Havran's thesis "Heuristic Ray Shooting Algorithms" is the document to read
concerning the usage of the memory cache and its influence on performance.
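The low-bit trick works because 8-byte-aligned child addresses always have
their bottom three bits at zero. A small sketch (the encoding is
illustrative, not any particular renderer's layout) stores the split axis
(0, 1 or 2) or a leaf marker (3) in those spare bits:

```python
LEAF = 3  # flag values 0/1/2 mean split on X/Y/Z; 3 marks a leaf

def pack(child_offset, axis_or_leaf):
    # With 8-byte node alignment the three low bits of the offset are
    # guaranteed to be zero, so the flag can live there for free.
    assert child_offset % 8 == 0
    return child_offset | axis_or_leaf

def unpack(word):
    # Mask the flag bits back out to recover the clean child offset.
    return word & ~7, word & 3
```

The node stays 8 bytes, yet traversal can tell a leaf from an inner node and
read the split axis without touching any extra memory.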
> My own tests with an octree I wrote appeared to show a massive speedup
> with this technique (I'm not an expert, so I'm not sure how valid my
> reference implementation was). Links between nodes don't really give an
> asymptotic advantage, but they do nonetheless give significant practical
> benefits (at least in my somewhat amateurish tests :) ). I think the 3d
> grid idea might be worth checking out, too (the idea being to reduce the
> tree traversal time to as far below the O(logN) upper bound as possible).
The current acceleration structure used in the Blender renderer is called an
octree, but I think it is more like a regular grid because it uses a DDA. I
haven't really taken the time to analyse it, though, so I could be wrong.
The regular grid is very efficient for scenes where the primitives are
evenly distributed, because traversing it with a ray amounts to running a 3D
Digital Differential Analyser (a DDA, akin to the classic Bresenham
line-drawing algorithm). It has trouble with large, almost empty spaces,
though: the so-called "teapot in a stadium" type of scene. In those scenes,
the algorithm loses a lot of time traversing empty cells only to end up in
cells that contain hundreds if not thousands of primitives.
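A minimal 3D DDA looks like this. The sketch assumes unit-sized cells and a
ray origin inside the grid, and simply enumerates the cells the ray walks
through; names are illustrative:

```python
import math

def dda_cells(origin, direction, steps):
    # Current cell and per-axis stepping direction.
    cell = [int(math.floor(c)) for c in origin]
    step = [1 if d > 0 else -1 for d in direction]
    # t_max: ray parameter at the next cell boundary on each axis.
    # t_delta: how much t grows when crossing one whole cell on that axis.
    t_max, t_delta = [], []
    for a in range(3):
        if direction[a] != 0:
            next_bound = cell[a] + (1 if direction[a] > 0 else 0)
            t_max.append((next_bound - origin[a]) / direction[a])
            t_delta.append(abs(1.0 / direction[a]))
        else:
            t_max.append(float("inf"))
            t_delta.append(float("inf"))
    cells = [tuple(cell)]
    for _ in range(steps):
        a = t_max.index(min(t_max))  # axis whose boundary is crossed next
        cell[a] += step[a]
        t_max[a] += t_delta[a]
        cells.append(tuple(cell))
    return cells
```

Each iteration is a comparison, an add and an increment, which is why the
grid is so fast when the primitives are spread evenly; the "teapot in a
stadium" problem is simply this loop burning through thousands of empty
cells.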
IMO, Blender should keep the octree/regular grid but add at least one other
acceleration structure for scenes for which the octree is not optimal. The
way I see it, the UI octree setting could suggest the octree resolutions 64,
128, 256, 512 as we have now, and then another option that would be BVH. It
would actually be better if the name "octree" were changed to "Acceleration"
in that case. The user could then choose the BVH if that is more efficient
for a given scene.