[Bf-committers] BIH progress update
Yves Poissant
ypoissant2 at videotron.ca
Sun Mar 23 05:59:30 CET 2008
> I've not read about SAH BVH (just glanced at BVH once a while back). I'll
> have to read up about it.
Nowadays, discussions about the Surface Area Heuristic BVH assume the reader
already knows what it is all about. You should read Kalpathi Raman
Subramanian's PhD thesis, "Adapting Search Structures to Scene
Characteristics for Ray Tracing", published in 1990 but still available as a
PDF on the web. He is the one who developed the whole SAH theory.
There are a few obvious heuristics for partitioning a hierarchy. The most
commonly used is to take the longest axis and split it right in the middle.
Another is to split at the median, which produces balanced trees. The SAH is
based on the idea that the probability of any ray hitting an object is
proportional to the object's surface area: intuitively, a large object has a
better chance of being hit by a ray than a small one. So the tree is split
in a way that keeps the surface areas mostly equal on both sides. This is
very efficient for traversal but a bit slow at build time.
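To make the surface-area idea concrete, here is a minimal sketch of the SAH
cost of one candidate split. The function names and the unit traversal and
intersection costs are illustrative, not from any particular renderer:

```python
def surface_area(bmin, bmax):
    # Surface area of an axis-aligned bounding box given as (min, max) corners.
    dx = bmax[0] - bmin[0]
    dy = bmax[1] - bmin[1]
    dz = bmax[2] - bmin[2]
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, n_left, right, n_right,
             traversal_cost=1.0, intersect_cost=1.0):
    # Expected cost of a split: the probability that a ray entering the
    # parent also enters a child is approximated by the ratio of their
    # surface areas; each child then costs one intersection per primitive.
    pa = surface_area(*parent)
    p_left = surface_area(*left) / pa
    p_right = surface_area(*right) / pa
    return traversal_cost + intersect_cost * (p_left * n_left +
                                              p_right * n_right)
```

A builder evaluates this cost for many candidate split planes and keeps the
cheapest one, which is exactly what makes SAH builds slower than a simple
spatial-median split.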
> Also, there are some common things about spatial trees that are often
> missed in implementations. A common technique in any sort of search tree
> is links between branches; however this isn't done all that often in
> spatial trees (at least as far as I could tell). In the case of spatial
> trees these links are basically pointers between adjacent nodes. The
> advantages of these "neighboring" links is they reduce the time spent
> traversing the tree. The tree essentially becomes a cyclic graph, thus
> once you've identified the starting node of a ray you can trace that ray
> through the graph fairly quickly. I once had the idea of even using a 3d
> grid to speed up finding the starting node of a ray, but never tried it
> (the idea being that each grid cell would have a reference to the smallest
> node that contained the entire cell).
When you traverse a tree with a ray, you can already determine which
partitions need to be visited. All you need to do is push the unvisited
branches on a stack for a later visit. Skipping tree branches is more
efficient with kd-trees than with BVHs because the kd-tree branches are
ordered. But building a kd-tree can take very long and consume a lot of
memory because there is no clear termination criterion. Primitives must be
split to fit in each partition, and they can thus be visited multiple times,
which requires managing a mailboxing mechanism. And multithreading the
mailboxing mechanism is no fun at all.
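The ordered, stack-based traversal described above can be sketched like
this. The node layout is hypothetical and the sketch deliberately skips the
t-interval clipping a real kd-tree traversal would do, to show only the
near/far ordering and the stack:

```python
class KDNode:
    def __init__(self, axis=None, split=None, left=None, right=None, prims=None):
        self.axis, self.split = axis, split
        self.left, self.right = left, right
        self.prims = prims  # a non-None primitive list marks a leaf

def traverse(root, origin, direction):
    # Yield leaves roughly front-to-back along the ray: at each inner node
    # the child on the ray's near side is visited first, and the far child
    # is pushed on an explicit stack for a later visit.
    stack = [root]
    while stack:
        node = stack.pop()
        if node.prims is not None:
            yield node.prims
            continue
        if direction[node.axis] >= 0.0:
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        stack.append(far)   # visited later
        stack.append(near)  # visited first
```

Because the children are ordered by the ray direction, traversal can stop at
the first confirmed hit; this ordering is what the BVH only gets back with
extra per-node flags.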
BVHs are more difficult to prune because the nodes overlap, sometimes by a
lot, but the nodes can still be ordered if flags are kept, so the ray
direction can still be used as a heuristic. The BVH has a clearly bounded
memory usage. Empty space pruning is handled automatically. And intersecting
a bounding box can be SIMD-vectorized, although the current technique is to
intersect 4 rays at once per traversal step.
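The per-node test being vectorized is the standard ray/AABB slab test. Here
is a plain scalar version of it (names are illustrative); a SIMD variant
would run the same arithmetic over the three axes at once, or over a packet
of 4 rays as mentioned above:

```python
def ray_aabb(origin, inv_dir, bmin, bmax):
    # Slab test: intersect the ray with the three pairs of axis-aligned
    # planes and keep the overlap of the three parameter intervals.
    # inv_dir is 1/direction, precomputed once per ray to avoid divisions.
    tmin, tmax = 0.0, float("inf")
    for a in range(3):
        t1 = (bmin[a] - origin[a]) * inv_dir[a]
        t2 = (bmax[a] - origin[a]) * inv_dir[a]
        if t1 > t2:
            t1, t2 = t2, t1
        tmin = max(tmin, t1)
        tmax = min(tmax, t2)
    return tmin <= tmax  # non-empty overlap means the box is hit
```

Since the three axes are independent, the loop maps directly onto SSE-style
min/max instructions, which is why this test is so cheap in practice.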
The BIH has the advantage of being ordered like the kd-tree, without having
to split the primitives, and with a clearly bounded memory usage like the
BVH. But the BIH needs to prune empty space to be more efficient, and that
is done by adding pruning nodes. In the limit, we end up with the same
number of axis-aligned plane intersections as the BVH, but with the added
cost of traversing nodes in order to prune every dimension. That may explain
why Jacco Bikker measured a 26% improvement with a BVH compared to a BIH.
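What distinguishes a BIH node is that it carries two planes on one axis
instead of the kd-tree's single split plane: the far edge of the left child
and the near edge of the right child. A hypothetical sketch (names are
illustrative) shows why the gap between the planes is pruned for free on
that axis, while empty space on the other two axes needs extra nodes:

```python
class BIHNode:
    def __init__(self, axis, left_max, right_min, left, right):
        self.axis = axis
        self.left_max = left_max    # left child ends at this plane
        self.right_min = right_min  # right child starts at this plane
        self.left, self.right = left, right

def children_overlapping(node, lo, hi):
    # Which children can contain geometry inside the interval [lo, hi]
    # along node.axis? An interval falling entirely inside the gap
    # (left_max, right_min) matches neither child, so that empty space
    # is skipped without any dedicated node.
    out = []
    if lo <= node.left_max:
        out.append(node.left)
    if hi >= node.right_min:
        out.append(node.right)
    return out
```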
All those structures have trouble with bunches of long, thin primitives like
polygon-modeled hair, though.
For efficiency, we usually don't want to bloat the nodes with data that is
not absolutely necessary, so inter-branch links are not used. The main
reason for avoiding them is that we want to cram as many nodes as possible
into the memory cache, and we want to organize the nodes so that a bunch of
tree traversal steps can be executed within the same cache block. So nodes
must be as small as possible, and tricks are used to achieve that, such as
using bits here and there to store flags without adding bytes of data to the
nodes. For example, if the nodes are 8 bytes long, then we can use the lower
bits of the child address to store the X, Y, Z and leaf flags. Vlastimil
Havran's thesis "Heuristic Ray Shooting Algorithms" is the document to read
concerning the usage of the memory cache and its influence on performance.
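The low-bit trick works because 8-byte-aligned child addresses always have
their bottom three bits at zero. A small sketch (the encoding is
illustrative, not any particular renderer's layout) stores the split axis
(0, 1 or 2) or a leaf marker (3) in those spare bits:

```python
LEAF = 3  # flag values 0/1/2 mean split on X/Y/Z; 3 marks a leaf

def pack(child_offset, axis_or_leaf):
    # With 8-byte node alignment the three low bits of the offset are
    # guaranteed to be zero, so the flag can live there for free.
    assert child_offset % 8 == 0
    return child_offset | axis_or_leaf

def unpack(word):
    # Mask the flag bits back out to recover the clean child offset.
    return word & ~7, word & 3
```

The node stays 8 bytes, yet traversal can tell a leaf from an inner node and
read the split axis without touching any extra memory.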
> My own tests with an octree I wrote appeared to show a massive speedup
> with this technique (I'm not an expert, so I'm not sure how valid my
> reference implementation was). Links between nodes don't really give an
> asymptotic advantage, but they do nonetheless give significant practical
> benefits (at least in my somewhat amateurish tests :) ). I think the 3d
> grid idea might be worth checking out, too (the idea being to reduce the
> tree traversal time to as far below the O(logN) upper bound as possible).
The current acceleration structure used in the Blender renderer is called an
octree, but I think it is more like a regular grid because it uses a DDA. I
haven't really taken the time to analyse it, though, so I could be wrong.
The regular grid is very efficient for scenes where the primitives are
evenly distributed, because traversing it with a ray amounts to running a 3D
Digital Differential Analyser (a DDA, akin to the classic Bresenham
line-drawing algorithm). It has trouble with large, almost empty spaces,
though: the so-called "teapot in a stadium" type of scene. In those scenes,
the algorithm loses a lot of time traversing empty cells only to end up in
cells that contain hundreds if not thousands of primitives.
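A minimal 3D DDA looks like this. The sketch assumes unit-sized cells and a
ray origin inside the grid, and simply enumerates the cells the ray walks
through; names are illustrative:

```python
import math

def dda_cells(origin, direction, steps):
    # Current cell and per-axis stepping direction.
    cell = [int(math.floor(c)) for c in origin]
    step = [1 if d > 0 else -1 for d in direction]
    # t_max: ray parameter at the next cell boundary on each axis.
    # t_delta: how much t grows when crossing one whole cell on that axis.
    t_max, t_delta = [], []
    for a in range(3):
        if direction[a] != 0:
            next_bound = cell[a] + (1 if direction[a] > 0 else 0)
            t_max.append((next_bound - origin[a]) / direction[a])
            t_delta.append(abs(1.0 / direction[a]))
        else:
            t_max.append(float("inf"))
            t_delta.append(float("inf"))
    cells = [tuple(cell)]
    for _ in range(steps):
        a = t_max.index(min(t_max))  # axis whose boundary is crossed next
        cell[a] += step[a]
        t_max[a] += t_delta[a]
        cells.append(tuple(cell))
    return cells
```

Each iteration is a comparison, an add and an increment, which is why the
grid is so fast when the primitives are spread evenly; the "teapot in a
stadium" problem is simply this loop burning through thousands of empty
cells.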
IMO, Blender should keep the octree/regular grid but add at least one other
acceleration structure for scenes for which the octree is not optimal. The
way I see it, the UI octree setting could suggest the octree resolutions 64,
128, 256, 512 as we have now, and then another option that would be BVH. It
would actually be better if the name "octree" were changed to "Acceleration"
in that case. The user could then choose the BVH if that is more efficient
for a given scene.