[Soc-2018-dev] Weekly Report #4 - Further Development for Cycles' Volume Rendering

Geraldine Chua chua.gsk at gmail.com
Sun Jun 10 18:04:41 CEST 2018


I am really sorry for the late report, but I was out all day yesterday and
this morning and I wanted to finish everything up before submitting the
report now.

Wiki link:
https://wiki.blender.org/index.php/User:Geraldine/GSoC_2018/Reports#Week_4

So this week, I believe I am pretty much done with the tiling function for
volumes! The implementation should be complete for CPU and mostly complete
for GPU by now; all that is left is GPU testing and possibly adding any
other memory/speed optimizations we can think of. You can check it out
here (
https://developer.blender.org/diffusion/B/browse/soc-2018-cycles-volumes/).
Here is an example volume rendered with and without tiling:

(See wiki link for photos)

Memory usage of 'color' (__tex_image_float4_000) reduced from 3.06M to
2.50M (18.3% reduction)
Memory usage of 'density' (__tex_image_float_003) reduced from 783.75K to
644.00K (17.8% reduction)
Memory usage of 'flame' (__tex_image_float_011) reduced from 783.75K to
292.00K (62.7% reduction)
Memory usage of 'temperature' (__tex_image_float_019) reduced from 783.75K
to 290.00K (63.0% reduction)
Memory usage of 'velocity' (__tex_image_float4_008) increased from 3.06M to
3.06M, not using sparse grid. (0% reduction)
Render time increased from 8:46 to 10:07 (0.86x speed)

Color and velocity in particular are big contributors to memory usage, and
velocity had no inactive tiles at all, so it was left as a dense grid.
Presumably with different settings, the memory usage can go as low as a
third of the original texture.

Notice that there is almost no visible difference between the two images.

*Last Week's To-do List*

So this week I finished all the tasks in last week's to-do list:

* Modify sparse tiles to support generic types (yay templates).
* Add support for all Volume attributes.
* Make compatible with Mesh Volume speedup. Now regardless of the
combination of sparse or dense textures per mesh, the Mesh Volume algorithm
will run successfully.
* Implement lookup for tricubic interpolation. However, the function used
is probably not that efficient.
* Make the threshold a user-specified value. Thanks to Brecht for the
suggestion to simply use the same isovalue as used in Mesh Volume.
* Remove the SparseTile struct altogether and just treat tiles abstractly.
This was necessary for compatibility with CUDA's *tex3D()* and actually
resulted in much cleaner code all throughout.
* Add support for OpenCL and CUDA. In theory, this feature should work with
these two (because the code compiles XD), but I have not tested it yet nor
made comparison tests.
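
As a rough illustration of how a templated tile test against the threshold
could look, here is a minimal sketch. It is not the code in the branch:
*tile_is_active()*, its parameters, and the flat x-fastest voxel layout are
all assumptions made for the example, and it only handles scalar types that
support the > operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed tile edge length (the report uses 8 x 8 x 8 tiles).
constexpr int TILE_SIZE = 8;

// Hypothetical helper: returns true if any voxel of the tile at tile
// coordinates (tile_x, tile_y, tile_z) is greater than `threshold`.
// T must support operator>; a multi-channel type such as float4 would
// need its own comparison (for example per channel).
template<typename T>
bool tile_is_active(const std::vector<T> &voxels,
                    int width, int height, int depth,
                    int tile_x, int tile_y, int tile_z,
                    T threshold)
{
    assert(voxels.size() == static_cast<std::size_t>(width) * height * depth);

    // Clamp the tile's extent so partial tiles at the grid edge stay in bounds.
    const int x0 = tile_x * TILE_SIZE, x1 = std::min(x0 + TILE_SIZE, width);
    const int y0 = tile_y * TILE_SIZE, y1 = std::min(y0 + TILE_SIZE, height);
    const int z0 = tile_z * TILE_SIZE, z1 = std::min(z0 + TILE_SIZE, depth);

    for (int z = z0; z < z1; ++z) {
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                // Flat index into the dense grid, x varying fastest (assumed layout).
                const std::size_t i =
                    x + static_cast<std::size_t>(width) *
                            (y + static_cast<std::size_t>(height) * z);
                if (voxels[i] > threshold) {
                    return true;
                }
            }
        }
    }
    return false;
}
```

A tile for which this returns false would not need to be stored in the
sparse grid at all.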

*A Few Notes on CUDA Implementation*

In order to support CUDA's interpolation, I added a second version of
*create_sparse_grid()* that pads every tile's 6 faces with their voxel
neighbors. However, this results in a massive increase in memory usage,
which, depending on how dense the volume is, may not be worth the increased
lookup time. A quick space optimization (thanks to Brecht for the advice!)
is to remove padding between adjacent tiles in the sparse grid if they are
already neighbors in the real image.

Another quirk of the interpolation is that *tex3D()* expects coordinates in
the (x, y, z) coordinate format as opposed to a flat index for array
access. While we could convert the offsets into Cartesian coordinates, a
more efficient method would be to simply store the coordinates of the first
voxel in every tile. While this triples the size of *grid_info*, the lookup
savings from not having to compute coordinates from the index should be
worth it.
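
To make the trade-off concrete, here is a small sketch of the two options.
The names *Coord3*, *offset_to_coords()* and *tile_origin()* are made up for
this example and are not taken from the branch; the flat layout with x
varying fastest is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-component integer coordinate.
struct Coord3 {
    int x, y, z;
};

// Option 1: store one flat offset per tile and recover (x, y, z) on every
// lookup. This costs two divisions and two modulo operations per lookup.
inline Coord3 offset_to_coords(int offset, int width, int height)
{
    assert(offset >= 0 && width > 0 && height > 0);
    return Coord3{offset % width,
                  (offset / width) % height,
                  offset / (width * height)};
}

// Option 2: store the coordinates of each tile's first voxel directly.
// The table holds three ints per tile instead of one, but a lookup is a
// plain read with no arithmetic.
inline Coord3 tile_origin(const std::vector<Coord3> &tile_coords, int tile)
{
    assert(tile >= 0 && static_cast<std::size_t>(tile) < tile_coords.size());
    return tile_coords[tile];
}
```

Whether the extra memory actually pays off would still have to be measured
on the GPU.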

*Memory and Speed Optimizations*

Aside from the above, I also tried to make optimizations here and there to
speed up rendering or reduce space usage further. Unfortunately, there will
always be a speed penalty since extra calculations are required to get the
sparse space coordinates. Some optimizations made:

1. Created quicker copies of *compute_index*()* in the *kernel_*_image.h*
files

While having several lookup functions increases the maintenance burden,
implementing them directly in the kernel files saves a significant
percentage of time. I will keep one copy of *compute_index*()* in
*util/util_sparse_grid.h* for use outside the kernel and as a reference
point for the kernel implementations.
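
For readers unfamiliar with the idea, a sparse index lookup can be sketched
roughly as below. This is only an illustration under my own assumptions
(one int per tile holding the offset of its first voxel, -1 for an inactive
tile, full-size tiles only, x varying fastest); the function name
*compute_index_sketch()* is made up and the real *compute_index*()*
functions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed tile edge length (the report uses 8 x 8 x 8 tiles).
constexpr int TILE_SIZE = 8;

// Hypothetical lookup: maps dense voxel coordinates (x, y, z) to an index
// into the sparse voxel array, or -1 if the voxel lies in an inactive tile.
// grid_info[t] is assumed to hold the sparse offset of tile t's first voxel,
// or -1 when the tile is not stored.
inline int compute_index_sketch(const std::vector<int> &grid_info,
                                int x, int y, int z,
                                int tiles_x, int tiles_y)
{
    // Which tile does the voxel fall into?
    const int tile = (x / TILE_SIZE) +
                     tiles_x * ((y / TILE_SIZE) + tiles_y * (z / TILE_SIZE));
    assert(tile >= 0 && static_cast<std::size_t>(tile) < grid_info.size());

    const int tile_start = grid_info[tile];
    if (tile_start < 0) {
        return -1;  // inactive tile: the caller treats the voxel as empty
    }

    // Position of the voxel inside its tile.
    const int lx = x % TILE_SIZE;
    const int ly = y % TILE_SIZE;
    const int lz = z % TILE_SIZE;
    return tile_start + lx + TILE_SIZE * (ly + TILE_SIZE * lz);
}
```

The divisions and modulo operations here are the extra per-sample work that
a dense texture lookup does not have to do.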

2. Stored some resolution info of the image in tiles in *device_memory* and
*TextureInfo*

The resolution of the image in tiles as well as the dimensions of the last
tile in the grid are needed for the lookup calculation. I stored them to
avoid having to recalculate them for every sample. Thus, more variables
have been added to *device_memory* and *TextureInfo*. It may be better to
implement a wrapper class / struct for sparse *device_memory /
TextureInfo*, but I don't think there is a way for *tex_alloc()* and
*TextureInterpolator* to detect if the passed *device_memory / TextureInfo*
is the wrapper aside from using dynamic cast.

3. Reduced sparse grid memory by truncating the width/height/depth of the
last tiles in each direction if the length is not exactly divisible by
*TILE_SIZE*

Consider a 9 x 9 x 9 volume, which would require 2 x 2 x 2 = 8 tiles to
cover. Using only 8 x 8 x 8 tiles, the sparse grid would be 8*8*8*8 = 4096
voxels large, as compared to the original grid's 9*9*9 = 729 voxels. By
implementing 2 checks in the voxel lookup, we can reduce the wasted space
significantly by truncating the tiles at the end of the grid. However, these
extra checks do result in an increase in lookup time.
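
The arithmetic can be checked with a short sketch. The helper names below
are hypothetical, and the sketch assumes every tile is active. With a tile
size of 8, a 9 x 9 x 9 volume needs ceil(9/8) = 2 tiles per axis, so 8
tiles in total; full-size tiles then hold 8 * 512 = 4096 voxels, while
truncated end tiles bring that back down to 729.

```cpp
#include <cassert>

// Assumed tile edge length (the report uses 8 x 8 x 8 tiles).
constexpr int TILE_SIZE = 8;

// Number of tiles needed to cover `length` voxels along one axis.
inline int tiles_along(int length)
{
    assert(length > 0);
    return (length + TILE_SIZE - 1) / TILE_SIZE;
}

// Edge length of tile number `tile` along an axis of `length` voxels:
// TILE_SIZE everywhere except for a truncated last tile.
inline int tile_extent(int tile, int length)
{
    const bool is_last = (tile == tiles_along(length) - 1);
    const int remainder = length % TILE_SIZE;
    return (is_last && remainder != 0) ? remainder : TILE_SIZE;
}

// Voxels stored when every tile is a full TILE_SIZE^3 block.
inline long long full_tile_voxels(int w, int h, int d)
{
    return 1LL * tiles_along(w) * tiles_along(h) * tiles_along(d) *
           TILE_SIZE * TILE_SIZE * TILE_SIZE;
}

// Voxels stored when the last tile along each axis is truncated.
inline long long truncated_voxels(int w, int h, int d)
{
    long long total = 0;
    for (int tz = 0; tz < tiles_along(d); ++tz) {
        for (int ty = 0; ty < tiles_along(h); ++ty) {
            for (int tx = 0; tx < tiles_along(w); ++tx) {
                total += 1LL * tile_extent(tx, w) * tile_extent(ty, h) *
                         tile_extent(tz, d);
            }
        }
    }
    return total;
}
```

In a real volume only the active tiles are stored, so both totals would be
smaller than this worst case.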

4. New dimensions parameter

I originally made the kernel work out the truncated end tiles and discarded
padding itself, but this just resulted in a significant slowdown in
rendering. A better solution was to store the tile dimension information
the same way tile indexes are already saved (as *device_memory*). Since all
of the information is boolean, it can be stored in the bits of a single int
per tile and retrieved later through bit shifting.
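
A minimal sketch of this kind of bit packing is shown below. The exact
flags and their bit positions in the branch are not described here, so the
*TileFlagBit* layout (three "truncated" bits and six "padded face" bits)
and the helper names are purely illustrative assumptions.

```cpp
#include <cassert>

// Hypothetical bit layout for the per-tile int.
enum TileFlagBit {
    TRUNCATED_X = 0,  // tile is shorter than TILE_SIZE along x
    TRUNCATED_Y = 1,
    TRUNCATED_Z = 2,
    PAD_X_NEG = 3,    // tile carries padding on its -x face
    PAD_X_POS = 4,
    PAD_Y_NEG = 5,
    PAD_Y_POS = 6,
    PAD_Z_NEG = 7,
    PAD_Z_POS = 8
};

// Returns `flags` with the given bit set or cleared.
inline int set_flag(int flags, int bit, bool value)
{
    assert(bit >= 0 && bit < 31);
    return value ? (flags | (1 << bit)) : (flags & ~(1 << bit));
}

// Reads a single boolean back out by shifting and masking.
inline bool get_flag(int flags, int bit)
{
    assert(bit >= 0 && bit < 31);
    return ((flags >> bit) & 1) != 0;
}
```

With a layout like this, nine booleans cost one int per tile rather than
nine separate values.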

*To-do next week*

I think that beginning next week I can start working on creating the
OpenVDB import function. An easy start would be adding a UI option for it,
but I will have to read up some more on the background of the problem before
I can say for certain what will need to be done for this feature.

For tiling, there may still be more optimal ways of implementing the tile
lookup. I will continue to modify the algorithm if we can come up with more
ideas. I am open to any suggestions on how to improve lookup in particular,
especially for OpenCL and CUDA as I am still not very familiar with them.