<div dir="ltr"><div>I am really sorry for the late report; I was out all day yesterday and this morning, and I wanted to finish everything before submitting it.</div><div><br></div><div>Wiki link: <a href="https://wiki.blender.org/index.php/User:Geraldine/GSoC_2018/Reports#Week_4">https://wiki.blender.org/index.php/User:Geraldine/GSoC_2018/Reports#Week_4</a></div><div><br></div><div>This week, I believe I am pretty much done with the tiling function for volumes! The implementation should be complete for CPU and mostly complete for GPU; all that is left is GPU testing and any further memory/speed optimizations we can think of. You can check it out here (<a href="https://developer.blender.org/diffusion/B/browse/soc-2018-cycles-volumes/">https://developer.blender.org/diffusion/B/browse/soc-2018-cycles-volumes/</a>). Here is an example volume rendered with and without tiling:<br><br></div><div>(See the wiki link for the images.)<br></div><div><br>Memory usage of 'color' (__tex_image_float4_000) reduced from 3.06M to 2.50M (18.3% reduction)<br>Memory usage of 'density' (__tex_image_float_003) reduced from 783.75K to 644.00K (17.8% reduction)<br>Memory usage of 'flame' (__tex_image_float_011) reduced from 783.75K to 292.00K (62.7% reduction)<br>Memory usage of 'temperature' (__tex_image_float_019) reduced from 783.75K to 290.00K (63.0% reduction)<br>Memory usage of 'velocity' (__tex_image_float4_008) unchanged at 3.06M, not using sparse grid (0% reduction)<br>Render time increased from 8:46 to 10:07 (0.86x speed)<br><br>Color and velocity in particular are big contributors to memory usage, and velocity had no inactive tiles at all, so it was kept as a dense texture. 
Presumably, with different settings, memory usage can go as low as a third of the original texture.<br><br>Notice that there is almost no visible difference between the two images.<br><br><b>Last Week's To-do List</b><br><br>This week I finished all the tasks on last week's to-do list:<br><br>* Modify sparse tiles to support generic types (yay templates).<br>* Add support for all Volume attributes.<br>* Make compatible with Mesh Volume speedup. Now, regardless of the combination of sparse or dense textures per mesh, the Mesh Volume algorithm will run successfully.<br>* Implement lookup for tricubic interpolation. However, the function used is probably not very efficient.<br>* Make the threshold value user-configurable. Thanks to Brecht for the suggestion to simply use the same isovalue as in Mesh Volume.<br>* Remove the SparseTile struct altogether and just treat tiles abstractly. This was necessary for compatibility with CUDA's <i>tex3D()</i> and actually resulted in much cleaner code throughout.<br>* Add support for OpenCL and CUDA. In theory, this feature should work with both (because the code compiles XD), but I have not tested it or run comparison tests yet.<br><br><b>A Few Notes on CUDA Implementation</b><br><br>In order to support CUDA's interpolation, I added a second version of <i>create_sparse_grid()</i> that pads all 6 faces of every tile with their neighboring voxels. However, this results in a massive increase in memory usage, which, depending on how dense the volume is, may not be worth the faster lookup. A quick space optimization (thanks to Brecht for the advice!) is to remove padding between adjacent tiles in the sparse grid if they are already neighbors in the real image.<br><br>Another quirk of the interpolation is that <i>tex3D()</i> expects (x, y, z) coordinates rather than a flat index for array access. 
While we could convert the offsets into Cartesian coordinates, a more efficient method is to simply store the coordinates of the first voxel in every tile. This triples the size of <i>grid_info</i>, but the lookup savings from not having to compute coordinates from the index should be worth it.<br><br><b>Memory and Speed Optimizations</b><br><br>Aside from the above, I also made optimizations here and there to speed up rendering or reduce memory usage further. Unfortunately, there will always be a speed penalty, since extra calculations are required to get the sparse-space coordinates. Some of the optimizations:<br><br>1. Created faster copies of <i>compute_index*()</i> in the <i>kernel_*_image.h</i> files<br><br>Having several lookup functions increases maintenance, but implementing them directly in the kernel files saves a significant amount of time. I will keep one copy of <i>compute_index*()</i> in <i>util/util_sparse_grid.h</i> for use outside the kernel and as a reference for the kernel implementations.</div><div><br></div><div>2. Stored the image's resolution in tiles in <i>device_memory</i> and <i>TextureInfo</i><br><br>The lookup needs both the resolution of the image in tiles and the dimensions of the last tile in the grid. Storing them avoids recalculating them for every sample, so more variables have been added to <i>device_memory</i> and <i>TextureInfo</i>. It may be better to implement a wrapper class/struct for sparse <i>device_memory / TextureInfo</i>, but I don't think there is a way for <i>tex_alloc()</i> and <i>TextureInterpolator</i> to detect whether the passed <i>device_memory / TextureInfo</i> is the wrapper other than using a dynamic cast.<br><br>3. 
Reduced sparse grid memory by truncating the width/height/depth of the last tiles in each direction if the volume's length in that direction is not exactly divisible by <i>TILE_SIZE</i><br><br>Consider a 9 x 9 x 9 volume, which requires 2 x 2 x 2 = 8 tiles to cover. Using only full 8 x 8 x 8 tiles, the sparse grid would be 8*8*8*8 = 4096 voxels large, compared to the original grid's 9*9*9 = 729 voxels. By adding 2 checks to the voxel lookup, we can truncate the tiles at the end of the grid and reduce the wasted space significantly: if all 8 tiles are active, the truncated tiles (one 8 x 8 x 8, three 8 x 8 x 1, three 8 x 1 x 1 and one 1 x 1 x 1) hold exactly 512 + 192 + 24 + 1 = 729 voxels. However, these extra checks do increase lookup time.<br><br>4. New dimensions parameter<br><br>Originally, I made the kernel calculate the dimensions of the truncated end tiles and the discarded padding on the fly, but this resulted in a significant slowdown in rendering. A better solution was to store the tile dimension information the same way the tile indexes are already saved (as <i>device_memory</i>). Since all of this information is boolean, it can be stored in the bits of a single int per tile and retrieved later through bit shifting.<br><br><b>To-do next week</b><br><br>I think that beginning next week I can start working on the OpenVDB import function. An easy start would be adding a UI option for it, but I will have to read up some more on the background of the problem before I can say for certain what this feature will need.<br><br>For tiling, there may still be more optimal ways of implementing the tile lookup, and I will keep modifying the algorithm as we come up with more ideas. I am open to any suggestions on how to improve the lookup, especially for OpenCL and CUDA, as I am still not very familiar with them.<br></div><br></div>