[Bf-cycles] Tiled Texture Caching for Cycles

Stefan Werner stewreo at gmail.com
Wed Jun 14 18:06:43 CEST 2017


> On 10. Jun 2017, at 18:47, Brecht Van Lommel <brechtvanlommel at pandora.be> wrote:
> 
> These are some very promising numbers. Is that with the overhead of the more expensive OIIO texture filtering included?

Yes, that is included. 

> For texture cache misses on the GPU, one way to handle that is to record the missing texture tiles, let the CPU load them into memory, and then re-execute just the shaders that had missing textures. Maybe there's a better way with unified virtual memory on newer cards, ideally the CPU could dynamically load a texture tile from disk when the GPU has a page fault, but I'm not sure if that's possible.

With Nvidia’s unified memory, Pascal hardware can page fault and automatically fall back to system memory, but to my knowledge you’re still limited to pinned memory, that is, physical RAM. I don’t think there’s a way to intercept GPU page faults directly. In my experiments, CUDA with managed memory kept everything in system RAM and never uploaded anything to VRAM, with a significant speed penalty: it was slower than managing the transfers manually (D2056). I may have been doing it wrong, though.
I don’t have any AMD hardware, so I can’t speak for that platform.
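As a host-side illustration of the record-and-retry scheme Brecht describes (record the missing tiles, load them on the CPU, then re-execute the shaders that missed), here is a minimal sketch. The TileKey/TileCache names are hypothetical stand-ins, not actual Cycles types, and the "load" step just marks tiles resident instead of doing real disk I/O:

```cpp
#include <set>
#include <vector>

// Hypothetical tile key: which texture, and which tile within it.
struct TileKey {
    int texture_id;
    int tile_x, tile_y;
    bool operator<(const TileKey &o) const {
        if (texture_id != o.texture_id) return texture_id < o.texture_id;
        if (tile_x != o.tile_x) return tile_x < o.tile_x;
        return tile_y < o.tile_y;
    }
};

// Minimal cache: resident tiles, plus a miss list the render pass fills in.
class TileCache {
public:
    // Shader-side lookup: returns true on a hit, records a miss otherwise.
    bool lookup(const TileKey &key) {
        if (resident_.count(key)) return true;
        missing_.insert(key);
        return false;
    }
    // Host-side: "load" every recorded miss (here just marking it resident).
    void load_missing() {
        for (const TileKey &key : missing_) resident_.insert(key);
        missing_.clear();
    }
    bool has_missing() const { return !missing_.empty(); }
private:
    std::set<TileKey> resident_;
    std::set<TileKey> missing_;
};

// Re-execute the shading pass until every requested tile is resident.
// Returns the number of passes that were needed.
int shade_with_retry(TileCache &cache, const std::vector<TileKey> &requests) {
    int passes = 0;
    bool all_hit = false;
    while (!all_hit) {
        ++passes;
        all_hit = true;
        for (const TileKey &key : requests)
            if (!cache.lookup(key)) all_hit = false;
        if (cache.has_missing()) cache.load_missing();
    }
    return passes;
}
```

A cold cache needs two passes (one to record the misses, one after the loads), while a warm cache finishes in one; on a GPU the expensive part is deciding which shader invocations actually need the re-execution.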

Takahiro Harada has a paper about how to move existing GPU code to out-of-core, and I would think this is the method being used for Radeon ProRender:
https://github.com/takahiroharada/takahiroharada.github.io/raw/master/publications/2016_i3d.pdf
https://github.com/takahiroharada/takahiroharada.github.io/raw/master/publications/2016_i3d_sup.pdf

> Regarding branches, I guess there will be no more 2.7x release after 2.79. But code could still be committed to the 2.7x branch I think, if that's useful to someone and as long as it's easy to merge into 2.8.

I’ll keep it as a branch of 2.79 for now then. I can always rebase if necessary.

> Regarding Embree, that's very interesting as well. Certainly we need a better motion BVH that does interpolation of the inner node bounding boxes instead of what we have now. Ideally we can avoid having two BVH backends, because that's harder to maintain, especially on the GPU, and instead just integrate the optimizations into our own BVH, but I'm not sure how hard it is.

I agree that the same motion blur method should be ported to Cycles’ BVH, since embree won’t benefit GPU rendering. It may also be possible to use embree’s BVH as is and traverse it directly from the GPU. Overall though, I think ray/scene intersections should be abstracted to the point that one can switch between backends and the only noticeable difference is performance and/or memory consumption. There isn’t really much in a spatial subdivision that’s renderer-specific, and if needed, we can plug Cycles’ ray/primitive intersections into embree to get pixel-perfect results.
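To make the backend-abstraction point concrete, here is a sketch of what such an interface could look like. All names (IntersectionBackend, NativeBVH, EmbreeBVH) are hypothetical, and both backends here answer queries with the same hard-coded plane at t = 1 purely for illustration; a real embree wrapper would call into rtcIntersect and translate hit records:

```cpp
#include <memory>
#include <string>

// Minimal stand-ins for the renderer's ray and hit records.
struct Ray { float origin[3]; float dir[3]; float tmax; };
struct Hit { bool valid = false; float t = 0.0f; int prim = -1; };

// The abstraction: a backend only has to answer intersection queries.
class IntersectionBackend {
public:
    virtual ~IntersectionBackend() = default;
    virtual std::string name() const = 0;
    virtual Hit intersect(const Ray &ray) const = 0;
};

// Stand-in "native" backend: one hard-coded surface at t = 1.
class NativeBVH : public IntersectionBackend {
public:
    std::string name() const override { return "native"; }
    Hit intersect(const Ray &ray) const override {
        Hit hit;
        if (ray.tmax >= 1.0f) { hit.valid = true; hit.t = 1.0f; hit.prim = 0; }
        return hit;
    }
};

// A second backend (e.g. wrapping embree) answers the same queries;
// only performance and memory use should differ, never the result.
class EmbreeBVH : public IntersectionBackend {
public:
    std::string name() const override { return "embree"; }
    Hit intersect(const Ray &ray) const override {
        Hit hit;
        if (ray.tmax >= 1.0f) { hit.valid = true; hit.t = 1.0f; hit.prim = 0; }
        return hit;
    }
};

// Renderer code sees only the interface, so backends are swappable.
Hit trace(const IntersectionBackend &bvh, const Ray &ray) {
    return bvh.intersect(ray);
}
```

Pixel-perfect parity between backends then reduces to running the same ray/primitive intersection code behind both implementations of the interface.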

It’s hard to predict the future, but it may be less work to maintain an embree backend and get their improvements for free than to re-invent the same optimisations in Cycles. That’s getting a bit into a philosophical discussion, for sure, but speaking for myself, I’m not smart enough to write a BVH that can compete with embree.

-Stefan
