[Bf-committers] "Official" CUDA Benchmark/Implementation Thread

Thu Dec 18 23:18:18 CET 2008

>
> As you can see the first 7 functions consume more than 90% of the total
> time during a rendering...
>

That brings up an interesting idea. If that much of the time really
is spent in ray intersection, then the question is whether that part
can be pulled out on its own and turned into batches. We can't simply
call a CUDA function for each ray, but we may be able to offload
the ray tracing code to the GPU if we create lists of rays to
be computed. For instance, at the beginning of the render we'd need to
fire one ray for each pixel (if we ignore OSA for the moment). So we
could upload a list of 800x600 = 480k rays to the GPU for calculation.
Once that task is done, the results could be downloaded. From there
the CPU could calculate the vectors of the next set of rays. These
rays would be for reflection and lighting (ignoring advanced
rendering effects). So now we have up to 800x600*(1+numoflights)
rays. Upload this batch, let the GPU compute, and continue.
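
The batch loop described above can be sketched roughly as follows. This
is plain Python running on the CPU, not CUDA and not Blender code:
intersect_batch() only stands in for the step that would run on the GPU.
All function names, the sphere-only scene and the pinhole camera are
assumptions made up for illustration.

```python
import math

def primary_rays(width, height):
    """One ray per pixel (no OSA), pinhole camera at the origin looking down -z."""
    rays = []
    for y in range(height):
        for x in range(width):
            # Map the pixel centre to [-1, 1] on an image plane at z = -1.
            dx = (x + 0.5) / width * 2.0 - 1.0
            dy = 1.0 - (y + 0.5) / height * 2.0
            n = math.sqrt(dx * dx + dy * dy + 1.0)
            rays.append(((0.0, 0.0, 0.0), (dx / n, dy / n, -1.0 / n)))
    return rays

def intersect_batch(rays, spheres):
    """Stand-in for the GPU step: nearest hit per ray, or None on a miss."""
    results = []
    for origin, direction in rays:
        nearest = None
        for centre, radius in spheres:
            oc = tuple(o - c for o, c in zip(origin, centre))
            b = sum(d * v for d, v in zip(direction, oc))
            c = sum(v * v for v in oc) - radius * radius
            disc = b * b - c  # direction is unit length, so a == 1
            if disc < 0.0:
                continue
            t = -b - math.sqrt(disc)
            if t > 1e-6 and (nearest is None or t < nearest[0]):
                nearest = (t, centre, radius)
        results.append(nearest)
    return results

def secondary_rays(rays, hits, lights):
    """CPU step: one reflection ray plus one ray per light for every hit."""
    out = []
    for (origin, direction), hit in zip(rays, hits):
        if hit is None:
            continue
        t, centre, radius = hit
        p = tuple(o + t * d for o, d in zip(origin, direction))
        n = tuple((pi - ci) / radius for pi, ci in zip(p, centre))
        # Nudge the start point off the surface to avoid hitting it again.
        start = tuple(pi + 1e-4 * ni for pi, ni in zip(p, n))
        dn = sum(d * ni for d, ni in zip(direction, n))
        out.append((start, tuple(d - 2.0 * dn * ni for d, ni in zip(direction, n))))
        for light in lights:
            to_light = tuple(l - s for l, s in zip(light, start))
            length = math.sqrt(sum(v * v for v in to_light))
            out.append((start, tuple(v / length for v in to_light)))
    return out

if __name__ == "__main__":
    spheres = [((0.0, 0.0, -3.0), 2.5)]
    lights = [(5.0, 5.0, 0.0), (-5.0, 5.0, 0.0)]
    batch = primary_rays(4, 3)                     # "upload"
    hits = intersect_batch(batch, spheres)         # "GPU computes"
    batch = secondary_rays(batch, hits, lights)    # CPU builds the next batch
    print(len(hits), len(batch))
```

In this sketch only rays that hit something spawn secondary rays, so
width*height*(1+numoflights) is the upper bound on the second batch,
reached when every primary ray hits.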

This might actually work...

From there we should be able to do a tad more tweaking for

We'd be thrashing the GPU memory, but it would still probably be
faster than the CPU.
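
To get a feel for how much data each batch would move, here is a
back-of-the-envelope estimate. The record sizes are assumptions, not
anything Blender or CUDA prescribes: a ray is taken to be six 32-bit
floats (origin and direction, 24 bytes) and a result one 32-bit float
distance plus one 32-bit object index (8 bytes). The choice of 3 lights
is also just an example.

```python
RAY_BYTES = 6 * 4      # assumed: origin xyz + direction xyz as float32
RESULT_BYTES = 4 + 4   # assumed: float32 hit distance + int32 object index

def batch_sizes(width, height, num_lights):
    """Return (rays, upload_bytes, download_bytes) for the first two batches."""
    primary = width * height
    # Upper bound: every primary ray hits and spawns 1 reflection + 1 ray per light.
    secondary = primary * (1 + num_lights)
    return [
        (count, count * RAY_BYTES, count * RESULT_BYTES)
        for count in (primary, secondary)
    ]

if __name__ == "__main__":
    for rays, up, down in batch_sizes(800, 600, 3):
        print(f"{rays} rays: upload {up} bytes, download {down} bytes")
```

Under these assumptions the first batch of 480,000 rays is about 11.5 MB
up and 3.8 MB back, and the second batch with 3 lights is about 46 MB up
and 15 MB back.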

Timothy

-- 
Two wrights don't make a rong, they make an airplane. Or bicycles.