[Bf-committers] "Official" CUDA Benchmark/Implementation Thread

Wed Dec 17 15:58:35 CET 2008

Okay, since we have several conversations going on in several threads
about CUDA, I thought I'd cause some more mayhem and start an
"official" thread on CUDA benchmarking. Please keep discussion in this
thread restricted to benchmarking CUDA, or methods of implementing
CUDA in Blender. Granted we will switch to OpenCL when it comes out,
but at this point, there is no OpenCL spec, so CUDA/OpenCL discussions
are out for the time being.

I did some initial tests last night on my MBP and came up with some
surprising results. I used the bandwidthTest program that comes with
the CUDA SDK. Here are the specs of my system and the general speeds I
got:

2.4GHz Core2 Duo
OSX 10.5.6
2GB DDR 667MHz Dual Channel RAM
GF8600

Host->Device Bandwidth: 1.1GB/sec
Device->Host Bandwidth: 1.1GB/sec
Device->Device Bandwidth: 11.2GB/sec

This is with large, multi-megabyte blocks of memory.

A tad lower than I was hoping, but still usable.
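
To put those numbers in perspective, here is a rough sketch that turns
the measured bandwidth into copy times for a vertex array. The vertex
layout (three 32-bit floats), the 1 GB = 1e9 bytes convention, and the
function name transfer_ms are assumptions for illustration only; the
estimate also ignores per-call latency, so real copies of small blocks
will be slower than this suggests.

```python
# Back-of-the-envelope copy times at the measured bus bandwidth.
# Assumptions (not part of the measurements): 1 GB = 1e9 bytes, a vertex
# is three 32-bit floats, and per-call latency is ignored.

HOST_TO_DEVICE_GBPS = 1.1   # measured with bandwidthTest
BYTES_PER_VERTEX = 3 * 4    # x, y, z as 32-bit floats (assumed layout)


def transfer_ms(num_vertices, gb_per_sec=HOST_TO_DEVICE_GBPS):
    """Return the estimated copy time in milliseconds for a vertex array."""
    num_bytes = num_vertices * BYTES_PER_VERTEX
    return num_bytes / (gb_per_sec * 1e9) * 1e3


if __name__ == "__main__":
    for count in (10_000, 100_000, 1_000_000):
        print(f"{count:>9} vertices: {transfer_ms(count):.3f} ms")
```

Under these assumptions a million vertices cost roughly 11 ms per
copy, which is a large slice of a frame at interactive rates.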

I was wrong in an earlier comment when I stated that CUDA could not
transfer data directly to OpenGL. It can! You can link a CUDA data
block to an OpenGL buffer.

So, what we need are some suggestions for simple tests we could
implement in Blender. I was going to say the subdivision surface
calculator, assuming the bandwidth is high enough to allow the
subdivided data to be downloaded back to the CPU and into the
modifier stack. It would be nicer if we could just keep the
subdivided data in GPU memory, convert it into a VBO, and feed
that into the OpenGL display routines. That would be the best way, as
we'd only need to transfer the original mesh data to the GPU and then
keep it there for the subdivision and display.
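
The difference between the two approaches can be estimated with a
small sketch. It assumes the vertex count grows about 4x per
subdivision level (a rough rule of thumb for large quad meshes, not a
measured figure), 12 bytes per vertex with positions only, and the
1.1GB/sec measured above in both directions. All function names are
hypothetical.

```python
# Compare time spent on the bus for "download the result" versus
# "keep the result on the GPU".
# Assumptions: ~4x vertex growth per subdivision level, 12 bytes per
# vertex (positions only), 1 GB = 1e9 bytes, no per-call latency.

BANDWIDTH_GBPS = 1.1       # measured, same in both directions
BYTES_PER_VERTEX = 12      # three 32-bit floats (assumed layout)
GROWTH_PER_LEVEL = 4       # assumed growth factor per level


def subdivided_vertices(base_vertices, levels):
    """Approximate vertex count after the given number of levels."""
    return base_vertices * GROWTH_PER_LEVEL ** levels


def bus_time_ms(num_vertices):
    """Estimated time in milliseconds to copy a vertex array one way."""
    return num_vertices * BYTES_PER_VERTEX / (BANDWIDTH_GBPS * 1e9) * 1e3


def round_trip_ms(base_vertices, levels):
    """Upload the base mesh, then download the subdivided result."""
    result = subdivided_vertices(base_vertices, levels)
    return bus_time_ms(base_vertices) + bus_time_ms(result)


def keep_on_gpu_ms(base_vertices):
    """Upload the base mesh only; the result stays in device memory."""
    return bus_time_ms(base_vertices)


if __name__ == "__main__":
    base, levels = 50_000, 3
    print(f"round trip:  {round_trip_ms(base, levels):.2f} ms")
    print(f"keep on GPU: {keep_on_gpu_ms(base):.2f} ms")
```

For a 50,000-vertex base mesh at three levels, this estimate gives
roughly 35 ms of bus time for the round trip against about 0.5 ms when
only the base mesh is uploaded, so nearly all of the cost is in the
download of the subdivided data.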

Timothy

-- 
Two wrights don't make a rong, they make an airplane. Or bicycles.