[Bf-committers] Problem compiling cycles+cuda on windows / slow GPU rendering

Christian Monfort monfort.c at gmail.com
Sun Jan 22 18:46:11 CET 2012


Hi,

I spent some time trying to compile trunk (r43573) with cycles+cuda
support, but was unable to compile the .cubin files using CUDA toolkit 4.0.
Compilation always fails on sm_13 with an out-of-memory error (with
either scons, cmake/nmake, or cmake/Visual Studio):

"D:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.0/bin/nvcc.exe"
-arch=sm_13 -m64 --cubin -use_fast_math --ptxas-options="-v"
--maxrregcount=24 --opencc-options -OPT:Olimit=0
-DCCL_NAMESPACE_BEGIN= -DCCL_NAMESPACE_END= -DNVCC -I
"intern\cycles\kernel\../util" -I "intern\cycles\kernel\../svm"
"intern\cycles\kernel\kernel.cu" -o
"Q:\Blender26_Dev\build\blender25-win64-vc\intern/cycles/kernel\kernel_sm_13.cubin"
kernel.cu
kernel.cu
tmpxft_000018a8_00000000-3_kernel.cudafe1.gpu
tmpxft_000018a8_00000000-10_kernel.cudafe2.gpu
./q:\blender26_dev\blender\intern\cycles\kernel\svm\svm_texture.h(45):
Warning:Pointer parameters must be inlined, so overriding noinline
attribute on '_Z7voronoi6float318NodeDistanceMetricfPfPS_'
q:\blender26_dev\blender\intern\cycles\kernel\svm/svm.h(154): Warning:
Pointer parameters must be inlined, so overriding noinline attribute
on '_Z14svm_eval_nodesP13KernelGlobalsP10ShaderData10ShaderTypefi'
C:/Users/CHRIST~1/AppData/Local/Temp/tmpxft_000018a8_00000000-11_kernel.cpp3.i(0):
Warning: Optimizing huge function kernel_cuda_path_trace because
Olimit has been overridden;
        compiler may run out of memory or run very slowly
C:/Users/CHRIST~1/AppData/Local/Temp/tmpxft_000018a8_00000000-11_kernel.cpp3.i(0):
### Compiler Error (user routine 'kernel_cuda_path_trace') during
PU_adjust_addr_flags phase:
### Out of memory in MEM_POOL_Alloc
nvopencc ERROR: D:/Program Files/NVIDIA GPU Computing
Toolkit/CUDA/v4.0/bin/../open64/lib//be.exe returned non-zero status 1
scons: *** [Q:\Blender26_Dev\build\blender25-win64-vc\intern\cycles\kernel\kernel_sm_13.cubin]
Error 2
scons: building terminated because of errors.
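
To reproduce the failure outside scons/cmake, the nvcc call from the log
above could be wrapped in a small script. This is only a minimal sketch:
the flags are copied from the failing command, while the helper names
(build_nvcc_cmd, compile_kernel) and the use of Python are assumptions for
illustration and are not part of the Blender build system.

```python
import subprocess

# Flags copied from the failing nvcc command in the log above.
NVCC_FLAGS = [
    "-m64", "--cubin", "-use_fast_math",
    "--ptxas-options=-v",
    "--maxrregcount=24",
    "--opencc-options", "-OPT:Olimit=0",
    "-DCCL_NAMESPACE_BEGIN=", "-DCCL_NAMESPACE_END=", "-DNVCC",
]


def build_nvcc_cmd(nvcc, kernel_dir, out_dir, arch="sm_13"):
    """Return the nvcc argument list for one architecture (hypothetical helper)."""
    return (
        [nvcc, "-arch=" + arch]
        + NVCC_FLAGS
        + [
            "-I", kernel_dir + "/../util",
            "-I", kernel_dir + "/../svm",
            kernel_dir + "/kernel.cu",
            "-o", out_dir + "/kernel_" + arch + ".cubin",
        ]
    )


def compile_kernel(nvcc, kernel_dir, out_dir, arch="sm_13"):
    """Run nvcc once and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(
        build_nvcc_cmd(nvcc, kernel_dir, out_dir, arch),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

Calling compile_kernel() once with the toolkit 4.0 nvcc.exe and once with
the 4.1 one, both with arch="sm_13", should show whether the out-of-memory
error depends only on the toolkit version and not on the build system.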

I then downloaded CUDA toolkit 4.1, which was able to generate the 3
.cubin files, but found that rendering with cycles+GPU was about 3
times slower than with the official Blender 2.61 release (win 64).

To be sure it was not my system or toolkit 4.1, I downloaded the 2.61
release sources and compiled them with toolkit 4.0 and 4.1, and found
that the speed was about the same as the official build in both cases.

To make it short:
trunk + toolkit 4.0: out of memory compiling kernel_sm_13.cubin
trunk + toolkit 4.1: ok, but GPU rendering 3x slower than official 2.61
2.61 + toolkit 4.0: ok
2.61 + toolkit 4.1: ok
So something committed since the 2.61 release prevents compiling on
Windows with CUDA toolkit 4.0 and slows down GPU rendering.

Christian.

Specs: Win7 64, 6GB RAM, NVIDIA GTX570, VS2008, CUDA Toolkit 4.0 & 4.1

