[Bf-committers] Problem compiling cycles+cuda on windows / slow GPU rendering

Brecht Van Lommel brechtvanlommel at pandora.be
Mon Jan 23 18:36:25 CET 2012


Hi,

Thanks for the reports.

The slowdown is probably due to CUDA toolkit 4.1: it ships a new
compiler that doesn't generate code as well, and I haven't found a good
solution for that yet. You mentioned that you compiled 2.61 from
source, but I'm guessing it is still somehow using the kernel compiled
with 4.0.
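One way to rule out a stale kernel is to rebuild each .cubin explicitly with a single toolkit's nvcc. The following is a minimal Python sketch, not Blender's actual build code. It assumes the flags in the nvcc command quoted in the report below are the ones the build uses. The helper names `nvcc_cubin_command` and `nvcc_version` are hypothetical, and the caller chooses which architectures (sm_13 and the others) to build.

```python
import subprocess
from pathlib import Path


def nvcc_cubin_command(nvcc, arch, kernel_dir, out_dir, max_regs=24):
    """Return the nvcc argument list that compiles kernel.cu into kernel_<arch>.cubin.

    The flags mirror the command quoted in the report (an assumption,
    not taken from Blender's build scripts).
    """
    kernel_dir = Path(kernel_dir)
    return [
        str(nvcc),
        f"-arch={arch}",
        "-m64",                        # 64-bit host target, as in the report
        "--cubin",
        "-use_fast_math",
        "--ptxas-options=-v",          # ptxas prints register/memory usage
        f"--maxrregcount={max_regs}",
        # Lifts Open64's Olimit so huge functions are still optimized
        # (this is what triggers the "Olimit has been overridden" warning).
        "--opencc-options", "-OPT:Olimit=0",
        "-DCCL_NAMESPACE_BEGIN=",
        "-DCCL_NAMESPACE_END=",
        "-DNVCC",
        "-I", str(kernel_dir / ".." / "util"),
        "-I", str(kernel_dir / ".." / "svm"),
        str(kernel_dir / "kernel.cu"),
        "-o", str(Path(out_dir) / f"kernel_{arch}.cubin"),
    ]


def nvcc_version(nvcc):
    """Return the `nvcc --version` banner, which names the toolkit release."""
    result = subprocess.run([str(nvcc), "--version"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `subprocess.run(nvcc_cubin_command(nvcc, "sm_13", r"intern\cycles\kernel", out_dir), check=True)` would rebuild the sm_13 kernel, and `nvcc_version(nvcc)` confirms which toolkit produced it. Passing an argument list rather than one shell string avoids quoting problems with paths that contain spaces, such as `Program Files`.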

I'll look into the out-of-memory error; we may have to disable this
functionality on sm_13. It would also be interesting to know whether you
used the 64-bit toolkit. I think the 32-bit toolkit can be installed on
Win64, but it wouldn't allow using more than 2GB of memory.
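Whether a toolkit binary is 32-bit or 64-bit can be checked directly from its Windows PE header. The following is a minimal Python sketch: it reads the COFF `Machine` field (0x014C for x86, 0x8664 for x64, per the PE/COFF specification). Which binary to check is an assumption on our part; the open64 back end `be.exe` is a natural candidate because it is the process that reported "Out of memory in MEM_POOL_Alloc" in the log below. The helper names `pe_machine` and `exe_machine` are hypothetical.

```python
import struct

# Machine field values from the PE/COFF specification.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a Windows PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # Offset 0x3C of the DOS header holds e_lfanew, the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The COFF file header follows the signature; its first field is the 2-byte Machine.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")


def exe_machine(path) -> str:
    """Read an executable from disk and report its architecture."""
    with open(path, "rb") as f:
        return pe_machine(f.read())
```

For the toolkit in the report, that would be something like `exe_machine(r"D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\open64\lib\be.exe")`. A 32-bit result would be consistent with the 2GB ceiling described above.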

Brecht.

On Sun, Jan 22, 2012 at 6:46 PM, Christian Monfort <monfort.c at gmail.com> wrote:
> Hi,
>
> I spent some time trying to compile trunk (r43573) with Cycles+CUDA
> support and was unable to compile the .cubin files using CUDA toolkit 4.0:
> compilation always fails on sm_13 with an out-of-memory error (with
> either scons, cmake/nmake or cmake/Visual Studio).
>
> "D:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.0/bin/nvcc.exe"
> -arch=sm_13 -m64 --cubin -use_fast_math --ptxas-options="-v"
> --maxrregcount=24 --opencc-options -OPT:Olimit=0
> -DCCL_NAMESPACE_BEGIN= -DCCL_NAMESPACE_END= -DNVCC -I
> "intern\cycles\kernel\../util" -I "intern\cycles\kernel\../svm"
> "intern\cycles\kernel\kernel.cu" -o
> "Q:\Blender26_Dev\build\blender25-win64-vc\intern/cycles/kernel\kernel_sm_13.cubin"
> kernel.cu
> kernel.cu
> tmpxft_000018a8_00000000-3_kernel.cudafe1.gpu
> tmpxft_000018a8_00000000-10_kernel.cudafe2.gpu
> ./q:\blender26_dev\blender\intern\cycles\kernel\svm\svm_texture.h(45):
> Warning:Pointer parameters must be inlined, so overriding noinline
> attribute on '_Z7voronoi6float318NodeDistanceMetricfPfPS_'
> q:\blender26_dev\blender\intern\cycles\kernel\svm/svm.h(154): Warning:
> Pointer parameters must be inlined, so overriding noinline attribute
> on '_Z14svm_eval_nodesP13KernelGlobalsP10ShaderData10ShaderTypefi'
> C:/Users/CHRIST~1/AppData/Local/Temp/tmpxft_000018a8_00000000-11_kernel.cpp3.i(0):
> Warning: Optimizing huge function kernel_cuda_path_trace because
> Olimit has been overridden;
>        compiler may run out of memory or run very slowly
> C:/Users/CHRIST~1/AppData/Local/Temp/tmpxft_000018a8_00000000-11_kernel.cpp3.i(0):
> ### Compiler Error (user routine 'kernel_cuda_path_trace') during
> PU_adjust_addr_flags phase:
> ### Out of memory in MEM_POOL_Alloc
> nvopencc ERROR: D:/Program Files/NVIDIA GPU Computing
> Toolkit/CUDA/v4.0/bin/../open64/lib//be.exe returned non-zero status 1
> scons: *** [Q:\Blender26_Dev\build\blender25-win64-vc\intern\cycles\kernel\kernel_sm_13.cubin]
> Error 2
> scons: building terminated because of errors.
>
> I then downloaded CUDA toolkit 4.1, which was able to generate the 3
> .cubin files, but found that rendering with Cycles+GPU was about 3
> times slower than the official Blender 2.61 release (Win64).
>
> To be sure it was not my system or toolkit 4.1, I downloaded the 2.61
> release sources and compiled them with toolkits 4.0 and 4.1, and found
> that the speed was about the same as the official build.
>
> To make it short:
> trunk + toolkit 4.0: out of memory compiling kernel_sm_13.cubin
> trunk + toolkit 4.1: ok, but GPU rendering 3x slower than official 2.61
> 2.61 + toolkit 4.0: ok
> 2.61 + toolkit 4.1: ok
> So something committed since the 2.61 release prevents
> compiling on Windows with CUDA toolkit 4.0 and slows down GPU
> rendering.
>
> Christian.
>
> Specs: Win7 64, 6GB RAM, nVidia GTX570, VS2008, CUDA Toolkit 4.0 & 4.1
> _______________________________________________
> Bf-committers mailing list
> Bf-committers at blender.org
> http://lists.blender.org/mailman/listinfo/bf-committers

