[Bf-committers] Some CUDA ideas.

Marko Radojcic sambucuself at gmail.com
Wed Dec 17 02:31:09 CET 2008


There has been a heated conversation about whether or not CUDA will
accelerate Blender's performance. My idea was to first attack the most
time-consuming operations, such as fluid baking, and ray tracing
afterwards. That, I think, is where it would be possible to get the
most acceleration.

On 12/17/08, bf-committers-request at blender.org
<bf-committers-request at blender.org> wrote:
> Send Bf-committers mailing list submissions to
> 	bf-committers at blender.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://lists.blender.org/mailman/listinfo/bf-committers
> or, via email, send a message with subject or body 'help' to
> 	bf-committers-request at blender.org
>
> You can reach the person managing the list at
> 	bf-committers-owner at blender.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bf-committers digest..."
>
>
> Today's Topics:
>
>    1. Re: CUDA backend implementation for GSoC? (Giuseppe Ghibò)
>    2. Re: CUDA backend implementation for GSoC? (Timothy Baldridge)
>    3. Re: CUDA backend implementation for GSoC? (Timothy Baldridge)
>    4. Re: CUDA backend implementation for GSoC? (Martin Poirier)
>    5. Re: CUDA backend implementation for GSoC? (Timothy Baldridge)
>    6. Re: blenderplayer.exe with option request (patrick)
>    7. Re: CUDA backend implementation for GSoC? (Giuseppe Ghibò)
>    8. Re: CUDA backend implementation for GSoC? (Timothy Baldridge)
>    9. Re: Shrinkwrap constraint (Joshua Leung)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 16 Dec 2008 17:24:16 +0100
> From: Giuseppe Ghibò <ghibo at mandriva.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: bf-blender developers <bf-committers at blender.org>
> Message-ID: <4947D630.7000107 at mandriva.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Timothy Baldridge ha scritto:
>> There's several issues involved in getting CUDA/OpenCL working with
>> Blender. The biggest is memory bandwidth. Let me explain: a Core i7
>> Processor can pull about 20GB/sec from the system memory. The Max a
>> PCIe bus can push through is 4GB/sec. Internally the high end GF8
>>
> Indeed, with PCIe 2.0 the bandwidth is doubled to 0.5GB/s per lane,
> allowing 16GB/s aggregate (8GB/s each direction) on a x16 slot
> (consider also configurations with SLI or quad-SLI).
>
> Bye
> Giuseppe.
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 16 Dec 2008 10:31:56 -0600
> From: "Timothy Baldridge" <tbaldridge at gmail.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: "bf-blender developers" <bf-committers at blender.org>
> Message-ID:
> 	<b33fdb110812160831j19895ccax17d3f6bb5b507010 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
>> Indeed, with PCIe 2.0 the bandwidth is doubled to 0.5GB/s per lane,
>> allowing 16GB/s aggregate (8GB/s each direction) on a x16 slot
>> (consider also configurations with SLI or quad-SLI).
>
> Right, but the comment still stands... in many (perhaps most) cases,
> going from memory->PCIe->GPU->Stream Processor->GPU->PCIe->memory is
> going to be slower, or at least have more overhead, than
> memory->CPU->memory.
>
> Perhaps that's the best starting point. Can we get some solid
> benchmarks that show overhead (latency and bandwidth) for transferring
> data to and from the GPU (and setting up a simple program on the GPU)
> vs doing it all in memory? Don't forget, in Blender you will have to
> grab data from and insert data back into the Blender structures,
> unless you plan on handing data to CUDA/OpenCL in the format Blender
> uses it in.
>
> From what I last heard, there is no good way to get data from CUDA
> directly into OpenGL without taking it out of the GPU and inserting it
> back in. I think OpenCL allows inserting data into textures from
> OpenCL. So if we were going to use this for Subdivision surfaces,
> you'd have to upload the data to the GPU then stream the vertices out
> of the GPU and back into the GPU. Whereas the current method only
> streams them to the CPU.
>
> Timothy
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 16 Dec 2008 10:33:21 -0600
> From: "Timothy Baldridge" <tbaldridge at gmail.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: "bf-blender developers" <bf-committers at blender.org>
> Message-ID:
> 	<b33fdb110812160833u42fb47c2r5183e16869495711 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> That's not to say I'm not interested in this; I am. If someone wants
> to start doing some testing/planning on this, let me know. I'd be glad
> to help.
>
> Timothy
> --
> Two wrights don't make a rong, they make an airplane. Or bicycles.
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 16 Dec 2008 08:50:42 -0800 (PST)
> From: Martin Poirier <theeth at yahoo.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: bf-blender developers <bf-committers at blender.org>
> Message-ID: <354754.42062.qm at web51312.mail.re2.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
>
>
>
>
> --- On Tue, 12/16/08, Timothy Baldridge <tbaldridge at gmail.com> wrote:
>
>> Perhaps that's the best starting point. Can we get some solid
>> benchmarks that show overhead (latency and bandwidth) for transferring
>> data to and from the GPU (and setting up a simple program on the GPU)
>> vs doing it all in memory? Don't forget, in Blender you will have to
>> grab data from and insert data back into the Blender structures,
>> unless you plan on handing data to CUDA/OpenCL in the format Blender
>> uses it in.
>
> I've worked in the GPGPU field before, doing real-time processing (still
> under NDA, so I can't say much), and I can tell you one thing: the latency
> issues are well worth paying for the vast advantage in throughput.
>
> IMHO, the thing that will yield the greatest speed advantage and be the
> easiest to do would be moving the sequencer and compositor to the GPU, other
> parts of Blender being much less suited for conversion (not to say
> impossible, of course).
>
> As far as CUDA vs OpenCL vs whatever, I don't really have an opinion. CUDA
> was barely starting when I did this work, but from what I remember, memory
> transfer benchmarks were much better with CUDA buffers than with
> DirectX/OpenGL straight texture buffers.
>
> Martin
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 16 Dec 2008 11:12:01 -0600
> From: "Timothy Baldridge" <tbaldridge at gmail.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: "bf-blender developers" <bf-committers at blender.org>
> Message-ID:
> 	<b33fdb110812160912s67e1e56cp41b035038dc7f677 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Okay, then if we were going to move forward on this, I see one
> remaining hang-up (apart from the coding needed): we need to find a way
> to use OpenCL-style code on the CPU. IMO, we're going to run into
> issues in the future if we have two sets of code, unaccelerated C and
> OpenCL. If we can get a software OpenCL implementation for Blender we
> get 2 things:
>
> 1) a standardized code base - no need for separate C and OpenCL code
> 2) a performance boost for multi-CPU systems, as OpenCL would be very
> easy to scale across multiple cores.
>
> I don't hear any talk (besides from Apple via Grand Central) of such a
> project. So perhaps we need to start one? I took a look at the APIs
> released by Khronos, and it's fairly simple to implement on a CPU. The
> only really interesting part would be translating the kernel source code
> into C and compiling it into CPU code on the fly. We could either
> write our own parser for the OpenCL kernel code and use LLVM to
> translate it into machine code, or translate the kernel code into C
> and compile it via GCC/MSVC. Either way we're talking about a separate
> project.
>
> I'm interested in heading this up, and I'm going to try to find some
> support at the LLVM forums. Anyone here interested?
>
> Timothy
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 16 Dec 2008 12:14:19 -0500
> From: "patrick" <patrick at 11h11.com>
> Subject: Re: [Bf-committers] blenderplayer.exe with option request
> To: "bf-blender developers" <bf-committers at blender.org>
> Message-ID: <018201c95fa1$bbdce880$0d02a8c0 at audio>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
>
> hi,
>
>>> I *think* this was the standard behaviour some time ago (therefore the
>>> -c option to keep the console OPEN).
>
> I think it's not like that anymore:
>
> http://projects.blender.org/plugins/scmsvn/viewcvs.php/trunk/blender/source/gameengine/GamePlayer/ghost/GPG_ghost.cpp?root=bf-blender&view=markup
>
> #ifdef WIN32
> #ifdef NDEBUG
>     if (closeConsole)
>     {
>     //::FreeConsole();    // Close a console window
>     }
> #endif // NDEBUG
> #endif // WIN32
>
> It would be very neat to have the standard behaviour back
> (blenderplayer.exe closing the console if -c is not set). Can anyone
> make the modification?
> Pat
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 16 Dec 2008 19:06:12 +0100
> From: Giuseppe Ghibò <ghibo at mandriva.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: bf-blender developers <bf-committers at blender.org>
> Message-ID: <4947EE14.7080100 at mandriva.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Timothy Baldridge ha scritto:
>>> Indeed, with PCIe 2.0 the bandwidth is doubled to 0.5GB/s per lane,
>>> allowing 16GB/s aggregate (8GB/s each direction) on a x16 slot
>>> (consider also configurations with SLI or quad-SLI).
>>>
>>
>> Right, but the comment still stands....in many (perhaps most) cases,
>> going from memory->PCIe->GPU->Stream Processor->GPU->PCIe->memory is
>> going to be slower or at least have more overhead than
>> memory->CPU->memory.
>>
> Yep, of course the memory bus is always faster than the PCIe bus, but
> IMHO at this point we don't know yet whether these bottlenecks are
> visible and how much they matter. IMHO the 20GB/s figure is also
> theoretical. Furthermore there are other memory situations to consider:
> whether the memory controller is internal or external, availability of
> NUMA (e.g. Opterons up to 8*quad core = 32-way), etc., and all sorts
> of memory hogs that current-day multicore systems are affected by
> (see for instance this paper:
> http://www.usenix.org/events/sec07/tech/full_papers/moscibroda/moscibroda.pdf).
>> Perhaps that's the best starting point. Can we get some solid
>> benchmarks that show overhead (latency and bandwidth) for transferring
>> data to and from the GPU (and setting up a simple program on the GPU)
>> vs doing it all in memory?
> Yep, probably the best approach is to do some benchmarks. Any volunteers?
>>  Don't forget, in Blender you will have to
>> grab data from and insert data back into the Blender structures,
>> unless you plan on handing data to CUDA/OpenCL in the format Blender
>> uses it in.
>>
>>
> Probably you'll get a graph showing the speedup of CUDA/OpenCL vs a
> multithreaded CPU for increasing data sizes. At a certain point, as the
> data size further increases, this gain will fall to 1 or even less: the
> benchmark should find this "size"/crossing point.
>> From what I last heard, there is no good way to get data from CUDA
>> directly into OpenGL without taking it out of the GPU and inserting it
>> back in. I think OpenCL allows inserting data into textures from
>> OpenCL. So if we were going to use this for Subdivision surfaces,
>> you'd have to upload the data to the GPU then stream the vertices out
>> of the GPU and back into the GPU. Whereas the current method only
>> streams them to the CPU.
>>
> Are you saying that the OpenGL part of the video card is not able to
> talk "directly" to the OpenCL/CUDA part without passing through the CPU
> over and over (so no DMA?)?
>
> Bye
> Giuseppe.
>
>
>
> ------------------------------
>
> Message: 8
> Date: Tue, 16 Dec 2008 13:17:11 -0600
> From: "Timothy Baldridge" <tbaldridge at gmail.com>
> Subject: Re: [Bf-committers] CUDA backend implementation for GSoC?
> To: "bf-blender developers" <bf-committers at blender.org>
> Message-ID:
> 	<b33fdb110812161117j7e2f968dha6bd02815aeede06 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I got a reply back from the LLVM developers:
>
> "Yes, we started implementing OpenCL on top of gallium using LLVM and Clang.
> We're planning to publish a public repository in the next week or so."
>
> I'm going to whip out a framework this week/weekend that should allow
> us to start benchmarking some GPU vs CPU comparisons. Once I get that
> source code up, perhaps we can get Ton to set us up a repository where
> we could start testing some ideas. I'm wondering if we couldn't
> leverage the new RNA to develop a somewhat simple API for
> uploading/downloading data from the GPU, so code writers would only
> need to worry about writing kernels.
>
> Once I get the benchmarks done, I'd like to set up a simple database
> (like the Blender benchmark site) that will allow us to track how
> various system stats affect the performance of the routines.
>
> Timothy
>
>
> --
> Two wrights don't make a rong, they make an airplane. Or bicycles.
>
>
> ------------------------------
>
> Message: 9
> Date: Wed, 17 Dec 2008 12:49:13 +1300
> From: "Joshua Leung" <aligorith at gmail.com>
> Subject: Re: [Bf-committers] Shrinkwrap constraint
> To: "bf-blender developers" <bf-committers at blender.org>
> Message-ID:
> 	<c3b983a20812161549g18d86747icb28f2f7e20d751d at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> Here is a little review based on constraints system requirements:
> 1) shrinkwrap_new_data() is not needed. MEM_callocN() is used to allocate
> the data anyway, and since those values are all 0, it should be fine
> without it. Only define this callback if there are any custom settings
> (i.e. non-zero values) which need to be set for the constraint by default.
>
> 2) Currently this is missing an entry in the Ctrl-Alt-C (add constraint)
> menu. See editconstraint.c -> addconstraint(). Make sure you add the entry
> in a similar order to the other menu, and only when a relevant type of
> target is selected (I guess this would be for meshes only, right?)
>
> 3) Type menu - capitalise all the entries in that menu; each item should
> begin with a capital letter. That menu string can probably be declared
> const in the code too...
>
> 4) The Distance to target option works in a rather odd fashion. In the
> quick test I was doing (an object shrinkwrapped to a grid with some quick
> PET hills), this setting only seems to offset the object downwards. It
> would be more useful if negative values were allowed too, so that the
> object/bone could be moved to sit on top of the ground surface instead
> of sinking into it.
>
> 5) Adding this constraint to a bone will clear the rotation/scale of the
> bone. This is not acceptable. Seeing as you're just modifying the location
> now, perhaps it would be better to just copy the location component of the
> 'target' matrix (i.e. result of shrinkwrap) to the 'owner' matrix in the
> shrinkwrap_evaluate() callback.
>
> Alternatively (or in addition), it might be useful to explore having an
> option to use the vector along which the owner was moved to lie on the grid,
> to define the rotation of the owner. This could be useful for having a bone,
> etc. to align with the normals of the target mesh (i.e. when using this
> constraint to keep eyelid control bones glued to an eyeball).
>
>
> 6) Small formatting nitpicks - with comments, use the /* */ style comments
> instead of //. I generally reserve // for TODO/FIXME type of temporary
> comments that should be addressed ASAP.
>
> ---
>
> Anyways, overall the default shrinkwrapping behaviour is great! It works
> out of the box like a treat, so if you just fix the issues noted here, it
> should be good to go in.   :-)
>
> +1 from me
>
> Regards,
> Joshua
>
> ------------------------------
>
> _______________________________________________
> Bf-committers mailing list
> Bf-committers at blender.org
> http://lists.blender.org/mailman/listinfo/bf-committers
>
>
> End of Bf-committers Digest, Vol 53, Issue 28
> *********************************************
>

-- 
Sent from my mobile device

