[Bf-committers] OpenGL optimisation
trip
bf-committers@blender.org
Sun, 23 May 2004 19:17:48 -0400
Thank you, this one works, though I do not see any difference in speed
from older versions just yet.
On May 23, 2004, at 5:31 PM, Richard Berry wrote:
> A few choice expletives and a fresh CVS later:
>
> http://www.warwick.ac.uk/student/R.J.Berry/customblender.zip
>
> This is exactly the same as the previous version but links against the
> OS X Python framework instead of the Fink libraries. Thanks for the
> clarification about the build systems, I had a suspicion that it was
> something like this, but I'm new to scons...
>
> Okay, a better explanation of what I'm doing. Firstly, this
> optimisation only works when drawing subdivided meshes in "OpenGL
> Solid" mode (or anything else that uses DispListMesh but this is the
> only path that I'm looking at for the moment).
>
> The benchmark is a simple 50 frame animation of 3 cubes subdivided to
> level 6. Using the original Blender code this animation (via Alt-A)
> was nowhere near realtime on my system (PowerBook with ATI Mobility
> 9600), whereas the version with OpenGL display lists is. If you want
> to see a relative comparison then switch between "Wire" display mode
> and "OpenGL Solid" mode.
>
> Using the Ctrl-Alt-T benchmark I get:
>
> Blender 2.33a:
>   draw:       2216 ms
>   draw+swap:  2286 ms
>   displist:   2298 ms
> Custom Blender:
>   draw:        187 ms
>   draw+swap:   222 ms
>   displist:   2467 ms
>
>
>
>
>
>
> The next bit might seem redundant to those of you who know OpenGL but
> I thought I'd justify my ideas. There are several ways of submitting
> vertex data to OpenGL:
>
> "Immediate Mode":
> This is currently what Blender uses, i.e. submitting vertices using
> something like:
> glBegin(GL_TRIANGLES);
> glNormal3fv(normal1);
> glVertex3fv(vertex1);
> ...
> glEnd();
> While this is really flexible with formats etc. it is also the slowest
> way to submit vertices: firstly because you have to make a function
> call for each vertex (and extra ones for stuff like the vertex normal
> / texture coordinates) and secondly because the data has to be
> uploaded to the card each time. Basically only useful for specifying
> small amounts of dynamic geometry.
>
> Check out:
>
> http://www.warwick.ac.uk/student/R.J.Berry/without-displists.png
>
> Nearly 21 million calls to glVertex3fv! For what I'm guessing is about
> 800 frames or less (from the number of calls to glFinish, but I think
> glFinish is called more than once per frame...).
>
> Compare this to when the objects are drawn using glCallList:
>
> http://www.warwick.ac.uk/student/R.J.Berry/with-displists.png
>
>
>
> Vertex Arrays:
> Basically a way of submitting a large amount of vertex data with very
> little function call overhead (i.e. one call to something like
> glDrawElements). Also, depending on the implementation, this is
> probably more efficient than immediate mode (vertex data is in an
> array in a predictable format). There are also a number of array
> related extensions that are specifically tailored for performance with
> AGP memory and DMA transfers to the graphics card (e.g.
> APPLE_vertex_array_range, ARB_vertex_buffer_object etc). Also, because
> OpenGL knows the size and format of the data it is possible to cache
> the data on the graphics card in some circumstances.
>
> The problem with this is that the vertex data has to be in a format
> optimised for the card (i.e. an array). For best performance the data
> should be interleaved, e.g.
> normal 1
> texture coordinates 1
> position 1
> normal 2
> texture coordinates 2
> position 2
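
To make the interleaved layout above concrete, here is a minimal C sketch. The struct and helper names are made up for illustration and are not Blender types. It only computes the stride and byte offsets that would be passed as the stride and pointer arguments of calls such as glNormalPointer, glTexCoordPointer and glVertexPointer; it assumes 4-byte floats and no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex record, in the order listed above:
 * normal, texture coordinates, position. Not a Blender type. */
typedef struct InterleavedVertex {
    float normal[3];
    float texcoord[2];
    float position[3];
} InterleavedVertex;

/* Byte distance between consecutive vertices (the "stride"). */
static size_t vertex_stride(void)
{
    return sizeof(InterleavedVertex);
}

/* Byte offsets of each attribute from the start of a vertex record. */
static size_t normal_offset(void)   { return offsetof(InterleavedVertex, normal); }
static size_t texcoord_offset(void) { return offsetof(InterleavedVertex, texcoord); }
static size_t position_offset(void) { return offsetof(InterleavedVertex, position); }

/* Sanity check: the record is exactly 8 tightly packed floats. */
static void check_layout(void)
{
    assert(vertex_stride() == 8 * sizeof(float));
    assert(texcoord_offset() == 3 * sizeof(float));
    assert(position_offset() == 5 * sizeof(float));
}
```

With an array of such records, every attribute pointer would use the same stride and differ only in its starting offset.
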
>
>
>
> Display Lists:
> A way of encoding a number of OpenGL commands for efficient use by
> OpenGL. In modern implementations these commands will get compiled
> into an efficient format which is uploaded to the graphics card. In a
> best case scenario these will probably be the fastest method for
> drawing geometry. Older graphics cards will probably not upload into
> graphics card memory but will optimise the data and put it into AGP
> memory for fast upload, so this is still a win (probably at least as
> fast as vertex arrays). In the worst case with a software renderer
> probably no optimisation is done and it will be equivalent to
> immediate mode.
>
> However, even though you can put a load of state changes in the
> command list (e.g. changing textures, shading model, activating /
> deactivating lights etc) this can stall the graphics card so it's
> better to create display lists containing only geometry and put the
> state changes outside the display list compilation.
>
> Also, so the display list compiler / optimiser doesn't get confused
> the geometry should be submitted in a "consistent" way, in a similar
> way to the vertex array interleaved formats.
>
>
>
>
>
> I think the preference for the way that we submit data to the graphics
> card should be (from fastest to slowest):
> 1) OpenGL display lists for static data (i.e. most things in object
> mode, and data that isn't being edited in edit mode),
> 2) vertex array extensions for dynamic data (possibly some static
> data),
> 3) standard vertex arrays (as a fall back for when the extensions
> aren't supported),
> 4) immediate mode (probably only useful for a few GUI elements and
> non-mesh type objects, such as cameras, lamps and possibly really
> small meshes where the overhead of setting up display lists / vertex
> arrays is bad for performance).
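
The preference order above could be expressed as a small selection function. This is only a sketch: the enum, the struct fields and the small-mesh threshold are all hypothetical, and the threshold in particular would need measuring on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Submission paths, in the preference order from the list above. */
typedef enum SubmitPath {
    SUBMIT_DISPLAY_LIST,     /* 1) static data */
    SUBMIT_VERTEX_ARRAY_EXT, /* 2) dynamic data, extension present */
    SUBMIT_VERTEX_ARRAY,     /* 3) fallback without extensions */
    SUBMIT_IMMEDIATE         /* 4) tiny or non-mesh geometry */
} SubmitPath;

/* Hypothetical description of one draw request. */
typedef struct DrawRequest {
    bool is_static;     /* geometry unchanged since it was last drawn */
    bool has_array_ext; /* a vertex array extension was detected */
    int  vertex_count;
} DrawRequest;

/* Hypothetical cut-off below which set-up overhead is assumed to dominate. */
#define SMALL_MESH_VERTS 64

static SubmitPath choose_submit_path(const DrawRequest *req)
{
    assert(req != NULL && req->vertex_count >= 0);

    if (req->vertex_count < SMALL_MESH_VERTS)
        return SUBMIT_IMMEDIATE;
    if (req->is_static)
        return SUBMIT_DISPLAY_LIST;
    if (req->has_array_ext)
        return SUBMIT_VERTEX_ARRAY_EXT;
    return SUBMIT_VERTEX_ARRAY;
}
```

For example, a large static mesh maps to SUBMIT_DISPLAY_LIST, while a mesh being edited on a system without the extensions falls back to SUBMIT_VERTEX_ARRAY.
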
>
> Further optimisations would probably only be applicable to the game
> engine (e.g. creating triangle / quad strips before drawing geometry,
> don't know if the game engine does this already).
>
> I think display lists are the way to go as they are the most
> compatible (requiring no extensions) and require much less work on our
> part (we basically need to wrap existing OpenGL draw commands in a
> list, whilst also optimising to reduce state changes). I think the
> fact that it probably only took about an hour for me to convert the
> code to use display lists for such a massive performance benefit
> speaks for itself.
>
> I admit that I have absolutely no idea what rendering method the game
> engine uses, but if we want to make it competitive in any way then we
> have to use display lists / vertex arrays.
>
>
>
>
> In response to Ton:
>
>> I rather look at a redesign for the Blender displaylists first... this
>> will - even without ogl displaylists - improve performance quite some
>> already.
>
> Hmmm... I doubt it. Optimising Blender display lists will result in
> less CPU work but won't make uploading vertex data to the graphics
> card any faster. The bottleneck with immediate mode (and vertex arrays
> that don't cache / upload data to the graphics card) is the bus
> bandwidth between the CPU and the GPU. There is no way that this is
> ever going to be faster than the GPU reading data out of graphics
> memory that is in an optimised format.
>
> Not only that but using display lists etc. will allow the GPU to do
> the work, not the CPU, so converting to OpenGL display lists will
> probably relieve the CPU much better than any amount of Blender
> display list optimisation, simply because the CPU doesn't have to do
> the work, it simply says to the GPU "draw this list", which the GPU
> already has in its memory as opposed to uploading all the data again.
>
> You can see this in the screenshots that I showed. Without display
> lists the time to draw for the application and OpenGL (note the
> application is involved because of calling the OpenGL functions) is
> the total of all the times for glVertex3fv, glNormal3fv, glBegin and
> glEnd calls. With display lists the time to draw is only that of
> glCallList.
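
A rough call-count model for the benchmark scene may help here. It is a sketch under assumptions that are not measured from Blender: each subdivision level splits every quad into four, each quad is sent with one glNormal3fv and four glVertex3fv calls, and each object uses a single glBegin/glEnd pair. The function names are made up for illustration.

```c
#include <assert.h>

/* Quads in a cube after `levels` subdivisions, assuming every level
 * splits each quad into four. */
static long subdivided_cube_quads(int levels)
{
    long quads = 6; /* faces of the base cube */
    for (int i = 0; i < levels; i++)
        quads *= 4;
    return quads;
}

/* OpenGL calls per frame in immediate mode, assuming one glBegin/glEnd
 * pair per object and, per quad, one normal plus four vertex calls. */
static long immediate_calls_per_frame(long objects, long quads)
{
    assert(objects >= 0 && quads >= 0);
    const long per_quad = 1 /* glNormal3fv */ + 4 /* glVertex3fv */;
    return 2 * objects + quads * per_quad;
}

/* With compiled display lists, each object costs one glCallList. */
static long display_list_calls_per_frame(long objects)
{
    assert(objects >= 0);
    return objects;
}
```

Under these assumptions, three cubes at level 6 come to 73728 quads, so immediate mode makes several hundred thousand calls per frame where the display list path makes three.
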
>
>> Is there any way to see what added memory usage is?
>
> Yes, but I'll probably have to use the OpenGL Driver Monitor to figure
> that out as the memory will be used by the graphics card, not the CPU.
> Memory usage will probably be exactly the same as the size of the
> vertex data that has been submitted plus a tiny bit of overhead, so
> I'm guessing about 8 * 4 = 32 bytes per vertex (for floating point
> format with normal, 2 texture coordinates and position).
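
For reference, the per-vertex figure is 8 floats of 4 bytes each, i.e. 32 bytes. The sketch below just does that arithmetic for a whole mesh. The names are hypothetical, and how a driver actually stores a compiled display list is implementation-defined, so treat the result as a rough estimate only.

```c
#include <assert.h>
#include <stddef.h>

/* Floats per vertex for the format described above. */
enum { NORMAL_FLOATS = 3, TEXCOORD_FLOATS = 2, POSITION_FLOATS = 3 };

/* Size of one vertex: (3 + 2 + 3) floats. */
static size_t bytes_per_vertex(void)
{
    return (NORMAL_FLOATS + TEXCOORD_FLOATS + POSITION_FLOATS) * sizeof(float);
}

/* Estimated storage for a display list holding `vertex_count` vertices.
 * `overhead_bytes` is a guess supplied by the caller; the real driver
 * overhead is unknown. */
static size_t display_list_bytes(size_t vertex_count, size_t overhead_bytes)
{
    return vertex_count * bytes_per_vertex() + overhead_bytes;
}
```
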
>
> If the display list is uploaded to the card then I don't see how this
> is going to be much of a problem for us. If the card does run low on
> memory then OpenGL will do a much better job of juggling textures,
> display lists, etc than we'll be able to do.
>
> If memory usage is a concern then at the very least I think we should
> convert to using vertex arrays and similar extensions. I think in many
> cases these can draw straight out of the application's memory and then
> cache data on the card if it's not altered. However, this is going to
> involve using a lot of extensions and doing appropriate fallback if
> they're not there.
>
> r i c k