[Bf-committers] OpenGL optimisation

Sun, 23 May 2004 19:17:48 -0400

Thank you, this one works, though I do not see any difference in speed
from older versions just yet.

On May 23, 2004, at 5:31 PM, Richard Berry wrote:

> A few choice expletives and a fresh CVS later:
>
> http://www.warwick.ac.uk/student/R.J.Berry/customblender.zip
>
> This is exactly the same as the previous version but links against the 
> OS X Python framework instead of the Fink libraries. Thanks for the 
> clarification about the build systems, I had a suspicion that it was 
> something like this, but I'm new to scons...
>
> Okay, here is a better explanation of what I'm doing. First, this
> optimisation only works when drawing subdivided meshes in "OpenGL
> Solid" mode (or anything else that uses DispListMesh, but this is the
> only path that I'm looking at for the moment).
>
> The benchmark is a simple 50-frame animation of 3 cubes subdivided to
> level 6. Using the original Blender code this animation (via Alt-A)
> was nowhere near realtime on my system (PowerBook with ATI Mobility
> 9600), whereas the version with OpenGL display lists is. If you want
> to see a relative comparison, switch between "Wire" display mode
> and "OpenGL Solid" mode.
>
> Using the Ctrl-Alt-T benchmark I get:
>
> Blender 2.33a:
> 	draw: 2216 ms
> 	draw+swap: 2286 ms
> 	displist: 2298 ms
> Custom Blender:
> 	draw: 187 ms
> 	draw+swap: 222 ms
> 	displist: 2467 ms
>
>
>
>
>
>
> The next bit might seem redundant to those of you who know OpenGL, but
> I thought I'd justify my ideas. There are several ways of submitting
> vertex data to OpenGL:
>
> "Immediate Mode":
> This is currently what Blender uses, i.e. submitting vertices using 
> something like:
> 	glBegin(GL_TRIANGLES);
> 	glNormal3fv(normal1);
> 	glVertex3fv(vertex1);
> 	...
> 	glEnd();
> While this is really flexible with formats etc., it is also the slowest
> way to submit vertices: firstly because you have to make a function
> call for each vertex (and extra ones for attributes like the vertex
> normal / texture coordinates), and secondly because the data has to be
> uploaded to the card each time. It's basically only useful for
> specifying small amounts of dynamic geometry.
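To put the per-vertex call overhead described above into numbers, here is a small C sketch that counts OpenGL calls per frame for a quad mesh drawn in immediate mode. It assumes (hypothetically) one glBegin(GL_QUADS)/glEnd pair for the whole mesh and one glNormal3fv plus one glVertex3fv per corner, optionally with a glTexCoord2fv; the function name is illustrative, not Blender code.

```c
/* Hypothetical helper: OpenGL calls per frame for a quad mesh drawn in
 * immediate mode, assuming one glBegin(GL_QUADS)/glEnd() pair for the
 * whole mesh and, per corner, glNormal3fv + glVertex3fv (plus
 * glTexCoord2fv when texture coordinates are sent). */
unsigned long immediate_mode_calls(unsigned long quads, int with_texcoords)
{
    unsigned long calls_per_vertex = 2;       /* glNormal3fv + glVertex3fv */
    if (with_texcoords)
        calls_per_vertex += 1;                /* + glTexCoord2fv */
    return 2 + quads * 4 * calls_per_vertex;  /* glBegin + glEnd + 4 corners per quad */
}
```

By contrast, drawing the same mesh from a compiled display list costs one glCallList call per frame, regardless of mesh size.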
>
> Check out:
>
> http://www.warwick.ac.uk/student/R.J.Berry/without-displists.png
>
> Nearly 21 million calls to glVertex3fv! That's for what I'm guessing
> is about 800 frames or fewer (judging from the number of calls to
> glFinish, though I think glFinish is called more than once per frame...).
>
> Compare this to when the objects are drawn using glCallList:
>
> http://www.warwick.ac.uk/student/R.J.Berry/with-displists.png
>
>
>
> Vertex Arrays:
> Basically a way of submitting a large amount of vertex data with very 
> little function call overhead (i.e. one call to something like 
> glDrawElements). Also, depending on the implementation, this is 
> probably more efficient than immediate mode (vertex data is in an 
> array in a predictable format). There are also a number of array 
> related extensions that are specifically tailored for performance with 
> AGP memory and DMA transfers to the graphics card (e.g.
> APPLE_vertex_array_range, ARB_vertex_buffer_object, etc.). Also, because
> OpenGL knows the size and format of the data it is possible to cache 
> the data on the graphics card in some circumstances.
>
> The problem with this is that the vertex data has to be in a format 
> optimised for the card (i.e. an array). For best performance the data 
> should be interleaved, e.g.
> 	normal 1
> 	texture coordinates 1
> 	position 1
> 	normal 2
> 	texture coordinates 2
> 	position 2
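As a concrete illustration of the interleaved layout above, here is a C sketch of one vertex record with its stride and attribute offsets. The type and macro names are illustrative assumptions; the OpenGL calls in the comment show where these values would be passed, but the block itself needs no GL context.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex in the order described above: normal, texture
 * coordinates, then position, all 32-bit floats. Names are illustrative. */
typedef struct {
    float normal[3];
    float texcoord[2];
    float position[3];
} InterleavedVertex;

/* The array must be tightly packed for the stride below to describe it;
 * check at compile time that the compiler added no padding (C11). */
static_assert(sizeof(InterleavedVertex) == 8 * sizeof(float),
              "interleaved vertex must be tightly packed");

/* Stride = bytes from one vertex to the next; offsets locate each attribute. */
#define VERTEX_STRIDE   sizeof(InterleavedVertex)
#define NORMAL_OFFSET   offsetof(InterleavedVertex, normal)
#define TEXCOORD_OFFSET offsetof(InterleavedVertex, texcoord)
#define POSITION_OFFSET offsetof(InterleavedVertex, position)

/* With a GL context (after the matching glEnableClientState calls):
 *   glNormalPointer(GL_FLOAT, VERTEX_STRIDE, (char *)verts + NORMAL_OFFSET);
 *   glTexCoordPointer(2, GL_FLOAT, VERTEX_STRIDE, (char *)verts + TEXCOORD_OFFSET);
 *   glVertexPointer(3, GL_FLOAT, VERTEX_STRIDE, (char *)verts + POSITION_OFFSET);
 *   glDrawArrays(GL_QUADS, 0, vertex_count);
 */
```

Because all three attributes share one stride, OpenGL can walk a single contiguous array instead of gathering from three separate ones.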
>
>
>
> Display Lists:
> A way of encoding a number of OpenGL commands for efficient use by 
> OpenGL. In modern implementations these commands will get compiled
> into an efficient format which is uploaded to the graphics card. In the
> best case this will probably be the fastest method for
> drawing geometry. Older graphics cards will probably not upload into
> graphics card memory but will optimise the data and put it into AGP 
> memory for fast upload, so this is still a win (probably at least as 
> fast as vertex arrays). In the worst case with a software renderer 
> probably no optimisation is done and it will be equivalent to 
> immediate mode.
>
> However, even though you can put a lot of state changes in the
> command list (e.g. changing textures, shading model, activating /
> deactivating lights etc.), this can stall the graphics card, so it's
> better to create display lists containing only geometry and keep the
> state changes outside the display list compilation.
>
> Also, so that the display list compiler / optimiser doesn't get
> confused, the geometry should be submitted in a "consistent" way,
> similar to the interleaved vertex array formats.
>
>
>
>
>
> I think the preference for the way that we submit data to the graphics 
> card should be (from fastest to slowest):
> 1) OpenGL display lists for static data (i.e. most things in object 
> mode, and data that isn't being edited in edit mode),
> 2) vertex array extensions for dynamic data (possibly some static 
> data),
> 3) standard vertex arrays (as a fall back for when the extensions 
> aren't supported),
> 4) immediate mode (probably only useful for a few GUI elements and 
> non-mesh type objects, such as cameras, lamps and possibly really 
> small meshes where the overhead of setting up display lists / vertex 
> arrays is bad for performance).
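The preference order above could be expressed as a single decision function. This is a hypothetical sketch: the flags (is_mesh, is_static, has_array_ext) and the small-mesh cut-off are assumptions for illustration, and real code would test for the extensions once at startup.

```c
#include <assert.h>

/* Drawing paths, in the preference order listed above. */
typedef enum {
    DRAW_DISPLAY_LIST,      /* 1) static data */
    DRAW_VERTEX_ARRAY_EXT,  /* 2) dynamic data, extensions available */
    DRAW_VERTEX_ARRAY,      /* 3) plain vertex arrays as a fallback */
    DRAW_IMMEDIATE          /* 4) GUI, cameras, lamps, tiny meshes */
} DrawPath;

#define SMALL_MESH_VERTS 64  /* assumed cut-off, not a measured value */

DrawPath choose_draw_path(int is_mesh, int is_static, int vertex_count,
                          int has_array_ext)
{
    assert(vertex_count >= 0);
    /* Non-mesh objects and tiny meshes: setup overhead outweighs the gain. */
    if (!is_mesh || vertex_count < SMALL_MESH_VERTS)
        return DRAW_IMMEDIATE;
    if (is_static)
        return DRAW_DISPLAY_LIST;
    return has_array_ext ? DRAW_VERTEX_ARRAY_EXT : DRAW_VERTEX_ARRAY;
}
```

The small-mesh check comes first because item 4 applies even to static data when the mesh is too small to be worth compiling.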
>
> Further optimisations would probably only be applicable to the game
> engine (e.g. creating triangle / quad strips before drawing geometry;
> I don't know if the game engine does this already).
>
> I think display lists are the way to go as they are the most 
> compatible (requiring no extensions) and require much less work on our 
> part (we basically need to wrap existing OpenGL draw commands in a 
> list, whilst also optimising to reduce state changes). I think the 
> fact that it probably only took about an hour for me to convert the 
> code to use display lists for such a massive performance benefit 
> speaks for itself.
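The "optimising to reduce state changes" part can be sketched on the CPU side: keep each display list geometry-only, and order the draws so objects sharing a material are drawn together, setting the material once between glCallList calls. The DrawItem struct and material index are hypothetical; only the sorting and counting logic is shown, so it compiles without OpenGL.

```c
#include <stdlib.h>

/* Hypothetical draw item: one geometry-only display list plus the material
 * it needs. Material changes are issued between glCallList calls. */
typedef struct {
    unsigned int list_id;  /* name from glGenLists in real code */
    int material;          /* hypothetical material index */
} DrawItem;

static int by_material(const void *a, const void *b)
{
    int ma = ((const DrawItem *)a)->material;
    int mb = ((const DrawItem *)b)->material;
    return (ma > mb) - (ma < mb);
}

/* State changes needed to draw the items in their current order
 * (the first material bind counts as one change). */
int count_material_changes(const DrawItem *items, size_t n)
{
    int changes = 0;
    for (size_t i = 0; i < n; i++)
        if (i == 0 || items[i].material != items[i - 1].material)
            changes++;
    return changes;
}

/* Group items that share a material so each material is set once per frame. */
void sort_by_material(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof *items, by_material);
}
```

Sorting is only one way to do this; a renderer could equally bucket items per material as they are created.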
>
> I admit that I have absolutely no idea what rendering method the game 
> engine uses, but if we want to make it competitive in any way then we 
> have to use display lists / vertex arrays.
>
>
>
>
> In response to Ton:
>
>> I rather look at a redesign for the Blender displaylists first... 
>> this will - even
>> without ogl displaylists - improve performance quite some already.
>
> Hmmm... I doubt it. Optimising Blender display lists will result in 
> less CPU work but won't make uploading vertex data to the graphics 
> card any faster. The bottleneck with immediate mode (and vertex arrays 
> that don't cache / upload data to the graphics card) is the bus 
> bandwidth between the CPU and the GPU. There is no way that this is 
> ever going to be faster than the GPU reading data out of graphics 
> memory that is in an optimised format.
>
> Not only that, but using display lists etc. lets the GPU do the work
> instead of the CPU, so converting to OpenGL display lists will
> probably relieve the CPU far more than any amount of Blender display
> list optimisation. The CPU no longer has to push the data; it simply
> tells the GPU "draw this list", which the GPU already has in its
> memory, instead of uploading all the data again.
>
> You can see this in the screenshots that I showed. Without display 
> lists the time to draw for the application and OpenGL (note the 
> application is involved because of calling the OpenGL functions) is 
> the total of all the times for glVertex3fv, glNormal3fv, glBegin and 
> glEnd calls. With display lists the time to draw is only that of 
> glCallList.
>
>> Is there any way to see what added memory usage is?
>
> Yes, but I'll probably have to use the OpenGL Driver Monitor to figure
> that out, as the memory will be used by the graphics card, not the CPU.
> Memory usage will probably be exactly the same as the size of the
> vertex data that has been submitted plus a tiny bit of overhead, so
> I'm guessing about 8 * 4 = 32 bytes per vertex (for floating point
> format with normal, 2 texture coordinates and position).
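For a feel of the numbers in the benchmark scene, here is a back-of-the-envelope C sketch (an estimate, not a Driver Monitor measurement). It assumes each subdivision level splits every quad into four, as Catmull-Clark does, so a cube at level n has 6 * 4^n quads; that every quad's four corners are stored separately with no vertex sharing; and eight 4-byte floats per vertex. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed vertex format: normal (3) + texture coordinates (2) + position (3). */
#define FLOATS_PER_VERTEX 8
#define BYTES_PER_VERTEX  (FLOATS_PER_VERTEX * sizeof(float))

/* Quads in a cube after `level` subdivisions: 6 * 4^level. */
unsigned long long cube_quads(int level)
{
    assert(level >= 0 && level < 16);
    return 6ULL << (2 * level);
}

/* Estimated display list size, assuming 4 unshared vertices per quad. */
unsigned long long displaylist_bytes(int cubes, int level)
{
    return (unsigned long long)cubes * cube_quads(level) * 4 * BYTES_PER_VERTEX;
}
```

Under those assumptions the three level-6 cubes come to 9,437,184 bytes, roughly 9 MB of vertex data.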
>
> If the display list is uploaded to the card then I don't see how this 
> is going to be much of a problem for us. If the card does run low on 
> memory then OpenGL will do a much better job of juggling textures, 
> display lists, etc than we'll be able to do.
>
> If memory usage is a concern then at the very least I think we should 
> convert to using vertex arrays and similar extensions. I think in many 
> cases these can draw straight out of the application's memory and then 
> cache data on the card if it's not altered. However, this is going to 
> involve using a lot of extensions and doing appropriate fallback if 
> they're not there.
>
> r i c k
> _______________________________________________
> Bf-committers mailing list
> Bf-committers@blender.org
> http://www.blender.org/mailman/listinfo/bf-committers
>