[osg-users] [build] How to properly use the OSG_GL3_AVAILABLE CMake option?

John Price john.price00 at gmail.com
Thu Nov 26 19:21:32 PST 2009

Hi Robert,

My understanding is that bindless graphics achieves its speedup by not requiring CPU dereferencing of graphics-object pointers, and thus avoiding the L2 cache misses this is likely to cause.

nVidia states:

OpenGL has evolved in a way that allows applications to replace many of the original state machine variables with blocks of user-defined data. For example, the current vertex state has been augmented by vertex buffer objects, fixed-function shading state and parameters have been replaced by shaders/programs and constant buffers, etc. Applications switch between coarse sets of state by binding objects to the context or to other container objects (e.g. vertex array objects) instead of manipulating state variables of the context. In terms of the number of GL commands required to draw an object, this enables applications to be an order of magnitude more efficient. However, this explosion of objects bound to other objects has led to a new bottleneck - pointer chasing and CPU L2 cache misses in the driver, and general L2 cache pollution.

Recent OpenGL graphics applications tend to change state at roughly these frequencies:

for (...) { // cold
    data downloads, render target changes, etc.
    for (...) { // warm
        bind textures
        for (...) { // hot
            bind constants
            bind vertex buffers
        }
    }
}

The most frequent state changes are binding vertex buffer objects (every draw), followed closely by binding constant buffers. Vertex buffer and constant buffer binds are significantly more expensive than one might expect. These binds require several reads from the driver internal object data structure to accomplish what the driver actually needs to do. In an OpenGL driver, it looks like this:

name->obj (lookup object by name) 
obj->{refcount, GPU address, state, etc.} (dereference object to reference count it, to get its GPU virtual address, and validate its state). 

Each of these dereferences has a high probability of causing a CPU L2 cache miss due to the inherently LRU-eviction-unfriendly nature of graphics applications (each frame starts over at the beginning). These L2 cache misses are a huge bottleneck in modern drivers, and a penalty paid for every frame rendered.

End of nVidia quote.

I think these extensions address new bottlenecks created by the switch to gl3-style vertex and constant buffers, shaders for everything, and so on. It seems to be a graphics-driver bottleneck, not a scenegraph problem; nVidia is acknowledging the problem and trying to give OpenGL users a way to take advantage of an optimization technique.
The OpenGL extensions are GL_NV_shader_buffer_load and GL_NV_vertex_buffer_unified_memory.
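From reading the two extension specs, usage looks roughly like the sketch below. The function and enum names come from GL_NV_shader_buffer_load and GL_NV_vertex_buffer_unified_memory; `vbo`, `vbo_size`, and `vertex_count` are assumed to exist from ordinary VBO setup, and a context exposing both extensions is assumed. This is an API sketch, not tested code, and error checking is elided.

```c
/* One-time setup: make the buffer resident on the GPU and fetch its
 * GPU address (GL_NV_shader_buffer_load). */
GLuint64EXT vbo_addr;
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vbo_addr);

/* Hot path: point attribute 0 at the raw GPU address instead of a
 * buffer name (GL_NV_vertex_buffer_unified_memory), so the driver
 * skips the name->obj lookup on every draw. */
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat));
glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, vbo_addr, vbo_size);
glDrawArrays(GL_TRIANGLES, 0, vertex_count);
```

The key design point is that the expensive lookup moves to setup time, while the per-draw call passes a 64-bit address the driver can hand straight to the GPU.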

I am not yet competent enough with gl3 to begin implementing these, but it seems worth doing at some point. I like to add quality to the things I pursue. As I progress, I will keep in touch; for now I am a toddler.

Thank you!

