[osg-users] Would someone be willing to help me diagnose a performance issue?
Brian R Hill
bhill22 at csc.com
Tue Nov 10 10:06:47 PST 2009
At the highest level:
Update - CPU processing performed by your application
Cull - CPU processing performed by OSG in traversing the scene graph and
packaging drawing state
Draw - CPU processing for sending drawing state to the GPU, and GPU
processing of that drawing state
If Update stage is large, you need to determine what part of your
application is taking the time and optimize it.
If Cull stage is large the scene hierarchy is too complex and needs to be
optimized - spatial optimization, flatten static hierarchies, ...
If Draw stage is large then you need to determine if you are vertex or
pixel limited. If you grab the window corner and shrink/expand its size
and the performance increases/decreases significantly, then you are
pixel/fill limited (too much depth complexity/too much pixel processing).
If the performance doesn't change then you are probably vertex/transform
limited (too many verts/too much vertex processing).
It's actually much more complex than this, but it's how I start diagnosing
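The triage above can be sketched in code. This is only an illustration of the reasoning, not part of OSG: the FrameTimes struct, both function names, and the 20% threshold are assumptions made up for the example, and the timings would have to be read by hand from the stats display or from your own timers.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-frame timings in milliseconds (illustrative names,
// not an OSG type). Fill these in from the on-screen stats or your timers.
struct FrameTimes {
    double update_ms;
    double cull_ms;
    double draw_ms;
};

// Return the stage that takes the largest share of the frame.
std::string dominantStage(const FrameTimes& t) {
    assert(t.update_ms >= 0.0 && t.cull_ms >= 0.0 && t.draw_ms >= 0.0);
    if (t.update_ms >= t.cull_ms && t.update_ms >= t.draw_ms) return "update";
    if (t.cull_ms >= t.draw_ms) return "cull";
    return "draw";
}

// For a draw-bound scene, compare the frame rate measured with a small
// window against the frame rate with a large window. If shrinking the
// window speeds things up by more than relative_threshold (an assumed
// cutoff of 20% by default), treat the scene as pixel/fill limited;
// otherwise treat it as vertex/transform limited.
std::string drawLimit(double fps_small_window,
                      double fps_large_window,
                      double relative_threshold = 0.2) {
    assert(fps_small_window > 0.0 && fps_large_window > 0.0);
    const double gain =
        (fps_small_window - fps_large_window) / fps_large_window;
    return gain > relative_threshold ? "pixel/fill limited"
                                     : "vertex/transform limited";
}
```

For example, drawLimit(120.0, 45.0) reports "pixel/fill limited", while drawLimit(46.0, 45.0) reports "vertex/transform limited". What counts as a "significant" change is a judgment call, so treat the threshold as a starting point only.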
Run your scene in osgViewer to see how fast it renders. You might have to
build the scene in your regular app and then save the scene root to an .ive
file, which you can load into osgViewer.
A really good tool to use is an OpenGL command logger. It logs all the
actual OpenGL commands that OSG is issuing and can help identify problem
-----osg-users-bounces at lists.openscenegraph.org wrote: -----
To: osg-users at lists.openscenegraph.org
From: "Frank Sullivan" <knarf.navillus at gmail.com>
Sent by: osg-users-bounces at lists.openscenegraph.org
Date: 11/10/2009 12:41PM
Subject: [osg-users] Would someone be willing to help me diagnose a
I have two applications, both of which are displaying basically the same
scene. One application runs at 230+ fps while the other runs at about 45
fps. I'm trying to figure out what is causing this performance difference.
Using high-precision timers, I've been able to determine that the
difference occurs somewhere in the rendering of the scene graph, but I'm
not 100% sure where. I have a couple of ideas, but each will take some
amount of time to investigate, and so I was hoping someone might be able to
lead me towards the most-correct answer.
The first idea I had concerns differences in how the scene graphs are
structured in each application. The quick app works simply by loading the
three models that it needs (from FLT files, so these 'models' are in fact
complex sub-graphs) and attaches them to the root node, and sets that root
node as the scene data.
The slow app loads every model that could possibly ever be used (52 in all,
and again each 'model' could actually be a complex sub-graph). These 52
nodes are then attached to the root, and their visibility is turned off by
setting their node mask to 0. Then, if the user of the application wants to
see a model, the app will copy the node (and all of its children) and then
add this copy to the scene root group, with the visibility turned on. This
way, if the user of the app wants to populate the scene with many instances
of the same model, they can do so, because each time they do it, a separate
copy of the node is made.
I realize that there are a lot of things that can be done to make the slow
app more memory-efficient. For instance, it could use lazy loading to load
a model only when it is needed (although this may cause a noticeable delay,
but that would probably be fine). And if the user wants to see several
instances of this model, this could be accomplished without copying the
model's entire subgraph. Instead, we could simply create a new matrix
transform, and add THAT to the root, and add the model's node as a child of
this new matrix transform (at which point, the model's node will have more
than one Matrix Transform parent).
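To make the difference between copying and sharing concrete, here is a toy sketch. It deliberately does not use the OSG API: Node, deepCopy, makeInstance, storedVertices and traversedVertices are all hypothetical names for a stand-in scene graph. It assumes that a shared subgraph is still visited once per parent during traversal, which is my understanding of how instancing behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Toy stand-in for a scene-graph node (NOT an OSG class).
struct Node {
    std::vector<std::shared_ptr<Node>> children;
    std::size_t vertex_count = 0;  // nonzero only for "geometry" leaves
};

// Copy a node and its whole subgraph, as the slow app does per instance.
std::shared_ptr<Node> deepCopy(const std::shared_ptr<Node>& n) {
    assert(n != nullptr);
    auto copy = std::make_shared<Node>();
    copy->vertex_count = n->vertex_count;
    for (const auto& c : n->children) copy->children.push_back(deepCopy(c));
    return copy;
}

// Create a new transform-like parent that shares the existing subgraph.
std::shared_ptr<Node> makeInstance(const std::shared_ptr<Node>& model) {
    assert(model != nullptr);
    auto transform = std::make_shared<Node>();
    transform->children.push_back(model);
    return transform;
}

// Collect each distinct node once, however many parents it has.
void collect(const std::shared_ptr<Node>& n, std::set<const Node*>& seen) {
    if (!seen.insert(n.get()).second) return;  // already visited
    for (const auto& c : n->children) collect(c, seen);
}

// Vertices held in memory: every distinct node counts once.
std::size_t storedVertices(const std::shared_ptr<Node>& root) {
    std::set<const Node*> seen;
    collect(root, seen);
    std::size_t total = 0;
    for (const Node* p : seen) total += p->vertex_count;
    return total;
}

// Vertices visited by a traversal: every path to a node counts.
std::size_t traversedVertices(const std::shared_ptr<Node>& n) {
    std::size_t total = n->vertex_count;
    for (const auto& c : n->children) total += traversedVertices(c);
    return total;
}
```

With three instances of a 100-vertex model, the shared version stores 100 vertices and the copied version stores 300, but a traversal visits 300 in both cases. That is consistent with the suspicion that sharing helps memory use more than rendering time.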
However, these issues seem to pertain more towards memory efficiency than
rendering efficiency, so I'm not sure if this is going to solve my
immediate problem (although it is almost certainly something I will
implement later on).
Related to this, I was wondering if anyone had an explanation as to what
the Camera / View statistics referred to. I read the Quick Start Guide, and
it had excellent information about the Event/Update/Cull/Draw/GPU chart at
the top of the statistics screen, but I'm not exactly sure what the
statistics in the Camera / View windows refer to. For instance, does the
Vertices stat refer to the total number of vertices in all of the
drawables, whether those drawables are visible or not? The reason I ask is
that, in terms of these statistics, both the Quick App and the Slow App
have nearly-identical numbers in the View section, but in the Camera
section, the Slow App's numbers are way, way, way higher. I wonder if this
tells me something about how to optimize the Slow App to bring it up to
The other major difference I noticed was in the threading model. The Quick
App uses DrawThreadPerContext and the Slow App uses SingleThreaded. I tried
getting the slow app to use DrawThreadPerContext by setting the environment
variable, but it ignored that value and chose the SingleThreaded model for
me. I can probably figure out why this happens, but I'm curious to know if
you think this will affect performance much?
Thanks so much to whoever has the patience to read all this!