[osg-users] Using SSE within OSG

James Killian James_Killian at hotmail.com
Tue Jul 29 19:44:47 PDT 2008

OK, I thought it was the collision detection, but that is not the case. Here
are some of the numbers with collision disabled:

CS:EIP      Symbol + Offset                                         64-bit Timer samples
0x10083cc0  osg::Group::traverse 
0x10083d60  osg::Group::computeBound 
0x10099ca0  osg::Matrixf::mult 
0x1001a9d0  osg::PositionAttitudeTransform::accept 
0x10099370  osg::Matrixf::preMult 
0x1000e840  osg::AnimationPathCallback::update 
0x1009bb50  osg::Node::dirtyBound 
0x100dcee0  osg::Transform::computeBound 
0x100a9df0  osg::PositionAttitudeTransform::computeLocalToWorldMatrix 
0x100126f0  osg::AnimationPath::getInterpolatedControlPoint 
0x10009c70  osg::AnimationPathCallback::setPause 
0x1000c8e0  osg::StateSet::requiresUpdateTraversal 

12 functions, 806 instructions, Total: 6542 samples, 50.85% of samples in 
the module, 16.36% of total session samples

OK, here are the numbers with collision detection enabled:
CS:EIP      Symbol + Offset                                         64-bit Timer samples
0x10083cc0  osg::Group::traverse 
0x10083d60  osg::Group::computeBound 
0x10099ca0  osg::Matrixf::mult 
0x10099370  osg::Matrixf::preMult 
0x1001a9d0  osg::PositionAttitudeTransform::accept 
0x1000e840  osg::AnimationPathCallback::update 
0x100dcee0  osg::Transform::computeBound 
0x1009bb50  osg::Node::dirtyBound 
0x100126f0  osg::AnimationPath::getInterpolatedControlPoint 
0x100a9df0  osg::PositionAttitudeTransform::computeLocalToWorldMatrix 
0x10009c70  osg::AnimationPathCallback::setPause 
0x10002e00  osg::Matrixf::preMult 

12 functions, 846 instructions, Total: 6332 samples, 51.35% of samples in 
the module, 15.83% of total session samples

Here are the numbers with both Matrixf and Invert4x4 optimized:
CS:EIP      Symbol + Offset                                         64-bit Timer samples
0x10083cb0  osg::Group::traverse 
0x10083d50  osg::Group::computeBound 
0x1009a180  osg::Matrixf::mult 
0x1001ac70  osg::PositionAttitudeTransform::accept 
0x1000e650  osg::AnimationPathCallback::update 
0x100dcf30  osg::Transform::computeBound 
0x1009bcf0  osg::Node::dirtyBound 
0x100124f0  osg::AnimationPath::getInterpolatedControlPoint 
0x1009a340  osg::Matrixf::invert_4x3 
0x10009bb0  osg::GraphicsContext::ScreenIdentifier::~ScreenIdentifier 
0x100a9b20  osg::PositionAttitudeTransform::computeLocalToWorldMatrix 
0x10002d00  osg::Matrixf::preMult 
0x10002c70  osg::Matrixf::preMult 
0x1000c6b0  osg::StateSet::requiresUpdateTraversal 

14 functions, 829 instructions, Total: 6332 samples, 54.18% of samples in 
the module, 15.84% of total session samples

In the optimized profile, Invert4x4 was pushed way down to the bottom (I did
not want to show that part here).  If you want the complete list, let me know
and I'll resend it as attachments.  Actually, you cannot really use these
profiles to see how much the performance improved, because Matrixf::mult is
still called just as often; the real way to tell would be to show the game's
framerate.  However, here is where I can show the optimization:
Average time using the D3DXMATRIX class:  402.54
Average time using the SPMatrix class:    277.69
Average time using the Matrixf class:     297.40
Average time using the ScalarDP class:    400.21
Average time using the DPMatrix class:    1418.11
Average time using the Matrixd class:     471.69

The table above is the result for postMult, where Matrixf used to perform the
same as Matrixd.  The 277.69 (SPMatrix) is what Matrixf would get if it were
aligned.
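To make the alignment point concrete, here is a minimal sketch (x86 only, SSE1, C++17) of a 4x4 float multiply that depends on 16-byte aligned rows. `AlignedMat4`, `multSSE`, `multScalar` and `approxEqual` are hypothetical names for illustration, not OSG or matlib2 API. As far as I know, osg::Matrixf keeps its elements in a plain `float[4][4]` with no alignment guarantee, which is why code like this cannot simply use aligned loads on it.

```cpp
#include <cassert>     // assert
#include <cmath>       // std::fabs
#include <cstdint>     // std::uintptr_t
#include <xmmintrin.h> // SSE1 intrinsics

// Hypothetical row-major 4x4 float matrix forced onto a 16-byte boundary,
// so each row can be fetched with a single aligned SSE load.
struct alignas(16) AlignedMat4 {
    float m[4][4];
};

// Reference version: r = a * b, row-major. Uses a temporary so r may alias a or b.
void multScalar(AlignedMat4& r, const AlignedMat4& a, const AlignedMat4& b) {
    AlignedMat4 t;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j] +
                        a.m[i][2] * b.m[2][j] + a.m[i][3] * b.m[3][j];
    r = t;
}

// SSE version: row i of r = sum over k of a[i][k] * (row k of b).
void multSSE(AlignedMat4& r, const AlignedMat4& a, const AlignedMat4& b) {
    // _mm_load_ps/_mm_store_ps fault on addresses that are not 16-byte
    // aligned; alignas(16) guarantees it here, the asserts document it.
    assert(reinterpret_cast<std::uintptr_t>(&b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&r) % 16 == 0);
    const __m128 b0 = _mm_load_ps(b.m[0]);
    const __m128 b1 = _mm_load_ps(b.m[1]);
    const __m128 b2 = _mm_load_ps(b.m[2]);
    const __m128 b3 = _mm_load_ps(b.m[3]);
    for (int i = 0; i < 4; ++i) {
        // Broadcast each element of a's row i and accumulate into one register.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a.m[i][0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][3]), b3));
        _mm_store_ps(r.m[i], row);
    }
}

// True if every element of x and y differs by less than eps.
bool approxEqual(const AlignedMat4& x, const AlignedMat4& y, float eps) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (std::fabs(x.m[i][j] - y.m[i][j]) >= eps) return false;
    return true;
}
```

Each output row is one broadcast-multiply-add chain over the rows of b, so the whole product is 16 multiplies and 12 adds on 4-wide registers. Because b is loaded up front and row i of a is read before row i of r is stored, r may alias a or b, which matters for in-place calls in the style of postMult.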

Average time using the D3DXMATRIX class:  1035.63
Average time using the SPMatrix class:    365.36
Average time using the Matrixf class:     706.09
Average time using the ScalarDP class:    664.13
Average time using the DPMatrix class:    2052.29
Average time using the Matrixd class:     2125.93

The table above is the result for Invert4x4, where Matrixf also used to perform
the same as Matrixd; the 365 (SPMatrix) is what it would have been if the data
were aligned.
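The optimized profile also lists osg::Matrixf::invert_4x3. As far as I know, OSG takes that path instead of the full 4x4 inverse when the matrix is affine (last column 0, 0, 0, 1), which is the common case for scene-graph transforms. The sketch below shows the general idea under that assumption; `invertAffine` is a hypothetical name and this is not OSG's implementation.

```cpp
#include <cassert> // assert
#include <cmath>   // std::fabs

// Hypothetical helper (not OSG's invert_4x3): inverts a matrix that is affine
// in OSG's row-vector layout, i.e. column 3 is (0, 0, 0, 1) and the
// translation sits in row 3. With M = [R 0; t 1] the inverse is
// [R^-1 0; -t*R^-1 1], so only a 3x3 inverse is needed.
// Returns false if R is (numerically) singular. out may alias m.
bool invertAffine(float out[4][4], const float m[4][4]) {
    assert(m[0][3] == 0.0f && m[1][3] == 0.0f && m[2][3] == 0.0f && m[3][3] == 1.0f);
    // Cofactors along the first row of R give the determinant.
    const float c00 = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    const float c01 = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    const float c02 = m[1][0] * m[2][1] - m[1][1] * m[2][0];
    const float det = m[0][0] * c00 + m[0][1] * c01 + m[0][2] * c02;
    if (std::fabs(det) < 1e-12f) return false;
    const float s = 1.0f / det;

    // R^-1 = adjugate(R) / det, written into a temporary so out may alias m.
    float r[4][4];
    r[0][0] = c00 * s;
    r[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) * s;
    r[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) * s;
    r[1][0] = c01 * s;
    r[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) * s;
    r[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) * s;
    r[2][0] = c02 * s;
    r[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) * s;
    r[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) * s;

    // New translation row: -t * R^-1.
    for (int j = 0; j < 3; ++j)
        r[3][j] = -(m[3][0] * r[0][j] + m[3][1] * r[1][j] + m[3][2] * r[2][j]);
    r[0][3] = r[1][3] = r[2][3] = 0.0f;
    r[3][3] = 1.0f;

    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) out[i][j] = r[i][j];
    return true;
}
```

The 3x3 adjugate needs far fewer operations than a general 4x4 cofactor expansion, which is why it pays to detect the affine case before falling back to a full 4x4 inverse.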

This stress code is part of matlib2, with a little tweaking to add the OSG
code into the mix.
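The units and loop counts behind these averages are not stated. For anyone who wants to run a similar comparison, here is a hypothetical harness (not the matlib2 stress code) that times a callable over several trials and returns the mean per trial in microseconds, using std::chrono::steady_clock.

```cpp
#include <cassert>  // assert
#include <chrono>   // std::chrono::steady_clock, std::chrono::duration
#include <cstddef>  // std::size_t

// Hypothetical timing harness, not the matlib2 stress code. Calls op()
// `iterations` times per trial and returns the mean wall-clock time of one
// trial in microseconds. steady_clock is monotonic, unlike the system clock.
template <typename Op>
double averageMicros(Op&& op, std::size_t iterations, std::size_t trials) {
    assert(trials > 0);
    using Clock = std::chrono::steady_clock;
    double total = 0.0;
    for (std::size_t t = 0; t < trials; ++t) {
        const auto start = Clock::now();
        for (std::size_t i = 0; i < iterations; ++i) op();
        const auto stop = Clock::now();
        total += std::chrono::duration<double, std::micro>(stop - start).count();
    }
    return total / static_cast<double>(trials);
}
```

Each matrix class under test would get its own lambda that performs one multiply or inverse and stores the result somewhere the compiler must keep (a `volatile` variable, for instance); otherwise the optimizer may remove the work and the averages become meaningless.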

James Killian
----- Original Message ----- 
From: "Mathias Fröhlich" <M.Froehlich at science-computing.de>
To: "OpenSceneGraph Users" <osg-users at lists.openscenegraph.org>
Sent: Tuesday, July 29, 2008 10:14 AM
Subject: Re: [osg-users] Using SSE within OSG


On Tuesday 29 July 2008 16:59, James Killian wrote:
> Paul asked me the same question a few days ago, and I just realized that we
> took that offline, so I'll repost here:
> One of the things I should add is the actual profile dump, since that shows
> a more comprehensive picture.  The game demo is free to download and play
> here:
> http://www.fringe-online.com/
> The current installer of the game does not include my optimization yet,
> but it should be noted that even with the optimization, postMult is still at
> the top.  Invert4x4(), however, got pushed way down to the bottom (which is
> great).  I'll post my profiles when I get home.
> ---------------------------------snip----------------------------------------
> That is a good question, and I believe the answer is collision detection.
> I should disable it and run the numbers again to confirm.  All ships fire
> machine guns at a fast rate, and each bullet that gets close enough to a
> bounding box/sphere region has to go through the osg code to get the
> precise point where it hit.  Rick would probably have a better explanation
> of this and other factors since he coded the bulk of the collision
> detection (and OSG integration).  Most of my development time on the game
> has been spent on the physics and flight dynamics (and now optimization).
> It may turn out that we could find some caching technique to reduce the
> collision stress (like the KBDtree), but in the meantime, matrix
> optimizations can benefit the whole community if we do them right, and I
> would like to make some contribution to the community.

OK, there is a lot you can do here for the collision detection.
I expect that you could optimize that algorithmically and gain orders of
magnitude without SSE.
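One example of the kind of algorithmic gain meant here is a spatial broad phase, so that each bullet is checked only against ships near it rather than against every ship. The sketch below uses a uniform grid; `Vec3`, `Sphere` and `SphereGrid` are hypothetical and unrelated to OSG or the game's actual collision code, and bullets are treated as points for simplicity.

```cpp
#include <cassert>        // assert
#include <cmath>          // std::floor
#include <cstddef>        // std::size_t
#include <cstdint>        // std::int64_t
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

// Hypothetical uniform-grid broad phase. Each bounding sphere is registered
// in every cubic cell its bounding box overlaps, so a bullet position only
// has to be tested against the spheres registered in its own cell.
class SphereGrid {
public:
    SphereGrid(const std::vector<Sphere>& spheres, float cellSize)
        : spheres_(spheres), cell_(cellSize) {
        assert(cellSize > 0.0f);
        for (std::size_t i = 0; i < spheres_.size(); ++i) {
            const Sphere& s = spheres_[i];
            const float r = s.radius;
            for (int x = coord(s.center.x - r); x <= coord(s.center.x + r); ++x)
                for (int y = coord(s.center.y - r); y <= coord(s.center.y + r); ++y)
                    for (int z = coord(s.center.z - r); z <= coord(s.center.z + r); ++z)
                        cells_[key(x, y, z)].push_back(i);
        }
    }

    // Indices of the spheres that contain p. Only the candidates from p's
    // cell get the exact distance test.
    std::vector<std::size_t> query(const Vec3& p) const {
        std::vector<std::size_t> hits;
        const auto it = cells_.find(key(coord(p.x), coord(p.y), coord(p.z)));
        if (it == cells_.end()) return hits;
        for (std::size_t i : it->second) {
            const Sphere& s = spheres_[i];
            const float dx = p.x - s.center.x;
            const float dy = p.y - s.center.y;
            const float dz = p.z - s.center.z;
            if (dx * dx + dy * dy + dz * dz <= s.radius * s.radius) hits.push_back(i);
        }
        return hits;
    }

private:
    int coord(float v) const { return static_cast<int>(std::floor(v / cell_)); }

    // Packs three 21-bit cell coordinates into one key. Cells beyond +/-2^20
    // can collide, which only adds candidates; query() still filters them.
    static std::int64_t key(int x, int y, int z) {
        const std::int64_t mask = (std::int64_t{1} << 21) - 1;
        return ((x & mask) << 42) | ((y & mask) << 21) | (z & mask);
    }

    std::vector<Sphere> spheres_;
    float cell_;
    std::unordered_map<std::int64_t, std::vector<std::size_t>> cells_;
};
```

With N bullets and M ships this replaces N x M sphere checks with roughly one hash lookup per bullet plus a few nearby candidates. Bullets fast enough to cross a whole cell in one frame would need their path segment binned rather than just their current position; the sketch ignores that.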

So the question is rather whether such optimizations will bring performance
improvements for the usual scenegraph case.



Dr. Mathias Fröhlich, science + computing ag, Software Solutions
Hagellocher Weg 71-75, D-72070 Tuebingen, Germany
Phone: +49 7071 9457-268, Fax: +49 7071 9457-511

osg-users mailing list
osg-users at lists.openscenegraph.org
