[osg-users] Using SSE within OSG

James Killian James_Killian at hotmail.com
Tue Jul 29 19:44:47 PDT 2008


OK, I thought it was the collision detection, but that is not the case.
Here are some of the numbers with collision detection disabled:

CS:EIP      Symbol + Offset                                            64-bit  Timer samples
0x10083cc0  osg::Group::traverse                                               1434
0x10083d60  osg::Group::computeBound                                           1391
0x10099ca0  osg::Matrixf::mult                                                 833
0x1001a9d0  osg::PositionAttitudeTransform::accept                             409
0x10099370  osg::Matrixf::preMult                                              407
0x1000e840  osg::AnimationPathCallback::update                                 352
0x1009bb50  osg::Node::dirtyBound                                              340
0x100dcee0  osg::Transform::computeBound                                       318
0x100a9df0  osg::PositionAttitudeTransform::computeLocalToWorldMatrix          294
0x100126f0  osg::AnimationPath::getInterpolatedControlPoint                    285
0x10009c70  osg::AnimationPathCallback::setPause                               251
0x1000c8e0  osg::StateSet::requiresUpdateTraversal                             228

12 functions, 806 instructions, Total: 6542 samples, 50.85% of samples in 
the module, 16.36% of total session samples

OK, here it is with collision detection enabled:
=======================
CS:EIP      Symbol + Offset                                            64-bit  Timer samples
0x10083cc0  osg::Group::traverse                                               1382
0x10083d60  osg::Group::computeBound                                           1237
0x10099ca0  osg::Matrixf::mult                                                 924
0x10099370  osg::Matrixf::preMult                                              600
0x1001a9d0  osg::PositionAttitudeTransform::accept                             394
0x1000e840  osg::AnimationPathCallback::update                                 292
0x100dcee0  osg::Transform::computeBound                                       284
0x1009bb50  osg::Node::dirtyBound                                              280
0x100126f0  osg::AnimationPath::getInterpolatedControlPoint                    274
0x100a9df0  osg::PositionAttitudeTransform::computeLocalToWorldMatrix          230
0x10009c70  osg::AnimationPathCallback::setPause                               225
0x10002e00  osg::Matrixf::preMult                                              210

12 functions, 846 instructions, Total: 6332 samples, 51.35% of samples in 
the module, 15.83% of total session samples


Here it is with both Matrixf and Invert4x4 optimized:
=================================
CS:EIP      Symbol + Offset                                            64-bit  Timer samples
0x10083cb0  osg::Group::traverse                                               1362
0x10083d50  osg::Group::computeBound                                           1142
0x1009a180  osg::Matrixf::mult                                                 922
0x1001ac70  osg::PositionAttitudeTransform::accept                             381
0x1000e650  osg::AnimationPathCallback::update                                 354
0x100dcf30  osg::Transform::computeBound                                       306
0x1009bcf0  osg::Node::dirtyBound                                              274
0x100124f0  osg::AnimationPath::getInterpolatedControlPoint                    257
0x1009a340  osg::Matrixf::invert_4x3                                           252
0x10009bb0  osg::GraphicsContext::ScreenIdentifier::~ScreenIdentifier          248
0x100a9b20  osg::PositionAttitudeTransform::computeLocalToWorldMatrix          245
0x10002d00  osg::Matrixf::preMult                                              214
0x10002c70  osg::Matrixf::preMult                                              197
0x1000c6b0  osg::StateSet::requiresUpdateTraversal                             178

14 functions, 829 instructions, Total: 6332 samples, 54.18% of samples in 
the module, 15.84% of total session samples

In the optimized profile, Invert4x4 was pushed way down toward the bottom of
the list (I did not show that part here).  If you want the complete list, let
me know and I'll resend it as attachments.  You cannot really use these
profiles to see how much the performance improved, because Matrixf::mult is
still needed just as much; the real way to tell would be the frame rate of
the game.  However, here is where I can show the optimization:
Average time using the D3DXMATRIX class:  402.54
Average time using the SPMatrix class:    277.69
Average time using the Matrixf class:     297.40
Average time using the ScalarDP class:    400.21
Average time using the DPMatrix class:    1418.11
Average time using the Matrixd class:     471.69

The results above are for postMult, where Matrixf used to perform the same
as Matrixd.  The 277.69 (SPMatrix) is what Matrixf would have achieved if its
data were aligned.
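
To make the alignment point concrete, here is a minimal sketch of a 4x4
single-precision multiply that relies on 16-byte aligned data.  It is not the
matlib2 or OSG code: Mat4f, multScalar and multSSE are hypothetical names, the
row-major layout is an assumption, and the scalar fallback only exists so the
sketch still builds where SSE is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical row-major 4x4 float matrix. alignas(16) places every row on a
// 16-byte boundary, which the aligned SSE load/store instructions require.
struct alignas(16) Mat4f {
    float m[4][4];
};

// Scalar reference: r = a * b. r must not alias a or b.
inline void multScalar(const Mat4f& a, const Mat4f& b, Mat4f& r) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) s += a.m[i][k] * b.m[k][j];
            r.m[i][j] = s;
        }
    }
}

// SSE version: row i of the result is a linear combination of the rows of b,
// weighted by the four elements of row i of a.
inline void multSSE(const Mat4f& a, const Mat4f& b, Mat4f& r) {
#if SKETCH_HAS_SSE
    // Aligned loads and stores fault on misaligned addresses, so check first.
    assert(reinterpret_cast<std::uintptr_t>(&b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&r) % 16 == 0);
    const __m128 b0 = _mm_load_ps(b.m[0]);
    const __m128 b1 = _mm_load_ps(b.m[1]);
    const __m128 b2 = _mm_load_ps(b.m[2]);
    const __m128 b3 = _mm_load_ps(b.m[3]);
    for (int i = 0; i < 4; ++i) {
        __m128 row = _mm_mul_ps(_mm_set1_ps(a.m[i][0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][3]), b3));
        _mm_store_ps(r.m[i], row);
    }
#else
    multScalar(a, b, r);  // portable fallback when SSE is not available
#endif
}
```

If the data cannot be guaranteed to be 16-byte aligned, _mm_loadu_ps and
_mm_storeu_ps can be substituted for the aligned variants, at some cost on
older hardware.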

Average time using the D3DXMATRIX class:  1035.63
Average time using the SPMatrix class:    365.36
Average time using the Matrixf class:     706.09
Average time using the ScalarDP class:    664.13
Average time using the DPMatrix class:    2052.29
Average time using the Matrixd class:     2125.93

The results above are for Invert4x4, where Matrixf also used to perform the
same as Matrixd; the 365 is what it would have been if the data were aligned.

This stress code is part of matlib2, with a little tweaking to add the osg
code into the mix.
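
For readers who want to reproduce this kind of "average time" comparison, the
following is a minimal sketch of a timing harness.  It is not the matlib2
stress code: averageMillis and its parameters are hypothetical, and the unit
(milliseconds per run) is an assumption, since the figures above do not state
their unit.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls `op` `iterations` times per run, repeats that
// for `runs` runs, and returns the average wall-clock milliseconds per run.
template <typename Op>
double averageMillis(Op op, std::size_t runs, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    double totalMs = 0.0;
    for (std::size_t r = 0; r < runs; ++r) {
        const Clock::time_point start = Clock::now();
        for (std::size_t i = 0; i < iterations; ++i) op();
        const Clock::time_point stop = Clock::now();
        totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    // Avoid dividing by zero when no runs were requested.
    return runs ? totalMs / static_cast<double>(runs) : 0.0;
}
```

Each matrix class would be timed with the same run and iteration counts, and
the operation should write its result somewhere the compiler cannot discard
(for example a volatile sink), otherwise the loop may be optimized away.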








James Killian
----- Original Message ----- 
From: "Mathias Fröhlich" <M.Froehlich at science-computing.de>
To: "OpenSceneGraph Users" <osg-users at lists.openscenegraph.org>
Sent: Tuesday, July 29, 2008 10:14 AM
Subject: Re: [osg-users] Using SSE within OSG



James,

On Tuesday 29 July 2008 16:59, James Killian wrote:
> Paul asked me the same question a few days ago, and I just realized that we
> took that offline so I'll repost here:
> One of the things I should add is the actual profile dump, since that shows
> a more comprehensive picture.  The actual game demo is free to download and
> play here:
> http://www.fringe-online.com/
>
> The current installer of the game does not have my optimization in it yet,
> but it should be noted even with the optimization the postmult is still at
> the top.  The Invert4x4() however got pushed way down to the bottom (which
> is great).  I'll post my profiles when I get home.
>
>
> ---------------------------------snip---------------------------------
> That is a good question, and I believe the answer is collision detection.
> I should disable it and run the numbers again to confirm.  All ships fire
> machine guns at a fast rate, and each bullet that gets close enough to a
> bounding box/sphere region has to go through the osg code to get the
> precise point where it hit.  Rick would probably have a better explanation
> of this and other factors since he coded the bulk of the collision
> detection (and osg integration).  Most of my development time on the game
> has been spent on the physics and flight dynamics (and now optimization).
>
> It may turn out that we could find some caching technique to reduce the
> collision stress (like the KBDtree), but in the meantime, matrix
> optimizations can benefit the whole community if we do them right, and I
> would like to make some contribution to the community.

OK, there is a lot you can do here for the collision detection.
I expect that you should optimize that algorithmically and gain orders of
magnitude without SSE.
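
As one illustration of what an algorithmic optimization could look like, here
is a minimal sketch of a uniform-grid broad phase: each bullet position is
tested only against the ships registered in its grid cell, instead of against
every ship.  This is not the game's or OSG's code; Vec3, Sphere, UniformGrid
and findHit are hypothetical names, and treating a bullet as a single point
per frame is a simplifying assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };  // hypothetical ship bounding sphere

class UniformGrid {
public:
    explicit UniformGrid(float cellSize) : cell_(cellSize) {}

    // Register sphere `id` in every cell that its bounding box overlaps.
    void insert(std::size_t id, const Sphere& s) {
        const int x0 = coord(s.center.x - s.radius), x1 = coord(s.center.x + s.radius);
        const int y0 = coord(s.center.y - s.radius), y1 = coord(s.center.y + s.radius);
        const int z0 = coord(s.center.z - s.radius), z1 = coord(s.center.z + s.radius);
        for (int x = x0; x <= x1; ++x)
            for (int y = y0; y <= y1; ++y)
                for (int z = z0; z <= z1; ++z)
                    cells_[key(x, y, z)].push_back(id);
    }

    // Ids registered in the cell containing p (empty if the cell is unused).
    const std::vector<std::size_t>& candidates(const Vec3& p) const {
        static const std::vector<std::size_t> none;
        const auto it = cells_.find(key(coord(p.x), coord(p.y), coord(p.z)));
        return it == cells_.end() ? none : it->second;
    }

private:
    int coord(float v) const { return static_cast<int>(std::floor(v / cell_)); }

    // Pack three cell coordinates into one key, 21 bits each. Assumes the
    // world spans fewer than 2^21 cells per axis, so keys do not collide.
    static std::uint64_t key(int x, int y, int z) {
        const std::uint64_t mask = (1ull << 21) - 1;
        return ((static_cast<std::uint64_t>(x) & mask) << 42) |
               ((static_cast<std::uint64_t>(y) & mask) << 21) |
               (static_cast<std::uint64_t>(z) & mask);
    }

    float cell_;
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> cells_;
};

// Narrow phase stand-in: exact point-in-sphere test, run only on candidates.
inline bool pointInSphere(const Vec3& p, const Sphere& s) {
    const float dx = p.x - s.center.x, dy = p.y - s.center.y, dz = p.z - s.center.z;
    return dx * dx + dy * dy + dz * dz <= s.radius * s.radius;
}

// Returns the index of the first ship whose sphere contains p, or -1.
inline long findHit(const UniformGrid& grid, const std::vector<Sphere>& ships,
                    const Vec3& p) {
    for (std::size_t id : grid.candidates(p))
        if (pointInSphere(p, ships[id])) return static_cast<long>(id);
    return -1;
}
```

In a real scene the point-in-sphere test would be replaced by the precise
intersection query, and the grid would be rebuilt or updated as ships move.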

So the question is rather whether such optimizations will bring performance
improvements for the usual scenegraph case.

Greetings

Mathias

-- 
Dr. Mathias Fröhlich, science + computing ag, Software Solutions
Hagellocher Weg 71-75, D-72070 Tuebingen, Germany
Phone: +49 7071 9457-268, Fax: +49 7071 9457-511
-- 
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Florian Geyer,
Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Prof. Dr. Hanns Ruder
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196

