[osg-submissions] Matrixf multiply Optimization

James Killian James_Killian at hotmail.com
Thu Jul 31 08:38:31 PDT 2008


"
The problem with that kind of optimization is that it might be fast with
your current CPU but slow with another one.
So all in all I am not really sure if we should include this kind of
optimization.
"

To be honest, I have not seen anyone use __m128d here at work, but we do use
__m128 (SSE) and __m128i (SSE2). I think this is because there is a real
gain when one instruction operates on 4 (or more) elements at a time rather
than on only 2 64-bit elements. As for being fast on some CPUs and slower on
others: I've tested a PIII 600 (SSE only) and a PIV 2.4 myself, and we have
shipped real-time video production software using intrinsics for the past 5
years. I am confident that anyone using Visual Studio or the Intel compiler
will get numbers similar to mine. However, I cannot speak for other
platforms, so I suggest we guard all cases with a macro set from CMake, so
users can choose to enable it for the x86 family (e.g. Intel, AMD) or leave
it off for others (e.g. Motorola 68000, PPC, etc.).
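A minimal sketch of what a macro-guarded Matrixf multiply could look like. It assumes plain row-major float[16] arrays rather than the real osg::Matrixf class, and the names OSG_USE_SSE_INTRINSICS, MATRIXF_USE_SSE, multScalar, multSSE and mult are all hypothetical. Each result row is the sum of the rows of b scaled by one broadcast element of a, so every _mm_mul_ps/_mm_add_ps handles four floats at once; the scalar loop is the fallback for non-x86 targets or when the switch is off.

```cpp
#include <cassert>

// Hypothetical build switch, expected to be defined to 0 or 1 by the build
// (e.g. from a CMake option(...) passed on via add_definitions()).
// This standalone demo defaults it to 1 when the build does not set it.
#ifndef OSG_USE_SSE_INTRINSICS
#define OSG_USE_SSE_INTRINSICS 1
#endif

// Take the SSE path only if the switch is on AND the compiler targets SSE-capable
// x86 (GCC/Clang define __SSE__; MSVC defines _M_X64, or _M_IX86_FP >= 1 on 32-bit).
#if OSG_USE_SSE_INTRINSICS && (defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1))
#include <xmmintrin.h>
#define MATRIXF_USE_SSE 1
#else
#define MATRIXF_USE_SSE 0
#endif

// Plain C++ reference: out = a * b for row-major 4x4 float matrices.
inline void multScalar(const float a[16], const float b[16], float out[16]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) s += a[i * 4 + k] * b[k * 4 + j];
            out[i * 4 + j] = s;
        }
}

#if MATRIXF_USE_SSE
// Row i of the result = sum over k of a[i][k] * (row k of b), four floats per op.
inline void multSSE(const float a[16], const float b[16], float out[16]) {
    // Unaligned loads/stores, so ordinary float arrays are fine.
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);
    for (int i = 0; i < 4; ++i) {
        const float* r = a + i * 4;
        __m128 acc = _mm_mul_ps(_mm_set1_ps(r[0]), b0);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[1]), b1));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[2]), b2));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[3]), b3));
        _mm_storeu_ps(out + i * 4, acc);
    }
}
#endif

// Implementation chosen at compile time. Precondition: out must not alias a or b.
inline void mult(const float a[16], const float b[16], float out[16]) {
    assert(out != a && out != b);
#if MATRIXF_USE_SSE
    multSSE(a, b, out);
#else
    multScalar(a, b, out);
#endif
}
```

Keeping the scalar version around also gives a reference to compare the SSE output against on each platform before enabling the switch.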

One other thing I realized this morning is that OSG does indeed have its
own Vec4f. I am thinking of comparing it against F32vec4 to see how
compatible they are. What would be awesome is if I could put the
intrinsics work inside Vec4f itself (we may need an aligned version for
others to start using). It is conceivable for various platforms to have
their own optimization cases there, defaulting to plain C code if the
intrinsics option is off. I'm going to investigate this, and if it works
out I may make another submission with Vec4f.
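A rough sketch of the "aligned Vec4f with an intrinsics path and a C fallback" idea. AlignedVec4f, VEC4_USE_SSE and dot() are hypothetical names; this is not osg::Vec4f's actual layout or API. It assumes C++11 for alignas, and note that dynamic allocation before C++17 is not guaranteed to honour the 16-byte alignment that _mm_load_ps requires.

```cpp
#include <cassert>
#include <cstddef>

// SSE path only when the compiler targets SSE-capable x86; plain C++ otherwise.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VEC4_USE_SSE 1
#else
#define VEC4_USE_SSE 0
#endif

// Hypothetical 16-byte-aligned 4-float vector (NOT osg::Vec4f's real API).
// alignas(16) lets the SSE path use the aligned _mm_load_ps/_mm_store_ps.
struct alignas(16) AlignedVec4f {
    float v[4];

    AlignedVec4f() : v{0.0f, 0.0f, 0.0f, 0.0f} {}
    AlignedVec4f(float x, float y, float z, float w) : v{x, y, z, w} {}

    float operator[](std::size_t i) const { assert(i < 4); return v[i]; }

    // Component-wise sum.
    AlignedVec4f operator+(const AlignedVec4f& rhs) const {
        AlignedVec4f r;
#if VEC4_USE_SSE
        _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(v), _mm_load_ps(rhs.v)));
#else
        for (int i = 0; i < 4; ++i) r.v[i] = v[i] + rhs.v[i];
#endif
        return r;
    }

    // Scale every component by s.
    AlignedVec4f operator*(float s) const {
        AlignedVec4f r;
#if VEC4_USE_SSE
        _mm_store_ps(r.v, _mm_mul_ps(_mm_load_ps(v), _mm_set1_ps(s)));
#else
        for (int i = 0; i < 4; ++i) r.v[i] = v[i] * s;
#endif
        return r;
    }

    // Dot product: multiply all four lanes in one SSE op, then sum the products.
    float dot(const AlignedVec4f& rhs) const {
#if VEC4_USE_SSE
        alignas(16) float p[4];
        _mm_store_ps(p, _mm_mul_ps(_mm_load_ps(v), _mm_load_ps(rhs.v)));
        return (p[0] + p[1]) + (p[2] + p[3]);
#else
        return (v[0] * rhs.v[0] + v[1] * rhs.v[1]) + (v[2] * rhs.v[2] + v[3] * rhs.v[3]);
#endif
    }
};

static_assert(alignof(AlignedVec4f) == 16, "AlignedVec4f must be 16-byte aligned");
```

Because both paths sit behind one macro, other platforms could add their own branch in the same places without changing the class interface.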



----- Original Message ----- 
From: "Mathias Fröhlich" <M.Froehlich at science-computing.de>
To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
Sent: Thursday, July 31, 2008 3:05 AM
Subject: Re: [osg-submissions] Matrixf multiply Optimization



Hi,

On Thursday 31 July 2008 05:52, James Killian wrote:
> I'm going to call it a night for now, but I'll test the other code and try
> to run some numbers, including the game fps and profiles.
> I will be curious to see how well the Matrixd multiply went, but due to
> the nature of 64-bit precision and how many multiplies and adds it can do,
> it should not have the same gain as Matrixf; as you can see from the
> DPMatrix, it was worse.
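To illustrate the point about 64-bit precision: an SSE2 __m128d holds only 2 doubles, so one 4-wide row needs two registers and roughly twice the mul/add instructions of the __m128 float version. This is a minimal sketch of one Matrixd-style row product (row vector times a row-major 4x4 matrix); multRowd and ROW_USE_SSE2 are hypothetical names, and the sketch says nothing about actual timings on any particular CPU.

```cpp
#include <cassert>

// SSE2 path only when the compiler targets SSE2 (GCC/Clang: __SSE2__; MSVC: x64 or /arch:SSE2).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define ROW_USE_SSE2 1
#else
#define ROW_USE_SSE2 0
#endif

// out = row (1x4) * m (4x4, row-major), all in double precision.
inline void multRowd(const double row[4], const double m[16], double out[4]) {
    assert(out != row);
#if ROW_USE_SSE2
    // Two registers per row: lo holds columns 0-1, hi holds columns 2-3.
    __m128d lo = _mm_setzero_pd();
    __m128d hi = _mm_setzero_pd();
    for (int k = 0; k < 4; ++k) {
        const __m128d s = _mm_set1_pd(row[k]);
        // Two mul/add pairs per k, where the float version needs only one.
        lo = _mm_add_pd(lo, _mm_mul_pd(s, _mm_loadu_pd(m + k * 4)));
        hi = _mm_add_pd(hi, _mm_mul_pd(s, _mm_loadu_pd(m + k * 4 + 2)));
    }
    _mm_storeu_pd(out, lo);
    _mm_storeu_pd(out + 2, hi);
#else
    for (int j = 0; j < 4; ++j) {
        double s = 0.0;
        for (int k = 0; k < 4; ++k) s += row[k] * m[k * 4 + j];
        out[j] = s;
    }
#endif
}
```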
Thanks for comparing.
The problem with that kind of optimization is that it might be fast with
your current CPU but slow with another one.
So all in all I am not really sure if we should include this kind of
optimization.

> BTW thanks for the other response which suggested some optimization
> techniques to try.  I'll pass that on to Rick, and we'll let you know what
> we find.
Well, what I described is partly implemented in simgear. I also need to
check whether this is already checked in or is still sitting on my local
disk :)

Greetings

Mathias

-- 
Dr. Mathias Fröhlich, science + computing ag, Software Solutions
Hagellocher Weg 71-75, D-72070 Tuebingen, Germany
Phone: +49 7071 9457-268, Fax: +49 7071 9457-511
-- 
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Florian Geyer,
Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Prof. Dr. Hanns Ruder
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196


_______________________________________________
osg-submissions mailing list
osg-submissions at lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org


