<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16674" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2>Hi David</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2>My company makes very heavy use of SSE in our main
products, and there are vast speed improvements to be gained. Sadly, I don't have
permission to provide profiling data.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2>We use SSE for heavy matrix work outside of OSG, and we
have also added some SSE code to our OSG/OGL apps, for example for normal
generation, fast square root routines and texture operations. The clock cycles
saved can mount up quickly.</FONT></SPAN></DIV>
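<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2>As an illustration only (this is not our production code),
here is a minimal sketch of an SSE fast reciprocal square root used to normalise
normals four at a time. The function names approxRsqrt and normalizeNormals and
the separate x/y/z array layout are assumptions made for the example; they are
not part of OSG.</FONT></SPAN></DIV>

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)
#include <cmath>        // std::sqrt for the scalar tail

// Hypothetical helper: approximate 1/sqrt(v) for four floats at once.
// _mm_rsqrt_ps is a low-precision hardware estimate; one Newton-Raphson
// step, y' = 0.5 * y * (3 - v * y * y), refines it.
static inline __m128 approxRsqrt(__m128 v)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);
    const __m128 y     = _mm_rsqrt_ps(v);
    const __m128 vyy   = _mm_mul_ps(v, _mm_mul_ps(y, y));
    return _mm_mul_ps(_mm_mul_ps(half, y), _mm_sub_ps(three, vyy));
}

// Hypothetical routine: normalises `count` normals stored as three separate
// arrays (assumed layout). Unaligned loads and stores are used, so the arrays
// need no special alignment. Zero-length normals are NOT handled and will
// produce NaN in the SSE path.
void normalizeNormals(float* nx, float* ny, float* nz, int count)
{
    int i = 0;
    for (; i + 4 <= count; i += 4) {
        const __m128 x = _mm_loadu_ps(nx + i);
        const __m128 y = _mm_loadu_ps(ny + i);
        const __m128 z = _mm_loadu_ps(nz + i);
        const __m128 lenSq = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
            _mm_mul_ps(z, z));
        const __m128 inv = approxRsqrt(lenSq);
        _mm_storeu_ps(nx + i, _mm_mul_ps(x, inv));
        _mm_storeu_ps(ny + i, _mm_mul_ps(y, inv));
        _mm_storeu_ps(nz + i, _mm_mul_ps(z, inv));
    }
    // Scalar tail for the remaining 0..3 normals.
    for (; i < count; ++i) {
        const float inv =
            1.0f / std::sqrt(nx[i] * nx[i] + ny[i] * ny[i] + nz[i] * nz[i]);
        nx[i] *= inv;
        ny[i] *= inv;
        nz[i] *= inv;
    }
}
```

<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2>The result is an approximation, so whether the precision is
acceptable depends on the application. With aligned data the unaligned
loads and stores could be replaced by _mm_load_ps and
_mm_store_ps.</FONT></SPAN></DIV>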
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2>I would say that adding SSE operations in the right
places in the OSG core would be highly beneficial for
performance.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=544401412-29072008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff
size=2></FONT> </DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> osg-users-bounces@lists.openscenegraph.org
[mailto:osg-users-bounces@lists.openscenegraph.org] <B>On Behalf Of </B>David
Spilling<BR><B>Sent:</B> Tuesday, July 29, 2008 8:05 AM<BR><B>To:</B>
OpenSceneGraph Users<BR><B>Subject:</B> [osg-users] Using SSE within
OSG<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV dir=ltr>Dear All,<BR><BR>There's a discussion going on at the moment over
in osg-submissions, and it has been raised that this ought to be opened up to
the non-submissions community for feedback. Note that the following is my
reading of the issues, and certainly doesn't represent the consensus view of the
osg-submissions crowd, so feel free to challenge what I'm
saying!<BR><BR><B>Background</B><BR>Several people already use SSE instructions
(<A href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions"
target=_blank>http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions</A>)
alongside OSG to obtain speed improvements through parallelising math
operations. The general point that has been raised is that under-the-hood, OSG
does quite a lot that could benefit from the potential performance boost given
by SSE operations. Obvious targets include some of the Vec/Matrix routines, for
example. SSE is now sufficiently mainstream that the risk of processor
incompatibility is felt to be low.<BR><BR><B>Question 1 : Where could the core
OSG include SSE?</B><BR>Most people follow the sensible approach of profiling to
determine their bottlenecks, and then optimising particular methods in order to
gain speed-up. This would be a sensible approach to follow, as SSEing all
methods would probably be a waste of effort. It would therefore be
instructive firstly to know if anybody is using SSE with OSG, and where.
Secondly, for those who have profiling data and know how much time they spend in
Vec/Matrix/whatever methods, it would be useful to know which methods the
community considered good targets for SSEing. Any other maths "heavy lifting"
going on? (e.g. intersection testing? Delaunay triangulation?
etc.)<BR><BR><B>Question 2 : How could the core OSG include SSE?</B><BR>SSE code
benefits from aligned data. Hence there are several ways in which OSG
could include SSE:<BR><BR>a) Provide an aligned Vec4f and aligned Matrix4f
class, which support SSE operations. This would appear (to me) to be the least
intrusive.<BR><BR>b) Provide branching code within the existing Vec4/Matrix4
methods for detecting whether data is aligned, and performing the correct
operations. This would appear to me to be the most user-transparent. Although the
branch looks like a performance hit, testing so far on some specific code
supports the argument that the speed gains from SSE outweigh the branch cost;
more testing is needed, I guess.<BR><BR>c) Robert suggested that SSE-enabled array
operators (e.g. providing a cross-product operator for Vec3Array) might be
appropriate and provide the best speed improvement for those who want it.
Certainly using SSE on large array type data sets is where one gains the most
performance improvement.<BR><BR>This question includes the possibility of
linking out to, or pulling source code out of, an external optimised math
library.<BR><BR>Any other suggestions?<BR><BR><B>Question 3 : (possibly the
biggest) Should the core OSG include SSE?</B><BR>There are several downsides to
including SSE. Firstly, cross-platform provision of SSE may be tricky due to the way
different compilers define aligned data, and how SSE instructions are used
within the code. I personally don't have much experience here, so any feedback
on cross-platform issues would be useful.<BR><BR>Secondly, code readability drops, and
the "use the source" argument becomes harder to make when many readers may not know much
SSE.<BR><BR><BR>So - your opinion, experience and suggestions
welcome!<BR><BR>David<BR><BR><BR><BR><BR><BR><BR></DIV></BODY></HTML>