[osg-users] Using SSE within OSG
david.spilling at gmail.com
Tue Jul 29 05:04:59 PDT 2008
There's a discussion going on at the moment over in osg-submissions, and it
has been raised that this ought to be opened up to the non-submissions
community for feedback. Note that the following is my reading of the issues,
and certainly doesn't represent the consensus view of the osg-submissions
crowd, so feel free to challenge what I'm saying!
Several people already use SSE instructions
(http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) alongside OSG to
obtain speed improvements by parallelising math operations. The general
point that has been raised is that, under the hood, OSG does quite a lot that
could benefit from the potential performance boost given by SSE operations.
Obvious targets include some of the Vec/Matrix routines, for example. SSE is
now sufficiently mainstream that the risk of processor incompatibility is
felt to be low.
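To make the kind of parallelism concrete, here is a minimal sketch of a four-component float add, done once with scalar code and once with a single packed SSE addition. It assumes an x86 compiler with SSE enabled and the standard xmmintrin.h intrinsics; add4Scalar and add4SSE are hypothetical helpers, not OSG API.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ...
#include <cassert>

// Hypothetical helpers, not OSG API: add two 4-component float vectors.

// Scalar version: four independent float additions.
inline void add4Scalar(const float* a, const float* b, float* out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}

// SSE version: the same four additions done by one packed instruction.
// _mm_loadu_ps/_mm_storeu_ps tolerate unaligned pointers.
inline void add4SSE(const float* a, const float* b, float* out)
{
    assert(a && b && out);
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

Both functions give the same result. Whether the packed version actually wins depends on the surrounding code, which is why profiling real workloads matters.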
*Question 1 : Where could the core OSG include SSE?*
Most people follow the sensible approach of profiling to determine their
bottlenecks, and then optimising particular methods in order to gain
speed-up. This would be a sensible approach to follow, as SSEing all methods
would probably be a waste of effort. It would therefore be instructive,
first, to know whether anybody is using SSE with OSG, and where. Second, for
those who have profiling data and know how much time they spend in
Vec/Matrix/whatever methods, it would be useful to know which methods the
community considers good targets for SSEing. Is any other maths "heavy
lifting" going on? (Intersection testing? Delaunay triangulation?)
*Question 2 : How could the core OSG include SSE?*
SSE code benefits from aligned data. Hence there are several ways in which
OSG could include SSE:
a) Provide an aligned Vec4f and aligned Matrix4f class, which support SSE
operations. This would appear (to me) to be the least intrusive.
b) Provide branching code within the existing Vec4/Matrix4 methods that
detects whether data is aligned and performs the appropriate operations.
This would appear to me to be the most transparent to users. Although the
branch looks like a performance hit, testing so far on some specific code
supports the argument that the speed gains from SSE outweigh the branch
cost; more testing is needed, I guess.
c) Robert suggested that SSE-enabled array operators (e.g. providing a
cross-product operator for Vec3Array) might be appropriate and provide the
best speed improvement for those who want it. Certainly, large array-type
data sets are where using SSE gives the biggest performance improvement.
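Option a) might look roughly like the following sketch. AlignedVec4f is a hypothetical name, not an existing OSG class, and the sketch assumes a C++11 compiler (for alignas) with SSE enabled. Because alignas(16) places the storage on a 16-byte boundary, the aligned load/store intrinsics (_mm_load_ps/_mm_store_ps) can be used unconditionally.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, ...
#include <cassert>
#include <cstdint>      // std::uintptr_t

// Hypothetical class, not part of OSG: a 4-float vector whose storage is
// guaranteed (by alignas) to start on a 16-byte boundary.
class alignas(16) AlignedVec4f
{
public:
    AlignedVec4f() : _v{0.0f, 0.0f, 0.0f, 0.0f} { checkAlignment(); }
    AlignedVec4f(float x, float y, float z, float w) : _v{x, y, z, w} { checkAlignment(); }

    float operator[](int i) const { return _v[i]; }
    const float* ptr() const { return _v; }

    // Component-wise sum; _mm_load_ps/_mm_store_ps require 16-byte alignment.
    AlignedVec4f operator+(const AlignedVec4f& rhs) const
    {
        AlignedVec4f r;
        _mm_store_ps(r._v, _mm_add_ps(_mm_load_ps(_v), _mm_load_ps(rhs._v)));
        return r;
    }

    // Component-wise product (not a dot product).
    AlignedVec4f operator*(const AlignedVec4f& rhs) const
    {
        AlignedVec4f r;
        _mm_store_ps(r._v, _mm_mul_ps(_mm_load_ps(_v), _mm_load_ps(rhs._v)));
        return r;
    }

private:
    // Debug check: catches instances that ended up misaligned
    // (e.g. heap-allocated with a pre-C++17 operator new).
    void checkAlignment() const
    {
        assert((reinterpret_cast<std::uintptr_t>(_v) & 15u) == 0);
    }

    float _v[4];
};
```

Stack and static instances get the alignment automatically. Heap allocation of over-aligned types is only guaranteed to honour alignas from C++17 onwards, so older code would need an aligned allocator.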
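Option b) amounts to a run-time alignment test in front of the SSE code. A minimal sketch, with hypothetical helper names (isAligned16, addVec4Branching) that are not OSG API: the aligned intrinsics are used only when every pointer sits on a 16-byte boundary, otherwise the unaligned variants are used.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstdint>      // std::uintptr_t

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper, not OSG API: out = a + b for 4 floats, choosing the
// aligned or unaligned SSE load/store at run time.
// Returns true if the aligned path was taken (handy for testing).
inline bool addVec4Branching(const float* a, const float* b, float* out)
{
    assert(a && b && out);
    if (isAligned16(a) && isAligned16(b) && isAligned16(out))
    {
        // Aligned path: _mm_load_ps/_mm_store_ps require 16-byte alignment.
        _mm_store_ps(out, _mm_add_ps(_mm_load_ps(a), _mm_load_ps(b)));
        return true;
    }
    // Fallback path: unaligned loads/stores work for any address.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return false;
}
```

Whether the extra test costs more than it saves is exactly what needs measuring on real workloads.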
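For option c), an array-level cross product could be sketched as below. Vec3 is a stand-in struct assumed to match a tightly packed three-float vector, and crossSSE/crossArrays are hypothetical names, not OSG API. Each element is packed into an SSE register as (x, y, z, 0) and the cross product is computed with three shuffles, two multiplies and one subtraction.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_shuffle_ps, _MM_SHUFFLE, ...
#include <cassert>
#include <cstddef>      // std::size_t
#include <vector>

// Stand-in for a 3-float vector such as osg::Vec3 (assumption: 12 bytes,
// no padding), so the example does not depend on OSG headers.
struct Vec3 { float x, y, z; };

// Cross product of two vectors held in SSE registers as (x, y, z, 0).
inline __m128 crossSSE(__m128 a, __m128 b)
{
    // Rotate components: (x, y, z, w) -> (y, z, x, w).
    const __m128 aYZX = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 bYZX = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));
    // c = (ax*by - ay*bx, ay*bz - az*by, az*bx - ax*bz, 0)
    const __m128 c = _mm_sub_ps(_mm_mul_ps(a, bYZX), _mm_mul_ps(aYZX, b));
    // Rotate again: (ay*bz - az*by, az*bx - ax*bz, ax*by - ay*bx, 0).
    return _mm_shuffle_ps(c, c, _MM_SHUFFLE(3, 0, 2, 1));
}

// Hypothetical array operator, not OSG API: out[i] = cross(a[i], b[i]).
inline void crossArrays(const std::vector<Vec3>& a, const std::vector<Vec3>& b,
                        std::vector<Vec3>& out)
{
    assert(a.size() == b.size());
    out.resize(a.size());
    alignas(16) float tmp[4];
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // _mm_setr_ps builds (x, y, z, 0) without reading past a 12-byte element.
        const __m128 va = _mm_setr_ps(a[i].x, a[i].y, a[i].z, 0.0f);
        const __m128 vb = _mm_setr_ps(b[i].x, b[i].y, b[i].z, 0.0f);
        _mm_store_ps(tmp, crossSSE(va, vb));
        out[i] = Vec3{tmp[0], tmp[1], tmp[2]};
    }
}
```

Packing and unpacking each 12-byte element adds overhead. Storing the data as separate x, y and z arrays (structure of arrays) would probably let each instruction work on four vectors at once, at the cost of a layout change.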
This question includes the possibility of linking out to, or pulling source
code out of, an external optimised math library.
Any other suggestions?
*Question 3 : (possibly the biggest) Should the core OSG include SSE?*
There are several downsides to including SSE. Firstly, cross-platform
provision of SSE may be tricky due to the way different compilers declare
aligned data, and how SSE instructions are used within the code. I personally
don't have much experience here, so any feedback on cross-platform issues
would be useful.
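As an illustration of the compiler differences, a sketch of how aligned types and aligned heap memory are typically requested: MSVC uses __declspec(align(16)) and _aligned_malloc, GCC and Clang use __attribute__((aligned(16))) and POSIX posix_memalign, and C++11 later standardised alignas. The macro EXAMPLE_ALIGN16 and the helpers allocAligned16/freeAligned16 are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // posix_memalign, std::free
#if defined(_WIN32)
#  include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical macro: request 16-byte alignment for a type, per compiler.
#if defined(_MSC_VER)
#  define EXAMPLE_ALIGN16 __declspec(align(16))
#elif defined(__GNUC__)  // GCC, and Clang, which also defines __GNUC__
#  define EXAMPLE_ALIGN16 __attribute__((aligned(16)))
#else
#  define EXAMPLE_ALIGN16 alignas(16)  // standard C++11 spelling
#endif

// The macro goes after the struct keyword, a position all three spellings accept.
struct EXAMPLE_ALIGN16 PackedVec4
{
    float v[4];
};

// Hypothetical helper: 16-byte-aligned heap memory, per platform.
inline void* allocAligned16(std::size_t bytes)
{
    void* p = nullptr;
#if defined(_WIN32)
    p = _aligned_malloc(bytes, 16);
#else
    if (posix_memalign(&p, 16, bytes) != 0)
        p = nullptr;
#endif
    assert(p == nullptr || (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0);
    return p;
}

// Memory from allocAligned16 must be released with the matching function.
inline void freeAligned16(void* p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    std::free(p);
#endif
}
```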
Secondly, code readability drops, and the "use the source" argument becomes
weaker when many users might not know much about SSE.
So - your opinion, experience and suggestions welcome!