[osg-users] Using SSE within OSG

Benjamin Eikel osg at eikel.org
Tue Jul 29 06:06:25 PDT 2008


On Tuesday, 29 July 2008 14:28:18, Benjamin Eikel wrote:
> On Tuesday, 29 July 2008 14:04:59, David Spilling wrote:
> > Dear All,
>
> [...]
>
> > Any other suggestions?
> >
> > *Question 3 : (possibly the biggest) Should the core OSG include SSE?*
> > There are several downsides to including SSE. Firstly, cross-platform
> > provision of SSE may be tricky because compilers differ in how they
> > define aligned data and in how SSE instructions are used within the
> > code. I personally don't have much experience here, so any feedback on
> > cross-platform issues is useful.
> >
> > Secondly, the code readability drops, and the "use the source" argument
> > may be trickier when many might not know much SSE.
>
> Hello David,
>
> may I suggest that you check the assembler code that the compilers generate
> when compiling the OSG code? I have not done this for the OSG code, but I
> did it for another project some time ago. There I tried to optimize the
> performance of compositing images with attached depth buffers for sort-last
> rendering. Somehow I was not able to do much better than the compiler.
> In some rare cases my procedures were faster, but most of the time the
> compiler was the winner. The code generated by the compilers takes so many
> things into account (e.g. branch prediction by the processor, code
> reordering) that it is quite hard for a human programmer to beat them.
> For example, if you use g++ with -march=core2 -O3 (see the man page for a
> description of the parameters), the compiler automatically uses SSE or even
> SSE2, SSE3, etc. instructions. In some cases the compiler generates much
> better assembler code than a typical programmer would write. There are some
> cases, though, where manual improvements could yield better results.
> I have heard that the Intel C++ compiler is able to optimize even better.
> Furthermore, profiling first is a good approach. It might be reasonable to
> compare profiling data of the Math/Vector/Matrix classes with and without
> compiler optimizations and see whether some bottlenecks disappear when the
> optimizations are enabled.
>
> Regards,
> Benjamin
Hello,

I have an addition:
With gcc/g++ you can use profile-guided optimization: build with
-fprofile-generate, run the program to collect profile data, then rebuild
with -fprofile-use so that the compiler can make better decisions (e.g. about
loop unrolling). Maybe this can improve the performance further.
If you want performance and are willing to sacrifice safety and precision for
it, you may even think about -ffast-math (which may be dangerous).
The options are explained on the gcc/g++ man page and in the online manual [1].
There may be similar options for other compilers.
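
To illustrate why -ffast-math may be dangerous: among other things, it allows
the compiler to reorder floating-point operations as if they were
associative. The following is only a minimal sketch of the underlying effect,
not of any OSG code. The function names are made up for illustration, and the
volatile temporaries are only there to force each partial sum to be rounded
to float.

```cpp
// Hypothetical helpers (not part of OSG) showing that float addition is
// not associative, so reordering sums can change the result.

// Computes (a + b) + c.
float sumLeftToRight(float a, float b, float c) {
    volatile float partial = a + b;  // volatile forces rounding to float
    return partial + c;
}

// Computes a + (b + c).
float sumRightToLeft(float a, float b, float c) {
    volatile float partial = b + c;  // volatile forces rounding to float
    return a + partial;
}
```

With IEEE single precision, sumLeftToRight(1.0e8f, -1.0e8f, 1.0f) yields 1,
while sumRightToLeft(1.0e8f, -1.0e8f, 1.0f) yields 0, because 1 is lost when
it is added to -1.0e8f. A compiler that is allowed to reassociate may pick
either order.
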
And please do not get me wrong. I do not want to stop your efforts to improve
the performance of OSG; far from it! But putting assembler code into the
project decreases the readability and maintainability of the code.
Furthermore, it might not improve the speed at all. I just want to suggest
that you first exhaust the capabilities of modern compilers as far as
possible. If you still see bottlenecks after that, it might make sense to
add manual performance tuning.
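
As a sketch of what "letting the compiler do it" could look like: a plain
loop such as the one below is a typical candidate for automatic
vectorization. The function scaleAndAdd and the file name kernel.cpp are
hypothetical and not taken from OSG; whether the compiler really emits SSE
instructions for it has to be checked in the generated assembler.

```cpp
#include <cstddef>

// Hypothetical example (not OSG code): y[i] += a * x[i] in plain C++.
// The caller must make sure that x and y each hold at least count
// elements and that the two arrays do not overlap.
//
// To inspect the generated assembler with g++:
//   g++ -O3 -march=core2 -S kernel.cpp -o kernel.s
// For profile-guided optimization, build with -fprofile-generate, run
// the program on typical input, then rebuild with -fprofile-use.
void scaleAndAdd(float a, const float* x, float* y, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        y[i] += a * x[i];
    }
}
```

If the assembler for such a loop already contains packed SSE instructions
(e.g. mulps and addps), a hand-written version has little left to gain.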

Regards,
Benjamin

[1] http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Optimize-Options.html#Optimize-Options

>
> > So - your opinion, experience and suggestions welcome!
> >
> > David