[osg-users] [osg-submissions] Matrixf multiply Optimization

Gordon Tomlinson gordon at gordon-tomlinson.com
Sat Jul 26 19:05:29 PDT 2008


Can you not use an alignment #pragma around the struct to force the
alignment size?


#pragma pack( push, 16 )

 union
 {
    struct
    {
        __m128 _R0,_R1,_R2,_R3;
    };
    value_type _mat[4][4];
 };

#pragma pack( pop )
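
One caveat on the suggestion above: as documented for MSVC and GCC, #pragma pack sets a maximum alignment for members, so it can lower alignment but cannot raise it to 16. A minimal sketch of one alternative follows, assuming C++11 and an x86 compiler that ships <xmmintrin.h>. The names AlignedMatrixf and isAligned16 are hypothetical and are not part of OSG. The sketch also avoids the anonymous struct inside the union, which is a compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_store_ps

// Hypothetical stand-in for the matrix storage discussed in this thread.
// alignas(16) raises the alignment of the whole object to 16 bytes.
struct alignas(16) AlignedMatrixf
{
    typedef float value_type;
    value_type _mat[4][4];

    // Aligned load of row i. This is valid because the object starts on a
    // 16-byte boundary and every row is exactly 16 bytes long.
    __m128 row(int i) const { return _mm_load_ps(_mat[i]); }

    // Aligned store of an SSE register into row i.
    void setRow(int i, __m128 r) { _mm_store_ps(_mat[i], r); }
};

static_assert(alignof(AlignedMatrixf) == 16, "matrix must be 16-byte aligned");
static_assert(sizeof(AlignedMatrixf) == 64, "matrix must hold 16 packed floats");

// Returns true when p sits on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

With this layout the float array stays the only storage, and the SSE view is obtained through row() and setRow() instead of through union members.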


__________________________________________________________
Gordon Tomlinson 
__________________________________________________________


-----Original Message-----
From: osg-submissions-bounces at lists.openscenegraph.org
[mailto:osg-submissions-bounces at lists.openscenegraph.org] On Behalf Of James Killian
Sent: Saturday, July 26, 2008 7:23 PM
To: OpenSceneGraph Submissions
Subject: Re: [osg-submissions] Matrixf multiply Optimization


That is cool if that is all that needs to be fixed... I'll make a generic
version of F32vec4 and include it in the next submission to see if it can
build on other platforms.
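
As an illustration of what a generic F32vec4 replacement might look like, here is a minimal sketch, assuming only <xmmintrin.h> (available in MSVC, GCC and Clang on x86). Vec4f and mult4x4 are hypothetical names, not the submitted code and not the fvec.h API. The multiply treats the matrices as row-major and builds each output row as a linear combination of the rows of b, which costs 16 packed multiplies and 12 packed adds in total.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical portable stand-in for the F32vec4 class from fvec.h:
// a thin wrapper around one SSE register holding four floats.
struct Vec4f
{
    __m128 v;

    explicit Vec4f(__m128 m) : v(m) {}

    // Broadcast one scalar into all four lanes.
    static Vec4f splat(float s) { return Vec4f(_mm_set1_ps(s)); }

    // Unaligned load/store, so callers need no special alignment.
    static Vec4f load(const float* p) { return Vec4f(_mm_loadu_ps(p)); }
    void store(float* p) const { _mm_storeu_ps(p, v); }
};

inline Vec4f operator*(const Vec4f& a, const Vec4f& b)
{
    return Vec4f(_mm_mul_ps(a.v, b.v));
}

inline Vec4f operator+(const Vec4f& a, const Vec4f& b)
{
    return Vec4f(_mm_add_ps(a.v, b.v));
}

// out = a * b for row-major 4x4 float matrices.
// out[i] = a[i][0]*b[0] + a[i][1]*b[1] + a[i][2]*b[2] + a[i][3]*b[3],
// where b[k] is a whole row of b held in one register.
inline void mult4x4(const float a[4][4], const float b[4][4], float out[4][4])
{
    const Vec4f b0 = Vec4f::load(b[0]);
    const Vec4f b1 = Vec4f::load(b[1]);
    const Vec4f b2 = Vec4f::load(b[2]);
    const Vec4f b3 = Vec4f::load(b[3]);

    for (int i = 0; i < 4; ++i)
    {
        // 4 packed multiplies and 3 packed adds per output row.
        const Vec4f r = Vec4f::splat(a[i][0]) * b0
                      + Vec4f::splat(a[i][1]) * b1
                      + Vec4f::splat(a[i][2]) * b2
                      + Vec4f::splat(a[i][3]) * b3;
        r.store(out[i]);
    }
}
```

The unaligned loads keep the sketch independent of how the matrix is stored; if the storage is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be used instead.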

James Killian
----- Original Message ----- 
From: "David Guthrie" <davidguthrie at cox.net>
To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
Sent: Friday, July 25, 2008 9:07 PM
Subject: Re: [osg-submissions] Matrixf multiply Optimization


>I looked at the code, and it should work cross-platform, at least for
>Intel CPUs.  The fvec.h header doesn't seem to exist, but from what I can
>tell, it doesn't have any magic in it.  The few types you used may be easy
>to just replace.  They seemed just to be unions, anyway.
>
> David
>
> On Jul 25, 2008, at 5:49 PM, James Killian wrote:
>
>>
>> It is good to hold off as this is still a work in progress.  In the
>> meantime, what would be cool is for others to code review the work I've
>> checked in thus far.  If I recall, the FFmpeg community has found a way to
>> use intrinsics in a way that is platform independent; once I get the win32
>> version polished I may research that.
>>
>> For anyone interested the C version of the matrix multiply uses 64
>> multiplies and adds, while the SSE version uses only 16 of each.
>>
>> In regards to going in and out of SSE I tried this:
>> union
>> {
>>    struct
>>    {
>>        __m128 _R0,_R1,_R2,_R3;
>>    };
>>    value_type _mat[4][4];
>> };
>>
>> And this works, as it forces the array to be 16-byte aligned implicitly...
>> unfortunately I ran into problems where some code that was using the matrix
>> in a vector would throw compiler errors saying it can't align it.  (I may
>> revisit that case and see why that is.)
>>
>>
>> What I am hoping will happen is that this new code will work out, and we
>> can gradually transition some of the most used pieces to take advantage of
>> the instruction set (platform independent, of course).
>>
>>
>>
>> ----- Original Message -----
>> From: "Robert Osfield" <robert.osfield at gmail.com>
>> To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
>> Sent: Friday, July 25, 2008 3:09 PM
>> Subject: Re: [osg-submissions] Matrixf multiply Optimization
>>
>>
>>> Hi James,
>>>
>>> I will put this submission on hold till after 2.6 as we are now at
>>> feature freeze.
>>>
>>> W.r.t. SSE optimizations, in the past I have considered the possibility,
>>> but haven't taken the step - there have always been bigger bottlenecks to
>>> address.  One concern I have is the cost of going in and out of SSE
>>> mode.  I suspect the most efficient way to do it would be to provide
>>> array operators.
>>>
>>> I think these types of optimizations would be worth raising on the
>>> mailing lists as there is a lot of knowledge out there and a whole range
>>> of topics.
>>>
>>> Robert.
>>>
>>> On Fri, Jul 25, 2008 at 8:55 PM, James Killian
>>> <James_Killian at hotmail.com> wrote:
>>>>
>>>> Attached are the 3 matrix cpp files that are merged with 8686.  For
>>>> non-win32 platforms there is no change; for win32 platforms I've added
>>>> SSE optimization for Matrix::mult, premult and postmult.  This is
>>>> currently the first draft, which will yield about 35-40% improvement over
>>>> matrixf or matrixd.  I may pursue alignment strategies which have yielded
>>>> 50% improvement (this is yet to come).  I may also want to look at
>>>> improving premult.
>>>>
>>>> Our game spends approximately 25% of all processing in these functions
>>>> (the KBDtree optimization is enabled), so if anyone else is applying the
>>>> same kind of stress, hopefully you should see improvement as well.
>>>>
>>>> There may be a way to enable intrinsic code across all platforms.  If so,
>>>> we may want to pursue that.
>>>> You should be able to drop these files right in and build.  (Win32 users,
>>>> be sure to use matrix float in the cmake configuration.)
>>>> I did not try to optimize Matrixd; I don't think intrinsics can offer
>>>> much improvement for it (yet), so it has not changed.
>>>>
>>>> _______________________________________________
>>>> osg-submissions mailing list
>>>> osg-submissions at lists.openscenegraph.org
>>>> http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org



