[osg-users] [osg-submissions] Matrixf multiply Optimization
Gordon Tomlinson
gordon at gordon-tomlinson.com
Sat Jul 26 19:05:29 PDT 2008
Can you not use an alignment #pragma around the struct to force the alignment
size?
#pragma pack( push, 16 )
union
{
struct
{
__m128 _R0,_R1,_R2,_R3;
};
value_type _mat[4][4];
};
#pragma pack( pop )
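
A minimal sketch related to the question above, assuming a C++11 compiler (which postdates this thread; at the time MSVC offered `__declspec(align(16))` for the same purpose). `#pragma pack` only caps the alignment of members, so on its own it does not raise a struct of floats to 16 bytes, whereas `alignas(16)` does. The names `AlignedMatrix`, `PackedMatrix` and `isAligned16` are hypothetical and are not part of OSG.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the matrix storage discussed in this thread.
// alignas(16) raises the alignment of the whole type to 16 bytes.
struct alignas(16) AlignedMatrix
{
    float _mat[4][4];
};

static_assert(alignof(AlignedMatrix) == 16, "type should be 16-byte aligned");
static_assert(sizeof(AlignedMatrix) == 64, "16 floats of 4 bytes, no padding");

// For comparison: pack(push, 16) only caps member alignment at 16, so a
// struct of floats keeps the natural alignment of float.
#pragma pack(push, 16)
struct PackedMatrix
{
    float _mat[4][4];
};
#pragma pack(pop)

static_assert(alignof(PackedMatrix) == alignof(float),
              "pack does not raise alignment");

// Returns true when p sits on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Small runtime check: a local AlignedMatrix lands on a 16-byte boundary.
inline void checkStackAlignment()
{
    AlignedMatrix m{};
    assert(isAligned16(&m));
    (void)m;
}
```

A union containing `__m128` members, as in the snippet above, already picks up 16-byte alignment from `__m128` itself, so the pragma is not what provides the alignment there.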
__________________________________________________________
Gordon Tomlinson
__________________________________________________________
-----Original Message-----
From: osg-submissions-bounces at lists.openscenegraph.org
[mailto:osg-submissions-bounces at lists.openscenegraph.org] On Behalf Of James
Killian
Sent: Saturday, July 26, 2008 7:23 PM
To: OpenSceneGraph Submissions
Subject: Re: [osg-submissions] Matrixf multiply Optimization
That is cool if that is all that needs to be fixed... I'll make a generic
version of F32vec4, and include it next submission to see if it can build on
other platforms.
James Killian
----- Original Message -----
From: "David Guthrie" <davidguthrie at cox.net>
To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
Sent: Friday, July 25, 2008 9:07 PM
Subject: Re: [osg-submissions] Matrixf multiply Optimization
>I looked at the code, and it should work cross platform, at least for
>Intel CPUs. The fvec.h header doesn't seem to exist, but from what I can
>tell, it doesn't have any magic in it. The few types you used may be easy
>to just replace. They seemed just to be unions, anyway.
>
> David
>
> On Jul 25, 2008, at 5:49 PM, James Killian wrote:
>
>>
>> It is good to hold off as this is still work in progress. In the
>> meantime, what would be cool is for others to code review the work I've
>> checked in thus far. If I recall, the FFmpeg community has found a way to
>> use intrinsics in a platform-independent way; once I get the win32
>> version polished I may research that.
>>
>> For anyone interested, the C version of the matrix multiply uses 64
>> multiplies and adds, while the SSE version uses only 16 of each.
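
To illustrate the operation counts mentioned above, here is a minimal sketch of a 4x4 row-major float multiply, assuming an x86 target with SSE and 16-byte aligned arrays. It is not the code from the submission; `mult4x4Scalar` and `mult4x4SSE` are hypothetical names. The scalar loop performs 64 multiplies and 64 adds, while this SSE variant issues 16 packed multiplies and 12 packed adds, each working on four floats at once.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Scalar reference: result = a * b for row-major 4x4 matrices.
// 4 * 4 * 4 = 64 multiplies, and 64 adds when accumulating from zero.
inline void mult4x4Scalar(const float* a, const float* b, float* result)
{
    for (int i = 0; i < 4; ++i)
    {
        for (int j = 0; j < 4; ++j)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[4 * i + k] * b[4 * k + j];
            result[4 * i + j] = sum;
        }
    }
}

// SSE sketch: row i of the result is a weighted sum of the rows of b,
// with the weights taken from row i of a.
// All three pointers are assumed to be 16-byte aligned.
inline void mult4x4SSE(const float* a, const float* b, float* result)
{
    // Load the four rows of b once.
    const __m128 b0 = _mm_load_ps(b + 0);
    const __m128 b1 = _mm_load_ps(b + 4);
    const __m128 b2 = _mm_load_ps(b + 8);
    const __m128 b3 = _mm_load_ps(b + 12);

    for (int i = 0; i < 4; ++i)
    {
        // Broadcast each element of row i of a across a register and
        // scale the matching row of b: 4 multiplies and 3 adds per row.
        __m128 r = _mm_mul_ps(_mm_set1_ps(a[4 * i + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[4 * i + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[4 * i + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[4 * i + 3]), b3));
        _mm_store_ps(result + 4 * i, r);
    }
}
```

The aligned load and store (`_mm_load_ps`, `_mm_store_ps`) are the reason the 16-byte alignment of the matrix storage matters in the rest of this thread; `_mm_loadu_ps` and `_mm_storeu_ps` would accept unaligned data instead.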
>>
>> With regard to going in and out of SSE, I tried this:
>> union
>> {
>> struct
>> {
>> __m128 _R0,_R1,_R2,_R3;
>> };
>> value_type _mat[4][4];
>> };
>>
>> And this works, as it implicitly forces the array to be 16-byte aligned.
>> Unfortunately, I ran into problems where some code that was using the
>> matrix in a vector would throw compiler errors saying it can't align it.
>> (I may revisit that case and see why that is.)
>>
>>
>> What I am hoping will happen is that this new code will work out, and we
>> can gradually transition some of the most used pieces to take advantage
>> of the instruction set (platform independent, of course).
>>
>>
>>
>> ----- Original Message -----
>> From: "Robert Osfield" <robert.osfield at gmail.com>
>> To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
>> Sent: Friday, July 25, 2008 3:09 PM
>> Subject: Re: [osg-submissions] Matrixf multiply Optimization
>>
>>
>>> Hi James,
>>>
>>> I will put this submission on hold till after 2.6 as we are now at
>>> feature freeze.
>>>
>>> W.r.t. SSE optimizations, in the past I have considered the possibility,
>>> but haven't taken the step - there have always been bigger bottlenecks to
>>> address. One concern I have is the cost of going in and out of SSE
>>> mode. I suspect the most efficient way to do it would be to provide
>>> array operators.
>>>
>>> I think these types of optimizations would be worth raising on the
>>> mailing lists, as there is a lot of knowledge out there and a whole range
>>> of topics.
>>>
>>> Robert.
>>>
>>> On Fri, Jul 25, 2008 at 8:55 PM, James Killian
>>> <James_Killian at hotmail.com> wrote:
>>>>
>>>> Attached are the 3 matrix cpp files that are merged with 8686. For
>>>> non-win32 platforms there is no change; for win32 platforms I've added
>>>> SSE optimization for Matrix::mult, premult and postmult. This is
>>>> currently the first draft, which will yield about 35-40% improvement
>>>> over matrixf or matrixd. I may pursue alignment strategies which have
>>>> yielded 50% improvement (this is yet to come). I also may want to look
>>>> to improve premult.
>>>>
>>>> Our game spends approximately 25% of all processing in these functions
>>>> (the KBDtree optimization is enabled), so if anyone else is doing the
>>>> same kind of stresses, hopefully you should see improvement as well.
>>>>
>>>> There may be a way to enable intrinsic code across all platforms. If so,
>>>> we may want to pursue that.
>>>> You should be able to drop these files right in and build. (Win32 users,
>>>> be sure to use matrix float in the cmake configuration.)
>>>> I did not try to optimize Matrixd; I don't think intrinsics can offer
>>>> much improvement for it (yet), so it has not changed.
>>>>
>>>> _______________________________________________
>>>> osg-submissions mailing list
>>>> osg-submissions at lists.openscenegraph.org
>>>> http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org