[osg-users] [osg-submissions] Matrixf multiply Optimization

Philip Taylor philipjt at ntlworld.com
Sun Jul 27 03:00:12 PDT 2008


As I understand it, the pragma alignment only applies to predefined
objects and data.

Dynamically allocated objects are created at the mercy of the memory
allocator (via new) which uses byte alignment, as it does not know anything
about pragma definitions.

The only approach is to dynamically create data with extra padding, and then
do some alignment coding, something like:

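	// needs #include <new> for placement new
	char* pBuffer = new char[ (200 * sizeof(int32)) + sizeof(int64) ];
	// round the address up to the next 8-byte boundary inside the padding
	size_t offset = (sizeof(int64) - ((size_t)pBuffer % sizeof(int64))) % sizeof(int64);
	int32* p = new (pBuffer + offset) int32[ 200 ];

	and then later

	delete [] pBuffer;  // DO NOT delete p
	pBuffer = 0;
	p = 0;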

instead of the simpler but unaligned

	int32* p = new int32[ 200 ];

You will have to experiment with the actual code required, as I am not too
certain, and my brain is still in bed.
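
For what it is worth, the padding idea can also be sketched with std::align
(standard C++11, in <memory>), which does the rounding for you. This is only
a sketch under assumptions: AlignedInts, allocateAligned, release and
isAligned are names made up for illustration, and std::int32_t stands in for
the int32 typedef used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper type: keeps the pointer new[] returned next to the
// aligned pointer that the rest of the code actually uses.
struct AlignedInts {
    char*         raw;   // what new[] returned; the only pointer to delete
    std::int32_t* data;  // aligned view into the same buffer
};

// Over-allocate by `alignment` bytes, then let std::align move the pointer
// forward to the next multiple of `alignment` (must be a power of two).
AlignedInts allocateAligned(std::size_t count, std::size_t alignment) {
    std::size_t bytes = count * sizeof(std::int32_t);
    std::size_t space = bytes + alignment;  // padding leaves room to shift
    char* raw = new char[space];
    void* p = raw;
    void* aligned = std::align(alignment, bytes, p, space);
    assert(aligned != nullptr);  // cannot fail with this much padding
    return AlignedInts{ raw, static_cast<std::int32_t*>(aligned) };
}

// Free through the raw pointer, never through the aligned one.
void release(AlignedInts& a) {
    delete [] a.raw;
    a.raw = nullptr;
    a.data = nullptr;
}

// True when p sits on a multiple of `alignment` bytes.
bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Usage would be along the lines of allocateAligned(200, 16), working through
the data member, and calling release() at the end. The point is the same as
in the snippet above: the raw pointer is the one that gets deleted.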


PhilT

-----Original Message-----
From: osg-users-bounces at lists.openscenegraph.org
[mailto:osg-users-bounces at lists.openscenegraph.org]On Behalf Of Gordon
Tomlinson
Sent: 27 July 2008 03:05
To: osg-users at lists.openscenegraph.org
Subject: Re: [osg-users] [osg-submissions] Matrixf multiply Optimization



Can you not use an alignment #pragma around the struct to force alignment
size ?


#pragma pack( push, 16 )

 union
 {
    struct
    {
        __m128 _R0,_R1,_R2,_R3;
    };
    value_type _mat[4][4];
 };

#pragma pack( pop )
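
One caveat worth checking on your own compiler: as far as I know, #pragma
pack only caps the alignment of members and does not raise the alignment of a
type, so in the union above it is the __m128 members that bring the 16-byte
alignment, not the pragma. A minimal sketch of the difference, using C++11
alignas and struct names (Plain, Packed16, Aligned16) that are made up for
illustration:

```cpp
// Baseline: four floats have the natural alignment of float.
struct Plain { float v[4]; };

// pack(16) sets an upper bound on member alignment; nothing is raised here.
#pragma pack(push, 16)
struct Packed16 { float v[4]; };
#pragma pack(pop)

// alignas raises the alignment requirement of the type itself to 16 bytes.
struct alignas(16) Aligned16 { float v[4]; };

static_assert(alignof(Plain) == alignof(float), "natural alignment");
static_assert(alignof(Packed16) == alignof(Plain), "pack does not raise alignment");
static_assert(alignof(Aligned16) == 16, "alignas does raise alignment");
```

Note that alignas only covers objects with static or automatic storage here;
whether heap allocations honour it depends on the allocator and the language
version in use.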


__________________________________________________________
Gordon Tomlinson
__________________________________________________________


-----Original Message-----
From: osg-submissions-bounces at lists.openscenegraph.org
[mailto:osg-submissions-bounces at lists.openscenegraph.org] On Behalf Of James
Killian
Sent: Saturday, July 26, 2008 7:23 PM
To: OpenSceneGraph Submissions
Subject: Re: [osg-submissions] Matrixf multiply Optimization


That is cool if that is all that needs to be fixed... I'll make a generic
version of F32vec4, and include it next submission to see if it can build on
other platforms.

James Killian
----- Original Message -----
From: "David Guthrie" <davidguthrie at cox.net>
To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
Sent: Friday, July 25, 2008 9:07 PM
Subject: Re: [osg-submissions] Matrixf multiply Optimization


>I looked at the code, and it should work cross-platform, at least for
>Intel CPUs.  The fvec.h header doesn't seem to exist, but from what I can
>tell, it doesn't have any magic in it.  The few types you used may be easy
>to just replace.  They seemed just to be unions, anyway.
>
> David
>
> On Jul 25, 2008, at 5:49 PM, James Killian wrote:
>
>>
>> It is good to hold off as this is still a work in progress.  In the
>> meantime, what would be cool is for others to code review the work I've
>> checked in thus far.  If I recall, the FFmpeg community has found a way to
>> use intrinsics in a way that is platform independent; once I get the win32
>> version polished I may research that.
>>
>> For anyone interested the C version of the matrix multiply uses 64
>> multiplies and adds, while the SSE version uses only 16 of each.
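
To make the operation count concrete, here is a rough sketch of a row-major
4x4 multiply with SSE intrinsics next to the plain C loop. It is not the
submitted code: multSSE and multScalar are hypothetical names, and unaligned
loads and stores are used so the sketch makes no alignment assumptions. Each
output row takes 4 packed multiplies and 3 packed adds, so the whole product
is 16 multiply and 12 add instructions, against 64 scalar multiplies in the
loop version.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics, x86 only

// Reference version: out = a * b, one scalar multiply per term.
void multScalar(const float a[4][4], const float b[4][4], float out[4][4]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[i][k] * b[k][j];
            }
            out[i][j] = sum;
        }
    }
}

// SSE version: row i of the result is the sum over k of a[i][k] times
// row k of b, so each result row is built from four packed multiplies.
void multSSE(const float a[4][4], const float b[4][4], float out[4][4]) {
    // Load the four rows of b once; loadu has no alignment requirement.
    const __m128 b0 = _mm_loadu_ps(b[0]);
    const __m128 b1 = _mm_loadu_ps(b[1]);
    const __m128 b2 = _mm_loadu_ps(b[2]);
    const __m128 b3 = _mm_loadu_ps(b[3]);
    for (int i = 0; i < 4; ++i) {
        // Broadcast one element of a across all four lanes, scale a row of b.
        __m128 r = _mm_mul_ps(_mm_set1_ps(a[i][0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i][1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i][2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i][3]), b3));
        _mm_storeu_ps(out[i], r);
    }
}
```

With 16-byte aligned storage the loadu/storeu calls could become the aligned
_mm_load_ps/_mm_store_ps variants, which is where the alignment discussion
in this thread comes in.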
>>
>> In regards to going in and out of SSE I tried this:
>> union
>> {
>>    struct
>>    {
>>        __m128 _R0,_R1,_R2,_R3;
>>    };
>>    value_type _mat[4][4];
>> };
>>
>> And this works, as it implicitly forces the array to be 16-byte aligned.
>> Unfortunately I ran into problems where some code that was using the
>> matrix in a vector would throw compiler errors saying it can't align it.
>> (I may revisit that case and see why that is.)
>>
>>
>> What I am hoping will happen is that this new code will work out, and we
>> can gradually transition some of the most used pieces to take advantage
>> of the instruction set (platform independent, of course).
>>
>>
>>
>> ----- Original Message -----
>> From: "Robert Osfield" <robert.osfield at gmail.com>
>> To: "OpenSceneGraph Submissions" <osg-submissions at lists.openscenegraph.org>
>> Sent: Friday, July 25, 2008 3:09 PM
>> Subject: Re: [osg-submissions] Matrixf multiply Optimization
>>
>>
>>> Hi James,
>>>
>>> I will put this submission on hold till after 2.6, as we are now at
>>> feature freeze.
>>>
>>> W.r.t. SSE optimizations, in the past I have considered the possibility,
>>> but haven't taken the step - there have always been bigger bottlenecks to
>>> address.  One concern I have is the cost of going in and out of SSE
>>> mode.  I suspect the most efficient way to do it would be to provide
>>> array operators.
>>>
>>> I think these types of optimizations would be worth raising on the
>>> mailing lists, as there is a lot of knowledge out there on a whole range
>>> of topics.
>>>
>>> Robert.
>>>
>>> On Fri, Jul 25, 2008 at 8:55 PM, James Killian
>>> <James_Killian at hotmail.com> wrote:
>>>>
>>>> Attached are the 3 matrix cpp files, merged with 8686.  For non-win32
>>>> platforms there is no change; for win32 platforms I've added SSE
>>>> optimization for Matrix::mult, premult and postmult.  This is currently
>>>> the first draft, which will yield about a 35-40% improvement over
>>>> matrixf or matrixd.  I may pursue alignment strategies, which have
>>>> yielded a 50% improvement (this is yet to come).  I also may want to
>>>> look at improving premult.
>>>>
>>>> Our game spends approximately 25% of all processing in these functions
>>>> (the KBDtree optimization is enabled), so if anyone else is applying the
>>>> same kind of stress, hopefully you should see an improvement as well.
>>>>
>>>> There may be a way to enable intrinsic code across all platforms; if so,
>>>> we may want to pursue that.
>>>> You should be able to drop these files right in and build. (Win32 users,
>>>> be sure to use matrix float in the cmake configuration.)
>>>> I did not try to optimize Matrixd, as I don't think intrinsics can offer
>>>> much improvement for it (yet), so it has not changed.
>>>>
>>>> _______________________________________________
>>>> osg-submissions mailing list
>>>> osg-submissions at lists.openscenegraph.org
>>>>
>>>> http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org

_______________________________________________
osg-users mailing list
osg-users at lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org



