Homesource Forums

Homeworld Source Editing Talk

All times are UTC - 5 hours




 Post subject: Proposed assembly changes
Posted: Sat Jan 16, 2010 12:21 am
coder

Joined: Wed Oct 01, 2008 2:55 pm
Posts: 102
Location: Michigan
OS X requires position-independent assembly code (this should be the same as compiling with the -fPIC flag). This means the ebx register must be saved before it is used and restored afterwards. After looking it over, I can change most of the Homeworld assembly code to meet this requirement. Rather than #ifdef-ing the modified code so that it is only used on OS X, I would rather just change it, as the changes should not affect Linux. I will post diffs here before committing so that people can point out problems. In the end the assembly should optimize better, as position-independent code supposedly does. Comments are appreciated. Here's the first diff.

Code:
Index: /Users/axcess/Documents/Xcode Projects/trunk/src/Game/Matrix.c
===================================================================
--- /Users/axcess/Documents/Xcode Projects/trunk/src/Game/Matrix.c   (revision 695)
+++ /Users/axcess/Documents/Xcode Projects/trunk/src/Game/Matrix.c   (working copy)
@@ -445,7 +445,7 @@
         pop       esi
         pop       edi
     }
-#elif defined (__GNUC__) && defined (__i386__) && !defined (_MACOSX_FIX_86)
+#elif defined (__GNUC__) && defined (__i386__) && !defined (_MACOSX_86)
/* This block of code is the modified version of the code above.
  * It was safe to use upto gcc 4.1, but seems to generate a
  * problem once we use -O2 with gcc 4.3  */
@@ -457,7 +457,11 @@
     matrix matResult[]={{0,0,0, 0,0,0, 0,0,0}};
 
     __asm__ __volatile__ (
+      "    pushl     %%edi\n"
+      "    pushl      %%esi\n"
         "    pushl     %%ebp\n"
+      "    pushl      %%ebx\n"
+      "    movl      %0, %%ebx\n"
         "    movl      $-3, %%eax\n"
         "    jmp       mat_x_mat_l1%=\n"
         "    .align    4\n"
@@ -501,14 +505,17 @@
         "    incl      %%eax\n"
         "    jne       mat_x_mat_l2%=\n"
         "    fstps     32(%%ebx, %%eax, "FSIZE_STR")\n"
+      "    popl      %%ebx\n"
         "    popl      %%ebp\n"
+      "    popl      %%esi\n"
+      "    popl      %%edi\n"
         :
-        : "b" (matResult), "c" (first), "d" (second)
-        : "eax", "edi", "esi" );
+        : "r" (matResult), "c" (first), "d" (second)
+        : "eax");

     memcpy(result, matResult, sizeof(struct matrix));

-#elif defined (__GNUC__) && defined (_X86_64)
+#elif defined (__GNUC__) && (defined (_X86_64) || defined (_MACOSX_86))
/* This is the AMD64 version of the above code but using the
  * xmm 128-bit SSE registers. It looks longer but should be a
  * lot quicker as most of the operations are in parallel.
@@ -724,8 +731,10 @@
         pop     edi
         pop     esi
     }
-#elif defined (__GNUC__) && defined (__i386__) && !defined (_MACOSX_FIX_86)
+#elif defined (__GNUC__) && defined (__i386__)
     __asm__ __volatile__ (
+      "    pushl    "DEST"\n"
+      "    movl    %1, "DEST"\n"
         "    flds    0*"FSIZE_STR"("SOURCE")\n"               /*s0*/
         "    fmuls   (0+0*3)*"FSIZE_STR"("MATRIX")\n"         /*a0*/
         "    flds    1*"FSIZE_STR"("SOURCE")\n"               /*s1 a0*/
@@ -771,8 +780,9 @@
         "    fxch    %%st(1)\n"                               /*d0 d2*/
         "    fstps   0*"FSIZE_STR"("DEST")\n"                 /*d2*/
         "    fstps   2*"FSIZE_STR"("DEST")\n"
+      "    popl    "DEST"\n"
         :
-        : "S" (vector), "b" (result), "D" (matrix) );
+        : "S" (vector), "r" (result), "D" (matrix) );
#else
     result->x = matrixdot(matrix->m11,matrix->m12,matrix->m13,vector->x,vector->y,vector->z);
     result->y = matrixdot(matrix->m21,matrix->m22,matrix->m23,vector->x,vector->y,vector->z);
@@ -855,8 +865,10 @@
         pop     edi
         pop     esi
     }
-#elif defined (__GNUC__) && defined (__i386__) && !defined (_MACOSX_FIX_86)
+#elif defined (__GNUC__) && defined (__i386__)
     __asm__ __volatile__ (
+      "    pushl    "DEST"\n"
+      "    movl    %1,"DEST"\n"
         "    flds    0*"FSIZE_STR"("SOURCE")\n"               /*s0*/
         "    fmuls   (0+0*3)*"FSIZE_STR"("MATRIX")\n"         /*a0*/
         "    flds    1*"FSIZE_STR"("SOURCE")\n"               /*s1 a0*/
@@ -902,8 +914,9 @@
         "    fxch    %%st(1)\n"                               /*d0 d2*/
         "    fstps   0*"FSIZE_STR"("DEST")\n"                 /*d2*/
         "    fstps   2*"FSIZE_STR"("DEST")\n"
+      "    popl    "DEST"\n"
         :
-        : "S" (vector), "b" (result), "D" (matrix) );
+        : "S" (vector), "r" (result), "D" (matrix) );
#else
     result->x = matrixdot(vector->x,vector->y,vector->z,matrix->m11,matrix->m21,matrix->m31);
     result->y = matrixdot(vector->x,vector->y,vector->z,matrix->m12,matrix->m22,matrix->m32);
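
The save-and-restore idea from this post can be sketched in isolation. This is a minimal sketch, not code from Matrix.c: `add_with_ebx_scratch` is a hypothetical helper, and pinning the operands to eax, ecx and edx is an assumption of this sketch, made so that the register allocator cannot hand ebx to one of them. On anything other than 32-bit x86 with GCC it falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper (not from Matrix.c): returns a + b.
 * On 32-bit x86 it uses ebx as scratch space to illustrate the
 * save/restore pattern; elsewhere it uses plain C. */
static int32_t add_with_ebx_scratch(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__i386__)
    int32_t sum;
    __asm__ __volatile__ (
        "    pushl   %%ebx\n"          /* save ebx before touching it   */
        "    movl    %1, %%ebx\n"      /* ebx = a (a arrives in ecx)    */
        "    addl    %2, %%ebx\n"      /* ebx += b (b arrives in edx)   */
        "    movl    %%ebx, %0\n"      /* copy the result out to eax    */
        "    popl    %%ebx\n"          /* restore ebx before leaving    */
        : "=a" (sum)                   /* output pinned to eax          */
        : "c" (a), "d" (b)             /* inputs pinned to ecx and edx  */
        : "cc");                       /* addl changes the flags        */
    return sum;
#else
    /* Assumption: any other target simply uses the C expression. */
    return a + b;
#endif
}
```

Calling `add_with_ebx_scratch(2, 3)` returns 5 on either path. Because ebx never appears as an operand or in the clobber list, the compiler is never asked to give it up; the cost is that the allocator has less freedom than it would with "r" constraints.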


 Post subject: Re: Proposed assembly changes
Posted: Sat Jan 16, 2010 10:01 am
coder

Joined: Tue Nov 07, 2006 4:40 am
Posts: 231
Hi.

Can you confirm for me which defines you'd see when compiling for OS X, please? I'm not sufficiently au fait with the OS X hardware to try to guess. Are you catering just for the Intel version of OS X?

I have a couple of problems with how you've rewritten some of the code. When you use the "r" constraint within the asm, you are requesting an unspecified general-purpose register, so it may well still try to use the b register (ebx). I'm not sure how this will interact with the PIC flags, but I would suggest not leaving it to chance.

Does the code I put in for the X86_64 asm work for you? I have a variant using just SSE and not the general purpose registers. To be honest it might be better to look at what's required and see if it is worth modifying the existing code or using C.
I only did the X86_64 as a programming exercise. It works perfectly well with the C code. :)

Aunxx.
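
The C fallback mentioned in this reply corresponds to the `#else` branch visible in the diff, which builds each result component from a `matrixdot` call. A minimal self-contained sketch of that approach follows. `vec3`, `mat3`, `dot3` and `mat3_mul_vec3` are hypothetical stand-ins, since the real `vector`, `matrix` and `matrixdot` definitions are not shown in this thread, and the third row (m31, m32, m33) is assumed by analogy with the two rows that do appear.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the project's vector and matrix types. */
typedef struct { float x, y, z; } vec3;
typedef struct {
    float m11, m12, m13;
    float m21, m22, m23;
    float m31, m32, m33;
} mat3;

/* Plays the role of matrixdot: (a1,a2,a3) . (b1,b2,b3). */
static float dot3(float a1, float a2, float a3,
                  float b1, float b2, float b3)
{
    return a1 * b1 + a2 * b2 + a3 * b3;
}

/* result = m * v, one row of m per result component. */
static void mat3_mul_vec3(vec3 *result, const mat3 *m, const vec3 *v)
{
    vec3 out;

    assert(result != NULL && m != NULL && v != NULL);

    /* Work in a local copy so that result may alias v. */
    out.x = dot3(m->m11, m->m12, m->m13, v->x, v->y, v->z);
    out.y = dot3(m->m21, m->m22, m->m23, v->x, v->y, v->z);
    out.z = dot3(m->m31, m->m32, m->m33, v->x, v->y, v->z);
    *result = out;
}
```

A version like this needs no register constraints at all, so it is unaffected by PIC; whether it is fast enough is a separate question that only profiling can answer.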


 Post subject: Re: Proposed assembly changes
Posted: Sat Jan 16, 2010 3:42 pm
coder

Joined: Wed Oct 01, 2008 2:55 pm
Posts: 102
Location: Michigan
Yes, I am only catering to the Intel versions of OS X, but as the last point release of OS X only worked on Intel, the PPC code will matter less and less over the next few years. (Mac users are more likely to keep their computers up to date.)

aunxx wrote:
I have a couple of problems with how you've rewritten some of the code. When you use the "r" constraint within the asm, you are requesting an unspecified general-purpose register, so it may well still try to use the b register (ebx). I'm not sure how this will interact with the PIC flags, but I would suggest not leaving it to chance.


On OS X, inline assembly will not compile if it asks for ebx, because that register is reserved for a special purpose. I assume the "r" constraint does not pick ebx when compiling on OS X, although it may on other platforms. In position-independent code ebx points to the Global Offset Table, so the compiler refuses to compile code that specifies the "b" constraint, because it would clobber ebx. (Why it can't just save ebx automatically, I don't know.)

Here is a reference: http://www.greyhat.ch/lab/downloads/pic.html

Supposedly, compiling on Linux with gcc and the -fPIC flag will produce the same results as on OS X. I have not been able to confirm this.
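
One way to check whether a given build is actually position independent is to test the `__PIC__` / `__pic__` macros, which GCC predefines when `-fpic` or `-fPIC` is in effect. The sketch below is only an illustration: `built_as_pic` and `ebx_is_reserved` are hypothetical helper names, and the second one assumes a compiler that reserves ebx as the PIC base register on 32-bit x86.

```c
/* Returns 1 if this translation unit was compiled as
 * position-independent code, 0 otherwise. */
static int built_as_pic(void)
{
#if defined(__PIC__) || defined(__pic__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if inline asm in this build should leave ebx alone.
 * Assumption: the compiler reserves ebx as the PIC base register
 * on 32-bit x86 whenever PIC is enabled. */
static int ebx_is_reserved(void)
{
#if defined(__i386__) && (defined(__PIC__) || defined(__pic__))
    return 1;
#else
    return 0;
#endif
}
```

Building the same file on Linux with and without `-fPIC` and comparing what `built_as_pic()` returns would be one cheap way to see whether the two configurations really differ.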

Yes, your SSE code works, but that only relieves me of the need to fix that one block.

Likewise this is mostly an exercise for me, largely because it bothers me that I am using an x86 platform, and the assembly does not work. I mean if the C was really good enough, why does the assembly remain in the code?

Also, if what I'm reading about PIC is true, this should provide some benefit to all x86 platforms, not just OS X. However, if it really is that much trouble, I can forget it. (I've already pretty much finished; I just need to test some more.)

Edit: Oh, and I'm not sure what you mean by defines, but I'll take a guess. __GNUC__ is defined, along with _MACOSX and either __i386__ or __ppc__ depending on the processor.
Also, these are defined in Homeworld_prefix.h:
Code:
#ifdef _MACOSX
   #define GENERIC_ETGCALLFUNCTION
   #define _MACOSX_FIX_ANIM 1
   //#define _MACOSX_FIX_SOUND 1
   #define _MACOSX_FIX_LAN 1
   #define _MACOSX_FIX_GL 1
   #define _MACOSX_FIX_MISC 1
   //#ifndef _MACOSX_FIX_ME
   //   #define _MACOSX_FIX_ME 1
   //#endif
   #ifdef __ppc__
      #define _MACOSX_PPC 1
      #define _MACOSX_FIX_GL 1
      #define _MACOSX_FIX_PPC 1
      #define _MACOSX_FIX_SOUND 1
   #endif
   #ifdef __i386__
      #define _MACOSX_86 1
      #define _MACOSX_FIX_86 1
   #endif
#endif
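
To see how these defines interact with the `#elif` chain for the first function in the diff, here is a rough sketch of the selection logic. `select_backend`, `backend_name` and the enum are hypothetical names. The conditions follow the ones in the proposed diff, with parentheses added around the `||` as this sketch's own choice, since `&&` binds tighter than `||` in preprocessor expressions.

```c
/* Hypothetical labels for the three code paths. */
typedef enum {
    BACKEND_X87_ASM,     /* 32-bit GCC inline asm using the x87 FPU */
    BACKEND_SSE_ASM,     /* the SSE version written for X86_64      */
    BACKEND_PORTABLE_C   /* the plain C fallback                    */
} backend;

/* Mirrors the order of the #elif chain: first match wins. */
static backend select_backend(void)
{
#if defined(__GNUC__) && defined(__i386__) && !defined(_MACOSX_86)
    return BACKEND_X87_ASM;
#elif defined(__GNUC__) && (defined(_X86_64) || defined(_MACOSX_86))
    return BACKEND_SSE_ASM;
#else
    return BACKEND_PORTABLE_C;
#endif
}

/* Human-readable name for logging or debugging. */
static const char *backend_name(backend b)
{
    switch (b) {
    case BACKEND_X87_ASM:    return "x87 inline asm";
    case BACKEND_SSE_ASM:    return "SSE inline asm";
    case BACKEND_PORTABLE_C: return "portable C";
    }
    return "unknown";
}
```

With the header above, an Intel OS X build defines `_MACOSX_86`, so it skips the first branch and lands on the SSE path; a PPC build defines neither `__i386__` nor `_MACOSX_86` and falls through to the C code.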

