*---------------------------------------------------------------*
 Quick Guide to AT&T Inline Assembly (AT&T is the UNIX standard)
*---------------------------------------------------------------*

Register names are prefixed with "%".
To reference eax:
    AT&T:  %eax
    Intel: eax

Source/Destination Ordering:
The source is always on the left and the destination is always on the right.
    AT&T:  movl %eax, %ebx
    Intel: mov ebx, eax

Constant value/immediate value format:
Prefix all constant/immediate values with "$".
    # load eax with the address of static C variable thing1
    AT&T:  movl $_thing1, %eax
    Intel: mov eax, offset _thing1
    # load ebx with 0xf00d:
    AT&T:  movl $0xf00d, %ebx
    Intel: mov ebx, 0f00dh

Operand size specification:
Suffix the instruction with b (byte), w (word), or l (longword) to specify
the width of the operands; if omitted, GAS (the GNU assembler) will guess
from the register operands.
    AT&T:  movw %ax, %bx
    Intel: mov bx, ax
In Intel syntax the equivalent of the b, w, l suffixes is byte ptr,
word ptr, and dword ptr on the memory operand.

Canonical format for 32-bit addressing:
    AT&T:  immed32(baseptr,indexptr,indexscale)
    Intel: [baseptr + indexptr*indexscale + immed32]
Formula to calculate the address:
    immed32 + baseptr + indexptr*indexscale
Note: not all fields are required, but at least one of immed32 and baseptr
must be present, and you MUST add the size suffix to the instruction when
no register operand determines the size.

Use underscore to address a static C variable:
    AT&T:  _thing1
    Intel: [_thing1]
The underscore ("_") is how to directly reference static (global) C
variables from assembler on targets that prefix C symbols with an
underscore; ELF targets such as Linux use the bare name. Use extended asm
to have variables preloaded into registers.

Addressing what a register points to:
    AT&T:  (%eax)
    Intel: [eax]

Addressing a variable offset by a value in a register:
    AT&T:  _variable(%eax)
    Intel: [eax + _variable]

Addressing a value in an array of integers (scaling up by 4):
    AT&T:  _array(,%eax,4)
    Intel: [eax*4 + _array]

Calculating an offset with an immediate value:
    C code: *(p+1) where p is a char *
    AT&T:  1(%eax) where eax has the value of p
    Intel: [eax + 1]

You can do simple math on an immediate value:
    AT&T:  _myptr+8

*--------------------------*
 ( BASIC INLINE ASSEMBLY )
*--------------------------*

    asm ("statements");
OR
    __asm__ ("statements");

/* push a register onto stack, use it, pop to restore it */
    asm ("pushl %eax\n\t"
         "movl $0, %eax\n\t"
         "popl %eax");
Note: the \n's and \t's make the .s file generated by GCC easy to read.

/* push a register before using it unless you want to clobber it: */
    asm ("movl %eax, %ebx");    /* clobbers ebx */
    asm ("xorl %ebx, %edx");    /* clobbers edx */
    asm ("movl $0, _thing1");   /* clobbers thing1 */
Basic asm gives gcc no way to know what was clobbered; use extended asm
to tell it.

*-----------------------------*
 ( Extended Inline Assembly )
*-----------------------------*

    asm ("statements" : output_registers : input_registers : clobbered_registers);

-----------------------
Register loading codes
-----------------------
    a    eax
    b    ebx
    c    ecx
    d    edx
    S    esi
    D    edi
    I    constant value (0 to 31)
    q,r  dynamically allocated register*
    g    any general register, memory location, or immediate
    A    eax and edx combined into a 64-bit integer (long long)

* Use of "q" allocates from eax, ebx, ecx, and edx;
  use of "r" adds esi and edi.

/* Example 1: store mystuff into count longwords starting at result */
    asm volatile ("cld\n\t"            /* clear the direction flag so edi counts up */
         "rep\n\t"                     /* repeat the next instruction ecx times */
         "stosl"                       /* store eax at (edi), advance edi by 4 */
         : "+c" (count), "+D" (result) /* ecx = count, edi = result; both are modified */
         : "a" (mystuff)               /* eax = mystuff */
         : "memory", "cc" );           /* clobber list */
Older gcc versions accepted "%ecx" and "%edi" in the clobber list here.
Current gcc rejects a clobber that names an operand register, so registers
that are loaded and then modified are declared as read-write ("+") operands
instead. Note that count and result are changed by the statement.

You cannot directly refer to byte registers (ah, al, etc.) or to word
registers (ax, bx, etc.) when loading. Once in the register you can
specify ax, al, etc.
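
The fill in Example 1 can be wrapped in a small function, which makes it
easier to try out. What follows is only a sketch under some assumptions:
fill_dwords is a made-up name, the fixed-width types come from <stdint.h>,
and the plain C fallback for non-x86 compilers is an addition that is not
part of the guide. The modified registers are passed as read-write ("+")
operands so that nothing appears both as an operand and as a clobber.

```c
#include <stddef.h>
#include <stdint.h>

/* fill_dwords is a hypothetical helper name used only for illustration.
 * It stores `value` into `count` consecutive 32-bit words at `dest`. */
static void fill_dwords(uint32_t *dest, uint32_t value, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__ (
        "cld\n\t"                    /* make the string op count upward */
        "rep stosl"                  /* store eax at (edi), ecx times */
        : "+D" (dest), "+c" (count)  /* edi and ecx are read and modified */
        : "a" (value)                /* eax holds the fill value */
        : "memory", "cc");           /* writes memory, changes flags */
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    while (count--)
        *dest++ = value;
#endif
}
```

Calling fill_dwords(buf, 0xf00d, 3) on a four-element array should set the
first three elements and leave the fourth alone.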

Register codes must be in double quotes and the variables to load in
parentheses. In the clobber list specify the registers with %. If your asm
writes to memory, include "memory" in the clobber list, in case you write
to a variable that gcc thought it had in a register. This makes gcc treat
every value it has cached from memory as stale, much as if all registers
had been clobbered. Forcing gcc to use specific registers (as above) can
be inefficient if gcc must stash those registers first. It is generally
better to let gcc pick the register.

/* load effective address is in the form (baseptr,indexptr,indexscale);
 * the address is computed as: baseptr + indexptr * indexscale */
    asm ("leal (%1,%1,4), %0"   /* quick and dirty x = x*5 */
         : "=r" (x)             /* let gcc pick the output register */
         : "0" (x) );           /* do this if you wish to reuse a register */

The above code performs x * 5 in 1 cycle on the Pentium, and gcc picks the
available register. Unless we really need a specific register (e.g., rep
movsl or rep stosl, which must use ecx, edi, and esi), let gcc pick it.
The %n tokens are assigned to the operands first-come-first-served, left
to right, so %0, %1, ... map onto the "q"s and "r"s in the order they are
written. To reuse a register allocated with a "q" or "r", use "0", "1",
"2", etc. as the constraint.

/* example 2: z = x * 5 */
    int x;
    int z;
    asm ("leal (%1,%1,4), %0"
         : "=r" (z)             /* specifies an output register */
         : "r" (x) );

If you want one variable to stay in the same register for in and out, you
have to respecify the register allocated to it on the way in with the "0"
type codes, as in the first leal example or:
    asm ("leal (%0,%0,4), %0"
         : "=r" (x)
         : "0" (x) );

This also works:
    asm ("leal (%%ebx,%%ebx,4), %%ebx"  /* preface registers with %% to help the parser */
         : "=b" (x)                     /* differentiate between %ebx and %0 */
         : "b" (x) );

Note that ebx is not on the clobber list since gcc knows it goes into x;
since gcc can know the value of ebx, it isn't considered clobbered.

Use the keyword volatile to stop gcc from deleting your assembly statement
or moving it (sometimes gcc moves things out of a loop for optimization).
    __asm__ __volatile__ (...whatever...);
Generally you should let gcc perform optimization.

--------------
Some Examples
--------------

#define disable() __asm__ __volatile__ ("cli")   /* disables all interrupts */
#define enable()  __asm__ __volatile__ ("sti")   /* enables interrupts again */
Of course, libc has these defined too.

#define times3(arg1, arg2) \
    __asm__ ( \
        "leal (%0,%0,2),%0" \
        : "=r" (arg2) \
        : "0" (arg1) )

#define times5(arg1, arg2) \
    __asm__ ( \
        "leal (%0,%0,4),%0" \
        : "=r" (arg2) \
        : "0" (arg1) )

#define times9(arg1, arg2) \
    __asm__ ( \
        "leal (%0,%0,8),%0" \
        : "=r" (arg2) \
        : "0" (arg1) )

The above multiply arg1 by 3, 5, or 9 and put the product in arg2; e.g.:
    times5(x,x);

#define rep_movsl(src, dest, numwords) do { \
    const void *s_ = (src); \
    void *d_ = (dest); \
    unsigned long n_ = (numwords); \
    __asm__ __volatile__ ( \
        "cld\n\t" \
        "rep\n\t" \
        "movsl" \
        : "+S" (s_), "+D" (d_), "+c" (n_) \
        : : "memory" ); \
} while (0)

Hint: If you call memcpy() with a constant length parameter, GCC will
inline it to a rep movsl like the above. But if you need a variable-length
version that inlines and you're always moving dwords, use the macro above.

#define rep_stosl(value, dest, numwords) do { \
    void *d_ = (dest); \
    unsigned long n_ = (numwords); \
    __asm__ __volatile__ ( \
        "cld\n\t" \
        "rep\n\t" \
        "stosl" \
        : "+D" (d_), "+c" (n_) \
        : "a" (value) \
        : "memory" ); \
} while (0)

Both macros copy their arguments into locals and pass those as read-write
("+") operands: the string instructions modify ecx, esi, and edi, and
current gcc does not accept a clobber that names an operand register.

Reads the TimeStampCounter on the Pentium and puts the 64-bit result into
llptr, which must be a long long variable (not a pointer). The "A" code
names the edx:eax pair only on 32-bit x86, and since both registers are
the output they do not belong in the clobber list.

#define RDTSC(llptr) ({ \
    __asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31"   /* opcode bytes for rdtsc */ \
        : "=A" (llptr) ); })

Additional Notes.
Intel real-mode addressing has restrictions on which register has what
default segment and which registers can be base or index pointers.
The ebp (base pointer) register can be used as a general-purpose register;
just make sure you restore it yourself or compile with
-fomit-frame-pointer.
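
The times3/times5/times9 macros can also be written as inline functions,
which avoids the pitfalls of macro arguments and lets gcc pick both
registers. This is a sketch, not a drop-in replacement: the function names
lea_times3, lea_times5, and lea_times9 are invented here, int is assumed
to be 32 bits wide, and the plain C fallback for non-x86 compilers is an
addition that is not part of the guide.

```c
/* Hypothetical helper names; each returns x multiplied by 3, 5, or 9
 * using a single leal with gcc-chosen input and output registers. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define LEA_SCALE(name, scale)                                   \
    static inline int name(int x)                                \
    {                                                            \
        int result;                                              \
        /* result = x + x * scale, computed as an address */     \
        __asm__ ("leal (%1,%1," #scale "), %0"                   \
                 : "=r" (result)   /* gcc picks the output */    \
                 : "r" (x));       /* gcc picks the input */     \
        return result;                                           \
    }
#else
/* Assumed fallback so the sketch still builds on other targets. */
#define LEA_SCALE(name, scale)                                   \
    static inline int name(int x) { return x + x * (scale); }
#endif

LEA_SCALE(lea_times3, 2)   /* x + x*2 */
LEA_SCALE(lea_times5, 4)   /* x + x*4 */
LEA_SCALE(lea_times9, 8)   /* x + x*8 */
```

No clobber list and no volatile are needed: the statement has no side
effects beyond its output, so gcc is free to schedule it or drop it when
the result is unused.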