System V ABI AMD64

This document concisely describes the subset of the amd64 ABI that is implemented in QBE. The subset correctly handles arbitrary standard C-like structs containing float and integer types. Structs that have unaligned members are also supported through opaque types; see the IL description document for more information about them.

ABI Subset Implemented

Data classes of interest as defined by the ABI:

INTEGER
SSE
MEMORY

Classification

The size of each argument gets rounded up to a whole number of eightbytes. (This keeps the stack always 8-byte aligned.)
_Bool, char, short, int, long, long long and pointers are in the INTEGER class. In the context of QBE, it means that 'l' and 'w' are in the INTEGER class.
float and double are in the SSE class. In the context of QBE, it means that 's' and 'd' are in the SSE class.
If the size of an object is larger than two eightbytes or if it contains unaligned fields, it has class MEMORY. In the context of QBE, those are big aggregate types and opaque types.
Otherwise, recursively classify fields and determine the class of the two eightbytes using the classes of their components. If any is INTEGER the result is INTEGER, otherwise the result is SSE.
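The classification rules above can be sketched in C as follows. This is only an illustration under stated assumptions, not QBE's actual code: the names abi_class, field and classify are hypothetical, and the aggregate is assumed to be already flattened into a list of scalar fields with known offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Data classes from the list above; CLS_NONE marks an unused eightbyte. */
enum abi_class { CLS_NONE, CLS_INTEGER, CLS_SSE, CLS_MEMORY };

/* Hypothetical description of one scalar field after flattening. */
struct field {
    size_t offset;   /* byte offset inside the aggregate */
    size_t size;     /* 1, 2, 4 or 8 bytes */
    int is_float;    /* nonzero for float and double */
};

/*
 * Classify an aggregate of `size` bytes made of `n` scalar fields.
 * Fills cls[0] and cls[1] and returns 1 if the aggregate fits in
 * registers; otherwise sets both to CLS_MEMORY and returns 0.
 */
int classify(const struct field *f, size_t n, size_t size,
             enum abi_class cls[2])
{
    cls[0] = cls[1] = CLS_NONE;
    if (size > 16)
        goto memory;                 /* larger than two eightbytes */
    for (size_t i = 0; i < n; i++) {
        assert(f[i].size > 0 && f[i].offset + f[i].size <= size);
        if (f[i].offset % f[i].size != 0)
            goto memory;             /* unaligned field */
        size_t eb = f[i].offset / 8; /* which eightbyte: 0 or 1 */
        if (!f[i].is_float)
            cls[eb] = CLS_INTEGER;   /* any INTEGER wins */
        else if (cls[eb] == CLS_NONE)
            cls[eb] = CLS_SSE;
    }
    return 1;
memory:
    cls[0] = cls[1] = CLS_MEMORY;
    return 0;
}
```

For instance, a struct made of a double followed by an int would come out as SSE for the first eightbyte and INTEGER for the second, while a float and an int sharing one eightbyte would make that eightbyte INTEGER.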

Passing

Classify arguments in order.
INTEGER arguments use in order %rdi %rsi %rdx %rcx %r8 %r9.
SSE arguments use in order %xmm0 - %xmm7.
MEMORY arguments get passed on the stack. They are "pushed" in the right-to-left order, so from the callee's point of view, the left-most argument appears first on the stack.
When we run out of registers for an aggregate, revert the register assignment made for its first eightbyte and pass the whole aggregate on the stack.
When all registers are taken, write arguments on the stack from right to left.
When calling a variadic function, %al stores the number of vector registers used to pass arguments (it must be an upper bound and does not have to be exact).
Registers %rbx, %r12 - %r15 are callee-save.
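The register assignment described above can be sketched as follows. This is a minimal illustration, not QBE's implementation: the names arg and assign_args are hypothetical, and each argument is assumed to be already classified into at most two eightbytes. Instead of assigning and then reverting, the sketch checks that both eightbytes fit before committing, which gives the same result.

```c
#include <assert.h>

enum abi_class { CLS_NONE, CLS_INTEGER, CLS_SSE, CLS_MEMORY };

#define NINT 6   /* %rdi %rsi %rdx %rcx %r8 %r9 */
#define NSSE 8   /* %xmm0 - %xmm7 */

/* Hypothetical classified argument; cls[1] is CLS_NONE if unused. */
struct arg { enum abi_class cls[2]; };

/*
 * Walk the arguments in order. in_regs[i] is set to 1 if argument i
 * is passed in registers and to 0 if it goes on the stack.
 * Returns the number of SSE registers used, which is a valid value
 * for %al when calling a variadic function.
 */
int assign_args(const struct arg *args, int n, int *in_regs)
{
    int ni = 0, ns = 0;   /* registers consumed so far */

    for (int i = 0; i < n; i++) {
        int wi = 0, ws = 0;   /* registers wanted by this argument */
        for (int j = 0; j < 2; j++) {
            if (args[i].cls[j] == CLS_INTEGER)
                wi++;
            else if (args[i].cls[j] == CLS_SSE)
                ws++;
        }
        if (args[i].cls[0] == CLS_MEMORY
         || ni + wi > NINT || ns + ws > NSSE) {
            in_regs[i] = 0;   /* the whole argument goes on the stack */
            continue;
        }
        ni += wi;
        ns += ws;
        in_regs[i] = 1;
    }
    assert(ni <= NINT && ns <= NSSE);
    return ns;
}
```

Note that an aggregate falling back to the stack does not consume any register, so a later argument that is small enough can still be passed in a remaining register.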

Returning

Classify the return type.
Use %rax and %rdx in order for INTEGER return values.
Use %xmm0 and %xmm1 in order for SSE return values.
If the return value's class is MEMORY, the caller passes a pointer to an area big enough to fit the return value as the first argument of the function, in %rdi. The function writes the return value there and returns the address (that was in %rdi) in %rax.
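The choice of return registers can be sketched like this. The names reg and ret_regs are hypothetical and only serve the illustration; the return type is assumed to be already classified and not of class MEMORY.

```c
#include <assert.h>

enum abi_class { CLS_NONE, CLS_INTEGER, CLS_SSE, CLS_MEMORY };
enum reg { REG_NONE, REG_RAX, REG_RDX, REG_XMM0, REG_XMM1 };

/*
 * Pick the register holding each eightbyte of a return value.
 * INTEGER eightbytes take %rax then %rdx, SSE eightbytes take
 * %xmm0 then %xmm1; an unused eightbyte gets REG_NONE.
 */
void ret_regs(const enum abi_class cls[2], enum reg out[2])
{
    static const enum reg ireg[2] = { REG_RAX, REG_RDX };
    static const enum reg sreg[2] = { REG_XMM0, REG_XMM1 };
    int ni = 0, ns = 0;

    /* MEMORY values are written through the pointer in %rdi instead. */
    assert(cls[0] != CLS_MEMORY && cls[1] != CLS_MEMORY);
    for (int j = 0; j < 2; j++) {
        if (cls[j] == CLS_INTEGER)
            out[j] = ireg[ni++];
        else if (cls[j] == CLS_SSE)
            out[j] = sreg[ns++];
        else
            out[j] = REG_NONE;
    }
}
```

With this scheme, a struct whose halves are classified differently ends up using %rax for its INTEGER half and %xmm0 for its SSE half, whichever half comes first.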

Alignment on the Stack

The ABI is unclear on the alignment requirement of the stack. What must be ensured is that, right before executing a 'call' instruction, the stack pointer %rsp is aligned on 16 bytes. On entry of the called function, the stack pointer is 8 modulo 16, because 'call' pushed the 8-byte return address. Since most functions have a prologue that pushes %rbp, the frame pointer is also aligned on 16 bytes (== 0 mod 16) when the body of the function starts executing.
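The arithmetic behind this can be checked with a small simulation. It is a sketch under the assumption that the callee pushes %rbp and then reserves its locals rounded up to 16 bytes; the function names align16 and rsp_after_prologue are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of 16. */
uint64_t align16(uint64_t n)
{
    return (n + 15) & ~(uint64_t)15;
}

/*
 * Simulate %rsp through a call and the callee's prologue.
 * rsp_before_call is the caller's %rsp right before 'call',
 * locals is the number of bytes of locals the callee needs.
 */
uint64_t rsp_after_prologue(uint64_t rsp_before_call, uint64_t locals)
{
    assert(rsp_before_call % 16 == 0);  /* required before 'call' */

    uint64_t rsp = rsp_before_call - 8; /* 'call' pushes the return address */
    assert(rsp % 16 == 8);              /* 8 mod 16 on entry */

    rsp -= 8;                           /* push %rbp; %rbp is set to this value */
    assert(rsp % 16 == 0);              /* frame pointer is 0 mod 16 */

    rsp -= align16(locals);             /* keep %rsp aligned for nested calls */
    return rsp;
}
```

Rounding the locals area up to 16 bytes is one simple way to make sure that %rsp is again 0 mod 16 when the callee itself executes a 'call'.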

Here is a diagram of the stack layout after a call from g() to f().

```
               |             |
               | g() locals  |
               +-------------+
          ^    |             | \
          |    | stack arg 2 |  '
          |    |xxxxxxxxxxxxx|  | f()'s MEMORY
 growing  |    +-------------+  |  arguments
addresses |    | stack arg 1 |  ,
          |    |xxxxxxxxxxxxx| /
          |    +-------------+ -> 0 mod 16
          |    |   ret addr  |
               +-------------+
               |  saved %rbp |
               +-------------+ -> f()'s %rbp
               | f() locals  |    0 mod 16
               |   ...       |
                               -> %rsp
```

Legend:

xxxxx Optional padding.

Remarks

A struct can be returned in registers in one of three ways. Either %rax, %rdx are used, or %xmm0, %xmm1, or finally %rax, %xmm0. The last case happens when a struct is returned with one half classified as INTEGER and the other as SSE. This is a consequence of the Returning section above.
The size of the arguments area of the stack needs to be computed first, then arguments are packed starting from the bottom of the argument area, respecting alignment constraints. The ABI mentions "pushing" arguments in right-to-left order, but I think it's a mistaken view because of the alignment constraints.

Example: If three 8-byte MEMORY arguments are passed to the callee and the caller's stack pointer is 16-byte aligned, the layout will be like this.
```
           +-------------+
           |xxxxxxxxxxxxx|    padding
           | stack arg 3 |
           | stack arg 2 |
           | stack arg 1 |
           +-------------+ -> 0 mod 16
```
The padding must not be at the end of the stack area. A "pushing" logic would put it at the end.
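The computation of the argument area can be sketched as follows. This is an illustration only, with a hypothetical function name, and it assumes that every stack argument needs no more than 8-byte alignment. Offsets are counted from the bottom of the argument area, that is, from the address %rsp holds right before the 'call'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Lay out n stack arguments of the given sizes, left to right,
 * starting at the bottom of the argument area. offsets[i] receives
 * the offset of argument i. Returns the total size of the area,
 * rounded up to 16 bytes so that any padding ends up on top.
 */
size_t layout_stack_args(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        off += (sizes[i] + 7) & ~(size_t)7;  /* round up to eightbytes */
    }
    assert(off % 8 == 0);
    return (off + 15) & ~(size_t)15;         /* keep %rsp 0 mod 16 */
}
```

For the three 8-byte arguments of the example, the offsets would be 0, 8 and 16 and the area would be 32 bytes, leaving the 8 bytes of padding above stack arg 3 as in the diagram.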