This document concisely describes the subset of the amd64 ABI that QBE implements. The subset correctly handles arbitrary standard C-like structs containing float and integer types. Structs that have unaligned members are also supported through opaque types; see the IL description document for more information about them.
Data classes of interest as defined by the ABI: INTEGER, SSE, and MEMORY.
INTEGER arguments use, in order, %rdi, %rsi, %rdx,
%rcx, %r8, and %r9.
SSE arguments use, in order, %xmm0 - %xmm7.
Registers %rbx and %r12 - %r15 are callee-save.
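The register sequences above can be sketched as a small allocator. This is only an illustration, not QBE's code: the type and function names are hypothetical, each argument is assumed to occupy a single eightbyte, and an argument that finds its register sequence exhausted is assumed to go to the stack, as the ABI specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, chosen for this sketch only. */
enum arg_class { CLASS_INTEGER, CLASS_SSE, CLASS_MEMORY };
enum loc_kind { LOC_INT_REG, LOC_SSE_REG, LOC_STACK };

struct arg_loc {
    enum loc_kind kind;
    int index; /* position in the register sequence, -1 for LOC_STACK */
};

static const char *const int_arg_regs[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};
static const char *const sse_arg_regs[8] = {
    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7"
};

/* Walk the arguments in order and hand out registers in sequence.
 * Assumption: every argument fits in one eightbyte. */
void assign_arg_locs(const enum arg_class *cls, size_t n, struct arg_loc *out)
{
    int next_int = 0, next_sse = 0;
    size_t i;

    assert(n == 0 || (cls != NULL && out != NULL));
    for (i = 0; i < n; i++) {
        if (cls[i] == CLASS_INTEGER && next_int < 6) {
            out[i].kind = LOC_INT_REG;
            out[i].index = next_int++;
        } else if (cls[i] == CLASS_SSE && next_sse < 8) {
            out[i].kind = LOC_SSE_REG;
            out[i].index = next_sse++;
        } else {
            /* MEMORY class, or no register left in the sequence. */
            out[i].kind = LOC_STACK;
            out[i].index = -1;
        }
    }
}

/* Printable name of a location, for debugging output. */
const char *arg_loc_name(struct arg_loc loc)
{
    switch (loc.kind) {
    case LOC_INT_REG: return int_arg_regs[loc.index];
    case LOC_SSE_REG: return sse_arg_regs[loc.index];
    default:          return "stack";
    }
}
```

The two counters are independent, so an SSE argument placed between two INTEGER arguments does not consume an integer register.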
Use %rax and %rdx, in order, for INTEGER return values.
Use %xmm0 and %xmm1, in order, for SSE return values.
If the return value's class is MEMORY, the first
argument of the function, %rdi, was a pointer to an
area big enough to fit the return value. The function
writes the return value there and returns the address
(that was in %rdi) in %rax.
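The MEMORY return rule can be pictured by lowering a C function by hand. This is a sketch under assumptions: struct big and both function names are hypothetical, and the 32-byte struct is assumed to be classified as MEMORY because it exceeds two eightbytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 32-byte struct; assumed to be of class MEMORY. */
struct big { long v[4]; };

/* What the programmer writes. */
struct big make_big(long x)
{
    struct big b = { { x, x + 1, x + 2, x + 3 } };
    return b;
}

/* Hand-lowered equivalent. 'ret' stands for the hidden pointer the
 * caller passes in %rdi; the returned pointer stands for the value
 * the callee leaves in %rax. */
struct big *make_big_lowered(struct big *ret, long x)
{
    assert(ret != NULL);
    ret->v[0] = x;
    ret->v[1] = x + 1;
    ret->v[2] = x + 2;
    ret->v[3] = x + 3;
    return ret; /* same address that came in */
}
```

Since the hidden pointer occupies %rdi, the first visible INTEGER argument (x here) moves to %rsi.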
The ABI is unclear on the alignment requirement of the
stack. What must be ensured is that, right before
executing a 'call' instruction, the stack pointer %rsp
is aligned on 16 bytes. Because 'call' pushes an 8-byte
return address, the stack pointer is 8 modulo 16 on
entry of the called function. Since most functions have
a prelude that pushes %rbp, the frame pointer is also
aligned on 16 bytes (== 0 mod 16) on entry of the body
code of the function.
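The alignment arithmetic can be checked with a few lines of C. The helpers below are hypothetical and only model the effect of 'call' and 'push %rbp' on the value of %rsp; they do not touch the real stack.

```c
#include <assert.h>
#include <stdint.h>

/* Value of %rsp on entry of the callee: 'call' pushes the 8-byte
 * return address. The caller must have aligned %rsp on 16 bytes. */
uint64_t rsp_after_call(uint64_t rsp_before_call)
{
    assert(rsp_before_call % 16 == 0);
    return rsp_before_call - 8;
}

/* Value of %rsp (and of the new %rbp after 'mov %rsp, %rbp')
 * once the prelude has pushed the saved %rbp. */
uint64_t rsp_after_push_rbp(uint64_t rsp_at_entry)
{
    return rsp_at_entry - 8;
}
```

Starting from an %rsp of 0x7ffff000, the callee is entered with 0x7fffeff8 (8 mod 16) and the frame pointer ends up at 0x7fffeff0 (0 mod 16).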
Here is a diagram of the stack layout after a call from g() to f().
               |             |
               | g() locals  |
               +-------------+
             ^ |             | \
             | | stack arg 2 |  '
             | |xxxxxxxxxxxxx|  | f()'s MEMORY
     growing | +-------------+  | arguments
   addresses | | stack arg 1 |  ,
             | |xxxxxxxxxxxxx| /
             | +-------------+ -> 0 mod 16
             | |  ret addr   |
               +-------------+
               | saved %rbp  |
               +-------------+ -> f()'s %rbp
               | f() locals  | 0 mod 16
               |    ...      |
                 -> %rsp
Legend:
xxxxx Optional padding.
A struct can be returned in registers in one of three
ways. Either %rax, %rdx are used, or %xmm0,
%xmm1, or finally %rax, %xmm0. The last case
happens when a struct is returned with one half
classified as INTEGER and the other as SSE. This
is a consequence of the Returning section above.
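The three register pairs fall out of assigning each half of the struct independently, which the following sketch makes explicit. The enum and function names are hypothetical, and the struct is assumed to span exactly two eightbytes, neither of class MEMORY.

```c
#include <assert.h>

/* Hypothetical names, chosen for this sketch only. */
enum eb_class { EB_INTEGER, EB_SSE };
enum ret_reg { REG_RAX, REG_RDX, REG_XMM0, REG_XMM1 };

struct ret_regs {
    enum ret_reg lo; /* register holding bytes 0..7 of the struct */
    enum ret_reg hi; /* register holding bytes 8..15 of the struct */
};

/* Each eightbyte takes the next free register of its own class. */
struct ret_regs pick_ret_regs(enum eb_class lo, enum eb_class hi)
{
    static const enum ret_reg int_regs[2] = { REG_RAX, REG_RDX };
    static const enum ret_reg sse_regs[2] = { REG_XMM0, REG_XMM1 };
    int ni = 0, ns = 0;
    struct ret_regs r;

    assert(lo == EB_INTEGER || lo == EB_SSE);
    assert(hi == EB_INTEGER || hi == EB_SSE);
    r.lo = (lo == EB_INTEGER) ? int_regs[ni++] : sse_regs[ns++];
    r.hi = (hi == EB_INTEGER) ? int_regs[ni++] : sse_regs[ns++];
    return r;
}
```

For example, a struct such as `struct { long i; double d; }` has an INTEGER first half and an SSE second half, so the sketch puts i in %rax and d in %xmm0. With the halves swapped, the same two registers are used, but %xmm0 holds the first half.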
The size of the argument area of the stack needs to be computed first; arguments are then packed starting from the bottom of the argument area, respecting alignment constraints. The ABI mentions "pushing" arguments in right-to-left order, but I think this view is mistaken because of the alignment constraints.
Example: If three 8-byte MEMORY arguments are passed to the callee and the caller's stack pointer is 16-byte aligned, the layout will look like this.
+-------------+
|xxxxxxxxxxxxx| padding
| stack arg 3 |
| stack arg 2 |
| stack arg 1 |
+-------------+ -> 0 mod 16
The padding must not be at the end of the stack area. A "pushing" logic would put it at the end.
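The "compute the size first, then pack from the bottom" approach can be sketched as follows. This is a minimal illustration rather than QBE's implementation: the struct and function names are hypothetical, every argument is assumed to occupy a multiple of 8 bytes with at least 8-byte alignment, and offsets are measured upward from the bottom of the argument area (the address %rsp holds right before the 'call').

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one stack-passed argument. */
struct stack_arg {
    size_t size;   /* in: size in bytes */
    size_t align;  /* in: required alignment, a power of two */
    size_t offset; /* out: offset from the bottom of the area */
};

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Pack the arguments upward from offset 0 and return the size of
 * the whole area, rounded up to 16 bytes. Any padding needed to
 * reach a multiple of 16 lands above the last argument. */
size_t layout_stack_args(struct stack_arg *args, size_t n)
{
    size_t off = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        size_t a = args[i].align < 8 ? 8 : args[i].align;

        off = round_up(off, a); /* optional padding below the argument */
        args[i].offset = off;
        off += round_up(args[i].size, 8);
    }
    return round_up(off, 16);
}
```

For the three 8-byte arguments of the example, the sketch yields offsets 0, 8, and 16 and an area of 32 bytes, so the 8 bytes of padding sit above stack arg 3, as in the diagram.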