This document concisely describes the subset of the amd64 ABI as it is implemented in QBE. The subset correctly handles arbitrary standard C-like structs containing float and integer types. Structs that have unaligned members are also supported through opaque types; see the IL description document for more information about them.
Data classes of interest as defined by the ABI: INTEGER, SSE, and MEMORY.
INTEGER arguments use, in order, %rdi, %rsi, %rdx, %rcx, %r8, %r9.
SSE arguments use, in order, %xmm0 - %xmm7.
Registers %rbx, %r12 - %r15 are callee-save.
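The register assignment above can be sketched as a small C routine. This is only an illustration, not QBE's code: the names `arg_class`, `assign_args`, and `ON_STACK` are hypothetical, and the sketch assumes every argument occupies a single eightbyte, so it ignores aggregates that span two registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the ABI data classes. */
enum arg_class { ARG_INTEGER, ARG_SSE, ARG_MEMORY };

enum { NUM_INT_REGS = 6, NUM_SSE_REGS = 8, ON_STACK = -1 };

/* For each argument, store in slot[i] the index of the register it uses
 * within its own sequence, or ON_STACK.
 * INTEGER indices 0..5 stand for %rdi %rsi %rdx %rcx %r8 %r9.
 * SSE indices 0..7 stand for %xmm0 .. %xmm7.
 * Assumption: one eightbyte per argument. */
static void assign_args(const enum arg_class *cls, size_t n, int *slot)
{
    int next_int = 0, next_sse = 0;

    assert(cls != NULL && slot != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cls[i] == ARG_INTEGER && next_int < NUM_INT_REGS)
            slot[i] = next_int++;
        else if (cls[i] == ARG_SSE && next_sse < NUM_SSE_REGS)
            slot[i] = next_sse++;
        else
            slot[i] = ON_STACK; /* MEMORY class, or out of registers */
    }
}
```

The two counters are independent, so an SSE argument in the middle of the list does not consume an INTEGER register, and the seventh INTEGER argument ends up on the stack.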
Use %rax and %rdx in order for INTEGER return values.
Use %xmm0 and %xmm1 in order for SSE return values.
If the return value's class is MEMORY, the first argument of the function, %rdi, was a pointer to an area big enough to fit the return value. The function writes the return value there and returns the address (that was in %rdi) in %rax.
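The MEMORY return convention can be pictured at the C level as a function that takes a hidden pointer and hands it back. The following is a hand-written sketch, not compiler output: `struct big` and `make_big_lowered` are hypothetical names, and the struct is assumed to be of class MEMORY because it is larger than two eightbytes.

```c
#include <assert.h>
#include <stddef.h>

/* 24 bytes on amd64, assumed to be classified as MEMORY. */
struct big { long a, b, c; };

/* Source-level signature would be: struct big make_big(void);
 * Lowered view: the caller passes the address of the return area as the
 * first argument (%rdi); the callee fills the area and returns that same
 * address (%rax). */
static struct big *make_big_lowered(struct big *ret_area)
{
    assert(ret_area != NULL);
    ret_area->a = 1;
    ret_area->b = 2;
    ret_area->c = 3;
    return ret_area;
}
```

The caller is the one that reserves the return area, typically among its own locals, which is why the callee only ever sees an address.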
The ABI is unclear on the alignment requirement of the stack. What must be ensured is that, right before executing a 'call' instruction, the stack pointer %rsp is aligned on 16 bytes. On entry of the called function, the stack pointer is 8 modulo 16. Since most functions have a prelude that pushes %rbp, the frame pointer is also aligned on 16 bytes (== 0 mod 16) once the body code of the function is entered.
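The modular arithmetic in this paragraph can be checked with a tiny model of the stack pointer. This is a sketch with hypothetical helper names (`after_call`, `after_push`); it only models the 8-byte decrements performed by 'call' and by pushing %rbp.

```c
#include <assert.h>
#include <stdint.h>

/* Value of %rsp on entry of the callee: 'call' pushes the 8-byte return
 * address. The caller must have aligned %rsp on 16 bytes beforehand. */
static uint64_t after_call(uint64_t rsp)
{
    assert(rsp % 16 == 0);
    return rsp - 8;
}

/* Value of %rsp after pushing one 8-byte register such as %rbp. */
static uint64_t after_push(uint64_t rsp)
{
    return rsp - 8;
}
```

Starting from any 16-byte aligned value, `after_call` yields 8 mod 16, and `after_push` applied to that result yields 0 mod 16, which is the value the frame pointer takes.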
Here is a diagram of the stack layout after a call from g() to f().
               |             |
               | g() locals  |
               +-------------+
             ^ |             | \
             | | stack arg 2 |  '
             | |xxxxxxxxxxxxx|  | f()'s MEMORY
     growing | +-------------+  | arguments
   addresses | | stack arg 1 |  ,
             | |xxxxxxxxxxxxx| /
             | +-------------+ -> 0 mod 16
             | |  ret addr   |
               +-------------+
               | saved %rbp  |
               +-------------+ -> f()'s %rbp
               | f() locals  |    0 mod 16
               |    ...      |
                                -> %rsp
Legend: xxxxx marks optional padding.
A struct can be returned in registers in one of three ways: either %rax, %rdx are used, or %xmm0, %xmm1, or finally %rax, %xmm0. The last case happens when a struct is returned with one half classified as INTEGER and the other as SSE. This is a consequence of the return rules above.
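To see why only these three register pairs occur, here is a minimal C sketch that picks return registers from the classes of a struct's eightbytes. The names `eb_class`, `ret_reg`, and `return_regs` are hypothetical, and the sketch assumes the struct has one or two eightbytes, none of them MEMORY.

```c
#include <assert.h>

enum eb_class { EB_INTEGER, EB_SSE };
enum ret_reg { REG_RAX, REG_RDX, REG_XMM0, REG_XMM1 };

/* Assign a return register to each of the n (1 or 2) eightbytes.
 * INTEGER eightbytes take %rax then %rdx; SSE eightbytes take %xmm0
 * then %xmm1. The two sequences advance independently. */
static void return_regs(const enum eb_class *cls, int n, enum ret_reg *reg)
{
    static const enum ret_reg int_seq[2] = { REG_RAX, REG_RDX };
    static const enum ret_reg sse_seq[2] = { REG_XMM0, REG_XMM1 };
    int next_int = 0, next_sse = 0;

    assert(n >= 1 && n <= 2);
    for (int i = 0; i < n; i++)
        reg[i] = (cls[i] == EB_INTEGER) ? int_seq[next_int++]
                                        : sse_seq[next_sse++];
}
```

A mixed struct always ends up using %rax together with %xmm0, whichever half comes first, because each class starts at the head of its own sequence.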
The size of the arguments area of the stack needs to be computed first, then arguments are packed starting from the bottom of the argument area, respecting alignment constraints. The ABI mentions "pushing" arguments in right-to-left order, but I think it's a mistaken view because of the alignment constraints.
Example: If three 8-byte MEMORY arguments are passed to the callee and the caller's stack pointer is 16-byte aligned, the layout will be like this.
    +-------------+
    |xxxxxxxxxxxxx| padding
    | stack arg 3 |
    | stack arg 2 |
    | stack arg 1 |
    +-------------+ -> 0 mod 16
The padding must not be at the end of the stack area. A "pushing" logic would put it at the end.
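The "compute the size first, then pack from the bottom" approach can be sketched in C as follows. The helper names `arg_area_size` and `arg_offset` are hypothetical, and the sketch assumes every stack argument is exactly 8 bytes with 8-byte alignment, as in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of the argument area for n 8-byte stack arguments,
 * rounded up to a multiple of 16 so that %rsp stays 16-byte aligned
 * right before the 'call'. */
static size_t arg_area_size(size_t n)
{
    return (n * 8 + 15) & ~(size_t)15;
}

/* Offset of stack argument i from the bottom of the area, that is, from
 * the value of %rsp right before the 'call'. Arguments are 0-based here:
 * the document's "stack arg 1" is i == 0. */
static size_t arg_offset(size_t i, size_t n)
{
    assert(i < n);
    return i * 8;
}
```

With three arguments the area is 32 bytes, the arguments sit at offsets 0, 8, and 16, and the 8 bytes of padding are at offset 24, above the last argument rather than below the first.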