QBE
compiler backend
QBE 1.3 took a while to cook, but it is the most significant release since 1.0 with around 7k new lines of code and 1.5k deleted ones. In addition to the usual bug fixes, QBE gained a new and original IL matching algorithm, new optimizations from Roland Paterson-Jones, Scott Graham added support for the Windows ABI, and I implemented a plan suggested by Michael Forney to have QBE produce position-independent code (as in shared objects). QBE is teamwork, and I am happy to thank all the contributors of this release. In the rest of the notes we will take a closer look at some of the meat of the release.
Every once in a while, the QBE fly stings an outstanding programmer.
This time, Roland was the victim! Roland suggested we look at the
coremark benchmark
to give us a concrete but simple playground to optimize. Initial
measurements with qbe-1.2 revealed that we were far behind
our “70% of gcc -O2” goal, closer to 40%.
We decided to address this for the 1.3 release.
An early inspection of profiling data revealed that the performance
gap boiled down in large part to a good treatment of two functions:
ee_isdigit and
crcu8,
both using questionably idiomatic C — for instance, functions
like the first one are usually macros or inlined textually, use
&& instead of & and skip the
superfluous ternary operator; as for CRC, it is best implemented
using a pre-computed table. This observation was a bit disappointing
because, aside from QBE's lack of inlining, it did not point us to a
general source of overhead that would generalize to other
compilation loads. On the other hand, it is expected that CPU-bound
code spends a majority of time in compact code sections.
Nonetheless, we implemented numerous optimizations (GVN/GCM, loop
optimization, if-elimination, CFG simplification, …) that
we could try on both coremark and more realistic usages like the
Hare test suite. In the end, we
decided to keep only a subset of vetted passes and now score more
than 63% of the performance of commercial compilers on vanilla
coremark. Notably, we excluded inlining from the optimizations set
to postpone solving its incompatibility with the streaming per-function
compilation model of QBE. Modifying the coremark benchmark to inline
the ee_isdigit function and use a simpler branch-free
implementation of crcu8
makes QBE reach its 70% goal. The new optimizations should also
benefit Hare users: I measured a 33% improvement on the Hare
test suite against qbe-1.2 (1.7s vs 2.6s).
Since its early days, QBE used a bottom-up tree-numbering algorithm inspired by Ken Thompson's Plan9 C compiler (see 5.5. Addressability). The algorithm is fairly generic but has some subtleties to deal elegantly with associativity and commutativity of arithmetic operators. It has been a long-standing goal of mine to implement a metaprogramming solution to this problem. QBE 1.3 sees the realization of this goal.
A new OCaml tool called mgen is used to compile lispy
IL patterns into idiomatic C code that matches them. The mgen
tool will look for special comment blocks containing IL patterns and
inline the matching C code right below these blocks. The generated
C code is designed to look idiomatic in qbe and works similarly to
the hand-written logic pre 1.3. See the
isel.c
file for the current use of mgen.
In more details, instructions DAGs are matched by following a numbering
approach like in Ken Thompson's compiler. Then, mgen
associates each number with a bitset indicating which of the toplevel
user patterns are matched by the current IL node (temporary); the most
suitable pattern can then be selected by handwritten logic. Patterns
can include variables which can be collected that running a matcher
program. These programs are also generated by mgen in a
simple bytecode language which the
runmatch()
function can interpret.
I expect that in the future mgen is used to simplify
instruction selection in more backends and maybe even to recognize
IL patterns such as bit rotations in optimization passes.
For the 1.3 release, QBE also stung Scott Graham.
What a spree! Scott generously upstreamed his implementation of the Windows ABI,
originally found in a fun derivative
work. The assembly generated by QBE remains AT&T syntax and is best
compiled by the mingw assembler, although I have not tried it myself on Windows.
Compiling for Windows is now as simple as passing -t amd64_win
to QBE.
Last but not least, QBE improved its support for position-independent
code and is now able to link smoothly with and even produce shared objects on
most targets. The main blocker up to now has been the lack of support for
indirect access to globals (e.g., global offset table on ELF). This is now
possible at the level of IL through the support of a new extern
“dynamic constant” flag (DYNCONST in the IL spec).
For example, to access a variable dlvar from a dynamically-linked
library, one would use
function w $load() {
@start
%v =w load extern $dlvar
ret %v
}
And, in case you wonder, we use the oxymoron “dynamic constant” to speak about address symbols (constant through execution) that can only be known at runtime (dynamic) because they are allocated by either the runtime or the dynamic linker rather than by the regular linking phase of compilation.
Thanks for reading this far and happy hacking.