This is a tiny C compiler, I wrote it in a very restricted dialect of OCaml. It is in the same family as otcc but will work for the x86_64 architecture. My medium term goal is to port it into the C dialect it compiles, so I can bootstrap the thing. It will generate dynamically linked ELF files for the 64bits architecture x86_64 (AMD64 or IA-32e). It has no intermediate language and generates code as it parses the C source file. This makes it very fast, unsafe and unoptimizing. Code generated sucks, no doubt, but the compiler is very short and understandable.
Unless your code is dead wrong, QCC should remain very
silent and produce an output file. In its current state, it
will happily compile the expression
1 = 2,
depending on the context, the semantics will differ.
I wrote it to learn x86 64bits machine code and ELF files with dynamic linking. It is very easily extensible and hackable, you are welcome to add your own set of improvements.
I used the architecture one can find in otcc, i.e. the dumbest possible. Just parse while emiting code. You can find a small and quite readable example of this technique in TempleOS' mini compiler; reading it will help you understand how qcc works.
One neat trick I pulled from otcc is the symbol patching
mechanism. Namely, when we don't know the value of a given
symbol yet while emitting code we use the bits dedicated to
this value in the machine code to store a linked list of all
locations to be patched with the given symbol value. See the
code of the
patch function for details about
What makes compiling for 64bits architecture harder is that integer are 32bits while pointers are 64bits. That is really bad if we want to compile properly because this implies the need to keep track of types. Fortunately, we don't want to compile properly so I made integers 64bits wide. This does not cause much trouble with the ABI.
The layout of produced ELF files includes a single code segment. It will contain global variables and string literals at the beginning and code just after. Yes, data is in the code section. Because we don't know how many global variables the program will use we patch memory accesses with the right addresses only when we read the whole input file. Unresolved symbols will be handled by the dynamic linker.
The handling of lvalues is a bit cleaner than in otcc
since I wanted the parsing to be smooth. This allows proper
post-increment and pre-increment treatment. My solution
makes use of a fun system of late patching of the generated
patchlval for details. A friend of
mine called this speculative compilation.