How hard can you believe in your logic?
Why formal artillery?
I once had to fix a bug in the VIM editor. This is a great piece
of software but it kept crashing when I resized my window while some
options (listchars and cursorline) were enabled. So I downloaded the
latest version, compiled it with debugging information and ran the code
under gdb. The bug was in the redrawing procedure, this function is
around 1k lines of code and has one parameter that basically points to
the whole world. After one afternoon of investigation I finally found
the culprit line and fixed it. Then I reported the bug and my patch
was numbered 7.3.140 by Bram.
Interestingly this patch exactly reverts the one numbered 7.3.116.
I did not know this and could have found out using mercurial to bisect,
but at this time I was not aware of this and just did it the old,
brutal, way.
This raises one simple question, as programmers, how hard can we
trust our code? Clearly the person suggesting the patch 116 did not
think he was breaking anything, but on what ground? The only thing that
mattered was that Bram was convinced that the patch did not introduce
any bugs, but he turned out to be wrong.
My Program
Now, I write a text editor. I do not have crazy ideas but
I want it well behaved and designed so I can dig with pleasure in the code
when I need to introduce a new feature. Recently I had to write
the function to save files. Files are represented internally as buffer
objects, these buffers simply store runes (unicode codepoints) and can be
queried at a given unsigned offset with the buf_get
function.
This query function will return the newline character for offsets that
are out of bounds.
Writing a save function is usually trivial but I wanted it to be
clever enough to remove trailing whitespaces at end of lines and
trailing blank lines at the end of file. It turned out to be a
good programming exercise and I hacked it together using a small
state machine.
static void
puteb(EBuf *eb, FILE *fp)
{
enum { Munching, Spitting } state = Munching;
unsigned munchb = 0, munche = 0, nl = 0;
Rune r;
while (munche < eb->b.limbo) switch (state) {
case Munching:
r = buf_get(&eb->b, munche);
if (r == ' ' || r == '\t' || r == '\n') {
nl += (r == '\n');
munche++;
continue;
}
for (; munchb < munche; munchb++) {
r = buf_get(&eb->b, munchb);
if ((r == ' ' || r == '\t') && nl)
continue;
assert(nl == 0 || r == '\n');
nl -= (r == '\n');
putrune(r, fp);
}
assert(munchb == munche);
assert(nl == 0);
state = Spitting;
continue;
case Spitting:
r = buf_get(&eb->b, munchb);
if (r == ' ' || r == '\t' || r == '\n') {
assert(nl == 0);
munche = munchb;
state = Munching;
continue;
}
putrune(r, fp);
munchb++;
continue;
}
putrune('\n', fp);
}
I heavily used assertions because I know that I often write wrong
code. While my code is often wrong, my assertions are often right,
and it is a great satisfaction when programs crash on assertions
compared to when the output the unhelpful "Segmentation fault" message
and suddenly quit. Good use of assertions spared me long debugging
sessions.
I ran it on several tests and it worked fine. It is pretty good, but I
wanted to go further for this piece of code. And, as E.W. Dijkstra
puts it: Program testing can be a very effective way to show the
presence of bugs, but it is hopelessly inadequate for showing their
absence.
A Specification
One thing that the little story above emphasizes is that arguing
about the correctness of a program with a human (even if he/she
is a trained programmer) might not be enough to get it right.
One alternative I explored is to convince a computer instead. I used
the Coq system to program the above
algorithm, specify it, and prove its correctness.
The first difficulty encountered was translating the informal
specification above into a rigorous mathematical statement. After
some thinking it occured to me that relating the stripped down
output and the buffer contents had to be done incrementally and
backwards, because conditions are always in this form:
- A text filled with invisible characters can be stripped if
it is before the end of file.
- A sequence of spaces can be stripped if it preceeds
a newline.
- Finally, any rune of the text can be conserved in the
output.
These observations can be transcribed precisely in the Coq
inductive definition that follows.
Inductive represents: list Rune -> list Rune -> Prop :=
(* first bullet *)
| re_eos:
∀ s, white s -> nil ⊑ s
(* second bullet *)
| re_eol:
∀ s1 s2 nl sp,
space sp -> is_newline nl -> s1 ⊑ s2 ->
[nl] ++ s1 ⊑ sp ++ [nl] ++ s2
(* third bullet *)
| re_sam:
∀ s1 s2 r, s1 ⊑ s2 -> [r] ++ s1 ⊑ [r] ++ s2.
Using this formal definition I can state that the output
out
of the function represents the buffer text
text
:
Theorem output_represents_text:
out ⊑ text.
The proof of this theorem turned out to be a bit tricky. What was
hard was to relate the state machine machinery of my function with
this ⊑ relation. I basically had to run an induction on blocks of
"solid" and "white" characters. This induction by blocks is not natively
supported by Coq so there is a bit of scaffolding I had to prove (see
the next_block
function in my proof).
But this is only one half of the result that I really wanted. Because,
by applying the third bullet repeatedly, you can see that for any string
s
it is true that s ⊑ s
which means that a
simple while loop dumping the buffer intact would satisfy the previous
theorem. We want to make sure that there is no way we can strip down
the output of our function more, this can also be stated formally in Coq:
Theorem output_cannot_be_shorter:
∀ x, x ⊑ out -> x = out.
So if a string represents the output itself, then it must be the output.
We say that the output is a normal form for the relation
⊑. To prove this theorem I first wrote a simple function
which I thought was detecting normal forms, then proved that it indeed
detected normal forms, and then proved that applying this function to
my output validates it as a normal form.
Feel free to play and execute my proof step
by step using Proof General!
Conclusion & Some limitations
This whole exercise took me around a week. You might very well argue
with me that this is a lot too long for such a small function. But, how
would you proceed to test it? By hand made tests? By fuzzing? And if you
fuzz, how are you going to know the correct output? You would have to
program a similar algorithm and you would also have to make sure that this
one is correct by one mean or another.
The end product of this is that I am now rock solid convinced that this
save code is correct. Even though you might find it funny looking
I am sure it makes its job correctly because I convinced a narrow minded
computer that it was.
Although, what could make the whole thing trip is a loosy specification.
Indeed, if the specification is too weak, for instance only the second
theorem above, a valid save function could be the one that writes an empty
file regardless of the buffer contents! So we have to keep in mind that
proofs on programs are only as strong as the specifications. Perhaps
surprisingly, specification often turns out to be the hardest part of this
game, for example, how would you specify a compiler, an OS?
There is also one somehow fatal flaw in my little game, how can I
prove the correspondence between the algorithm I wrote in Coq and my original
C code? If you read my proofs you will see that I tried to make them
syntactically as close as possible, but this is in no way a formal proof.
For instance, my Coq code uses unbounded integers while we all know that
C has finite precision integers. What I would hope is that under certain
assumptions (no overflow, for instance) the two programs are equivalent.
This would require a formal proof. So I think that a tool taking
C code and generating an equivalent Coq program (under some assumptions)
could be really neat, unfortunately this is still science fiction today.
Update: I went to PLDI'14 in June and there was a very intersting
paper published on exactly that subject. Maybe this translation from
C to high level formal systems is no longer science fiction! Some people
from NICTA Australia presented a paper Don't Sweat the Small Stuff:
Formal Verification of C Code Without the Pain that seems to solve
this abstraction barrier problem. Downside for me, it is for Isabelle/HOL,
not Coq, but I guess it would have been too easy...