Notes

2017-02-15

Undefined behavior in C is a common source of bugs, and sometimes of funny ones. Here is the story of one. A few months ago I was working on a function that contained a loop like this:

for (i=0; arr[i]!=0 && i<2; ++i) {
	/* do some work */
}

When my program was compiled without optimizations it behaved correctly, but turning optimizations on made it misbehave. I quickly pinpointed the above loop as the root of my issues, but the symptom of the bug was quite unusual: stepping through it in the debugger revealed that the loop body was executed only once, when it should have run twice! Indeed, the array arr contained two non-zero values.

After a little bit of head scratching, I eventually realized what the compiler was doing. The variable arr was declared as int arr[2], so reading arr[2], one element past the end of the array, is undefined behavior. Because of this, a valid program cannot access arr[2]; but if the loop body runs twice, i is 2 when the condition is evaluated again, and arr[2]!=0 is checked before i<2 gets a chance to end the loop. The consequence of this reasoning is that, assuming the input program is valid, the second loop iteration never runs and can be removed!

I thought this was quite a remarkable use of undefined behavior: invalid array accesses do not happen in valid programs, so if the compiler can prove that a branch performs such an access, that branch must be dead code.
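For completeness, here is a minimal sketch of how the loop above can be written without the out-of-bounds read. The function name count_leading_nonzero and the ARR_LEN macro are hypothetical, introduced only for illustration; the one real change is the order of the two tests. Because && short-circuits, checking the bound first means arr[i] is never read once i reaches 2.

```c
#include <stddef.h>

#define ARR_LEN 2 /* assumed size, matching "int arr[2]" from the story */

/* Hypothetical helper: returns how many leading entries of arr are non-zero. */
size_t count_leading_nonzero(const int arr[ARR_LEN])
{
    size_t i;

    /* Bounds check first: && evaluates left to right and stops early,
     * so arr[i] is only read while i < ARR_LEN. */
    for (i = 0; i < ARR_LEN && arr[i] != 0; ++i) {
        /* do some work */
    }
    return i;
}
```

With both entries non-zero, this version runs the body twice and stops on the i<ARR_LEN test alone, so there is no undefined behavior for the optimizer to reason from.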