Notes
These are my personal opinions on various subjects, sorted in
chronological order. I tried to keep them short. If you have any
comments, I would be glad if you dropped me a mail.
It is folklore that, under some conditions,
open-addressed hash tables with quadratic
probing do not miss free slots. This note
explains why with a proof.
Let me break things down a little. With hash
tables, the prime problem to solve when
implementing insertion is collisions: you are
handed a key-value pair (K₁,V₁) and, after
hashing the key K₁, you realize that the table
slot designated by the hash value is already
filled with a binding (K₂,V₂). Chaining
implementations will just cram the binding
in the slot anyway, hence creating a bucket
with more than one binding in a single
table slot. Open-addressing implementations
will instead go look for another slot.
One obvious strategy for searching is to
go look at the next slot, and the next one,
and so on, potentially wrapping around when the
last slot of the table is visited.
Quadratic probing offers an alternative scheme:
instead of scanning slot h + i mod N at the
i-th step, it scans slot h + P(i) mod N, where P
is a polynomial function of degree 2.
This strategy helps with the clustering problem
that linear probing may create.
The polynomial P of interest to us here is
P(X) = (X + X²)/2.
The polynomial P outputs so-called
triangular numbers (they can be stacked in a
triangle shape), and the sequence P(0), P(1),
P(2), ... is easily computed:
p_i = 0;
for (i=0; i<N; i++) {
assert(p_i == (i + i*i)/2);
p_i += i + 1;
}
But, more importantly, if the table size is a
power of 2, and there is a single free slot,
quadratic probing will find it. It is not an
obvious fact and I prove it below.
Formally, we want to show that as i ranges
over the interval I = [0, 2ⁿ), h + P(i) mod 2ⁿ
ranges over I as well. Said differently, we
want to show that the function mapping x to
h + P(x) mod 2ⁿ is a bijection over I.
Since I is finite, it is sufficient to show
that the map is injective; that is, it maps
different values of I to different values.
To prove injectivity we consider i and j
in I such that h + P(i) = h + P(j) (mod 2ⁿ)
and show that i must be equal to j. We first
reason by equivalence:
h + P(i) = h + P(j) (mod 2ⁿ)
⟺ (h + (i + i²)/2) - (h + (j + j²)/2) = k2ⁿ
⟺ (i + i²) - (j + j²) = k2ⁿ⁺¹
⟺ (i - j)(1 + i + j) = k2ⁿ⁺¹
Breaking up the products on both sides, we get
that there must be three integers k₁, k₂,
and m such that:
A := i - j = k₁2ᵐ with k = k₁k₂
B := 1 + i + j = k₂2ⁿ⁺¹⁻ᵐ and 0 ≤ m ≤ n + 1
We now consider three cases:
- If m is not 0 nor n+1, then it must be that
both A and B are even. That is however impossible,
because B = 1 + 2j + A, so when A is even, B is
odd, and vice versa.
- If m is n+1, we must then have i - j = k₁2ⁿ⁺¹,
but since i and j are in the interval I, i - j
is in (-2ⁿ, 2ⁿ), so k₁ can only be zero and thus
i equals j.
- If m is 0, we again use a size argument. Since i
and j are in I, we have that 1 + i + j is in
(0, 2ⁿ⁺¹). So k₂ must be positive and less
than 1; these requirements prove that the present
case is impossible.
The only possible outcome of the case analysis
is thus that i equals j. Unwinding everything, the
map h + P(x) mod 2ⁿ is injective on I, so it is
also bijective on I; so when i ranges from 0
to 2ⁿ-1, the map ranges over all of I, and thus
quadratic probing indeed visits all slots!
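To make the theorem concrete, here is a small standalone check (my own sketch, not part of the note) that walks the probe sequence and verifies that every slot of a power-of-two table gets visited:

```c
#include <stdbool.h>
#include <string.h>

/* Walk the probe sequence h + P(i) mod n for i = 0..n-1, with
   P(i) = (i + i*i)/2, and report whether every slot was visited.
   Coverage is only guaranteed when n is a power of 2. */
static bool probes_cover_table(unsigned n, unsigned h) {
    enum { MAXN = 4096 };
    bool seen[MAXN];
    if (n > MAXN)
        return false;
    memset(seen, 0, sizeof seen);
    unsigned p = 0;                  /* p == P(i) at the top of the loop */
    for (unsigned i = 0; i < n; i++) {
        seen[(h + p) % n] = true;
        p += i + 1;                  /* P(i+1) = P(i) + (i + 1) */
    }
    for (unsigned s = 0; s < n; s++)
        if (!seen[s])
            return false;
    return true;
}
```

For n = 12, which is not a power of two, the check fails: the triangular numbers modulo 12 never hit slots 2, 5, 8, or 11.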
Say you are given two integer ranges [a₁, b₁) and [a₂, b₂), where
the a's are inclusive and the b's are exclusive. We would like to
write a function that returns whether the two ranges overlap or
not. I suggest you try it out before reading the rest of the note.
We are now going to take a viewpoint on the problem which I hope
you will find original. Formally, the two intervals overlap if
and only if they have (at least) one element in common. So we
would like to decide if the following formula holds:
F := ∃x. x ∈ [a₁, b₁) && x ∈ [a₂, b₂)
However, programming languages are really only subsets of the
mathematical language, and most of them won't let us express
the existential quantification. What we would like to find is
a formula Fx that is equivalent to F but free of constructs
that cannot be used in typical programming languages. In
particular, Fx must be free of the existential quantifier.
It so happens that this business of eliminating quantifiers is
a cornerstone of automated reasoning, where it goes by the
unsurprising name of "quantifier elimination". Quite often
the idea behind these procedures is to eliminate quantified
variables by forming a conjunction of all the implicands.
Let's see how this would work on our formula:
0: F := ∃x. x ∈ [a₁, b₁) && x ∈ [a₂, b₂)
1: <=> ∃x. (a₁ ≤ x && x < b₁) && (a₂ ≤ x && x < b₂)
2: ==> a₁ < b₁ && a₁ < b₂ && a₂ < b₂ && a₂ < b₁
In the last step we considered all pairs (a,b) that sandwich
the variable x and formed the inequalities a < b. By construction,
and by transitivity of ≤ and <, the resulting conjunction is a
consequence of our original formula F. In other words, if
the two ranges overlap, then the formula on line 2 must hold.
Interestingly, for integers, and in fact for any total order,
the formula on line 2 is equivalent to F (take x to be
max(a₁,a₂)). That's one thing we can easily formally verify
using HOL Light:
prove
(`(?(x:int). (a1 <= x /\ x < b1) /\ (a2 <= x /\ x < b2)) <=>
a1 < b1 /\ a1 < b2 /\ a2 < b2 /\ a2 < b1`,
EQ_TAC THENL [ARITH_TAC; ALL_TAC] THEN STRIP_TAC THEN
EXISTS_TAC `max (a1:int) a2` THEN ASM_ARITH_TAC);;
Finally, note that the two conjuncts a₁ < b₁ and a₂ < b₂
are simply saying that the two input ranges are non-empty.
If the two ranges are known to be non-empty, then the check
degenerates to a₁ < b₂ && a₂ < b₁.
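Putting the whole note together, the check fits in a few lines of C (a sketch, with names of my choosing):

```c
#include <stdbool.h>

/* Do the ranges [a1, b1) and [a2, b2) share at least one integer? */
static bool ranges_overlap(int a1, int b1, int a2, int b2) {
    /* a1 < b1 and a2 < b2 say both ranges are non-empty;
       a1 < b2 and a2 < b1 say neither lies entirely past the other. */
    return a1 < b1 && a1 < b2 && a2 < b2 && a2 < b1;
}
```

If non-emptiness is already known, the first and third conjuncts can be dropped, as the note says.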
It has been forever since my last note! I hereby make it my
new year's resolution to share more about the interesting
thought nuggets I come across.
In computer hardware, arithmetic primitives are in fact modular
primitives; this means that the result of a+b is defined as
the mathematical sum of a and b modulo a power of 2
(usually, 2³² or 2⁶⁴). The mathematical structure obtained
is a modular ring in which all operations "wrap around".
It is only a ring and not a field because the multiplication
operation does not have an inverse; to see why an inverse
function for multiplication cannot be defined, observe that
42 is the result of both 21×2 and (2³¹+21)×2!
An elementary result equivalent to Bézout's identity
is that any number co-prime with the modulus has a
multiplicative inverse. In a computer, this means that for
any odd number a, a number b exists such that a×b = 1.
Now there is a particularly cute algorithm to find
multiplicative inverses, and it is what motivated me to
write this note.
The core idea of the algorithm lies in an observation I
found quite surprising: a multiplicative inverse b of a
modulo 2ⁿ can be used to find an inverse c of a modulo
2²ⁿ. Let's see how.
ab = 1 (mod 2ⁿ)
⇒ ab - 1 = k2ⁿ
⇒ (ab - 1)² = 0 (mod 2²ⁿ)
⇒ a²b² - 2ab + 1 = 0 (mod 2²ⁿ)
⇒ 2ab - a²b² = 1 (mod 2²ⁿ)
⇒ a(2b - ab²) = 1 (mod 2²ⁿ)
⇒ c := 2b - ab² is an inverse of a (mod 2²ⁿ)
This gives us the inductive step of the algorithm;
to bootstrap it we simply need to find an inverse
b of a modulo some small power of 2. To that
end, one can check that all odd numbers are their
own inverse modulo 4 (and 8).
Putting it together in C.
uint32_t mulinv(uint32_t a) {
uint32_t b = a; /* 1/a mod 2² */
b *= 2 - a*b; /* 1/a mod 2⁴ */
b *= 2 - a*b; /* 1/a mod 2⁸ */
b *= 2 - a*b; /* 1/a mod 2¹⁶ */
b *= 2 - a*b; /* 1/a mod 2³² */
return b;
}
For those really eager for performance, it turns
out that b = (3*a) ^ 2 is a cute initialization
by Peter Montgomery which yields an inverse
of a modulo 2⁵ and lets you cut one refinement
step!
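Splicing that initialization into the function above gives a three-step variant (my sketch; starting from 5 correct bits, each step doubles the precision, so three steps reach 40 ≥ 32 bits):

```c
#include <stdint.h>

/* Inverse of an odd a modulo 2^32, starting from Montgomery's
   5-bit guess (3*a) ^ 2 instead of the 2-bit guess b = a. */
static uint32_t mulinv5(uint32_t a) {
    uint32_t b = (3 * a) ^ 2;   /* 1/a mod 2^5  */
    b *= 2 - a * b;             /* 1/a mod 2^10 */
    b *= 2 - a * b;             /* 1/a mod 2^20 */
    b *= 2 - a * b;             /* 1/a mod 2^40, hence mod 2^32 */
    return b;
}
```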
Undefined behavior in C is a common source
of bugs, and sometimes, of funny ones.
Here is my story about one.
A few months ago I was working on a
function that looked like this.
for (i=0; arr[i]!=0 && i<2; ++i) {
/* do some work */
}
When my program was compiled without
optimizations it would behave correctly,
but turning optimizations on made it
incorrect. I was able to quickly
pinpoint the above loop as the root of
my issues, but the symptom of the bug
was quite unusual: Stepping in the
debugger revealed that the loop body was
only executed once, when it should have
been running twice! Indeed, the array arr
contained two non-zero values.
After a little bit of head scratching,
I eventually realized what the compiler
was doing. The variable arr was declared
as int arr[2], so accessing its third
element is undefined behavior. Because of
this, a valid program cannot access arr[2];
but if the loop body is run twice, the
test condition will check arr[2] != 0
at the end of the second iteration. The
consequence of this reasoning is that,
assuming the input program is valid, the
second loop iteration will not be run
and can be gotten rid of!
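The fix itself is one swap away: test the index before using it. Sketched as a little function (the counting wrapper is mine, for illustration):

```c
#include <stddef.h>

/* Checking i < 2 before reading arr[i] keeps every access in
   bounds, so the compiler has no license to drop an iteration. */
static size_t count_work(const int arr[2]) {
    size_t i;
    for (i = 0; i < 2 && arr[i] != 0; ++i) {
        /* do some work */
    }
    return i;
}
```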
I thought this was quite a remarkable
use of undefined behavior: Invalid array
accesses do not happen in valid programs,
so if the compiler can prove such access
happens in a branch, it means that the
branch is dead code.
The φ nodes of SSA form are often defined
as taking one argument per predecessor block.
In both my compiler and LLVM, the data
structures are based on this definition.
However, with some more experience, I think
it is more sensible to require φ arguments
to be linked with incoming edges, not
predecessors. I will give two practical
examples to prove this point, but first, let
me clarify the concept of edge.
In a control flow graph (CFG), I call an edge a link
between two basic blocks. An essential point is
that edges to a block are not in one-to-one
correspondence with the block's predecessors.
This is because of jumps like if(%cond) goto @b
else @b, where a conditional jump jumps to the
same block in two (or more) cases. While such
a jump may look contrived, it can result from
optimizations, or even from lowering C-style
switch statements where multiple cases have
the same body.
The program below will serve to illustrate my
claim.
@src
%var = ....
%cond = ...
if(%cond) goto @dst else @dst
@dst
%foo = φ(@src %var, ...)
Following the usual requirement, the φ
node %foo has one argument for the @src
predecessor block. Assume now that the
predecessor is dead code and gets deleted
by this natural deletion code.
void blockdel(Block block) {
/* do stuff... */
foreach succ in block.successors {
foreach phi in succ.phis {
phi.remove_arg(block)
}
}
}
Stepping through this function reveals
that the remove_arg method will erroneously be
called twice. In my compiler, the second
invocation caused an assertion failure. On the
other hand, if φ nodes have one argument per
incoming edge, the loop above naturally iterates
over edges and removes exactly one argument per
edge, which is correct.
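With edge-indexed φ arguments, the deletion routine might look like this (same pseudocode style as above; the successor_edges accessor is hypothetical):

```
void blockdel(Block block) {
    /* do stuff... */
    foreach edge in block.successor_edges {
        foreach phi in edge.target.phis {
            phi.remove_arg(edge)  /* one argument per edge: removed exactly once */
        }
    }
}
```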
Another case where things could go wrong is
in a CFG simplification pass. Such a pass will
likely remove spurious empty blocks and adjust
links elsewhere accordingly. But consider
the following program.
@blk0
...
if(%cond) goto @blk1 else @blk2
@blk1
jmp @blk3
@blk2
jmp @blk3
@blk3
%foo = φ(@blk1 %v1, @blk2 %v2)
The simplification pass will remove both @blk1
and @blk2. After deleting @blk1, the block
of the first φ argument will be updated to
@blk0. Now if the same thing happens to the
second empty block, two arguments of the φ node
will bogusly end up with the same incoming block.
Solving these issues in my compiler required
careful logic to handle the "multiplicity" of
predecessors. However, I think the edge-based
design is better, and I might adopt it in the
future.
Because of a misuse of an archiving command I had to delete more
than 40,000 files polluting my home directory. The UNIX way
should make it easy because the decompression program can output
the list of files it extracted. Unfortunately, it turned out to
be surprisingly hard, for at least two reasons: first, 40,000 files
is a lot, and second, the files used funny characters (whitespace,
dashes at the beginning, parentheses, etc.). After some Googling,
here is the final command I used, phew.
tr \\n \\0 <list.txt | xargs -0 -n 1 rm -f --
While browsing information about compilers I found it difficult
to find a formal definition of SSA form (Static Single Assignment).
For whatever reason, it's often under-specified or simplified.
So anyway, if you are like me and like formal definitions, here it is.
A program is in SSA form when:
- Variables are assigned only once.
- All the uses of a variable are dominated by the unique definition.
- The φ nodes are at the beginning of basic blocks.
- The φ nodes have one argument for each predecessor of their basic block.
- Uses of φ arguments are considered to occur at the end of the corresponding predecessor.
The concept of domination is from graph theory: an instruction A dominates
another B when all the paths from the entry point of the program to B go
through A.
Please let me know if the definition above looks wrong to you.
Finally, if you want informal examples, the Wikipedia page has
plenty!
I would like to share some thoughts I had about tagged data. Sum types
and such are getting really common in mainstream languages; they are
usually implemented using what I'll call a tag field. A tag is a small
number that indicates the kind of the data considered. For example,
in C, we often use the following idiom to mimic sum types:
struct Fruit {
enum {
FruitNone, /* invalid fruit */
FruitApple,
FruitBanana,
FruitCherry,
} kind;
union {
struct Apple apple;
struct Banana banana;
struct Cherry cherry;
} u;
};
Like in the example, most of the programming languages represent the tag
using a machine integer. However, if you are programming with high space
constraints it might be worth thinking more about the appropriate
representation of the tag field.
Let's say that you want a tagged structure that can be either
one integer of type A in the range [0-255], or integers of type
B, C, and D that lie in the range [0-63]. Then, because there
are 4 different cases, the tag field needs to be at least 2 bits
long, and because the type A needs at least 8 bits (8 bits
represent at most 2⁸ = 256 values), the whole tagged
data structure would need at least 8+2 = 10 bits for storage.
The point is now that the whole reasoning above assumes that
the tag bit-width is constant, but in the above case it is
actually smarter to have a variable bit-width for the tag. Here
is why: if we use the tag 0 to represent the kind of data A, and
3, 5, 7 to represent respectively the kinds B, C, and D, then
we use only 1 bit for the tag in the A case and 3 bits in all
the other cases. So instead of using 10 bits of storage as
above we use max(8+1,6+3) = 9 bits! (Note that the above works
because 3, 5, and 7 have 1 as least significant bit, so they
are never confused with the A kind.)
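Here is a sketch of how that variable-width encoding could look in C (the packing layout and names are mine, following the note's tag values 0, 3, 5, 7):

```c
#include <stdint.h>

enum Kind { A, B, C, D };

/* Pack into 9 bits: kind A uses a 1-bit tag (0) and an 8-bit value;
   kinds B, C, D use a 3-bit tag (3, 5, 7) and a 6-bit value.
   The long tags are all odd, so they can never be mistaken for A. */
static uint16_t pack(enum Kind k, uint16_t v) {
    static const uint16_t tags[] = { 0, 3, 5, 7 };
    if (k == A)
        return (uint16_t)(v << 1);          /* v in [0,255], low bit 0 */
    return (uint16_t)(v << 3 | tags[k]);    /* v in [0,63], low bit 1 */
}

static enum Kind unpack(uint16_t w, uint16_t *v) {
    if ((w & 1) == 0) { *v = w >> 1; return A; }
    *v = w >> 3;
    return (w & 7) == 3 ? B : (w & 7) == 5 ? C : D;
}
```

Every packed word is below 2⁹ = 512, confirming the 9-bit bound.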
The x64 architecture is better than i386 in many regards, but one
surprise it has in store for programmers is that immediates remain
32-bit constants instead of being promoted to 64 bits like everything
else. In case you do not know it already, an immediate is a constant
in an instruction; for example, if you want to add N to the register
rax, you would write addq $N, %rax. And in that case, we say that N
is an immediate.
Because immediates are restricted to be only 32 bits long, it is
impossible to add 2³³ to rax using one instruction only. One has
to load the big constant in another register first (say rbx), and
then perform the addition, like so: addq %rbx, %rax.
This brings us to the following question: how does one load a
constant into a 64-bit register?
The answer to this is simple, if you do not care about the code size.
But if you do, there are actually three different ways!
First, like most 64-bit instruction sets that allow operating on
32-bit sub-registers, x64 zeroes the high bits of the larger parent
64-bit register for any 32-bit operation. (If you wonder why, look
up data dependencies in computer architecture books.) So for
instance, when the machine executes movl $42, %eax, the high bits
of rax will be set to 0, and the full rax register will contain the
value 42. This is our first way to load a value; it is also the
cheapest way and uses 5 bytes only.
Because 32-bit operations are zero-extended, the previous technique
is unsuitable to load a negative value into a 64-bit register. In
that case we have to mention rax explicitly in the instruction:
movq $-42, %rax. On x64, mentioning a full-width register costs you
two extra bytes (one of them is the REX prefix, required for all
64-bit instructions). So that gives us the second most economical
load instruction; it uses 7 bytes.
Finally, if the constant to load does not fit in 32 bits, one has
to use the only x64 instruction with a 64-bit immediate. In
AT&T syntax we write movabsq $4294967296, %rax. It really is a
last-resort option because that instruction uses 10 bytes!
These three cases give us the following rules for compiling constant
loads into rax on x64. (The same works for any other integer register.)
- When 0 ≤ N < 2³², use movl $N, %eax (5 bytes).
- When -2³¹ ≤ N < 0, use movq $N, %rax (7 bytes).
- Otherwise, use movabsq $N, %rax (10 bytes).
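The three rules are mechanical enough to encode directly; here is a tiny helper (my sketch) returning the byte cost a compiler would pay for a given constant:

```c
#include <stdint.h>

/* Byte size of the cheapest x64 instruction loading N into an
   integer register, following the three rules above. */
static int const_load_size(int64_t n) {
    if (n >= 0 && n < (1LL << 32))
        return 5;   /* movl $N, %eax: high bits zeroed for free */
    if (n >= INT32_MIN && n < 0)
        return 7;   /* movq $N, %rax: sign-extended 32-bit immediate */
    return 10;      /* movabsq $N, %rax: full 64-bit immediate */
}
```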
Isn't that crazy?
Here is a short note for those who have trouble remembering Russell's
Paradox or the definition of the Y combinator in lambda-calculus.
I will present here one observation that allows you to remember just
one of the two and recover the other. The actual link between these
two notions prevented the lambda-calculus from serving the purpose
Church had for it: giving a formal basis to mathematics. Fortunately
the lambda-calculus found another application no less important than
grounding mathematics: it describes universal computations.
Russell's Paradox considers the set R = { X | X ∉ X }
of sets that do not belong to themselves. Now if such a set exists,
R ∈ R if and only if R ∉ R, and we have a contradiction. Let us
encode this reasoning in lambda-calculus. We start by observing
that sets can be represented by their characteristic function: a
function returning 1 for all members of the set and 0 for the
others. Assume that we represent negation using a function N.
Then, we can define the set R in lambda-calculus by the following
characteristic function:
R = λ x. N (x x)
Let's observe what happens when we ask if R is in itself:
We form Y = R R (that is exactly the Y combinator
applied to N) and, by one application of beta-reduction,
we get N (R R) ≡ N Y.
We just found a fixpoint of N. In set theory we had a fixpoint of
negation, but negation has no fixpoint, hence the paradox. In the
lambda-calculus case, we get out of this with a trick: since N has
no fixpoint, the computation we get by applying these characteristic
functions never terminates; it diverges, and we face an ever-running
program that never tells us whether R is in itself or not.
I recently hacked together a small yacc for C. Initially, I was
programming as if I lived in the future and performance was not a
problem. However, with LALR state generation you can't be too
silly, as I soon discovered. Processing a C89 grammar took around
8 minutes; that was not really acceptable, as bison could deal with
it in a fraction of a second! After eliminating a few massive
performance hogs (a sorted array to represent a set, a loop-invariant
computation left inside a loop), I was processing the same C grammar
in 5 seconds. Much better! Then comes my point: at that time the
profiler was saying that my code was fairly fast and accounted for
less than a second of runtime. The issue turned out to be in my use
of malloc/free in the libc. I used the following pattern:
- Allocate and initialize some memory to store a temporary,
- if condition C is true, store the pointer,
- otherwise, free the memory.
The third case happened much more often than the second, so I
thought that glibc's allocation code would simply give me the
same block all the time. Switching to the scheme below got
me a 5x speedup, so I guess I was wrong!
- Initialize a statically allocated object A,
- if condition C is true, allocate some memory and copy A to it,
then store the pointer
- otherwise, do nothing.
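As a sketch in C (the item type and names are invented for illustration), the second scheme looks like this:

```c
#include <stdlib.h>

struct item { int key; };

/* Build the temporary in a statically allocated scratch object and
   touch the heap only in the rare case where the object is kept. */
static struct item *maybe_keep(int key, int keep) {
    static struct item scratch;        /* step 1: initialize in place */
    scratch.key = key;
    if (!keep)
        return NULL;                   /* common case: no malloc/free */
    struct item *p = malloc(sizeof *p);
    if (p)
        *p = scratch;                  /* rare case: copy and store */
    return p;
}
```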
The conclusion I can draw is
that objects likely to be short-lived should really not be
allocated dynamically. This makes me wonder how, for instance,
C++'s RAII and Rust's automatic liveness mechanisms handle this
kind of performance problem. The speed of my yacc is now similar
to bison's after only 10% of the code was changed. This proves
the point that, instead of designing for speed, we should design
for simplicity and then improve.
I am grading some verified code submitted by students and observed
one interesting fact. The code they have to verify is the classic
inefficient bubble sort: they are given an array and need to convince
Microsoft Research's Dafny verifier that the returned array is
indeed sorted.
Now comes the interesting bit. It turns out that
all students have in the precondition that the pointer to the array
is not null, but most of them also require that a.Length > 1
or a.Length > 0
. That is, their sorting function can
only be called on arrays that have at least two elements (in the
worst case) or that are at least non-empty. This additional
requirement helps the verifier go through, but most of the time
their code actually works on empty arrays! In other words, the
requirement is useless, but they did not go through the pain of
trying to eliminate or understand it. I see this as disdain for
small cases and would like to argue it is poor programming practice.
Now that I think about it, it reminds me of a mail I read some
time ago, somebody implemented a vi clone and in its first
versions, editing commands at the end of the file would crash
or misbehave in most cases. To me, it is an indication of
poor design. This software is undoubtedly still flawed since
the corner case was patched afterwards; good code and data
design would have allowed uniform programming and probably made
the program safer.
I see going through the thinking to eliminate corner cases as
an essential part of programming. The more general your function
or data structure is, the easier and more comfortable it is to
use. In my experience, adapting your code to the corner case will
not make it unclear but more elegant and easier to read/use
(assuming you don't dumbly patch it)! So, next time you write
some code, ask yourself what happens when the window is 0 pixels
wide, or if the file you are reading contains a nul byte...
While restructuring my simple IO multiplexing module I had to solve
a problem that was not completely trivial. It might make an
instructive exercise so I will explain it here.
The problem is that I am iterating over a collection (C array)
that can be modified by the functions called during the iteration.
More specifically, some elements could be removed during the
iteration. I thought the simplest way to do this was to have a
flag that marks the validity of elements in my array. This way,
instead of deleting during the iteration I simply set the flag
and skip flagged elements as I go.
After the iteration, we have to compact the array
to remove flagged elements. My first (mental) try was quadratic,
but with some more thought, I found a linear algorithm. I will let
you meditate and find it; it's not so hard and pretty nice to
hack (3 lines in my program).
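(Spoiler ahead if you want to find it yourself.) The linear version keeps a write cursor trailing the read cursor; here is a sketch with a made-up element type:

```c
#include <stddef.h>

struct ent { int fd; int valid; };   /* hypothetical array element */

/* One-pass compaction: copy live entries down over the dead ones,
   preserving their order; returns the new element count. */
static size_t compact(struct ent *a, size_t n) {
    size_t w = 0;                    /* write cursor trails r */
    for (size_t r = 0; r < n; r++)
        if (a[r].valid)
            a[w++] = a[r];
    return w;
}
```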
Here are my latest thoughts on compilers for low-level languages
(I think C). I am a pretty big fan of the C language. What I like
about C is that when you write it and know your target platform,
you have a pretty good idea of what the generated code will be. It
is very predictable. C was designed at a time when programs had to
be very simple, and this includes compilers. So C is trying to be
as expressive as possible while remaining very easy to compile.
It seems that this ideal is now slipping out of sight among
language users and compiler writers. Recently, a new family of
bugs has emerged: compilers exploit undefined behavior,
as defined in the C standard, to enable "optimizations".
This mail shows how GCC removes a check for overflow that has
undefined behavior, without any warning. In the same vein,
this article describes a bug in the Linux kernel that was aggravated by
another undefined-behavior-based GCC optimization. Another recent article
tries to show how, in modern C, one should zero a piece
of memory. This problem is made hard by optimizing compilers
that know the semantics of memset and will
optimize it out if it does not change the observable
behavior of C programs.
These compiler behaviors greatly impair predictability,
and I think this is a real deal breaker for many people.
But why is predictability important? Well, consider
the following function. If the compiler is predictable, the
memset call will actually be compiled down to assembly.
void f() {
char *p = malloc(512);
if (!p) return;
sensitivestuff(p);
memset(p, 0, 512);
free(p);
}
If this function is only ever called from well-behaved standard C,
it is perfectly fine to remove the memory-clearing call to memset.
But we are not in a world where only well-behaved C calls C. This
function could very well be called by some malicious code that
tries to read back the memory just freed. In that case, not zeroing
the memory causes a security bug, since sensitive information can
be accessed. So "modern" compilers, when they do
their job, assume that all code is written in C and well behaved.
This is obviously not true in the real world, where abstraction
leaps exist and can be exploited.
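For completeness, one defensive pattern (my sketch, not from the note) is to launder the call through a volatile function pointer, so the compiler can no longer recognize it as memset and must keep it:

```c
#include <string.h>

/* The volatile qualifier forces the pointer to be reloaded at the
   call site, so the compiler cannot prove it still points to memset
   and therefore cannot elide the call. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

static void secure_clear(void *p, size_t n) {
    memset_ptr(p, 0, n);
}
```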
The previous argument shows that predictability is essential
for security. Another important area is performance evaluation.
While basic optimizations are critical, when C is used for
embedded applications, predictability of the compiler is usually
preferred to crazy compiler tricks. This is true for a simple
reason: programmers want to evaluate the cost of their programs
at the C level, but what runs is the actual compiled code. These
two abstraction layers need to be related in some simple way for
the analysis at the high level to be relevant. Airbus, for
example, uses a custom GCC version with all optimizations
disabled.
If I have to sum it up, I think simpler is better, and
compiler writers are taking a slippery path that could very
well lead their compilers to be unusable for many applications
where C used to shine. The focus is a lot more on a literal
and probably misguided interpretation of the standard than it
is on usability and relevance of compilers to today's
problems. And, to be honest, are any of the above
"optimizations" critical to any application that is not
a contrived benchmark?
Don Knuth, in Volume 1 of The Art of Computer Programming,
argues against garbage collection by saying that early freeing can
lead to a more efficient usage of the memory by the allocator.
In the following diagrams, which represent a memory state, the
heavy gray area is not used by the program but has not yet been
reclaimed by the garbage collector, the black areas are in use,
and the white areas are free to use.
If the program tries to allocate one small block in the current
state, the allocator will use some of the free space. Now the
garbage collector reclaims the gray zone, and we are left with
two disjoint blocks of free memory.
On the other hand, if memory is managed manually, the gray block
has been freed eagerly. Thus, when the application needs some
extra memory it can be tightly packed, leaving only one large
hole in the memory.
I guess this is all to be taken with a grain of salt since
there are also easy arguments for garbage collection (for example
fast allocation). All in all I feel that programmers using languages
with GC will advocate for it, and the converse for the others.
The reasonable choice is probably application specific, and even
then, making the right decision at the planning stage is likely
to be hard and controversial.
Today's test revealed that fancy text data structures
might very well not be necessary.
While hacking on my text editor I discovered
a fun idiom. Often, in a switch statement you have two
cases that are very similar except in one or two points.
In that situation it would be poor style to copy the big
chunk of code that is common to the two cases; instead,
you can use the following neat trick.
switch(type) {
...
case X:
isx = 1;
if (0) {
case Y:
isx = 0;
}
common code;
...
break;
}
The line isx = 0; will only be executed when
the type is Y, because of the if (0) guard. This is very
similar, and sort of dual, to Duff's device. For additional
fun, the braces of if (0) can be removed!
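Wrapped in a function, the trick compiles and behaves as described (X and Y are placeholder constants of my choosing):

```c
enum { X = 1, Y = 2 };

/* Both cases fall into the common code, but only Y runs isx = 0:
   control reaches the if (0) body solely through its case label.
   Returns 1 for X, 0 for Y, -1 otherwise. */
static int classify(int type) {
    int isx = -1;
    switch (type) {
    case X:
        isx = 1;
        if (0) {
    case Y:
            isx = 0;
        }
        /* common code */
        break;
    }
    return isx;
}
```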
Here is a simple Awk program to check that your files
respect the religious constraint of 80 characters per
line.
{
gsub(/\t/, "        ");  # count a tab as 8 columns
if (length($0) > 80)
printf("%d: %s\n", NR, $0);
}