Notes
2017-02-09
The φ nodes of SSA form are often defined
as taking one argument per predecessor block.
In both my compiler and LLVM, the data
structures are based on this definition.
However, with some more experience, I think
it is more sensible to require φ arguments
to be linked with incoming edges, not
predecessors. I will give two practical
examples to prove this point, but first, let
me clarify the concept of edge.
In a control flow graph (CFG), I call edge a link
between two basic blocks. An essential point is
that edges to a block are not in one-to-one
correspondence with the block's predecessors.
This is because of jumps like if(%cond) goto @b
else @b
, where a conditional jump jumps to the
same block in two (or more) cases. While such
a jump may look contrived, it can result from
optimizations, or even from lowering C-style
switch
statements where multiple cases have
the same body.
The program below will serve to illustrate my
claim.
@src
%var = ....
%cond = ...
if(%cond) goto @dst else @dst
@dst
%foo = φ(@src %var, ...)
Following the usual requirement, the φ
node %foo
has one argument for the @src
predecessor block. Assume now that the
predecessor is dead code and gets deleted
by this natural deletion code.
void blockdel(Block block) {
/* do stuff... */
foreach succ in block.successors {
foreach phi in succ.phis {
phi.remove_arg(block)
}
}
}
Stepping through this function reveals
that the remove_arg
method will erroneously be
called twice. In my compiler, the second
invocation caused an assertion failure. On the
other hand, if φ nodes have one argument per
incoming edge, the loop above is naturally changed
to some correct code.
Another case where things could go wrong is
in a CFG simplification pass. Such a pass will
likely remove spurious empty blocks and adjust
links elsewhere in consequence. But consider
the following program.
@blk0
...
if(%cond) goto @blk1 else @blk2
@blk1
jmp @blk3
@blk2
jmp @blk3
@blk3
%foo = φ(@blk1 %v1, @blk2 %v2)
The simplification pass will remove both @blk1
and @blk2
. After deleting @blk1
the block
of the first φ argument will be updated to
@blk0
. Now if the same thing happens to the
second empty block, two arguments of the φ node
will bogusly end up with the same incoming block.
Solving these issues in my compiler required
careful logic to handle the "multiplicity" of
predecessors. However I think the edge-based
design is better, and I might adopt it in the
future.