Notes

2017-02-09

The φ nodes of SSA form are often defined as taking one argument per predecessor block. In both my compiler and LLVM, the data structures are based on this definition. However, with some more experience, I think it is more sensible to require φ arguments to be linked with incoming edges, not predecessors. I will give two practical examples to prove this point, but first, let me clarify the concept of edge.

In a control flow graph (CFG), I call edge a link between two basic blocks. An essential point is that edges to a block are not in one-to-one correspondence with the block's predecessors. This is because of jumps like if(%cond) goto @b else @b, where a conditional jump jumps to the same block in two (or more) cases. While such a jump may look contrived, it can result from optimizations, or even from lowering C-style switch statements where multiple cases have the same body.

The program below will serve to illustrate my claim.

@src
        %var = ....
        %cond = ...
        if(%cond) goto @dst else @dst
@dst
        %foo = φ(@src %var, ...)

Following the usual requirement, the φ node %foo has one argument for the @src predecessor block. Assume now that the predecessor is dead code and gets deleted by this natural deletion code.

void blockdel(Block block) {
        /* do stuff... */
        foreach succ in block.successors {
                foreach phi in succ.phis {
                        phi.remove_arg(block)
                }
        }
}

Stepping through this function reveals that the remove_arg method will erroneously be called twice. In my compiler, the second invocation caused an assertion failure. On the other hand, if φ nodes have one argument per incoming edge, the loop above is naturally changed to some correct code.

Another case where things could go wrong is in a CFG simplification pass. Such a pass will likely remove spurious empty blocks and adjust links elsewhere in consequence. But consider the following program.

@blk0
        ...
        if(%cond) goto @blk1 else @blk2
@blk1
        jmp @blk3
@blk2
        jmp @blk3
@blk3
        %foo = φ(@blk1 %v1, @blk2 %v2)

The simplification pass will remove both @blk1 and @blk2. After deleting @blk1 the block of the first φ argument will be updated to @blk0. Now if the same thing happens to the second empty block, two arguments of the φ node will bogusly end up with the same incoming block.

Solving these issues in my compiler required careful logic to handle the "multiplicity" of predecessors. However I think the edge-based design is better, and I might adopt it in the future.