Finding Loops in Control Flow Graphs


We want to give basic blocks that occur in loops priority in register allocation. But how do we recognize loops in the control flow graph? One simple approach builds on the notion of dominance, which we've already seen.

Recall that node N dominates node M if every path from the start node to M must pass through N. A node trivially dominates itself. Formally, dom(Z), the set of dominators of node Z, is defined as

	dom(Z) = {Z} Union (Intersect (over Y in Pred(Z)) dom(Y))

This equation says that, besides Z itself, any node that dominates all of Z's predecessors also dominates Z.
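To make the formula concrete, here is a minimal Python sketch of a single application of it. The function name dom_step and the tiny example sets are made up for illustration, and the sketch assumes Z has at least one predecessor (so it does not apply to the start node).

```python
def dom_step(z, pred, dom):
    """Apply dom(Z) = {Z} Union (Intersect over Y in Pred(Z) of dom(Y)) once."""
    # Assumes pred[z] is non-empty; the start node is handled separately.
    common = set.intersection(*(dom[y] for y in pred[z]))
    return {z} | common


# Hypothetical fragment: Z has two predecessors, P and Q.
pred = {"Z": ["P", "Q"]}
dom = {"P": {"N0", "A", "P"}, "Q": {"N0", "A", "Q"}}
print(sorted(dom_step("Z", pred, dom)))  # ['A', 'N0', 'Z']
```

N0 and A dominate both predecessors, so they dominate Z. P and Q each dominate only one predecessor, so neither dominates Z.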

One easy way to compute dominators is via a simple worklist algorithm.

	dom(N0) = {N0} where N0 is the start node (this set never changes)
	For all nodes but N0, initially set dom(N) = {all nodes}
	Push each node but N0 onto a worklist.
	Now remove any node, Z, from the worklist.
		Compute a new value for dom(Z) using the above formula.
		If the new value of dom(Z) differs from the current value,
			use the new value, and
			add all successors of Z to the worklist (if they are
			not already on the list, except N0, whose value is known).
	Repeat until the worklist is empty.
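The worklist steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not a tuned implementation: the name compute_dominators, the dictionary-of-successors representation, and the example graph are all assumptions, and it assumes every node appears as a key and is reachable from the start node.

```python
from collections import deque


def compute_dominators(succ, start):
    """Return a dict mapping each node Z to dom(Z).

    succ maps every node to the list of its successors; start is N0.
    """
    nodes = set(succ)
    # Derive predecessor lists from the successor lists.
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    dom = {n: set(nodes) for n in nodes}  # dom(N) = {all nodes}
    dom[start] = {start}                  # dom(N0) never changes

    worklist = deque(n for n in succ if n != start)
    queued = set(worklist)                # nodes currently on the worklist
    while worklist:
        z = worklist.popleft()
        queued.discard(z)
        # Intersect dom(Y) over all predecessors Y, then add Z itself.
        new = set(nodes)
        for y in pred[z]:
            new &= dom[y]
        new.add(z)
        if new != dom[z]:
            dom[z] = new
            # Successors may now be out of date, so revisit them.
            for s in succ[z]:
                if s != start and s not in queued:
                    worklist.append(s)
                    queued.add(s)
    return dom


# Hypothetical CFG: a loop headed by H whose body contains a branch.
EXAMPLE = {
    "N0": ["H"],
    "H": ["A", "X"],
    "A": ["B", "C"],
    "B": ["N"],
    "C": ["N"],
    "N": ["H"],
    "X": [],
}

dom = compute_dominators(EXAMPLE, "N0")
print(sorted(dom["N"]))  # ['A', 'H', 'N', 'N0']
```

Neither B nor C dominates N, because N can be reached through either branch. The queued set only avoids duplicate worklist entries; the result is the same without it.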

Once dominators are computed, we can define a back edge. An arc (or edge) from node N to node H is a back edge if H dominates N. Node H is the "header" of the loop (the place where the loop is entered). The back edge is the "jump back" to the header that starts the next iteration.
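As a small illustration of this definition, the Python sketch below scans every edge and keeps those whose target dominates their source. The function name find_back_edges and the example graph are hypothetical, and the dominator sets are written out by hand here on the assumption that they have already been computed.

```python
def find_back_edges(succ, dom):
    """Return all edges (N, H) such that H dominates N."""
    return [(n, h)
            for n, targets in succ.items()
            for h in targets
            if h in dom[n]]


# Hypothetical CFG and its dominator sets (assumed precomputed).
succ = {
    "N0": ["H"],
    "H": ["A", "X"],
    "A": ["B", "C"],
    "B": ["N"],
    "C": ["N"],
    "N": ["H"],
    "X": [],
}
dom = {
    "N0": {"N0"},
    "H": {"N0", "H"},
    "A": {"N0", "H", "A"},
    "B": {"N0", "H", "A", "B"},
    "C": {"N0", "H", "A", "C"},
    "N": {"N0", "H", "A", "N"},
    "X": {"N0", "H", "X"},
}

print(find_back_edges(succ, dom))  # [('N', 'H')]
```

Because every node dominates itself, an edge from a node to itself also counts as a back edge under this test; it describes a one-node loop.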

The body of the loop defined by a back edge from N to H includes N and H, as well as all predecessors of N (direct and indirect) up to H. The backward search stops at H, so H's predecessors are not included unless they are reached some other way (as N itself is). The algorithm is:

	body = {H};
	push N onto an empty stack;
	while (stack != empty) {
		pop D from the stack;
		if (D not in body) {
			body = {D} union body;
			push each predecessor of D onto the stack;
		}
	}
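The stack algorithm above translates almost line for line into Python. In this sketch the name natural_loop and the example predecessor table are assumptions made for illustration; pred is assumed to map every node to a list of its predecessors.

```python
def natural_loop(pred, n, h):
    """Return the body of the natural loop for the back edge n -> h."""
    body = {h}            # seeding with H stops the search at the header
    stack = [n]
    while stack:
        d = stack.pop()
        if d not in body:
            body.add(d)
            stack.extend(pred[d])
    return body


# Hypothetical CFG given as predecessor lists; N -> H is the back edge.
pred = {
    "N0": [],
    "H": ["N0", "N"],
    "A": ["H"],
    "B": ["A"],
    "C": ["A"],
    "N": ["B", "C"],
    "X": ["H"],
}

print(sorted(natural_loop(pred, "N", "H")))  # ['A', 'B', 'C', 'H', 'N']
```

N0 is a predecessor of H but is never visited, since H is already in the body when it is popped. X is excluded because it cannot reach N.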

Loops defined in this way are called natural loops. If there is more than one back edge to the same header, the body of the loop is the union of the nodes computed for each back edge. Since loops can nest, a header for one loop can be in the body of (but not the header of) another loop.
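The sketch below illustrates both points: merging the bodies of back edges that share a header, and a nested loop whose header lies inside an outer loop. The names loops_by_header and natural_loop and the example graph are hypothetical, and the list of back edges is assumed to have been found already using dominators.

```python
def natural_loop(pred, n, h):
    """Body of the natural loop for the back edge n -> h."""
    body = {h}
    stack = [n]
    while stack:
        d = stack.pop()
        if d not in body:
            body.add(d)
            stack.extend(pred[d])
    return body


def loops_by_header(pred, back_edges):
    """Union the bodies of all back edges that share a header."""
    loops = {}
    for n, h in back_edges:
        loops.setdefault(h, set()).update(natural_loop(pred, n, h))
    return loops


# Hypothetical CFG: an outer loop headed by H1 with two back edges
# (C -> H1 and D -> H1) and an inner loop headed by H2 (B -> H2).
pred = {
    "N0": [],
    "H1": ["N0", "C", "D"],
    "H2": ["H1", "B"],
    "B": ["H2"],
    "C": ["B"],
    "D": ["B"],
    "X": ["H1"],
}
back_edges = [("B", "H2"), ("C", "H1"), ("D", "H1")]  # assumed precomputed

loops = loops_by_header(pred, back_edges)
print(sorted(loops["H1"]))  # ['B', 'C', 'D', 'H1', 'H2']
print(sorted(loops["H2"]))  # ['B', 'H2']
```

Here H2 is in the body of the loop headed by H1, and the inner body is a proper subset of the outer one, which is one simple way to detect nesting.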