Dataflow Analysis



Now consider the following two sets, S1 and S2. The meet of S1 and S2 is { }, so f(S1 meet S2) is { }. However, if we apply f to S1 and S2 first and then take the meet, we get a different result; that is, f(S1) meet f(S2) is not equal to f(S1 meet S2), so f does not distribute over the meet.
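
The same kind of comparison can be carried out mechanically. Here is a small Python sketch (the sets S1 and S2 are hypothetical, the function is example 2 from the fixed-point examples below, and the meet of two sets is taken to be their intersection) that compares f(S1 meet S2) against f(S1) meet f(S2):

    # Compare f(S1 meet S2) with f(S1) meet f(S2), taking the meet of two sets
    # to be their intersection. The sets S1 and S2 are hypothetical examples.
    def f(s):
        return s if len(s) <= 3 else set()    # f(S) = S if |S| <= 3, else {}

    S1 = {'a', 'b', 'c', 'd'}
    S2 = {'b', 'c', 'd', 'e'}

    lhs = f(S1 & S2)       # f applied to the meet: f({'b','c','d'}) = {'b','c','d'}
    rhs = f(S1) & f(S2)    # meet of the results:   {} intersect {} = {}

    print(lhs)             # {'b', 'c', 'd'}
    print(rhs)             # set()
    print(lhs == rhs)      # False: the two orders of operations disagree,
                           # so this f does not distribute over the meet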

Fixed points

Definition: A value v is a fixed point of a function f iff f(v) = v.

Examples:

Let L be the lattice whose elements are sets of letters (as above). Here are the fixed points for the functions we considered above:
  1. f(S) = S union {a}
    This function has many fixed points: all sets that contain 'a'.
  2. f(S) = if size(S) ≤ 3 then S else {}
    The fixed points for this function are all sets of size less than or equal to 3.
  3. f(S) = S - {a}
    The fixed points for this function are all sets that do not contain 'a'.
As an example of a function that has no fixed point, consider the lattice of integers with the usual "less than or equal to" ordering. The function: f(x) = x+1 has no fixed point.
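
Checking whether a particular value is a fixed point is a direct application of the definition. Here is a small Python sketch (the helper name is hypothetical) that tests candidate sets against the three example functions above:

    # A value v is a fixed point of f iff f(v) == v.
    def is_fixed_point(f, v):
        return f(v) == v

    f1 = lambda s: s | {'a'}                      # example 1: S union {a}
    f2 = lambda s: s if len(s) <= 3 else set()    # example 2: S if |S| <= 3, else {}
    f3 = lambda s: s - {'a'}                      # example 3: S - {a}

    print(is_fixed_point(f1, {'a', 'b'}))             # True:  contains 'a'
    print(is_fixed_point(f1, {'b'}))                  # False: 'a' gets added
    print(is_fixed_point(f2, {'a', 'b'}))             # True:  size <= 3
    print(is_fixed_point(f2, {'a', 'b', 'c', 'd'}))   # False: size 4 maps to {}
    print(is_fixed_point(f3, {'b', 'c'}))             # True:  does not contain 'a'
    print(is_fixed_point(f3, {'a', 'b'}))             # False: 'a' gets removed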

Here is an important theorem about lattices and monotonic functions:

Theorem: If f is a monotonic function on a complete lattice L, then f has a greatest fixed point and a least fixed point. Moreover, the greatest fixed point can be computed by applying f repeatedly, starting with L's top element, and the least fixed point by applying f repeatedly, starting with L's bottom element; if L has no infinite descending (respectively, ascending) chains, this process stabilizes at the fixed point after a finite number of applications of f.

Examples

L is our usual lattice of sets of letters, with set union for the join.
  1. f(S) = S U {a}
    Recall that f is monotonic. The greatest fixed point of f is the set of all letters: {a, b, ..., z}. That is also the top element of the lattice, so we find the greatest fixed point in just one iteration:
      f(T) = T
    The bottom element is the empty set. If we apply f to that we get the set {a}. f({a}) = {a}, and we've found the least fixed point. Other fixed points are any set of letters that contains a.

  2. f(S) = if size(S) ≤ 3 then S else {}
    Recall that this function f is not monotonic. It has no greatest fixed point. It does have a least fixed point (the empty set). Other fixed points are sets of letters with size ≤ 3.
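
Both computations described above (iterating from the top element to reach the greatest fixed point, and from the bottom element to reach the least fixed point) can be written as a short loop. Here is a Python sketch (helper names are hypothetical) applied to example 1:

    import string

    TOP = set(string.ascii_lowercase)   # the set of all letters {a, ..., z}
    BOTTOM = set()                      # the empty set

    def iterate(f, start):
        """Apply f repeatedly until the value stops changing (i.e., a fixed point)."""
        current = start
        while True:
            next_value = f(current)
            if next_value == current:
                return current
            current = next_value

    f = lambda s: s | {'a'}             # example 1: f(S) = S U {a} (monotonic)

    print(iterate(f, TOP) == TOP)       # True: the greatest fixed point is the full alphabet
    print(iterate(f, BOTTOM))           # {'a'}: the least fixed point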

Creating new lattices from old ones

We can create new lattices from old ones using cross-product: if L1, L2, ..., Ln are lattices, then so is the cross-product of L1, L2, ..., Ln (which we can write as: L1 x L2 x ... x Ln). The elements of the cross-product are tuples of the form:

    <e1, e2, ..., en>

such that each value ek belongs to lattice Lk.

The ordering is element-wise: <e1, e2, ..., en> ⊆ <e1', e2', ..., en'> iff:

    ek ⊆ ek' for all k (where each component is compared using the ordering of its own lattice Lk).

If L1, L2, ..., Ln are complete lattices, then so is their cross-product. The top element is the tuple that contains the top elements of the individual lattices: <top of L1, top of L2, ... , top of Ln>, and the bottom element is the tuple that contains the bottom elements of the individual lattices: <bottom of L1, bottom of L2, ... , bottom of Ln>.
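
As a concrete illustration, the element-wise ordering and meet on a cross-product lattice can be written directly. Here is a Python sketch (hypothetical helper names; the component lattices are powerset lattices, with the subset relation as the ordering and intersection as the meet):

    # Elements of the cross-product are tuples; component k comes from lattice Lk.
    def leq(t1, t2):
        """t1 <= t2 iff every component of t1 is <= the corresponding component of t2."""
        return all(e1 <= e2 for e1, e2 in zip(t1, t2))   # set <= is the subset test

    def meet(t1, t2):
        """The meet of two tuples is taken component-wise."""
        return tuple(e1 & e2 for e1, e2 in zip(t1, t2))

    t1 = ({'a'}, {'x', 'y'})
    t2 = ({'a', 'b'}, {'x', 'y', 'z'})

    print(leq(t1, t2))     # True: {'a'} <= {'a','b'} and {'x','y'} <= {'x','y','z'}
    print(meet(t1, t2))    # ({'a'}, {'x', 'y'})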

Summary of lattice theory

To summarize: a complete lattice has a top element and a bottom element; a monotonic function f on a complete lattice has both a greatest and a least fixed point; when the lattice has no infinite descending (ascending) chains, the greatest (least) fixed point can be computed by applying f repeatedly starting from the top (bottom) element; and the cross-product of complete lattices is itself a complete lattice.

Kildall's Lattice Framework for Dataflow Analysis

Recall that our informal definition of a dataflow problem included:

  1. a domain D of "dataflow facts",
  2. a combining operator ⌈⌉ (used where control-flow paths merge), and
  3. a dataflow function fn for each node n of the control-flow graph,
and that our goal is to solve a given instance of the problem by computing "before" and "after" sets for each node of the control-flow graph. A problem is that, with no additional information about the domain D, the operator ⌈⌉ , and the dataflow functions fn, we can't say, in general, whether a particular algorithm for computing the before and after sets works correctly (e.g., does the algorithm always halt? does it compute the MOP solution? if not, how does the computed solution relate to the MOP solution?).

Kildall addressed this issue by putting some additional requirements on D, ⌈⌉ , and fn. In particular he required that:

  1. D be a complete lattice L such that for any instance of the dataflow problem, L has no infinite descending chains.
  2. ⌈⌉ be the lattice's meet operator.
  3. All fn be distributive.
He also required (essentially) that the iterative algorithm initialize n.after (for all nodes n other than the enter node) to the lattice's "top" value. (Kildall's algorithm is slightly different from the iterative algorithm presented here, but computes the same result.)

Given these properties, Kildall showed that:

  1. The iterative algorithm always terminates.
  2. The solution it computes is the MOP (meet over all paths) solution.

It is interesting to note that, while his theorems are correct, the example dataflow problem that he uses (constant propagation) does not satisfy his requirements; in particular, the dataflow functions for constant propagation are not distributive (though they are monotonic). This means that the solution computed by the iterative algorithm for constant propagation will not, in general, be the MOP solution. Below is an example to illustrate this:
        1: enter
            |
            v
        2: if (...)
         /        \
        v          v
    3: a = 2    4: a = 3
        |          |
        v          v
    5: b = 3    6: b = 2
         \        /
          v      v
        7: x = a + b
              |
              v
        8: print(x)
The MOP solution for the final print statement includes the pair (x,5), since x is assigned the value 5 on both paths to that statement. However, the greatest solution to the set of equations for this program (the result computed using the iterative algorithm) finds that x is not constant at the print statement. This is because the equations require that n.before be the meet of m.after for all predecessors m; in particular, they require that the "before" set for node 7 (x = a + b) be empty, since the "after" sets of the two predecessors are {(a,2), (b,3)} and {(a,3), (b,2)}, respectively, and the intersection of those two sets is empty. Given that value for 7.before, the equations require that 7.after (and 8.before) say that x is not constant. We can only discover that x is constant after node 7 if both a and b are constant before node 7.
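
The loss of precision at node 7 can be checked by carrying out the meet by hand. The Python sketch below uses a simplified, hypothetical encoding (a dataflow fact is a dict from variable names to constant values, and the meet keeps only the pairs on which both facts agree); it shows that the equations find nothing constant before node 7, while the MOP computation, which applies the transfer functions along each path before meeting, does find x = 5:

    # Facts map variable names to known constant values (a simplified sketch of
    # constant propagation, not Kildall's exact lattice).
    def meet(fact1, fact2):
        """Keep only the (variable, value) pairs on which both facts agree."""
        return {v: c for v, c in fact1.items() if fact2.get(v) == c}

    after_node5 = {'a': 2, 'b': 3}   # after the path 3: a = 2;  5: b = 3
    after_node6 = {'a': 3, 'b': 2}   # after the path 4: a = 3;  6: b = 2

    # The equations require: 7.before = meet of the predecessors' "after" facts.
    before_node7 = meet(after_node5, after_node6)
    print(before_node7)              # {}: neither a nor b is constant here

    # Transfer function for node 7 (x = a + b): x is constant only if a and b are.
    after_node7 = dict(before_node7)
    if 'a' in before_node7 and 'b' in before_node7:
        after_node7['x'] = before_node7['a'] + before_node7['b']
    print(after_node7)               # {}: x is not found to be constant

    # The MOP solution applies the transfer functions along each path first,
    # then meets the results at node 7's exit:
    path1 = dict(after_node5); path1['x'] = path1['a'] + path1['b']   # x = 5
    path2 = dict(after_node6); path2['x'] = path2['a'] + path2['b']   # x = 5
    print(meet(path1, path2))        # {'x': 5}: the MOP solution does find x = 5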

In 1977, a paper by Kam and Ullman (Acta Informatica 7, 1977) extended Kildall's results to show that, given monotonic dataflow functions:

  1. The iterative algorithm always terminates, and it always finds the greatest solution to the set of equations.
  2. If the functions are monotonic but not distributive, that solution is not necessarily the same as the MOP solution.

To show that the iterative algorithm computes the greatest solution to the set of equations, we can "transform" the set of equations into a single, monotonic function L → L (for a complete lattice L) as follows:

Consider the right-hand side of each equation to be a "mini-function". For example, for the two equations for a node n with two predecessors p and q:

    n.before = p.after ⌈⌉ q.after
    n.after  = fn(n.before)

The two mini-functions, g11 and g12, are:

    g11(x, y) = x ⌈⌉ y
    g12(x)    = fn(x)

Define the function that corresponds to all of the equations to be:

    f( <v1, v2, ..., vn> ) = < g1(...), g2(...), ..., gn(...) >

Where the (...)s are replaced with the appropriate arguments to those mini-functions. In other words, function f takes one argument that is a tuple of values. It returns a tuple of values, too. The returned tuple is computed by applying the mini-functions associated with each of the dataflow equations to the appropriate inputs (which are part of the tuple of values that is the argument to function f).
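
Concretely, f performs one "round" in which every mini-function is applied at once to the current tuple of values. Here is a minimal Python sketch (all names are hypothetical; each mini-function is paired with the positions of the tuple components it reads):

    # f maps a tuple of lattice values to a new tuple: component i of the result
    # is equation i's mini-function applied to the components it reads.
    def make_f(mini_functions):
        def f(t):
            return tuple(g(*(t[j] for j in positions))
                         for g, positions in mini_functions)
        return f

    # A tiny hypothetical instance: one node n whose only predecessor is the enter
    # node, whose "after" fact is fixed (here, the empty set). The tuple holds
    # (n.before, n.after), and fn(S) = S U {a}.
    enter_after = set()
    g1 = lambda: enter_after      # n.before = enter.after  (reads no tuple components)
    g2 = lambda x: x | {'a'}      # n.after  = fn(n.before) (reads component 0)

    f = make_f([(g1, []), (g2, [0])])
    print(f((set(), set())))      # (set(), {'a'})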

Note that every fixed point of f is a solution to the set of equations! We want the greatest solution (i.e., the greatest fixed point). To guarantee that this solution exists we need to know that:

  1. the type of f is L→L, where L is a complete lattice
  2. f is monotonic

To show (1), note that each individual value in the tuple is an element of a complete lattice. (That is required by Kildall's framework.) So, since cross-product (tupling) preserves completeness, the tuple itself is an element of a complete lattice.

To show (2), note that the mini-functions that define each n.after value are monotonic (since those are the dataflow functions, and we've required that they be monotonic). It is easy to show that the mini-functions that define each n.before value are monotonic, too.

For a node n with k predecessors m1, m2, ..., mk, the equation is:

    n.before = m1.after ⌈⌉ m2.after ⌈⌉ ... ⌈⌉ mk.after

and the corresponding mini-function is:

    g(x1, x2, ..., xk) = x1 ⌈⌉ x2 ⌈⌉ ... ⌈⌉ xk

We can prove that these mini-functions are monotonic by induction on k.

base case k=1

A node with one predecessor m has the equation n.before = m.after, so the mini-function is the identity function g(x1) = x1, which is trivially monotonic.

base case k=2

The mini-function is g(x1, x2) = x1 ⌈⌉ x2. Suppose x1 ⊆ x1' and x2 ⊆ x2'. Then x1 ⌈⌉ x2 ⊆ x1 ⊆ x1' and x1 ⌈⌉ x2 ⊆ x2 ⊆ x2', so x1 ⌈⌉ x2 is a lower bound of x1' and x2'. Since x1' ⌈⌉ x2' is the greatest lower bound of x1' and x2', it follows that x1 ⌈⌉ x2 ⊆ x1' ⌈⌉ x2'; i.e., g is monotonic. (A brute-force check of this case on a small lattice is sketched below, after the induction step.)

Induction Step

Assume that for all k < n, the mini-function g(x1, ..., xk) = x1 ⌈⌉ ... ⌈⌉ xk is monotonic.

Now we must show the same thing for k=n. We can write g(x1, ..., xn) = (x1 ⌈⌉ ... ⌈⌉ x(n-1)) ⌈⌉ xn. The inner expression is monotonic in x1, ..., x(n-1) by the induction hypothesis, and combining it with xn using the two-argument meet is monotonic by the base case k=2. Therefore the n-argument mini-function is monotonic.
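
As a sanity check of the base case k=2, the monotonicity of the two-argument meet can be verified exhaustively on a small lattice. Here is a Python sketch (hypothetical code; the lattice is the powerset of a three-letter universe, ordered by the subset relation, with intersection as the meet):

    from itertools import chain, combinations

    # Enumerate every subset of a small universe.
    universe = ['a', 'b', 'c']
    subsets = [set(c) for c in chain.from_iterable(
        combinations(universe, r) for r in range(len(universe) + 1))]

    def g(x1, x2):
        return x1 & x2    # the two-argument mini-function: the meet (intersection)

    # Check: whenever x1 <= y1 and x2 <= y2, we have g(x1, x2) <= g(y1, y2).
    ok = all(g(x1, x2) <= g(y1, y2)
             for x1 in subsets for y1 in subsets if x1 <= y1
             for x2 in subsets for y2 in subsets if x2 <= y2)
    print(ok)    # True: the meet mini-function is monotonic on this lattice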

Given that all the mini-functions are monotonic, it is easy to show that f (the function that works on the tuples that represent the nodes' before and after sets) is monotonic; i.e., given two tuples:

    t1 = <e1, e2, ..., en>, and
    t2 = <e1', e2', ..., en'>
such that: t1 ⊆ t2, we must show f(t1) ⊆ f(t2). Recall that, for a cross-product lattice, the ordering is element-wise; thus, t1 ⊆ t2 means: ek ⊆ ek', for all k. We know that all of the mini-functions g are monotonic, so for each k, the kth mini-function applied to its arguments drawn from t1 produces a result that is ⊆ the result of applying it to the corresponding arguments drawn from t2. Since the ordering on tuples is element-wise, that is exactly what it means for f to be monotonic!

We now know:

  1. f is a function from L to L, where L is a complete lattice (the cross-product of the complete lattices of the individual before and after values);
  2. f is monotonic; and
  3. L has no infinite descending chains (since, by Kildall's requirements, the individual lattices have none).

Therefore:

f has a greatest fixed point, and we can compute it by applying f repeatedly, starting with the tuple of all top values, until the result stops changing. Since L has no infinite descending chains, this process terminates, and the fixed point it reaches is the greatest solution to the set of equations.

This is not quite what the iterative algorithm does, but it is not hard to see that it is equivalent to one that does just this: initialize all n.before and n.after to top, then on each iteration, compute all of the "mini-functions" (i.e., recompute n.before and n.after for all nodes) simultaneously, terminating when there is no change. The actual iterative algorithm presented here is an optimization in that it only recomputes n.before and n.after for a node n when the "after" value of some predecessor has changed.
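
Here is a minimal Python sketch of that "all at once" formulation (hypothetical names; the tiny instance is the same one used in the earlier mini-function sketch): start from the tuple of top values and apply f until nothing changes.

    import string

    TOP = set(string.ascii_lowercase)   # top of the lattice of sets of letters

    def greatest_fixed_point(f, top_tuple):
        """Apply f repeatedly, starting from the all-tops tuple, until no change."""
        current = top_tuple
        while True:
            next_tuple = f(current)
            if next_tuple == current:
                return current
            current = next_tuple

    # Hypothetical instance: the tuple is (n.before, n.after), the enter node's
    # "after" fact is the empty set, and fn(S) = S U {a}.
    enter_after = set()
    def f(t):
        before, after = t
        return (enter_after, before | {'a'})

    print(greatest_fixed_point(f, (TOP, TOP)))   # (set(), {'a'})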

SUMMARY

The goal of dataflow analysis is to compute a "dataflow fact" (an element of a lattice L) for each CFG node. Ideally, we want the MOP (meet over all paths) solution, in which the fact at node n is the combination of the facts induced by all paths to n. However, for CFGs with cycles, it is not possible to compute this solution directly.

Another approach to solving a dataflow problem is to solve a system of equations that relates the dataflow facts that hold before each node to the facts that hold after the node.

Kildall showed that if the dataflow functions are distributive, then the (original version of the) iterative algorithm always terminates, and always finds the MOP solution. Kam and Ullman later showed that if the dataflow functions are monotonic then the iterative algorithm always finds the greatest solution to the set of equations. They also showed that if the functions are monotonic but not distributive, then that solution is not always the same as the MOP solution. It is also true that the greatest solution to the system of equations is always an approximation to the MOP solution (i.e., may be lower in the lattice of solutions).