Interprocedural Analysis: Computing Summary Information

Introduction

Recall that one way to define dataflow functions for call nodes is to compute summary information for the called procedure. In particular, for each procedure P, we compute the sets of variables that a call to P might modify and might use (GMOD(P) and GREF(P)).

Below we will consider how to compute this information: first for global variables only, then for globals and value parameters, and finally for globals, value parameters, and reference parameters.

Globals Only

Initially, we will make some simplifying assumptions about the program we're analyzing; in particular, we will assume that procedures have no parameters.

Therefore, for now we only care about the globals that might be modified/used by each procedure (we will relax the assumption about parameters later).

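Step 1: Compute IMOD and IREF for each procedure: the globals that the procedure's own code modifies or uses directly.

Step 2: Build the call graph.

Step 3: Collapse cycles, so that each strongly connected component becomes a single node.

Step 4: Compute GMOD and GREF by dataflow analysis on the collapsed call graph.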

For each node n, GMOD(n) and GREF(n) are the dataflow facts that hold before node n (i.e., n.GMODBefore and n.GREFBefore).
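
The globals-only computation can be sketched in a few lines of Python. This is a minimal illustration, not the method of the notes: instead of collapsing cycles and making one pass over the acyclic graph, it simply iterates to a fixed point, which gives every procedure in a cycle the same set. The program (`main`, `p`, `q`) and its IMOD sets are hypothetical; GREF would be computed the same way starting from IREF.

```python
def compute_gmod(imod, calls):
    """Return GMOD for each procedure.

    imod:  dict mapping procedure name -> set of globals it modifies directly
    calls: dict mapping procedure name -> set of procedures it calls
    """
    gmod = {p: set(s) for p, s in imod.items()}
    changed = True
    while changed:                      # repeat until no set grows
        changed = False
        for p, callees in calls.items():
            for q in callees:
                new = gmod[q] - gmod[p]  # what the callee adds to the caller
                if new:
                    gmod[p] |= new
                    changed = True
    return gmod


# Hypothetical program: main calls p; p and q call each other (a cycle).
imod = {"main": set(), "p": {"g1"}, "q": {"g2"}}
calls = {"main": {"p"}, "p": {"q"}, "q": {"p"}}
gmod = compute_gmod(imod, calls)
```

Here `p` and `q` form a cycle, so they end up with the same GMOD set, which is why collapsing the cycle into one node loses no information.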

Value Parameters

Now let's relax our initial assumption that there are no parameters, and consider procedures with value parameters. In this case, the computation of GMOD sets will not change, but the computation of GREF sets must change. (We want to consider a call like p( x ) a use of x iff p might use the corresponding formal, so we need to take formal parameters into account when computing GREF sets).

Step 1: Compute IREF sets as before, but include local variables and formal parameters.

Step 2: Build the call graph, but this time include one edge for every call site (so there can be multiple edges between pairs of nodes). For example, here's a program:

void main() {
    s1: call a(v1);
}

void a( f1 ) {
    s2: call b(v2, v3);
    s3: call b(v4, v5);
}

void b( f2, f3 ) {
    print f3;
    s4: call b( g1, g2 );
}

And here is the corresponding call (multi)graph, with IREF sets (graph edges are labeled with the corresponding call site):

Step 3: Add a new exit node and an edge n → exit for each node n with no outgoing edges, and for one node n of every strongly connected component (note that sccs are not collapsed).

Step 4: Put dataflow functions on the edges of the call graph. The function on all edges to the exit node is the identity function. For other edges, the function filters out the locals of the called function, and "backbinds" the formals of the called functions to the actuals of the calling procedure. Here is the example call graph with the added exit node and the dataflow functions:

Step 5: Determine GREF sets for each node n and each call site s (corresponding to call graph edge m→n with dataflow function fs) as the least fixed point (i.e., the smallest sets, under set inclusion, that satisfy the equations) of the following set of equations:

GREF(exit) = { }
GREF(n) = (union of all GREF(s) such that s is a call site in n) U IREF(n)
GREF(s) = fs(GREF(n))

For the running example, the values are:

node or call site    GREF set
main                 g2
a                    g2, v3, v5
b                    f3, g2
s1                   g2
s2                   v3, g2
s3                   v5, g2
s4                   g2
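
The equations can be solved by starting every set empty and re-evaluating them until nothing changes. The Python sketch below does this for the running example. It assumes that g1 and g2 are the only globals, that v1 through v5 are locals of the procedures that mention them, and that the IREF sets are the ones implied by the program text (only b uses a variable directly); the exit node is omitted because it only ever contributes the empty set.

```python
GLOBALS = {"g1", "g2"}
FORMALS = {"main": [], "a": ["f1"], "b": ["f2", "f3"]}
IREF = {"main": set(), "a": set(), "b": {"f3"}}  # assumed from the program text
# call site -> (calling procedure, called procedure, actuals)
SITES = {
    "s1": ("main", "a", ["v1"]),
    "s2": ("a", "b", ["v2", "v3"]),
    "s3": ("a", "b", ["v4", "v5"]),
    "s4": ("b", "b", ["g1", "g2"]),
}


def edge_function(site, facts):
    """Backbind the callee's formals to the actuals; drop the callee's locals."""
    _, callee, actuals = site
    binding = dict(zip(FORMALS[callee], actuals))
    out = set()
    for v in facts:
        if v in binding:        # formal of the callee: replace by the actual
            out.add(binding[v])
        elif v in GLOBALS:      # globals pass through unchanged
            out.add(v)
    return out                  # anything else is a local of the callee


gref = {p: set() for p in IREF}
gref_site = {s: set() for s in SITES}
changed = True
while changed:
    changed = False
    for s, site in SITES.items():           # GREF(s) = fs(GREF(n))
        new = edge_function(site, gref[site[1]])
        if new != gref_site[s]:
            gref_site[s], changed = new, True
    for p in IREF:                           # GREF(n) = IREF(n) U call sites in n
        new = set(IREF[p])
        for s, site in SITES.items():
            if site[0] == p:
                new |= gref_site[s]
        if new != gref[p]:
            gref[p], changed = new, True
```

Under these assumptions the loop reproduces the table above, for example `gref["a"]` is `{"g2", "v3", "v5"}` and `gref_site["s1"]` is `{"g2"}`.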

Reference Parameters

Finally, let's think about what happens when we allow reference parameters. In a sense, this introduces two problems:

  1. The IMOD/IREF sets are not complete because of aliases:
    void a( f1, f2 ) {
        f1 = f2 + 1;
    }
           
    In this example, f1 and all of its aliases are modified; f2 and all of its aliases are used. The aliases can include globals (due to a call like: a( g1, g2 )) or other formals (due to a call like: a( x, x )).

    Similarly, any def/use of a global is actually a def/use of the formals to which it is aliased, too.

  2. Since a procedure's IMOD/IREF sets are not complete, neither are its GMOD/GREF sets, which means that incomplete information is propagated to its callers. For example, here is a call graph in which each node represents one procedure, the code for that procedure is given in the node, and the IMOD and GMOD sets that would be computed using a straightforward extension of the algorithm used above for value parameters, are shown to the right:
      +-------------+
      | void main() |  IMOD = { }
      | call a( g ) |  GMOD = { g }
      +-------------+
            |
            v
      +--------------+
      | void a( f1 ) |  IMOD = { }
      | call b( f1 ) |  GMOD = { f1 }
      +--------------+
            |
            v
      +--------------+
      | void b( f2 ) |
      | f2 = 0       |  GMOD = IMOD = { f2 }
      +--------------+
      
    Note that in b, f2 is aliased to g, so b actually modifies g as well as f2; this is an example of problem 1, discussed above. Moreover, since b modifies g (due to the call from a), it is also true that a modifies g. Yet g is not in GMOD(a); this is an example of problem 2.

Surprisingly, it has been shown (by Banning in 1979) that GMOD/GREF sets can be computed correctly in the presence of reference parameters by breaking the computation into separate phases:
  1. Compute DMOD/DREF: modify/use sets that ignore the effects of aliasing due to reference parameters (essentially the GMOD/GREF sets computed by the algorithm discussed above for value parameters).

  2. Compute alias sets for all formal parameters and all globals on a per-procedure basis:
    • Alias(f,p) = {x | x is a formal of p or a global, and is aliased to f}
    • Alias(g, p) = {x | x is a formal of p and is aliased to g}
  3. Combine the results of (1) and (2) to determine:
    • What each call may actually define/use (transitively)
    • What each def/use of a formal f or a global g may actually define/use.

Computing DMOD and DREF

The computation of DMOD/DREF is similar to the method for computing GMOD/GREF given only value parameters; i.e., dataflow functions go on the edges of the call (multi)graph, and IMOD/IREF sets are propagated back across those edges, replacing formals with actuals. Here is an example; the values shown on the edges are the actuals of the calls that are modified by the called procedure (i.e., the corresponding formals are in the called procedure's IMOD or DMOD sets):

The DMOD sets are:
s1:   { x }
s2:   { g2 }
s3:   { g3 }
s4:   { f2 }
s5:   { }
main: { x, g2, g3 }
a:    { f2 }
b:    { f3 }
c:    { }

The next step is to compute the alias sets. This has been described in the paper "Fast Interprocedural Alias Analysis" by Keith Cooper and Ken Kennedy, published in the Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages (1989).

Alias sets can be computed in two steps:

  1. Use the "binding graph" to compute formal/global aliases.
  2. Use the "pair binding graph" to compute formal/formal aliases.

The binding graph for a program includes a node for each formal of each procedure, and an edge f1 → f2 iff f1 is passed as an actual parameter in some call to p, and f2 is the corresponding formal of p. For example, at call site s4 in procedure a of the program shown above, f2 is the 1st actual, and f3 is the corresponding formal of the called procedure b. Therefore, in the program's binding graph there would be an edge f2 → f3. Here is the complete binding graph for the example program:

To compute the formal/global aliases (for each formal f, which globals it may be aliased to):

// collapse scc's of binding graph
   replace each scc with a representative node n

// initialize
   for each node x, set A(x) = {}
   for each call site s
      for each global v passed to formal f at s
         A(f) = A(f) U { v }  // if f was in an scc, use the rep. node n for f

// traverse the graph, propagating aliases
   for each node f in topological order
      A(f) = A(f) U A(h) for each predecessor h of f

// set values for all nodes of scc's
   for each scc c
      for each node f in c
         let n be c's representative node in
            A(f) = A(n)

For our example, after the initialization step, we'd have the following initial alias sets:
A(f1) = { g1 }
A(f2) = { g2 }
A(f3) = { g3 }
A(f4) = { g1 }
A(f5) = { }
A(f6) = { }

The propagation loop would add g2 to the sets of f3 and f6, and would add g1 to the set of f5.
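
Here is a hedged Python sketch of the initialization and propagation steps. The example program itself is not reproduced in the text, so the call sites below are an assumption, reconstructed to be consistent with the sets quoted above (in particular, the second actual at s4 is unknown and a hypothetical local `t` stands in for it). The sketch also assumes the binding graph is acyclic, so the scc-collapsing step is omitted; `graphlib.TopologicalSorter` raises `CycleError` otherwise.

```python
from graphlib import TopologicalSorter  # Python 3.9+

GLOBALS = {"g1", "g2", "g3"}
FORMALS = {"main": [], "a": ["f1", "f2"], "b": ["f3", "f4"], "c": ["f5", "f6"]}
# call site -> (calling procedure, called procedure, actuals); assumed program
CALL_SITES = {
    "s1": ("main", "a", ["x", "x"]),
    "s2": ("main", "a", ["g1", "g2"]),
    "s3": ("main", "b", ["g3", "g1"]),
    "s4": ("a", "b", ["f2", "t"]),      # "t" is a hypothetical local of a
    "s5": ("a", "c", ["f1", "f2"]),
}


def formal_global_aliases(formals, call_sites, globals_):
    owner = {f: p for p, fs in formals.items() for f in fs}
    A = {f: set() for f in owner}        # formal -> globals it may alias
    preds = {f: set() for f in owner}    # binding graph, stored as predecessors
    for caller, callee, actuals in call_sites.values():
        for actual, formal in zip(actuals, formals[callee]):
            if actual in globals_:
                A[formal].add(actual)            # initialization
            elif owner.get(actual) == caller:
                preds[formal].add(actual)        # binding edge actual -> formal
    # Visit formals so that every predecessor is handled before its successors.
    for f in TopologicalSorter(preds).static_order():
        for h in preds[f]:
            A[f] |= A[h]
    return A


A = formal_global_aliases(FORMALS, CALL_SITES, GLOBALS)
```

With these assumed call sites, the binding graph has the edges f2 → f3, f1 → f5, and f2 → f6, and the propagation adds g2 to A(f3) and A(f6) and g1 to A(f5), as described above.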

Note that this algorithm maps formals to the globals to which they are aliased (f1 → {g1}, etc.). We also want the sets "Alias(g,p) = the set of p's formals to which g is aliased in p". We can get those sets by inverting the mapping: for each procedure p, each formal f of p, and each global g in A(f), add f to Alias(g, p).

After using this algorithm in our example, we get:
Alias(g1, a) = { f1 }    Alias(g1, b) = { f4 }    Alias(g1, c) = { f5 }
Alias(g2, a) = { f2 }    Alias(g2, b) = { f3 }    Alias(g2, c) = { f6 }
Alias(g3, a) = { }       Alias(g3, b) = { f3 }    Alias(g3, c) = { }
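
Producing the Alias(g, p) sets amounts to inverting the formal-to-globals map one procedure at a time. A small Python sketch, using the A sets from the example as input (the function and variable names are illustrative only):

```python
# A(f) after propagation, as given in the text.
A = {
    "f1": {"g1"}, "f2": {"g2"}, "f3": {"g2", "g3"},
    "f4": {"g1"}, "f5": {"g1"}, "f6": {"g2"},
}
FORMALS = {"a": ["f1", "f2"], "b": ["f3", "f4"], "c": ["f5", "f6"]}


def invert_aliases(A, formals):
    """Return a dict mapping (global, procedure) -> set of aliased formals."""
    globals_seen = set().union(*A.values())
    alias = {}
    for p, fs in formals.items():
        for g in globals_seen:
            alias[(g, p)] = {f for f in fs if g in A[f]}
    return alias


alias = invert_aliases(A, FORMALS)
```

For instance, `alias[("g3", "b")]` is `{"f3"}` and `alias[("g3", "a")]` is empty, matching the table above.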

Now we need to compute the formal/formal aliases. This is done using the "pair binding graph".

There are 3 different ways two formals can be aliased:

  1. The same actual is passed to two formals:
    +----------------+
    | call a( x, x ) |
    +----------------+
            |
            v
    +-------------------+
    | enter a( f1, f2 ) |           f1 and f2 are aliased in a
    +-------------------+
    
  2. Global g is passed to one formal, an alias of g to another:
    +-------------+
    | call a( g ) |
    +-------------+
            |
            v
    +-----------------+
    | enter a( f1 )   |
    | call b( f1, g ) |
    +-----------------+
            |
            v
    +-------------------+
    | enter b( f2, f3 ) |           f2 and f3 are aliased in b
    +-------------------+
    
  3. Formals f1 and f2 are aliases, and are both passed as actuals.
           ...
            |
            v
    +---------------------+
    | enter a( f1, f2 )   |
    | call b( x, f1, f2 ) |
    +---------------------+
            |
            v
    +-----------------------+
    | enter b( f3, f4, f5 ) |       f4 and f5 are aliased in b
    +-----------------------+
    
So to identify aliased formals, we must find the pairs of formals that are aliased directly (cases 1 and 2) and then propagate those aliases along calls (case 3). This is done using the "pair binding graph", which has one node for each pair of formals of the same procedure, and an edge (f1, f2) → (f3, f4) iff there is a call that passes f1 to f3 AND passes f2 to f4, or that passes f1 to f4 AND passes f2 to f3.

Here is the pair binding graph for our example program:

Once this graph is created, we identify "initial alias pairs" (formals that are aliased either because the same actual is passed twice, or because a global and its alias are passed as actuals) as follows:
for each call site s
   if var x is passed to two formals f1 and f2
   then {
      // same actual passed twice:
      //       call p(x, x)
      //              |  |
      //              v  v
      //       void p(f1,f2)
      mark (f1,f2)
   }
   for each actual f that is a formal of the procedure containing s
      let f' be the corresponding formal of the called procedure in
         for each global g in A(f) that is passed as an actual at s
            // global and its alias passed:
            //     call p(f, g)
            //            |  |
            //            v  v
            //     void p(f',f'')
            let f'' be the corresponding formal in the called procedure in
               mark (f', f'')

In our running example, only the pair (f1, f2) is marked, because of the call "a(x, x)" in main.

The next step is to propagate the initial alias pairs by marking all nodes of the pair binding graph that are reachable from a marked node, for example with a worklist or depth-first traversal.

In our running example, node (f5, f6) would be marked (i.e., the initial alias of f1 and f2 would be propagated to f5, f6, due to the call at call site s5).

Note that if, in the end, a pair (fj, fk) is marked, it means that fj and fk may be aliased.
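
The marking and propagation steps can be sketched together in Python. As before, the call sites are an assumption reconstructed from the results quoted in the text (the second actual at s4 is a hypothetical local `t`), pairs are represented as `frozenset`s so that (f1, f2) and (f2, f1) are the same node, and the propagation uses a plain worklist.

```python
from itertools import combinations

GLOBALS = {"g1", "g2", "g3"}
FORMALS = {"main": [], "a": ["f1", "f2"], "b": ["f3", "f4"], "c": ["f5", "f6"]}
CALL_SITES = {  # assumed program: (caller, callee, actuals)
    "s1": ("main", "a", ["x", "x"]),
    "s2": ("main", "a", ["g1", "g2"]),
    "s3": ("main", "b", ["g3", "g1"]),
    "s4": ("a", "b", ["f2", "t"]),
    "s5": ("a", "c", ["f1", "f2"]),
}
A = {"f1": {"g1"}, "f2": {"g2"}, "f3": {"g2", "g3"},
     "f4": {"g1"}, "f5": {"g1"}, "f6": {"g2"}}


def aliased_formal_pairs(formals, call_sites, A, globals_):
    owner = {f: p for p, fs in formals.items() for f in fs}
    marked, edges = set(), {}
    for caller, callee, actuals in call_sites.values():
        bound = list(zip(actuals, formals[callee]))
        for (x1, t1), (x2, t2) in combinations(bound, 2):
            same_actual = x1 == x2
            global_and_alias = (
                (x2 in globals_ and owner.get(x1) == caller and x2 in A[x1])
                or (x1 in globals_ and owner.get(x2) == caller and x1 in A[x2])
            )
            if same_actual or global_and_alias:
                marked.add(frozenset((t1, t2)))          # initial alias pair
            elif owner.get(x1) == caller and owner.get(x2) == caller:
                # pair binding graph edge (x1, x2) -> (t1, t2)
                edges.setdefault(frozenset((x1, x2)), set()).add(frozenset((t1, t2)))
    worklist = list(marked)
    while worklist:                       # mark everything reachable
        pair = worklist.pop()
        for succ in edges.get(pair, ()):
            if succ not in marked:
                marked.add(succ)
                worklist.append(succ)
    return marked


marked = aliased_formal_pairs(FORMALS, CALL_SITES, A, GLOBALS)
```

Under these assumptions the only initial pair is (f1, f2), from the call a(x, x), and the only pair reached from it is (f5, f6).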

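The final step in computing formals' aliases is to combine the results computed using the binding graph (which globals each formal is aliased to) with the new results computed using the pair binding graph (which other formals each formal is aliased to): for each formal f of procedure p, Alias(f, p) is the union of A(f) and the set of formals f' of p such that the pair (f, f') is marked.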

Here are the final Alias sets for all globals and formals:
Alias(f1, a) = { g1, f2 }    Alias(f1, b) = { }           Alias(f1, c) = { }
Alias(f2, a) = { g2, f1 }    Alias(f2, b) = { }           Alias(f2, c) = { }
Alias(f3, a) = { }           Alias(f3, b) = { g2, g3 }    Alias(f3, c) = { }
Alias(f4, a) = { }           Alias(f4, b) = { g1 }        Alias(f4, c) = { }
Alias(f5, a) = { }           Alias(f5, b) = { }           Alias(f5, c) = { g1, f6 }
Alias(f6, a) = { }           Alias(f6, b) = { }           Alias(f6, c) = { g2, f5 }
Alias(g1, a) = { f1 }        Alias(g1, b) = { f4 }        Alias(g1, c) = { f5 }
Alias(g2, a) = { f2 }        Alias(g2, b) = { f3 }        Alias(g2, c) = { f6 }
Alias(g3, a) = { }           Alias(g3, b) = { f3 }        Alias(g3, c) = { }

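Once alias information is known, we can use it to compute GMOD sets for every call site and for every procedure: take the corresponding DMOD set and add, for each variable v in it, the set Alias(v, p), where p is the procedure itself or the procedure that contains the call site.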

For our example:
           DMOD             Aliases                           Final GMOD

GMOD(s1) = { x }             ---                              { x }
GMOD(s2) = { g2 }            ---                              { g2 }
GMOD(s3) = { g3 }            ---                              { g3 }
GMOD(s4) = { f2 } U Alias(f2, a) = { f2 } U { f1, g2 } =      { f1, f2, g2 }
GMOD(s5) = { }               ---                              { }

GMOD(main) = { x, g2, g3 }   ---                              { x, g2, g3 }
GMOD(a)    = { f2 } U Alias(f2, a) = { f2 } U { f1, g2 } =    { f1, f2, g2 }
GMOD(b)    = { f3 } U Alias(f3, b) = { f3 } U { g2, g3 } =    { f3, g2, g3 }
GMOD(c)    = { }             ---                              { }

Note that GMOD(a) does not include g1 even though a modifies f2 and f2 may be aliased to f1, and f1 may be aliased to g1. The reason is that those two aliases occur on different calls to a, so their effects are not combined (and this is correctly reflected in the computed GMOD set)!

Note also that for dataflow analysis, it is the call site GMOD sets that we would use to define the dataflow function for a call node, not the called procedure's GMOD set (because the GMOD set for the call site tells what may be modified as a result of that particular call, rather than what might be modified by the called procedure on some call).
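
The combination of DMOD sets with alias sets can be sketched as follows. The DMOD and Alias values are copied from the example; the mapping from call sites to their containing procedures is an assumption (the text states it only for s4, and for the call a(x, x) in main), and empty Alias sets are simply left out of the dictionary.

```python
DMOD_SITE = {"s1": {"x"}, "s2": {"g2"}, "s3": {"g3"}, "s4": {"f2"}, "s5": set()}
DMOD_PROC = {"main": {"x", "g2", "g3"}, "a": {"f2"}, "b": {"f3"}, "c": set()}
SITE_PROC = {"s1": "main", "s2": "main", "s3": "main", "s4": "a", "s5": "a"}  # assumed
ALIAS = {  # (variable, procedure) -> aliases; only non-empty sets are listed
    ("f1", "a"): {"g1", "f2"}, ("f2", "a"): {"g2", "f1"},
    ("f3", "b"): {"g2", "g3"}, ("f4", "b"): {"g1"},
    ("f5", "c"): {"g1", "f6"}, ("f6", "c"): {"g2", "f5"},
    ("g1", "a"): {"f1"}, ("g1", "b"): {"f4"}, ("g1", "c"): {"f5"},
    ("g2", "a"): {"f2"}, ("g2", "b"): {"f3"}, ("g2", "c"): {"f6"},
    ("g3", "b"): {"f3"},
}


def add_aliases(dmod_set, proc, alias):
    """Extend a DMOD set with the aliases, in proc, of each variable in it."""
    result = set(dmod_set)
    for v in dmod_set:
        result |= alias.get((v, proc), set())
    return result


gmod_site = {s: add_aliases(d, SITE_PROC[s], ALIAS) for s, d in DMOD_SITE.items()}
gmod_proc = {p: add_aliases(d, p, ALIAS) for p, d in DMOD_PROC.items()}
```

Note that only one level of aliases is added: `gmod_proc["a"]` contains f1 (an alias of f2) but not g1 (an alias of f1), which is the behavior discussed above.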

