Partial Evaluation

Motivation and Overview
Partial Evaluation of a Simple, Imperative Language
Partial Evaluation of a Simple, Functional Language

Motivation and Overview

In addition to reading these notes, I suggest that you read Chapter 1 of Neil Jones's book for additional background material, and chapter 13 of that book for more discussion of the applications of partial evaluation.

Partial evaluation (PE) is a technique for program specialization. The idea is to optimize a program by specializing it with respect to some of its inputs. In particular, given

a program P
a classification of each of P's inputs as either static or dynamic
values for all of the static inputs

partial evaluation produces a residual program P_s such that

i.e., running the residual program on the dynamic inputs produces the same result as running the original program on all of the inputs.

Note that we already have an operation, namely currying, that produces a residual program with this property. For example:

_x=2

And if we apply plus_x=2 to any value y, we get the same result as applying plus to the two arguments, 2 and y. However, the only optimization provided by currying is to reduce the number of beta reductions. We want to do more than that. The partial evaluation that we will look at will do (more or less) the following optimizations:

constant folding
loop unrolling
function specialization and inlining

Intuitively, partial evaluation of a program (or of a function) is most likely to be useful (to speed up execution) when

the function is evaluated many times, and
some subset of its arguments are the same every time, and
a significant part of the function's computation depends only on the values of those arguments.

The book on partial evaluation by Neil Jones et al includes a section on Partial Evaluation in Practice (section IV, page 261), which includes some examples. They say that partial evaluation has been applied successfully to the following kinds of problems:

pattern recognition
ray tracing of solid models
neural network training
database queries
spreadsheet computations
scientific computing
discrete hardware simulation

An interesting application of partial evaluation (though one that I think has not turned out to be of practical value) is its use for automatic program generation. This was originally defined by Futamura (and thus, the following are known as the Futamura projections):

Using Partial Evaluation (PE) to Compile a Program: An interpreter takes two arguments, a program P and P's input I, and "runs" P on I:
Whenever we have a program with more than one input (as we do for interp), there's an opportunity for specialization. In this case, we can specialize interp with respect to a given P
```
                   +----+
        interp --> | PE | --> interp_P
             P --> |    |           
                   +----+
```
producing interp_P such that
We expect that interp_P(I) will be faster than interp(P, I). Furthermore, if interp is written in machine code, then interp_P will also be in machine code. In this case, specialization acts like a compiler: partial evaluation of (interp, P) is like compiling P, and interp_P is a "compiled" version of P.
Using PE to Create a Compiler: In the previous example, PE was applied to a pair of inputs: interp and P. So again we have an opportunity for specialization. What if we specialize the partial evaluator itself with respect to interp?
```
                   +----+
        PE     --> | PE | --> PE_interp
        interp --> |    |           
                   +----+
```
We get PE_interp, which takes one input, a program P, and produces a compiled version of P. So PE_interp is a compiler.
Using PE to Create a Compiler Generator: Finally, what if we specialize the partial evaluator with respect to itself?
```
                +----+
        PE  --> | PE | --> PE_PE
        PE  --> |    |      
                +----+
```
We get a program whose input is an interpreter and whose output is a compiler (for the language of the interpreter):
So PE_PE is a compiler generator.

Partial Evaluation of a Simple, Imperative Language

First we will consider how to do partial evaluation of a simple, imperative language. Then we will see what needs to change to handle a simple, functional language.

For now, we will assume that the input to Partial Evaluator PE has three parts:

A program P, written in a simple, imperative language with the following features:
- every basic block starts with a label
- only the following kinds of statements are allowed:
  - I/O (initial read and final print) statements
  - assignment statements
  - goto <label>
  - if <cond> then goto <label> else goto <label>
- every basic block ends with a (conditional or unconditional) goto.
Note that conditional and unconditional gotos can only be at the ends of basic blocks.
In these notes, we may write code in this low-level language, or we may write the code in a higher-level language (e.g., with loops), with the understanding that PE really works on the low-level form.
A division of P's variables (all of its variables, not just its inputs) into static/dynamic, such that for all assignments id = exp, if exp includes any dynamic variables, then id is classified as dynamic. A division that has this property is called a congruent division. (In fact, as we will see later, we really need a stronger property than congruence.)
Values for all static inputs.

Example

Below is code that searches two "parallel" name and value lists for a given name that is known to be in the first list. When the name is found (in position j in the name list), it returns the associated value (the value in position j in the value list). Here is a high-level version of the code:

read nameList
read valueList
read name
while (name != car(nameList)) { 
   valueList = cdr(valueList) 
   nameList  = cdr (nameList)  
}                             
print car(valueList)

And here is the corresponding low-level version (shown as a control-flow graph, using basic blocks, gotos, and labels):

          +-------------------------------+
          |  Enter                        |
          +-------------------------------+
                           |
                           v
          +-------------------------------+
          | L1:                           |
          |   read nameList               |
          |   read valueList              |
          |   read name                   |
          |   goto L2                     |
          +-------------------------------+
                           |
                           v
          +-------------------------------+
          | L2:                           |
          |   if (name != car(nameList))  |---------+
          |   then goto L3                |         |
          |   else goto L4                |<--+     |
          +-------------------------------+   |     |
                           |                  |     |
                           v                  |     |
          +-------------------------------+   |     |
          | L3:                           |   |     |
          |   nameList = cdr(nameList)    |   |     |
          |   valueList = cdr(valueList)  |---+     |
          |   goto L2                     |         |
          +-------------------------------+         |
                                                    |
                               +--------------------+
                               |
                               |
                               v
          +-------------------------------+
          | L4:                           |
          |   print car(valueList)        |
          |   goto Exit                   |         
          +-------------------------------+
                               |
                               v
          +-------------------------------+
          |  Exit:                        |
          |    return                     |
          +-------------------------------+

Given

the above program, and
the division that says that name and nameList are static and valueList is dynamic, and
the value ["susan", "john", "ann"] for nameList and "ann" for name

partial evaluation (as defined below) will produce the following residual program containing just one basic block:

<L1, ("ann", ["susan", "john", "ann"])>: read (valueList)
  valueList = cdr(valueList) valueList = cdr(valueList) print
  car(valueList)

The PE Algorithm

Here's the basic idea of how PE works to produce a residual program:

Start with pp₀, the first program point, and with vs₀, the initial values of the static variables.
Process the statements in the current basic block.
For assignments to static variables, just update vs (and emit no code). For assignments to dynamic variables, simplify the right-hand-side expression as much as possible (using vs), then emit the simplified assignment.
If the current basic block ends with an unconditional goto, or with a conditional goto whose condition only uses static variables, then do not emit a goto. Instead, append the (residual) code for the block that is the target of the goto to the end of the current (residual) block; i.e., start processing the statements in the target block.
If the current basic block ends with a conditional goto whose condition uses dynamic variables, then simplify the condition as much as possible and emit it, but change the targets of the gotos to new labels that include the values in vs. Also add the new labels to a worklist of basic block labels that still need to be processed.
Do the above steps until the worklist is empty.

The PE algorithm is given below. It includes calls to (undefined) function reduce. That function takes an expression and the current values of the static variables, and returns a new version of the expression simplified via constant folding.

PE(program, division, vs₀) {
   pending = { (pp₀, vs₀) }   // pp₀ is the label of the first block
   marked = { }

   while (pending is not empty) {
      remove one pair (pp, vs) from pending  // process next basic block
      add (pp, vs) to marked
      emit code label <pp, vs>
      bb = lookup(pp, program)  // bb is a copy of the block labeled pp
      while (bb is not empty) {
         remove the next statement S from bb
         switch (kind(S)) {
          case "read var":
             if (var is dynamic) {
               emit code: read var
             }
             
          case "x = exp":
             if (x is static) {
                update vs with x's new value
             } else {
                emit code: x = reduce(exp, vs)
             }

          case "goto pp'":
             // we're at the end of the current basic block
             // do not emit a goto
             // instead, start processing the code from the target
             // this may cause code duplication (discussed later)
             bb = lookup(pp', program)  

          case "print exp":
             emit code: print reduce(exp, vs)

          case "if exp then goto pp1 else goto pp2":
             // this must be the last stmt in the current basic block
             if (exp is static) {
                // similar to unconditional goto above
                // don't emit a goto
                // instead, start processing the code from the target
                if (reduce(exp, vs)) {
                   bb = lookup(pp1, program)
                } else {
                   bb = lookup(pp2, program)
                }
             } else {
               // exp uses a dynamic variable
               // if we already generated code for pp1 and/or pp2 with
               // current values of static vars, then don't put those
               // labels in "pending" or we might never terminate!
               if ((pp1, vs) not in marked) {
                  insert (pp1, vs) into pending if not already there
               } 
               if ((pp2, vs) not in marked) {
                  insert (pp2, vs) into pending if not already there
               }
               emit code: if reduce(exp) then goto <pp1, vs> else goto <pp2, vs>
             }
         } // end switch
      } // end iterating through current basic block
   } // end while pending set is non-empty
}

Here is a table that traces the execution of the PE algorithm on the example program.

Stmt S Current bb vs Emitted Code
<L1, ("ann", ["susan", "john", "ann"])>
read nameList L1 "ann", ["susan", "john", "ann"]
read valueList read valueList
read name
goto L2
if (name != car(nameList) then goto L3 else goto L4 L2
nameList = cdr(nameList) L3
valueList = cdr(valueList) "ann", ["john", "ann"] valueList = cdr(valueList)
goto L2
if (name != car(nameList) then goto L3 else goto L4 L2
nameList = cdr(nameList) L3
valueList = cdr(valueList) "ann", ["ann"] valueList = cdr(valueList)
goto L2
if (name != car(nameList) then goto L3 else goto L4 L2
print car(valueList) L4 print car(valueList)

TEST YOURSELF #1

Consider the following program, which computes a^x, for a>0 and x>=0.

read a, x
ans = 1
while (x > 0) {
  ans = ans * a
  x = x - 1
}
print ans

Part (a): Write the corresponding low-level program.

Part (b): Assume that the division is (x: static, a, ans: dynamic), and that x has the value 2. Trace the execution of the PE algorithm and produce the residual program.

solution

Dead Static Variables

Consider the following code fragment:

  x = 10
  if dynamic-expression then goto L2 else ...

  x = 20
  if dynamic-expression then goto L2 else ...

L2:
  x = 30
  ...

Assume that x has been classified as static. Then partial evaluation of the above code will produce the following residual code:

  if dynamic-expression then goto <L2, 10> else ...

  if dynamic-expression then goto <L2, 20> else ...

<L2, 10>:
  x = 30
  ...

<L2, 20>:
  x = 30
  ...

Note that PE has created two copies of block L2 that differ only in their (new) labels. If block L2 did not start with an assignment to x, and if it includes a use of x, then we would want two copies, because they would be specialized differently based on the different initial values of x. It is the fact that x is dead at the start of block L2 that causes the two copies to be identical, and to cause useless blow-up of the residual program.

Fortunately, it is not difficult to avoid this problem. If we do standard live-variable analysis, we will know which variables are live at the start of each block. When processing

if dynamic-expression then goto L2 else ...

instead of inserting (L2, vs) into pending, we can insert (L2, vsLive), where vsLive is the subset of vs that includes values only for the variables that are live at L2.

How to Compute a Division

When we first defined partial evaluation, we said that we are given a classification of the program's inputs as either static or dynamic. However, when we specified the inputs to PE, we said that we are given a division of all variables into static / dynamic (not just its inputs). The process of computing a division from a specification of static/dynamic just for inputs is called a binding time analysis. In the Jones book (section 4.4.6 pages 83-84), a very simple binding time analysis is defined as follows:

Create an initial division B that includes all program variables. For each variable v that is an input variable, use v:static or v:dynamic according to the given initial classification. For all other variables v, use v:static.

If the program includes an assignment

exp

such that x:static is in division B, and exp includes some variable v such that v:dynamic is in B, then replace x:static with x:dynamic in B.

Repeat step 2 until there is no change to B.

This binding time analysis produces a congruent division. However, the division is not safe in the sense that it can cause partial evaluation to fail to terminate, even for a program that always terminates. Here's an example:

L1:
  read x
  y = 0
  goto L2

L2:
  if (x > 0) then goto L3 else goto L4

L3:
  y = y + 1
  x = x - 1
  goto L2

L4:
  print y

Given the initial classification x:dynamic, the binding time analysis given above will classify y as static. Let's consider what the PE algorithm will do. It will start by generating the following code:

<L1, ?>:
  read x
  if (x > 0) then goto <L3, 0> else goto <L4, 0>

and putting the pairs (L3, 0) and (L4, 0) in the pending set. Processing the pair (L3, 0) causes the current value of y to be updated from 0 to 1; the following code is generated:

<L3, 0>:
  x = x - 1
  if (x > 0) then goto <L3, 1> else goto <L4, 1>

and the pairs (L3, 1) and (L4, 1) are added to the pending set.

Clearly, the algorithm will never terminate, because it keeps generating new instances of the loop for larger and larger values of y.

Jones was aware of this problem, and he investigated alternatives, some of which are discussed in his book. However, we will consider a simpler way to ensure that a division is safe; i.e., if a program terminates, then the PE algorithm will, too. Our approach uses a representation of programs called the Program Dependence Graph, or PDG. It is more straightforward to define PDGs and how to use them to do binding time analysis using a program's high-level form, so that is what we will do below.

The PDG for a procedure has the same nodes as the procedure's control-flow graph (CFG), except that the PDG has no exit node. (We're talking here about a CFG that has one node for each statement and each condition, not one that has one node for each basic block.) The edges of the PDG represent the procedure's flow and control dependences.

Flow-Dependence Edges: Flow-dependence edges are the same as def-use chains: there is a flow-dependence edge m→n iff all of the following hold:

m assigns to some variable x
n uses x
there is an x-definition-free path in the CFG from m to n.

TEST YOURSELF #2

Draw the CFG for the example program given above (and repeated below in its high-level form), then draw the PDG with the flow-dependence edges.

  read x
  y = 0
  while (x > 0) {
    y = y + 1
    x = x - 1
  }
  print y

solution

Control-Dependence Edges: The source of a control-dependence edge is always a condition, (and the enter node is considered to be a condition that always evaluates to true). Having a control-dependence edge m→n means that condition m controls whether and how often n executes. For the simple language that we are considering, a PDG's control-dependence edges reflect the PDG's nesting structure:

For every if-statement, there is a control-dependence edge from the node that represents the condition to each node that represents a statement in the then-part and to each node that represents a statement in the else-part.
For every while-loop, there is a control-dependence edge from the node that represents the loop condition to each node that represents a statement in the body of the loop.
There is a control-dependence edge from the enter node to each node that represents a statement in the procedure that is not inside any if-statement or while-loop.

TEST YOURSELF #3

Add the control-dependence edges to the PDG that you drew for the previous exercise.

solution

Binding Time Analysis: The problem with the simple binding time analysis given above, is that while it takes into account flow dependences, it ignores control dependences. To fix the problem, we can use the PDG can be used to find a safe, congruent division as follows:

For each variable x identified as dynamic in the initial classification, compute the transitive closure in the PDG starting from "read x".
For each variable y such that an assignment "y = exp" is in the transitive closure computed in step 1, add y:dynamic to the division.
Repeat steps 1 and 2 for each variable newly classified as dynamic until there is no change to the division.

The original simple binding time analysis defined by Jones is equivalent to the above technique if when computing the transitive closure in the PDG we follow only flow-dependence edges. Including control-dependence edges, too, makes the division safe.

Uniform vs Pointwise Divisions

So far, we have assumed that there is just one division that is valid at all program points. This is called a uniform division. The advantages of using a uniform division are that it is simpler to compute and to use than a non-uniform division. However, it has one, potentially major disadvantage: it permits less optimization than a non-uniform division. For example, consider the following program:

read x
read y
z = x + y
y = x * 2
w = z/y
print w

If the initial classification says that x is static (with value 10) and y is dynamic, then the division will be ({x}:static, {y, z, w):dynamic), and the residual program will be as follows:

read y
z = 10 + y
y = 20           // why include this assignment?
w = z/y          // why isn't this "w = z/20" ?
print w

An alternative to using a uniform division is to use a pointwise division, which provides one division for each basic block or even for each statement. An unsafe pointwise division can be computed using standard dataflow analysis techniques. The analysis is similar to constant propagation: the dataflow facts at each program point are the variables that are dynamic at that point. The initial dataflow fact is the set of variables specified as dynamic in the initial classification. The dataflow function for an assignment x = exp adds x to the set of dynamic variables if exp includes a dynamic variable, and otherwise it removes x from the set of dynamic variables. However, how to include the effects of control dependences (to make the division safe) is an interesting challenge.

TEST YOURSELF #4

The following program loops through an array; it adds the even values, subtracts the odd values, and prints the final result.

read array
read len
n = 0
result = 0
while (n < len) {
  item = array[n]
  if (even(item)) result = result + item
  else result = result - item
  n++
}
print result

Write the corresponding low-level program. Then, assuming that the initial classification of the inputs is

array

dynamic

len

static

compute a (uniform) division of the variables, trace the execution of the PE algorithm, and produce the residual program.

solution

Partial Evaluation of a Simple, Functional Language

Now let's consider how to do partial evaluation of a simple functional language. We'll assume that our language has the following features:

A program consists of one or more function definitions, one of which must be a definition of main.
A function definition specifies the function name, the names of its formal parameters, and the expression that is the function body.
Expressions include the following:
- identifiers (names of formal parameters)
- literals:
  - integer values like 1, 2, 3
  - boolean values: true or false
  - strings like "hello"
  - empty lists: nil
  - non-empty, homogeneous lists like [ "a", "b", "c"] or [ 1, 2, 3]
- if-then-else expressions
- function calls
- unary or binary operators, including the usual arithmetic and logical operators, as well as the list operators cons, car, cdr, and nil?.

We'll also assume that all of main's formal parameters are dynamic. The only potential static variables in a program will be the formal parameters of some other function.

Example

Here is the example program we used before (the one that searches name and value lists for a given name), this time written in our functional language.

(define (main valueList)
  (call find "ann" ["susan", "john", "ann"] valueList)
)

(define (find name nameList vList)
  (if (= (car nameList) name)
    then (car vList)
    else (call find name (cdr nameList) (cdr vList))
  )
)

In this example, as in the original example, the values of the name and the name list are provided. So while main's valueList parameter and find's vList parameter are dynamic, find's name and nameList parameters are static.

The PE Algorithm

As for the procedural case, partial evaluation of a functional program involves two basic steps:

Use a binding-time analysis to compute a division that classifies each formal parameter, each function return, and each expression as static or dynamic
Specialize the program, using a worklist to keep track of code that still needs to be specialized.

Step 1: Binding Time Analysis

Given a functional program, the basic rules for computing a division are as follows:

The formal parameters and return value of main are dynamic.
If any formals of a function are dynamic, then its return value is also dynamic.
If the expression used as actual N in a function call is dynamic, then the N^th formal parameter of the called function is also dynamic.

An expression is dynamic if it is any of the following:

A variable that is the name of a dynamic formal parameter.
A call to a function whose return value is dynamic.
An if-then-else where the condition and/or either branch is dynamic.
A unary or binary operator where one or both operands are dynamic.

NOTE: The above rules are not quite right: they are like the simple binding time analysis defined above for the procedural case. Both ignore the effects of control dependences This issue is explored further in a Test Yourself exercise below.

For the example program given above, the binding-time analysis would create the following division:

Static in main Dynamic in main Static in find Dynamic in find
"ann" valueList name vList
["susan" "john" "ann"] call find... nameList car vList
main's return car nameList cdr vList
= ... call find
if ... find's return
cdr nameList

Static in `main`	Dynamic in `main`	Static in `find`	Dynamic in `find`
`"ann"`	`valueList`	`name`	`vList`
`["susan" "john" "ann"]`	`call find...`	`nameList`	`car vList`
`main`'s return	`car nameList`	`cdr vList`
`= ...`	`call find`
`if ...`	`find`'s return
`cdr nameList`

Step 2: Specialization

Specializing a functional program is similar to specializing a procedural program. Both use a worklist to keep track of components that need to be specialized; both can sometimes create copies of components that are "specialized in place"; and both can sometimes create copies of components with new "names" that are based on the values of static variables.

Specializing "in place": For a procedural program, if the current basic block ends with an unconditional goto (or a static, conditional goto), we simply add the specialized version of the target block to the end of the current basic block. The analogy for the functional case is that when the current expression is a call to a function whose return value is static, we simply replace the function call with the specialized version of that function's body (using the values of the actual parameters, which will be static, in place of instances of the function's formal parameters).

Specializing a new copy whose "name" is based on static values: We have a similar analogy for the way conditional gotos with dynamic conditions are handled in the procedural case. In the procedural case, we change the goto targets to new labels that include the current static values, and we put those new labels on a worklist. The corresponding functional case occurs when there is a call to a function whose return value is dynamic. In this case, we replace the call with a call to a new function whose name includes the values of the static actuals, and we put that name on our worklist.

Here is a table that summarizes the different aspects of partial evaluation for the procedural and functional cases.

Procedural Functional
what BTA classifies variables variables (formals), function returns, expressions
what's in the worklist labels of the form <oldLabel><list of static-vars' values> fn names of the form <origName><list of static-parameters' values>
what's processed (specialized) each statement of the current basic block each (sub)expression of the current fn
what may be copied and specialized "in place" a basic block that is the target of an unconditional goto, or of a conditional goto w/ a static condition a fn whose formals and return are all static
what may be copied and have multiple specialized versions with new names a basic block that is the target of a conditional goto w/ a dynamic condition a fn w/ a dynamic return

	Procedural	Functional
what BTA classifies	variables	variables (formals), function returns, expressions
what's in the worklist	labels of the form <oldLabel><list of static-vars' values>	fn names of the form <origName><list of static-parameters' values>
what's processed (specialized)	each statement of the current basic block	each (sub)expression of the current fn
what may be copied and specialized "in place"	a basic block that is the target of an unconditional goto, or of a conditional goto w/ a static condition	a fn whose formals and return are all static
what may be copied and have multiple specialized versions with new names	a basic block that is the target of a conditional goto w/ a dynamic condition	a fn w/ a dynamic return

For our example functional program, the specialization phase would create the residual program shown below. In this case, there are three instances of new versions of functions whose names include the values of the static formals, and there is no instance of a function being specialized "in place" (because function find has a dynamic return). In this example, there is only one call to each of the three new functions. In general, it is possible to have multiple calls to new functions, just as, in the procedural case, it is possible to have multiple jumps to a new label.

(define (main valueList)
  (call find-"ann"-["susan" "john" "ann"] valueList)
)

(define (find-"ann"-["susan" "john" "ann"] vList)
  (call find-"ann"-["john" "ann"] (cdr vList))
)

(define (find-"ann"-["john" "ann"] vList)
  (call find-"ann"-["ann"] (cdr vList))
)

(define (find-"ann"-["ann"] vList)
  (car vList))
)

TEST YOURSELF #5

Below is the functional version of the procedural program that loops through an array, adding even values and subtracting odd values. (Since our functional language doesn't include arrays, we assume that the array parameter is actually a list.)

(define (main array) (call process array 0 0 3))

(define (process array n result len)
  (if (n < len)
   then (if (even (car array))
         then (call process (cdr array) (n + 1) (result + (car array)) 3)
         else (call process (cdr array) (n + 1) (result - (car array)) 3)
        )
   else result
  )

Compute a division for this program, then use specialization to produce the residual program.

TEST YOURSELF #6

The rules for computing a division of a functional program don't take control dependences into account. Find an example that illustrates the problem: i.e., an example of a program that does terminate when run, but for which partial evaluation using the given, unsafe rules for computing a division does not terminate.

Then give a new rule for computing a division that solves the problem.

Stmt S	Current bb	vs	Emitted Code
			`<L1, ("ann", ["susan", "john", "ann"])>`
`read nameList`	L1	`"ann", ["susan", "john", "ann"]`
`read valueList`			`read valueList`
`read name`
`goto L2`
`if (name != car(nameList) then goto L3 else goto L4`	L2
`nameList = cdr(nameList)`	L3
`valueList = cdr(valueList)`		`"ann", ["john", "ann"]`	`valueList = cdr(valueList)`
`goto L2`
`if (name != car(nameList) then goto L3 else goto L4`	L2
`nameList = cdr(nameList)`	L3
`valueList = cdr(valueList)`		`"ann", ["ann"]`	`valueList = cdr(valueList)`
`goto L2`
`if (name != car(nameList) then goto L3 else goto L4`	L2
`print car(valueList)`	L4		`print car(valueList)`

Partial Evaluation

Contents

Example

The PE Algorithm

Partial Evaluation of a Simple, Functional Language

Step 1: Binding Time Analysis