Implementing Functional Languages



Overview

Some issues that arise when considering how to implement functional languages are how to implement:

To explore these issues, we will look at how to define interpreters for a (functional) subset of LISP. In particular, we will start by looking at a language called clean LISP, and later we will make a few extensions.

Clean LISP

Following tradition (see the paper on the history of LISP by John McCarthy) we will write the interpreter itself in clean LISP. However, the actual syntax of clean LISP is not very nice, so we'll use a more readable version, defined by the context-free grammar given below (italicized names are nonterminals; non-italicized names and symbols are terminals; star means zero-or-more and plus means one-or-more; the last two nonterminals are defined in English rather than using actual grammar productions).

program          →  fn-definition* expression
fn-definition    →  id [ id-list ] ← expression
expression       →  fn-application | s-expr
fn-application   →  id [ expression-list ]
s-expr           →  atom | list | ( s-expr . s-expr )
atom             →  T | NIL | ID | id
list             →  NIL | ( s-expr+ )
id-list          →  zero-or-more-ids-separated-by-commas
expression-list  →  zero-or-more-expressions-separated-by-commas

Notes:

The built-in (or primitive) functions are:

function  type                                                examples or definitions
cons      s-expr × s-expr → s-expr                            cons[X,Y] = (X.Y)
                                                              cons[X,(Y)] = (X Y)
                                                              cons[(A B),(Y)] = ((A B) Y)
car       s-expr → s-expr                                     car[(X.Y)] = X
cdr       s-expr → s-expr                                     cdr[(X.Y)] = Y
null      s-expr → {T, NIL}                                   null[X] = T if X is NIL; NIL otherwise
atom      s-expr → {T, NIL}                                   atom[X] = T if X is an atom; NIL otherwise
eq        s-expr × s-expr → {T, NIL}                          eq[x,y] = T if atom[x] and atom[y] and x=y;
                                                              NIL if (atom[x] or atom[y]) and not (x=y);
                                                              undefined otherwise
cond      expression × expression × expression → expression   if eq[expr1,T] then expr2 else expr3
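
To make the table concrete, here is a minimal sketch of the strict primitives in Python (not clean LISP). It rests on assumptions that are not part of these notes: atoms are represented as Python strings, a dotted pair (X . Y) is a 2-tuple, and the helper lst (a hypothetical name) builds a proper list out of pairs. cond is left out because it is not strict, so it cannot be written as an ordinary function.

```python
# Sketch only. Assumed representation: atoms are Python strings,
# and a dotted pair (X . Y) is the 2-tuple (X, Y).
NIL, T = "NIL", "T"

def cons(x, y):
    return (x, y)

def car(p):
    if isinstance(p, str):
        raise TypeError(f"car applied to the atom {p}")  # evaluates to bottom
    return p[0]

def cdr(p):
    if isinstance(p, str):
        raise TypeError(f"cdr applied to the atom {p}")  # evaluates to bottom
    return p[1]

def atom(x):
    return T if isinstance(x, str) else NIL

def null(x):
    return T if x == NIL else NIL

def eq(x, y):
    if atom(x) == T and atom(y) == T:
        return T if x == y else NIL
    if atom(x) == T or atom(y) == T:
        return NIL  # exactly one side is an atom, so x and y differ
    raise ValueError("eq is undefined when neither argument is an atom")

def lst(*items):
    """Hypothetical helper: build the list (i1 i2 ... in) from dotted pairs."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

# The cons examples from the table:
print(cons("X", "Y"))                          # the dotted pair (X . Y)
print(cons("X", lst("Y")) == lst("X", "Y"))    # cons[X,(Y)] = (X Y)
```

Note that NIL plays two roles here, as in the grammar: it is an atom, and it is also the empty list that terminates every proper list.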

We will also allow a few extensions to the syntax defined above:

Note that it is possible for clean LISP programs to cause runtime errors. We say that such programs evaluate to ⊥ (bottom). This happens, for example:

Below is an example program defined using our new syntax. The program consists of the definition of the concat function followed by a call to concat (recall that a clean LISP program is zero or more function definitions followed by an expression).


TEST YOURSELF #1

To help you get familiar with the clean LISP syntax, augment the example program given above by adding a definition for the reverse function that works by calling concat, as well as for the version that works by using an accumulating parameter. Be careful about the distinction between upper- and lower-case identifiers!


The Basic Interpreter

As mentioned above, our interpreter will be written in clean LISP, and will be used to interpret programs written in clean LISP. However, the input to the interpreter will not be the text of the program. Instead, a program (which consists of zero or more function definitions followed by an expression) will be represented as two s-expressions, one for the defined functions and the other for the expression. We'll call the interpreter's input language R-notation; it is defined in the table below, which shows, for each kind of expression e, what R[e], the representation of e, will be.

LISP Expression e                             Kind of expression  Representation R[e]                                 Comment
T                                                                 T
NIL                                                               NIL
id                                                                ID                                                  convert to upper-case
ID                                                                (QUOTE ID)
( e1 e2 ... en )                              a list literal      (QUOTE ( e1 e2 ... en ))                            the expressions in the list are not translated
(s-expr . s-expr)                             an s-expr literal   (QUOTE ( s-expr . s-expr ))
f[e1, e2, ..., en]                            fn application      (F R[e1] ... R[en])                                 each ei is an arbitrary expression
f[x1, ..., xn] ← exp                          fn definition       (F . (LAMBDA (X1 X2 ... XN) R[exp]))
if p1 then e1 else if p2 then e2 ... else en                      (COND (R[p1] R[e1]) (R[p2] R[e2]) ... (T R[en]))

Note that the only lower-case identifiers in R-notation are inside a QUOTE. The lower-case identifiers in the source language (function names, formal parameters, and free variables) are all converted to upper case in the R-notation.

The representations of the language extensions (other than if-then-else, which is included in the table above) are defined as the representations of the corresponding non-extended constructs. For example, the representation of caar[e] is the representation of car[car[e]].

Here's an example: the representation of the concat function:


TEST YOURSELF #2

Give the representation of the reverse function you wrote for the previous self-study exercise.


Our initial interpreter consists of six functions:

  1. eval (the top-level function; its two arguments are the two s-expressions that represent the program's expression and its defined functions)

  2. apply (to handle function applications other than COND)

  3. evcond (to evaluate a COND)

  4. evlis (to evaluate actual parameters)

  5. lookup (to look up the value of an identifier in the current environment)

  6. pairlis (to augment the current environment with pairs that represent the bindings of formals to actuals)

Here's the code:

Note that all of the functions have a parameter named environ. This is a list of dotted pairs of the form (ID . value), mapping identifiers to values. The initial environment is eval's second argument: the list of function definitions from the program being interpreted. It binds function names to their definitions (remember that the representation of a function definition is an s-expression of the form:

(F . (LAMBDA (X1 X2 ... XN) R[exp]))

so the initial environment has the correct form). In subsequent recursive calls to eval, the environment may also include bindings of formals to actuals (added to the front of the list by pairlis).

To understand how the interpreter works, let's consider each of its functions in more detail.

eval: Function eval is responsible for evaluating a given expression in a given environment. Note that the expression must be one of the following:

  1. an atom: In this case, if it is NIL or T, the atom itself is returned. Otherwise, it must be a formal parameter or a free variable, and its value is looked up in the environment and returned.

  2. a literal: i.e., expr is of the form (QUOTE lit). In this case, the literal value is returned.

  3. a function application: The function can be COND, another primitive function, or a user-defined function. COND is handled as a special case (because it is strict only in its first argument). Otherwise, apply is called to do the function application. Note that the second argument to apply is the result of calling evlis to evaluate the arguments.

apply: Function apply first checks whether the function being applied is a primitive function (and in that case, it just returns the result of applying the primitive function). If not, it must be a user-defined function. In that case, apply makes a recursive call to itself; the function that is passed as the first argument is the result of looking up the function name in the current environment (and neither the arguments nor the environment changes). Recall that user-defined functions are stored in the environment as dotted pairs of the form:

(F . (LAMBDA (X1 X2 ... XN) R[exp]))

So the recursive call will pass (LAMBDA (...) ...) as the function, and the final line of apply (the call to eval) will execute.

evlis: Function evlis is called by eval when the interpreter finds an application of a (primitive or user-defined) function other than COND. It evaluates each actual parameter on the given list in the context of the given environment, and returns a list of the evaluated actuals.

lookup: Function lookup looks up a given ID in the given environment and returns the associated value.

evcond: Function evcond handles applications of cond, which are strict only in their first parameter. It evaluates that parameter (the condition), and returns the value of the second or third parameter depending on the result.

pairlis: Given a list of IDs, a list of corresponding values, and the current environment, pairlis returns a new, extended environment that includes a dotted pair (ID . value) for each of the given IDs.
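
The six functions described above can be sketched in Python as follows. This is a reconstruction from the descriptions, not the interpreter's actual clean LISP code. It assumes that atoms are Python strings and dotted pairs are 2-tuples; the names eval_ and apply_ (with trailing underscores to avoid Python's built-ins), is_atom, and lst are hypothetical, and so is the definition of concat used in the demonstration.

```python
# Sketch only: atoms are strings, a dotted pair (X . Y) is the tuple (X, Y).
NIL, T = "NIL", "T"

def is_atom(x):
    return isinstance(x, str)

def car(p):
    if is_atom(p):
        raise TypeError(f"car applied to the atom {p}")
    return p[0]

def cdr(p):
    if is_atom(p):
        raise TypeError(f"cdr applied to the atom {p}")
    return p[1]

def lst(*items):
    """Hypothetical helper: build the list (i1 ... in) from dotted pairs."""
    result = NIL
    for item in reversed(items):
        result = (item, result)
    return result

def lookup(ident, environ):
    # No error check: an unbound identifier ends in car applied to NIL.
    if car(car(environ)) == ident:
        return cdr(car(environ))
    return lookup(ident, cdr(environ))

def pairlis(ids, vals, environ):
    # Add one (ID . value) pair per formal to the front of the environment.
    if ids == NIL:
        return environ
    return ((car(ids), car(vals)), pairlis(cdr(ids), cdr(vals), environ))

def evlis(args, environ):
    # Evaluate every actual; building the pair forces both parts.
    if args == NIL:
        return NIL
    return (eval_(car(args), environ), evlis(cdr(args), environ))

def evcond(clauses, environ):
    # clauses is ((p1 e1) (p2 e2) ... (T en)); only the chosen ei is evaluated.
    if eval_(car(car(clauses)), environ) == T:
        return eval_(car(cdr(car(clauses))), environ)
    return evcond(cdr(clauses), environ)

def apply_(fn, args, environ):
    if is_atom(fn):
        if fn == "CAR":  return car(car(args))
        if fn == "CDR":  return cdr(car(args))
        if fn == "CONS": return (car(args), car(cdr(args)))
        if fn == "ATOM": return T if is_atom(car(args)) else NIL
        if fn == "NULL": return T if car(args) == NIL else NIL
        if fn == "EQ":   return T if car(args) == car(cdr(args)) else NIL  # undefined case not checked
        # User-defined: look up the name, then apply the (LAMBDA ...) found.
        return apply_(lookup(fn, environ), args, environ)
    # fn is (LAMBDA (formals) body)
    return eval_(car(cdr(cdr(fn))), pairlis(car(cdr(fn)), args, environ))

def eval_(exp, environ):
    if is_atom(exp):
        return exp if exp in (NIL, T) else lookup(exp, environ)
    if car(exp) == "QUOTE":
        return car(cdr(exp))
    if car(exp) == "COND":
        return evcond(cdr(exp), environ)
    return apply_(car(exp), evlis(cdr(exp), environ), environ)

# Assumed definition: concat[l1,l2] <- if null[l1] then l2
#                                      else cons[car[l1], concat[cdr[l1], l2]]
concat_def = ("CONCAT", lst("LAMBDA", lst("L1", "L2"),
    lst("COND",
        lst(lst("NULL", "L1"), "L2"),
        lst(T, lst("CONS", lst("CAR", "L1"),
                   lst("CONCAT", lst("CDR", "L1"), "L2"))))))
environ = lst(concat_def)
exp = lst("CONCAT", lst("QUOTE", lst("A", "B")), lst("QUOTE", lst("C")))
result = eval_(exp, environ)
print(result == lst("A", "B", "C"))
```

Like the interpreter described above, this sketch contains no explicit error checks; a runtime error in the interpreted program surfaces as a Python exception raised by car or cdr.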


TEST YOURSELF #3

Question 1.

Recall that it is possible for a clean LISP program to cause a runtime error. Note that the interpreter includes no code to check for errors. What happens if

  1. the program being interpreted tries to apply car to an empty list?
  2. the program being interpreted tries to apply car to something that is not a list (e.g., to T)?
  3. the program being interpreted tries to call an undefined function?

Question 2.

Consider the following program:

Note that every function uses x as its formal parameter. How does the interpreter ensure that each time a function uses x, the correct value is used?

Call-by-name and Call-by-value

Let's consider whether our interpreter provides call-by-name or call-by-value semantics for user-defined functions. Since the interpreter itself is written in LISP, in order to answer that question we need to know whether calls made by the interpreter to non-primitive functions use call-by-name or call-by-value. (Recall that, by definition, the primitive functions other than cond are strict, so we can assume that whenever the interpreter calls a primitive function, all arguments are evaluated.)

It would be very convenient if we had the following situation:

Unfortunately, that is not the case. If the interpreter itself uses call-by-value, then programs will be interpreted using call-by-value; however, if the interpreter itself uses call-by-name, programs will not be interpreted using call-by-name (they will use a strange hybrid, which we investigate below).

First, let's assume that the interpreter itself uses call-by-value. What happens when a user-defined function in the program being interpreted is called? That case is handled by the last line of eval:

Since we're assuming that the interpreter itself uses call-by-value, all of apply's arguments will be evaluated when apply is called.

Note that a function call (in the program being interpreted) is represented by a list of the form: (fnName actual1 actual2 ... actualn). So the actual parameters are the tail of the list; i.e., they are what is passed as the first parameter to evlis. Function evlis recursively processes each item in its parameter list by passing the item to eval; i.e., it evaluates all of the actual parameters (which is what is supposed to happen under call-by-value semantics).

So if we want call-by-value, all we need to do is make sure that the interpreter itself uses call-by-value.

Now let's consider what happens if the interpreter itself uses call-by-name. In that case, when apply is called by eval, apply's arguments are not evaluated before the call is made (i.e., the call to evlis is delayed).


TEST YOURSELF #4

The call to apply from eval has two actuals that require evaluation (the first and the second). At what point in apply is the first formal used, causing the first actual to be evaluated?


We are concerned with apply's second argument, args, since that represents the parameters passed to the function that is being applied, and we're concerned with what apply does when it's applying a user-defined function (not a primitive one). To make a long story short, evaluation of args is delayed until that user-defined function uses an identifier: either a formal parameter or a function name. At that point, several layers of delayed evaluation get "unwound", leading to the call to evlis that evaluates the user-defined function's actual parameters. Function evlis evaluates the actual parameters via function calls made inside a call to cons; since cons is strict in both arguments, that causes all of the user-defined function's actuals to be evaluated.

So the bottom line is that if the user-defined function uses no names at all, then its actuals are never evaluated; as soon as it uses any name, all of its actuals are evaluated. That is clearly neither call-by-name nor call-by-value.


TEST YOURSELF #5

Trace the chain of calls that pass along the unevaluated actual parameters in all its gory detail. You might want to use a specific example like:

In this example, the user-defined function f ignores its second parameter, so running this program should not cause a runtime error. Convince yourself that if this program is interpreted by our current interpreter, a runtime error (an attempt to apply car to an empty list) does occur.

How can we modify our interpreter so that we get call-by-name semantics for user-defined functions? The basic idea is to change eval and apply so that instead of evaluating actuals when a function is called, we "package up" each actual with the corresponding formal name. That way, when the called function uses a formal, it can find (and evaluate) the appropriate actual.

The following changes are needed:

Change the way eval calls apply.

The idea is to prevent eval from evaluating arguments. The original code in eval is:

and the new code is:

Note that the new code passes the unevaluated actuals (cdr[exp]) to apply.

Change how apply handles calls to primitive functions.

Since arguments are no longer evaluated by eval, we must change apply so that all arguments passed to an application of a primitive function are evaluated before the call. Original code:

New code:

Similar changes must be made for calls to the other primitive functions.

Change how apply handles calls to user-defined functions.

We want to change the last line of apply so that when it calls eval to evaluate a call to a user-defined function it augments the current environment with information about the unevaluated actuals, and the corresponding formals. Note however that an additional piece of information must be provided to handle actuals that use identifiers correctly. To understand this, consider the following program:

When this program executes, the call to h should create the list (B), and the call to g should create the list (A B), which is the final result.

However, if we modify the interpreter so that when apply calls eval it simply adds to the front of the environment one pair ( formal . actual ) for each parameter of the called function, we will get the wrong result.

Thinking about what happens at a high level (i.e., not worrying about how the interpreter actually works) here's what would happen when the example program above executes using call-by-name if we simply "match up" formals and their corresponding unevaluated actuals in the environment:

  1. Function f is called. The environment is a list with one pair: ((x.B)).
  2. f calls g. The environment is now: ((x.A) (y.h[x]) (x.B)).
  3. The call to cons in g uses both of g's formals, so they need to be evaluated. The first formal, x, is matched in the environment with the literal A, which needs no evaluation. However, the second formal, y, is matched with h[x], which does require evaluation (i.e., h must be called).
  4. When h is called, the environment is: ((z.x) (x.A) (y.h[x]) (x.B)).
  5. The call to cons in h has the formal z as its first argument, so that formal must be evaluated. The corresponding expression is x, so evaluation involves looking x up in the environment. The problem is that there are two bindings for x in the environment. The one we get is the one that occurs first in the list, namely: (x.A). Unfortunately, that is the wrong one; we really want: (x.B).

The solution to this problem is to have apply include the current environment when it packages up formals and their corresponding (unevaluated) actuals. This means changing the type of the environment. Instead of just pairs of identifiers and their values, we will have pairs of the form:

( ID . ( exp . env ) )

where exp is an unevaluated expression and env is the environment in which to evaluate it.

The change to apply is made in the last line. The original code is:

The new code is:

The new function makePairs is:

Now that we've changed the form of an environment, two more changes are necessary:
  1. Change lookup to return the evaluated expression associated with a given identifier.
  2. Change the form of the initial environment.

1. Change lookup

We need to change lookup so that when it looks up an identifier in the environment it also evaluates it.

Original code:

New code:

The new function force takes a pair (exp . environ), and returns the value of exp evaluated in environment environ:

2. Change the form of the initial environment (which matches function names with their definitions).

This change is made, not to the interpreter, but to the definition of R-notation; in particular, to the way a function definition is represented. The old representation was a pair of the form ( function-name . body ):

(F . (LAMBDA (X1 X2 ... XN) R[exp]))

Now we need to have a pair of the form:

( function-name . ( exp . env ) )

such that evaluating exp in env produces the function body. Here's the new representation:

(F . ((QUOTE (LAMBDA (X1 X2 ... XN) R[exp])) . NIL))

This works because evaluating an expression of the form:

(QUOTE exp)

simply returns exp (the environment isn't used, so it's OK for it to be NIL).
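
The binding problem from the f, g, h example, and the way packaging each actual with its environment fixes it, can be replayed in a few lines of Python. This is only a sketch under assumptions: atoms are strings, dotted pairs are 2-tuples, the environments are written out by hand from steps 1 to 5 of the trace, and the tiny evaluator (a hypothetical stand-in for eval) handles just identifiers and QUOTE, which is all that the lookup of z needs.

```python
# Sketch only: atoms are strings, a dotted pair (X . Y) is the tuple (X, Y).
NIL = "NIL"

def lst(*items):
    """Hypothetical helper: build the list (i1 ... in) from dotted pairs."""
    result = NIL
    for item in reversed(items):
        result = (item, result)
    return result

def quote(x):
    return lst("QUOTE", x)

def eval_(exp, environ, lookup_fn):
    """Stand-in for eval: identifiers and (QUOTE lit) only."""
    if isinstance(exp, str):
        return lookup_fn(exp, environ)
    if exp[0] == "QUOTE":
        return exp[1][0]
    raise NotImplementedError(exp)

def naive_lookup(ident, environ, whole=None):
    # Pairs are (ID . exp); exp is evaluated in the whole current environment.
    whole = environ if whole is None else whole
    (name, exp), rest = environ
    if name == ident:
        return eval_(exp, whole, naive_lookup)
    return naive_lookup(ident, rest, whole)

def force(pair):
    # pair is (exp . env): evaluate exp in the environment it was packaged with.
    exp, env = pair
    return eval_(exp, env, lookup)

def lookup(ident, environ):
    # Pairs are (ID . (exp . env)).
    (name, suspended), rest = environ
    if name == ident:
        return force(suspended)
    return lookup(ident, rest)

# Step 4 of the trace, naive version: ((z.x) (x.A) (y.h[x]) (x.B))
naive_env = lst(("Z", "X"), ("X", quote("A")),
                ("Y", lst("H", "X")), ("X", quote("B")))

# Same program with each actual packaged with the environment it came from.
env_f = lst(("X", (quote("B"), NIL)))      # f is called with B at top level
env_h = (("Z", ("X", env_f)), env_f)       # h[x] is evaluated in f's environment

print(naive_lookup("Z", naive_env))  # finds (x.A) first: the wrong binding
print(lookup("Z", env_h))            # evaluates x in f's environment
```

The only difference between the two lookups is the environment in which the expression bound to z is evaluated: the whole current list in the naive version, and the packaged environment in the corrected one.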

This ends the changes needed to our interpreter to provide call-by-name parameter passing. From now on we will assume that we are dealing with the original interpreter that provides call-by-value parameter passing for user-defined functions.

Higher-Order Functions

Next we will consider passing functions as parameters and returning them as function results. For each, we will consider:

Functions as parameters

We'll start by considering whether clean LISP allows functions (both primitive and user-defined) to be passed as parameters. The relevant grammar rules are:

fn-application   →  id [ expression-list ]
expression-list  →  zero-or-more-expressions-separated-by-commas
expression       →  fn-application | s-expr
s-expr           →  atom | list | ( s-expr . s-expr )
atom             →  T | NIL | ID | id

We can see that actual parameters are arbitrary expressions, and one kind of expression is an atom, and one kind of atom is an identifier. Since a function name is an identifier, this means that we can pass functions as parameters.

Now let's consider whether the interpreter supports functions as parameters. When a function is called, evlis is called to evaluate each actual. An actual that is an atom other than T or NIL is looked up in the environment, and the corresponding value is used as the value of the actual. That's fine for user-defined functions, but primitive functions are not in the environment!


TEST YOURSELF #6

Consider the following clean LISP program:

Trace what happens when this program is executed by the interpreter. At what point is there a runtime error?

What list l could be used so that the call map[l, cons] does not cause a runtime error?


There are several ways we could change the interpreter to handle primitive functions passed as parameters; for example, we could change lookup to recognize primitives, or we could change the initial environment to include primitives.


TEST YOURSELF #7

For each of the two changes to the interpreter proposed above to handle primitive functions as parameters, specify the actual changes (to the code, for the first change, and to the values initially passed to the interpreter for the second change).


Functions as function results

In clean LISP, can functions be returned as function results? Yes, because a function name is an id and id is an expression, and function results are expressions. So for example, we can define the following function, which returns a function:

However, clean LISP syntax does not allow the result of a function to be applied "directly". For example, we cannot write:

because the rule for function application is:

fn-application → id [ expression-list ]

i.e., the applied function is assumed to be specified via its name (not as the result of a function call).

We can use an "intermediate" function to allow "indirect" calls to functions returned as function results; for example:

but this is not very nice...

To allow "direct" application of the result of a function call to a list of arguments, we must change the grammar rule for function application to:

and we must change the corresponding R-notation:

Now let's consider what changes need to be made to the interpreter. To help us understand what needs to change, let's consider the specific example given above (applying the result of a call to choose to the list (A B)). Using our new R-notation, the representation would be:

i.e., that would be the exp passed to eval when the interpreter is started. Since exp is not an atom, and its car is neither QUOTE nor COND, apply would be called with:

Since fn is neither a primitive function name nor an atom, apply would conclude that it must be of the form (LAMBDA (...) ...). However, this is not correct, and so we clearly must add a new case to apply to account for functions that are non-LAMBDA expressions. In particular, we'll change:

to:

Note that, for our example (using choose), the evaluation of ( CHOOSE NIL ) produces CAR, which is then applied to ( A B ), producing A as expected.

Lazy Constructors

Our final topic is how to modify the interpreter to make cons lazy. This involves changing the way the interpreter calls cons, car, and cdr. Instead of passing cons its evaluated arguments, we will pass it two dotted pairs (called suspensions), each consisting of an unevaluated argument and the current environment. Then we'll add code before the invocations of car and cdr to look for and resolve suspensions.

For example, consider the call:

The representation of this expression is:

An eager implementation would first evaluate the two arguments and then would call cons with the results:

producing the list: (A Y Z).

Our proposed lazy implementation would package up the two unevaluated arguments with the current environment and would call cons as follows:

producing the dotted pair:

But this won't quite work, because we also need a way for car and cdr to recognize suspensions (i.e., to know that the head and/or tail of the list consists of a not-yet-evaluated expression). So we'll package up the literal SUSPEND with each unevaluated argument and the current environment, producing:

Now that we know what we want to do, where do we actually need to make changes? The first change is to eval, which must handle CONS as a special case (or else it will evaluate CONS's arguments, since we're assuming our original interpreter with call-by-value parameter passing). This means changing the last line of eval from:

to:

adding the following definition of suspend:

and removing from apply the code that looks for and calls cons (since that is now handled in eval).

To illustrate how this works, consider the expression

which is represented as:

The picture below shows the suspensions that are created by eval when it is called to evaluate the expression in an environment env (the labels (a), (b), etc. are used below).

         (a)
         -----------------------------------------------------------------
        |                                                                 |
        |                            -------                              |
        |                            |CONS |                              |
        |                            |-----|                              |
        |                            |  |  |                              |
        |                            -/---\-                              |
        |                            /     \                              |
        |                           /       \                             |
        |                          /         \                            |
        |                         /           \                           |
        |                        /             \                          |
        |                       /               \                         |
        |                      /                 \                        |
        |                     /                   \                       |
        |  (b)               /                     \                      |
        |  -----------------/-------------          \                     |
        | |                /              |          \                    |
        | |               /               |           \                   |
        | |              /                |            \                  |
        | |         -------               |           -------             |
        | |         |CONS |               |           |CONS |             |
        | |         |-----|               |           |-----|             |
        | |         |  |  |               |           |  |  |             |
        | |         -/---\-               |           -/---\-             |
        | |         /     \               |           /     \             |
        | |        /       \              |          /       \            |
        | |       /         \             |         /         \           |
        | |      /           \            |        /           \          |
        | |     /           -------       |       /           -------     |
        | | SUSPEND         |CONS |       |   SUSPEND         |CONS |     |
        | |                 |-----|       |                   |-----|     |
        | |                 |  |  |       |                   |  |  |     |
        | |                 -/---\-       |                   -/---\-     |
        | |         (c)     /     \  (d)  |                   /     \     |
        | |         -------/---    \----  |                  /       \    |
        | |        |      /    |  | \   | |                 /         \   |
        | |        | (QUOTE A) |  | env | |            (QUOTE B)      env |
        | |        |           |  |     | |                               |
        | |         -----------    -----  |                               |
        | |                               |                               |
        |  -------------------------------                                |
        |                                                                 |
         ----------------------------------------------------------------- 

The other change that must be made is to change calls to car and cdr so that they look for and resolve suspensions. That requires changing the code in apply that handles those two functions. In particular, we must change:

to the following (with annotations to show the relationship between the expressions in the code and the parts of the suspension illustrated above):

where function first is defined as follows (using auxiliary function suspended):

Similar changes are made for the code that handles CDR.
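
The suspension mechanism can be sketched in Python as follows. This is an illustration under assumptions, not the interpreter's code: atoms are strings, dotted pairs are 2-tuples, a suspension has the shape (SUSPEND . (exp . env)) as in the picture above, lazy_cons and tiny_eval are hypothetical names, and first takes the evaluator as an extra parameter so that the block is self-contained.

```python
# Sketch only: atoms are strings, a dotted pair (X . Y) is the tuple (X, Y).
NIL = "NIL"

def lst(*items):
    """Hypothetical helper: build the list (i1 ... in) from dotted pairs."""
    result = NIL
    for item in reversed(items):
        result = (item, result)
    return result

def suspend(exp, environ):
    # Tag the unevaluated expression and its environment: (SUSPEND . (exp . env))
    return ("SUSPEND", (exp, environ))

def suspended(x):
    return isinstance(x, tuple) and x[0] == "SUSPEND"

def lazy_cons(arg1, arg2, environ):
    # What eval does for (CONS arg1 arg2): neither argument is evaluated.
    return (suspend(arg1, environ), suspend(arg2, environ))

def first(cell, eval_fn):
    # What the CAR case does: resolve the head only if it is a suspension.
    head = cell[0]
    if suspended(head):
        exp, environ = head[1]
        return eval_fn(exp, environ)
    return head

evaluated = []  # records every expression the evaluator is asked to evaluate

def tiny_eval(exp, environ):
    """Stand-in for eval: handles (QUOTE lit) only."""
    evaluated.append(exp)
    if isinstance(exp, tuple) and exp[0] == "QUOTE":
        return exp[1][0]
    raise NotImplementedError(exp)

# cons[A, car[NIL]]: the second argument would be a runtime error if evaluated.
cell = lazy_cons(lst("QUOTE", "A"), lst("CAR", NIL), NIL)
head = first(cell, tiny_eval)
print(head)            # the head is resolved on demand
print(len(evaluated))  # the tail was never evaluated
```

Two limitations of this sketch are worth noting: it does not cache the result, so a suspension is re-evaluated on every access, and a genuine list whose head is the atom SUSPEND would be mistaken for a suspension.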


TEST YOURSELF #8

Assume that atoms are extended to include integer literals, and that succ is a primitive function. Trace the interpreter, and draw the suspensions created for the following program: