Overview
Some issues that arise when considering how to implement functional
languages are how to implement:
To explore these issues, we will look at how to define interpreters
for a (functional) subset of LISP.
In particular, we will start by looking at a language called clean
LISP, and later we will make a few extensions.
Clean LISP
Following tradition (see the paper
on the history of LISP by John McCarthy) we will write the interpreter
itself in clean LISP.
However, the actual syntax of clean LISP is not very nice, so we'll
use a more readable version, defined by the context-free grammar given
below (italicized names are nonterminals; non-italicized names and
symbols are terminals; star means zero-or-more and plus means
one-or-more;
the last two nonterminals are defined in English rather
than using actual grammar productions).
program | → | fn-definition* expression |
fn-definition | → | id [ id-list ] ← expression |
expression | → | fn-application | s-expr |
fn-application | → | id [ expression-list ] |
s-expr | → | atom | list | ( s-expr . s-expr ) |
atom | → | T | NIL | ID | id |
list | → | NIL | ( s-expr+ ) |
id-list | → | zero-or-more-ids-separated-by-commas |
expression-list | → | zero-or-more-expressions-separated-by-commas |
Notes:
The built-in (or primitive) functions are:
function | type | examples or definitions |
---|---|---|
cons | s-expr X s-expr → s-expr | cons[X,Y] = (X.Y); cons[X,(Y)] = (X Y); cons[(A B),(Y)] = ((A B) Y) |
car | s-expr → s-expr | car[(X.Y)] = X |
cdr | s-expr → s-expr | cdr[(X.Y)] = Y |
null | s-expr → {T, NIL} | null[X] = T if X is NIL; NIL otherwise |
atom | s-expr → {T, NIL} | atom[X] = T if X is an atom; NIL otherwise |
eq | s-expr X s-expr → {T, NIL} | eq[x,y] = T if atom[x] and atom[y] and x=y; NIL if (atom[x] or atom[y]) and not (x=y); undefined otherwise |
cond | expression X expression X expression → expression | if eq[expr1,T] then expr2 else expr3 |
We will also allow a few extensions to the syntax defined above:
Note that it is possible for clean LISP programs to cause runtime errors. We say that such programs evaluate to ⊥ (bottom). This happens, for example:
Below is an example program defined using our new syntax. The program consists of the definition of the concat function followed by a call to concat (recall that a clean LISP program is zero or more function definitions followed by an expression).
concat[l1,l2] ← if null[l1] then l2
                else cons[car[l1], concat[cdr[l1],l2]]
concat[(A B C), cons[X,(Y Z)]]
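As a rough illustration of what the example program computes, here is a minimal Python sketch. It assumes (this is not part of the notes) that atoms are modelled as Python strings and that a dotted pair is a 2-tuple; the helper `make_list` is a hypothetical convenience for building proper lists.

```python
NIL = "NIL"  # assumption: atoms are modelled as Python strings

def cons(a, d):
    return (a, d)          # assumption: a dotted pair is a 2-tuple

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def null(x):
    return x == NIL

def make_list(*items):
    """Hypothetical helper: build the proper list (x1 ... xn) from cons cells."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def concat(l1, l2):
    # Mirrors: concat[l1,l2] <- if null[l1] then l2
    #                           else cons[car[l1], concat[cdr[l1],l2]]
    if null(l1):
        return l2
    return cons(car(l1), concat(cdr(l1), l2))

# The call concat[(A B C), cons[X,(Y Z)]] from the example program.
result = concat(make_list("A", "B", "C"), cons("X", make_list("Y", "Z")))
```

Under these assumptions, `result` is the cons-cell encoding of the list (A B C X Y Z).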
To help you get familiar with the clean LISP syntax, augment the example program given above by adding a definition for the reverse function that works by calling concat, as well as for the version that works by using an accumulating parameter. Be careful about the distinction between upper- and lower-case identifiers!
LISP Expression e | Kind of expression | Representation R[e] | Comment |
---|---|---|---|
T | | T | |
NIL | | NIL | |
id | | ID | convert to upper-case |
ID | | (QUOTE ID) | |
( e1 e2 ... en ) | a list literal | (QUOTE ( e1 e2 ... en )) | the expressions in the list are not translated |
(s-expr . s-expr) | an s-expr literal | (QUOTE ( s-expr . s-expr )) | |
f[e1, e2, ..., en] | fn application | (F R[e1] ... R[en]) | each ei is an arbitrary expression |
f[x1, ..., xn] ← exp | fn definition | (F . (LAMBDA (X1 X2 ... XN) R[exp])) | |
if p1 then e1 else if p2 then e2 ... else en | | (COND (R[p1] R[e1]) (R[p2] R[e2]) ... (T R[en]) ) | |
Note that the only lower-case identifiers in R-notation are inside a QUOTE. The lower-case identifiers in the source language (function names, formal parameters, and free variables) are all converted to upper case in the R-notation.
The representations of the language extensions (other than if-then-else, which is included in the table above) are defined as the representations of the corresponding non-extended construct. For example, the representation of caar[e] is the representation of car[car[e ]].
Here's an example: the representation of the concat function:
(CONCAT . (LAMBDA (L1 L2)
            (COND ((EQ L1 NIL) L2)
                  (T (CONS (CAR L1) (CONCAT (CDR L1) L2))))))
Give the representation of the reverse function you wrote for the previous self-study exercise.
Our initial interpreter consists of six functions:
Here's the code:
eval [exp, environ] ←
  if atom[exp] then
    if eq[exp, NIL] then NIL
    else if eq[exp, T] then T
    else lookup[exp, environ]                            // lookup an identifier
  else if eq[car[exp], QUOTE] then cadr[exp]             // a literal
  else if eq[car[exp], COND] then evcond[cdr[exp], environ]
  else apply[car[exp], evlis[cdr[exp], environ], environ]  // a fn application

apply [fn, args, environ] ←
  if eq[fn, CAR] then caar[args]
  else if eq[fn, CDR] then cdar[args]
  else if eq[fn, CONS] then cons[car[args], cadr[args]]
  else if eq[fn, ATOM] then atom[car[args]]
  else if eq[fn, EQ] then eq[car[args], cadr[args]]
  else if atom[fn] then apply[lookup[fn, environ], args, environ]
  else // fn must be of the form (LAMBDA (...) ...)
    eval[caddr[fn], pairlis[cadr[fn], args, environ]]

evcond [list, environ] ←
  if eval[caar[list], environ] then eval[cadar[list], environ]
  else evcond[cdr[list], environ]

evlis [list, environ] ←
  if null[list] then NIL
  else cons[eval[car[list], environ], evlis[cdr[list], environ]]

lookup [var, environ] ←
  if null[environ] then NIL
  else if eq[var, caar[environ]] then cdar[environ]
  else lookup[var, cdr[environ]]

pairlis [vars, args, environ] ←
  if null[vars] then environ
  else cons[cons[car[vars], car[args]], pairlis[cdr[vars], cdr[args], environ]]
Note that all of the functions have a parameter named environ. This is a list of dotted pairs: (ID . value), mapping identifiers to values. The initial environment is eval's second argument: the list of function definitions from the program being interpreted; it binds function names to their definitions (remember that the representation of a function definition is an s-expression of the form:
To understand how the interpreter works, let's consider each of its functions in more detail.
eval: Function eval is responsible for evaluating a given expression in a given environment. Note that the expression must be one of the following:
apply: Function apply first checks whether the function being applied is a primitive function (and in that case, it just returns the result of applying the primitive function). If not, it must be a user-defined function. In that case, make a recursive call to apply; the function that is passed as the first argument is the result of looking up the function name in the current environment (and neither the arguments nor the environment changes). Recall that user-defined functions are stored in the environment as dotted pairs of the form:
evlis: Function evlis is called by eval when the interpreter finds an application of a (primitive or user-defined) function other than COND. It evaluates each actual parameter on the given list in the context of the given environment, and returns a list of the evaluated actuals.
lookup: Function lookup looks up a given ID in the given environment and returns the associated value.
evcond: Function evcond handles applications of cond, which are strict only in their first parameter. It evaluates that parameter (the condition), and returns the value of the second or third parameter depending on the result.
pairlis: Given a list of IDs, a list of corresponding values, and the current environment, pairlis returns a new, extended environment that includes a dotted pair (ID . value) for each of the given IDs.
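To experiment with the six functions described above, here is a minimal Python sketch of the same interpreter. It is a transliteration under stated assumptions, not the notes' own code: atoms are Python strings, dotted pairs are 2-tuples, `evaluate` stands in for eval, `lst` is a hypothetical list-building helper, and a condition counts as true only when it evaluates to T.

```python
NIL, T = "NIL", "T"                     # assumption: atoms are Python strings

def cons(a, d): return (a, d)           # assumption: a dotted pair is a 2-tuple
def is_atom(x): return isinstance(x, str)
def truth(b): return T if b else NIL

def car(p):
    if is_atom(p):
        raise ValueError("car of an atom")   # models a runtime error (bottom)
    return p[0]

def cdr(p):
    if is_atom(p):
        raise ValueError("cdr of an atom")
    return p[1]

def lst(*items):
    """Hypothetical helper: build the proper list (x1 ... xn)."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def evaluate(exp, env):
    if is_atom(exp):
        return exp if exp in (NIL, T) else lookup(exp, env)   # an identifier
    if car(exp) == "QUOTE":
        return car(cdr(exp))                                  # a literal
    if car(exp) == "COND":
        return evcond(cdr(exp), env)
    return apply(car(exp), evlis(cdr(exp), env), env)         # a fn application

def apply(fn, args, env):
    if fn == "CAR":  return car(car(args))
    if fn == "CDR":  return cdr(car(args))
    if fn == "CONS": return cons(car(args), car(cdr(args)))
    if fn == "ATOM": return truth(is_atom(car(args)))
    if fn == "EQ":   return truth(car(args) == car(cdr(args)))
    if is_atom(fn):
        return apply(lookup(fn, env), args, env)
    # fn must be of the form (LAMBDA (formals) body)
    return evaluate(car(cdr(cdr(fn))), pairlis(car(cdr(fn)), args, env))

def evcond(clauses, env):
    if evaluate(car(car(clauses)), env) == T:
        return evaluate(car(cdr(car(clauses))), env)
    return evcond(cdr(clauses), env)

def evlis(exps, env):
    if exps == NIL:
        return NIL
    return cons(evaluate(car(exps), env), evlis(cdr(exps), env))

def lookup(var, env):
    if env == NIL:
        return NIL
    if var == car(car(env)):
        return cdr(car(env))
    return lookup(var, cdr(env))

def pairlis(formals, args, env):
    if formals == NIL:
        return env
    return cons(cons(car(formals), car(args)),
                pairlis(cdr(formals), cdr(args), env))

# R-notation for concat, and for the call concat[(A B C), cons[X,(Y Z)]].
concat_def = cons("CONCAT", lst("LAMBDA", lst("L1", "L2"),
    lst("COND",
        lst(lst("EQ", "L1", NIL), "L2"),
        lst(T, lst("CONS", lst("CAR", "L1"),
                   lst("CONCAT", lst("CDR", "L1"), "L2"))))))
env0 = lst(concat_def)
program = lst("CONCAT", lst("QUOTE", lst("A", "B", "C")),
              lst("CONS", lst("QUOTE", "X"), lst("QUOTE", lst("Y", "Z"))))
result = evaluate(program, env0)
```

With these encodings, `result` should be the list (A B C X Y Z). Note that Python itself is call-by-value, so this sketch corresponds to the case where the interpreter's own language is strict.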
Question 1.
Recall that it is possible for a clean LISP program to cause a runtime error. Note that the interpreter includes no code to check for errors. What happens if
Question 2.
Consider the following program:
f[x] ← cons[x, g[cons[x, nil]]]
g[x] ← h[cons[x, nil]]
h[x] ← cons[x, nil]
f[A]
Note that every function uses x as its formal parameter. How does the interpreter ensure that each time a function uses x, the correct value is used?
Let's consider whether our interpreter provides call-by-name
or call-by-value semantics for user-defined functions.
Since the interpreter itself is written in LISP, in order to answer
that question we need to know whether calls made by the interpreter
to non-primitive functions use call-by-name or call-by-value.
(Recall that, by definition, the primitive functions other than
cond are strict, so we can assume that whenever the
interpreter calls a primitive function, all arguments are evaluated.)
It would be very convenient if we had the following situation:
First, let's assume that the interpreter itself uses call-by-value.
What happens when a user-defined function in the program being
interpreted is called?
That case is handled by the last line of eval:
Note that a function call (in the program being interpreted)
is represented by a list of the form:
(fnName actual1 actual2 ... actualn).
So the actual parameters are the tail of the list;
i.e., they are what is passed as the first parameter to evlis.
Function evlis recursively processes each item in its
parameter list by passing the item to eval;
i.e., it evaluates all of the actual parameters (which is what is
supposed to happen under call-by-value semantics).
So if we want call-by-value, all we need to do is make sure that the
interpreter itself uses call-by-value.
Now let's consider what happens if the interpreter itself uses call-by-name.
In that case, when apply is called by eval, apply's
arguments are not evaluated before the call is made (i.e.,
the call to evlis is delayed).
The call to apply from eval has two actuals
that require evaluation (the first and the second).
When in apply is the first formal used, causing the
first actual to be evaluated?
We are concerned with apply's second argument, args,
since that
represents the parameters passed to the function that is being applied,
and we're concerned with what apply does when it's applying
a user-defined function (not a primitive one).
To make a long story short, evaluation of args is delayed
until that user-defined function uses an
identifier: either a formal parameter or a function name.
At that point, several layers of delayed evaluation get "unwound",
leading to the call to evlis that evaluates the user-defined
function's actual parameters.
Function evlis evaluates the actual parameters via function
calls made inside a call to cons;
since cons is strict in both arguments, that causes all
of the user-defined function's actuals to be evaluated.
So the bottom line is that if the user-defined function uses no names
at all, then its actuals are never evaluated;
as soon as it uses any name, all of its actuals are
evaluated.
That is clearly neither call-by-name nor call-by-value.
Trace the chain of calls that pass along the unevaluated actual
parameters in all its gory detail.
You might want to use a specific example like:
How can we modify our interpreter so that we get call-by-name semantics
for user-defined functions?
The basic idea is to change eval and apply so that
instead of evaluating actuals when a function is called, we
"package up" each actual with the corresponding formal name.
That way, when the called function uses a formal, it can find (and
evaluate) the appropriate actual.
The following changes are needed:
Change the way eval calls apply.
The idea is to prevent eval from evaluating arguments.
The original code in eval is:
Change how apply handles calls to primitive functions.
Since arguments are no longer evaluated by eval,
we must change apply so that all arguments passed to
an application of a primitive function are evaluated before
the call.
Original code:
Change how apply handles calls to user-defined functions.
We want to change the last line of apply so that when it calls
eval to evaluate a call to a user-defined function
it augments the current environment with information
about the unevaluated actuals, and the corresponding formals.
Note however that an additional piece of information must be provided to
handle actuals that use identifiers correctly.
To understand this, consider the following program:
However, if we modify the interpreter so that when apply
calls eval it simply adds to the front of the environment
one pair ( formal . actual ) for each parameter
of the called function, we will get the wrong result.
Thinking about what happens at a high level (i.e., not worrying about
how the interpreter actually works) here's what would
happen when the example program above executes using call-by-name if
we simply "match up" formals and their corresponding unevaluated
actuals in the environment:
The solution to this problem is to have apply include the
current environment when it packages up formals and their
corresponding (unevaluated) actuals.
This means changing the type of the environment.
Instead of just pairs of identifiers and their values,
we will have pairs of the form:
The change to apply is made in the last line.
The original code is:
1. Change lookup
We need to change lookup so that
when it looks up an identifier in the environment it also evaluates it.
Original code:
The new function force takes a pair (exp . environ),
and returns the value of exp evaluated in environment environ:
2. Change the form of the
initial environment (which matches function names with
their definitions).
This change is made, not to the interpreter, but to the
definition of R-notation;
in particular, to the way a function definition is represented.
The old representation was a pair of the form
( function-name . body ):
This ends the changes needed to our interpreter to
provide call-by-name parameter passing.
From now on we will assume that we are dealing with the original
interpreter that provides call-by-value parameter passing for user-defined
functions.
Call-by-name and Call-by-value
interpreter uses call-by-value ⇒ programs are interpreted
using call-by-value
Unfortunately, that is not the case.
If the interpreter itself uses call-by-value, then programs will
be interpreted using call-by-value;
however, if the interpreter itself uses
call-by-name, programs will not
be interpreted using call-by-name (they will use a strange hybrid,
which we investigate below).
and
interpreter uses call-by-name ⇒ programs are interpreted
using call-by-name.
apply[car[exp], evlis[cdr[exp], environ], environ]
Since we're assuming that the interpreter itself uses call-by-value,
all of apply's arguments will be evaluated when apply
is called.
f [x, y] ← x
f [A, car[nil]]
In this example, the user-defined function f ignores its
second parameter, so running this program should not cause a
runtime error.
Convince yourself that if this program is interpreted by
our current interpreter, a runtime error (an attempt to apply
car to an empty list) does occur.
else apply[car[exp], evlis[cdr[exp], environ], environ]
and the new code is:
else apply[car[exp], cdr[exp], environ]
Note that the new code passes the unevaluated actuals (cdr[exp])
to apply.
if eq[fn, CAR] then caar[args]
New code:
if eq[fn, CAR] then car[eval[car[args], environ]]
Similar changes must be made for calls to the other primitive functions.
f[x] ← g[A, h[x]]
g[x,y] ← cons[x, y]
h[z] ← cons[z, nil]
f[B]
When this program executes, the call to h should create the
list (B), and the call to g should create the list (A B),
which is the final result.
(ID . (exp . env))
where exp is an unevaluated expression
and env is the environment in which to evaluate it.
eval[caddr[fn], pairlis[cadr[fn], args, environ]]
The new code is:
eval[caddr[fn], pairlis[cadr[fn], makePairs[args, environ], environ]]
The new function makePairs is:
makePairs[args, environ] ←
if null[args] then NIL
else cons[ cons[ car[args], environ ],
makePairs[ cdr[args], environ ]
]
Now that we've changed the form of an environment, two more changes
are necessary:
if eq[var, caar[environ]] then cdar[environ]
New code:
if eq[var, caar[environ]] then force[cdar[environ]]
force[p] ← eval[car[p], cdr[p]] // p is of the form (exp . env),
// so car[p] is exp and cdr[p] is env
R[f[x1,...,xn] ← e] = (F. (LAMBDA (X1...Xn) R[e]))
Now we need to have a pair of the form:
( function-name . ( exp . env ) )
such that evaluating exp in env produces
the function body.
Here's the new representation:
R[f[x1,...,xn] ← e] = (F. ((QUOTE (LAMBDA (X1...Xn) R[e])) . NIL))
This works because evaluating an expression of the form:
(QUOTE exp)
simply returns exp (the environment isn't used, so it's
OK for it to be NIL).
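The call-by-name changes (unevaluated actuals passed to apply, primitives evaluating their own arguments, makePairs, force, and the new form of function definitions) can be put together in a minimal Python sketch. The encoding is an assumption made for illustration: atoms are Python strings, dotted pairs are 2-tuples, `evaluate` stands in for eval, and `lst` and `fn_def` are hypothetical helpers. The handling of CDR, CONS, ATOM and EQ follows the pattern the notes give for CAR.

```python
NIL, T = "NIL", "T"                     # assumption: atoms are Python strings

def cons(a, d): return (a, d)           # assumption: a dotted pair is a 2-tuple
def is_atom(x): return isinstance(x, str)
def truth(b): return T if b else NIL

def car(p):
    if is_atom(p):
        raise ValueError("car of an atom")   # models a runtime error (bottom)
    return p[0]

def cdr(p):
    if is_atom(p):
        raise ValueError("cdr of an atom")
    return p[1]

def lst(*items):
    """Hypothetical helper: build the proper list (x1 ... xn)."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def evaluate(exp, env):
    if is_atom(exp):
        return exp if exp in (NIL, T) else lookup(exp, env)
    if car(exp) == "QUOTE":
        return car(cdr(exp))
    if car(exp) == "COND":
        return evcond(cdr(exp), env)
    return apply(car(exp), cdr(exp), env)        # actuals passed unevaluated

def apply(fn, args, env):
    # Primitives are strict, so they evaluate their own actuals here.
    if fn == "CAR":  return car(evaluate(car(args), env))
    if fn == "CDR":  return cdr(evaluate(car(args), env))
    if fn == "CONS": return cons(evaluate(car(args), env),
                                 evaluate(car(cdr(args)), env))
    if fn == "ATOM": return truth(is_atom(evaluate(car(args), env)))
    if fn == "EQ":   return truth(evaluate(car(args), env)
                                  == evaluate(car(cdr(args)), env))
    if is_atom(fn):
        return apply(lookup(fn, env), args, env)
    # fn is (LAMBDA (formals) body): bind formals to (actual . env) pairs.
    return evaluate(car(cdr(cdr(fn))),
                    pairlis(car(cdr(fn)), make_pairs(args, env), env))

def make_pairs(args, env):
    if args == NIL:
        return NIL
    return cons(cons(car(args), env), make_pairs(cdr(args), env))

def force(p):
    return evaluate(car(p), cdr(p))              # p is (exp . env)

def evcond(clauses, env):
    if evaluate(car(car(clauses)), env) == T:
        return evaluate(car(cdr(car(clauses))), env)
    return evcond(cdr(clauses), env)

def lookup(var, env):
    if env == NIL:
        return NIL
    if var == car(car(env)):
        return force(cdr(car(env)))              # evaluate on lookup
    return lookup(var, cdr(env))

def pairlis(formals, args, env):
    if formals == NIL:
        return env
    return cons(cons(car(formals), car(args)),
                pairlis(cdr(formals), cdr(args), env))

def fn_def(name, formals, body):
    """Hypothetical helper: (NAME . ((QUOTE (LAMBDA formals body)) . NIL))."""
    return cons(name, cons(lst("QUOTE", lst("LAMBDA", formals, body)), NIL))

# f[x, y] <- x ;  f[A, car[nil]]   (the second actual is never forced)
env0 = lst(fn_def("F", lst("X", "Y"), "X"))
result = evaluate(lst("F", lst("QUOTE", "A"), lst("CAR", NIL)), env0)
```

In this sketch `result` is A and no error is raised, because the pair for y is stored in the environment but never forced.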
fn-application | → | id [ expression-list ] |
expression-list | → | zero-or-more-expressions-separated-by-commas |
expression | → | fn-application | s-expr |
s-expr | → | atom | list | ( s-expr . s-expr ) |
atom | → | T | NIL | ID | id |
Now let's consider whether the interpreter supports functions as parameters. When a function is called, evlis is called to evaluate each actual. An actual that is an atom other than T or NIL is looked up in the environment, and the corresponding value is used as the value of the actual. That's fine for user-defined functions, but primitive functions are not in the environment!
Consider the following clean LISP program:
map[l, f] ← if null[l] then NIL
            else cons[f[car[l]], map[cdr[l], f]]
map[((A B C) (X X) (Y)), cons]
Trace what happens when this program is executed by the interpreter. At what point is there a runtime error?
What list l could be used so that the call map[l, cons] does not cause a runtime error?
There are several ways we could change the interpreter to handle
primitive functions passed as parameters;
for example, we could change lookup to recognize primitives,
or we could change the initial environment to include primitives.
For each of the two changes to the interpreter proposed above to handle primitive functions as parameters, specify the actual changes (to the code, for the first change, and to the values initially passed to the interpreter for the second change).
In clean LISP, can functions be returned as function results?
Yes, because a function name is an id and id is an expression, and
function results are expressions.
So for example, we can define the following function, which
returns a function:
However, clean LISP syntax does not allow the result of a function
to be applied "directly".
For example, we cannot write:
We can use an "intermediate" function to allow "indirect" calls
to functions returned as function results;
for example:
To allow "direct" application of the result of a function call to a
list of arguments, we must change the grammar rule for function application to:
Now let's consider what changes need to be made to the interpreter.
To help us understand what needs to change, let's consider the
specific example given above (applying the result of a call to
choose to the list (A B)).
Using our new R-notation, the representation would be:
Our final topic is how to modify the interpreter to make cons
lazy.
This involves changing the way the interpreter
calls cons, car, and cdr.
Instead of passing cons
its evaluated arguments, we will pass it two dotted pairs (called
suspensions), each consisting of an unevaluated argument
and the current environment.
Then we'll add code before the invocations of car and cdr
to look for and resolve suspensions.
For example, consider the call:
Our proposed lazy implementation would package up the two unevaluated
arguments with the current environment and would call cons
as follows:
But this won't quite work, because we also need a way for car and
cdr to recognize suspensions (i.e., to know that the head
and/or tail of the list consists of a not-yet-evaluated expression).
So we'll package up the literal SUSPEND with each unevaluated
argument and the current environment, producing:
Now that we know what we want to do, where do we actually need to make
changes?
The first change is to eval, which must
handle CONS as a special case (or else it will evaluate CONS's arguments,
since we're assuming our original interpreter with call-by-value
parameter passing).
This means changing the last line of eval from:
To illustrate how this works, consider the expression
The other change that must be made is
to change calls to car and cdr so that they look
for and resolve suspensions.
That requires changing the code in apply
that handles those two functions.
In particular, we must change:
Similar changes are made for the code that handles CDR.
Assume that atoms are extended to include integer literals,
and that succ is a primitive function.
Trace the interpreter, and draw the suspensions created for
the following program:
Functions as function results
choose[x] ← if null[x] then car else cdr
choose[NIL][(A B)]
because the rule for function application is:
fn-application → id [ expression-list ]
i.e., the applied function is assumed to be specified via its name
(not as the result of a function call).
choose[x] ← if null[x] then car else cdr
myapply[f, l] ← f[l]
myapply[choose[NIL], (A B)]
but this is not very nice...
fn-application → expression [ expression-list ]
and we must change the corresponding R-notation:
R[exp [e1...en]] = (R[exp] R[e1] ... R[en])
( (CHOOSE NIL) (QUOTE (A B)) )
i.e., that would be the exp passed to eval when
the interpreter is started.
Since exp is not an atom, and its car is neither
QUOTE nor COND, apply would be called with:
Since fn is neither a primitive function name nor an atom,
apply would conclude that it must be of the form
(LAMBDA (...) ...).
However, this is not correct, and so we clearly must add a new
case to apply to account for functions that are non-LAMBDA
expressions.
In particular, we'll change:
fn = ( CHOOSE NIL )
args = ( (A B) )
else // fn must be of the form (LAMBDA (...) ...)
eval[caddr[fn], pairlis[cadr[fn], args, environ]]
to:
else if eq[car[fn], LAMBDA] then
eval [caddr[fn], pairlis[...]]
else // fn must be an expression
apply[eval[fn, environ], args, environ]
Note that, for our example (using choose), the evaluation
of ( CHOOSE NIL ) produces CAR, which is then applied to ( A B ),
producing A as expected.
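The extra case in apply can be tried out with a minimal Python sketch of the call-by-value interpreter. Several things here are assumptions for illustration only: atoms are Python strings, dotted pairs are 2-tuples, `evaluate` stands in for eval, and `lst` and `run` are hypothetical helpers. In addition, so that the identifier CAR evaluates to CAR, the sketch binds CAR and CDR to themselves in the initial environment; that is just one possible way of making primitives available as values.

```python
NIL, T = "NIL", "T"                     # assumption: atoms are Python strings

def cons(a, d): return (a, d)           # assumption: a dotted pair is a 2-tuple
def is_atom(x): return isinstance(x, str)
def truth(b): return T if b else NIL

def car(p):
    if is_atom(p):
        raise ValueError("car of an atom")   # models a runtime error (bottom)
    return p[0]

def cdr(p):
    if is_atom(p):
        raise ValueError("cdr of an atom")
    return p[1]

def lst(*items):
    """Hypothetical helper: build the proper list (x1 ... xn)."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def evaluate(exp, env):
    if is_atom(exp):
        return exp if exp in (NIL, T) else lookup(exp, env)
    if car(exp) == "QUOTE":
        return car(cdr(exp))
    if car(exp) == "COND":
        return evcond(cdr(exp), env)
    return apply(car(exp), evlis(cdr(exp), env), env)

def apply(fn, args, env):
    if fn == "CAR":  return car(car(args))
    if fn == "CDR":  return cdr(car(args))
    if fn == "CONS": return cons(car(args), car(cdr(args)))
    if fn == "ATOM": return truth(is_atom(car(args)))
    if fn == "EQ":   return truth(car(args) == car(cdr(args)))
    if is_atom(fn):
        return apply(lookup(fn, env), args, env)
    if car(fn) == "LAMBDA":
        return evaluate(car(cdr(cdr(fn))), pairlis(car(cdr(fn)), args, env))
    # New case: fn is an expression whose value is the function to apply.
    return apply(evaluate(fn, env), args, env)

def evcond(clauses, env):
    if evaluate(car(car(clauses)), env) == T:
        return evaluate(car(cdr(car(clauses))), env)
    return evcond(cdr(clauses), env)

def evlis(exps, env):
    if exps == NIL:
        return NIL
    return cons(evaluate(car(exps), env), evlis(cdr(exps), env))

def lookup(var, env):
    if env == NIL:
        return NIL
    if var == car(car(env)):
        return cdr(car(env))
    return lookup(var, cdr(env))

def pairlis(formals, args, env):
    if formals == NIL:
        return env
    return cons(cons(car(formals), car(args)),
                pairlis(cdr(formals), cdr(args), env))

# choose[x] <- if null[x] then car else cdr
choose_def = cons("CHOOSE", lst("LAMBDA", lst("X"),
    lst("COND", lst(lst("EQ", "X", NIL), "CAR"), lst(T, "CDR"))))
# Assumption: primitives are bound to themselves so they can be used as values.
env0 = lst(choose_def, cons("CAR", "CAR"), cons("CDR", "CDR"))

def run(exp):
    """Hypothetical helper: evaluate exp in the initial environment."""
    return evaluate(exp, env0)

# choose[NIL][(A B)]  is represented as  ((CHOOSE NIL) (QUOTE (A B)))
result = run(lst(lst("CHOOSE", NIL), lst("QUOTE", lst("A", "B"))))
```

Here `result` is A: evaluating (CHOOSE NIL) yields CAR, which the new case then applies to the argument list.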
Lazy Constructors
cons[ car[ (A B) ], cdr[ (X Y Z) ] ]
The representation of this expression is:
(CONS (CAR (QUOTE (A B))) (CDR (QUOTE (X Y Z))))
An eager implementation would first evaluate the two arguments and then
would call cons with the results:
cons[ A, (Y Z) ]
producing the list: (A Y Z).
cons[ ((CAR (QUOTE (A B))) . env), ((CDR (QUOTE (X Y Z))) . env) ]
producing the dotted pair:
(((CAR (QUOTE (A B))) . env) . ((CDR (QUOTE (X Y Z))) . env))
((SUSPEND . ((CAR (QUOTE (A B))) . env)) . (SUSPEND . ((CDR (QUOTE (X Y Z))) . env)))
else apply[...evlis...]
to:
else if eq[car[exp], CONS] then
cons[suspend[cadr[exp],environ], suspend[caddr[exp],environ]]
else apply[...evlis...]
adding the following definition of suspend:
suspend[exp, env] ← cons[SUSPEND, cons[exp,env]]
and removing from apply the code that looks for and calls
cons (since that is now handled in eval).
cons[A,B]
which is represented as:
(CONS (QUOTE A) (QUOTE B))
The picture below shows the suspensions that are created by eval
when it is called to evaluate the expression in an environment env
(the labels (a), (b), etc. are used below).
(a)
-----------------------------------------------------------------
| |
| ------- |
| |CONS | |
| |-----| |
| | | | |
| -/---\- |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| (b) / \ |
| -----------------/------------- \ |
| | / | \ |
| | / | \ |
| | / | \ |
| | ------- | ------- |
| | |CONS | | |CONS | |
| | |-----| | |-----| |
| | | | | | | | | |
| | -/---\- | -/---\- |
| | / \ | / \ |
| | / \ | / \ |
| | / \ | / \ |
| | / \ | / \ |
| | / ------- | / ------- |
| | SUSPEND |CONS | | SUSPEND |CONS | |
| | |-----| | |-----| |
| | | | | | | | | |
| | -/---\- | -/---\- |
| | (c) / \ (d) | / \ |
| | -------/--- \---- | / \ |
| | | / | | \ | | / \ |
| | | (QUOTE A) | | env | | (QUOTE B) env |
| | | | | | | |
| | ----------- ----- | |
| | | |
| ------------------------------- |
| |
-----------------------------------------------------------------
if eq[fn, CAR] then caar[args]
to the following (with annotations to show the relationship
between the expressions in the code and the parts of the
suspension illustrated above):
else if eq[fn, CAR] then first[car[args]]
\_______/
(a)
Where function first is defined as follows (using
auxiliary function suspended):
first[cell] ←
if suspended[car[cell]] then eval[cadar[cell], cddar[cell]]
\_______/ \_________/ \_________/
(b) (c) (d)
else // no cons used; e.g.: car[ ( A B ) ]
car[cell]
suspended[cell] ←
if null[cell] then NIL
else if atom[cell] then NIL
else if eq[car[cell],SUSPEND] then T
else NIL
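The lazy-cons changes (the CONS case in eval, suspend, first, and suspended) can be exercised with a minimal Python sketch. As before, the encoding is an assumption: atoms are Python strings, dotted pairs are 2-tuples, `evaluate` stands in for eval, and `lst` is a hypothetical helper. The function `rest` is a guess at the "similar change" for CDR, written by analogy with first.

```python
NIL, T = "NIL", "T"                     # assumption: atoms are Python strings

def cons(a, d): return (a, d)           # assumption: a dotted pair is a 2-tuple
def is_atom(x): return isinstance(x, str)
def truth(b): return T if b else NIL

def car(p):
    if is_atom(p):
        raise ValueError("car of an atom")   # models a runtime error (bottom)
    return p[0]

def cdr(p):
    if is_atom(p):
        raise ValueError("cdr of an atom")
    return p[1]

def lst(*items):
    """Hypothetical helper: build the proper list (x1 ... xn)."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def suspend(exp, env):
    return cons("SUSPEND", cons(exp, env))       # (SUSPEND . (exp . env))

def suspended(cell):
    return (not is_atom(cell)) and car(cell) == "SUSPEND"

def first(cell):
    head = car(cell)
    if suspended(head):                          # resolve the suspension
        return evaluate(car(cdr(head)), cdr(cdr(head)))
    return head                                  # no cons used, e.g. car[(A B)]

def rest(cell):
    # Assumed CDR counterpart of first, following the same pattern.
    tail = cdr(cell)
    if suspended(tail):
        return evaluate(car(cdr(tail)), cdr(cdr(tail)))
    return tail

def evaluate(exp, env):
    if is_atom(exp):
        return exp if exp in (NIL, T) else lookup(exp, env)
    if car(exp) == "QUOTE":
        return car(cdr(exp))
    if car(exp) == "COND":
        return evcond(cdr(exp), env)
    if car(exp) == "CONS":                       # lazy: do not evaluate actuals
        return cons(suspend(car(cdr(exp)), env),
                    suspend(car(cdr(cdr(exp))), env))
    return apply(car(exp), evlis(cdr(exp), env), env)

def apply(fn, args, env):
    if fn == "CAR":  return first(car(args))
    if fn == "CDR":  return rest(car(args))
    if fn == "ATOM": return truth(is_atom(car(args)))
    if fn == "EQ":   return truth(car(args) == car(cdr(args)))
    if is_atom(fn):
        return apply(lookup(fn, env), args, env)
    return evaluate(car(cdr(cdr(fn))), pairlis(car(cdr(fn)), args, env))

def evcond(clauses, env):
    if evaluate(car(car(clauses)), env) == T:
        return evaluate(car(cdr(car(clauses))), env)
    return evcond(cdr(clauses), env)

def evlis(exps, env):
    if exps == NIL:
        return NIL
    return cons(evaluate(car(exps), env), evlis(cdr(exps), env))

def lookup(var, env):
    if env == NIL:
        return NIL
    if var == car(car(env)):
        return cdr(car(env))
    return lookup(var, cdr(env))

def pairlis(formals, args, env):
    if formals == NIL:
        return env
    return cons(cons(car(formals), car(args)),
                pairlis(cdr(formals), cdr(args), env))

# car[cons[A, car[NIL]]]: the erroneous second argument is never evaluated.
result = evaluate(lst("CAR", lst("CONS", lst("QUOTE", "A"), lst("CAR", NIL))), NIL)
```

Under these assumptions `result` is A; with an eager cons the same expression would fail while evaluating car[NIL].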
from[x] ← cons[x, from[succ[x]]]
car[from[1]]