CS 704, Assignment 1

De Bruijn Representation of Lambda Calculus Terms

Due Date: 11:59 PM, February 15th.


Overview

In this assignment, you'll manipulate abstract syntax trees of lambda-calculus terms in two different forms. You should do the following:

  1. Learn the basics of OCaml
  2. Implement Lambda.to_debruijn, which converts traditional lambda-calculus terms to De Bruijn-indexed terms.
  3. Implement DeBruijn.beta_lor, which performs the leftmost, outermost beta reduction in a De Bruijn-indexed term.
  4. Submit your work.

Terms with De Bruijn[1] Indices

In traditional lambda notation, two expressions may have the same essential structure but be technically different because of the names of their bound variables. That is, two expressions may be equal up to alpha conversion (renaming of bound variables) without being syntactically identical. De Bruijn indices eliminate this problem: if two traditional lambda-calculus terms can be alpha-converted to syntactically identical terms, then their corresponding De Bruijn-indexed terms are syntactically identical.

The binding depth of a variable is the number of lambdas between that variable and the lambda to which it is bound, including the binding lambda. So, in \x.\y.\z.xyzy, the binding depths of the variables are 3, 2, and 1 for x, y, and z, respectively. (We're using \ to represent lambdas so that we don't have to worry about handling Unicode in everyone's browsers, editors, and terminals.)

A De Bruijn-indexed term replaces the names of variables with their binding depth, and thus removes the need to name variables at all. Thus, \x.\y.\z.xyzy is equivalent to the De Bruijn-indexed term \\\3 2 1 2. For a more complicated example, \x.\y.\f.f(\x.x)(\z.y) is \\\1(\1)(\3) with De Bruijn indices. Notice that the occurrences of f and x have the same De Bruijn index, but they are in different contexts, and so they are bound to different lambdas.
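To make the two examples above concrete, here is a small sketch that writes them out as OCaml values. The type db is a local stand-in that mirrors the shape of the De Bruijn syntax trees used in this assignment; the names db, ex1, ex2, id_from_x, and id_from_y are made up for illustration and are not part of the skeleton code.

```ocaml
(* Local stand-in for the De Bruijn syntax tree (an assumption for this
   sketch; the project defines its own type in the DeBruijn module). *)
type db =
  | Var of int
  | Lambda of db
  | Apply of db * db

(* \x.\y.\z.xyzy  becomes  \\\3 2 1 2 *)
let ex1 =
  Lambda (Lambda (Lambda
    (Apply (Apply (Apply (Var 3, Var 2), Var 1), Var 2))))

(* \x.\y.\f.f(\x.x)(\z.y)  becomes  \\\1(\1)(\3) *)
let ex2 =
  Lambda (Lambda (Lambda
    (Apply (Apply (Var 1, Lambda (Var 1)), Lambda (Var 3)))))

(* \x.x and \y.y differ only in the bound name, so both are \1. *)
let id_from_x = Lambda (Var 1)
let id_from_y = Lambda (Var 1)

(* Structural equality is all that is needed to compare them. *)
let () = assert (id_from_x = id_from_y)
```

Because no names remain, OCaml's built-in structural equality (=) is enough to decide whether two De Bruijn-indexed terms are the same.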

De Bruijn indices simplify reductions by eliminating the renaming usually required to avoid variable capture. However, beta-reduction is still more complicated than simply replacing some variables in the left-hand side of an application with the term in the right-hand side, as you'll shortly learn.

Getting Started

If you're working from a CSL machine, go to some suitable working directory and run

/u/a/w/aws/public/html/courses/cs704-code/asn1/grab
    

This will create the Prog1 directory, and populate it with the code you'll need - symbolic links to the files you should not change, and local copies of the files that you should change.

Learn OCaml

If you've never used OCaml, you'll need to learn some of the language. Chapters 1 and 2 of the OCaml Manual form a reasonable introduction to the language. Neither the skeleton code nor the code you write needs to handle OCaml's object-oriented syntax, labeled parameters, or polymorphic variants. You won't need to write functor modules, either, but you may want to use them - OCaml's standard maps and sets are implemented as functors.
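If you have not used a functor before, the following sketch shows roughly what instantiating the standard Map functor looks like. The names CharMap, m, and m' are made up for illustration, and find_opt assumes OCaml 4.05 or later.

```ocaml
(* Instantiate the standard Map functor for character keys. *)
module CharMap = Map.Make (Char)

(* A tiny map from characters to integers. *)
let m = CharMap.empty |> CharMap.add 'x' 1 |> CharMap.add 'y' 2

(* Adding an existing key yields a new map with the value replaced;
   the original map m is unchanged because maps are immutable. *)
let m' = CharMap.add 'x' 10 m

let () =
  assert (CharMap.find_opt 'x' m = Some 1);
  assert (CharMap.find_opt 'x' m' = Some 10);
  assert (CharMap.find_opt 'z' m = None)
```

Set.Make is used the same way, with a module that provides a type t and a compare function.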

I strongly recommend playing with the ocaml interactive toplevel to help learn the language (ocamlc is the bytecode compiler, not an interpreter). Invoke the toplevel as rlwrap ocaml - this interposes the input functions from the Readline library, which gives you a much nicer interactive shell.

Programs You'll Run

To build the code, run ./build in the Prog1 directory. This will run ocamlbuild, producing four executables: ToDeBruijnCheck.byte, ToDeBruijnRepl.byte, BetaLorCheck.byte, and BetaLorRepl.byte. It will also put a bunch of OCaml object files in the _build directory.

These programs each drive your code in various ways. The .byte extension on these programs indicates that they're compiled OCaml bytecode. The ToDeBruijn programs read traditional lambda-calculus terms and write equivalent De Bruijn-indexed terms; they're thin wrappers around the Lambda.to_debruijn function. The BetaLor programs read De Bruijn-indexed terms and write those terms after a single beta reduction in normal order; they're thin wrappers around the DeBruijn.beta_lor function. The Check programs read one term per file; these are intended for automated testing. The Repl programs read one term per line, and respond immediately; these are intended for interactive use. As with the ocaml toplevel, these programs are more pleasant if you run them with rlwrap.

To clean away the built files, run ocamlbuild -clean.

The check program is the program we'll be using to grade your program. It builds the program, and then runs ToDeBruijnCheck.byte and BetaLorCheck.byte against every test case in the Tests directory. You can (and should!) run this yourself to see if your code handles everything we expect it to handle. Run this before you turn in your code! The Tests directory contains the test files that check uses. You may, of course, define your own test cases.

Syntax for Lambda Terms

Input that you give to ToDeBruijnRepl.byte on a single line, input that you feed to ToDeBruijnCheck.byte in an entire file, and the contents of .lam files should be in the following (BNF) syntax, where var is any single, lowercase letter:

<term> ::= <term> <term>
         | \ <var> . <term>
         | ( <term> )
         | <var>
    

Whitespace is optional between any two elements of the grammar. As in the usual notation, application is left-associative, and application has higher precedence than abstraction. Thus, abcd means the same thing as ((ab)c)d, and \x.\y.y\z.yz means the same thing as \x.(\y.(y (\z.(yz)))).
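The two parses above can be spelled out as syntax trees. This is only a sketch: the type lam is a local stand-in for the tree of a traditional lambda term, and the names lam, abcd, and nested are made up for illustration.

```ocaml
(* Local stand-in for the syntax tree of a traditional lambda term
   (an assumption for this sketch; the project defines its own type). *)
type lam =
  | Var of char
  | Lambda of char * lam
  | Apply of lam * lam

(* abcd parses as ((ab)c)d: application associates to the left. *)
let abcd =
  Apply (Apply (Apply (Var 'a', Var 'b'), Var 'c'), Var 'd')

(* \x.\y.y\z.yz parses as \x.(\y.(y (\z.(yz)))):
   each lambda body extends as far to the right as possible. *)
let nested =
  Lambda ('x',
    Lambda ('y',
      Apply (Var 'y',
        Lambda ('z', Apply (Var 'y', Var 'z')))))
```

Reading the nesting of Apply from the inside out gives the order in which the applications group.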

Syntax for De Bruijn-Indexed Terms

Input that you give to BetaLorRepl.byte on a single line, input that you feed to BetaLorCheck.byte in an entire file, and the contents of .db files should be in the following (BNF) syntax, where var is any number:

<term> ::= <term> <term>
         | \ <term>
         | ( <term> )
         | <var>
    

Whitespace is optional between most elements of the grammar, but must occur between two adjacent vars. Otherwise, the parser would be unable to distinguish between, e.g., 1 1, a one applied to another one, and 11, an eleven. Again, application is left associative, and application has higher precedence than abstraction.

The Code

When you're writing your code, you should only have to worry about a few files. As is usual in OCaml, .mli files describe the interface to a module (like a header in C), and .ml files implement that interface.

The only modules you should need to think much about are the Lambda, DeBruijn, and IO modules.

The Lambda Module

The Lambda module has the following signature:

type expr =
    Var of char 
  | Lambda of char * expr
  | Apply of expr * expr

exception Empty

val to_debruijn: expr -> DeBruijn.expr
    

The Lambda.expr type is the type of abstract syntax trees for traditional lambda-calculus terms. Var is an expression that consists of a single variable, Apply(a,b) is the application of a to b, and Lambda(c,e) is a lambda term in which the variable c is bound. For instance, \x.\y.xyx is encoded as:

Lambda('x',
  Lambda('y',
    Apply(
      Apply( Var 'x', Var 'y'),
      Var 'x')))
    

Lambda.Empty is an exception that the lambda-term parser throws when it receives empty input; you don't need to worry about it.

Lambda.to_debruijn: Lambda.expr -> DeBruijn.expr is the first function that you should implement in this assignment. In the file Lambda.ml, it is currently implemented trivially and incorrectly - there is just enough there that it type-checks when you try to compile it. Per its signature, to_debruijn takes a lambda term t as input, and returns an equivalent DeBruijn term. When you implement this, be careful to correctly implement lexical scopes: \x.(\x.x)x is \(\1)1, and \x.\y.\x.xy is \\\1 2.

In to_debruijn, you should treat free variables as if they were all bound just one level outside the entire expression. So, the term "a" translates to "1", and "\x.xy\z.w" translates to "\1 2 \3".[2]
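The translations quoted above can be collected into a small table of expected results, which may be handy for checking your own implementation. This is a sketch only: the modules Lam and DB are local stand-ins for the project's Lambda and DeBruijn modules, and cases and passes_all are hypothetical names that do not appear in the skeleton code.

```ocaml
(* Stand-ins for the project's Lambda and DeBruijn modules (assumptions
   made so that this sketch compiles on its own). *)
module Lam = struct
  type expr = Var of char | Lambda of char * expr | Apply of expr * expr
end

module DB = struct
  type expr = Var of int | Lambda of expr | Apply of expr * expr
end

(* Each pair is (input term, expected translation). *)
let cases : (Lam.expr * DB.expr) list = [
  (* a  ==>  1 *)
  (Lam.Var 'a', DB.Var 1);
  (* \x.(\x.x)x  ==>  \(\1)1 *)
  (Lam.Lambda ('x', Lam.Apply (Lam.Lambda ('x', Lam.Var 'x'), Lam.Var 'x')),
   DB.Lambda (DB.Apply (DB.Lambda (DB.Var 1), DB.Var 1)));
  (* \x.\y.\x.xy  ==>  \\\1 2 *)
  (Lam.Lambda ('x', Lam.Lambda ('y', Lam.Lambda ('x',
     Lam.Apply (Lam.Var 'x', Lam.Var 'y')))),
   DB.Lambda (DB.Lambda (DB.Lambda (DB.Apply (DB.Var 1, DB.Var 2)))));
  (* \x.xy\z.w  ==>  \1 2 \3 *)
  (Lam.Lambda ('x',
     Lam.Apply (Lam.Apply (Lam.Var 'x', Lam.Var 'y'),
                Lam.Lambda ('z', Lam.Var 'w'))),
   DB.Lambda (DB.Apply (DB.Apply (DB.Var 1, DB.Var 2),
                        DB.Lambda (DB.Var 3))));
]

(* Returns true when the candidate translation f agrees with every case. *)
let passes_all (f : Lam.expr -> DB.expr) : bool =
  List.for_all (fun (input, expected) -> f input = expected) cases
```

Inside the project you would drop the stand-in modules and pass your own to_debruijn to a check of this kind; a trivial function such as fun _ -> DB.Var 1 passes only the first case.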

When you're done with Lambda.ml, it should contain the type of expr, the declaration of the exception Empty, and your implementation of to_debruijn. If you remove any of these pieces, the project won't compile. On the other hand, it is entirely fine to define more values and types in Lambda.ml - in particular, you may find it useful to define auxiliary functions.

The DeBruijn Module

The DeBruijn module has the following signature:

type expr =
      Var of int
    | Lambda of expr
    | Apply of expr * expr

exception Empty

val beta_lor: expr -> expr option
    

DeBruijn.expr is the type of abstract syntax trees for De Bruijn-indexed terms. Again, Var is an expression that consists of a single variable, Apply(a,b) is the application of a to b, and Lambda e is a lambda term that introduces a new binding into the context of expression e. For example, the term \\2 1 2 is encoded as:

Lambda(
  Lambda(
    Apply(
      Apply( Var 2, Var 1),
      Var 2)))
    

Again, DeBruijn.Empty is an exception that the De Bruijn parser throws when it receives empty input. Again, you shouldn't need to worry about it.

DeBruijn.beta_lor: DeBruijn.expr -> DeBruijn.expr option is the second function you should implement in this assignment. It takes, as input, a De Bruijn-indexed expression: let's call it e.

If e contains a beta-redex, then beta_lor finds the leftmost-outermost redex[3] and performs that beta reduction. If we call the reduced expression out, then beta_lor returns Some(out). On the other hand, if e does not contain a beta-redex, then beta_lor has no expression to return, so it returns None.[4]
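One way a caller might consume a result of this shape is to keep stepping until None comes back. The sketch below is an illustration, not part of the skeleton: iterate and countdown are hypothetical names, and countdown is a toy step function on integers that stands in for a function shaped like beta_lor.

```ocaml
(* Repeatedly apply a one-step function until it reports that no step is
   possible (None) or the step budget runs out. The argument step stands
   in for a function with the same shape as beta_lor. *)
let rec iterate (step : 'a -> 'a option) (fuel : int) (e : 'a) : 'a =
  if fuel <= 0 then e
  else
    match step e with
    | None -> e                              (* nothing left to do *)
    | Some e' -> iterate step (fuel - 1) e'  (* take one step, continue *)

(* Toy step function on integers, used only to exercise iterate. *)
let countdown (n : int) : int option =
  if n > 0 then Some (n - 1) else None

let () =
  assert (iterate countdown 100 5 = 0);  (* stops when countdown gives None *)
  assert (iterate countdown 2 5 = 3)     (* stops when the budget is spent *)
```

The fuel argument is a design choice for this sketch: some terms, such as (\1 1)(\1 1), reduce to themselves forever, so an unbounded loop would never return on them.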

Note that when you perform a beta-reduction, you will need to alter the binding depths of the free variables of the left-hand and right-hand sides of the Apply. In particular, you need to ensure that after all changes have been made to create the contractum, each variable still present in the result is bound to the same Lambda node that it was bound to before the beta-reduction -- including variables that are bound in the surrounding context. Consider carefully what this requires, remembering that the Apply and the Lambda nodes that make up the top two nodes of the redex are removed in a beta-reduction transition.
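A few hand-worked single steps may help you check your understanding of how the indices have to change. This is a sketch only: the type db is a local stand-in for the De Bruijn syntax tree, cases and passes_all are hypothetical names, and the table records expected results without showing how to compute them.

```ocaml
(* Local stand-in for the De Bruijn syntax tree (an assumption for this
   sketch; the project defines its own type in the DeBruijn module). *)
type db = Var of int | Lambda of db | Apply of db * db

(* Each pair is (input, expected result of one leftmost-outermost step). *)
let cases : (db * db option) list = [
  (* (\1)(\1)  ==>  \1 *)
  (Apply (Lambda (Var 1), Lambda (Var 1)), Some (Lambda (Var 1)));
  (* \(\2) 1  ==>  \1 : index 2 pointed past the lambda that is removed *)
  (Lambda (Apply (Lambda (Var 2), Var 1)), Some (Lambda (Var 1)));
  (* \(\\2 1) 1  ==>  \\2 1 : the argument ends up under one more lambda *)
  (Lambda (Apply (Lambda (Lambda (Apply (Var 2, Var 1))), Var 1)),
   Some (Lambda (Lambda (Apply (Var 2, Var 1)))));
  (* (\\1)((\1)(\1))  ==>  \1 : the outer redex is contracted first *)
  (Apply (Lambda (Lambda (Var 1)),
          Apply (Lambda (Var 1), Lambda (Var 1))),
   Some (Lambda (Var 1)));
  (* \1 1 contains no redex *)
  (Lambda (Apply (Var 1, Var 1)), None);
]

(* Returns true when the candidate step function f agrees with every case. *)
let passes_all (f : db -> db option) : bool =
  List.for_all (fun (input, expected) -> f input = expected) cases
```

In named notation the second and third cases are \a.(\x.a)a and \a.(\x.\y.xy)a, which reduce to \a.a and \a.\y.ay; comparing those with the indexed forms shows which indices went down and which went up.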

Again, it is OK to define auxiliary values and types in DeBruijn.ml.

The IO Module

The interface for the IO module is:

val print_lam: Lambda.expr -> unit
val print_db: DeBruijn.expr -> unit
val lam_of_string: string -> Lambda.expr
val db_of_string: string -> DeBruijn.expr
val lam_of_channel: in_channel -> Lambda.expr
val db_of_channel: in_channel -> DeBruijn.expr
    

IO.print_lam and IO.print_db print Lambda and DeBruijn expressions, respectively. You might find these useful for debugging. The of_string and of_channel functions produce Lambda or DeBruijn expressions from strings or file streams - they get called by the various Check and Repl programs, but you probably won't need to use them.

Other Files

The files ending in Check.ml and Repl.ml are the top-level implementations of the corresponding executables. The files ending in Lex.mll and Parse.mly are ocamllex and ocamlyacc sources, respectively; they define the lexers and parsers that the of_string and of_channel functions in IO call.

How to Submit Your Work

Before you submit your work, you should really run check. Once you're satisfied with your program's correctness, copy your versions of Lambda.ml and DeBruijn.ml to a folder titled lastname.firstname.asn1, zip the folder, and submit it via Canvas.

Do not send edited versions of any of the other files.


  1. In English, "De Bruijn" is pronounced more like "de brown" than "de broyn" (Benjamin A. Pierce. Types and Programming Languages, page 76. MIT Press, 2002).  

  2. Yes, this means that differently-named variables might resolve to the same virtual binding if this expression were part of a larger expression. I think that's inelegant, but it makes about as much sense as any other compromise. Free variables are only meaningful in some sort of context, even if it is only an implicit context. In the traditional context for lambda-calculus terms, it is assumed that you'll have an environment of bindings of variables to values, so a free variable has a meaning in whatever context you put it in. With De Bruijn indexing, that doesn't mean anything; rather, we assume that the context is an environment of bindings of various depths. Without some translation from names to bindings, any reinterpretation of free variables is going to lose important information.  

  3. The "leftmost-outermost" redex of a lambda term is the redex whose Apply is first encountered during a left-to-right preorder traversal of the term.  

  4. Some and None are constructors for OCaml's polymorphic option type. So, Some(3), Some(8), and None are all values of type int option. An OCaml programmer uses this type when it is uncertain whether a function will actually have a value to return. Think of it as checking for Null or None, but in a type-safe way.