# Lambda Calculus (Part II)

## Overview

So far we have been discussing the mechanics of computing with lambda expressions. Our next topic is how to express "normal" program objects and constructs using lambda expressions. In particular, we'll consider how to encode:

• conditionals (if-then-else)
• booleans (literals, and, or)
• natural numbers (literals, iszero, pred, succ)
• lists (nil, cons, head, tail, isEmpty)
• recursion

First, let's consider one detail: According to the definition given earlier, an abstraction (a term of the form λx.M) defines a function of one argument (x). What if we want functions of more than one argument?

To answer this question, let's think about how to define a (non-pure) lambda expression to represent the "plus" function. Normally, we think of plus as a function of two arguments; i.e., its type is:

(int X int) → int
However, we can use currying (a technique named after Haskell Curry, having nothing to do with spicy food or dirty horses) to define plus using only functions of one argument. The trick is to define plus as a function of one argument that, when applied to that argument (the first number), produces a function that, when applied to the second number, produces the sum. Thus, when plus is applied to two numbers in turn, it produces their sum as expected.

The type of this version of plus is:

int → (int → int)
Here's how we define it in lambda calculus:
λx.λy.x+y
And here's an example of an application:
(λx.λy.x+y) 3 2 →β ((λy.3+y) 2) →β (3+2) = 5
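
The same curried definition can be tried out in Python, whose lambda syntax maps directly onto the notation above. This is only a sketch for experimenting; the names `plus` and `add3` are illustrative and not part of the notes.

```python
# Curried addition: a one-argument function that returns
# another one-argument function, mirroring λx.λy.x+y.
plus = lambda x: lambda y: x + y

# Applying plus to the first number corresponds to the
# intermediate term (λy.3+y) in the reduction above.
add3 = plus(3)

# Applying the result to the second number produces the sum.
result = add3(2)
print(result)
```

Note that `plus(3)(2)` supplies the arguments one at a time, just as the application (λx.λy.x+y) 3 2 does.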

## Encoding Conditionals and Boolean Values

First, let's consider how to encode a conditional expression of the form: if P then A else B (i.e., the value of the whole expression is either A or B, depending on the value of P). We will represent this conditional expression using a lambda expression of the form: COND P A B, where COND, P, A and B are all lambda expressions. In particular, COND is a function of 3 arguments that works by applying P to A and B (i.e., P itself chooses A or B):

COND == λp.λa.λb.p a b
(where == means "is defined to be").

To make this definition work correctly, we must define the representations of true and false carefully (since the lambda expression P that COND applies to its arguments A and B will reduce to either TRUE or FALSE). In particular, when TRUE is applied to a and b we want it to return a, and when FALSE is applied to a and b we want it to return b. Therefore, we will let TRUE be a function of two arguments that ignores the second argument and returns the first argument, and we'll let FALSE be a function of two arguments that ignores the first argument and returns the second argument:

TRUE == λx.λy.x
FALSE == λx.λy.y
Now let's consider an example: COND TRUE M N. Note that this expression should evaluate to M. Let's see if it does (by substituting our definitions for COND and TRUE, and evaluating the resulting expression). The sequence of beta-reductions is shown below; at each step, the redex that is reduced is the leftmost one, i.e., the outermost abstraction together with the first argument that follows it.

(λp.λa.λb.p a b)(λx.λy.x) M N →β

(λa.λb.(λx.λy.x) a b) M N →β

(λb.(λx.λy.x) M b) N →β

(λx.λy.x) M N →β

(λy.M) N →β

M
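
The definitions of COND, TRUE and FALSE can be checked mechanically with Python lambdas. The sketch below assumes that the strings "M" and "N" stand in for arbitrary terms; they are placeholders, not part of the encoding.

```python
# Church booleans: each one selects one of its two arguments.
TRUE = lambda x: lambda y: x     # λx.λy.x
FALSE = lambda x: lambda y: y    # λx.λy.y

# COND simply hands the two branches to the predicate.
COND = lambda p: lambda a: lambda b: p(a)(b)   # λp.λa.λb.p a b

# "M" and "N" are stand-ins for arbitrary lambda terms.
then_result = COND(TRUE)("M")("N")
else_result = COND(FALSE)("M")("N")
print(then_result, else_result)
```

One caveat: Python evaluates both branch arguments before COND selects one of them, whereas the reduction sequence above never needs to evaluate N.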

## Encoding Boolean Operations

Now let's consider the Boolean operators and and or. We'll use prefix notation (as we did for COND); e.g., A and B will be represented as AND A B, where A and B are lambda expressions that represent boolean expressions, and AND is the lambda expression that represents the and operator. Note that AND and OR need to be functions that take two (boolean) arguments. Note also that given the expression A and B, if A is false, then the value of the whole expression is false, while if A is true, then the value of the whole expression is B (and similarly for or). In other words, AND A B must either reduce to FALSE (if A is FALSE) or to B (if A is TRUE). Recall that FALSE is a function of two arguments that returns the second, while TRUE is a function of two arguments that returns the first. So we can define AND and OR to apply their first argument A (which will reduce to either TRUE or FALSE) to their second argument B and to the appropriate boolean literal:

AND == λa.λb.a b FALSE
OR == λa.λb.a TRUE b
To see how these boolean operators work, consider the example: FALSE OR TRUE (which should reduce to TRUE). Here are the reductions:
OR FALSE TRUE
= (λa.λb.a TRUE b) (λx.λy.y) (λx.λy.x)
→β (λb.(λx.λy.y) TRUE b) (λx.λy.x)
→β (λx.λy.y) TRUE (λx.λy.x)
= (λx.λy.y) (λx.λy.x) (λx.λy.x)
→β (λy.y) (λx.λy.x)
→β (λx.λy.x)
= TRUE
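
As a quick check of both operators on all four input combinations, here is a minimal Python sketch. The helper `to_bool` is a hypothetical decoder added for readability; it is not part of the encoding itself.

```python
TRUE = lambda x: lambda y: x     # λx.λy.x
FALSE = lambda x: lambda y: y    # λx.λy.y

AND = lambda a: lambda b: a(b)(FALSE)   # λa.λb.a b FALSE
OR = lambda a: lambda b: a(TRUE)(b)     # λa.λb.a TRUE b

def to_bool(b):
    """Hypothetical helper: decode a Church boolean into a Python bool."""
    return b(True)(False)

# One row per input pair: (a, b, a and b, a or b).
truth_table = [
    (to_bool(a), to_bool(b), to_bool(AND(a)(b)), to_bool(OR(a)(b)))
    for a in (TRUE, FALSE)
    for b in (TRUE, FALSE)
]
for row in truth_table:
    print(row)
```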

## Encoding Lists

Now we'll consider how to encode LISP-style lists. In LISP, a list is either (a) empty (nil), or (b) a pair: (item list). Lists are built using the cons operator. Think of cons as a function of two arguments (an item and a list) that gives you back a list (the given one with the given item added to the front).

Non-empty lists are "taken apart" using selector functions head and tail:

head: list → item (returns first item in given list)
tail: list → list (returns everything except first item)

Finally, the isempty function can be used to determine whether a list is nil or non-nil.

Below are two examples of LISP-like functions that manipulate lists (to show you how the list operators work). The first computes the sum of the numbers in a given list L, and the second creates a new list like the given one except that each item is "doubled" (appears twice in a row).

```
function sum(L: list) {
  if (isempty L) 0
  else + (head L)(sum (tail L))
}

function double(L: list) {
  if (isempty L) nil
  else cons (head L)(cons (head L) (double (tail L)))
}
```
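
For readers who want to run these two functions, here is a rough Python equivalent. It assumes a list is represented as nested pairs `(item, rest)` with `None` playing the role of nil; that representation and the names `sum_list` and `numbers` are choices made for this sketch only.

```python
NIL = None  # assumption: None stands for the empty list

def cons(item, lst):
    """Build a new list with item added to the front."""
    return (item, lst)

def head(lst):
    return lst[0]

def tail(lst):
    return lst[1]

def isempty(lst):
    return lst is NIL

def sum_list(L):
    """Sum of the numbers in L (named sum_list to avoid shadowing the builtin)."""
    return 0 if isempty(L) else head(L) + sum_list(tail(L))

def double(L):
    """New list like L, with each item appearing twice in a row."""
    if isempty(L):
        return NIL
    return cons(head(L), cons(head(L), double(tail(L))))

numbers = cons(1, cons(2, cons(3, NIL)))
print(sum_list(numbers))
print(double(cons(1, cons(2, NIL))))
```
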
The way we'll define a non-nil list is as a function that "stores" the head and tail of the list in its body. Its argument will be a selector function (head or tail). (We'll see in a minute how that permits us to implement the selector functions themselves in a clever manner.) So every non-nil list is of the form:
λs.s h t
where s is the selector and h and t are the head and tail of the list.

Before we discuss implementing head and tail, let's think about how to implement cons. Recall that cons is a function that, when applied to two arguments (an item and a list) returns a list. Given our idea of having a list store its head and tail in its body, it's easy to define cons:

CONS == λh.λt.(λs.s h t)
Now let's think about the selectors. We want head applied to list L to give us back the head of the list (which is stored in L's body). We can get that value if we apply the list itself (remember, it's a function of the form λs.s h t) to a function s that takes two arguments (h and t), and returns h. Similarly, for tail we want to apply the list to a function s that takes two arguments (h and t), and returns t. Here are the appropriate definitions:
HEAD == λL.L TRUE
TAIL == λL.L FALSE

Note that we're using TRUE and FALSE as shorthand for λx.λy.x, and λx.λy.y; boolean values have nothing to do with our list implementation.

We still need a representation for nil, and a definition of the isempty function. Let's think about isempty first. For ISEMPTY we want:

(ISEMPTY NIL) →β TRUE
and
(ISEMPTY L) →β FALSE
for a non-nil list L. All non-nil lists are of the form λs.s h t so we really want:
(ISEMPTY (λs.s h t)) →β FALSE
So ISEMPTY needs to "feed" its list argument an appropriate value for s that ignores its arguments h and t, and just returns FALSE. That's easy to define:
ISEMPTY == λL.L(λh.λt.FALSE)
Once we have this definition of ISEMPTY, we can decide how to define NIL so that ISEMPTY works correctly when applied to NIL, too (i.e., returns TRUE in that case). In particular, we want:
((λL.L(λh.λt.FALSE))NIL) →β TRUE
So we need to define NIL as follows:
NIL == λx.TRUE

Here are some examples of lists:

| List | Encoding |
| --- | --- |
| nil | λx.TRUE |
| (a) | λs.s a NIL |
| (a,b) | λs.s a (λs.s b NIL) |

Now let's try an example of some list operations:

ISEMPTY (TAIL (CONS a NIL))
(the answer should be TRUE):
ISEMPTY (TAIL (CONS a NIL))
= (λL.L(λh.λt.FALSE)) ((λL.L FALSE) ((λh.λt.λs.s h t) a NIL))
→β (λL.L(λh.λt.FALSE)) ((λL.L FALSE) ((λt.λs.s a t) NIL))
→β (λL.L(λh.λt.FALSE)) ((λL.L FALSE) (λs.s a NIL))
→β (λL.L(λh.λt.FALSE)) ((λs.s a NIL) FALSE)
→β (λL.L(λh.λt.FALSE)) (FALSE a NIL)
→β* (λL.L(λh.λt.FALSE)) NIL (since FALSE returns its 2nd arg)
→β NIL (λh.λt.FALSE)
= (λx.TRUE)(λh.λt.FALSE)
→β TRUE
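
The list encoding can be exercised the same way in Python. In this sketch the strings "a" and "b" stand in for list items, and `to_bool` is a hypothetical decoding helper; neither is part of the encoding.

```python
TRUE = lambda x: lambda y: x     # λx.λy.x
FALSE = lambda x: lambda y: y    # λx.λy.y

CONS = lambda h: lambda t: lambda s: s(h)(t)       # λh.λt.(λs.s h t)
HEAD = lambda L: L(TRUE)                           # λL.L TRUE
TAIL = lambda L: L(FALSE)                          # λL.L FALSE
ISEMPTY = lambda L: L(lambda h: lambda t: FALSE)   # λL.L(λh.λt.FALSE)
NIL = lambda x: TRUE                               # λx.TRUE

def to_bool(b):
    """Hypothetical helper: decode a Church boolean into a Python bool."""
    return b(True)(False)

# The list (a,b) from the table of examples.
ab = CONS("a")(CONS("b")(NIL))

print(HEAD(ab), HEAD(TAIL(ab)))
# The worked example: ISEMPTY (TAIL (CONS a NIL))
print(to_bool(ISEMPTY(TAIL(CONS("a")(NIL)))))
```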

## Encoding Natural Numbers

We will consider two different ways to encode natural numbers. In both cases we'll consider the following operations:

• iszero
• pred (predecessor)
• succ (successor)

### Method #1

The first approach involves representing the number n using a list of n x's:

0 == empty list = NIL = λx.TRUE
1 == one element list = λs.s x NIL
2 == two element list = λs.s x (λs.s x NIL)
etc.

Given this representation, here's how we implement the operations:

• ISZERO == ISEMPTY
• PRED == TAIL
• SUCC is a function that adds one more element onto a given list, so: SUCC == λL.CONS x L

TEST YOURSELF #1

Write the lambda expression that represents: succ(0) and do the beta reductions (make sure that your result is the lambda expression that represents 1)


### Method #2

Our second technique for representing natural numbers is to represent the number n as a lambda expression of the form:

λx.λy.x^n y
where x^n y means x(x(x ... (x y))...), with n x's. For example:
0 == λx.λy.y
1 == λx.λy.x y
2 == λx.λy.x(x y)
3 == λx.λy.x(x(x y))
etc.
Now let's think about how to define the operations. We want iszero applied to (the representation of) zero to evaluate to true, and applied to anything other than zero to evaluate to false. Note that ZERO is a function of two arguments; ISZERO needs to "get rid" of that function by applying it to the appropriate two values so that the result is TRUE; i.e., we want ISZERO to be of the form:
λf.f _ _
What should the missing values be? Well, ZERO is the function that returns its second argument, and we want ISZERO applied to ZERO to evaluate to TRUE, so the second argument better be TRUE. The numbers other than ZERO are functions that apply their first argument to the second argument some number of times, and for all of those numbers we want the final result to be FALSE. So the first argument needs to be a lambda term g such that g applied to TRUE is FALSE; g applied to (g applied to TRUE) is FALSE, etc. The answer is actually very simple: make g be the function that ignores its argument and returns FALSE:
g == λx.FALSE
And now we know that ISZERO should be defined like this:
λf.f(λx.FALSE)TRUE

TEST YOURSELF #2

Write the lambda expressions that represent: iszero(0), iszero(1), and iszero(2), and do the beta reductions (make sure that the results are TRUE, FALSE, and FALSE).

Now let's consider the succ function. For all numbers n, we want succ applied to n to produce n+1. That is, we want:

(SUCC(λx.λy.x^n y)) →β* (λx.λy.x^(n+1) y)
First, think about what function f, when applied to (λx.λy.x^n y), yields (x^(n+1) y). To define that function we'll use what is becoming our standard trick: define f so that it applies its argument (λx.λy.x^n y) to something. Since (λx.λy.x^n y) is a function of two arguments, f needs to apply it to two arguments:
f == λa.a arg1 arg2
When f is applied to (λx.λy.x^n y), arg1 will be substituted in for x, and arg2 for y; i.e., we'll have:
arg1^n arg2
and we want that to be
x^(n+1) y
So arg1 needs to be x, and arg2 needs to be xy (because x^n(xy) = x^(n+1) y).

But f is not quite what we need for SUCC, because number n is not x^n y, it's λx.λy.x^n y. So we need to stick that "λx.λy" on to the front of the result produced by f, defining SUCC as follows:

SUCC == λa.(λx.λy.a x (xy))
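
To experiment with this representation, the definitions of ZERO, ISZERO and SUCC can be transcribed into Python. The decoders `to_int` and `to_bool` are hypothetical helpers for inspecting results, and the names ONE, TWO and THREE are introduced here only for convenience.

```python
TRUE = lambda x: lambda y: x     # λx.λy.x
FALSE = lambda x: lambda y: y    # λx.λy.y

ZERO = lambda x: lambda y: y                        # λx.λy.y
SUCC = lambda a: lambda x: lambda y: a(x)(x(y))     # λa.(λx.λy.a x (x y))
ISZERO = lambda f: f(lambda x: FALSE)(TRUE)         # λf.f(λx.FALSE)TRUE

def to_int(n):
    """Hypothetical helper: count how many times n applies its first argument."""
    return n(lambda k: k + 1)(0)

def to_bool(b):
    """Hypothetical helper: decode a Church boolean into a Python bool."""
    return b(True)(False)

ONE = SUCC(ZERO)
TWO = SUCC(ONE)
THREE = SUCC(TWO)

print([to_int(n) for n in (ZERO, ONE, TWO, THREE)])
print([to_bool(ISZERO(n)) for n in (ZERO, ONE, TWO)])
```

Decoding with `to_int` works because the number n applies its first argument n times to its second, so applying it to "add one" and 0 counts the applications.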

To complete our discussion of the second way to represent the natural numbers, we need to define PRED. Unfortunately, that definition is quite complicated, so we'll just give the definition here (and the "interested reader" is welcome to try it out and/or to think about why it makes sense). The definition is from Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory by Joseph Stoy, and is found on page 72:

PRED == λk.(k(λp.λu.u(SUCC(p TRUE))(p TRUE))(λu.u ZERO ZERO))FALSE
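
One way to "try it out" is to transcribe PRED into Python and decode the results. The sketch below reads the definition as building pairs of the form λu.u a b, starting from (0, 0) and stepping from (a, b) to (a+1, a), so that after k steps the second component is k-1. The helpers `church` and `to_int` are hypothetical conveniences for this sketch.

```python
TRUE = lambda x: lambda y: x     # λx.λy.x
FALSE = lambda x: lambda y: y    # λx.λy.y

ZERO = lambda x: lambda y: y
SUCC = lambda a: lambda x: lambda y: a(x)(x(y))

# PRED == λk.(k(λp.λu.u(SUCC(p TRUE))(p TRUE))(λu.u ZERO ZERO))FALSE
PRED = lambda k: k(
    lambda p: lambda u: u(SUCC(p(TRUE)))(p(TRUE))   # step: (a, b) becomes (a+1, a)
)(
    lambda u: u(ZERO)(ZERO)                          # starting pair (0, 0)
)(FALSE)                                             # select the second component

def church(n):
    """Hypothetical helper: build the encoding of n by repeated SUCC."""
    result = ZERO
    for _ in range(n):
        result = SUCC(result)
    return result

def to_int(n):
    """Hypothetical helper: decode an encoded number into a Python int."""
    return n(lambda k: k + 1)(0)

print([to_int(PRED(church(n))) for n in range(5)])
```

Under this definition the predecessor of zero is zero, since applying the starting pair to FALSE yields ZERO.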

### Proof of correctness of ISZERO

In defining our encodings above, we have relied on intuition and a few examples to convince ourselves that the definitions we gave were correct. A more rigorous approach would be to prove the correctness of each encoding. Doing so for all encodings would be a bit much, but we'll do it for one example: the definition of ISZERO for the second way of representing natural numbers.

Theorem: Our encoding of iszero is correct; i.e., the representation of iszero(0) beta-reduces to TRUE, and for all n > 0, the representation of iszero(n) beta-reduces to FALSE.

Proof by cases on n:

Case 1, n = 0: Shown by the reduction of iszero(0) in Test Yourself #2.

Case 2, n = 1: Shown by the reduction of iszero(1) in Test Yourself #2.

Case 3, n > 1:

In this case, the representation of iszero(n) is:
ISZERO n = (λf.f(λa.FALSE)TRUE) (λx.λy.x^n y)
This term beta-reduces as follows (at each step, the leftmost redex is reduced):
(λf.f(λa.FALSE)TRUE)(λx.λy.x^n y) →β
(λx.λy.x^n y)(λa.FALSE)TRUE →β
(λy.(λa.FALSE)^n y)TRUE →β
(λa.FALSE)^n (TRUE)
Since n > 1 this is equal to:
(λa.FALSE)((λa.FALSE)^(n-1) (TRUE))
But λa.FALSE applied to anything beta-reduces to FALSE, so this whole expression beta-reduces to FALSE, and the proof is complete.

## Defining Recursive Functions

We would like to be able to define recursive functions like factorial in the lambda-calculus:

fact = λx. if x==0 then 1 else x*fact(x-1)

But we're not allowed to name functions in lambda calculus, so we can't do this directly. In the previous notes on encoding natural numbers, etc., we used names that represented lambda-calculus expressions, such as "ISZERO", but these are better thought of as shorthand for the corresponding lambda expressions (or macros that are expanded into the corresponding lambda expressions). We'd get an infinite "unfolding" if we treated "fact" as such a macro and tried to expand it.

To understand recursion in lambda expressions, it helps to think about equations like X = 4/X where we are to solve for X. In such cases, we can test a potential solution by replacing X with some value; for example:

2 = 4/2

It is also true that if the value x' is a solution, then:

x' = (λy.4/y)(x')

Here we used a lambda expression in which the X on the right-hand side was abstracted out, and then applied the expression to the value x'.

We can use the same trick to define a recursive function. For example, think of the definition of the factorial function given above as an equation instead of a definition and abstract on "fact":

fact' = (λf.λx. if x==0 then 1 else x*f(x-1))(fact')

Note that the lambda expression takes an argument f, and gives you a function that either returns 1 or returns the result of an application of the given function.

Finding a value (a lambda expression) for fact' that solves this equation involves finding a fixed point of the function we created by abstracting.

Definition: A fixed point of a function F is a value x such that x = F(x).

Here are some example functions and some of their fixed points:

| Function | Fixed points |
| --- | --- |
| f(x) = 5 | 5 |
| f(x) = x | all integers |
| f(x) = 4/x | 2, -2 |
| f(x) = x+1 | none in the domain of integers |
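
The entries above can be spot-checked by testing candidates directly against the definition x = F(x). The sketch below searches only a small finite range of integers (an arbitrary choice), so it can confirm the fixed points it finds but cannot by itself establish that a function has none; `fixed_points` is a hypothetical helper.

```python
def fixed_points(f, candidates):
    """Return the candidates x for which f(x) == x."""
    found = []
    for x in candidates:
        try:
            if f(x) == x:
                found.append(x)
        except ZeroDivisionError:
            pass  # f is undefined at this candidate (e.g. 4/0)
    return found

candidates = range(-5, 6)  # assumption: a small window of integers

print(fixed_points(lambda x: 5, candidates))
print(fixed_points(lambda x: x, candidates))
print(fixed_points(lambda x: 4 / x, candidates))
print(fixed_points(lambda x: x + 1, candidates))
```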

An amazing fact is that in lambda-calculus, every function has a fixed point, though it may not correspond to anything "useful". (For example, the fixed point of λx.x+1 is a lambda-expression that doesn't correspond to an integer.) The fixed point may not have a normal form either (for recursive definitions), but that's OK since normal forms are the lambda equivalent of "answers" to computations and we don't expect a recursive definition to be an answer.

### Finding fixed points

Recall that our goal is to define recursive functions using lambda expressions. Our technique is to define an equation of the form

f' = (λf.λx. function body including an application of f)(f')

Solving an equation like that requires that we find a fixed point of the function defined on the right-hand side by the outermost lambda expression. In general, given a function F, we need a way to produce a lambda expression X such that X = (F)X.

The insight is as follows: We need X to be an expression that can make a copy of itself that's ready to be applied to the "n-1 case". This should keep happening until the "n=0 case" is reached. The form for X will be DD.

If we plug in (DD) for X we get:

(DD) = (F)(DD)

What does it mean for lambda expression DD to be equal to lambda expression (F)(DD)? We haven't actually discussed this question (it is discussed below), but intuitively it seems reasonable that they should be considered equal if one beta-reduces to the other. We don't get to choose what F is (that's the abstraction of the recursive function we're trying to define), so we can't define DD so that (F)(DD) reduces to (DD). However, we do get to define (DD), so let's try to do that so that (DD) beta-reduces to (F)(DD). To do that, we need to make F part of D; in particular we can define D as:

λy.F(y y)

Let's check this definition of D by actually trying it out (i.e., by using it to replace each instance of D in (DD) -- the left-hand side of the equation (DD) = (F)(DD) -- and verifying that the result does beta-reduce to (F)(DD)):

(λy.F(y y)) (λy.F(y y)) →β F((λy.F(y y)) (λy.F(y y)))

That is, D D →β F(D D).

It works! Now we know how, given any function F, we can define a term X (of the form DD) such that X is a fixed point of F. But we can do more -- we can "automate" that process by defining a lambda expression that, when applied to any function F, produces a fixed point for F. This fixed-point creator is called the Y combinator, and is defined as follows:

λf.(λy.f(y y))(λy.f(y y))

Let's try it out to get a (non-recursive) definition of factorial; i.e., we will apply Y to:

(λf.λx. if x==0 then 1 else x*f(x-1))

Here's what we get:

(λf.(λy.f(y y))(λy.f(y y)))(λf.λx. if x==0 then 1 else x*f(x-1))

We can test this definition by applying it to the integer 3. If it's correct, the whole thing should reduce to 3! = 6. Here's the sequence of beta-reductions (for clarity, we start by using F to represent the function to which we applied the Y combinator):

((λf.(λy.f(y y))(λy.f(y y)))F)(3)

(λy.F(y y))(λy.F(y y))(3)

F((λy.F(y y))(λy.F(y y)))(3)

((λf.λx. if x==0 then 1 else x*f(x-1))((λy.F(y y))(λy.F(y y))))(3)

(λx. if x==0 then 1 else x*((λy.F(y y))(λy.F(y y)))(x-1))(3)

if 3==0 then 1 else 3*((λy.F(y y))(λy.F(y y)))(3-1)

3*(((λy.F(y y))(λy.F(y y)))(2))

3*(F((λy.F(y y))(λy.F(y y)))(2))

3*(((λf.λx. if x==0 then 1 else x*f(x-1))((λy.F(y y))(λy.F(y y))))(2))

3*((λx. if x==0 then 1 else x*((λy.F(y y))(λy.F(y y)))(x-1))(2))

3*(if 2==0 then 1 else 2*((λy.F(y y))(λy.F(y y)))(2-1))

3*(2*(((λy.F(y y))(λy.F(y y)))(1)))

3*(2*(F((λy.F(y y))(λy.F(y y)))(1)))

3*(2*(((λf.λx. if x==0 then 1 else x*f(x-1))((λy.F(y y))(λy.F(y y))))(1)))

3*(2*((λx. if x==0 then 1 else x*((λy.F(y y))(λy.F(y y)))(x-1))(1)))

3*(2*(if 1==0 then 1 else 1*((λy.F(y y))(λy.F(y y)))(1-1)))

3*(2*(1*(((λy.F(y y))(λy.F(y y)))(0))))

3*(2*(1*(F((λy.F(y y))(λy.F(y y)))(0))))

3*(2*(1*(((λf.λx. if x==0 then 1 else x*f(x-1))((λy.F(y y))(λy.F(y y))))(0))))

3*(2*(1*((λx. if x==0 then 1 else x*((λy.F(y y))(λy.F(y y)))(x-1))(0))))

3*(2*(1*(if 0==0 then 1 else 0*((λy.F(y y))(λy.F(y y)))(0-1))))

3*(2*(1*1))

6
Note that when the base case was reached (the 3rd-to-last line), the "else" part of the "if" was thrown away and the recursive "calls" finished.
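
The same construction can be run in Python, with one adjustment. Python evaluates arguments before calling a function, so a direct transcription of Y would keep unfolding (y y) and never return. The sketch below therefore swaps in the eta-expanded variant λf.(λy.f(λv.y y v))(λy.f(λv.y y v)), often called the Z combinator, which delays each self-application until the recursive call is actually made. This is a substitute for Y, not the definition used in the text, and it uses Python's own integers and conditional in place of the encodings; the names `Z`, `F` and `fact` are illustrative.

```python
# Eta-expanded fixed-point combinator for a strict language.
# The inner lambda v postpones y(y) until an argument arrives.
Z = lambda f: (lambda y: f(lambda v: y(y)(v)))(lambda y: f(lambda v: y(y)(v)))

# The factorial definition abstracted on f:
# λf.λx. if x==0 then 1 else x*f(x-1)
F = lambda f: lambda x: 1 if x == 0 else x * f(x - 1)

# A non-recursive definition of factorial: no name refers to itself.
fact = Z(F)

print(fact(3))
```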

Summary: To define a recursive function "f = λx. ...f...f...", find a closed-form solution (a lambda expression) as follows:

1. Abstract on f: λF.λx. ...F...F...F...
2. Apply the Y combinator: (Y)(λF.λx. ...F...F...F...)
Now you have a lambda expression that you can apply to any argument. The result (after beta-reducing) will be the result you want: the result of applying the recursive function to that argument.
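
As a second illustration of the two-step recipe, here is a curried two-argument recursive function (greatest common divisor) written in Python. Because Python is strict, the sketch again assumes an eta-expanded fixed-point combinator (called `Z` here) in place of Y; `G` and `gcd` are names chosen for this example only.

```python
# Strict-language stand-in for Y (an assumption of this sketch).
Z = lambda f: (lambda y: f(lambda v: y(y)(v)))(lambda y: f(lambda v: y(y)(v)))

# Step 1: abstract on the function being defined.
# Recursive original: gcd = λa.λb. if b==0 then a else gcd b (a mod b)
G = lambda f: lambda a: lambda b: a if b == 0 else f(b)(a % b)

# Step 2: apply the fixed-point combinator.
gcd = Z(G)

print(gcd(12)(18))
```

The recursive call `f(b)(a % b)` passes its arguments one at a time, so currying and the fixed-point construction combine without any extra machinery.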

## Equality of Lambda Expressions

Below are the rules of equality for lambda expressions. Note that rules 1 - 3 are required for any equivalence relation (it must be reflexive, symmetric, and transitive). Rule 4 says that if one lambda-expression alpha- or beta-reduces to another, then the two are equivalent. Rule 5 says that equal terms can be substituted on either side of an application, or as the body of a function, taking care not to mess up free variables.

For all lambda-expressions M, N, and P
1. M = M
2. if M=N then N=M
3. if M=N and N=P then M=P
4. if M →β N, or M →α N, then M=N
5. if M=N then
   1. PM = PN
   2. MP = NP
   3. λx.M = λx.N, where x is not free in M or N
With these rules we can actually prove that Y produces fixed points. That is, for all M, YM = M(YM). We'll start by using beta-reduction on each side until a common form is reached; then we'll use some of the basic rules for equality to finish up the proof.

Proof:

By definition, Y is λf.(λx.f(x x))(λx.f(x x)). So for the left-hand side, we start with YM and do the following two beta-reductions:
YM →β (λx.M(x x))(λx.M(x x)) →β M((λx.M(x x))(λx.M(x x)))
The right-hand side, M(YM) is:
M[(λf.(λx.f(x x))(λx.f(x x)))M]
which beta-reduces to:
M((λx.M(x x))(λx.M(x x))).

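Now we've shown that both YM and M(YM) reduce to the same term T (using a sequence of beta-reductions). Each individual reduction step gives an equality by rule 4 (together with rule 5, PM = PN, for the step that takes place inside M(...)), and rule 3 chains the steps together, so YM=T and M(YM)=T. By rule 2, T=M(YM), and by rule 3, YM=T=M(YM). Thus YM = M(YM).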

TEST YOURSELF #3

Y is Curry's fixed-point combinator, and though we've just shown that YM=M(YM), it is not the case that YM reduces to M(YM) using a sequence of beta-reductions. Turing used a different combinator T, which has the property that for all M, TM →β* M(TM). What is T?
