Lambda calculus is a model of computation, invented by Church in the early 1930's. Lambda calculus and Turing machines are equivalent, in the sense that any function that can be defined using one can be defined using the other. Here are some points of comparison:
Lambda Calculus                                     | Turing Machine
----------------------------------------------------|---------------------------------------------------
Forms the basis for functional languages            | Forms the basis for imperative languages
(LISP, Scheme, ML).                                 | (Pascal, Ada, C).
                                                    |
We write a lambda expression for each function.     | We design a new machine to compute each function.
Input and output are also lambda expressions.       | Input and output are written on tape.
Here's an example of a simple lambda expression that defines the "plus one" function:

    λx.x+1

This example defines a function of one argument, whose formal parameter is named 'x'. The function body is "x+1". Note that the function has no name (i.e., it is an anonymous function). To compute with this function, we need to apply it to an argument; for example:

    (λx.x+1)3

Computation involves rewriting:

    (λx.x+1)3  =>  3+1  =>  4
The syntax of (pure) lambda expressions is defined as follows:

1. An identifier (a variable) is a lambda expression.
2. If M and N are lambda expressions, then so are:
   (a) ( M )
   (b) λ id . M
   (c) M N

Rule 2(a) just says that we can put parentheses around anything. Rule 2(b) defines what we call an abstraction: a function whose formal parameter is id, and whose body is M. Rule 2(c) defines what we call an application: we apply one lambda expression to another (M is applied to N).
Note that the pure lambda calculus excludes constants, types, and primitive operators (e.g., +, *, ...). Note also that (by convention) application is left associative: ABC means (AB)C, not A(BC); and application has higher precedence than abstraction: λx.AB means λx.(AB), not (λx.A)B.
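To make the three syntactic forms concrete, here is one possible abstract-syntax-tree encoding, sketched in Python. The class names and the `show` printer are our own choices (not part of the notes); the printer follows the associativity and precedence conventions just described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:             # rule 1: an identifier
    name: str

@dataclass(frozen=True)
class Lam:             # rule 2(b): abstraction  λ id . M
    param: str
    body: object

@dataclass(frozen=True)
class App:             # rule 2(c): application  M N
    fun: object
    arg: object

def show(e):
    """Print a term, adding parentheses only where the conventions require them."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    # application is left associative: parenthesize a Lam on the left,
    # and anything but a variable on the right
    fun = f"({show(e.fun)})" if isinstance(e.fun, Lam) else show(e.fun)
    arg = show(e.arg) if isinstance(e.arg, Var) else f"({show(e.arg)})"
    return f"{fun} {arg}"

# ABC parses as (AB)C:
abc = App(App(Var('a'), Var('b')), Var('c'))
print(show(abc))                            # a b c
print(show(App(Lam('x', Var('x')), Var('y'))))   # (λx.x) y
```

Because `App(App(a, b), c)` and `App(a, App(b, c))` are different trees, the left-associativity convention is purely about how the flat string ABC is read, not about the trees themselves.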
We can express the rules given above that define the language of lambda expressions using a context-free grammar:
exp  →  ID
     |  ( exp )
     |  λ ID . exp       // abstraction
     |  exp exp          // application
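The grammar above is ambiguous as written (the application rule is left-recursive and gives no precedences), so a parser must build the conventions in. One common way is to parse a sequence of "atoms" and fold them into left-associative applications; the sketch below (our own token and tuple shapes, with '\' accepted as an ASCII spelling of λ) is one possible implementation, not the only one:

```python
import re

def tokenize(s):
    # identifiers are lower-case names; λ may also be written as '\'
    return re.findall(r"[a-z]\w*|[().\\λ]", s)

def parse(tokens):
    pos = 0

    def atom():
        nonlocal pos
        t = tokens[pos]
        if t == '(':
            pos += 1
            e = expr()
            assert tokens[pos] == ')'
            pos += 1
            return e
        if t in ('λ', '\\'):
            pos += 1
            v = tokens[pos]            # the formal parameter
            pos += 1
            assert tokens[pos] == '.'
            pos += 1
            return ('lam', v, expr())  # the body extends as far right as possible
        pos += 1
        return ('var', t)

    def expr():
        nonlocal pos
        # exp → atom atom ... ; fold applications to the left
        e = atom()
        while pos < len(tokens) and tokens[pos] != ')':
            e = ('app', e, atom())
        return e

    return expr()

print(parse(tokenize("(λx.x y) z")))
```

Note how the two conventions fall out: `expr` folds to the left (ABC becomes (AB)C), and the abstraction body is itself an `expr`, so λx.AB swallows the whole application AB.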
As mentioned above, computing with lambda expressions involves rewriting; for each application, we replace all occurrences of the formal parameter (a variable) in the function body with the value of the actual parameter (a lambda expression). It is easier to understand if we use the abstract-syntax tree of a lambda expression instead of just the text. Here's our simple example application again:
      apply
      /   \
     λ     3
    / \
   x   +
      / \
     x   1

We rewrite the abstract-syntax tree by finding applications of functions to arguments, and for each, replacing the formal parameter with the argument in the function body. To do this, we must find an apply node whose left child is a lambda node, since only lambda nodes represent functions.
      apply            +
      /   \    =>     / \
     λ     3         3   1
    / \
   x   +
      / \
     x   1

Here's an example with two applications:
In linear form, the reduction is: (λx.x+1)((λy.y+2)3) => (λx.x+1)(3+2) => (λx.x+1)5 => 5+1 => 6. As abstract-syntax trees:

      apply
     /     \
    λ       apply
   / \      /   \
  x   +    λ     3
     / \  / \
    x   1 y  +
            / \
           y   2

  =>

      apply
     /     \
    λ       +
   / \     / \
  x   +   3   2
     / \
    x   1

  =>

      apply
     /     \
    λ       5
   / \
  x   +
     / \
    x   1

  =>

       +
      / \
     5   1

  =>  6

In general, different strategies for choosing which application to rewrite first can have different ramifications. That issue is discussed below.
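Since this example uses the non-pure calculus (numbers and +), it can be mimicked directly with anonymous functions in Python. This is only an analogy, but it is a faithful one: Python evaluates the argument first, i.e., it reduces the inner application before the outer one, exactly as in the reduction sequence above.

```python
# (λx.x+1)((λy.y+2)3): the inner application yields 5, the outer one 6
plus_one = lambda x: x + 1      # λx.x+1
plus_two = lambda y: y + 2      # λy.y+2

print(plus_two(3))              # 5: the inner application (λy.y+2)3
print(plus_one(plus_two(3)))    # 6: the whole expression
```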
Do the rewriting again, this time choosing the other application first.
Note that the result of rewriting a non-pure lambda expression can be a constant (as in the examples above), but the result can also be a lambda expression: a variable, or an abstraction, or an application. For a pure lambda expression, the result of rewriting will always itself be a lambda expression. Here are some more examples:
      apply
     /     \
    λ       λ
   / \     / \
  f   λ   y   +
     / \     / \
    x  apply y  1
       /  \
      f    x

  =>

    λ
   / \
  x  apply
      /  \
     λ    x
    / \
   y   +
      / \
     y   1

  =>

    λ
   / \
  x   +
     / \
    x   1        i.e., λx.x+1

Note that the result of the rewriting is a function. Also note that in this example, although there are initially two "apply" nodes, only one of them has a lambda node as its left child, so there is only one rewrite that can be done initially.
      apply
     /     \
    λ       λ
   / \     / \
  x   λ   z   z
     / \
    y   x

  =>

    λ
   / \
  y   λ
     / \
    z   z        i.e., λy.λz.z
Draw the abstract-syntax tree for the lambda expression given below, then do the rewriting steps.
Recall that the imprecise definition of rewriting an application (λx.M)N is "M with all occurrences of x replaced by N". However, there are two problems with this definition.
Problem #1: We don't, in general, want to replace all occurrences of x. To see why, consider the following (non-pure) lambda expression:

    (λx. x + ((λx.x + 1) 3)) 2

If we rewrite the inner application first, we get (λx. x + 4) 2, and then 2 + 4 = 6.
However, if we rewrite the outer application first, using the naive rewriting rule, here's what happens:
        apply
       /     \
      λ       2
     / \
    x   +
       / \
      x   apply
          /   \
         λ     3
        / \
       x   +
          / \
         x   1

  =>  (bad application: every x replaced by 2)

        +
       / \
      2   apply
          /   \
         λ     3
        / \
       x   +
          / \
         2   1

  =>

        +
       / \
      2   +
         / \
        2   1

  =>  5

We get the wrong answer (5 instead of 6), because we replaced the occurrence of x in the inner expression with the value supplied as the parameter for the outer expression.
Problem #2: Consider the (pure) lambda expression

    ((λx.λy.x) y) z
To understand how to fix the first problem illustrated above, we first need to understand scoping, which involves the following terminology: the scope of the formal parameter x introduced by a λx is the body M of the abstraction λx.M; an occurrence of x inside that scope is bound by the λ, while an occurrence that is inside no such scope is free.
      λ
     / \
    x   /\
       /  \
      /    \
     /..x...\    <-- this x is bound

Here is a precise definition of free and bound variables: an occurrence of a variable x in a lambda expression is bound if it appears inside the body M of some subexpression of the form λx.M (i.e., it is in the scope of a λ whose formal parameter is also named x); otherwise, that occurrence of x is free. A variable can have both free and bound occurrences in the same expression. For example:
  (λx.y)(λy.yx)
      |     ||
      |     |+-- this x is free
      |     +--- this y is bound
      +--------- this y is free

To solve problem #1 above, given lambda expression (λx.M)N, we rewrite it to M with only the free occurrences of x replaced by N:

       +------- M --------+
       |                  |
  (λx. x  +  ((λx.x + 1) 3)) 2
       |          |
     free       bound
     in M       in M

  =>  2 + ((λx.x + 1) 3)

The issue behind problem #2 is that a variable y that is free in the original argument to a lambda expression becomes bound after rewriting (using that argument to replace all instances of the formal parameter), because it is put into the scope of a lambda with a formal that happens also to be named y:
  ((λx.λy.x)y)z
            |
            free, but gets bound after application

With naive rewriting, the free y is captured: we would get (λy.y)z, which reduces to z, even though the correct result is y.
To solve problem #2, we use a technique called alpha-reduction. The basic idea is that formal parameter names are unimportant, so we rename them as needed to avoid capture. Alpha-reduction is used to modify expressions of the form λx.M. It renames all the occurrences of x that are free in M to some other variable z that does not occur in M (and then λx is changed to λz). For example, consider λx.λy.x+y (this is of the form λx.M). Variable z does not occur in M, so we can rename x to z; i.e., λx.λy.x+y alpha-reduces to λz.λy.z+y.
alphaReduce(M: lambda-expression, x: id, z: id) {
    // precondition:  z does not occur in M
    // postcondition: return M with all free occurrences of x replaced by z
    // (pattern VAR(x) matches only the variable x itself; VAR(y) matches any
    //  other variable, and similarly for LAMBDA(x,e) vs LAMBDA(y,e))
    case M of {
        VAR(x):        return VAR(z)
        VAR(y):        return VAR(y)
        APPLY(e1, e2): return APPLY(alphaReduce(e1, x, z), alphaReduce(e2, x, z))
        LAMBDA(x, e):  return LAMBDA(x, e)
        LAMBDA(y, e):  return LAMBDA(y, alphaReduce(e, x, z))
    }
}
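The pseudo code translates almost line-for-line into executable form. Here is a sketch in Python, representing terms as nested tuples ('var', x), ('lam', x, e), ('app', e1, e2) (our own encoding, not fixed by the notes); since the pure syntax has no +, the example renames x in the body of λx.λy.x y rather than λx.λy.x+y:

```python
def alpha_reduce(m, x, z):
    # precondition: z does not occur (free or bound) in m
    # returns m with all free occurrences of x renamed to z
    tag = m[0]
    if tag == 'var':
        return ('var', z) if m[1] == x else m
    if tag == 'lam':
        if m[1] == x:          # x is re-bound here: no free x below this point
            return m
        return ('lam', m[1], alpha_reduce(m[2], x, z))
    # tag == 'app'
    return ('app', alpha_reduce(m[1], x, z), alpha_reduce(m[2], x, z))

# the body of λx.λy.x y is λy.x y; rename its free x to z
term = ('lam', 'y', ('app', ('var', 'x'), ('var', 'y')))
print(alpha_reduce(term, 'x', 'z'))   # ('lam', 'y', ('app', ('var', 'z'), ('var', 'y')))
```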
Note: Another way to handle problem #2 is to use what's called de Bruijn notation, which uses integers instead of identifiers. That possibility is explored in the first homework assignment.
We are finally ready to give the precise definition of rewriting:

    (λx.M)N  →  M[N/x]
The left-hand side ((λx.M)N) is called the redex. The right-hand side (M[N/x]) is called the contractum and the notation means M with all free occurrences of x replaced with N in a way that avoids capture. We say that (λx.M)N beta-reduces to M with N substituted for x. And here is pseudo code for substitution.
substitute(M: lambda-expression, x: id, N: lambda-expression) {
    // when substitute is first called, M is the body of a function of the form λx.M
    case M of {
        VAR(x):       return N
        VAR(y):       return M
        LAMBDA(x, e): return M   // in this case, there are no free occurrences of
                                 // x in M, so no substitutions can be done;
                                 // note that this solves problem #1
        LAMBDA(y, e): if (y does not occur free in N) then
                          // substitute N for x in the body of the lambda expression
                          return LAMBDA(y, substitute(e, x, N))
                      else {
                          // y does occur free in N; here we address problem #2
                          let y' be an identifier that is neither x nor y,
                              and occurs in neither N nor e;
                          let e' = alphaReduce(e, y, y');
                          return LAMBDA(y', substitute(e', x, N))
                      }
        APPLY(e1, e2): return APPLY(substitute(e1, x, N), substitute(e2, x, N))
    }
}

To illustrate beta-reduction, consider the previous example of problem #2. Here are the beta-reduction steps:
((λx.λy.x)y)z -> ((λy.x)[y/x])z    // substitute y for x in the body of "λy.x"
              -> ((λy'.x)[y/x])z   // after alpha-reduction
              -> (λy'.y)z          // first beta-reduction complete!
              -> y[z/y']           // substitute z for y' in "y"
              -> y                 // second beta-reduction complete!

Note that the term "beta-reduction" is perhaps misleading, since doing beta-reduction does not always produce a smaller lambda expression. In fact, a beta-reduction can:

- produce a smaller expression; e.g., (λx.y)z → y
- leave the size unchanged; e.g., (λx.xx)(λx.xx) → (λx.xx)(λx.xx)
- produce a larger expression; e.g., (λx.xxx)(λx.xxx) → (λx.xxx)(λx.xxx)(λx.xxx)
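The substitute pseudo code can likewise be made executable. The sketch below (Python, with the same nested-tuple term encoding used earlier, which is our own choice) performs capture-avoiding substitution and reproduces the steps just shown: ((λx.λy.x)y)z reduces to y, with y renamed to y' along the way.

```python
def free_vars(m):
    tag = m[0]
    if tag == 'var':
        return {m[1]}
    if tag == 'lam':
        return free_vars(m[2]) - {m[1]}
    return free_vars(m[1]) | free_vars(m[2])

def all_vars(m):
    tag = m[0]
    if tag == 'var':
        return {m[1]}
    if tag == 'lam':
        return {m[1]} | all_vars(m[2])
    return all_vars(m[1]) | all_vars(m[2])

def alpha_reduce(m, x, z):
    # rename free occurrences of x in m to z (z must not occur in m)
    tag = m[0]
    if tag == 'var':
        return ('var', z) if m[1] == x else m
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], alpha_reduce(m[2], x, z))
    return ('app', alpha_reduce(m[1], x, z), alpha_reduce(m[2], x, z))

def substitute(m, x, n):
    # M[N/x]: replace the free occurrences of x in m with n, avoiding capture
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'app':
        return ('app', substitute(m[1], x, n), substitute(m[2], x, n))
    y, e = m[1], m[2]
    if y == x:                       # problem #1: x is re-bound, stop here
        return m
    if y not in free_vars(n):
        return ('lam', y, substitute(e, x, n))
    # problem #2: y occurs free in n, so rename y to a fresh name first
    fresh = y
    while fresh in all_vars(e) | all_vars(n) | {x}:
        fresh += "'"
    return ('lam', fresh, substitute(alpha_reduce(e, y, fresh), x, n))

def beta_step(m):
    # reduce at the root, if the root is a redex; otherwise return m unchanged
    if m[0] == 'app' and m[1][0] == 'lam':
        return substitute(m[1][2], m[1][1], m[2])
    return m

# ((λx.λy.x) y) z
t = ('app', ('app', ('lam', 'x', ('lam', 'y', ('var', 'x'))), ('var', 'y')), ('var', 'z'))
t1 = ('app', beta_step(t[1]), t[2])   # reduce the inner redex first
print(t1)              # (λy'.y) z: y was renamed to y' to avoid capture
print(beta_step(t1))   # ('var', 'y')
```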
As discussed above, computing with lambda expressions involves rewriting them using beta-reduction. There is another operation, beta-expansion, that we can also use. By definition, lambda expression e1 beta-expands to e2 iff e2 beta-reduces to e1. So, for example, the expression 3+1 beta-expands to (λx.x+1)3 (since (λx.x+1)3 beta-reduces to 3+1).
A computation is finished when there are no more redexes (no more applications of a function to an argument). We say that a lambda expression without redexes is in normal form, and that a lambda expression has a normal form iff there is some sequence of beta-reductions and/or expansions that leads to a normal form.
This leads to some interesting questions about normal form:

- Does every lambda expression have a normal form?
- Can a lambda expression have more than one normal form?
- If a lambda expression has a normal form, does every sequence of reductions get there (i.e., does it matter which redex we choose at each step)?
Definition: An outermost redex is a redex that is not contained inside another one. (Similarly, an innermost redex is one that has no redexes inside it.) In terms of the abstract-syntax tree, an "apply" node represents an outermost redex iff its left child is a lambda node, and it has no ancestor "apply" node whose left child is a lambda node.
For example:
                           apply             <-- not a redex
                          /     \
  an outermost redex --> apply   apply       <-- another outermost redex
                         /   \   /   \
                        λ   ... λ     apply  <-- redex, but not outermost
                       / \     / \    /   \
                     ...  ... ... ... λ    ...
                                     / \
                                   ...  ...
To do a normal-order reduction, always choose the leftmost of the outermost redexes (that's why normal-order reduction is also called leftmost-outermost reduction).
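Using the nested-tuple tree encoding from the earlier sketches (('var', x), ('lam', x, e), ('app', e1, e2), our own choice), the outermost redexes can be enumerated with a pre-order search that stops descending as soon as it meets a redex; the first path produced is then the leftmost-outermost one:

```python
def outermost_redexes(t, path=()):
    """Yield paths (sequences of 'fun'/'arg'/'body' steps) to the outermost
    redexes of t, leftmost first."""
    if t[0] == 'app':
        if t[1][0] == 'lam':
            yield path          # a redex; anything inside it is not outermost
            return
        yield from outermost_redexes(t[1], path + ('fun',))
        yield from outermost_redexes(t[2], path + ('arg',))
    elif t[0] == 'lam':
        yield from outermost_redexes(t[2], path + ('body',))

# a tree shaped like the example above: the root apply is not a redex,
# both children are outermost redexes, and the right one contains another redex
identity = ('lam', 'x', ('var', 'x'))
inner = ('app', identity, ('var', 'b'))            # redex, but not outermost
t = ('app',
     ('app', identity, ('var', 'a')),              # leftmost-outermost redex
     ('app', ('lam', 'y', inner), ('var', 'c')))   # another outermost redex
print(list(outermost_redexes(t)))   # [('fun',), ('arg',)]
```

Normal-order reduction would repeatedly reduce at the first path yielded, i.e., `next(iter(outermost_redexes(t)))`.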
Normal-order reduction is like call-by-name parameter passing, where you evaluate an actual parameter only when the corresponding formal is used. If the formal is not used, then you save the work of evaluating the actual. The leftmost outermost redex cannot be part of an argument to another redex; i.e., reducing it is like executing the function body, rather than evaluating an actual parameter. If it is a function that ignores its argument, then reducing that redex can make other redexes (those that define the argument) "go away"; however, reducing an argument will never make the function "go away". This is the intuition that explains why normal-order reduction will get you to a normal form if one exists, even when other sequences of reductions will not.
Fill in the incomplete abstract-syntax tree given above (to illustrate "outermost" redexes) so that the resulting lambda expression has a normal form and the only way to get there is by choosing the leftmost outermost redex (instead of some other redex) at some point in the reduction.
You may be wondering whether it is a good idea always to use normal-order reduction (NOR). Unfortunately, the answer is no; the problem is that NOR can be very inefficient. The same issue arises with call-by-name parameter passing: if there are many uses of a formal parameter in a function, and you evaluate the corresponding actual each time the formal is used, and evaluating the actual is expensive, then you would have been better off simply evaluating the actual once. This leads to the definition of another useful evaluation order: leftmost innermost or applicative-order reduction (AOR). For AOR we always choose the leftmost of the innermost redexes. AOR corresponds to call-by-value parameter passing: all arguments are evaluated (once) before the function is called (or, in terms of lambda expressions, the arguments are reduced before applying the function). The advantage of AOR is efficiency: if the formal parameter appears many times in the body of the function, then NOR will require that the actual parameter be reduced many times while AOR will only require that it be reduced once. The disadvantage is that AOR may fail to terminate on a lambda expression that has a normal form.
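The termination difference between the two orders can be seen in any eager language by passing arguments as thunks (zero-argument functions) by hand. This sketch (the names are our own) mimics NOR's behavior on a function that ignores its argument:

```python
def diverge():
    # stands in for a term with no normal form, e.g. (λx.xx)(λx.xx)
    raise RuntimeError("this argument should never be evaluated")

# call-by-name / NOR: pass the argument unevaluated, as a thunk
const_five = lambda x_thunk: 5   # λx.5: a function that ignores its argument
print(const_five(diverge))       # 5: the thunk is never forced

# call-by-value / AOR would evaluate the argument before the call:
#   const_five(diverge())  raises before const_five is ever applied
```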
It is worth noting that, for programming languages, there is a solution called call-by-need parameter passing that provides the best of both worlds. Call-by-need is like call-by-name in that an actual parameter is only evaluated when the corresponding formal is used; however, the difference is that when using call-by-need, the result of the evaluation is saved and is then reused for each subsequent use of the formal. In the absence of side-effects (that cause different evaluations of the actual to produce different values), call-by-name and call-by-need are equivalent in terms of the values computed (though call-by-need may be more efficient).
Define a lambda expression that can be reduced to normal form using either NOR or AOR, but for which AOR is more efficient.
Now it's time for our first theorem: the Church-Rosser Theorem. First, we need one new definition:

Definition: For lambda expressions X and Y, X red Y means that we can get from X to Y via a sequence of zero or more alpha- and/or beta-reductions.
Theorem: if (X0 red X1) and (X0 red X2), then there is an X3 such that: (X1 red X3) and (X2 red X3). Pictorially:
       X0
      /  \
     /    \
    v      v
   X1      X2
    \      /
     \    /
      v  v
       X3

where the arrows represent sequences of zero or more alpha- and/or beta-reductions.
Corollaries: if X has normal form Y then

1. we can get from X to Y using only beta-reductions (no expansions are needed), and
2. Y is unique up to alpha-reduction (i.e., X has no other normal form).
First we'll assume that the theorem is true, and prove the two corollaries; then we'll prove the theorem. To make things a bit simpler, we'll assume that we're using de Bruijn notation; i.e., no alpha-reduction is needed.
To prove Corollary 1, note that "X has normal form Y" means that we can get from X to Y using some sequence of interleaved beta-reductions and beta-expansions. Pictorially we have something like this:
      /\            /\
     /  \          /  \
    ^    v  ....  ^    v
   /      \      /      \
  X        \    /        v
            \  /          Y
             \/

where the upward-pointing arrows represent a sequence of beta-expansions, and the downward-pointing arrows represent a sequence of beta-reductions. Note that we cannot end with an expansion, since Y is in normal form.
We will prove Corollary 1 by induction on the number of changes of direction in getting from X to Y.
Base cases: zero changes of direction, and one change of direction.

Zero changes of direction:

  X
   \
    \
     v
      Y

i.e., we got from X to Y using zero or more beta-reductions, so we're done.
One change of direction:

       W
      ^ \
     /   \
    /     \
   /       v
  X         Y

i.e., we first use some beta-expansions to get from X to some lambda expression W, then use some beta-reductions to get from W to Y. Because every beta-expansion is the inverse of a beta-reduction, this means that we can get from W to X (as well as from W to Y) using a sequence of beta-reductions; i.e., we have the following picture:
       W
      / \
     /   \
    v     v
   X       Y

The Church-Rosser Theorem guarantees that there's a Z such that both X and Y reduce to Z:
       W
      / \
     v   v
    X     Y
     \   /
      v v
       Z

Since Y is (by assumption) in normal form, it must be that Y = Z, and our picture really looks like this:
       W
      / \
     v   |
    X    |
     \   |
      v  v
        Y

which means that X reduces to Y without any expansions.
Now we're ready for the induction step:
Induction Hypothesis: If X has normal form Y, and we can get from X to Y using a sequence of beta-expansions and reductions that involve n changes of direction (for n >= 1), then we can get from X to Y using only beta-reductions.
Now we must show that (given the induction hypothesis) Corollary 1 holds for n+1 changes of direction.
Here's a picture of an X and a Y such that n+1 changes of direction are needed to get from X to Y:
       W
      ^ \       ^\       ^\
     /   \     /  \     /  \
    /     \   /    \   /    \
   X       v /      v /      v
                              Y

   <-1 change-><----- n changes of direction ----->

Note that there is some lambda expression W (shown in the picture above) such that:

1. we can get from X to W using only beta-expansions, and
2. we can get from W to Y using a sequence of expansions and reductions with only n changes of direction; since Y is in normal form, the induction hypothesis tells us that we can get from W to Y using only beta-reductions:
  W
   \
    \
     v
      Y

Combining this with point 1 we have:
       W
      ^ \
     /   \
    /     v
   X       Y

In other words, we can get from X to W using only beta-expansions, and from W to Y using only beta-reductions. Using the same reasoning that we used above to prove the second base case, we conclude that we can get from X to Y using only beta-reductions.
Recall that Corollary 2 says that if lambda-term X has normal form Y then Y is unique (up to alpha-reduction); i.e., X has no other normal form. We can prove that by contradiction: Assume that, in contradiction to the Corollary, Y and Z are two different normal forms of X. By Corollary 1, X reduces to both Y and Z:
       X
      / \
     /   \
    v     v
   Y       Z

By the Church-Rosser Theorem, this means there is a W, such that:
       X
      / \
     v   v
    Y     Z
     \   /
      v v
       W

However, since by assumption Y and Z are already in normal form, there are no reductions to be done; thus, Y = W = Z, and X does not have two distinct normal forms.
Recall that the theorem is:

    if (X0 red X1) and (X0 red X2), then there is an X3 such that (X1 red X3) and (X2 red X3)
where "red" means "zero or more beta-reductions" (since we're assuming de Bruijn notation, and thus ignoring alpha-reductions).
We'd like to prove the theorem by "filling in the diamond" from X0 to X3; i.e., by showing that something like the following situation must exist:
             X0
            /  \
          W1    Z1
         /  \  /  \
       W2    A     X2
      /  \  / \   /
    X1    B    C
      \  / \  /
       D    E
        \  /
         F = X3

In other words, we'd like to show that for every lambda term, if you can take two different "steps" (can do two different beta-reductions) to terms A and B, then you can come back to a common term C by doing one beta-reduction from A, and one from B. If we could show that, then we'd have the desired X3 by construction as shown in the picture above.
Unfortunately, this idea doesn't quite work; i.e., it is not true in general that we can get to common term C in just one step from A and one step from B. Below is an example that illustrates this, using * to mean a redex that reduces to y.
    X0 = (λx.xx)(*)
        /          \
       v            v
     (**)          (λx.xx)y
       |               |
       v               |
  (y*) or (*y)         |
        \              |
         v             v
             (yy)

Note that there are two redexes in the initial term X0: * itself, and the one in which * is the argument to a lambda-term. So we can take two different "steps" from X0, arriving either at (**) or at (λx.xx)y. While we can come back to a common term, (yy), from both of those, it requires two steps from (**).
So to prove the Church-Rosser Theorem, we need a new definition:
Definition (the diamond property): A relation ~~> on terms has the diamond property iff whenever X ~~> Y and X ~~> Z, there is a term W such that Y ~~> W and Z ~~> W.
Note:

- The example above shows that the one-step beta-reduction relation does not have the diamond property.
- The Church-Rosser Theorem says exactly that the red relation does have the diamond property.
To prove the Church-Rosser Theorem we will perform the following 3 tasks:

1. Define a new relation on lambda terms: the walk relation (written ⇒).
2. Prove that (X beta-reduce* Y) iff (X walk* Y); i.e., we can get from X to Y via a sequence of beta-reductions iff we can get from X to Y via a sequence of walks.
3. Prove that the walk relation has the diamond property.
Finally, we'll prove that ⇒* (a sequence of zero or more walks) has the diamond property, and we'll use that to "fill in the diamond" and thus to prove the Church-Rosser Theorem.
Definition (walk): A walk is a sequence of zero or more beta-reductions restricted as follows: only redexes that are present in the original term may be reduced, and they must be reduced bottom-up (an inner redex before any redex that contains it); in particular, once a redex is reduced, no reduction later in the walk may take place inside the subterm that replaced it.
Here are some examples (again, * means a redex that reduces to y):
  (λx.xx)(*)
      |
      v
  (λx.xx)y
      |
      v
      yy

This entire reduction sequence (2 beta-reductions) is a walk because the inner redex is reduced first. Here's what the reductions look like using the abstract-syntax tree:
     apply            apply            apply
     /   \            /   \            /   \
    λ     *    -->   λ     y    -->   y     y
   / \              / \
  x   apply        x   apply
      /   \            /   \
     x     x          x     x

However, consider this sequence of beta-reductions (starting with the same initial term), first showing the lambda terms, then the abstract-syntax trees:
  (λx.xx)(*)
      |
      v
    (**)
      |
      v
    (y*)
      |
      v
    (yy)

     apply
     /   \          apply          apply          apply
    λ     *   -->   /   \    -->   /   \    -->   /   \
   / \             *     *        y     *        y     y
  x   apply
      /   \
     x     x

  ==========>       ==============================>
  this step is      these two steps constitute
  a walk            a walk

Although, as noted above, the first beta-reduction is a walk, and the second and third together are also a walk, the sequence of three reductions is not a walk because once the root "apply" is chosen to be reduced (which happens as the first reduction), no apply in the tree can be reduced as part of the same walk (so the reductions of the two "*" terms are illegal).
Here are two important insights about walks:

1. Every single beta-reduction is itself a walk (of length one); so is the empty sequence (a walk of length zero).
2. Walks that take place in disjoint (non-overlapping) subtrees of a term can be combined into a single walk on the whole term.
NOTE: We've now accomplished task (1) toward proving the Church-Rosser Theorem.
Task 2 involves proving that (X beta-reduce* Y) iff (X walk* Y).
The => direction is trivial; we must show that every sequence of zero or more beta-reductions is also a sequence of walks. Since each individual beta-reduction is a walk, we just let the sequence of walks be exactly the sequence of beta-reductions.
The <= direction is easy, too. Every walk is a sequence of zero or more beta-reductions. So every sequence of walks is a concatenation of sequences of beta-reductions, which is itself a sequence of beta-reductions.
For task 3 of our proof of the Church-Rosser Theorem we must prove a lemma that says that the walk relation has the diamond property. We'll call that CRT Lemma 2, because to do that proof we first need another lemma:
CRT Lemma 1 (the walk relation is preserved when free variables are replaced by terms): if (X ⇒ Y) then (X[P/x] ⇒ Y[P/x])
(Note: X ⇒ Y means "X walk Y", and X[P/x] means X with the free occurrences of x replaced by P without capture.)
We won't give a formal proof of the Lemma; instead we'll convince ourselves by considering what happens to the free occurrences of x in X when X ⇒ Y (remember that X ⇒ Y means zero or more of the redexes in X are reduced bottom-up):
(i) If x is "ignored" by the function part of the redex, (i.e., the function body does not include any occurrences of its formal parameter) then there will be no occurrences of x after the reduction, so replacing occurrences of x with P after the reduction is an empty operation; similarly, if we replace x with P before the reduction there will be NO occurrences of P after the reduction. Thus, it doesn't matter if we do the substitution before or after.
(ii) If x is not ignored, then there will be one or more occurrences of x after the reduction; if the x's are replaced with P's before the reduction then there will be the same number of P's after the reduction. Note that these occurrences of x cannot be themselves reduced when X ⇒ Y, because the x's are variables, not applications. So again we get the same thing by doing the substitution before or after the reduction.
Show that CRT Lemma 1 is not iff; i.e., find an example X, Y, and P such that (X[P/x] ⇒ Y[P/x]) but it is not true that (X ⇒ Y).
Now we can prove CRT Lemma 2.
CRT Lemma 2 ( ⇒ has the diamond property): if (X0 ⇒ X1) and (X0 ⇒ X2) then there is an X3 such that (X1 ⇒ X3) and (X2 ⇒ X3).
Pictorially, Lemma 2 says that given X0, X1, and X2 with the following relationship (where the diagonal lines mean a walk):
     X0
    //  \\
   //    \\
   v      v
  X1      X2

we're guaranteed to have an X3 with the following relationship to X1 and X2:
     X0
    //  \\
    v    v
   X1    X2
    \\  //
     v  v
      X3

We will prove this by structural induction: induction on the height of the abstract-syntax tree for X0.
Base case: X0 is a variable (i.e., the height of the tree = 1).
In this case, to get from X0 to X1 or from X0 to X2 must require zero reductions (since a variable has no redexes). So X0 = X1 = X2, and the X3 we're looking for is the same thing, too: X0 = X3.
Inductive Step: Assume that CRT Lemma 2 holds for all lambda terms X0 with abstract-syntax tree of height less than or equal to n; show that it holds for all terms of height n+1. Note that there are two ways to get a lambda term of height n+1:
Case 1: X0 is of the form λx.M0
   X0 = λx.M0
       //  \\
       v    v
  X1 = λx.M1    X2 = λx.M2

(The root λ node is not a redex, so both walks take place inside M0.) By the induction hypothesis there exists an M3 such that:
   M1    M2
    \\  //
     v  v
      M3

since the height of M0 is at most n. By choosing X3 = λx.M3, the lemma is proved for this case.
Case 2: X0 is of the form M0 N0
Case 2.1: Neither walk reduces the root "apply"; then the walks take place inside M0 and N0:

   X0 = M0 N0
       //  \\
       v    v
  X1 = M1 N1    X2 = M2 N2

By the induction hypothesis, there exist M3 and N3 such that:
   M1    M2          N1    N2
    \\  //    and     \\  //
     v  v              v  v
      M3                N3

Since the walks M1 ⇒ M3 and N1 ⇒ N3 involve non-overlapping redexes, they can be concatenated to produce a walk, and similarly for M2, N2. Thus, we can choose X3 = M3 N3, and the lemma is proved for this case:
   X0 = M0 N0
       //  \\
       v    v
  X1 = M1 N1    X2 = M2 N2
       \\  //
        v  v
       M3 N3
Case 2.2: Both X1 and X2 reduce the root 'apply'.
For this to happen, the root apply must be a redex; i.e., M0 must be of the form (λy.W0), which makes X0 of the form (λy.W0)(N0). Also, by the definition of walk, reduction of the root apply must be the last step in the walks from X0 to X1 and from X0 to X2, with the earlier reductions walking from W0 to W1 and W2, and from N0 to N1 and N2. Pictorially:

    X0 = (λy.W0)(N0)
        //        \\
        v          v
  X1 = W1[N1/y]    X2 = W2[N2/y]
By the induction hypothesis, there exists an N3 such that:
    N0
   // \\
   v   v
  N1    N2
   \\  //
    v  v
     N3
Claim: W1[N1/y] ⇒ W1[N3/y]
Justification: The form of W1[N1/y] is:
        /\
       /  \
      /    \
     /\  /\  /\
    N1   N1   N1

i.e., in its abstract-syntax tree, all of the N1's are at the "bottom" and do not overlap, because the y's in W1 were leaves.
This means that we can concatenate walks from each of the N1's to an N3 to get a walk on the entire tree (W1[N1/y]) that takes all of the N1's to N3's. Pictorially we have:
      /\          /\          /\              /\
     /  \        /  \        /  \            /  \
    /    \  ==> /    \  ==> /    \  ==> ... /    \
   /\ /\ /\    /\ /\ /\    /\ /\ /\        /\ /\ /\
  N1  N1  N1  N3  N1  N1  N3  N3  N1      N3  N3  N3

and the whole thing is itself a walk: W1[N1/y] ⇒ W1[N3/y].
Similarly, W2[N2/y] ⇒ W2[N3/y]. Now we have:

  X1 = W1[N1/y] ⇒ W1[N3/y]    and    X2 = W2[N2/y] ⇒ W2[N3/y]
By the induction hypothesis there exists a W3 such that:
   W1    W2
    \\  //
     v  v
      W3

And by Lemma 1 (substituting N3 for the free occurrences of y on both sides of each walk):

  W1[N3/y] ⇒ W3[N3/y]    and    W2[N3/y] ⇒ W3[N3/y]
None of the beta-reductions used to walk from W1[N3/y] to W3[N3/y] takes place inside an N3; therefore, we can combine the "W" walk and the "N" walk to get a walk; i.e., given:
  W1[N1/y]  ⇒  W1[N3/y]  ⇒  W3[N3/y]
           ^             ^
           |             |
     all reductions   all reductions
     are inside N1s   are above N3s

we have:
  W1[N1/y] ⇒ W3[N3/y]

Here's the final picture:

    X0 = (λy.W0)(N0)
        //        \\
        v          v
  X1 = W1[N1/y]    X2 = W2[N2/y]
        \\        //
         v        v
     X3 = W3[N3/y]
By choosing X3 = W3[N3/y], the lemma is proved for this case.
Case 2.3: Exactly one of X1 and X2 (say X1) reduces the root 'apply'.
As in case 2.2, for this to happen X0 must be of the form (λy.W0)(N0). The walk from X0 to X1 ends by reducing the root apply, while the walk from X0 to X2 does not reduce the root. Pictorially:

    X0 = (λy.W0)(N0)
        //        \\
        v          v
  X1 = W1[N1/y]    X2 = (λy.W2)(N2)
By the induction hypothesis, there must be an N3 and a W3 such that the following pictures hold:
    W0            N0
   // \\         // \\
   v   v         v   v
  W1    W2      N1    N2
   \\  //        \\  //
    v  v          v  v
     W3            N3

Now we're ready to put the pictures together:

             X0 = (λy.W0)(N0)
            //              \\
            v                v
  X1 = W1[N1/y]              X2 = (λy.W2)(N2)
            \\              //
         (a) \\            // (b)
              v            v
             X3 = W3[N3/y]

Here's an explanation of each of the labeled transitions:

(a) As in case 2.2: W1[N1/y] ⇒ W1[N3/y] (walking each copy of N1 to N3), and by Lemma 1, W1[N3/y] ⇒ W3[N3/y]; since the first walk's reductions are all inside the N's and the second's are all above them, the two combine into a single walk.

(b) Walk W2 to W3 and N2 to N3 (all of these reductions take place inside the two subtrees of the root apply), and then reduce the root apply as the final step; by the definition of walk, this entire sequence is a single walk, and its result is W3[N3/y].
Now that we've accomplished our three tasks, it remains to show that ⇒* has the diamond property, and to use that fact to prove the original theorem.
In fact, it can be shown that given any relation ~~> that has the diamond property, ~~>* also has the diamond property. That is left as an exercise; we will give an informal argument here.
We want to show that, given a lambda term X0 such that X0 ⇒* X1 and X0 ⇒* X2, there exists an X3 such that X1 ⇒* X3 and X2 ⇒* X3. Pictorially, we have:
      X0
     //  \\
     v    v
    W1    Z1
   //      \\
   v        v
  ...       ...
  //          \\
  v            v
 X1            X2

and we want to show:
      X0
     //  \\
     v    v
    W1    Z1
   //      \\
   v        v
  ...       ...
  //          \\
  v            v
 X1            X2
  \\          //
   v          v
   ...      ...
     \\    //
      v    v
       X3

Since we know that each individual walk has the diamond property, we can "fill in the diamonds" as shown below, creating a sequence of walks from X1 to X3, and from X2 to X3:
                X0
               //  \\
             W1      Z1
            //  \\  //  \\
          W2      A      Z2
         //  \\  //  \\  //  \\
       W3      B      C      Z3
      //  \\  //  \\  //  \\  //  \\
    ...    ...    ...    ...    ...
       \\  //  \\  //  \\  //
        X1      ...      X2
          \\    //  \\  //
            ...      ...
               \\  //
                 X3
Now for the final proof of the Theorem. Recall that we need to show that if (X0 red X1) and (X0 red X2), then there is an X3 such that (X1 red X3) and (X2 red X3). Task 2 showed that (X0 red X1) implies that (X0 ⇒* X1), and similarly for X2; in other words, we can get from X0 to either X1 or X2 using a sequence of walks. Since ⇒* has the diamond property, this means that we can also get from either X1 or X2 to X3 using a sequence of walks, and we have:

        X0
      ⇒*  ⇒*
      /    \
     v      v
    X1      X2
      \    /
      ⇒*  ⇒*
       v  v
        X3
Using the result of task 2 again, we note that a sequence of walks is also a sequence of beta reductions, so (X1 ⇒* X3) implies (X1 red X3), and similarly for X2, and so we're done!