Type Inference (Part II)

Overview
Definitions
Axioms and Rules of Inference
Example Proofs
Algorithm W
Summary

Overview

In this set of notes we will define a formal system for type inference: a set of axioms and inference rules. By definition, an ML expression e has type t iff there is a proof in this system that e has type t.

To handle recursive function definitions more cleanly, we will make a slight change to the syntax. Instead of:

let rec id = exp₁ in exp₂

we'll use:

let id = fix id.exp₁ in exp₂

where the two ids are the same; for example:

let length = fix length.λL. if null(L) then 0 else succ(length(cdr(L))) in ...

Using this new syntax, a recursive function definition is a sub-case of a general let expression; e.g., the definition of length is of the form

let length = exp₁ in ...

where exp₁ is

fix length.λL. if null(L) then 0 else succ(length(cdr(L)))

Definitions

Axioms and rules of inference are defined using sequents, which are of the form:

A |- e:τ

Note that:
- A is a set of assumptions. For the purposes of type inference, A will always be a set of bindings of identifiers and/or literals to types (i.e., A is a type environment like the ones we used in our informal type-inference algorithm). For example, { x:int, y:α, 3:int, true:bool } is a type environment that says that x has type int, and y has type α, and 3 has type int, and true has type bool.
- The symbol "|-" is called turnstile.
- e is an expression and τ is a type.
- A |- e:τ means "if we know A, then we can deduce that expression e has type τ". For example, if we know that L has type int-list, then we can deduce that the expression car(L) has type int.
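- A.x:τ means A minus (x:*) union {x:τ}; i.e., if A already contains an assumption about the type of x, then remove that assumption, and add the new one.
- t1[t2/α] means t1 with all free occurrences of α replaced with t2. Both t1 and t2 are type expressions, and α is a type variable. (Recall that an occurrence of α in t1 is free iff it is not inside the scope of a quantifier "∀ α".)

A rule of inference is of the form:

	S₁   S₂   ...   Sₙ
	------------------  [NAME]
	S

where S₁, ..., Sₙ (the premises, above the line) and S (the conclusion, below the line) are sequents. The rule says that if every premise can be proven, then the conclusion can be proven too.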

The goal of type inference for ML is to prove that a given ML expression e has type t. To provide a proof using our formal system, we must create a proof tree such that:

- every leaf is an axiom (defined below),
- the root is of the form A |- e:t, where A includes only the bindings of the primitive functions and literals to their types, and
- every edge represents an application of one inference rule.

To create a proof tree, we will start with the root (what we want to prove), and work down to the leaves. At each step, we will use the rules of inference to add, as children of the current node, nodes that represent facts that we still need to prove.

Axioms and Rules of Inference

The axioms will all be of the form:

A.id:t |- id:t

A.literal:t |- literal:t

For example, the following are both axioms:

x: α |- x: α

3:int |- 3: int

The rules of inference are as follows:

	A |- e₁:bool    A |- e₂:τ    A |- e₃:τ
	--------------------------------------  [COND] (conditional)
	A |- (if e₁ then e₂ else e₃):τ

	A.x:σ |- e:τ
	-----------------  [ABS] (fn abstraction)
	A |- (λx.e):σ → τ

	A |- e₁:σ → τ    A |- e₂:σ
	--------------------------  [APP] (fn application)
	A |- e₁ (e₂):τ

	A |- e₁:σ    A.x:σ |- e₂:τ
	--------------------------  [LET] (let exp)
	A |- (let x = e₁ in e₂):τ

	A.x:τ |- e:τ
	----------------  [FIX] (recursive fn)
	A |- (fix x.e):τ

	A |- e:∀ α.τ
	-------------  [SPEC] (specialization)
	A |- e:τ[σ/α]

	A |- e:τ
	------------  [GEN] (generalization; α must not be free in A)
	A |- e:∀ α.τ

The last rule, for generalization, may seem counterintuitive. It says that if, given A, we can infer that expression e has type τ, then we can also infer that it has type ∀α.τ (for any type variable α that is not free in A).

To understand why this makes sense, note that:

- if α does not occur in τ at all, then ∀α.τ is the same as τ;
- if α does occur in τ, then (since α is not free in A, no assumption constrains it) the proof that e has type τ works no matter what type α stands for;
- thus we want to be able to use the specialization rule to replace α with some particular type t, and we can only do that if we first use the generalization rule to introduce the ∀ quantifier.

And why do we have the restriction that α cannot be free in A? Because if α is free in A, there is a "link" between some assumption in A and the fact that e has type τ. For example:

y:α |- e:α

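means that e has the same type as y. Allowing e:∀α.α would break that link (and allow us to reach an invalid conclusion about the type of e). For instance, take e to be y itself: y:α |- y:α is an axiom, and if [GEN] could be applied to it we could conclude y:∀α.α, and then use [SPEC] to give y type int and also type bool, even though y has just one (unknown) type α.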

Example Proofs

Now we give some examples of proofs that use the system defined above.

Proof 1: λx.x has type ∀α.α→α

Note that this proof doesn't rely on any initial assumptions (e.g., about the types of primitive functions or literals). So the sequent that is the root of the proof tree has nothing to the left of the turnstile:

|- (λx.x):∀ α.α → α

Now we must use the rules of inference "upside down" to build the proof tree until every leaf is an axiom. Using the rules "upside down" means that we look for a rule whose bottom part matches a leaf node in our current proof tree; we then grow the tree by giving that node one new child for each sequent in the top of the inference rule.

Our current tree has just the root node given above:

|- (λx.x):∀ α.α → α

We'd like to use the [ABS] rule, but it doesn't quite match, since its conclusion has a function type, not a ∀ type. Therefore, we must first use the [GEN] rule (upside down) to get rid of the ∀α; i.e., we grow the proof tree to be:

|- (λx.x):∀ α.α → α
        |
        | [GEN]
        v
|- (λx.x): α → α

Now we can use the [ABS] rule to get:

|- (λx.x):∀ α.α → α
        |
        | [GEN]
        v
|- (λx.x): α → α
        |
        | [ABS]
        v
x: α |- x: α

And now the (single) leaf of our proof tree is an axiom, so we're done!

Proof 2: (λx.x)(3) has type int

For this proof, we need to assume that 3 has type int; so the root of our proof tree is:

3:int |- (λx.x)(3): int

The form of the expression to the right of the turnstile is a function application, so we need to use the [APP] rule:

              3:int |- (λx.x)(3): int
                 /               \
                /   [APP]         \
               v                   v
3:int |- (λx.x): σ→int        3:int |- 3: σ

Note that σ does not appear in the conclusion (the bottom part) of the [APP] rule; i.e., to show that a function application has type τ, we must show that, for some type σ, the function has type σ→τ and the argument has type σ. We'll complete our proof by choosing σ to be int. So our proof tree is:

              3:int |- (λx.x)(3): int
                 /               \
                /   [APP]         \
               v                   v
3:int |- (λx.x): int→int      3:int |- 3: int

The right leaf is an axiom, so that branch of the proof is complete. To complete the left branch we use the [ABS] rule (since the left leaf involves lambda abstraction):

              3:int |- (λx.x)(3): int
                 /               \
                /   [APP]         \
               v                   v
3:int |- (λx.x): int→int      3:int |- 3: int
        |
        | [ABS]
        v
3:int . x: int |- x: int

Both leaves are now axioms, so the proof is complete.

TEST YOURSELF #1

Show that:

fix length.λL. if null(L) then 0 else succ(length(cdr(L)))

has type:

∀ α.α-list → int

Assume the initial type environment A:

{0:int, null: ∀ α.α-list → bool, cdr:∀ α.α-list → α-list, succ:int → int }.

(Note that this is a much longer proof than the examples given above!)


Algorithm W

We are now ready to present Algorithm W, a type-inference algorithm that is sound, and complete up to shallow types.

The input to Algorithm W is a type environment A and an ML expression e. An expression e has a well-typing T iff there is a proof in the system defined above that e has type T. Algorithm W computes the most general type of e, provided that e (and all of its subexpressions) have shallow well-typings.

A shallow type is a type in which all quantifiers occur at the beginning. For example,

∀ α. ∀ β. α→β

is shallow, while

∀ α.(α→(∀ β.(α x β)))

is not shallow.

The fact that Algorithm W can only handle expressions with shallow well-typings is a limitation of the algorithm compared to the formal system. For example, the expression

λ f. pair(f(3))(f(true))

has only non-shallow types, such as (∀α.∀β.α→β) → (∀ γ.∀δ.γ x δ), so Algorithm W fails on that expression itself, and also on:

(λ f. pair(f(3))(f(true)))(λx.x)

because it includes a subexpression with a non-shallow type (even though the whole expression has a shallow type, namely, int x bool).

Algorithm W has been shown to be sound, and complete up to shallow types, where soundness and completeness are defined as follows:

Sound: If Algorithm W says e has type T then there is a proof that e has type T.
Complete (up to shallow types): if there is a proof that e has type T and the proof involves only shallow types, then Algorithm W will find that e has type T (or a generalization of T).

To understand Algorithm W, we must first understand substitution and unification.

Substitution

A substitution is a map from type variables to type expressions, which can be represented using pairs of the form (ID : exp), where ID is a type variable, and exp is a type expression. In Algorithm W, substitutions capture what we called "forced equalities" in our previous informal approach to type inference. For example, consider the expression:

λ f. succ(f(3))

The abstract-syntax tree is:

                            λ
                          /   \
                         f    apply
                              / \
                          succ  apply
                                 / \
                                f   3

When we use our informal approach to typechecking (visiting nodes in post-order, assigning each a type), we get two forced equalities:

When we process the lower apply, the fact that we are doing a function application, and that the argument is an int together force the type of f (the non-generic type t1) to be int→t2.
When we process the other apply, the fact that succ requires an int argument forces t2 to be int.

These "forced equalities" will be represented by two substitutions: (t1: int→t2) and (t2: int).

There are three operations involving substitution that are used by Algorithm W:

A substitution S can be applied to a type expression exp yielding a new type expression.
A substitution S can be applied to a type environment A, yielding a new type environment.
Two substitutions S1 and S2 can be composed, yielding a new substitution.

Operation 1: Apply S to exp

First, find all type variables t such that:

t is free in exp, and
S includes a mapping for t (i.e., S includes t:e).

Second, for each such t, replace all free occurrences of t in exp with e.

We will use the notation "S exp" to denote applying substitution S to expression exp. Here are two examples:

Example 1: Assume that substitution S is { (t1 : int→t2), (t2 : int) }, and that type expression exp is t1→(bool→(t1 x t2)). The type variables that are free in exp are t1 and t2, and both have mappings in S. Therefore, S exp yields (int → t2) → (bool → ((int → t2) x int)).

Note that application "happens only once"; e.g., the result of S exp includes some occurrences of t2 even though S maps t2 to int.

Example 2: Assume that S is as above, and that exp is ∀t1.t1→t2. In this example, S includes a mapping for both t1 and t2, but only t2 is free in exp. Therefore, S exp yields: ∀t1.t1→int.
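To make Operation 1 concrete, here is a minimal Python sketch that reproduces Examples 1 and 2. The representation is an assumption made for illustration only: a type variable is a Python string such as "t1"; a primitive type or type-operator application is a tuple whose first element is the operator name (("int",), ("->", a, b), ("x", a, b), ("list", a)); a quantified type ∀α.τ is ("forall", "α", τ); and a substitution is a dict from type-variable names to type expressions. The helper names apply, fn, and prod are hypothetical. The sketch does not rename bound variables, so it assumes that no type in S mentions a variable that a ∀ in exp binds.

```python
INT, BOOL = ("int",), ("bool",)


def fn(a, b):
    return ("->", a, b)


def prod(a, b):
    return ("x", a, b)


def apply(S, exp):
    """Apply substitution S (dict: type variable -> type expression) to exp, once."""
    if isinstance(exp, str):
        # A free type variable: replace it if S maps it. The replacement is
        # not itself substituted again ("application happens only once").
        return S.get(exp, exp)
    if exp[0] == "forall":
        # The bound variable is not free in the body, so drop its mapping.
        _, var, body = exp
        return ("forall", var, apply({v: e for v, e in S.items() if v != var}, body))
    # Primitive type or type operator: apply S to every argument.
    return (exp[0],) + tuple(apply(S, arg) for arg in exp[1:])


S = {"t1": fn(INT, "t2"), "t2": INT}
# Example 1: t1→(bool→(t1 x t2))
example1 = apply(S, fn("t1", fn(BOOL, prod("t1", "t2"))))
# Example 2: ∀t1.t1→t2
example2 = apply(S, ("forall", "t1", fn("t1", "t2")))
```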

We say that a substitution S is idempotent iff for all type expressions exp:

S S exp = S exp

i.e., applying S more than once doesn't have any additional effect. Note that the substitution S defined in Example 1 is not idempotent; however, when we apply Algorithm W, all substitutions that arise are idempotent.
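The idempotence condition can be checked mechanically. This Python sketch uses the same hypothetical string-and-tuple representation as above, restricted to quantifier-free types (is_idempotent is an illustrative name). It only tests S S t = S t for each type variable t that S maps; that suffices because applying a substitution replaces each variable occurrence independently, and variables that S does not map are left unchanged.

```python
def apply(S, exp):
    """Apply substitution S once to a quantifier-free type expression."""
    if isinstance(exp, str):                    # type variable
        return S.get(exp, exp)
    return (exp[0],) + tuple(apply(S, arg) for arg in exp[1:])


def is_idempotent(S):
    """True iff S(S t) == S t for every type variable t that S maps."""
    return all(apply(S, apply(S, t)) == apply(S, t) for t in S)


INT = ("int",)
example1 = {"t1": ("->", INT, "t2"), "t2": INT}   # the S of Example 1
fixed = {"t1": ("->", INT, INT), "t2": INT}       # t2 already replaced inside t1's mapping
cyclic = {"t1": ("->", INT, "t1")}                # a cyclic substitution
```

For Example 1, S t1 is int→t2 but S S t1 is int→int, so is_idempotent(example1) is False; the cyclic substitution fails for the same reason.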

Operation 2: Apply S to A

A substitution S can be applied to a type environment A to yield a new type environment. Recall that a type environment A is a map from IDs to type expressions; i.e., we can think of A as a set of bindings of IDs to type expressions of the form

(ID: exp)

(Although type environments and substitutions seem similar, remember that a type environment maps each ID that occurs in an ML expression to a type expression, while a substitution maps type variables to type expressions.)

To apply S to A we simply apply S to the "exp" part of each mapping in A.

For example:

Type environment A: (f: t1)
Substitution S: (t1: int→t2)
Result of applying S to A: (f: int→t2)

We will use the notation

S A

to denote S applied to A.
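Operation 2 is a one-line extension of Operation 1. In this Python sketch (same hypothetical string-and-tuple representation for quantifier-free types; apply_env is an illustrative name), a type environment is a dict from identifiers to type expressions, and the call at the end reproduces the example above.

```python
def apply(S, exp):
    """Operation 1, for quantifier-free types: apply S once to exp."""
    if isinstance(exp, str):
        return S.get(exp, exp)
    return (exp[0],) + tuple(apply(S, arg) for arg in exp[1:])


def apply_env(S, A):
    """Operation 2: apply S to the type part of each (ID: exp) binding in A."""
    return {ident: apply(S, exp) for ident, exp in A.items()}


A = {"f": "t1"}                       # type environment (f: t1)
S = {"t1": ("->", ("int",), "t2")}    # substitution (t1: int→t2)
result = apply_env(S, A)              # (f: int→t2)
```

Note that apply_env builds a new dict, so the original environment A is left unchanged.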

Operation 3: Compose substitutions S1 and S2

The result of composing two substitutions S1 and S2 is a new substitution T such that for all type expressions exp, T exp = S1 S2 exp.

We will use the notation

S1 o S2

to mean "S1 composed with S2".

Here's an example:

S1 = (t2: int)
S2 = (t1: int→t2)
S1 o S2 = (t2: int) (t1: int → int)

Note that S1 o S2 is not just the union of the mappings in S1 and S2; i.e., it is not equal to:

(t2: int) (t1: int→t2)

TEST YOURSELF #2

Find a type expression e such that (S1 o S2) e is not the same as (S1 ∪ S2) e.


To actually compute substitution T = S1 o S2, we can start by setting T equal to S2; then apply S1 to all of the type expressions in T; then add to T all mappings in S1 that are for a type variable t not mapped by S2:

Start with T = S2.
For each (t: exp) in T, apply S1 to exp.
For each (t: exp) in S1 such that T does not include a mapping for t, add (t: exp) to T.
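The three steps translate directly into code. Here is a minimal Python sketch, assuming the same string-and-tuple representation for quantifier-free types and dicts for substitutions (compose is a hypothetical name). The example reproduces S1 o S2 from above.

```python
def apply(S, exp):
    """Apply substitution S once to a quantifier-free type expression."""
    if isinstance(exp, str):
        return S.get(exp, exp)
    return (exp[0],) + tuple(apply(S, arg) for arg in exp[1:])


def compose(S1, S2):
    """Return T = S1 o S2, following the three steps above."""
    # Steps 1 and 2: start from S2 and apply S1 to each of its type expressions.
    T = {t: apply(S1, exp) for t, exp in S2.items()}
    # Step 3: add S1's mappings for type variables that T does not map yet.
    for t, exp in S1.items():
        if t not in T:
            T[t] = exp
    return T


INT = ("int",)
S1 = {"t2": INT}
S2 = {"t1": ("->", INT, "t2")}
T = compose(S1, S2)    # (t2: int) (t1: int→int)
```

A quick sanity check is that apply(T, e) equals apply(S1, apply(S2, e)) for any type expression e, which is exactly the defining property of composition.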

Unification

Unification is the operation that figures out what the "forced equalities" of our informal type-inference algorithm should be. For example, if one branch of an if-then-else has type α→β, and the other branch has type α→int, then unification would figure out that β needs to be int. In general, you can think of unification as solving the equation t1=t2, by finding values for the type variables in t1 and t2.

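More formally, the goal of unification is: given type expressions t1 and t2, find a substitution S such that S t1 = S t2, and such that S is maximally general.

Maximally general is defined in terms of "no less general":

S is no less general than S' iff there is some substitution R such that S' = R o S (i.e., S' can be obtained by first applying S and then applying a further substitution).

S is maximally general with respect to type expressions t1 and t2 if it is no less general than any other substitution S' such that S' t1 = S' t2.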

Note that, for the purposes of type inference, we want unify to return an idempotent substitution. One consequence of this is that it cannot return a cyclic substitution like:

t1: int→t1

(because a cyclic substitution is never idempotent). We'll see below where cyclic substitutions are prevented.

We will define unification by defining a function Unify with three parameters: type expressions exp1 and exp2, and idempotent substitution S. (When used in Algorithm W, Unify will always be called initially with an empty substitution; it is only on recursive calls that S will be non-empty.)

Unify will return either FAIL (if the two expressions cannot be unified) or an idempotent substitution U such that:

- U S exp1 = U S exp2, and
- for all other idempotent substitutions U' such that U' S exp1 = U' S exp2, U is no less general than U'.

Here is a definition of Unify using some code and some English explanations:

Unify (S, exp1, exp2) {
    if (S == FAIL) return FAIL

    if (exp1 is TYPEVAR(t)) {
        if (exp2 is also TYPEVAR(t)) return S
        else if (t occurs in exp2) return FAIL
        else if (S maps t to some type expression e)
            // this prevents returning a cyclic type
            return Unify(S, e, exp2)
        else let exp2' = S(exp2) in
            if (exp2' is TYPEVAR(t)) return S
            else if (t occurs in exp2') return FAIL
            else return t:exp2' o S
    }

    if (exp2 is a TYPEVAR) return Unify(S, exp2, exp1)

    // here if neither exp1 nor exp2 is a type variable

    if (root(exp1) ≠ root(exp2)) return FAIL

    if (root(exp1) and root(exp2) are primitive types)
        return S

    // the roots of exp1 and exp2 are type operators
    // (→ or x or list)
    for (each corresponding pair of subtrees T1 and T2 of the two roots) {
        unify T1 and T2;
        use S for the first call to Unify;
        use the result of the previous call for each subsequent call;
    }
    return the final resulting substitution
}

Note that when we unify a type variable t with a type expression e, we need to be very careful to keep the returned substitution idempotent. We can't simply add t:e to S. For example, if we have:

exp1 = t3
exp2 = t2→bool
S = (t2: t1)

then adding (t3: t2→bool) to S would create a non-idempotent substitution. We also can't simply compose t:e with S, because S may already include a mapping for t, and e may contain type variables that S maps (i.e., there are times when we would need to return S o t:e, and times when we would need to return t:e o S). Therefore, we apply S to both exp1 and exp2. If the result of the first application is still exp1, then we can return the result of composing the new mapping with S. If not, we call Unify again with S and the results of the two applications.

Here are some examples of unification, always assuming an empty initial substitution:

           exp1                                   exp2

             X                                      X
            / \                                    / \
         int   bool                              t1   bool


      U = {t1: int}




           →                                       →
          /  \                                     / \
        t1    X                                 int   X
             / \                                     / \
           t2   int                              bool   t1
              

     U = { t1:int;  t2:bool }



       →                                          →
      /  \                                        / \
    t1    X                                    int   t3
         / \                                    
       t2   bool

    U = { t1:int;  t3: t2 X bool }




      →                                           →
     /  \                                        /  \
   t1    X                                    int    t1
        / \
      t2   bool

      FAIL



 
       →                                           →
      /  \                                        /  \
    t1    X                                     t2    t1
         / \
       t2   bool

       FAIL

Let's consider the last two examples in more detail. When we attempt to unify (t1→(t2 x bool)) with (int→t1), the roots match, so we unify the left subtrees. That produces the substitution (t1: int). Now we use that substitution as S when we unify the right subtrees. The root of the subtree for the second expression is a type variable (t1), and there is a mapping for t1 in S (namely (t1: int)), so we attempt to unify int with (t2 x bool). That fails, since the root of one expression tree is a primitive type, while the root of the other is a type operator.

The last example starts similarly: we first unify the two left subtrees, producing the substitution (t1: t2), which is used as S when we unify the right subtrees. Again, the root of the subtree for the second expression is a type variable (t1), and again there is a mapping for t1 in S (namely (t1: t2)), so we attempt to unify t2 with (t2 x bool). This fails because t2 occurs in (t2 x bool).
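For experimentation, here is a direct Python transcription of the Unify pseudocode, under the same assumptions as the earlier sketches (strings for type variables, tuples for primitive types and type operators, dicts for substitutions). FAIL is represented by None, and the loop over subtrees threads the substitution through the recursive calls as the pseudocode describes. The helpers occurs, compose, fn, and prod are illustrative, not part of the notes.

```python
def apply(S, exp):
    if isinstance(exp, str):                    # type variable
        return S.get(exp, exp)
    return (exp[0],) + tuple(apply(S, arg) for arg in exp[1:])


def compose(S1, S2):
    """S1 o S2: apply S1 to S2's type expressions, then add S1's other mappings."""
    return {**S1, **{t: apply(S1, exp) for t, exp in S2.items()}}


def occurs(t, exp):
    if isinstance(exp, str):
        return exp == t
    return any(occurs(t, arg) for arg in exp[1:])


def unify(S, exp1, exp2):
    """Transcription of the Unify pseudocode; returns a dict, or None for FAIL."""
    if S is None:
        return None
    if isinstance(exp1, str):                   # exp1 is TYPEVAR(t)
        t = exp1
        if exp2 == t:
            return S
        if occurs(t, exp2):
            return None
        if t in S:
            return unify(S, S[t], exp2)
        exp2p = apply(S, exp2)                  # exp2' in the pseudocode
        if exp2p == t:
            return S
        if occurs(t, exp2p):
            return None
        return compose({t: exp2p}, S)
    if isinstance(exp2, str):
        return unify(S, exp2, exp1)
    if exp1[0] != exp2[0] or len(exp1) != len(exp2):
        return None                             # different roots
    for sub1, sub2 in zip(exp1[1:], exp2[1:]):  # no iterations for primitive types
        S = unify(S, sub1, sub2)                # thread S through the subtrees
    return S


INT, BOOL = ("int",), ("bool",)


def fn(a, b):
    return ("->", a, b)


def prod(a, b):
    return ("x", a, b)
```

For instance, unify({}, fn("t1", prod("t2", BOOL)), fn(INT, "t3")) should give {"t1": INT, "t3": prod("t2", BOOL)}, matching the third example, while the last two examples return None.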

The Algorithm

Finally we're ready for Algorithm W itself! Recall that the inputs to the algorithm are:

- A type environment A (remember that a type environment is a map from the identifiers that appear in the ML expression being typechecked to type expressions). On the first call, the identifiers are the primitive function names such as cons and pair; on recursive calls, the type environment may also include IDs bound in expressions like λx.f, fix x.f, and let x = f in g.
- An ML expression e to be typechecked.

Algorithm W returns FAIL if the expression has no shallow well-typing; otherwise it returns two values:

an idempotent substitution T (at the top level we don't care about T, but on recursive calls T is useful)
a type expression τ: the type of the ML expression e.

In the algorithm given below, we use I to mean the empty (or identity) substitution. We also omit the composition symbol o when composing substitutions; e.g., SR means S o R.

Algorithm W

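W(A, e) = (T, τ) or FAIL, where

If e is a literal or an identifier x, then if x is not in A, FAIL; else let

	A(x) = ∀α₁. ... ∀α_n.σ

then T = I and τ = σ[β₁/α₁] ... [β_n/α_n], where each β_k is a new type variable. (If n = 0, i.e., the type of x is not generic, then τ is simply A(x).)

If e = f(g) (function application), let

	(R, ρ) = W(A, f)
	(S, σ) = W(RA, g)
	U = Unify(I, Sρ, σ→β)

where β is a new type variable. Then T = USR and τ = Uβ.

If e = if p then e₁ else e₂, let

	(R, ρ) = W(A, p)
	U = Unify(I, ρ, bool)
	(S, σ) = W(URA, e₁)
	(S', σ') = W(SURA, e₂)
	U' = Unify(I, S'σ, σ')

Then T = U'S'SUR and τ = U'σ'.

If e = λx.f, let

	(R, ρ) = W(A.x:β, f)

where β is a new type variable. Then T = R and τ = R(β→ρ).

If e = fix x.f, let

	(R, ρ) = W(A.x:β, f)
	U = Unify(I, Rβ, ρ)

where β is a new type variable. Then T = UR and τ = URβ.

If e = let x = f in g, let

	(R, ρ) = W(A, f)
	(S, σ) = W(RA.x:ρ', g)

where ρ' = ∀ α₁ ... α_n.ρ, and α₁, ..., α_n are the type variables that are free in ρ and are not free in RA. Then T = SR and τ = σ.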

Below are more detailed explanations of the various cases for Algorithm W.

identifier x

FAIL is returned if A does not include a mapping for x. Otherwise, substitution T is the identity substitution (typechecking an identifier or literal does not cause any "forced" equalities); if the type t of x in A is non-generic, then τ is just t; otherwise (t is the generic type ∀ α₁. ∀ α₂. ... ∀ α_n.σ) τ is σ with all free occurrences of α₁ ... α_n replaced with new type variables β₁ ... β_n.

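Here are some examples, assuming A contains the bindings shown:

x = cons, with cons: ∀α.α→(α-list→α-list) in A: T = I and τ = β₁→(β₁-list→β₁-list)
x = pair, with pair: ∀α.∀β.α→(β→(α x β)) in A: T = I and τ = β₁→(β₂→(β₁ x β₂))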
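The identifier case amounts to "instantiating" a generic type with new type variables. Here is a minimal Python sketch, assuming the same string-and-tuple representation as the earlier sketches and a hypothetical counter-based new_type_var helper. Because Algorithm W only handles shallow types, all the ∀s are at the front of the type and can be stripped in a loop.

```python
import itertools

_counter = itertools.count(1)


def new_type_var():
    """Hypothetical helper: a type-variable name that has never been used before."""
    return f"beta{next(_counter)}"


def apply(S, exp):
    """Apply S once to a quantifier-free type expression."""
    if isinstance(exp, str):
        return S.get(exp, exp)
    return (exp[0],) + tuple(apply(S, arg) for arg in exp[1:])


def instantiate(sigma):
    """Strip the leading ∀s of a (shallow) type, renaming each bound variable
    to a new type variable; a non-generic type is returned unchanged."""
    S = {}
    while not isinstance(sigma, str) and sigma[0] == "forall":
        _, alpha, sigma = sigma
        S[alpha] = new_type_var()
    return apply(S, sigma)


# pair: ∀a.∀b.a→(b→(a x b))
pair_type = ("forall", "a", ("forall", "b", ("->", "a", ("->", "b", ("x", "a", "b")))))
tau = instantiate(pair_type)   # beta1→(beta2→(beta1 x beta2))
```

Each call produces different variables, so two uses of pair in the same expression get independent types.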

function application: f(g) or if p then e1 else e2

The idea in these two cases is:

Typecheck each subexpression, left to right.
Use the substitution returned from the typecheck of one subexpression during the typecheck of the other subexpressions (i.e., apply the substitution returned by one recursive call to Algorithm W to the type environment A passed in to the next recursive call).
Use unification as appropriate; e.g., the types of the "then" and "else" branches must unify; the type of "p" must unify with bool.
The result substitution is the composition of the intermediate substitutions.

λ x.f

Recall that our informal typechecking rules said that for a lambda abstraction, the ID x gets a non-generic type in f. So we simply call Algorithm W recursively to typecheck f, passing in as the type environment the original one, A, augmented with x:β (where β is a new type variable).

The recursive call to Algorithm W returns the pair (R, ρ), where ρ is the type inferred for f. The result for λx.f is the substitution R and the type R(β→ρ); i.e., a function type where the domain is (loosely) the type of x and the range is the type of f. We apply the substitution R to that function type because the process of typechecking f might have constrained β to some more specific type.

For example, if we typecheck the expression λx.plus(x)(1), then x must be an int; R will include the mapping (β: int), ρ will be int, and R(β→ρ) will be int→int.

fix x . f

As in the case for a lambda abstraction, our informal typechecking rules for a recursive definition said that x should be given a non-generic type in f. Those rules also said that the type assigned to x must be consistent with the type inferred for f.

The first point is handled (as it was for the lambda abstraction) by calling Algorithm W with type environment A extended with the new map (x: β).

The second point is handled by the call Unify(I, Rβ, ρ), which ensures that Rβ (the forced equalities discovered during the typechecking of f applied to the type assigned to x) unifies with ρ (the type inferred for f).

let x = f in g

For a let expression, our informal rules said that x should get a "generic version" of f's type in g, but any type variable in that type that is associated with an identifier in an enclosing lambda should not be made generic. Note that if this let expression is inside an enclosing lambda λy.exp, then the type environment A will include (y: t); i.e., the type associated with the lambda's ID will be free in A. That is why the type ρ' used to typecheck g is the type inferred for f with "forall"s added to the front only for type variables that are not free in RA.
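The generalization step of the let case can be sketched in Python as follows (same hypothetical string-and-tuple representation; free_vars, free_vars_env, and generalize are illustrative names). The two calls at the end correspond to the two let examples discussed next: in the first, the type of λx.x is generalized; in the second, t1 stays non-generic because it is free in the environment.

```python
def free_vars(t):
    """Type variables occurring free in type expression t."""
    if isinstance(t, str):
        return {t}
    if t[0] == "forall":
        return free_vars(t[2]) - {t[1]}
    return set().union(*(free_vars(arg) for arg in t[1:]))


def free_vars_env(A):
    """Type variables free in at least one binding of type environment A."""
    return set().union(*(free_vars(t) for t in A.values()))


def generalize(RA, rho):
    """Build ρ' = ∀α₁...∀αₙ.ρ, where the αs are free in ρ but not free in RA."""
    for alpha in sorted(free_vars(rho) - free_vars_env(RA), reverse=True):
        rho = ("forall", alpha, rho)
    return rho


identity = ("->", "t1", "t1")                  # type inferred for λx.x
rho_prime_1 = generalize({}, identity)         # ∀t1.t1→t1
rho_prime_2 = generalize({"g": "t1"}, "t1")    # t1: no quantifier is added
```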

To understand a bit better how typechecking of let expressions works, consider: let f = λx.x in pair(f(3))(f(true))

When W is first called, A = { pair: ∀α.∀β.α→(β→(α x β)), 3: int, true: bool }.

The first thing that happens is a recursive call to W to typecheck λx.x: W(A, λx.x).

The result of that call is:

(R, ρ) = (I, t1→t1)

The only free type variable in ρ is t1; t1 is not free in A (it isn't in A at all), so the type used for f during the typechecking of:

pair (f(3))(f(true))

is:

∀ t1. t1→t1

i.e., the next call to W is:

W(A.f:∀t1.t1→t1, pair(f(3))(f(true)))

The fact that f has a generic type allows f to be applied to arguments of different types (3 and true), so this call to W succeeds and returns the type: int x bool.

Now consider the expression

λg. let f = g in pair(f(3))(f(true))

Recall that this expression should not typecheck, and indeed it does not:

At the "top-level", this is a lambda abstraction, so the first recursive call augments the type environment with a new binding (g: t1).
Next, the first part of the let, "f = g" is processed. The type inferred for g is t1.
The next step is to process the second part of the let, "pair(f(3))(f(true))", with an extended type environment that includes a type for f. This type is the "generic version" of t1. However, although t1 is free in the type t1, it is also free in the type environment (which includes (g: t1)), and so no quantifiers are added to the front of the type; i.e., the type environment used for typechecking "pair(f(3))(f(true))" includes the type for pair as well as (g: t1) and (f: t1).
This means that when we typecheck "pair(f(3))(f(true))", both instances of f are forced to have the same type. Typechecking "f(3)" returns a substitution in which t1 is bound to int→t2. That substitution is applied to the type environment used to typecheck "f(true)"; i.e., the type environment becomes {(pair: ...), (g: int→t2), (f: int→t2)}. This causes the typecheck of "f(true)" to fail, because bool does not unify with int.
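Putting the pieces together, the six cases can be assembled into a small executable sketch of Algorithm W. It is an illustration under stated assumptions, not a reference implementation: types use the same string-and-tuple representation as the earlier sketches; ML expressions are hypothetical tagged tuples (("id", x), ("app", f, g), ("if", p, e1, e2), ("lam", x, f), ("fix", x, f), ("let", x, f, g)); literals are looked up in A just like identifiers; FAIL is signalled by raising a Fail exception; and bound variables are never renamed, which relies on every new type variable having a unique name.

```python
import itertools

class Fail(Exception):
    """Raised wherever Unify or Algorithm W would return FAIL."""

INT, BOOL = ("int",), ("bool",)
_counter = itertools.count(1)

def new_var():                  # a type variable that has never been used before
    return f"t{next(_counter)}"

def apply(S, t):                # Operation 1; bound variables are left alone
    if isinstance(t, str):
        return S.get(t, t)
    if t[0] == "forall":
        return ("forall", t[1], apply({v: e for v, e in S.items() if v != t[1]}, t[2]))
    return (t[0],) + tuple(apply(S, a) for a in t[1:])

def apply_env(S, A):            # Operation 2
    return {x: apply(S, t) for x, t in A.items()}

def compose(*subs):             # Operation 3: compose(S1, S2, S3) = S1 o S2 o S3
    T = {}
    for S in reversed(subs):
        T = {**S, **{v: apply(S, e) for v, e in T.items()}}
    return T

def ftv(t):                     # type variables free in t
    if isinstance(t, str):
        return {t}
    if t[0] == "forall":
        return ftv(t[2]) - {t[1]}
    return set().union(*map(ftv, t[1:]))

def unify(S, e1, e2):           # Unify pseudocode; raises Fail instead of returning FAIL
    if isinstance(e1, str):     # e1 is a type variable
        if e2 == e1:
            return S
        if e1 in ftv(e2):
            raise Fail(f"{e1} occurs in {e2}")
        if e1 in S:
            return unify(S, S[e1], e2)
        e2 = apply(S, e2)
        if e2 == e1:
            return S
        if e1 in ftv(e2):
            raise Fail(f"{e1} occurs in {e2}")
        return compose({e1: e2}, S)
    if isinstance(e2, str):
        return unify(S, e2, e1)
    if e1[0] != e2[0] or len(e1) != len(e2):
        raise Fail(f"cannot unify {e1} and {e2}")
    for a1, a2 in zip(e1[1:], e2[1:]):
        S = unify(S, a1, a2)
    return S

def W(A, e):                    # returns (T, τ); raises Fail where W returns FAIL
    if e[0] == "id":            # identifier or literal
        if e[1] not in A:
            raise Fail(f"{e[1]} is not in A")
        sigma, S = A[e[1]], {}
        while not isinstance(sigma, str) and sigma[0] == "forall":
            S[sigma[1]] = new_var()     # each bound α_k becomes a new β_k
            sigma = sigma[2]
        return {}, apply(S, sigma)
    if e[0] == "app":           # f(g)
        R, rho = W(A, e[1])
        S, sigma = W(apply_env(R, A), e[2])
        beta = new_var()
        U = unify({}, apply(S, rho), ("->", sigma, beta))
        return compose(U, S, R), apply(U, beta)
    if e[0] == "if":            # if p then e1 else e2
        R, rho = W(A, e[1])
        U = unify({}, rho, BOOL)
        S, sigma = W(apply_env(compose(U, R), A), e[2])
        S2, sigma2 = W(apply_env(compose(S, U, R), A), e[3])
        U2 = unify({}, apply(S2, sigma), sigma2)
        return compose(U2, S2, S, U, R), apply(U2, sigma2)
    if e[0] in ("lam", "fix"):  # λx.f and fix x.f: x gets a new, non-generic β
        beta = new_var()
        R, rho = W({**A, e[1]: beta}, e[2])
        if e[0] == "lam":
            return R, apply(R, ("->", beta, rho))
        U = unify({}, apply(R, beta), rho)
        return compose(U, R), apply(compose(U, R), beta)
    if e[0] == "let":           # let x = f in g
        R, rho = W(A, e[2])
        RA = apply_env(R, A)
        for alpha in sorted(ftv(rho) - set().union(*map(ftv, RA.values()))):
            rho = ("forall", alpha, rho)    # build ρ' from ρ
        S, sigma = W({**RA, e[1]: rho}, e[3])
        return compose(S, R), sigma
    raise ValueError(f"unknown expression {e!r}")

def infer(A, e):                # the type W computes for e, or None for FAIL
    try:
        return W(A, e)[1]
    except Fail:
        return None
```

For example, infer({"3": ("int",)}, ("app", ("lam", "x", ("id", "x")), ("id", "3"))) should return ("int",), agreeing with Proof 2. With pair, 3, and true in A, the let example above yields ("x", ("int",), ("bool",)), while the λg version yields None.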

TEST YOURSELF #3

Trace Algorithm W on the input: (λx.x)(3).


Summary

Typechecking and type inference are interesting and challenging problems for languages with polymorphic functions. We have looked at a formal type system (with axioms and rules of inference) that can be used to prove theorems of the form exp: t (where exp is an ML expression and t is a type expression) meaning that exp has type t. We have also looked at Algorithm W, a type-inference algorithm that is sound and complete up to shallow types.
