Most programming languages include the notion of a type system.
This is because types help uncover logical errors.
Although typechecking can be done either statically (at compile
time) or dynamically (at run time), static typechecking has
the advantage that if your program typechecks, you know that
there will be no type violations on any run;
if typechecking is done dynamically, the fact that one run
produced no type errors generally provides no guarantees
about what will happen on other runs.
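To make the contrast concrete, here is a small illustration in Python, a dynamically typechecked language. The function `describe` is a made-up example, not part of these notes. Its ill-typed branch is reached only for some inputs, so one clean run says nothing about the next.

```python
def describe(n):
    # The ill-typed branch is reached only when n is negative.
    if n >= 0:
        return "count: " + str(n)
    return "count: " + n  # str + int: a type violation found only at run time


# A run that never takes the bad branch reports no type error...
ok = describe(3)

# ...but another run of the same program does.
try:
    describe(-1)
    failed = False
except TypeError:
    failed = True
```

A static typechecker would reject the second `return` before any run takes place.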
There are two different approaches to static typechecking: static typing and strong typing.
Both static and strong typing require type inference:
a technique that determines the type of every expression
(possibly given declarations of the types for some variables and some
user-defined functions).
The goal of static typing is to assign a monotype (a single
type) to each expression;
in contrast, strong typing may assign some expressions a polytype
(a type with type variables).
We will consider how to do strong typing in the presence of
polymorphism, using Milner's polymorphic type inference algorithm,
"Algorithm W" (see the paper
by Luca Cardelli on polymorphic typechecking).
Milner's algorithm was developed for the language ML.
We'll use a simpler language (defined below).
We'll start with an informal definition of how to do type inference
(in English), then we'll give a formal definition via axioms
and rules of inference (so that an expression e has
a type t iff there is a proof in this system).
We'll see that algorithm W is:
Overview
production | comment |
---|---|
exp → ID | |
exp → literal | // int, bool, list, or pair |
exp → λ ID . exp | // function definition |
exp → exp (exp) | // function application |
exp → if exp then exp else exp | // normal if-then-else |
exp → let ID = exp in exp | // define a "macro" or a non-recursive fn |
exp → let rec ID = exp in exp | // define a recursive fn |
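One possible way to hold this grammar in memory is one node class per production. The Python sketch below is an assumption of this example, not the representation used in these notes. The class and field names (`Var`, `Lit`, `Lam`, `App`, `If`, `Let`) are hypothetical, and the trailing `in length` body is added only because the grammar requires one.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:          # ID
    name: str


@dataclass(frozen=True)
class Lit:          # literal
    value: object


@dataclass(frozen=True)
class Lam:          # λ ID . exp
    param: str
    body: "Exp"


@dataclass(frozen=True)
class App:          # exp (exp)
    fn: "Exp"
    arg: "Exp"


@dataclass(frozen=True)
class If:           # if exp then exp else exp
    cond: "Exp"
    then: "Exp"
    els: "Exp"


@dataclass(frozen=True)
class Let:          # let ID = exp in exp; rec=True for let rec
    name: str
    bound: "Exp"
    body: "Exp"
    rec: bool = False


Exp = Union[Var, Lit, Lam, App, If, Let]

# let rec length = λ L. if null(L) then 0 else succ(length(cdr(L))) in length
length_def = Let(
    "length",
    Lam("L", If(App(Var("null"), Var("L")),
                Lit(0),
                App(Var("succ"),
                    App(Var("length"), App(Var("cdr"), Var("L")))))),
    Var("length"),
    rec=True,
)
```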
The primitive types are:
The primitive functions are:
We will assume that all functions are in curried form (i.e., take only one argument). We'll use functions instead of operators (e.g., "plus(1)(2)" instead of "1+2"), and we'll use square brackets for list literals (e.g., [1,2,3]).
function | type |
---|---|
succ | int → int |
iszero | int → bool |
plus (uncurried form) | (int x int) → int |
plus (curried form) | int → (int → int) |
cons (uncurried form) | (α x α-list) → α-list |
cons (curried form) | α → (α-list → α-list) |
car | α-list → α |
pair | α → (β → (α x β)) |
Intuitively, a type variable means "any type", although all occurrences of the same type variable within a type must refer to the same type. For example, since we've restricted our attention to homogeneous lists, cons is restricted to operate on an object of some type and a list of objects of that same type, rather than an arbitrary object and an arbitrary list. Therefore, the type of cons is α → (α-list → α-list). We impose no such restriction on pair; its arguments can have unrelated types.
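The "same variable, same type" reading can be sketched in Python as a substitution over a small type representation. Everything named here (`TVar`, `TCon`, `fn`, `lst`, `pair`, `substitute`) is a hypothetical helper for illustration. The point is only that replacing α replaces every occurrence of α at once, while α and β may be replaced independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()


INT = TCon("int")
BOOL = TCon("bool")


def fn(a, b):
    return TCon("->", (a, b))


def lst(a):
    return TCon("list", (a,))


def pair(a, b):
    return TCon("pair", (a, b))


def substitute(t, mapping):
    """Replace type variables by types; each occurrence of a variable gets the same type."""
    if isinstance(t, TVar):
        return mapping.get(t.name, t)
    return TCon(t.name, tuple(substitute(arg, mapping) for arg in t.args))


a, b = TVar("α"), TVar("β")
CONS = fn(a, fn(lst(a), lst(a)))      # α → (α-list → α-list)
PAIR = fn(a, fn(b, pair(a, b)))       # α → (β → (α x β))

cons_at_int = substitute(CONS, {"α": INT})
pair_int_bool = substitute(PAIR, {"α": INT, "β": BOOL})
```

With this encoding, cons used on ints gets the type int → (int-list → int-list), and there is no way to obtain int → (bool-list → bool-list) from it.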
Recall that our goal is to find the most general type for each expression in a program. We'll use "T1 ⊇ T2" (where T1 and T2 are types) to mean: T1 is at least as general as T2, and we'll use "T1 ⊃ T2" to mean T1 is strictly more general than T2. Here's how the ordering is defined:
By definition:
So for example:
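Assuming the usual reading of the ordering (T1 ⊇ T2 when T2 can be obtained from T1 by substituting types for T1's type variables), the check can be sketched in Python as one-way matching. The helper names and the sample types below are assumptions of this sketch rather than material from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()


def match(general, specific, mapping):
    """Extend mapping so that general, after substitution, equals specific."""
    if isinstance(general, TVar):
        if general.name in mapping:
            return mapping[general.name] == specific
        mapping[general.name] = specific
        return True
    if isinstance(specific, TVar):
        return False  # a constructor can never be specialized into a variable
    if general.name != specific.name or len(general.args) != len(specific.args):
        return False
    return all(match(g, s, mapping) for g, s in zip(general.args, specific.args))


def at_least_as_general(t1, t2):      # T1 ⊇ T2
    return match(t1, t2, {})


def strictly_more_general(t1, t2):    # T1 ⊃ T2
    return at_least_as_general(t1, t2) and not at_least_as_general(t2, t1)


def lst(t):
    return TCon("list", (t,))


def fn(x, y):
    return TCon("->", (x, y))


a, b, INT = TVar("α"), TVar("β"), TCon("int")

checks = [
    strictly_more_general(a, lst(a)),           # α ⊃ α-list
    strictly_more_general(lst(a), lst(INT)),    # α-list ⊃ int-list
    strictly_more_general(fn(a, b), fn(a, a)),  # α → β ⊃ α → α
    at_least_as_general(fn(a, a), fn(b, b)),    # equal up to renaming
]
```

Under this reading, two types that differ only in the names of their type variables are each at least as general as the other.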
rule | construct | constraints |
---|---|---|
(1) | if cond then exp1 else exp2 | (a) the type of cond must be bool; (b) the types of exp1 and exp2 must be the same; (c) the type of the whole expression must be the same as the types of exp1 and exp2 |
(2) | function application: fn(arg) | (a) the type of fn must be α → β; (b) the type of arg must be α; (c) the type of the whole expression is β |
(3) | function abstraction: λ id.exp | (a) the type of id is α; (b) the type of exp is β; (c) the type of the whole expression is α → β |
(4) | let id = e1 in e2, let rec id = e1 in e2 | (a) inside e2 the type of id is the type of e1; (b) the type of the whole expression is the type of e2 |
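Rules (1) through (4) can be read as a recipe for generating forced equalities between types. The Python sketch below is one way to do that, not the notes' own algorithm. Expressions and types are encoded as tuples (an assumption of this example), `constraints`, `fn`, and `fresh_names` are hypothetical helper names, and types in the environment are used exactly as written, with no renaming of their type variables. The extra equality for `let rec` (the name is also visible inside e1) is likewise an assumption of the sketch.

```python
import itertools


def fn(a, b):
    return ("->", a, b)


def fresh_names():
    """Yield fresh type-variable names t1, t2, ..."""
    return (f"t{i}" for i in itertools.count(1))


def constraints(exp, env, fresh, out):
    """Return a type for exp, appending forced equalities (left, right) to out."""
    kind = exp[0]
    if kind == "var":
        return env[exp[1]]
    if kind == "lit":                        # only int and bool literals here
        return "bool" if isinstance(exp[1], bool) else "int"
    if kind == "if":                         # rule (1)
        _, cond, e1, e2 = exp
        out.append((constraints(cond, env, fresh, out), "bool"))    # (1a)
        t1 = constraints(e1, env, fresh, out)
        t2 = constraints(e2, env, fresh, out)
        out.append((t1, t2))                 # (1b)
        return t1                            # (1c)
    if kind == "app":                        # rule (2)
        _, f, arg = exp
        tf = constraints(f, env, fresh, out)
        ta = constraints(arg, env, fresh, out)
        result = next(fresh)
        out.append((tf, fn(ta, result)))     # (2a) and (2b)
        return result                        # (2c)
    if kind == "lam":                        # rule (3)
        _, name, body = exp
        tparam = next(fresh)                 # (3a)
        tbody = constraints(body, {**env, name: tparam}, fresh, out)  # (3b)
        return fn(tparam, tbody)             # (3c)
    if kind in ("let", "letrec"):            # rule (4)
        _, name, e1, e2 = exp
        if kind == "letrec":
            tname = next(fresh)              # assumption: id is visible in e1
            t1 = constraints(e1, {**env, name: tname}, fresh, out)
            out.append((tname, t1))
        else:
            t1 = constraints(e1, env, fresh, out)
        return constraints(e2, {**env, name: t1}, fresh, out)  # (4a), (4b)
    raise ValueError(f"unknown expression kind: {kind}")


# λ x. if iszero(x) then 0 else succ(x)
example = ("lam", "x",
           ("if", ("app", ("var", "iszero"), ("var", "x")),
            ("lit", 0),
            ("app", ("var", "succ"), ("var", "x"))))
env = {"iszero": fn("int", "bool"), "succ": fn("int", "int")}
equalities = []
example_type = constraints(example, env, fresh_names(), equalities)
```

The function only collects equalities; solving them (for example by unification) is a separate step, which keeps each rule's contribution easy to see.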
Given these rules, here's an informal algorithm for how to typecheck an expression (i.e., how to infer the types of all subexpressions, and make sure that everything is consistent); we assume that we're given the abstract-syntax tree representation of the expression:
Below is the AST for the length function, annotated to show the result of typechecking (the type of each node is shown in parentheses). The type environment is also shown, as are the forced equalities discovered during typechecking (shown in a table after the tree, and also as ** xx = yy ** at the point in the tree where they are discovered). Note that some of the types in the type environment are "not quite right." That issue is explained in the next section.

let rec length = λ L. if null(L) then 0 else succ (length (cdr(L)))

- let rec ** t1 = α-list → int **
  - length (t1)
  - lambda (α-list → int)
    - L (t2)
    - if-then-else (int)
      - apply (bool) ** t2 = α-list **
        - null (α-list → bool)
        - L (t2)
      - 0 (int)
      - apply (int) ** t3 = int **
        - succ (int → int)
        - apply (t3) ** t1 = β-list → t3 **
          - length (t1)
          - apply (β-list) ** α = β **
            - cdr (β-list → β-list)
            - L (α-list)

type env | equalities |
---|---|
null *: α-list → bool | t2 = α-list |
succ: int → int | α = β |
cdr *: β-list → β-list | t1 = β-list → t3 |
length *: t1 | t3 = int |
L: t2 | |

(Note: * means not quite right)
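The forced equalities listed for length can be solved mechanically by unification. The following Python sketch is a minimal unifier written for this example, not the notes' Algorithm W. Type variables are plain strings, constructed types are tuples, and `resolve`, `occurs`, `unify`, and `expand` are hypothetical helper names.

```python
def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(var, t, subst):
    """Return True if variable var appears inside type t."""
    t = resolve(t, subst)
    if isinstance(t, str):
        return t == var
    return any(occurs(var, arg, subst) for arg in t[1:])


def unify(a, b, subst):
    """Extend subst so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if isinstance(a, str):
        if occurs(a, b, subst):
            raise TypeError(f"circular type: {a} occurs in {b}")
        subst[a] = b
    elif isinstance(b, str):
        unify(b, a, subst)
    elif a[0] != b[0] or len(a) != len(b):
        raise TypeError(f"cannot unify {a} with {b}")
    else:
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, subst)


def expand(t, subst):
    """Apply subst throughout t."""
    t = resolve(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(expand(arg, subst) for arg in t[1:])


INT = ("int",)


def lst(a):
    return ("list", a)


def fn(a, b):
    return ("->", a, b)


# The four forced equalities from the length example.
equalities = [
    ("t2", lst("α")),
    ("α", "β"),
    ("t1", fn(lst("β"), "t3")),
    ("t3", INT),
]
subst = {}
for left, right in equalities:
    unify(left, right, subst)

length_type = expand("t1", subst)            # type of length
lambda_type = expand(fn("t2", INT), subst)   # type of the λ node: t2 → int
```

Both results come out as β-list → int, which is α-list → int once α = β is taken into account, so the equality recorded at the let rec node is consistent with the others.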
Generic and Non-Generic Type Variables