An Automated Framework for Proving Type-safety : A Rationale

For many years, type-safety has been considered an essential property of a language. Type-safety was an important goal for the designers of the Java language and is one of its primary selling points. Generally, Java programmers can expect to find many of their bugs revealed as type errors at compile time; in contrast, C programmers experience such bugs as inexplicable behavior or crashes at runtime. Java is also considered to be far more secure because of its type-safety. Even the initial version of Java was quite complicated, and no formal proof was given for its type-safety. Somewhat later this problem was addressed and Java is now generally considered to have been proven type-safe.

Around any reasonably popular language there will arise a large population of variants. These variants may arise to address limitations in the original language's expressiveness or to redress flaws in its design. Java is an outstanding example. There were many contenders for a version of generics for Java; a version based on GJ has now been incorporated into Java 1.5 Tiger. At the time that GJ was designed, it was not demonstrated to be type-safe. A proof of the type-safety of GJ followed only subsequently, and, in fact, work on the proof exposed at least one bug in the design of GJ.

The first part of the work we propose is a new kind of framework for experimenting with language extensions. The novel aspect of this framework will be the support that it provides for proving type-safety of language extensions. The proof is based on a translational semantics developed in detail by Kim Bruce in "Foundations of Object-oriented Languages: Types and Semantics". The semantics defines a translation from a simple object-oriented language to a variant of the lambda calculus which is itself type-safe. The proof, while extremely detailed, is quite uniform, (i.e., correctness of the translation of each syntactic element of the language must be proved separately but the subproofs are generally quite similar). Such a proof is best done using an automatic prover or proof assistant. The framework will generate the necessary additions to the proof automatically from the additional syntax wherever possible. Where this is not possible it will allow the designer of the language extension to specify the additional components of the proof easily.

Another long-standing concern is proving the correctness of programs. Various specification languages have been introduced in an attempt to solve this problem. A notable example for Java is the Java Modelling Language (JML). A drawback of most of these languages is that they are not themselves executable; without the feedback from execution, the programmer is less sure that the specifications themselves are correct. And the specification language is yet another language that the programmer must learn. Most of these specification languages support proofs on the specifications, and, in some cases, correctness of the specifications with regard to the corresponding code can be checked.

An algebraic language, such as ML, when viewed as a specification language, has many advantages. First, it is a real language in which programs can be written, compiled, and executed. Second, it is a language that a number of programmers already know and enjoy using. Third, its succinctness allows programs written in the language to resemble specifications very closely. Fourth, its functional nature allows the correctness of a program to be inferred generally more easily than that of a corresponding program in an imperative language.

Algebraic types, those typically used in ML, and object types, are duals of each other. This relationship has been explored formally in a number of papers. It is generally agreed that the visitor pattern, popularized in the "Design Patterns" book of Gamma, Helm, Johnson, and Vlissides is in essence an ad-hoc instantiation of an algebraic type.

The goal of the second part of this proposal is a transformer from a program written in a language with algebraic types to a semantically equivalent program in an object-oriented language, in this particular case from a subset of ML to Java. A programmer with such a tool would be able to design the essential algorithm of an application in ML, and then, when the algorithm has been satisfactorily implemented, transform it automatically into Java. During the initial implementation, the programmer benefits from the ease of development of ML; after the transformation, the programmer may take advantage of the enormous variety of libraries available as part of the Java API.