CS 536 Program 4: Name Analysis and Type Checking

Due date: Wednesday, December 1 (by midnight)
Not accepted after midnight on Saturday, December 4

Overview | Requirements | Announcements | Handin

Overview

For this assignment you will write a two-pass static-semantic analyzer for C-- programs represented as abstract-syntax trees. Your main task will be to write name analysis and type checking methods for the nodes of the AST. In addition you will need to:

Modify the Sym class from program 1 (by including some new fields and methods and/or by defining some subclasses).
Modify the IdNode class in ast.java (by including a new Sym field and by modifying its unparse method).
Write a new main program, P4.java (an extension of P3.java).
Modify the Errors class.
Update the Makefile used for program 3 to include any new rules needed for program 4.
Write three test inputs: nameErrors.C, typeErrors.C, and test.C to test your new code.

Requirements

Getting Started
Name Analysis
Type Checking
- Preventing Cascading Errors
Other Tasks
Some Advice

Getting Started

Skeleton files on which you should build are in: ~cs536-1/public/prog4

The files are:

c.cup: Use this code if there were problems with your own version.
ast.java: Use this code if there were problems with your own version. You will need to add to this file or to your own version.
Type.java: You may use the Type class defined in this file, but you are not required to use it.

Name Analysis

The name analyzer will perform the following tasks:

Build symbol tables. You will use the "list of hashtables" approach (using the SymTab class that you wrote for program 1).
Find bad declarations, multiply declared names, and uses of undeclared names. Like C++, the C-- language allows the same name to be declared in non-overlapping or nested scopes. Unlike C++, the formal parameters of a function are considered to be in the same scope as the function body. All names must be declared before they are used. A bad declaration is a declaration of anything other than a function to be of type void.
Update all of the IdNodes in the abstract-syntax tree to include pointers to the corresponding symbol-table entries (i.e., to have fields of type Sym). Note: all IdNodes should be updated, whether they represent declarations or uses.

You must implement your name analyzer by writing appropriate methods for the different subclasses of ASTnode. Exactly what methods you write is up to you (as long as they do name analysis as specified).

It may help to start by writing the name analysis method for ProgramNode, then work "top down", adding a method for DeclListNode (the child of a ProgramNode), then for each kind of DeclNode, and so on. Be sure to think about which nodes' methods need to add a new hashtable to the symbol table (i.e., when is a new scope being entered), and which methods need to remove a hashtable from the symbol table (i.e., when is a scope being exited).
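The scope handling described above can be sketched roughly as follows. This is only an illustration: `ScopeStackSketch` and its methods (`addScope`, `removeScope`, `declare`, `lookup`) are hypothetical names and need not match the interface of the SymTab class you wrote for program 1, and symbol-table entries are reduced to plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SymTab: a list of hashtables, innermost scope first.
class ScopeStackSketch {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    // Entering a scope: put a new, empty hashtable at the front of the list.
    void addScope() {
        scopes.addFirst(new HashMap<>());
    }

    // Leaving a scope: discard the front hashtable and everything declared in it.
    void removeScope() {
        scopes.removeFirst();
    }

    // Declares a name in the current scope; false means "multiply declared".
    // Assumes at least one scope has been added.
    boolean declare(String name, String type) {
        Map<String, String> current = scopes.peekFirst();
        if (current.containsKey(name)) {
            return false;
        }
        current.put(name, type);
        return true;
    }

    // Looks a name up from the innermost scope outward; null means "undeclared".
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            if (scope.containsKey(name)) {
                return scope.get(name);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopeStackSketch tab = new ScopeStackSketch();
        tab.addScope();                       // global scope
        tab.declare("x", "int");
        tab.declare("f", "function");
        tab.addScope();                       // one scope for f's formals and body
        tab.declare("x", "bool");             // legal: a nested scope may reuse x
        System.out.println(tab.lookup("x"));  // the inner x
        tab.removeScope();                    // leaving f
        System.out.println(tab.lookup("x"));  // the global x again
    }
}
```

In this sketch the method for a function declaration would call `addScope` once before processing the formals and `removeScope` once after processing the body, since C-- places both in the same scope.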

Some of the methods will process the declarations in the program (checking for bad declarations, and checking whether the names are multiply declared, and if not, adding appropriate symbol-table entries), and some will process the statements in the program (checking that every name used in a statement has been declared). The methods that process IdNodes should also add a link to the corresponding symbol-table entry. Note that you should not add a link for an IdNode that corresponds to a declaration of a name that has already been declared, or a use of an undeclared name.

Your name analyzer should find all of the errors described in the following table; it should report the specified position of the error, and it should give exactly the specified error message (each message should appear on a single line, rather than how it is formatted in the following table). Error messages should have the same format as in the scanner and parser (i.e., they should be issued using a call to Errors.fatal).

Type of Error	Error Message	Position to Report
Bad declaration (variable or parameter of type `void`).	Non-function declared void	The first character of the ID in the bad declaration.
More than one declaration of an identifier in a given scope	Multiply declared identifier	The first character of the ID in the duplicate declaration
Use of an undeclared identifier	Undeclared identifier	The first character of the undeclared identifier

Note that the names themselves should not be printed as part of the error messages.

During name analysis, if a function name is multiply declared you should still process the formals and the body of the function; don't add a new entry to the current symbol table for the function, but do add a new hashtable to the front of the SymTab's list for the names declared in the body (i.e., the parameters and other local variables of the function).

If you find a bad variable declaration (a variable of type void), you should give an error message and add nothing to the symbol table. Note that if a declaration is both "bad" (non-function declared void) and is a declaration of a name that has already been declared in the same scope, you should give two error messages.
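One possible way to organize the two checks for a variable declaration is sketched below. The class `VarDeclSketch`, the method `processVarDecl`, and the use of a plain `Map` for the current scope are all assumptions made for illustration; a real implementation would call Errors.fatal with the position of the ID instead of collecting strings. The order in which the two messages are reported is also an assumption, since the text above does not fix it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VarDeclSketch {
    // Processes one variable declaration against the current scope.
    // Returns the messages reported; the name is added only when there are none.
    static List<String> processVarDecl(String type, String name, Map<String, String> scope) {
        List<String> errors = new ArrayList<>();
        if (type.equals("void")) {
            errors.add("Non-function declared void");
        }
        if (scope.containsKey(name)) {
            errors.add("Multiply declared identifier");
        }
        if (errors.isEmpty()) {
            scope.put(name, type);   // a good, new declaration: record it
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> scope = new HashMap<>();
        System.out.println(processVarDecl("int", "x", scope));   // no errors
        System.out.println(processVarDecl("void", "x", scope));  // both errors
        System.out.println(processVarDecl("void", "y", scope));  // bad declaration only
        System.out.println(scope.keySet());                      // only x was added
    }
}
```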

Type Checking

The type checker will determine the type of every expression represented in the abstract-syntax tree, and will use that information to identify type errors. You must implement your type checker by writing appropriate member methods for the different subclasses of ASTnode. Your type checker should find all of the type errors described in the following table; it must report the specified position of the error, and it must give exactly the specified error message. (Each message should appear on a single line, rather than how it is formatted in the following table.)

Type of Error	Error Message	Position to Report
Writing a function; e.g., "`cout << f`", where `f` is a function name.	Attempt to write a function	The first character of the function name.
Writing an array; e.g., "`cout << A`", where `A` is an array.	Attempt to write an array	The first character of the array name.
Reading a function: e.g., "`cin >> f`", where `f` is a function name.	Attempt to read a function	The first character of the function name.
Reading an array; e.g., "`cin >> A`", where `A` is an array.	Attempt to read an array	The first character of the array name.
Calling something other than a function; e.g.: "`x();`", where `x` is not a function name. Note: In this case, you should not type-check the actual parameters.	Attempt to call a non-function	The first character of the variable name.
Calling a function with the wrong number of arguments. Note: In this case, you should not type-check the actual parameters.	Function call with wrong number of args	The first character of the function name.
Calling a function with an argument of the wrong type. Note: you should only check for this error if the number of arguments is correct. If there are several arguments with the wrong type, you must give an error message for each such argument.	Type of actual does not match type of formal	The first character of the first identifier or literal in the actual parameter.
Returning from a non-void function with a plain `return` statement (i.e., one that does not return a value).	Missing return value	0,0
Returning a value from a `void` function.	Return with a value in a void function	The first character of the returned expression.
Returning a value of the wrong type from a non-`void` function.	Bad return value	The first character of the returned expression.
Applying an arithmetic operator (`+`, `-`, `*`, `/`) to an operand with type other than `int`.	Arithmetic operator applied to non-numeric operand	The first character of the first identifier or literal in an operand that is an expression of the wrong type.
Applying a relational operator (`<`, `>`, `<=`, `>=`) to an operand with type other than `int`.	Relational operator applied to non-numeric operand	The first character of the first identifier or literal in an operand that is an expression of the wrong type.
Applying a logical operator (`!`, `&&`, `||`) to an operand with type other than `bool`.	Logical operator applied to non-bool operand	The first character of the first identifier or literal in an operand that is an expression of the wrong type.
Using a non-`bool` expression as the condition of an `if`.	Non-bool expression used as an if condition	The first character of the first identifier or literal in the condition.
Using a non-`bool` expression as the condition of a `while`.	Non-bool expression used as a while condition	The first character of the first identifier or literal in the condition.
Using a non-integer expression as an array index.	Non-int expression used as an array index	The first character of the first identifier or literal in the index expression.
Indexing into something that is not an array.	Index applied to non-array operand	The first character of the non-array name.
Applying an equality operator (`==`, `!=`) to operands of two different types (e.g., "`j == true`", where `j` is of type `int`), or assigning a value of one type to a variable of another type (e.g., "`j = true`", where `j` is of type `int`).	Type mismatch	The first character of the first identifier or literal in the left-hand operand.
Applying an equality operator (`==`, `!=`) to `void` function operands (e.g., "`f() == g()`", where `f` and `g` are functions whose return type is `void`).	Equality operator applied to void functions	The first character of the first function name.
Comparing two functions for equality, e.g., "`f == g`" or "`f != g`", where `f` and `g` are function names.	Equality operator applied to functions	The first character of the first function name.
Comparing two arrays for equality, e.g., "`A == B`" or "`A != B`", where `A` and `B` are the names of arrays.	Equality operator applied to arrays	The first character of the first array name.
Assigning a function to a function; e.g., "`f = g;`", where `f` and `g` are function names.	Function assignment	The first character of the first function name.
Assigning an array to an array; e.g., "`A = B;`", where `A` and `B` are the names of arrays.	Array assignment	The first character of the first array name.

Note that "array" is part of the type of every array variable; so it is an error to apply an arithmetic, relational, equality, or logical operator to an (entire) array, or to use an array on either side of an assignment. For example, given the declarations:

int x;
int A[10];
int B[10];
bool b;

The following are all errors:

A + 5;          /* Arithmetic operator applied to non-numeric operand */
A = x;          /* Type mismatch */
if (A == 0) ... /* Type mismatch */
if (A == b) ... /* Type mismatch */
if (A < B) ...  /* Relational operator applied to non-numeric operand */

Note also that given an expression like A[x], you should first check whether A is an array (and give an error message if it is not), and then (even if A is not an array) you should check whether x is of type int. If A is an array, then the type of the whole expression should be the type of the elements of A even if x is not an integer; otherwise, the type of the whole expression should be ErrorType (see below).
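The rule for A[x] can be sketched as a small function. Everything here is hypothetical scaffolding (the `Ty` enum, the `checkIndex` method, and collecting messages in a list rather than calling Errors.fatal); it is meant only to show the order of the two checks and the resulting type.

```java
import java.util.ArrayList;
import java.util.List;

class IndexCheckSketch {
    enum Ty { INT, BOOL, ERROR }

    // baseIsArray: whether the indexed name is an array.
    // elemType:    the element type when it is an array (ignored otherwise).
    // indexType:   the type already computed for the index expression.
    static Ty checkIndex(boolean baseIsArray, Ty elemType, Ty indexType, List<String> errors) {
        if (!baseIsArray) {
            errors.add("Index applied to non-array operand");
        }
        // The index is checked even when the base is not an array. An index whose
        // type is ERROR was already reported, so it is not reported again.
        if (indexType != Ty.INT && indexType != Ty.ERROR) {
            errors.add("Non-int expression used as an array index");
        }
        // An array yields its element type even with a bad index.
        return baseIsArray ? elemType : Ty.ERROR;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        // A[true], where A is an array of ints: one message, result type INT.
        System.out.println(checkIndex(true, Ty.INT, Ty.BOOL, errors) + " " + errors);
        errors.clear();
        // x[true], where x is not an array: two messages, result type ERROR.
        System.out.println(checkIndex(false, Ty.INT, Ty.BOOL, errors) + " " + errors);
    }
}
```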

Preventing Cascading Errors

A single type error in an expression or statement should not trigger multiple error messages. For example, assume that A is an array of ints, and f is a function that has one integer parameter and returns a bool. Each of the following should cause only one error message:

cout << A + 1          // A + 1 is an error; the write is OK
A[1+true]              // 1 + true is an error; the subscript is OK
(true + 3) * 4         // true + 3 is an error; the * is OK
true && (false || 3)   // false || 3 is an error; the && is OK
f("a" * 4);            // "a" * 4 is an error; the call is OK
1 + A();               // A() is an error; the + is OK
(true + 3) == x        // true + 3 is an error; the == is OK
                       // regardless of the type of x

One way to accomplish this is to use a special ErrorType for expressions that contain type errors. (Note that ErrorType has been defined in Type.java.) In the third example above, the type given to (true + 3) should be ErrorType, and the type-check method for the multiplication node should not report "Arithmetic operator applied to non-numeric operand" for the first operand. But note that the following should each cause two error messages (assuming the same declarations of A and f as above):

true + "hello" // one error for each of the non-int operands of the +
1 + f(true)    // one for the bad arg type and one for the 2nd operand of the +
1 + f(1, 2)    // one for the wrong number of args and one for the 2nd operand of the +
true || A[1+true] // one for the bad index, and one for the 2nd operand of the ||
return 3+true; // in a void function: one error for the 2nd operand to +
               // and one for returning a value
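The ErrorType idea can be sketched for arithmetic operators as follows. The names `CascadeSketch`, `Ty`, `Exp`, `lit`, and `arith` are invented for this illustration and are not part of ast.java or Type.java; messages are collected in a list instead of being passed to Errors.fatal.

```java
import java.util.ArrayList;
import java.util.List;

class CascadeSketch {
    enum Ty { INT, BOOL, STRING, ERROR }

    static final String ARITH_MSG = "Arithmetic operator applied to non-numeric operand";

    // Hypothetical expression node: type-checks itself, appending any messages.
    interface Exp {
        Ty typeCheck(List<String> errors);
    }

    // A literal simply reports its own type.
    static Exp lit(Ty type) {
        return errors -> type;
    }

    // An arithmetic node (+, -, *, /) over two operands.
    static Exp arith(Exp left, Exp right) {
        return errors -> {
            Ty l = left.typeCheck(errors);
            Ty r = right.typeCheck(errors);
            // An operand of type ERROR was already reported lower in the tree,
            // so only operands that are wrong for a new reason get a message.
            if (l != Ty.INT && l != Ty.ERROR) errors.add(ARITH_MSG);
            if (r != Ty.INT && r != Ty.ERROR) errors.add(ARITH_MSG);
            return (l == Ty.INT && r == Ty.INT) ? Ty.INT : Ty.ERROR;
        };
    }

    static int countErrors(Exp e) {
        List<String> errors = new ArrayList<>();
        e.typeCheck(errors);
        return errors.size();
    }

    public static void main(String[] args) {
        // (true + 3) * 4: only the inner + is reported.
        Exp nested = arith(arith(lit(Ty.BOOL), lit(Ty.INT)), lit(Ty.INT));
        // true + "hello": each operand of the + is reported.
        Exp flat = arith(lit(Ty.BOOL), lit(Ty.STRING));
        System.out.println(countErrors(nested));  // 1
        System.out.println(countErrors(flat));    // 2
    }
}
```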

To provide some help with this issue, here is an example input file, along with the corresponding error messages. (Note: This is not meant to be a complete test of the type checker; it is provided merely to help you understand some of the messages you need to report, and to help you find small typos in your error messages. If you run your program on the example file and put the output into a new file, you can use the Unix utility diff to compare your file of error messages with the one supplied here. This will help both to make sure that your code finds the errors it is supposed to find, and to uncover small typos you may have made in the error messages.)

Other Tasks

Extending the `Sym` Class

It is up to you exactly what information you store in each symbol-table entry. For example, you may want to store "kind" information for all symbols, and "type" information for global variables, parameters, and local variables, or you might want to store "type" information for all symbols, using "FnType" as a type rather than using kinds. For function names, the symbol-table entry will also need to include information about the number of parameters and their types (this could be accomplished by having a list of the symbol-table entries for the parameters). All of this information will be needed in order to implement the type checker. Therefore, you will need to modify the Sym class by adding some new fields and/or by declaring some subclasses. You will probably also want to add new methods that return the values of the new fields, and it may be helpful to change the toString method so that you can print the contents of a Sym for debugging purposes.
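As one illustration of the subclass approach, the sketch below keeps a type name in every entry and adds a return type and parameter-type list for functions. The class names `SketchSym` and `SketchFnSym`, their fields, and the use of strings for types are assumptions made for the example; your own Sym class (or the Type class in Type.java) may be organized quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SymSketchDemo {
    public static void main(String[] args) {
        SketchSym x = new SketchSym("int");
        SketchFnSym f = new SketchFnSym("bool", Arrays.asList("int", "bool"));
        System.out.println(x);                 // int
        System.out.println(f);                 // int,bool->bool
        System.out.println(f.getNumParams());  // 2
    }
}

// Hypothetical base entry: every symbol records a type name.
class SketchSym {
    private final String type;

    SketchSym(String type) {
        this.type = type;
    }

    String getType() {
        return type;
    }

    @Override
    public String toString() {
        return type;
    }
}

// Hypothetical function entry: adds what the type checker needs to check calls.
class SketchFnSym extends SketchSym {
    private final String returnType;
    private final List<String> paramTypes;

    SketchFnSym(String returnType, List<String> paramTypes) {
        super("function");
        this.returnType = returnType;
        this.paramTypes = Collections.unmodifiableList(new ArrayList<>(paramTypes));
    }

    String getReturnType() {
        return returnType;
    }

    int getNumParams() {
        return paramTypes.size();
    }

    List<String> getParamTypes() {
        return paramTypes;
    }

    // Printed form intended for debugging only.
    @Override
    public String toString() {
        return String.join(",", paramTypes) + "->" + returnType;
    }
}
```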

Modifying the `IdNode` Class

Two changes to the IdNode class are needed:

Adding a new Sym field (to link the node with the corresponding symbol-table entry), and
Changing the unparse method so that every instance of an ID has extra information (in parentheses) after its name. (The point of this is to help you to see whether your name analyzer is working correctly.) For names of functions, the information should be the "kind" of the ID (i.e., function). For names of global variables, parameters, and local variables, the information should be the ID's type (int or bool). For a global or local variable that is an array, the size of the array, inside square brackets, should also be printed. For example, given a program that contains:
```
       void f(int x) {
       }
       void g() {
         int x[10];
         int y;
         x[0] = y;
       }
       
```
The unparser should print:
```
       void f(function)(int x(int)) {
       }
       void g(function)() {
         int x(int[10])[10];
         int y(int);
         x(int[10])[0] = y(int);
       }
       
```
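The formatting rule for the extra information can be sketched as a single helper. The class `IdUnparseSketch`, the method `unparseId`, and the convention that a negative size means "not an array" are assumptions for illustration; in your code this information would come from the Sym linked to the IdNode.

```java
class IdUnparseSketch {
    // Builds the text printed for one ID. By (hypothetical) convention,
    // arraySize < 0 means the name is not an array.
    static String unparseId(String name, boolean isFunction, String type, int arraySize) {
        String info;
        if (isFunction) {
            info = "function";
        } else if (arraySize >= 0) {
            info = type + "[" + arraySize + "]";
        } else {
            info = type;
        }
        return name + "(" + info + ")";
    }

    public static void main(String[] args) {
        System.out.println(unparseId("f", true, "void", -1));   // f(function)
        System.out.println(unparseId("x", false, "int", 10));   // x(int[10])
        System.out.println(unparseId("y", false, "int", -1));   // y(int)
    }
}
```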

`P4.java`

The main program, P4.java, will be similar to P3.java, except that after parsing, if there are no syntax errors, it will call the name analyzer. After that, if there are no errors so far (either scanning, parsing, or name-analysis errors), it will call the unparser and then the type checker. (Calling the name analyzer and the type checker means calling the appropriate methods of the ASTnode that is the root of the tree built by the parser.)

Modifying the `Errors` Class

Your compiler should quit after the name analyzer has finished if any errors have been detected so far (either by the scanner or the name analyzer). To accomplish this, you can add a static boolean field to the Errors class that is initialized to false and is set to true if the fatal method is ever called (warnings should not change the value of this field). Your main program can check the value of this field and only call the unparser and the type-checker if it is false.
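A minimal sketch of such a flag is shown below. The class is named `ErrorsSketch` to make clear that it is not the Errors class you were given; the field name `fatalSeen`, the accessor `hasFatal`, and the exact message layout are assumptions.

```java
class ErrorsSketch {
    // Becomes true the first time fatal is called; warnings leave it alone.
    private static boolean fatalSeen = false;

    static void fatal(int lineNum, int charNum, String msg) {
        fatalSeen = true;
        System.err.println(lineNum + ":" + charNum + " **ERROR** " + msg);
    }

    static void warn(int lineNum, int charNum, String msg) {
        System.err.println(lineNum + ":" + charNum + " **WARNING** " + msg);
    }

    static boolean hasFatal() {
        return fatalSeen;
    }

    public static void main(String[] args) {
        warn(1, 1, "just a warning");
        System.out.println(hasFatal());   // still false after a warning
        fatal(2, 5, "Undeclared identifier");
        if (hasFatal()) {
            // A main program would stop here, before unparsing and type checking.
            System.out.println("errors found; skipping unparse and type check");
        }
    }
}
```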

Updating the `Makefile`

You will need to update the Makefile you used for program 3 by adding new rules as necessary so that typing "make" creates P4.class.

Writing Test Inputs

You will need to write three input files to test your code: nameErrors.C should contain code with errors detected by the name analyzer, typeErrors.C should contain code with errors detected by the type checker, and test.C should contain code with no errors.

As usual, you will be graded in part on how thoroughly your input files test your code.

Some Advice

Here are a few words of advice about various issues that come up in the assignment:

For this assignment you are free to make any changes you want to the code in ast.java. For example, you may find it helpful to make small changes to the class hierarchy, or to add new fields and/or methods to some classes.
The tree-traversal code you wrote to perform unparsing provides a good model for the traversals that you need to write to handle name analysis and type checking. However, you probably do not want to declare the name-analysis and type-checking methods to be abstract methods of class ASTnode (as we did for unparse). The reason is that this will require you to supply a method for all subclasses of ASTnode. This was OK for unparse, because we wanted to have a pretty-printing method for each node in the tree; however, for type checking (for example), you do not need to visit all parts of the tree, so not all node classes need to have a type-checking method.
However, you will need to declare the name-analysis and type-checking methods to be abstract methods of some of the classes that are lower down in the inheritance hierarchy; for example, you will need to declare an abstract name-analysis method for the DeclNode class, because the method for the DeclListNode class will call that method for each node in the list.
If you are working with a partner, you will have to decide how to divide up the work. One possibility is to have one person write the name analyzer and the other write the type checker. However, this has a number of disadvantages: you will be less likely to understand the part of the static-semantic analyzer that you don't write, you will both be modifying the same ASTnode subclasses (so it may be difficult to put your changed code together), and the work will probably be unequal.
Instead, you might want to divide up some of the "incidental tasks" (like modifying the Errors, Sym, and IdNode classes), then work together to get a small part of the name-analysis phase working (e.g., finding multiply declared global variables). Then you could split up the ASTnode subclasses, and each implement both the name-analysis and type-checking methods for your subset of those classes (you might want to start by choosing just a few each, until you have a better idea which ones will require the most work).
Don't forget to test your work as you go along, rather than waiting until everything is finished!
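The advice above about declaring abstract methods lower in the hierarchy can be sketched as follows. The classes `Decl`, `VarDecl`, `FnDecl`, and `DeclList` are simplified stand-ins for DeclNode, its subclasses, and DeclListNode, and the `log` parameter is a hypothetical placeholder for the symbol table that a real name-analysis method would receive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AbstractDeclSketch {
    // Stand-in for DeclNode: every kind of declaration must supply nameAnalysis.
    static abstract class Decl {
        abstract void nameAnalysis(List<String> log);
    }

    static class VarDecl extends Decl {
        private final String name;

        VarDecl(String name) {
            this.name = name;
        }

        @Override
        void nameAnalysis(List<String> log) {
            log.add("var " + name);
        }
    }

    static class FnDecl extends Decl {
        private final String name;

        FnDecl(String name) {
            this.name = name;
        }

        @Override
        void nameAnalysis(List<String> log) {
            log.add("fn " + name);
        }
    }

    // Stand-in for DeclListNode: it relies on the abstract method without
    // knowing which kind of declaration each element is.
    static class DeclList {
        private final List<Decl> decls;

        DeclList(List<Decl> decls) {
            this.decls = decls;
        }

        void nameAnalysis(List<String> log) {
            for (Decl d : decls) {
                d.nameAnalysis(log);
            }
        }
    }

    public static void main(String[] args) {
        DeclList program = new DeclList(Arrays.asList(new VarDecl("x"), new FnDecl("f")));
        List<String> log = new ArrayList<>();
        program.nameAnalysis(log);
        System.out.println(log);   // [var x, fn f]
    }
}
```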

Announcements

Includes: Additions, Revisions, and FAQs (Frequently Asked Questions).
Please check here frequently.

11/9/2004 Program released.

Handin

What to turn in

See the assignments page for information about how to submit your code. The late policy is also found on the assignments page.

Electronically submit all of the files that are needed to create and run P4.class as well as your Makefile and your test programs (nameErrors.C, typeErrors.C, and test.C). Do not copy any ".class" files, and do not create any subdirectories in your handin directory.

If you are working with a partner only one of you should hand in files. Include a comment at the top of P4.java with the names of both partners.

General information on program grading criteria can be found on the Grading Criteria for Programs page.