CS 536: Program 4
Name Analysis and Type Checking

```
Assigned: Monday, March 26, 2001.  
```

Due: Monday, April 9,  7pm.  Not accepted after 7pm on Thursday, April 12.

Overview

We are finally in the backend, one step closer to generating the real code! Your task now is to write a semantic checker.

Who needs a semantic checker? Programmers who make mistakes! The errors that the semantic analyzer can find include forgetting to declare a variable, or forgetting what the types of your class fields are. Note that all programmers make mistakes. Typically, many mistakes. So, all programmers need a semantic checker. You, too. The semantic checker will find your errors before they show up as nasty bugs in your programs.

The code generator needs the semantic checker, too. It needs it because it does not know how to machine code for badly formed intermediate code. E.g., a code generator typically does not how to generate MIPS instructions that compare a floating-point number with a String.

For this assignment you will do the following:

Write a two-pass static-semantic analyzer for Simple programs represented as abstract-syntax trees.
Write a symbol table. You should simply modify the symbol table entry from program 1 (by including some new fields and methods into the symbol-table entry class).
Modify the IdNode class in ast.java (by including a new field, and by modifying its decompile method).
Write a new main program, P4.java (an extension of P3.java).
Modify the Errors class.
Update the Makefile used for program 3 to include any new commands needed for program 4.
Write three test inputs: nameErrors.sim, typeErrors.sim, and test.sim to test your new code.

The two passes of the static-semantic analyzer that you will write are the (1) name analyzer and the (2) type checker. The name analyzer will perform the following tasks (the semantic rules of Simple are in lecture 17, slide 4!):

Build symbol tables. You can either use the "list of hashtables" approach, or (since Simple programs only include one class, and since local variables can only be declared at the beginning of a method), you can simply use one symbol table for the class name, one symbol table for all field and method names, and, for each method, one local table for the method's parameters and local variables.
Find multiply declared names and uses of undeclared names. Unlike Java, the Simple language does not allow the same name to be declared more than once per scope, even if the names are of different "kinds", and it requires that all names be declared before they are used.
Update the IdNodes of the abstract-syntax tree to include pointers to the corresponding symbol-table entries (i.e., to objects of type Sym).

The type checker will determine the type of every expression represented in the abstract-syntax tree, and will use this information to identify type errors, as described below.

The main program, P4.java, will be similar to P3.java, except that after parsing, if there are no syntax errors, it will call the name analyzer before calling the decompiler. After calling decompiler, if there are no errors so far, it will then call the type checker. (Calling the name analyzer and the type checker means calling the appropriate methods of the ASTnode that is the root of the tree built by the parser.)

The decompiler will be slightly different from the version written for program 3: when decompiling an IdNode, the type of the node will be printed (in parentheses) after the name. This will help you to see whether your name analyzer is working correctly.

You will also write three input files to test your code: nameErrors.sim should contain code with errors detected by the name analyzer, typeErrors.sim should contain code with errors detected by the type checker, and test.sim should contain code with no errors.

You will be graded on the correctness of your code, and on how thoroughly your input files test your code.

Changes to P1 to P3.

In order to simplify the code generation in P5, I decided to make a small change to the front end. We will remove the keyword float and replace it with a keyword boolean. You will have to make a minor change to your scanner, parser, and the AST. My apologies for the late change. The files that you download from the class Web site should have all the necessary changes.

Files

Code for you to use can be found in ~cs536-1/public/PROG4. The files are:

simple.cup: Use this code if there were problems with your own version. You should not have to modify this file.
ast.java: Use this code if there were problems with your own version. (You will need to add to this file or to your own version.)
Types.java: The Types class can be used by your name analyzer and your type checker (see below).
SymbolTable.java: The symbol table from P1. (You will need to add to this file or to your own version. You may have to modify this symbol table class to suit the needs of a compiler.)

Extending Class Sym

A symbol-table entry will need to contain the type of each symbol as well as its name (see below for some advice on how to represent types). For method names, the symbol-table entry will also need to include information about the number of parameters and their types (this could be accomplished by having a list of the symbol-table entries for the parameters). All of this information will be needed in order to implement the type checker. Therefore, you will need to modify the Sym class by adding some new fields. You may also want to add a new method (a ToString method, or a print method, or a method that returns the Sym's type) to be used when decompiling an identifier.

Name Analysis

You must implement your name analyzer by writing appropriate methods for the different subclasses of ASTnode. Some of the methods will process the declarations in the program (checking whether the names are multiply declared, and if not, adding appropriate symbol-table entries), and some will process the statements in the program (checking that every name used in a statement has been declared). The IdNode class will need one of each kind of method (since an ID can appear both in a declaration and in a statement). Those methods will need to update the IdNode by adding a link to the appropriate symbol-table entry. This means that you will need to add a field of type "Sym" to the IdNode class.

Multiply Declared and Undeclared Identifiers

The name analyzer is responsible for finding all instances of multiply declared identifiers (identifiers declared more than once in the same scope). This includes:

Two fields, two methods, or a field and a method with the same name.
Two formal parameters of the same method with the same name.
A formal parameter and a local variable of a method with the same name.
Two local variables of a method with the same name.

An error message (see below) should be given for each multiply declared identifier, and all but the first declaration should be ignored (i.e., do not update the symbol table when you are processing an identifier that is multiply declared, and do not bother to link that IdNode to any symbol-table entry).

The name analyzer is also responsible for finding all instances of uses of undeclared identifiers. Note that this includes calls to methods that have not yet been declared and uses of fields that have not yet been declared. An error message (see below) should be given for each use of an undeclared identifier. (It is up to you whether you treat a use of the class name as an undeclared identifier during name analysis, or instead treat it as a type error during type checking. If you choose the latter, you will need one more error in addition to those described below: "Equality operator applied to a class".)

In addition, the name analyzer is responsible for checking that the program includes a method named main. If the program does not, an error message (see below) should be given.

In summary, your name analyzer should find all of the errors described in the following table; it must report the specified position of the error, and it must give exactly the specified error message (each message should appear on a single line, rather than how it is formatted in the following table). Error messages should have the same format as in the scanner and parser (i.e., they should be issued using a call to Errors.fatal).

Type of Error	Error Message	Position to Report
More than one declaration of an identifier in a given scope	Multiply declared identifier	The first character of the ID in the duplicate declaration
Use of an undeclared identifier	Undeclared identifier	The first character of the undeclared identifier
No main method in the program	No main method declared	0:0

Changes to the Errors Class

Your compiler should quit after decompiling the program if any errors have been detected so far (either by the scanner, the parser, or the name analyzer). To accomplish this, you can add a static boolean field to the Errors class that is initialized to false and is set to true if the fatal method is ever called (warnings should not change the value of this field). Your main program can check the value of this field (by calling a new method of the Errors class that returns the field's value) and only call the type-checker if it is false.

Type Checking

You must implement your type checker by writing appropriate member methods for the different subclasses of ASTnode. Your type checker should find all of the type errors described in the following table; it must report the specified position of the error, and it must give exactly the specified error message. (Each message should appear on a single line, rather than how it is formatted in the following table.) Note: if your name analyzer does not give an "Undeclared identifier" error message for uses of the class name, then your type checker should give the appropriate error message for any use of the class name.

Type of Error	Error Message	Position to Report
Assigning to a method; e.g., "f = 3;", where f is a method name.	Assignment to a method	The first character of the method name.
Assigning from a method; e.g., "x = f;", where f is a method name.	Assignment from a method	The first character of the method name.
Writing a method; e.g., "System.out.println(f)", where f is a method name.	Attempt to write a method	The first character of the method name.
Equality or inequality operator applied to a method; e.g., "f == exp" or "exp != g", where f and g are method names.	Equality operator applied to a method	The first character of the method name.
Calling something other than a method; e.g.: "x();", where x is not a method name. Note: In this case, you should not type-check the actual parameters.	Attempt to call a non-method	The first character of the variable name.
Calling a method with the wrong number of arguments. Note: In this case, you should not type-check the actual parameters.	Method call with wrong number of args	The first character of the method name.
Calling a method with an argument of the wrong type. Note: you should only check for this error if the number of arguments is correct; if there are several arguments with the wrong type, you must give an error message for each such argument.	Type of actual does not match type of formal	The first character of the first identifier or literal in the actual expression with the wrong type.
Applying an arithmetic operator (+, -, , /) to an operand with type other than int*.	Arithmetic operator applied to non-numeric operand	The first character of the first identifier or literal in an operand that is an expression of the wrong type.
Applying a relational operator (<, >, <=, >=) to an operand with type other than int.	Relational operator applied to non-numeric operand	The first character of the first identifier or literal in an operand that is an expression of the wrong type.
Applying a logical operator (!, &&, \|\|) to an operand with type other than boolean.	Logical operator applied to non-bool operand	The first character of the first identifier or literal in an operand that is an expression of the wrong type.
Using a non-boolean expression as the condition of an if.	Non-bool expression used as an if condition	The first character of the first identifier or literal in the condition.
Using a non-boolean expression as the condition of a while.	Non-bool expression used as a while condition	The first character of the first identifier or literal in the condition.
Mixing modes: applying an equality operator (==, !=) to operands of two different, non-method types (e.g., "j == true", where j is of type int); assigning a value of one, non-method type to a variable of another, non-method type (e.g., "j = true", where j is of type int).	Type mismatch	The first character of the first identifier or literal in the left-hand expression.

The Types Class

You will need a way to represent the different types of the identifiers and expressions found in a program. It is up to you how you do this, but one possibility is to use the values in the class defined in Types.java. For example, you would use int as the type of the new field added to each symbol-table entry, and also as the type returned by the type-check methods of ExpNodes. You would use: Types.IntType as the value that represents the int type.

Note that the purpose of Types.ErrorType is to prevent a single type error in an expression or statement from triggering multiple error messages. For example, each of the following expressions should cause only one error message:

(true + 3) * 4

true && (false || 3)

The type given to (true + 3) should be ErrorType, and the type-check method for the multiplication node should not report "Arithmetic operator applied to non-numeric operand" for the first operand. But note that the following should cause two error messages (one for each of the non-integer operands of the plus operator):

true + "hello"

To provide some help with this issue, here is an example input file, along with the corresponding error messsages. (Note: This is not meant to a complete test of the type checker; it is provided merely to help you understand some of the messages you need to report.)

Some Advice

Here are few words of advice about various issues that come up in the assignment:

For this assignment you are free to make any changes you want to the code in ast.java. For example, you may find it helpful to make small changes to the class hierarchy, or to add new fields and/or methods to some classes.
The tree-traversal code you wrote to perform decompiling provides a good model for the traversals that you need to write to handle name analysis and type checking. However, you probably do not want to declare the name-analysis and type-checking methods to be abstract methods of class ASTnode (as we did for decompile). The reason is that this will require you to supply a method for all subclasses of ASTnode. This was OK for decompile, because we wanted to have a pretty-printing method for each node in the tree; however, for type checking (for example), the type-checking traversal does not need to visit all parts of the tree, so not all node classes need furnish a type-checking method.
However, you will need to declare the name-analysis and type-checking methods to be abstract methods of some of the classes that are lower down in the inheritance hierarchy; for example, you will need to declare an abstract symbol-table-building method for the DeclNode class, because the method for the DeclListNode class will call that method for each node in the list.
You may find it useful to write a symbol-table's print method that will print out all entries including their. Note that such a print method may already be available in the standard Java API. Include a call to the symbol table print method in your debugging code: e.g., print each symbol table after it is created (i.e., add a public static boolean DEBUG field to the ASTnode class, and include code in the appropriate name-analysis methods to print symbol tables if DEBUG is true).
If you are working with a partner, you will have to decide how to divide up the work. One possibility is to have one person write the name analyzer and the other write the type checker. However, this has a number of disadvantages: you will be less likely to understand the part of the static-semantic analyzer that you don't write, you will both be modifying the same ASTnode subclasses (so it may be difficult to put your changed code together), and the work will probably be unequal (the type checker will probably be harder to implement than the name analyzer).
Instead, you might want to divide up some of the "incidental tasks" (like modifying the Errors, Sym, and IdNode classes), then work together to get a small part of the name-analysis phase working (e.g., finding multiply declared field names). Then you could split up the ASTnode subclasses, and each implement both the name-analysis and type-checking methods for your subset of those classes (you might want to start by choosing just a few each, until you have a better idea which ones will require the most work).
Don't forget to test your work as you go along, rather than waiting until everything is finished!

What To Turn In

To hand in your program, copy all of the files that are needed to create and run P4.class, (including your Makefile) as well as your test programs (nameErrors.sim, typeErrors.sim, and test.sim) to

  ~cs536-1/public/P4/<your login name>

using your actual login name in place of <your login name>. Please do not create any subdirectories in your handin directory, and do not copy any .class files.

If you are working with a partner only one of you should hand in files. Include a comment at the top of P4.java with the names of both partners.

CS 536: Program 4Name Analysis and Type Checking

Contents