For this assignment you will write a name analyzer for base programs represented as abstract-syntax trees. Your main task will be to write name analysis methods for the nodes of the AST. In addition you will need to:
- Modify the Sym class (by including some new fields and methods and/or by defining some subclasses).
- Modify the IdNode class in ast.java (by including a new Sym field and by modifying its unparse method).
- Write a new main program, P4.java (an extension of P3.java).
- Modify the ErrMsg class.
- Write two test inputs, nameErrors.base and test.base, to test your new code.
The files are:
Sym.java
SymTable.java
DuplicateSymNameException.java
EmptySymTableException.java
ErrMsg.java
P4.java
base.cup
base.jlex
ast.java
Makefile
Here is the p4.zip file. It is recommended that you start your project with it.
The name analyzer will perform the following tasks:
1. Build symbol tables. You will use the "list of hashtables" approach (using the SymTable class from program 1).
2. Find multiply declared names, uses of undeclared names, bad tuple accesses, and bad declarations. Like C, the base language allows the same name to be declared in non-overlapping or nested scopes. The formal parameters of a function are considered to be in the same scope as the function body. All names must be declared before they are used. A bad tuple access happens when either the left-hand side of the colon-access is not a name already declared to be of a tuple type or the right-hand side of the colon-access is not the name of a field for the appropriate type of tuple. A bad declaration is a declaration of anything other than a function to be of type void, as well as the declaration of a variable to be of a bad tuple type (the name of the tuple type doesn't exist or is not a tuple type).
3. Add IdNode links: For each IdNode in the abstract-syntax tree that represents a use of a name (not a declaration), add a "link" to the corresponding symbol-table entry. (As stated above, you will need to modify the IdNode class in ast.java to have a new field of type Sym. That is the field that your name analyzer will fill in with a link to the Sym returned by the symbol table's globalLookup method.)
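The three tasks above can be sketched in miniature as follows. This is only an illustration under stated assumptions: SketchSymTable, SketchSym, SketchId, and NameAnalysisSketch are hypothetical stand-ins, not the provided SymTable, Sym, or IdNode classes, and the method names other than globalLookup are assumed. The provided SymTable may differ (for example, it may signal problems with the exception classes listed above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a symbol-table entry: it only remembers a type name.
class SketchSym {
    final String type;
    SketchSym(String type) { this.type = type; }
}

// Hypothetical stand-in for a "list of hashtables" symbol table.
// The innermost scope is always at the front of the list.
class SketchSymTable {
    private final LinkedList<Map<String, SketchSym>> scopes = new LinkedList<>();

    SketchSymTable() { addScope(); }  // start with one (global) scope

    void addScope() { scopes.addFirst(new HashMap<>()); }
    void removeScope() { scopes.removeFirst(); }

    // Look only in the innermost scope (useful when checking a declaration).
    SketchSym localLookup(String name) { return scopes.getFirst().get(name); }

    // Look through every scope, innermost first (useful when checking a use).
    SketchSym globalLookup(String name) {
        for (Map<String, SketchSym> scope : scopes) {
            SketchSym sym = scope.get(name);
            if (sym != null) return sym;
        }
        return null;
    }

    void addDecl(String name, SketchSym sym) { scopes.getFirst().put(name, sym); }
}

// Hypothetical stand-in for an IdNode: a name plus the link set by name analysis.
class SketchId {
    final String name;
    SketchSym link;  // stays null for a use of an undeclared name
    SketchId(String name) { this.name = name; }
}

class NameAnalysisSketch {
    final SketchSymTable table = new SketchSymTable();
    final List<String> errors = new ArrayList<>();

    // Task 2 (declarations): report a duplicate, otherwise add an entry.
    void declare(String name, String type) {
        if (table.localLookup(name) != null) {
            errors.add("Multiply-declared identifier");
        } else {
            table.addDecl(name, new SketchSym(type));
        }
    }

    // Tasks 2 and 3 (uses): report an undeclared name, otherwise set the link.
    void use(SketchId id) {
        SketchSym sym = table.globalLookup(id.name);
        if (sym == null) {
            errors.add("Undeclared identifier");
        } else {
            id.link = sym;
        }
    }
}
```

In the real analyzer the errors would be reported through ErrMsg.fatal with a position rather than collected in a list; the list here just keeps the sketch easy to inspect.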
You must implement your name analyzer by writing appropriate methods for the different subclasses of ASTnode. Exactly what methods you write is up to you (as long as they do name analysis as specified).
It may help to start by writing the name analysis method for ProgramNode, then work "top down", adding a method for DeclListNode (the child of a ProgramNode), then for each kind of DeclNode (except TupleDeclNode), and so on (and then handle TupleDeclNode and perhaps other tuple-related nodes at the end).
Be sure to think about which nodes' methods need to add a new hashtable to the symbol
table (i.e., when is a new scope being entered) and which methods need to remove a
hashtable from the symbol table (i.e., when is a scope being exited).
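As one possible illustration of entering and exiting a scope, the sketch below handles a function declaration: the function name goes into the enclosing scope, then one new scope is pushed that holds both the formals and the locals (since formals share the body's scope), and that scope is popped on the way out. FnScopeSketch and its methods are hypothetical names, and a stack of name sets stands in for the real SymTable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a stack of name sets stands in for the list of hashtables.
class FnScopeSketch {
    private final Deque<Set<String>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    FnScopeSketch() { scopes.push(new HashSet<>()); }  // global scope

    int depth() { return scopes.size(); }

    // Declare a name in the innermost scope; a duplicate is reported, not added.
    void declare(String name) {
        if (!scopes.peek().add(name)) {
            errors.add("Multiply-declared identifier");
        }
    }

    // Name analysis for a function declaration.
    void analyzeFunction(String fnName, List<String> formals, List<String> locals) {
        declare(fnName);               // the function name lives in the enclosing scope
        scopes.push(new HashSet<>());  // scope entered: formals and body share it
        for (String formal : formals) declare(formal);
        for (String local : locals) declare(local);
        scopes.pop();                  // scope exited
    }
}
```

Note that the body is still processed even when declare(fnName) reports a duplicate, which matches the requirement to keep analyzing a multiply declared function.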
Some of the methods will process the declarations in the program (checking for bad
declarations and checking whether the names are multiply declared, and if not,
adding appropriate symbol-table entries) and some will process the statements in the
program (checking that every name used in a statement has been declared and adding links).
Note that you should not add a link for an IdNode that represents a use of an undeclared name.
tuple Handling Issues
Name analysis issues surrounding tuples come up in several situations:
- Defining a tuple type: for example
  tuple Point { integer x. integer y. }.
  When defining a tuple, the name of the tuple type can't be a name that has already been declared. The fields of a tuple must be unique to that particular tuple; however, they can be a name that has been declared outside of the tuple definition. For this reason, a recommended approach is to have a separate symbol table associated with each tuple definition and to store this symbol table in the symbol for the name of the tuple type.
- Declaring a variable to be of a tuple type: for example
  tuple Point pt.
  When declaring a variable of a tuple type, in addition to determining if the variable name has been previously declared (and issuing a "multiply declared" error if it is), you should also check that the name of the tuple type has been previously declared and is actually the name of a tuple type.
- Accessing the fields of a tuple: for example
  pt:x = 7.
  When doing name analysis on something like LHS:RHS, you will need to check that LHS is the name of a variable that has previously been declared to be of a tuple type and that RHS is the name of a field in the tuple type associated with LHS.
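The recommended approach (a separate field table stored with each tuple definition) and the LHS:RHS check might look roughly like this. It is a simplified sketch: TupleDefSketch and TupleAccessSketch are hypothetical names, scoping is flattened to a single level, and types are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol for a tuple type: it owns the table of that tuple's fields.
class TupleDefSketch {
    final Map<String, String> fields = new HashMap<>();  // field name -> field type
}

// Hypothetical, single-scope sketch of checking a colon-access LHS:RHS.
class TupleAccessSketch {
    final Map<String, TupleDefSketch> tupleTypes = new HashMap<>();  // tuple type name -> definition
    final Map<String, String> varTypes = new HashMap<>();            // variable name -> type name

    // Returns the error message for lhs:rhs, or null if the access is fine.
    String checkAccess(String lhs, String rhs) {
        String typeName = varTypes.get(lhs);
        if (typeName == null) {
            return "Undeclared identifier";
        }
        TupleDefSketch def = tupleTypes.get(typeName);
        if (def == null) {
            return "Colon-access of non-tuple type";
        }
        if (!def.fields.containsKey(rhs)) {
            return "Invalid tuple field name";
        }
        return null;
    }
}
```

Because the field names live in the tuple's own table, a field can reuse a name declared outside the tuple without any conflict.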
Your name analyzer should find all of the errors described in the table given below; it should report the specified position of the error, and it should give exactly the specified error message (each message should appear on a single line, rather than how it is formatted in the following table). Error messages should have the same format as in the scanner and parser (i.e., they should be issued using a call to ErrMsg.fatal). If a declaration is both "bad" (e.g., a non-function declared void) and is a declaration of a name that has already been declared in the same scope, you should give two error messages (first the "bad" declaration error, then the "multiply declared" error).
Type of Error | Error Message | Position to Report |
---|---|---|
More than one declaration of an identifier in a given scope (note: includes identifier associated with a tuple definition) | Multiply-declared identifier | The first character of the ID in the duplicate declaration |
Use of an undeclared identifier | Undeclared identifier | The first character of the undeclared identifier |
Bad tuple access (LHS of colon-access is not of a tuple type) | Colon-access of non-tuple type | The first character of the ID corresponding to the LHS of the colon-access |
Bad tuple access (RHS of colon-access is not a field of the appropriate tuple) | Invalid tuple field name | The first character of the ID corresponding to the RHS of the colon-access |
Bad declaration (variable or parameter of type void) | Non-function declared void | The first character of the ID in the bad declaration |
Bad declaration (attempt to declare variable of a bad tuple type) | Invalid name of tuple type | The first character of the ID corresponding to the tuple type in the bad declaration |
Note that the names themselves should not be printed as part of the error messages.
During name analysis, if a function name is multiply declared you should still process the formals and the body of the function; don't add a new entry to the current symbol table for the function, but do add a new hashtable to the front of the SymTable's list for the names declared in the body (i.e., the parameters and other local variables of the function).
If you find a bad variable declaration (a variable of type void or of a bad tuple type), give an error message and add nothing to the symbol table.
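The ordering rules for variable declarations (report the "bad" error first, then the "multiply declared" error, and add nothing for a bad declaration) can be sketched as below. VarDeclCheckSketch and its fields are hypothetical, with a single flat scope and string type names used only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-scope sketch of checking one variable declaration.
class VarDeclCheckSketch {
    final Set<String> scope = new HashSet<>();       // names declared so far
    final Set<String> tupleTypes = new HashSet<>();  // names known to be tuple types
    final List<String> errors = new ArrayList<>();

    // typeName is "void", "integer", "logical", or (when isTupleType) a tuple type name.
    void declareVar(String typeName, boolean isTupleType, String name) {
        boolean bad = false;
        if (typeName.equals("void")) {
            errors.add("Non-function declared void");
            bad = true;
        } else if (isTupleType && !tupleTypes.contains(typeName)) {
            errors.add("Invalid name of tuple type");
            bad = true;
        }
        // The duplicate check runs even for a bad declaration, so both errors
        // can be reported, with the "bad" one first.
        boolean duplicate = scope.contains(name);
        if (duplicate) {
            errors.add("Multiply-declared identifier");
        }
        if (!bad && !duplicate) {
            scope.add(name);  // only a good, new declaration is added
        }
    }
}
```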
tuple
- If the name of a tuple is multiply declared, do not create a SymTable entry for the tuple. You do not have to process the variables of the tuple in this case.
- A variable inside a tuple with the same name as a variable or a function outside the tuple is legal.
- A variable x inside a tuple with the same name as another variable inside the tuple is illegal. In this case, create a SymTable entry for the tuple and add all variables up to but excluding the second occurrence of x and then continue with the rest of the fields.
- If a tuple is used without declaration like a:b, then you can report two errors (undeclared identifier and colon-access of non-tuple type) or you can just report undeclared identifier.
- The name of a tuple is in a scope that is one level outside the scope of the tuple itself. Thus, a tuple and one of its fields can have the same name.
function
- If a function is multiply declared, do not create a SymTable entry in the outer scope for this second occurrence. You should process the formals and the local variables for both the functions.
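- If a function's name has already been declared as something else in the same scope, do not create a SymTable entry for the function. However, continue processing the body of the function.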
- If a function with a formal parameter a also has a local variable declared as a, then create the SymTable for the function and add the formal parameter but not the local variable, report the error, and then continue with processing.
- If two parameters or two local variables of a function have the same name, create the SymTable, add the first parameter/local variable, report the error, and then continue with processing.
if/else/while
- if/else and while statements have their own scope. So, names can be reused inside these statements.
- The if part and the else part have different scopes. So, the same name can be declared in both of them.
Sym Class
It is up to you how you store information in each symbol-table entry (each Sym). To implement the changes to the unparser described below you will need to know each name's type. For function names, this includes the return type and the number of parameters and their types.
You can modify the Sym class by adding some new fields (e.g., a kind field) and/or by declaring some subclasses (e.g., a subclass for functions that has extra fields for the return type and the list of parameter types). You will probably also want to add new methods that return the values of the new fields and it may be helpful to change the toString method so that you can print the contents of a Sym for debugging purposes.
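One way to organize this, shown only as a sketch, is a base class that records a type name plus a function subclass that also records parameter types. TypedSym and FnTypedSym are hypothetical names, not the provided Sym class. The toString spacing here mirrors the unparse example later on this page; check it against the expected output before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base symbol: a type name such as "integer", "logical", or a tuple type name.
class TypedSym {
    private final String type;

    TypedSym(String type) { this.type = type; }

    String getType() { return type; }

    @Override
    public String toString() { return type; }
}

// Hypothetical function symbol: the inherited type is the return type.
class FnTypedSym extends TypedSym {
    private final List<String> paramTypes;

    FnTypedSym(String returnType, List<String> paramTypes) {
        super(returnType);
        this.paramTypes = new ArrayList<>(paramTypes);  // defensive copy
    }

    int getNumParams() { return paramTypes.size(); }

    List<String> getParamTypes() { return new ArrayList<>(paramTypes); }

    // Renders e.g. "integer,logical->integer"; a parameterless function
    // renders as " ->void" (assumed spacing, taken from the example output).
    @Override
    public String toString() {
        String params = paramTypes.isEmpty() ? " " : String.join(",", paramTypes);
        return params + "->" + getType();
    }
}
```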
IdNode Class
Two changes to the IdNode class are needed:
- Adding a new field of type Sym (to link the node with the corresponding symbol-table entry), and
- Changing the unparse method so that every use of an ID has its type (in angle brackets, i.e., < >) after its name. (The point of this is to help you to see whether your name analyzer is working correctly; i.e., does it correctly match each use of a name to the corresponding declaration and does it correctly set the link from the IdNode to the information in the symbol table.)
  - For names of functions, the information should be of the form: param1Type, param2Type, ..., paramNType -> returnType.
  - For names of global variables, parameters, and local variables of a non-tuple type, the information should be integer or logical.
  - For a global or local variable that is of a tuple type, the information should be the name of the tuple type.
For example, given a program that contains this code:
    tuple Point {
        integer x.
        integer y.
    }.
    integer f{integer x, logical b} [ ]
    void g{} [
        integer a.
        logical b.
        tuple Point p.
        p:x = a.
        b = a == 3.
        f(a + p:y*2, b).
        g().
    ]
The unparser should print:
    tuple Point {
        integer x.
        integer y.
    }.
    integer f{integer x, logical b} [ ]
    void g{} [
        integer a.
        logical b.
        tuple Point p.
        p<Point>:x<integer> = a<integer>.
        b<logical> = (a<integer> == 3).
        f<integer,logical->integer>((a<integer> + (p<Point>:y<integer> * 2)), b<logical>).
        g< ->void>().
    ]
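A minimal sketch of the unparse change follows. IdSketch and LinkedSymSketch are hypothetical stand-ins for IdNode and Sym, and the unparse(PrintWriter, int) signature is an assumption about the existing code. The idea is simply to print the name and, when a link has been set, the linked symbol's type text in angle brackets.

```java
import java.io.PrintWriter;

// Hypothetical stand-in for Sym: holds the text to show inside < >.
class LinkedSymSketch {
    final String typeText;
    LinkedSymSketch(String typeText) { this.typeText = typeText; }
}

// Hypothetical stand-in for IdNode.
class IdSketch {
    private final String name;
    private LinkedSymSketch link;  // set by name analysis for uses; null otherwise

    IdSketch(String name) { this.name = name; }

    void setLink(LinkedSymSketch sym) { this.link = sym; }

    // Prints "name<type>" for a linked use and just "name" when no link was set.
    // The indent parameter is unused here; it is kept to match the assumed signature.
    void unparse(PrintWriter p, int indent) {
        p.print(name);
        if (link != null) {
            p.print("<" + link.typeText + ">");
        }
    }
}
```

Printing nothing extra when the link is null keeps declarations unannotated, as in the example output above, provided the analyzer leaves the link unset for declaring occurrences.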
The main program, P4.java, will be similar to P3.java except that, after parsing, it will call the name analyzer before calling the unparser. Calling the name analyzer means calling the appropriate method of the ASTnode that is the root of the tree built by the parser.
ErrMsg Class
Your compiler should quit after the name analyzer has finished if any errors have been detected so far (either by the scanner/parser or the name analyzer). To accomplish this, you can add a static boolean field to the ErrMsg class that is initialized to false and is set to true if the fatal method is ever called (warnings should not change the value of this field). Your main program can check the value of this field and only call the unparser if it is false.
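The static-flag idea might be sketched like this. ErrMsgSketch, the hasErrors accessor, and the printed message layout are all assumptions for illustration; keep whatever parameters and output format the provided ErrMsg.java already uses.

```java
// Hypothetical sketch of an ErrMsg-like class with an error flag.
class ErrMsgSketch {
    // False until fatal is called at least once.
    private static boolean errorSeen = false;

    static void fatal(int lineNum, int charNum, String msg) {
        errorSeen = true;
        System.err.println(lineNum + ":" + charNum + " ERROR " + msg);  // placeholder format
    }

    // Warnings are printed but deliberately leave the flag unchanged.
    static void warn(int lineNum, int charNum, String msg) {
        System.err.println(lineNum + ":" + charNum + " WARNING " + msg);  // placeholder format
    }

    static boolean hasErrors() { return errorSeen; }
}
```

The main program could then guard the unparse call with a check such as `if (!ErrMsgSketch.hasErrors()) { ... }` after name analysis finishes.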
You will need to write two input files to test your code:
- nameErrors.base should contain code with errors detected by the name analyzer. This means that it should include bad and multiply declared names for all of the different kinds of names, and in all of the different places that declarations can appear. It should also include uses of undeclared names in all kinds of statements and expressions as well as bad tuple accesses.
- test.base should contain code with no errors that exercises all of the name-analysis methods that you wrote for the different AST nodes. This means that it should include (good) declarations of all of the different kinds of names in all of the places that names can be declared and it should include (good) uses of names in all kinds of statements and expressions.
Note that your nameErrors.base should cause error messages to be output, so to know whether your name analyzer behaves correctly, you will need to know what output to expect.
As usual, you will be graded in part on how thoroughly your input files test your code.
Here are a few words of advice about various issues that come up in the assignment:
- For this assignment you are free to make any changes you want to the code in ast.java.
- The tree-traversal code you wrote to perform unparsing provides a good model for the traversal that you need to write to handle name analysis. However, you might not want to declare the name-analysis methods to be abstract methods of class ASTnode (as we did for unparse). This is because you will not need those methods for all nodes; e.g., you probably won't want a name-analysis method for all of the subclasses of the TypeNode class. However, you will need to declare the name-analysis methods to be abstract methods of some of the classes that are lower down in the inheritance hierarchy; for example, you will need to declare an abstract name-analysis method for the DeclNode class, because the method for the DeclListNode class will call that method for each node in the list.
- If you are working with a partner, you will have to decide how to divide up the work. You might want to divide up some of the "incidental tasks" (like modifying the ErrMsg, Sym, and IdNode classes), then work together to get a small part of the name-analysis phase working (e.g., finding multiply declared global variables). Then you could split up the ASTnode subclasses and each implement the name-analysis methods for your subset of those classes (you might want to start by choosing just a few each, until you have a better idea which ones will require the most work).
Don't forget to test your work as you go along, rather than waiting until everything is finished!
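The advice above about where to declare abstract name-analysis methods can be illustrated with a toy hierarchy. All class names here (NodeStub, DeclStub, and so on) are hypothetical and much simpler than the classes in ast.java; a set of names stands in for the symbol table.

```java
import java.util.List;
import java.util.Set;

// Like ASTnode: no name-analysis method is declared at the root.
abstract class NodeStub { }

// Like TypeNode: needs no name-analysis method at all.
class TypeStub extends NodeStub { }

// Like DeclNode: every kind of declaration must supply name analysis,
// because the list node below calls it on each element.
abstract class DeclStub extends NodeStub {
    abstract void nameAnalysis(Set<String> scope, List<String> errors);
}

class VarDeclStub extends DeclStub {
    private final String name;
    VarDeclStub(String name) { this.name = name; }

    @Override
    void nameAnalysis(Set<String> scope, List<String> errors) {
        if (!scope.add(name)) errors.add("Multiply-declared identifier");
    }
}

class FnDeclStub extends DeclStub {
    private final String name;
    FnDeclStub(String name) { this.name = name; }

    @Override
    void nameAnalysis(Set<String> scope, List<String> errors) {
        if (!scope.add(name)) errors.add("Multiply-declared identifier");
        // A real version would also push a scope and analyze formals and body.
    }
}

// Like DeclListNode: relies on the abstract method declared in DeclStub.
class DeclListStub extends NodeStub {
    private final List<DeclStub> decls;
    DeclListStub(List<DeclStub> decls) { this.decls = decls; }

    void nameAnalysis(Set<String> scope, List<String> errors) {
        for (DeclStub decl : decls) {
            decl.nameAnalysis(scope, errors);
        }
    }
}
```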
Please read the following handing in instructions carefully.
Turn in the following files to the appropriate assignment in Gradescope (note: these should be the only files changed/needed to run with the provided materials):
ast.java
ErrMsg.java
Sym.java
P4.java
nameErrors.base
test.base
Please ensure that you do not turn in any sub-directories or put your Java files in any packages.
If you are working in a pair, make sure both partners are indicated when submitting to Gradescope.
General information on program grading criteria can be found on the Assignments page.
For more advice on Java programming style, see these style and commenting standards (which are essentially identical to the standards used in CS200 / CS300 / CS400).