Symbol Tables and Static Checks
The parser ensures that the input program is syntactically correct,
but there are other kinds of correctness that it cannot (or usually
does not) enforce.
For example:
The next phase of the compiler after the parser, sometimes called the
static semantic analyzer, is in charge of checking for these kinds of errors.
The checks can be done in two phases, each of which involves traversing
the abstract-syntax tree created by the parser:
The purpose of the symbol table is to keep track of names declared in the
program.
This includes names of classes, fields, methods, and variables.
Each symbol table entry associates a set of attributes with one name;
for example:
In most languages, the same name can be declared multiple times if
the declarations occur in different scopes, and/or involve
different kinds of names.
For example, in Java you can use the same name for a class,
a field of the class, a method of the class, and a local variable
of the method (this is not recommended, but it is legal):
In both Java and C++ (but not in Pascal or C), you can use the same
name for more than one method as long as the number and/or types of
parameters are unique.
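One way to picture the overloading rule is to key each method declaration on its name together with its parameter types. The following is only a minimal sketch under that assumption; the class name OverloadTable and the method declareMethod are hypothetical and are not part of any compiler described in these notes. The return type is deliberately left out of the key, since neither Java nor C++ allows two methods to differ only in return type.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: records method signatures and rejects exact repeats.
class OverloadTable {
    // Each signature is stored as "name(type1,type2,...)".
    private final Set<String> signatures = new HashSet<>();

    // Returns true if the declaration is new, false if a method with the
    // same name and the same parameter types was already declared.
    boolean declareMethod(String name, List<String> paramTypes) {
        String signature = name + "(" + String.join(",", paramTypes) + ")";
        return signatures.add(signature);
    }

    public static void main(String[] args) {
        OverloadTable table = new OverloadTable();
        System.out.println(table.declareMethod("attack", List.of("int")));    // prints "true"
        System.out.println(table.declareMethod("attack", List.of("double"))); // prints "true"
        System.out.println(table.declareMethod("attack", List.of("int")));    // prints "false"
    }
}
```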
In Java you cannot declare a variable x in a method if there is
also a parameter named x, or another variable named x declared in an
enclosing block or for loop.
However, such declarations are allowed in C++.
For example, the following is a legal C++ function, but not a legal
Java method:
In the example given above, the outermost scope includes just the name "f",
and function f itself has three (nested) scopes:
Question 1:
Consider the names declared in the following code.
For each, determine whether it is legal according to the rules used in Java.
Question 2:
Consider the following C++ code.
For each use of a name, determine which declaration it corresponds to (or
whether it is a use of an undeclared name).
Not all languages use static scoping.
Some older languages, including APL, Snobol, and early dialects of Lisp, use what is called dynamic scoping.
Under dynamic scoping, a use of a variable that has no corresponding
declaration in the same function corresponds to the declaration in the
most recently called function that is still active.
For example, consider the following code:
Assuming that dynamic scoping is used, what is output by the following
program?
It is generally agreed that dynamic scoping is a bad idea;
it can make a program very difficult to understand, because a single use
of a variable can correspond to many different declarations
(with different types)!
The languages that use dynamic scoping are mostly old languages;
most recently designed languages use static scoping.
Another issue that is handled differently by different languages
is whether names can be used before they are defined.
For example, in Java, a method or field name can be used
before the definition appears, but this is not true for
a variable:
In what follows, we will assume that we are dealing with a language that:
In addition to the assumptions listed at the end of the previous section,
we will assume that:
The idea behind this approach is that the symbol table consists of a list
of hashtables, one for each currently visible scope.
When processing a scope S, the structure of the symbol table is:
Here are the operations that need to be performed on scope entry/exit,
and to process a declaration/use:
Here are the times required for each operation:
For all three questions below, assume that the symbol table is
implemented using a list of hashtables.
Question 1:
Recall that Java does not allow the same name to be used for a local
variable of a method, and for another local variable declared inside a
nested scope in the method body.
Even with this restriction, it is not a good idea to put all of
a method's local variables (whether they are declared at the beginning of the
method, or in some nested scope within the method body) in the
same table. Why not?
Question 2:
C++ does not use exactly the scoping rules that we have been assuming.
In particular,
C++ does allow a function to have both a parameter and a
local variable with the same name (and any uses of the name
refer to the local variable).
Consider the following code.
Draw the symbol table as it would be after processing the declarations
in the body of f under:
Question 3:
Which of the four operations (scope entry, process a declaration,
process a use, scope exit) described above would change (and how would
it change) if Java
rules for name reuse were used instead of C++ rules (i.e., if
the same name can be used within one scope as long as the uses
are for different kinds of names, and if the same name cannot
be used for more than one variable declaration in nested scopes)?
The idea behind this approach is that when processing a scope S, the
structure of the symbol table is:
For example, given this code:
Note that the level-number attribute stored in each list item enables
us to determine whether the most closely enclosing declaration was made
in the current scope or in an enclosing scope.
Here are the operations that need to be performed on scope entry/exit,
and to process a declaration/use:
Assume that the symbol table is implemented using a hashtable of lists.
Draw pictures to show how the symbol table changes as each declaration
in the following code is processed.
As mentioned in the Introduction, the job of the type-checking phase
is to:
List as many of the operators that can be used in a Java program as you can
think of
(don't forget to think about the logical and relational operators as well
as the arithmetic ones).
For each operator, say what types the operands may have, and what is the type
of the result.
In addition to finding type errors caused by operators being applied
to operands of the wrong type, the type checker must also find type errors
having to do with expressions that, because of their context, must be
boolean, and type errors having to do with method calls.
Examples of the first kind of error include:
Introduction
Below, we will consider how to build symbol tables and how to use them to
find multiply-declared and undeclared variables.
We will then consider type checking.
Symbol Tables
One factor that will influence the design of the symbol table is
what scoping rules are defined for the language being compiled.
Let's consider some different kinds of scoping rules before continuing
our discussion of symbol tables.
Scoping
class Test {
    int Test;
    void Test( ) {
        double Test; // could also be declared int
    }
}
void f( int k ) {       // k is a parameter
    int k = 0;          // also a local variable
    while (...) {
        int k = 1;      // and another local variable, inside the loop
        ...
    }
}
In general, the scope rules of a language determine which declaration of
a named object corresponds to each use.
C++ and Java use what is called static scoping;
that means that the correspondence between uses and declarations is
made at compile time.
C++ uses the "most closely nested" rule to match nested declarations
to their uses: a use of variable x matches the declaration in the
most closely enclosing scope such that the declaration precedes the use.
In C++, there is one, outermost scope that includes all function names
and the names of the global variables (the variables that are declared
outside the functions).
Each function has two or more scopes:
one for the parameters, one for the function body, and possibly additional
scopes for each for loop and each nested block (delimited by curly
braces) in the function.
So a use of variable k inside the while loop matches the declaration
in the loop (has the value 1), while a use of k outside the loop
(either before or after the loop) matches the declaration at the
beginning of the function (has the value 0).
class animal {
    // methods
    void attack(int animal) {
        for (int animal=0; animal<10; animal++) {
            int attack;
        }
    }
    int attack(int x) {
        for (int attack=0; attack<10; attack++) {
            int animal;
        }
    }
    void animal() { }
    // fields
    double attack;
    int attack;
    int animal;
}
int k=10, x=20;
void foo(int k) {
    int a = x;
    int x = k;
    int b = x;
    while (...) {
        int x;
        if (x == k) {
            int k, y;
            k = y = x;
        }
        if (x == k) {
            int x = y;
        }
    }
}
void main() {
    f1();
    f2();
}
void f1() {
    int x = 10;
    g();
}
void f2() {
    String x = "hello";
    f3();
    g();
}
void f3() {
    double x = 30.5;
}
void g() {
    print(x);
}
Under dynamic scoping this program outputs "10 hello".
The first call to g comes from f1, whose copy of x has value 10.
The next call to g comes from f2.
Although f3 is called by f2 before it calls g, the call to f3 is not
active when g is called;
therefore, the use of x in g matches the declaration in f2, and "hello"
is printed.
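The lookup rule can be simulated with an explicit stack of activation frames. The sketch below is an illustration only, written in Java (which is itself statically scoped): the class DynamicScopeDemo and its helpers enter, exit, declare, and lookup are hypothetical names, and each method mimics one function of the example program above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of dynamic scoping: every active call owns a frame
// of local variables, and a name is resolved by searching the frames from
// the most recently called function outward.
class DynamicScopeDemo {
    private final Deque<Map<String, Object>> frames = new ArrayDeque<>();
    final StringBuilder output = new StringBuilder();

    private void enter() { frames.push(new HashMap<>()); }   // function call
    private void exit() { frames.pop(); }                    // function return
    private void declare(String name, Object value) { frames.peek().put(name, value); }

    // Search the still-active frames, newest first.
    private Object lookup(String name) {
        for (Map<String, Object> frame : frames) {
            if (frame.containsKey(name)) {
                return frame.get(name);
            }
        }
        throw new IllegalStateException("undeclared name: " + name);
    }

    // The functions of the example program, one method each.
    void runProgram() { enter(); f1(); f2(); exit(); }
    void f1() { enter(); declare("x", 10); g(); exit(); }
    void f2() { enter(); declare("x", "hello"); f3(); g(); exit(); }
    void f3() { enter(); declare("x", 30.5); exit(); }
    void g()  { enter(); output.append(lookup("x")).append(' '); exit(); }

    public static void main(String[] args) {
        DynamicScopeDemo demo = new DynamicScopeDemo();
        demo.runProgram();
        System.out.println(demo.output.toString().trim()); // prints "10 hello"
    }
}
```

Because the frame for f3 is popped before g is called from f2, its declaration of x is never visible to g.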
void main() {
    int x = 0;
    f1();
    g();
    f2();
}
void f1() {
    int x = 10;
    g();
}
void f2() {
    int x = 20;
    f1();
    g();
}
void g() {
    print(x);
}
class Test {
    void f() {
        val = 0;   // field val has not yet been declared -- OK
        g();       // method g has not yet been declared -- OK
        x = 1;     // variable x has not yet been declared -- ERROR!
        int x;
    }
    void g() {}
    int val;
}
Symbol Table Implementations
Given these assumptions, the symbol-table operations we will need are:
We will look at two ways to design a symbol table: a list of tables,
and a table of lists.
For each approach, we will consider what must be done when entering and
exiting a scope, when processing a declaration, and when processing a use.
To keep things simple, we will assume that each symbol-table entry includes
only:
Method 1: List of Hashtables
front of list          end of list
     _      _      _
    |_|--->|_|--->|_|
     |      _________
     |          |
declarations    |
made in S       declarations made in scopes that enclose S; each hashtable
                in the list corresponds to one scope (i.e., contains all
                of the declarations for that scope)
For example, given this code:
void f(int a, int b) {
    double x;
    while (...) {
        int x, y;
        ...
    }
}
void g() {
    f();
}
After processing the declarations inside the while loop, the symbol table
looks like this:
+-----------+    +--------------+    +---------------------------+
| x: int, 3 |--->| a: int, 2    |--->| f: (int x int) -> void, 1 |
| y: int, 3 |    | b: int, 2    |    +---------------------------+
+-----------+    | x: double, 2 |
                 +--------------+
The declaration of method g has not yet been processed, so it has no
symbol-table entry yet.
Note that because f is a method, its type includes the types of its
parameters (int x int), and its return type (void).
Remember that method names need to go into the hashtable for the outermost
scope (not into the same table as the method's variables).
For example, in the picture above, method name f
is in the symbol table for the outermost scope; name f is not
in the same scope as parameters a and b, and variable x.
This is so that when the use of name f in method g is processed, the name
is found in an enclosing scope's table.
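The list-of-hashtables idea can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the implementation used in the course: the class ListOfTables, the entry class Sym, and the four method names are hypothetical, each entry holds only a type string and a nesting level, and a declaration that repeats a name in the current scope is reported by returning false.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical list-of-hashtables symbol table.
class ListOfTables {
    // One symbol-table entry: a type and the nesting level of its scope.
    static final class Sym {
        final String type;
        final int level;
        Sym(String type, int level) { this.type = type; this.level = level; }
    }

    // The front of the deque is the table for the current (innermost) scope.
    private final Deque<Map<String, Sym>> scopes = new ArrayDeque<>();

    // Scope entry: add a new, empty table to the front of the list.
    void enterScope() { scopes.push(new HashMap<>()); }

    // Scope exit: discard the table at the front of the list.
    void exitScope() { scopes.pop(); }

    // Declaration: only the first table is checked for a repeated name.
    // Assumes enterScope has been called at least once.
    boolean declare(String name, String type) {
        Map<String, Sym> current = scopes.peek();
        if (current.containsKey(name)) {
            return false;                       // multiply declared
        }
        current.put(name, new Sym(type, scopes.size()));
        return true;
    }

    // Use: search the tables from innermost to outermost; null if undeclared.
    Sym lookup(String name) {
        for (Map<String, Sym> scope : scopes) {
            Sym sym = scope.get(name);
            if (sym != null) {
                return sym;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ListOfTables table = new ListOfTables();
        table.enterScope();                         // outermost scope
        table.declare("f", "(int x int) -> void");
        table.enterScope();                         // parameters and body of f
        table.declare("a", "int");
        table.declare("b", "int");
        table.declare("x", "double");
        table.enterScope();                         // while loop
        table.declare("x", "int");
        table.declare("y", "int");
        System.out.println(table.lookup("x").type); // prints "int"
        table.exitScope();
        System.out.println(table.lookup("x").type); // prints "double"
    }
}
```

The sequence of calls in main follows the example above, so the three tables it builds correspond to the three boxes in the picture.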
void g(int x, int a) { }
void f(int x, int y, int z) {
    int a, b, x;
    ...
}
Method 2: Hashtable of Lists
+-----------------------------+
|      _      _      _        |
|  x: |_|--->|_|--->|_|       |
|      _                      |
|  y: |_|                     |
|      _      _               |
|  z: |_|--->|_|              |
|                             |
+-----------------------------+
There is just one big hashtable, containing an entry for each variable for
which there is some declaration in scope S or in a scope that encloses S.
Associated with each variable is a list of symbol-table entries.
The first list item corresponds to the most closely enclosing declaration;
the other list items correspond to declarations in enclosing scopes.
void f(int a) {
    double x;
    while (...) {
        int x, y;
        ...
    }
}
void g() {
    f();
}
After processing the declarations inside the while loop, the symbol table
looks like this:
+-------------------------------------+
|     +----------------+              |
|  f: | int -> void, 1 |              |
|     +----------------+              |
|                                     |
|     +--------+                      |
|  a: | int, 2 |                      |
|     +--------+                      |
|                                     |
|     +--------+    +-----------+     |
|  x: | int, 3 |--->| double, 2 |     |
|     +--------+    +-----------+     |
|                                     |
|     +--------+                      |
|  y: | int, 3 |                      |
|     +--------+                      |
|                                     |
+-------------------------------------+
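A hashtable of lists can be sketched in Java as a map from each name to a stack of entries, with the level number deciding whether a declaration clashes with one in the current scope. This is a hedged illustration only: the class LeveledSymbolTable, the entry class Entry, and the method names are hypothetical, and scope exit is assumed here to scan every list in the table, which is one simple choice among several.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical hashtable-of-lists symbol table.
class LeveledSymbolTable {
    // One list item: a type plus the nesting level of the declaring scope.
    static final class Entry {
        final String type;
        final int level;
        Entry(String type, int level) { this.type = type; this.level = level; }
    }

    // For each name, the front of its list is the innermost declaration.
    private final Map<String, Deque<Entry>> table = new HashMap<>();
    private int currentLevel = 0;

    // Scope entry: only the level counter changes.
    void enterScope() { currentLevel++; }

    // Scope exit: remove every first item that belongs to the current level.
    void exitScope() {
        Iterator<Deque<Entry>> it = table.values().iterator();
        while (it.hasNext()) {
            Deque<Entry> list = it.next();
            if (list.peekFirst().level == currentLevel) {
                list.removeFirst();
            }
            if (list.isEmpty()) {
                it.remove();                    // no declarations of this name remain
            }
        }
        currentLevel--;
    }

    // Declaration: a clash exists only if the first item is at the current level.
    boolean declare(String name, String type) {
        Deque<Entry> list = table.computeIfAbsent(name, k -> new ArrayDeque<>());
        Entry first = list.peekFirst();
        if (first != null && first.level == currentLevel) {
            return false;                       // multiply declared
        }
        list.addFirst(new Entry(type, currentLevel));
        return true;
    }

    // Use: the first list item is the matching declaration; null if undeclared.
    Entry lookup(String name) {
        Deque<Entry> list = table.get(name);
        return list == null ? null : list.peekFirst();
    }

    public static void main(String[] args) {
        LeveledSymbolTable table = new LeveledSymbolTable();
        table.enterScope();                          // level 1
        table.declare("f", "int -> void");
        table.enterScope();                          // level 2
        table.declare("a", "int");
        table.declare("x", "double");
        table.enterScope();                          // level 3
        table.declare("x", "int");
        table.declare("y", "int");
        System.out.println(table.lookup("x").type);  // prints "int"
        table.exitScope();
        System.out.println(table.lookup("x").type);  // prints "double"
    }
}
```

In this sketch a use needs only one hashtable lookup, while scope exit pays for visiting every name in the table.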
The required times for each operation are:
void g(int x, int a) {
    double d;
    while (...) {
        int d, w;
        double x, b;
        if (...) {
            int a,b,c;
        }
    }
    while (...) {
        int x,y,z;
    }
}
Type Checking
The type rules of a language define how to determine expression
types, and what is considered to be an error.
The type rules specify, for every operator (including assignment), what
types the operands can have, and what is the type of the result.
For example, both C++ and Java allow the addition of an int and a double,
and the result is of type double.
However, while C++ also allows a value of type double to be assigned to a
variable of type int, Java considers that an error.
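The two rules just mentioned can be written down as small functions over a set of types. The sketch below is only an illustration under simplifying assumptions: the class TypeRules and its methods are hypothetical, only the types int, double, and boolean are modeled, and an ERROR type stands for an ill-typed expression. The assignment check follows the Java rule described above, not the C++ rule.

```java
// Hypothetical encoding of two type rules: addition and Java-style assignment.
class TypeRules {
    enum Type { INT, DOUBLE, BOOLEAN, ERROR }

    static boolean isNumeric(Type t) {
        return t == Type.INT || t == Type.DOUBLE;
    }

    // Result type of left + right in this simplified model: both operands
    // must be numeric, and the result is double if either operand is double.
    static Type plus(Type left, Type right) {
        if (!isNumeric(left) || !isNumeric(right)) {
            return Type.ERROR;
        }
        return (left == Type.DOUBLE || right == Type.DOUBLE) ? Type.DOUBLE : Type.INT;
    }

    // Java-style assignment: an int may be assigned to a double variable,
    // but a double may not be assigned to an int variable.
    static boolean assignableJava(Type target, Type value) {
        if (target == Type.ERROR || value == Type.ERROR) {
            return false;
        }
        return target == value || (target == Type.DOUBLE && value == Type.INT);
    }

    public static void main(String[] args) {
        System.out.println(plus(Type.INT, Type.DOUBLE));           // prints "DOUBLE"
        System.out.println(assignableJava(Type.INT, Type.DOUBLE)); // prints "false"
    }
}
```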
and examples of the second kind of error include: