The parser ensures that the input program is syntactically correct, but there are other kinds of correctness that it cannot (or usually does not) enforce. For example:
The next phase of the compiler after the parser, sometimes called the static semantic analyzer, is in charge of checking for these kinds of errors. The checks can be done in two phases, each of which involves traversing the abstract-syntax tree created by the parser:
The purpose of the symbol table is to keep track of names declared in the program. This includes names of classes, fields, methods, and variables. Each symbol table entry associates a set of attributes with one name; for example:
A language's scoping rules tell you when you're allowed to reuse a name, and how to match uses of a name to the corresponding declaration. In most languages, the same name can be declared multiple times under certain circumstances. In Java you can use the same name in more than one declaration if the declarations involve different kinds of names. For example, you can use the same name for a class, a field of the class, a method of the class, and a local variable of the method (this is not recommended, but it is legal):
class Test {
    int Test;
    void Test( ) {
        double Test; // could also be declared int
    }
}
In Java (and C++), you can also use the same name for more than one method as long as the number and/or types of parameters are unique (this is called overloading).
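As a small illustration of overloading, here is a minimal Java sketch. The class name Printer and the method name show are hypothetical and chosen only for this example. The compiler selects which show to call at compile time, based on the number and static types of the arguments; a difference in return type alone would not be enough to distinguish two methods.

```java
// Hypothetical class for illustration only; not part of the original notes.
class Printer {
    // Four methods share the name "show". Each has a different
    // parameter list, so all four declarations are legal.
    static String show(int n)        { return "int:" + n; }
    static String show(double d)     { return "double:" + d; }
    static String show(String s)     { return "String:" + s; }
    static String show(int a, int b) { return "int,int:" + a + "," + b; }
}
```

For example, Printer.show(3) resolves to the int version, while Printer.show(1, 2) resolves to the two-parameter version.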
In C and C++, but not in Java, you can declare variables with the same name in nested blocks. A block is a piece of code inside curly braces; for example, the body of an if or a loop. The following is a legal C or C++ function, but not a legal Java method:
void f(int k) {
    int x = 0;                  /* x is declared here */
    while (...) {
        int x = 1;              /* another x is declared here */
        ...
        if (...) {
            float x = 5.5;      /* and yet another x is declared here! */
            ...
        }
    }
}

As mentioned above, the scoping rules of a language determine which declaration of a named object corresponds to each use. C, C++, and Java use what is called static scoping; that means that the correspondence between uses and declarations is made at compile time. C and C++ use the "most closely nested" rule to match nested declarations to their uses: a use of variable x matches the declaration in the most closely enclosing scope such that the declaration precedes the use.

In C and C++, there is one outermost scope that includes the names of the global variables (the variables that are declared outside the functions) and the names of the functions that are not part of any class. Each function has one or more scopes. Both C and C++ have one scope for the parameters and the "top-level" declarations, plus one for each block in the function (delimited by curly braces). In addition, C++ has a scope for each for loop: in C++ (but not in C prior to C99) you can declare variables in the for-loop header.
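In the example given above, the outermost scope includes just the name "f", and function f itself has three scopes: the outermost scope of f, which includes parameter k and the first x (the int initialized to 0); the scope of the while-loop body, which includes the second x (the int initialized to 1); and the scope of the if-statement body, which includes the third x (the float).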
Question 1: Consider the names declared in the following code. For each, determine whether it is legal according to the rules used in Java.
class animal {
    // methods
    void attack(int animal) {
        for (int animal=0; animal<10; animal++) {
            int attack;
        }
    }
    int attack(int x) {
        for (int attack=0; attack<10; attack++) {
            int animal;
        }
    }
    void animal() { }
    // fields
    double attack;
    int attack;
    int animal;
}
Question 2: Consider the following C++ code. For each use of a name, determine which declaration it corresponds to (or whether it is a use of an undeclared name).
int k=10, x=20;
void foo(int k) {
    int a = x;
    int x = k;
    int b = x;
    while (...) {
        int x;
        if (x == k) {
            int k, y;
            k = y = x;
        }
        if (x == k) {
            int x = y;
        }
    }
}
Not all languages use static scoping. Early dialects of Lisp, as well as APL and Snobol, use what is called dynamic scoping. Under dynamic scoping, a use of a variable that has no corresponding declaration in the same function corresponds to the declaration in the most recently called function that is still active. For example, consider the following code:
void main() {
    f1();
    f2();
}
void f1() {
    int x = 10;
    g();
}
void f2() {
    String x = "hello";
    f3();
    g();
}
void f3() {
    double x = 30.5;
}
void g() {
    print(x);
}

Under dynamic scoping this program outputs "10 hello". The first call to g comes from f1, whose copy of x has value 10. The next call to g comes from f2. Although f3 is called by f2 before it calls g, the call to f3 is not active when g is called; therefore, the use of x in g matches the declaration in f2, and "hello" is printed.
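Since Java itself is statically scoped, the behavior described above can only be simulated. The following is a minimal sketch of such a simulation, assuming that each active call keeps its own map of local bindings and that a use of a name searches the active calls from the most recent to the oldest. All names here (DynamicScopeDemo, enterCall, exitCall, declare, lookup, program, run) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of dynamic scoping for the example program above.
class DynamicScopeDemo {
    // One map of local bindings per active call; the most recent call is first.
    static final Deque<Map<String, Object>> frames = new ArrayDeque<>();
    static final StringBuilder out = new StringBuilder();

    static void enterCall() { frames.push(new HashMap<>()); }
    static void exitCall()  { frames.pop(); }
    static void declare(String name, Object value) { frames.peek().put(name, value); }

    // Dynamic lookup: search the active calls from most recent to oldest.
    static Object lookup(String name) {
        for (Map<String, Object> frame : frames) {
            if (frame.containsKey(name)) return frame.get(name);
        }
        throw new IllegalStateException("undeclared name: " + name);
    }

    // The functions of the example, with explicit call entry and exit.
    static void program() { enterCall(); f1(); f2(); exitCall(); }
    static void f1() { enterCall(); declare("x", 10); g(); exitCall(); }
    static void f2() { enterCall(); declare("x", "hello"); f3(); g(); exitCall(); }
    static void f3() { enterCall(); declare("x", 30.5); exitCall(); }
    static void g()  { enterCall(); out.append(lookup("x")).append(' '); exitCall(); }

    // Runs the example and returns everything that g "printed".
    static String run() {
        frames.clear();
        out.setLength(0);
        program();
        return out.toString().trim();
    }
}
```

Calling DynamicScopeDemo.run() returns "10 hello": the frame for f3 has already been popped by the time f2 calls g, so the lookup finds the x declared in f2.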
Assuming that dynamic scoping is used, what is output by the following program?
void main() {
    int x = 0;
    f1();
    g();
    f2();
}
void f1() {
    int x = 10;
    g();
}
void f2() {
    int x = 20;
    f1();
    g();
}
void g() {
    print(x);
}
It is generally agreed that dynamic scoping is a bad idea; it can make a program very difficult to understand, because a single use of a variable can correspond to many different declarations (with different types)! Most of the languages that use dynamic scoping are old languages; recently designed languages almost all use static scoping.
Another issue that is handled differently by different languages is whether names can be used before they are defined. For example, in Java, a method or field name can be used before the definition appears, but this is not true for a local variable:
class Test {
    void f() {
        val = 0;  // field val has not yet been declared -- OK
        g();      // method g has not yet been declared -- OK
        x = 1;    // variable x has not yet been declared -- ERROR!
        int x;
    }
    void g() {}
    int val;
}
In what follows, we will assume that we are dealing with a language that:
In addition to the assumptions listed at the end of the previous section, we will assume that:
The idea behind this approach is that the symbol table consists of a list of hashtables, one for each currently visible scope. When processing a scope S, the structure of the symbol table is:
void f(int a, int b) {
    double x;
    while (...) {
        int x, y;
        ...
    }
}
void g() {
    f();
}

After processing the declarations inside the while loop, the symbol table looks like this:
Here are the operations that need to be performed on scope entry/exit, and to process a declaration/use:
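A minimal Java sketch of how these four operations might look for a list of hashtables follows. It assumes that a name may be declared only once per scope, that parameters share a scope with a function's top-level locals, and that a use matches the declaration in the innermost enclosing scope. The names Sym, ScopedSymTab, enterScope, exitScope, declare, and lookup are hypothetical, and the entry here carries only a type name, which is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol-table entry; the attribute set is an assumption.
class Sym {
    final String name;
    final String type;
    Sym(String name, String type) { this.name = name; this.type = type; }
}

// Sketch of a symbol table as a list of hashtables, innermost scope first.
class ScopedSymTab {
    private final Deque<Map<String, Sym>> scopes = new ArrayDeque<>();

    // Scope entry: add a new, empty table to the front of the list.
    void enterScope() { scopes.push(new HashMap<>()); }

    // Scope exit: discard the table at the front of the list.
    void exitScope() {
        if (scopes.isEmpty()) throw new IllegalStateException("no open scope");
        scopes.pop();
    }

    // Declaration: look only in the first table. A hit there means the
    // name is multiply declared, and false is returned.
    boolean declare(String name, String type) {
        if (scopes.isEmpty()) throw new IllegalStateException("no open scope");
        Map<String, Sym> current = scopes.peek();
        if (current.containsKey(name)) return false;
        current.put(name, new Sym(name, type));
        return true;
    }

    // Use: search the tables from innermost to outermost.
    // A null result means the name is undeclared.
    Sym lookup(String name) {
        for (Map<String, Sym> scope : scopes) {
            Sym s = scope.get(name);
            if (s != null) return s;
        }
        return null;
    }
}
```

With the function f from the example, declaring a, b, and x in one scope and then x and y in a nested scope makes lookup("x") return the int entry inside the loop and the double entry after the loop's scope is exited. In this sketch, a lookup may visit one table per enclosing scope, so its cost grows with the nesting depth.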
There are several factors involved in the time required for each operation:
For all three questions below, assume that the symbol table is implemented using a list of hashtables.
Question 1: Recall that Java does not allow the same name to be used for a local variable of a method, and for another local variable declared inside a nested scope in the method body. Even with this restriction, it is not a good idea to put all of a method's local variables (whether they are declared at the beginning of the method, or in some nested scope within the method body) in the same table. Why not?
Question 2: C++ does not use exactly the scoping rules that we have been assuming. In particular, C++ does allow a function to have both a parameter and a local variable with the same name (and any uses of the name refer to the local variable).
Consider the following code. Draw the symbol table as it would be after processing the declarations in the body of f under:
void g(int x, int a) { }
void f(int x, int y, int z) {
    int a, b, x;
    ...
}
Question 3: Assume that a symbol-table entry includes the "kind" of the declared name as well as the other attributes assumed above (if the same name is declared as two different "kinds" in one scope, there would be one entry with a list of "kinds"). Also assume that we can tell (from context), for each use of a name, what "kind" of name it is supposed to be.
Which of the four operations (scope entry, process a declaration, process a use, scope exit) described above would change (and how would it change) if Java rules for name reuse were used instead of C++ rules (i.e., if the same name can be used within one scope as long as the uses are for different kinds of names, and if the same name cannot be used for more than one variable declaration in nested scopes)?
The idea behind this approach is that when processing a scope S, the structure of the symbol table is:
There is just one big hashtable, containing an entry for each variable for which there is some declaration in scope S or in a scope that encloses S. Associated with each variable is a list of symbol-table entries. The first list item corresponds to the most closely enclosing declaration; the other list items correspond to declarations in enclosing scopes.
For example, given this code:
void f(int a) {
    double x;
    while (...) {
        int x, y;
        ...
    }
}
void g() {
    f();
}

After processing the declarations inside the while loop, the symbol table looks like this:
Note that the level-number attribute stored in each list item enables us to determine whether the most closely enclosing declaration was made in the current scope or in an enclosing scope.
Here are the operations that need to be performed on scope entry/exit, and to process a declaration/use:
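A minimal Java sketch of the same four operations for a hashtable of lists follows. It assumes that nesting levels are numbered from 1 for the outermost scope, and that the table additionally records which names each open scope declared, so that scope exit does not have to scan the whole hashtable; that bookkeeping is an assumption of this sketch, not something described in the text above. The names LevelSym, FlatSymTab, enterScope, exitScope, declare, and lookup are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entry: a type name plus the nesting level of the declaration.
class LevelSym {
    final String name;
    final String type;
    final int level;
    LevelSym(String name, String type, int level) {
        this.name = name; this.type = type; this.level = level;
    }
}

// Sketch of a symbol table as one hashtable of lists.
class FlatSymTab {
    // name -> its declarations, most closely enclosing one first
    private final Map<String, Deque<LevelSym>> table = new HashMap<>();
    // names declared in each open scope (assumed bookkeeping for scope exit)
    private final Deque<List<String>> declaredPerScope = new ArrayDeque<>();

    // Current nesting level: 1 for the outermost scope (an assumption).
    private int level() { return declaredPerScope.size(); }

    // Scope entry: the level number goes up by one.
    void enterScope() { declaredPerScope.push(new ArrayList<>()); }

    // Scope exit: remove the first list item of every name declared here.
    void exitScope() {
        if (declaredPerScope.isEmpty()) throw new IllegalStateException("no open scope");
        for (String name : declaredPerScope.pop()) {
            Deque<LevelSym> decls = table.get(name);
            decls.pop();
            if (decls.isEmpty()) table.remove(name);
        }
    }

    // Declaration: if the first list item has the current level number,
    // the name is multiply declared in this scope, and false is returned.
    boolean declare(String name, String type) {
        if (declaredPerScope.isEmpty()) throw new IllegalStateException("no open scope");
        Deque<LevelSym> decls = table.computeIfAbsent(name, k -> new ArrayDeque<>());
        LevelSym first = decls.peek();
        if (first != null && first.level == level()) return false;
        decls.push(new LevelSym(name, type, level()));
        declaredPerScope.peek().add(name);
        return true;
    }

    // Use: one hashtable lookup; the first list item is the match.
    LevelSym lookup(String name) {
        Deque<LevelSym> decls = table.get(name);
        return decls == null ? null : decls.peek();
    }
}
```

In this sketch a use needs only a single hashtable lookup regardless of nesting depth, while scope exit does work proportional to the number of names declared in the scope being closed.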
Assume that the symbol table is implemented using a hashtable of lists. Draw pictures to show how the symbol table changes as the declarations in each scope in the following code are processed.
void g(int x, int a) {
    double d;
    while (...) {
        int d, w;
        double x, b;
        if (...) {
            int a,b,c;
        }
    }
    while (...) {
        int x,y,z;
    }
}
As mentioned in the Introduction, the job of the type-checking phase is to:
List as many of the operators that can be used in a Java program as you can think of (don't forget to think about the logical and relational operators as well as the arithmetic ones). For each operator, say what types the operands may have, and what is the type of the result.
In addition to finding type errors caused by operators being applied to operands of the wrong type, the type checker must also find type errors having to do with expressions that, because of their context, must be boolean, and type errors having to do with method calls. Examples of the first kind of error include: