CS 536 Homework 3

Due date: Friday, December 10 by 4 pm
Not accepted late

Homework may be turned in the following ways:
    - hand it in during class on Thursday, December 9
    - drop it off at Rebecca Hasti's office (5375 CS); slide it under the door if she's not there
    - drop it off at John O'Malley's office (5351 CS); there will be a box labeled "536 homework"
In order to ensure that your homework is received and not tampered with, do not use the CS department mailboxes on the 5th floor.



Question 1

C++ allows you to define new names for existing types using typedefs. For example:

typedef double dollars;

defines dollars to be the same as double. Given the typedef above, the following is now legal:

dollars salary;
dollars f(int k, dollars d) {
    typedef dollars moreDollars;
    moreDollars md = d*2.5;
    return md;
}

and is the same as:

double salary;
double f(int k, double d) {
    typedef double moreDollars;
    double md = d*2.5;
    return md;
}

Note that a typedef can occur anywhere that a variable declaration (local or global) can occur.
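
As a quick check of the claim that a typedef introduces a new name rather than a new type, here is a minimal C++ sketch. The function scale is hypothetical and is not part of the assignment; the static_assert lines simply ask the compiler to confirm that each alias names double.

```cpp
#include <cassert>
#include <type_traits>

typedef double dollars;              // global typedef: dollars is another name for double

// Hypothetical function, used only to show a typedef in local scope.
dollars scale(int k, dollars d) {
    typedef dollars moreDollars;     // local typedef, visible only inside scale
    static_assert(std::is_same<moreDollars, double>::value,
                  "moreDollars names double");
    moreDollars md = d * k;
    return md;
}

// The alias and the original type are interchangeable everywhere.
static_assert(std::is_same<dollars, double>::value, "dollars names double");
```

If either alias denoted a distinct type, the corresponding static_assert would fail at compile time.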

Part 1

Assume that the following productions have been added to the grammar for the C-- language:

decl -> typedef
typedef -> TYPEDEF type ID SEMICOLON
What other productions need to be changed and/or added to the C-- grammar to allow typedefs?

Part 2

Now consider the name-analysis phase of the compiler. Note that, in addition to the usual errors for multiply defined names and for uses of undefined names, the name analyzer must enforce the following rules:

Answer each of the following questions:
  1. What information should be stored with each name in the symbol table?

  2. What should be done to process a typedef: typedef T xxx;?

  3. What should be done to process a declaration of a variable, function, or parameter named xxx and declared to be of type T?

  4. What should be done to process the use of a name xxx in a statement?

Illustrate your answer by showing the entries that would be in the symbol table after processing the following declarations:

typedef double dollars;
dollars salary;
typedef dollars moreDollars;
moreDollars md;
double d;

Part 3

Now consider the type-checking phase of the compiler. What changes, if any, need to be made to this phase given that typedefs have been added to the language?


Question 2

Recall that some languages allow nested functions, and allow a nested function to access the variables that are declared in enclosing functions. Recall also that there are two approaches to implementing run-time access to non-local variables in that case: using access links, or using a display.
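
As a reminder of how the two approaches differ at the moment a non-local variable is read, here is a rough C++ sketch. The ActivationRecord struct, its fields, and the function names are all assumptions made for illustration; the sketch models only the lookup, not how access links or save-display fields are set up on call and return.

```cpp
#include <cassert>

// Simplified activation record: one local variable plus an access link.
struct ActivationRecord {
    int local;
    ActivationRecord *accessLink;   // AR of the lexically enclosing function
};

// Access-link approach: follow one link per level of nesting difference.
int loadViaAccessLinks(const ActivationRecord *current, int hops) {
    while (hops-- > 0) current = current->accessLink;
    return current->local;
}

// Display approach: display[d] points to the current AR at nesting depth d,
// so the lookup is a single indexed load regardless of nesting distance.
int loadViaDisplay(ActivationRecord *const display[], int depth) {
    return display[depth]->local;
}

// A function at depth 2 reads a variable of its enclosing function (depth 1).
int demoNonLocalAccess() {
    ActivationRecord outer = {42, nullptr};
    ActivationRecord inner = {7, &outer};
    ActivationRecord *display[3] = {nullptr, &outer, &inner};
    int viaLinks = loadViaAccessLinks(&inner, 1);
    int viaDisplay = loadViaDisplay(display, 1);
    return (viaLinks == viaDisplay) ? viaLinks : -1;
}
```

Both lookups reach the same record; they differ in how the record is found.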

Below are links to two files that contain partially filled-in tables (the two files are the same -- one is pdf and one is postscript). You are to print one of the files, then complete the tables so that each row of each table shows three consistent pictures (one in each column). The picture in the first column shows the nesting structure of a program with its call statements (a different program for each row). The picture in the second column shows the stack at a moment during the execution of that program, assuming that access links are used (only the access-link fields of the Activation Records are shown). The picture in the third column shows the stack at the same moment, assuming that a display is used (only the save-display fields of the Activation Records are shown).

The first row has been completed as an example.

Be sure to look carefully at the labels on the activation records; different function names are used for the examples on the second page.

Link to pdf file for question 2
Link to postscript file for question 2


Question 3

Assume that parameters in the C-- language can be passed by value, by reference, by value-result, or by name. Consider the following C-- program:

int A[4];
int k;

void f( int x, int y ) {
  x = x + 1;
  k = k + 1;
  A[k] = A[k] * 2;
  y = y + 1;
  cout << x;
  cout << y;
  cout << k;
  cout << A[k];
}

void main() {
  k = 0;
  while (k < 4) {
     A[k] = k;
     k = k + 1;
  }
  k = 0;
  f(k, A[k]);
  cout << k;
  cout << A[0];
  cout << A[1];
  cout << A[2];
  cout << A[3];
}
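
As a reminder of how three of these modes differ, here is a small C++ sketch that uses a separate toy function rather than the program above. The global g and the bump functions are hypothetical. C++ has no built-in value-result mode, so it is simulated with an explicit copy-in and copy-out; pass by name is not shown.

```cpp
#include <cassert>

int g = 0;  // hypothetical global, passed as the actual argument below

// Pass by value: x is a private copy; assignments to x never reach the caller.
void bumpByValue(int x) {
    x = x + 1;
    g = g + 10;
}

// Pass by reference: x is an alias for the caller's variable.
void bumpByReference(int &x) {
    x = x + 1;
    g = g + 10;
}

// Pass by value-result (simulated): copy in on entry, copy back on return.
void bumpByValueResult(int &actual) {
    int x = actual;   // copy in
    x = x + 1;
    g = g + 10;
    actual = x;       // copy out, overwriting changes made to the actual meanwhile
}

// Each helper resets g, passes g itself as the argument, and returns g afterwards.
int runByValue()       { g = 0; bumpByValue(g);       return g; }
int runByReference()   { g = 0; bumpByReference(g);   return g; }
int runByValueResult() { g = 0; bumpByValueResult(g); return g; }
```

The three helpers return 10, 11, and 1 respectively, because the formal parameter is a copy, an alias, or a copy that is written back on return.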

Below are links to two files that contain six pictures each (the two files are the same -- one is pdf and one is postscript). The first four pictures contain outlines of f's activation record, as well as the space in the static data area for globals k and A. The last two pictures contain space for recording the output of the program.

You should print one of the files, write your name at the top, and complete the pictures:

Link to pdf file for question 3
Link to postscript file for question 3


Question 4

For your compiler project, you will generate code for expressions as discussed in class: the codeGen method for each kind of expression will generate code to evaluate the expression, leaving the value on the stack.
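
The in-class scheme is not reproduced here, but a stack-based codeGen along these lines might look like the following C++ sketch. The class names (Exp, Id, BinOp), the string form of the emitted instructions, and the helper genSubtract are assumptions for illustration, not the project's actual interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Code = std::vector<std::string>;   // emitted pseudo-instructions

struct Exp {
    virtual ~Exp() = default;
    virtual void codeGen(Code &out) const = 0;   // leaves the value on the stack
};

struct Id : Exp {
    std::string name;
    explicit Id(std::string n) : name(std::move(n)) {}
    void codeGen(Code &out) const override {
        out.push_back("load " + name + " into T0");
        out.push_back("push T0");
    }
};

struct BinOp : Exp {
    char op;
    std::unique_ptr<Exp> left, right;
    BinOp(char o, std::unique_ptr<Exp> l, std::unique_ptr<Exp> r)
        : op(o), left(std::move(l)), right(std::move(r)) {}
    void codeGen(Code &out) const override {
        left->codeGen(out);                  // left value pushed first
        right->codeGen(out);                 // right value ends up on top
        out.push_back("pop T1");             // right operand
        out.push_back("pop T0");             // left operand
        out.push_back(std::string("T0 = T0 ") + op + " T1");
        out.push_back("push T0");            // result goes back on the stack
    }
};

// Emits code for the expression a - b.
Code genSubtract() {
    BinOp e('-', std::unique_ptr<Exp>(new Id("a")), std::unique_ptr<Exp>(new Id("b")));
    Code out;
    e.codeGen(out);
    return out;
}
```

For a - b this emits eight instructions, four of them pushes and two of them pops, which shows the stack traffic this approach generates.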

While this kind of code is easy to generate, storing intermediate values on the stack rather than in registers is inefficient. You might think that each expression's codeGen method could instead work as follows:

  1. Each expression node's codeGen method would have one parameter: a register number N, which would be either 0 or 1.
  2. For literals or identifiers, the codeGen method would simply load the appropriate value into register N.
  3. For expressions involving non-short-circuited binary operators (+, *, <, etc.), the codeGen method would call the codeGen method of the left child with argument 0 (which would generate code to evaluate the left expression, leaving the result in register 0), then call the codeGen method of the right child with argument 1 (which would generate code to evaluate the right expression, leaving the result in register 1), then perform the operation, leaving the result in register N.

Unfortunately, this approach does not always work.

Part 1

Assume that the AST includes only expressions that involve non-short-circuited binary operators, with literals or identifiers at the leaves (no unary operators, and no array-index expressions or function calls as operands). Also assume that in the generated code, all operands must be in registers (i.e., neither operand of a binary operator can be in a memory location, nor can it be a literal value).

Show that the approach described above does not always work by giving two examples:

  1. An expression that cannot be evaluated using just two registers (without storing intermediate results on the stack), but can be evaluated using three registers.

  2. An expression that cannot be evaluated using just three registers (without storing intermediate results on the stack), but can be evaluated using four registers.

For each example, give the expression, the abstract-syntax tree, and a pseudo-code version of the code to evaluate the expression (code that works, not the erroneous code that would be generated using the approach described above). For example, here is an expression that can be evaluated using just two registers, its AST, and its pseudo-code (use the same kind of pseudo-code in your answer):

Expression:

a - b

AST:

   -
  / \
 a   b

Pseudo-code:

load a into T0
load b into T1
T0 = T0 - T1

Note that an expression's operands need not be evaluated left-to-right. For example, the following pseudo-code would also be OK for the expression a - b:

load b into T1    // evaluate the right operand first
load a into T0
T0 = T0 - T1

Part 2

Use the same assumptions about the AST and the generated code as for Part 1, including the fact that an expression's operands can be evaluated either left-to-right or right-to-left. Also assume that all operators are non-commutative and non-associative (i.e., the expression a op b is not equivalent to the expression b op a, and the expression (a op b) op c is not equivalent to the expression a op (b op c)).

Note that generating code to evaluate a leaf node requires one register (into which the value of the identifier or literal is loaded).

You are to give an algorithm that works as follows:

For example, for the expression a - b, the input to the algorithm could be the root node of the expression tree, with N1 = N2 = 1 (because each operand is a leaf, and thus requires one register). The output of the algorithm for this example should be 2 (because the whole expression can be evaluated using two registers, as illustrated above in Part 1, but it cannot be evaluated using just one register).