Code Generation

Overview

Spim

Auxiliary Constants and Methods

Code Generation for Global Variable Declarations

Code Generation for Functions

Function Preamble
Function Entry
Function Body
Function Exit

Code Generation for Statements

Labels
Write Statement
If-Then Statement
- Test Yourself #1
Return Statement
Digression: Code Generation for Names
Assignment Statement

Code Generation for IdNodes

Code Generation for Expressions

Literals
Function Call
Non Short-Circuited Operators
Short-Circuited Operators
- Test Yourself #2

Control-Flow Code

Test Yourself #3
Test Yourself #4

Overview

Code can be generated by a syntax-directed translation while parsing or by traversing the abstract syntax tree after the parse (i.e., by writing a codeGen method for the appropriate kinds of AST nodes). We will assume the latter approach, and will discuss code generation for the C-- language (however, to simplify the discussion, we will assume that the language includes only scalar variables; the treatment of arrays is left as an exercise for you to figure out). In particular, we will discuss generating MIPS assembly code suitable for input to the Spim interpreter. Some information on Spim is provided in the next section; the following sections discuss code generation for:

global variables
functions (entry and exit)
statements
expressions

Spim

Documentation on Spim is available on-line.

To run the (plain) Spim interpreter, type:

spim -file <name>

where <name> is the name of a file that contains MIPS assembly code (the file produced by your compiler). This will cause Spim to process the code in the file; if there are syntax errors, they will be reported, and the code will not execute. Otherwise, the code will execute; output will be printed to your terminal, and you will get error messages for any run-time errors that result.

To run Spim with an X-windows interface (on a Unix or Linux machine), just type: xspim. This will cause a window to open. Click on the load button in that window, then type in the name of your assembly-code file, then press return. If there are syntax errors, you will see error messages. If there are no errors, you can run your program by clicking on run, then (in the small window that will be opened) on OK. If your program generates any output, a new window will be opened to display that output.

Spim uses the following special registers (there are others; see the on-line Spim documentation for those):

Register	Purpose
$sp	stack pointer
$fp	frame pointer
$ra	return address
$v0, $a0	used for output and to return values from functions
$t0 - $t7	temporaries

Auxiliary Constants and Methods

To simplify the task of code generation, it is convenient to have a set of constants (final static fields) that define the string representations of the registers that will be used in the generated code and the values used to represent true and false, as well as a set of methods for actually writing the generated code to a file. We will assume that we have the following register constants: SP, FP, RA, V0, A0, T0, T1, as well as the constants TRUE and FALSE (and that TRUE is represented as 1, and FALSE as 0). We will also assume that we have the following methods:

Method Name	Purpose
generate	write the given op code and arguments, nicely formatted, to the output file
generateIndexed	the arguments are: an op code, a register R1, another register R2, and an offset; generate code of the form: `op R1, offset(R2)`
genPush	generate code to push the value of the given register onto the stack
genPop	generate code to pop the top-of-stack value into the given register
nextLabel	return a string to be used as a label (more on this later)
genLabel	given a label L, generate: `L:`
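The helpers in the table above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name Codegen is hypothetical, the generated code is collected in a StringBuilder (out) rather than written to a file, and the exact formatting is arbitrary. The push and pop sequences follow the convention used later in these notes, in which SP points to the first free word above the top of the stack.

```java
class Codegen {
    // String representations of the registers used in generated code.
    static final String SP = "$sp", FP = "$fp", RA = "$ra";
    static final String V0 = "$v0", A0 = "$a0", T0 = "$t0", T1 = "$t1";

    // Assumption: code is accumulated here instead of being written to a file.
    static final StringBuilder out = new StringBuilder();
    private static int labelCount = 0;

    // Write an op code followed by its comma-separated arguments.
    static void generate(String op, Object... args) {
        StringBuilder line = new StringBuilder("\t" + op);
        for (int i = 0; i < args.length; i++) {
            line.append(i == 0 ? "\t" : ", ").append(args[i]);
        }
        out.append(line).append("\n");
    }

    // Emit: op r1, offset(r2)
    static void generateIndexed(String op, String r1, String r2, int offset) {
        generate(op, r1, offset + "(" + r2 + ")");
    }

    // Push: store at the free slot, then move SP down one word.
    static void genPush(String reg) {
        generateIndexed("sw", reg, SP, 0);
        generate("subu", SP, SP, 4);
    }

    // Pop: load the word just above SP, then move SP up one word.
    static void genPop(String reg) {
        generateIndexed("lw", reg, SP, 4);
        generate("addu", SP, SP, 4);
    }

    // Labels are of the form L<unique #>.
    static String nextLabel() { return "L" + (labelCount++); }

    static void genLabel(String label) { out.append(label).append(":\n"); }
}
```

With these definitions, a call such as generate("li", V0, 1) appends one formatted line to out, and generate("syscall") appends the bare op code.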

Code Generation for Global Variable Declarations

For each global variable v, generate:

	.data
	.align 2  # align on a word boundary
    _v: .space N

where N is the size of the variable in bytes. (Scalar integer and boolean variables require 4 bytes; an array would require four times the size of the array.) This code tells the assembler to set aside N bytes in the static data area, in a location labeled with the name _v.

Example: Given this source code:

	int x;
	bool y[10];

you should generate this code:

	.data
	.align 2
    _x: .space 4
	.data
	.align 2
    _y: .space 40

It is not actually necessary to generate .data if the previously generated code was also for a global variable declaration. However, function declarations can be intermixed with global variable declarations (and cause code to be generated in the text area, not the static data area), so the previous code may not have been for a global; it is safe (and easier) just to generate those directives for every global variable.
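As a rough sketch, the directives for one global variable could be produced by a helper like the one below. The class and method names (GlobalDecl, genGlobal) are hypothetical, and the helper returns the text as a String instead of writing it to the output file.

```java
class GlobalDecl {
    // Hypothetical helper: returns the directives for one global variable.
    // sizeInBytes is 4 for a scalar and 4 * (number of elements) for an array.
    static String genGlobal(String name, int sizeInBytes) {
        return "\t.data\n"
             + "\t.align 2\n"
             + "_" + name + ":\t.space " + sizeInBytes + "\n";
    }
}
```

For the example above, genGlobal("x", 4) and genGlobal("y", 40) would be called in turn, each emitting its own .data and .align directives.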

Code Generation for Functions

For every function you will generate code for:

the function "preamble"
the function entry (to set up the function's Activation Record)
the function body (its statements)
function exit (restoring the stack, and returning to the caller).

Function Preamble

For the main function, generate:

	.text
	.globl main
   main:

For all other functions, generate:

        .text
        _<functionName>:

This tells the assembler to store the following instructions in the text area, labeled with the given name.

After generating this "preamble" code, you will generate code for (1) function entry, (2) function body, and (3) function exit.

Function Entry

We assume that when a function is called, the stack looks like this:


				     <- SP
			|----------|
			|          |  \
	          	|          |  |  parameters
			|==========|  /
		 	|          |
	caller's AR	|          |
			|__________| <- FP

Before starting to execute the function body, we want it to look like this:

				    <- SP
			|==========|
		/	|          |  \
	        |  	|          |  | space for local variables
		|	|----------|  /
	new AR  |      	|          |    control link (saved FP)
		|	|----------|  
	        |  	|          |    return address
		|	|----------|  
		|	|          |  \
	        |  	|          |  | parameters  <- FP
		\	|==========|  /
		 	|          |
	caller's AR	|          |
			|__________|

The parameters will already be on the stack (pushed by the calling function). So the code for function entry must do the following:

push the return address
push the control link
set the FP
push space for local variables

Here's the code you need to generate:

  # (1) push return addr
    sw	 $ra, 0($sp)
    subu $sp, $sp, 4
  # (2) push control link
    sw   $fp, 0($sp)
    subu $sp, $sp, 4
  # (3) set the FP
  # note: the following sets the FP to point to the "bottom"
  #       of the new AR; the reason for "+ 8" is:
  #       4 bytes each for the control link and the return addr
    addu $fp, $sp, <size of params in bytes + 8>
  # (4) push space for locals
    subu $sp, $sp, <size of locals in bytes>

Note: <size of params> and <size of locals> will need to be available to the code generator. The symbol-table entry for the function name will have information about the parameters (because that will have been used for type checking). For example, it might have a list of the symbol-table entries for the parameters. You could also store the total size of the parameters in the function name's symbol-table entry, or you could write a method that takes the list of parameters as its argument and computes the total size. It is not so easy to compute the total size of the local variables at code-generation time; it is probably a better idea to do that during name analysis. The name-analysis phase will be computing the offsets for the parameters and local variables anyway; it should not be difficult to extend that code to also compute the total size of the locals (and to store that information in the function name's symbol-table entry).
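The entry sequence can be sketched as a helper that takes the two sizes as arguments. The names FnEntry and genEntry are hypothetical, and paramBytes and localBytes are assumed to have been computed earlier (for example during name analysis) and stored in the function's symbol-table entry.

```java
class FnEntry {
    // Hypothetical helper: returns the function-entry code as a String.
    static String genEntry(int paramBytes, int localBytes) {
        StringBuilder sb = new StringBuilder();
        // (1) push return addr
        sb.append("\tsw\t$ra, 0($sp)\n");
        sb.append("\tsubu\t$sp, $sp, 4\n");
        // (2) push control link
        sb.append("\tsw\t$fp, 0($sp)\n");
        sb.append("\tsubu\t$sp, $sp, 4\n");
        // (3) set the FP: 8 bytes for the two saved words, plus the parameters
        sb.append("\taddu\t$fp, $sp, ").append(paramBytes + 8).append("\n");
        // (4) push space for locals
        sb.append("\tsubu\t$sp, $sp, ").append(localBytes).append("\n");
        return sb.toString();
    }
}
```

For a function with two int parameters and three int locals, genEntry(8, 12) sets the FP with "addu $fp, $sp, 16" and reserves the locals with "subu $sp, $sp, 12".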

Function Body

Note: we are talking about the codeGen method for the FnBodyNode, whose subtree will look like this:

		 ---------------
		|   FnBodyNode |
		 ---------------
		 /             \
       --------------     --------------	
      |	DeclListNode |   | StmtListNode |
       --------------     --------------

There is no need to generate any code for the declarations. So to generate code for the function body, just call the codeGen method of the StmtListNode, which will in turn call the codeGen method of each statement in the list. What those methods will do is discussed below in the section on Code Generation for Statements.

Function Exit

Just before a function returns, the stack looks like this:


                                       <- SP
                           |==========|
                   /       |          |  \
                   |       |          |  | space for local variables
                   |       |----------|  /
this function's AR |       |          |    control link (saved FP)
                   |       |----------|  
                   |       |          |    return address
                   |       |----------|  
                   |       |          |  \
                   |       |          |  |  parameters  <- FP
                   \       |==========|  /
                           |          |
           caller's AR     |          |
                           |__________|

We need to generate code to pop off this function's AR, then to jump to the address in the "return address" field. Popping off the AR means restoring the SP and FP. Note that we want to move the SP to where the FP is currently pointing. However, an interrupt could occur and use the stack, so we don't want to change the SP until we're finished with all of the values in the current AR (in particular, the control link, which is used to restore the FP). Therefore, we use a temporary register (T0) to save the address that is initially in the FP.

Here is the code that needs to be generated:

  lw   $ra, -<param size>($fp)    # load return address
  move $t0, $fp                   # save value of FP
  lw   $fp, -<paramsize+4>($fp)   # restore FP
  move $sp, $t0                   # restore SP
  jr   $ra                        # return

Note that there are two things that cause a function to return:

A return statement is executed, or
The last statement in the function is executed (i.e., execution "falls off the end" of the function).

You could generate the "return" code given above for each return statement as well as after the last statement in the function body. A more space-efficient approach would be:

Generate the "return" code just once after generating the code for the function body. Label that code with a unique label (e.g.: _<functionName>_Exit).
For each return statement, generate a jump to the label you used (the op code for an unconditional jump is just b).

Note that in this case, you'll need to know the current function name when you are generating code for the function's body. As usual, you could pass that name as a parameter to the codeGen methods, or you could store it in a global variable.
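One way the space-efficient approach might look is sketched below. The names FnExit, exitLabel, genExit, and genReturnJump are hypothetical; the function name and parameter size are passed in explicitly here, though they could equally be kept in global variables.

```java
class FnExit {
    // The unique label for a function's shared "return" code.
    static String exitLabel(String fnName) {
        return "_" + fnName + "_Exit";
    }

    // Emitted once, after the code for the function body.
    static String genExit(String fnName, int paramBytes) {
        return exitLabel(fnName) + ":\n"
             + "\tlw\t$ra, " + (-paramBytes) + "($fp)\n"        // load return address
             + "\tmove\t$t0, $fp\n"                             // save value of FP
             + "\tlw\t$fp, " + (-(paramBytes + 4)) + "($fp)\n"  // restore FP
             + "\tmove\t$sp, $t0\n"                             // restore SP
             + "\tjr\t$ra\n";                                   // return
    }

    // Emitted for each return statement in the body.
    static String genReturnJump(String fnName) {
        return "\tb\t" + exitLabel(fnName) + "\n";
    }
}
```

The offsets are computed as integers so that a function with no parameters gets "0($fp)" and "-4($fp)".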

What about a return statement that returns a value? As discussed below, the codeGen method for the returned expression will generate code to evaluate that expression, leaving the value on the stack. The MIPS convention is to use register V0 to return a value from a function. So the codeGen method for the return statement should generate code to pop the value from the stack into register V0 (before generating the "return" code or the jump to the return code discussed above).

Code Generation for Statements

You will write a different codeGen method for each kind of StmtNode. You are strongly advised to write this method for the WriteStmtNode first. Then you can test code generation for the other kinds of statements and the expressions by writing a C-- program that computes and prints a value. It will be much easier to find errors in your code this way (by looking at the output produced when a C-- program is run) than by looking at the assembly code you generate.

Write Statement

To generate code for a write statement whose expression is of type int, bool or String, you must:

Call the codeGen method of the expression being printed. That method will generate code to evaluate the expression, leaving that value on the top of the stack. (If the type of the expression is String, the address of the string will be left on the stack.)
Generate code to pop the top-of-stack value into register A0 (a special register used for output)
Generate code to set register V0 according to the type of the expression to be written:

Type	Value of V0
integer	1
boolean	1
String	4
Generate: syscall

Note that since boolean values are actually represented using integer values (0 for false and 1 for true), you should treat boolean values the same as integer values in terms of printing.

Here is the code you would write for the codeGen method of the WriteStmtNode:

    // step (1)
    myExp.codeGen();
    
    // step (2)
    genPop(A0);
    
    // step (3)
    if ( -- type of exp is int or bool -- ) {
        generate("li", V0, 1);
    }
    else if ( -- type of exp is String -- ) {
        generate("li", V0, 4);
    }
    
    // step (4)
    generate("syscall");

If-Then Statement

The AST for an if-then statement looks like:

                    ------------
                   | IfStmtNode |
                    ------------
                 /       |        \
       ---------   --------------   -------------
      | ExpNode | | DeclListNode | | StmtListNode|
       ---------   --------------   -------------

There are two different approaches to generating code for statements that involve conditions (e.g., for if statements and while loops):

The numeric method, and
the control-flow method.

We will discuss code generation for if-then statements assuming the numeric method here; the control-flow method will be discussed later. The code generated by the IfStmtNode's codeGen method will have the following form:

Evaluate the condition, leaving the value on the stack.
Pop the top-of-stack value into register T0.
Jump to FalseLabel if T0 == FALSE.
Code for the statement list.
FalseLabel:

Labels:

Note that the code generated for an if-then statement will need to include a label. Each label in the generated code must have a unique name (although we will refer to labels in these notes using names like "FalseLabel" as above). As discussed above, we will assume that there is a method called nextLabel that returns (as a String) a new label every time it is called, and we will assume that there is a method called genLabel that prints the given label to the assembly-code file. (We will assume that the labels returned by nextLabel are of the form: L<unique #>. So the first time nextLabel is called, it will return "L0"; the second time: "L1"; the third time: "L2"; etc.)

TEST YOURSELF #1

Question 1: What is the form of the code generated by an IfElseStmtNode's codeGen method?

Question 2: What is the actual code that needs to be written for the IfStmtNode's codeGen method?

Question 3: What is the form of the code generated by a WhileStmtNode's codeGen method?

Return Statement

The AST for a return statement is either:

                    ----------------
                   | ReturnStmtNode |
                    ----------------

or:

                    ----------------
                   | ReturnStmtNode |
                    ----------------
                           |
                      -----------
                      | ExpNode |
                      -----------

As discussed above, if a value is being returned, the ReturnStmtNode's codeGen method should call its ExpNode's codeGen method (to generate code to evaluate the returned expression, leaving the value on the stack), then should generate code to pop that value into register V0.

To generate the code that actually does the return, use one of the following approaches:

For each return statement in the program, generate a copy of the code that pops the AR off the stack and jumps back to the return address (that code was discussed above under Function Exit), or
For each return statement in the program, generate a jump to the "return" code that is generated at the end of the function. Note that in this case you will need to label that return code, and you will need to know what that label is when generating code for a return statement.

Digression: Code Generation for Names

Before considering other kinds of statements, let's think about the role identifiers will play in code generation. Names show up in the following contexts:

function calls (the name of the called function)
assignment statements (on the left-hand side)
expressions (an expression can be just a name, or a name can be the operand of any operator)

The code that needs to be generated for the name will be different in each context:

For a function call, we will need to generate a jump-and-link instruction using the name of the function (the same name that was generated as a label in the function's "preamble" code).
For an assignment, we will need to generate code to store a value into the appropriate location (in the static data area for a global variable, or into the current Activation Record for a local variable).
For an expression, we will need to generate code to fetch the current value either from the static data area or from the current Activation Record, and to push that value onto the stack.

Note that in each case, to generate code we will need information from the symbol-table entry for the name being used (for a function call or a use of a global variable, we will need to generate the label, using the variable/function name; for a local variable, we will need the offset in the Activation Record). Therefore, it seems reasonable to write several different code-generation methods for the IdNode class; for example:

genJumpAndLink,
genStore,
codeGen

We use "codeGen" for the third case (fetching the value and pushing it onto the stack) since that is what the codeGen methods of all ExpNodes must do.

We will come back to how to write the three methods for IdNodes after we talk about code generation for other kinds of statements. For now, we'll just assume that we have those methods at our disposal.

Assignment Statement

The AST for an assignment looks like:

		 ----------------
		| AssignStmtNode |
		 ----------------
		 /             \
           ---------         ---------
          | ExpNode |       | ExpNode |
           ---------         ---------

The AssignStmtNode's codeGen method must generate code to:

Evaluate the right-hand-side expression, leaving the value on the stack.
Store the top-of-stack value into the address of the left-hand-side location.

As for a function call, the work is done by calling the AssignStmtNode's children's methods: the codeGen method of the right-hand-side ExpNode, and the genStore method of the left-hand-side ExpNode (which in fact cannot be an arbitrary expression; in the absence of arrays, it can only be an IdNode).

Code Generation for IdNodes

As discussed above, we will need three code-generation methods for the IdNode class:

genJumpAndLink
genStore
codeGen

genJumpAndLink: The genJumpAndLink method will simply generate a jump-and-link instruction (with opcode jal) using the appropriate label as the target of the jump. If the called function is "main", the label is just "main". For all other functions, the label is of the form:

_<functionName>

genStore: The genStore method must pop the top-of-stack value into a register (e.g., T0), then store it in the appropriate location. If the Id is a global variable, the value must be stored in the static data area, using _<varName> as the address. Otherwise, the value must be stored in the current Activation Record, using the appropriate offset from the FP (in this case, you can use the generateIndexed method to generate the store instruction).

Note that this means there must be a way to tell whether an IdNode represents a global or a local variable. There are several possible ways to accomplish this:

The symbol-table entry includes a "kind" field (which distinguishes between globals and locals).
Different sub-classes of the Sym class are used for globals and for local variables (so you can tell whether you have a global or a local using "instanceof", or using an IsGlobal method that you write for each sub-class of Sym).
The symbol-table entry includes an "offset" field; for local variables, that field has a value less than or equal to zero, while for globals, the value is greater than zero.

codeGen: The codeGen method must copy the value of the global / local variable into a register (e.g., T0), then push the value onto the stack. The code to copy the value into T0 will be similar to the code generated by the genStore method, except that the value will be copied from the static data area or the Activation Record to T0, instead of vice-versa.
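The genStore and codeGen methods can be sketched as follows. The Sym class here is a hypothetical stand-in for a symbol-table entry, using a boolean isGlobal field (one of the options listed above) and an offset from the FP for locals; the methods return the generated text rather than writing it to a file.

```java
class IdCodegen {
    // Hypothetical stand-in for a symbol-table entry.
    static class Sym {
        final String name;
        final boolean isGlobal;
        final int offset;   // offset from the FP; meaningful only for locals

        Sym(String name, boolean isGlobal, int offset) {
            this.name = name;
            this.isGlobal = isGlobal;
            this.offset = offset;
        }
    }

    // "_x" for a global variable, "offset($fp)" for a local variable.
    static String address(Sym s) {
        return s.isGlobal ? "_" + s.name : s.offset + "($fp)";
    }

    // genStore: pop the top-of-stack value into T0, then store it.
    static String genStore(Sym s) {
        return "\tlw\t$t0, 4($sp)\n"
             + "\taddu\t$sp, $sp, 4\n"
             + "\tsw\t$t0, " + address(s) + "\n";
    }

    // codeGen: load the variable's value into T0, then push it.
    static String codeGen(Sym s) {
        return "\tlw\t$t0, " + address(s) + "\n"
             + "\tsw\t$t0, 0($sp)\n"
             + "\tsubu\t$sp, $sp, 4\n";
    }
}
```

The two methods differ only in the direction of the copy, so the address computation is shared.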

Code Generation for Expressions

The codeGen method for the subclasses of ExpNode must all generate code that evaluates the expression and leaves the value on top of the stack. We have already talked about how to do this for IdNodes (using their codeGen methods); in the subsections below we discuss code generation for the other kinds of expressions.

Literals

The codeGen methods for IntLitNodes, TrueNodes, and FalseNodes must simply generate code to push the literal value onto the stack. The generated code will look like this:

           li    $t0, <value>        # load value into T0
           sw    $t0, ($sp)          # push onto stack
           subu  $sp, $sp, 4

For a StringLitNode, the string literal itself must be stored in the static data area, and its address must be pushed. Note that two string literals with the same sequence of characters should be considered equal (i.e., when compared using ==). This means that if there is more than one instance of the same string literal in the program, only a single copy should be stored in the static data area. The code to store a string literal in the static data area looks like this:

           .data
  <label>: .asciiz <string value>

Note:

<label> needs to be a new label; e.g., returned by a call to nextLabel.
The <string value> needs to be a string in quotes. You should be storing string literals that way, so just write out the value of the string literal, quotes and all.

To avoid storing the same string literal value more than once, keep a hashtable in which the keys are the string literals, and the associated information is the static-data-area label. When you process a string literal, look it up in the hashtable: if it is there, use its associated label; otherwise, generate code to store it in the static data area, and add it to the hashtable.

The code you need to generate to push the address of a string literal onto the stack looks like this:

           .text
           la   $t0, <label>       # load addr into $t0
           sw   $t0, ($sp)         # push onto stack
           subu $sp, $sp, 4
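The hashtable scheme for string literals might be sketched like this. The class name StringLits and its fields are hypothetical, a local counter stands in for nextLabel, and the generated code is collected in a StringBuilder instead of being written to a file.

```java
import java.util.HashMap;
import java.util.Map;

class StringLits {
    // Maps each string literal (quotes included) to its static-data label.
    private final Map<String, String> labels = new HashMap<>();
    private int next = 0;                       // stands in for nextLabel()
    final StringBuilder out = new StringBuilder();

    // Generates code for one occurrence of a string literal and returns
    // the label used. 'quoted' is the literal text, quotes and all.
    String codeGen(String quoted) {
        String label = labels.get(quoted);
        if (label == null) {
            // First occurrence: store the literal in the static data area.
            label = "L" + (next++);
            labels.put(quoted, label);
            out.append("\t.data\n");
            out.append(label).append(":\t.asciiz ").append(quoted).append("\n");
        }
        // Every occurrence: push the literal's address.
        out.append("\t.text\n");
        out.append("\tla\t$t0, ").append(label).append("\n");
        out.append("\tsw\t$t0, 0($sp)\n");
        out.append("\tsubu\t$sp, $sp, 4\n");
        return label;
    }
}
```

Because repeated literals share one label, comparing the pushed addresses with == gives the intended result for equal strings.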

Function Call

The AST for a function call looks like:

		 --------------
		| CallExpNode |
		 --------------
		 /             \
            --------         -------------
           | IdNode |       | ExpListNode |
            --------         -------------

We need to generate code to:

Evaluate each actual parameter, pushing the values onto the stack;
Jump and link (jump to the called function, leaving the return address in the RA register).
Push the returned value (which will be in register V0) onto the stack.

Since the codeGen method for an expression generates code to evaluate the expression, leaving the value on the stack, all we need to do for step 1 is call the codeGen method of the ExpListNode (which will in turn call the codeGen methods of each ExpNode in the list). For step 2, we just call the genJumpAndLink method of the IdNode. For step 3, we just call genPush(V0).

Note that there is also a call statement:

		 --------------
		| CallStmtNode |
		 --------------
                       |
		 --------------
		| CallExpNode |
		 --------------
		 /             \
            --------         -------------
           | IdNode |       | ExpListNode |
            --------         -------------

In this case, the called function may not actually return a value (i.e., may have return type void). It doesn't hurt to have the CallExpNode's codeGen method push the value in V0 after the call (it will just be pushing some random garbage), but it is important for the CallStmtNode's codeGen method to pop that value.
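A sketch of both cases is given below. The names CallCodegen, genCallExp, and genCallStmt are hypothetical, and the code for the actual parameters is assumed to have been generated already and passed in as strings. For the call statement, this sketch discards the pushed value by adjusting SP directly, which is one possible way to do the pop; using genPop into a scratch register would work equally well.

```java
import java.util.List;

class CallCodegen {
    // Code for a call used as an expression.
    static String genCallExp(String fnName, List<String> argCode) {
        StringBuilder sb = new StringBuilder();
        // step 1: evaluate each actual parameter, leaving the values on the stack
        for (String code : argCode) {
            sb.append(code);
        }
        // step 2: jump and link
        String target = fnName.equals("main") ? "main" : "_" + fnName;
        sb.append("\tjal\t").append(target).append("\n");
        // step 3: push the returned value
        sb.append("\tsw\t$v0, 0($sp)\n");
        sb.append("\tsubu\t$sp, $sp, 4\n");
        return sb.toString();
    }

    // Code for a call used as a statement: same as above, then discard
    // the value that the call expression pushed.
    static String genCallStmt(String fnName, List<String> argCode) {
        return genCallExp(fnName, argCode) + "\taddu\t$sp, $sp, 4\n";
    }
}
```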

Non Short-Circuited Operators

The codeGen methods for the non short-circuited operators (PlusNode, MinusNode, ..., NotNode, LessNode, ..., EqualsNode, etc.) must all do the same basic sequence of tasks:

Call each child's codeGen method to generate code that will evaluate the operand(s), leaving the value(s) on the stack.
Generate code to pop the operand value(s) off the stack into register(s) (e.g., T0 and T1). Remember that if there are two operands, the right one will be on the top of the stack.
Generate code to perform the operation (see Spim documentation for a list of opcodes).
Generate code to push the result onto the stack.

Note that the comparison operators (described in Section 2.6 of the Spim Reference Manual) produce one (not minus one) when the result of the comparison is true. However, the not operator does a bitwise logical negation, so it will not work correctly if you use 0 and 1 to represent false and true. Thus, for a NotNode it is better to use the seq operator (comparing the operand with FALSE) instead of the not operator.

Example: Recall that the AST for an addition looks like this:

		 ----------
		| PlusNode |
		 ----------
		 /        \
          ---------      ---------
         | ExpNode |    | ExpNode |
          ---------      ---------

Here is the codeGen method for the PlusNode:

public void codeGen() {
    // step 1: evaluate both operands
    myExp1.codeGen();
    myExp2.codeGen();

    // step 2: pop values into T1 and T0 (the right operand is on top)
    genPop(T1);
    genPop(T0);
    
    // step 3: do the addition (T0 = T0 + T1)
    generate("add", T0, T0, T1);
    
    // step 4: push result
    genPush(T0);
}

To illustrate how code is generated for an expression involving several operators, consider generating code for the expression: b + c * d. Here is the AST for the expression:

                PlusNode
                /       \
	     IdNode   TimesNode
               b       /     \
                     IdNode  IdNode
                        c      d

Below is the sequence of calls that would be made at compile time to generate code for this expression, and a description of what the generated code does.

        Sequence of calls          What the generated code does
	-----------------	   ----------------------------

  +--- PlusNode.codeGen()
  |      IdNode.codeGen()  --------->       push b's value
  | +-   TimesNode.codeGen()
  | |      IdNode.codeGen() --------->      push c's value
  | |      IdNode.codeGen() --------->      push d's value
  | |
  | +------------------------------->   pop d's value into T1
  |                                     pop c's value into T0
  |                                     T0 = T0 * T1
  |                                     push T0's value
  |
  +---------------------------------> pop result of * into T1
                                      pop b's value into T0
                                      T0 = T0 + T1
                                      push T0's value
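The trace above can be reproduced with a small stand-alone program. This is only a sketch: the class ExprDemo and its helpers are hypothetical, the nodes are modeled as lambdas instead of AST classes, the variables are assumed to be globals (so each address is just _name), and the instructions are collected in a list.

```java
import java.util.ArrayList;
import java.util.List;

class ExprDemo {
    // Generated instructions, in order.
    static final List<String> code = new ArrayList<>();

    static void push(String reg) {
        code.add("sw " + reg + ", 0($sp)");
        code.add("subu $sp, $sp, 4");
    }

    static void pop(String reg) {
        code.add("lw " + reg + ", 4($sp)");
        code.add("addu $sp, $sp, 4");
    }

    interface Exp { void codeGen(); }

    // An identifier: load its value and push it.
    static Exp id(String name) {
        return () -> {
            code.add("lw $t0, _" + name);
            push("$t0");
        };
    }

    // A binary operator: op is the MIPS opcode, e.g. "add" or "mul".
    static Exp bin(String op, Exp left, Exp right) {
        return () -> {
            left.codeGen();                  // step 1: evaluate both operands
            right.codeGen();
            pop("$t1");                      // step 2: right operand is on top
            pop("$t0");
            code.add(op + " $t0, $t0, $t1"); // step 3: do the operation
            push("$t0");                     // step 4: push result
        };
    }

    public static void main(String[] args) {
        // b + c * d
        bin("add", id("b"), bin("mul", id("c"), id("d"))).codeGen();
        code.forEach(System.out::println);
    }
}
```

Running main prints the instructions in the same order as the right-hand column of the trace: the three pushes, then the multiplication, then the addition.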

Short-Circuited Operators

The short-circuited operators are represented by AndNodes and OrNodes. "Short-circuiting" means that the right operand is evaluated only if necessary. For example, for the expression (j != 0) && (k/j > epsilon), the sub-expression (k/j > epsilon) is evaluated only if variable j is not zero. Therefore, the code generated for an AndNode must work as follows:

  evaluate the left operand
  if the value is true then
     evaluate the right operand;
     that value is the value of the whole expression
  else
     don't bother to evaluate the right operand
     the value of the whole expression is false

Similarly, for an OrNode:

  evaluate the left operand
  if the value is false then
     evaluate the right operand;
     that value is the value of the whole expression
  else
    don't bother to evaluate the right operand
    the value of the whole expression is true

This means that the code generated for the logical operators will need to involve some jumps depending on the values of some expressions.

TEST YOURSELF #2

Expand the outlines given above for the code generated for AndNodes and OrNodes, giving a lower-level picture of the generated code. Use the outline of the code generated for an if-then statement as a model of what to write.

Control-Flow Code

As mentioned above in the section on If-Then Statements, there are actually two different approaches to generating code for statements with conditions (like if statements and while loops):

The numeric approach: This is the approach that we have assumed so far. Using the numeric approach, the codeGen method for a statement with a condition generates code to evaluate the condition, leaving the value on the stack. That value is then popped, and a jump is executed if it has a particular value.
The control-flow or jump-code approach: In this case, the code-generation method for the condition has two parameters, both of which are labels (i.e., Strings) named TrueLabel and FalseLabel. Instead of leaving the value of the condition on the stack, the code generated to evaluate the condition jumps to the TrueLabel if the condition is true, and jumps to the FalseLabel if the condition is false.

We will assume that the new code-generation method for the condition is called genJumpCode. As we will see, the reason to prefer the control-flow approach over the numeric approach is that, using the control-flow approach, we will generate fewer instructions for statements that involve conditions (so the generated code will be smaller and will run faster).

First, let's reconsider code generation for an if-then statement; this time we'll use the control-flow method instead of the numeric method for evaluating the condition part of the statement. Under this new assumption, the code generated by the IfStmtNode's codeGen method will have the following form:

           -- code to evaluate the condition, jumping to TrueLab if it is true,
	      and to DoneLab if it is false --
  TrueLab:
           -- code for the statement list --
  DoneLab:

The actual code written for the IfStmtNode's codeGen method will be:

public void codeGen() {
   String trueLab = nextLabel();
   String doneLab = nextLabel();
   myExp.genJumpCode(trueLab, doneLab);
   genLabel(trueLab);
   myStmtList.codeGen();
   genLabel(doneLab);
}

To implement the control-flow approach, in addition to changing the codeGen method for the statements that have conditions, we must also write a new genJumpCode method for each AST node that could represent a boolean expression. Below, we look at three representative cases, for IdNode, TrueNode, and LessNode (the code generated for a FalseNode is essentially the same as for a TrueNode, and the code generated for the other relational and equality operators is similar to that generated for a LessNode). We give the code generated for the (old) numeric approach, and for the (new) control-flow approach so that we can see which code is better (in terms of the number of instructions). (The code given below is not quite assembly code -- for example, for clarity, we use "push" instead of the actual two instructions that implement a push operation.)

  IdNode:  numeric                            control-flow
           -------                            ------------

            lw  $t0, <var's addr>             lw  $t0, <var's addr>
            push $t0                          beq $t0, FALSE, falseLab
                                              b   trueLab

Note that in both approaches to generating code for an IdNode, 3 instructions are generated (because "push" is actually 2 instructions). However, while all 3 of the numeric-approach instructions will always execute, the last instruction generated by the control-flow approach will execute only if the value of the Id is true.

  TrueNode:  numeric                          control-flow
             -------                          ------------

            li  $t0, TRUE                     b   trueLab
            push $t0

In this case, the numeric approach generates 3 instructions, while the control-flow approach generates just 1. All instructions are executed every time.

  LessNode:  numeric                                     control-flow
             -------                                     ------------

             -- code to evaluate both operands           -- ditto
             -- pop values into T1, T0                   -- ditto
             slt  $t2, $t0, $t1                          blt $t0, $t1, trueLab
             push $t2                                    b   falseLab

The operands of a LessNode are integer expressions, not boolean expressions, so both approaches start by generating (the same) code to evaluate the operands and to pop their values into registers T0 and T1. After that, however, the numeric approach will generate 3 instructions (to set $t2 to TRUE or FALSE as appropriate, then to push that value onto the stack -- remember that "push" is really two instructions), while the control-flow code will generate only 2 instructions. Furthermore, as was the case for the IdNode, all three of the numeric-approach instructions will always execute, while the last instruction generated by the control-flow approach will only execute if the comparison evaluates to false.
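
To make the three control-flow cases concrete, here is a small, self-contained Java sketch of the corresponding genJumpCode methods. It is only an illustration, not the actual compiler's code: the Asm helper class, the IntExp and BoolExp interfaces, the IntLitNode class, and the string-valued address passed to the IdNode constructor are all assumptions made for this sketch. As in the tables above, "push" and "pop" are written as single pseudo-instructions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the compiler's output routines: collects generated lines.
class Asm {
    static final List<String> out = new ArrayList<>();
    static void emit(String line) { out.add(line); }
}

// Assumed interfaces, for this sketch only.
interface IntExp  { void codeGen(); }                                    // numeric: push the value
interface BoolExp { void genJumpCode(String trueLab, String falseLab); } // control-flow

class IdNode implements IntExp, BoolExp {
    private final String addr;   // hypothetical address operand, such as "_x"
    IdNode(String addr) { this.addr = addr; }

    public void codeGen() {
        Asm.emit("lw $t0, " + addr);
        Asm.emit("push $t0");
    }

    public void genJumpCode(String trueLab, String falseLab) {
        Asm.emit("lw $t0, " + addr);
        Asm.emit("beq $t0, FALSE, " + falseLab);
        Asm.emit("b " + trueLab);            // reached only when the value is true
    }
}

class IntLitNode implements IntExp {
    private final int val;
    IntLitNode(int val) { this.val = val; }

    public void codeGen() {
        Asm.emit("li $t0, " + val);
        Asm.emit("push $t0");
    }
}

class TrueNode implements BoolExp {
    public void genJumpCode(String trueLab, String falseLab) {
        Asm.emit("b " + trueLab);            // falseLab is never needed
    }
}

class LessNode implements BoolExp {
    private final IntExp left, right;
    LessNode(IntExp left, IntExp right) { this.left = left; this.right = right; }

    public void genJumpCode(String trueLab, String falseLab) {
        left.codeGen();                      // operands are integers: evaluate numerically
        right.codeGen();
        Asm.emit("pop $t1");                 // right operand
        Asm.emit("pop $t0");                 // left operand
        Asm.emit("blt $t0, $t1, " + trueLab);
        Asm.emit("b " + falseLab);           // reached only when the comparison is false
    }
}

class LeafDemo {
    public static void main(String[] args) {
        // Jump code for the comparison x < 3, with made-up labels L0 and L1.
        new LessNode(new IdNode("_x"), new IntLitNode(3)).genJumpCode("L0", "L1");
        Asm.out.forEach(System.out::println);
    }
}
```

Note that none of the genJumpCode methods pushes a boolean value; the result of the condition is represented only by which label control reaches.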

Now let's consider how to write the new genJumpCode method for the short-circuited operators (AndNodes and OrNodes).

Recall that the AST for an && expression looks like this:

                 ---------
                | AndNode |
                 ---------
                 /        \
           ---------       ---------
          | ExpNode |     | ExpNode |
           ---------       ---------

Here's how the genJumpCode method of the AndNode works:

Start by calling the genJumpCode method of the left child. That call will generate code to evaluate the left operand, jumping to the "TrueLabel" that we pass if its value is true, and jumping to the "FalseLabel" that we pass if its value is false.
So what labels should be passed?
- If the left operand is false, then the value of the whole expression is false, so pass the given "FalseLabel" as the "FalseLabel" in the recursive call.
- If the left operand is true, then we must evaluate the right operand, so pass a new label as the "TrueLabel" in the recursive call.
After the call to the left child's genJumpCode method, call genLabel to generate the new label.
Finally, call the genJumpCode method of the right child.
What labels should be passed as the arguments to the right child's genJumpCode method?
- This expression will only be evaluated if the left operand evaluated to true; in that case, the value of this expression (the right operand) is the value of the whole expression. Therefore, for this recursive call, pass the original "TrueLabel" and "FalseLabel".

The AndNode's genJumpCode method would be:

public void genJumpCode(String trueLab, String falseLab) {
    String newLab = nextLabel();
    
    myExp1.genJumpCode(newLab, falseLab);
    genLabel(newLab);
    myExp2.genJumpCode(trueLab, falseLab);
}

Example: Consider the code that would be generated for the statement: if (a && b<0) { ... }, represented by the AST:

                    ------------
                   | IfStmtNode |
                    ------------
                 /       |        \
       ---------   --------------   -------------
      | AndNode | | DeclListNode | | StmtListNode|
       ---------   --------------   -------------
       /        \
  --------       ----------
 | IdNode |     | LessNode |
  --------       ----------    
                /          \
              ...         ...

The IfStmtNode's codeGen method would create two labels, TrueLab and DoneLab, and would call the AndNode's genJumpCode method, passing those labels as the arguments. The AndNode's genJumpCode method would create one new label (NewLab) and then would call the genJumpCode method of its left child (the IdNode), passing NewLab and DoneLab as the arguments. It would then generate the NewLab label, and then would call its right child's genJumpCode method, passing TrueLab and DoneLab. The code generated for the whole condition would look like this:

         Generated Code                                  Generated By
         --------------                                  ------------

            -- code to load the value of a into T0          IdNode
            jump to DoneLab if T0 == FALSE                  IdNode
            jump to NewLab                                  IdNode
NewLab:                                                     AndNode
            -- code to push the value of b                  LessNode's child
            -- code to push the literal 0                   LessNode's child
            pop into T1                                     LessNode
            pop into T0                                     LessNode
            jump to TrueLab if T0 < T1                      LessNode
            jump to DoneLab                                 LessNode

(Of course, the actual label would be something like L3, not NewLab.) After calling the AndNode's genJumpCode method, the IfStmtNode's codeGen method would call genLabel to print TrueLab, then would call its StmtList child's codeGen method to generate code for the list of statements. Finally, it would call genLabel to print DoneLab. So the code generated for this if statement would be like this:

            -- code to load the value of a into T0
            jump to DoneLab if T0 == FALSE
            jump to NewLab
NewLab:
            -- code to push the value of b
            -- code to push the literal 0
            pop into T1
            pop into T0
            jump to TrueLab if T0 < T1
            jump to DoneLab
TrueLab:
            -- code for the list of statements
DoneLab:
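
The way the labels are threaded through the recursive calls in this example can be reproduced with a short Java sketch. This is a simplified model rather than the real AST classes: the JumpCodeDemo class, the BoolExp interface, and the use of lambdas in place of the IdNode and LessNode are assumptions made for illustration, and the emitted lines are the same pseudo-code used in the listing above, not MIPS instructions.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed interface: control-flow code generation for a boolean expression.
interface BoolExp {
    void genJumpCode(String trueLab, String falseLab);
}

class JumpCodeDemo {
    final List<String> out = new ArrayList<>();
    private int labelCount = 0;

    String nextLabel() { return "L" + (labelCount++); }
    void genLabel(String lab) { out.add(lab + ":"); }
    void emit(String s) { out.add("    " + s); }

    // Mirrors the AndNode's genJumpCode: the AndNode itself emits only a label.
    BoolExp and(BoolExp left, BoolExp right) {
        return (trueLab, falseLab) -> {
            String newLab = nextLabel();
            left.genJumpCode(newLab, falseLab);   // false: whole expression is false
            genLabel(newLab);                     // true: go on to the right operand
            right.genJumpCode(trueLab, falseLab);
        };
    }

    // Mirrors the IfStmtNode's codeGen for an if-then statement.
    void genIf(BoolExp cond, Runnable stmtList) {
        String trueLab = nextLabel();
        String doneLab = nextLabel();
        cond.genJumpCode(trueLab, doneLab);
        genLabel(trueLab);
        stmtList.run();
        genLabel(doneLab);
    }

    // Generates code for an if whose condition is: a && (b compared with 0 by a LessNode).
    List<String> generate() {
        BoolExp a = (t, f) -> {                   // stands in for the IdNode
            emit("load a into T0");
            emit("jump to " + f + " if T0 == FALSE");
            emit("jump to " + t);
        };
        BoolExp less = (t, f) -> {                // stands in for the LessNode
            emit("push b");
            emit("push 0");
            emit("pop into T1");
            emit("pop into T0");
            emit("jump to " + t + " if T0 < T1");
            emit("jump to " + f);
        };
        genIf(and(a, less), () -> emit("-- code for the list of statements --"));
        return out;
    }

    public static void main(String[] args) {
        for (String line : new JumpCodeDemo().generate()) {
            System.out.println(line);
        }
    }
}
```

In this sketch the labels are created in the order TrueLab, DoneLab, NewLab, so they come out as L0, L1, and L2 respectively; the printed code has the same shape as the listing above, with those names substituted.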

TEST YOURSELF #3

Question 1: What is the form of the code generated by an OrNode's genJumpCode method?

Question 2: What is the form of the code generated by a NotNode's genJumpCode method?

How does the code generated for an AndNode using the control-flow method compare to the code generated using the numeric method? Here are outlines of the code generated in each case:

   Numeric Code                                  Control-Flow Code
   ------------                                  -----------------

   -- code to evaluate left                         -- code to evaluate left
   -- operand, leaving the                          -- operand, including jumps
   -- value on the stack                            -- to NewLab and FalseLab
   pop into T0                                  
   goto TrueLab if T0 == TRUE                 NewLab:
   push FALSE
   goto DoneLab
TrueLab:
   -- code to evaluate right                        -- code to evaluate right
   -- operand, leaving the                          -- operand, including
   -- value on the stack                            -- jumps to TrueLab and
                                                    -- FalseLab
DoneLab:

Note that the numeric code includes 6 instructions of its own (the pop, the conditional goto, the push of FALSE, and the goto to DoneLab that appear between the code for the two operands) in addition to the ones generated by the codeGen methods of the two children. In the control-flow case, no instructions are generated by the AndNode itself (just a label); the instructions are all generated by the genJumpCode methods of its two children. Those children could represent names, boolean literals, comparisons, or logical expressions. We have already seen that in the first three cases, better code is generated using the control-flow approach. Now we see that for logical expressions (at least for AndNodes) fewer instructions are generated using the control-flow approach than using the numeric approach.

TEST YOURSELF #4

Compare the code generated by the two approaches for an OrNode and for a NotNode.

Finally, let's compare the code generated by the numeric and control-flow approaches for an if-then-else statement. Here are outlines of the two different versions of the code that would be generated:

   Numeric Code                                  Control-Flow Code
   ------------                                  -----------------

   -- code for condition, leaving                -- code for condition,
   -- value on the stack                         -- including jumps to
                                                 -- trueLab and falseLab
   pop into T0
   goto falseLab if T0 == FALSE               trueLab: 
   -- code for "then" stmts                      -- code for "then" stmts
   goto doneLab                                  goto doneLab
falseLab:                                     falseLab:          
   -- code for "else" stmts                      -- code for "else" stmts
doneLab:                                      doneLab:

Note that much of the code is the same for the two methods. The code generated to evaluate the condition will be different, and the numeric method has three extra instructions: a pop (which is really two instructions) followed by a conditional goto. So as long as the code generated for the condition is no worse for the control-flow method than for the numeric method, the control-flow method is better (both in terms of the number of instructions generated, and in terms of the number of instructions that will be executed).

But we have already looked at the code generated for all the different kinds of conditions, for each of the two approaches, and in fact the control-flow method was never worse, and was sometimes better. Thus, the control-flow method is the winner: it generates both less code and more efficient code!
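
For reference, the control-flow outline in the table above corresponds to a codeGen method along the following lines. This is a hedged sketch, not the compiler's actual code: the IfElseGen class, the Cond interface, and the Runnable parameters standing in for the two statement lists are assumptions made so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed interface: a condition that can generate control-flow (jump) code.
interface Cond {
    void genJumpCode(String trueLab, String falseLab);
}

class IfElseGen {
    final List<String> out = new ArrayList<>();
    private int labelCount = 0;

    String nextLabel() { return "L" + (labelCount++); }
    void genLabel(String lab) { out.add(lab + ":"); }
    void emit(String s) { out.add("    " + s); }

    // Control-flow version of if-then-else: no pop and no test of T0 are needed,
    // because the condition's own code jumps directly to trueLab or falseLab.
    void genIfElse(Cond cond, Runnable thenStmts, Runnable elseStmts) {
        String trueLab = nextLabel();
        String falseLab = nextLabel();
        String doneLab = nextLabel();
        cond.genJumpCode(trueLab, falseLab);
        genLabel(trueLab);
        thenStmts.run();
        emit("b " + doneLab);                 // skip over the else part
        genLabel(falseLab);
        elseStmts.run();
        genLabel(doneLab);
    }

    public static void main(String[] args) {
        IfElseGen g = new IfElseGen();
        Cond alwaysTrue = (t, f) -> g.emit("b " + t);   // behaves like a TrueNode
        g.genIfElse(alwaysTrue,
                    () -> g.emit("-- code for \"then\" stmts --"),
                    () -> g.emit("-- code for \"else\" stmts --"));
        g.out.forEach(System.out::println);
    }
}
```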
