Contents

  • Overview
  • Spim
  • Auxiliary Fields and Methods
  • Code Generation for Global Variable Declarations
  • Code Generation for Functions
  • Code Generation for Statements
  • Code Generation for Expressions
  • Control-Flow Code

    Overview

    Code can be generated by a syntax-directed translation while parsing or by traversing the abstract syntax tree after the parse (i.e., by writing a codeGen method for the appropriate kinds of AST nodes). We will assume the latter approach, and will discuss code generation for a subset of the C language. In particular, we will discuss generating MIPS assembly code suitable for input to the Spim interpreter. Some information on Spim is provided in the next section; the following sections discuss code generation for:

    Spim

    Documentation on Spim is available on-line:

    To run the (plain) Spim interpreter, type:

    where <name> is the name of a file that contains MIPS assembly code (the file produced by your compiler). This will cause Spim to process the code in the file; if there are syntax errors, they will be reported, and the code will not execute. Otherwise, the code will execute; output will be printed to your terminal, and you will get error messages for any run-time errors that result.

    To run Spim with an X-windows interface (on a Linux machine), type just: xspim. This will cause a window to open. Click on the load button in that window, then type in the name of your assembly-code file, then press return. If there are syntax errors, you will see error messages. If there are no errors, you can run your program by clicking on run, then (in the small window that will be opened) on OK. If your program generates any output, a new window will be opened to display that output.

    Spim uses the following special registers (there are others; you can see the on-line Spim documentation for those, but you should not need them for the class project):

    Register Purpose
    \$sp stack pointer
    \$fp frame pointer
    \$ra return address
    \$v0 used for system calls and to return int values from function calls, including the syscall that reads an int
    \$f0 used to return double values from function calls, including the syscall that reads a double
    \$a0 used for output of int and string values
    \$f12 used for output of double values
    \$t0 - \$t7 temporaries for ints
    \$f0 - $f30 registers for doubles (used in pairs; i.e., use \$f0 for the pair \$f0, \$f1)

    Auxiliary Constants and Methods

    To simplify the task of code generation, it is convenient to have a set of constants (final static fields) that define the string representations of the registers that will be used in the generated code and the values used to represent true and false, as well as a set of methods for actually writing the generated code to a file. We will assume that we have the following register constants: SP, FP, RA, V0, A0, T0, T1, F0, F12, as well as the constants TRUE and FALSE (and that TRUE is represented as 1, and false as 0). We will also assume that we have the following methods:

    Method Purpose
    generate write the given op code and arguments, nicely formatted, to the output file
    generateIndexed the arguments are: an op code, a register R1, another register R2, and an offset; generate code of the form: op R1, offset(R2)
    genPush(String reg, int bytes) generate code to push the value in the given register onto the stack;
    parameter bytes is 4 for an int and 8 for a double
    genPop(String reg, int bytes) generate code to pop the top-of-stack value into the given register
    nextLabel return a string to be used as a label (more on this later)
    genLabel given a label L, generate: L:
    Code Generation for Global Variable Declarations

    For each global variable v, generate:

    	.data
    	.align 2  # align on a word boundary
        _v: .space N
    
    where N is the size of the variable in bytes. (Scalar integer variables require 4 bytes; double variables require 8 bytes.) This code tells the assembler to set aside N bytes in the static data area, in a location labeled with the name _v.

    Example: Given this source code:

    	int x;
    	double y;
    
    you should generate this code:
    	.data
    	.align 2
        _x: .space 4
    	.data
    	.align 2
        _y: .space 8
    
    It is not actually necessary to generate .data if the previous generated code was also for a global variable declaration; however, since function declarations can be intermixed with global variable declarations (and cause code to be generated in the text area, not the static data area), this may not be the case; it is safe (and easier) just to generate those directives for every global variable.

    Code Generation for Functions

    For every function you will generate code for:

    Function Preamble

    For the main function, generate:

    	.text
    	.globl main
       main:
    

    For all other functions, generate:

            .text
            _<functionName>:
    
    using the actual name in place of <functionName>. This tells the assembler to store the following instructions in the text area, labeled with the given name.

    After generating this "preamble" code, you will generate code for (1) function entry, (2) function body, and (3) function exit.

    Function Entry

    We assume that when a function starts executing, the stack looks like this:

    image/svg+xml caller's AR parameters FP SP
    Before starting to execute the statements in the function body, we want it to look like this:
    image/svg+xml caller's AR parameters FP SP new AR return address control link(saved FP) local variables
    The parameters will already be on the stack (pushed by calling function). So the code for function entry must do the following:
    1. push the return address
    2. push the control link
    3. set the FP
    4. push space for local variables
    Here's the code you need to generate:
      # (1) Push the return addr
        sw	 $ra, 0($sp)
        subu $sp, $sp, 4
      # (2) Push the control link
        sw   $fp, 0($sp)
        subu $sp, $sp, 4
      # (3) set the FP
      # Note: our convention for $sp is that it points to the first unused word of the stack.
      # The reason for adding 8 is that the unused word and the control link each take 4 bytes
        addu $fp, $sp, 8
      # (4) Push space for the locals
        subu $sp, $sp, <size of locals in bytes>
    
    Note: <size of params> and <size of locals> will need to be available to the code generator. The symbol-table entry for the function name will have information about the parameters (because that will have been used for type checking). For example, it might have a list of the symbol-table entries for the parameters. You could also store the total size of the parameters in the function name's symbol-table entry, or you could write a method that takes the list of parameters as its argument and computes the total size. It is not so easy to compute the total size of the local variables at code-generation time; it is probably a better idea to do that during name analysis. The name-analysis phase will be computing the offsets for the parameters and local variables anyway; it should not be difficult to extend that code to also compute the total size of the locals (and to store that information in the function name's symbol-table entry).

    Function Body

    Note: we are talking about the codeGen method for the FnBodyNode, whose subtree will look like this:

    image/svg+xml FnBodyNode StmtListNode DeclListNode

    There is no need to generate any code for the declarations. So to generate code for the function body, just call the codeGen method of the StmtListNode, which will in turn call the codeGen method of each statement in the list. What those methods will do is discussed below in the section on Code Generation for Statements.

    Function Exit

    Just before a function returns, the stack looks like this:

    image/svg+xml caller's AR parameters FP SP new AR return address control link(saved FP) local variables
    We need to generate code to pop off this function's AR, then to jump to the address in the "return address" field. Popping off the AR means restoring the SP and FP. Note that we want to move the SP to where the FP is currently pointing, but if there may be an interrupt that could use the stack, we don't want to change the SP until we're finished with all of the values in the current AR (in particular, the control link, which is used to restore the FP). Therefore, we use a temporary register (t0) to save the address that is initially in the FP

    Here is the code that needs to be generated:

      lw   $ra, 0($fp)      # load return address
      move $t0, $fp         # FP holds the address to which we need to restore SP
      lw   $fp, -4($fp)     # restore FP
      move $sp, $t0         # restore SP
      jr   $ra              # return
    
    Note that there are two things that cause a function to return:
    1. A return statement is executed, or
    2. The last statement in the function is executed (i.e., execution "falls off the end" of the function).
    You could generate the "return" code given above for each return statement as well as after the last statement in the function body. A more space-efficient approach would be:

    What about a return statement that returns a value? As discussed below, the codeGen method for the returned expression will generate code to evaluate that expression, leaving the value on the stack. The MIPS convention is to use register V0 to return a int value from a function and to use register F0 to return a double value. So the codeGen method for the return statement should generate code to pop the value from the stack into the appropriate register (before generating the "return" code or the jump to the return code discussed above).

    Code Generation for Statements

    You will write a different codeGen method for each kind of StmtNode. You are strongly advised to write this method for the WriteIntStmtNode, WriteDblStmtNode, and WriteStrStmtNode first. Then you can test code generation for the other kinds of statements and the expressions by writing a program that computes and prints a value. It will be much easier to find errors in your code this way (by looking at the output produced when a program is run) than by looking at the assembly code you generate.

    Write Statement

    To generate code for a write statement whose expression is of type int you must:

    1. Call the codeGen method of the expression being printed. That method will generate code to evaluate the expression, leaving that value on the top of the stack.
    2. Generate code to pop the top-of-stack value into register A0 (a special register used for output of strings and ints)
    3. Generate code to set register V0 to 1.
    4. Generate a syscall instruction.

    Below is the code you would write for the codeGen method of the WriteIntStmtNode.

        // step (1)
        myExp.codeGen();
        
        // step (2)
        genPop(A0, 4);
        
        // step (3)
        generate("li", V0, 1);
        
        // step (4)
        generate("syscall");
    
    The code for the WriteStrStmtNode and the WriteDblStmtNode is similar, except for the following:

    If-Then Statement The AST for an if-then statement looks like:

                        ------------
                       | IfStmtNode |
                        ------------
                     /       |        \
           ---------   --------------   -------------
          | ExpNode | | DeclListNode | | StmtListNode|
           ---------   --------------   -------------
    
    There are two different approaches to generating code for statements that involve conditions (e.g., for if statements and while loops):
    1. The numeric method, and
    2. the control-flow method.
    We will discuss code generation for if-then statements assuming the numeric method here; the control-flow method will be discussed later. The code generated by the IfStmtNode's codeGen method will have the following form:
    1. Evaluate the condition, leaving the value on the stack.
    2. Pop the top-of-stack value into register T0.
    3. Jump to FalseLabel if T0 == FALSE.
    4. Code for the statement list.
    5. FalseLabel:

    Labels

    Note that the code generated for an if-then statement will need to include a label. Each label in the generated code must have a unique name (although we will refer to labels in these notes using names like "FalseLabel" as above). As discussed above, we will assume that there is a method called nextLabel that returns (as a String) a new label every time it is called, and we will assume that there is a method called genLabel that prints the given label to the assembly-code file.


    TEST YOURSELF #1

    Question 1: What is the form of the code generated by an IfElseStmtNode's codeGen method?

    Question 2: What is the actual code that needs to be written for the IfStmtNode's codeGen method?

    Question 3: What is the form of the code generated by a WhileStmtNode's codeGen method?

    solution


    Return Statement

    The AST for a return statement is either:

                        ----------------
                       | ReturnStmtNode |
                        ----------------
    
    or:
                        ----------------
                       | ReturnStmtNode |
                        ----------------
                               |
                          -----------
                          | ExpNode |
                          -----------
    
    As discussed above, if a value is being returned, the ReturnStmtNode's codeGen method should call its ExpNode's codeGen method (to generate code to evaluate the returned expression, leaving the value on the stack), then should generate code to pop that value into register V0 or register F0 (depending on its type).

    To generate the code that actually does the return, use one of the following approaches:

    1. For each return statement in the program, generate a copy of the code that pops the AR off the stack then jumps to the return address (that code was discussed above under Function Exit), or
    2. For each return statement in the program, generate a jump to the "return" code that is generated at the end of the function. Note that in this case you will need to label that return code, and you will need to know what that label is when generating code for a return statement.

    Read Statement

    The AST for a read statement is one of the following:

                -------------------                 -------------------
                | ReadIntStmtNode |                 | ReadDblStmtNode |
                -------------------                 -------------------
                       |                                     |
                  ----------                           ----------
                  | IdNode |                           | IdNode |
                  ----------                           ----------
    

    To read an integer value into register V0, you must generate this code:

    The code loads the special value 5 into register V0, then does a syscall. The fact that V0 contains the value 5 tells the syscall to read an integer value from standard input, storing the value back into register V0. So we must start by generating the above code, then we must generate code to store the value from V0 to the address of the IdNode.

    To read a double value, load the value 7 into V0, do a syscall, and expect the input value to be in register F0 (not V0).

    Digression: Code Generation for IdNodes

    Before considering other kinds of statements, let's think about the role identifiers will play in code generation. Names show up in the following contexts:

    The code that needs to be generated for the name will be different in each context:
    1. For a function call, we will need to generate a jump-and-link instruction using the name of the function (the same name that was generated as a label in the function's "preamble" code).
    2. For an expression, we will need to generate code to fetch the current value either from the static data area or from the current Activation Record, and to push that value onto the stack.
    3. For an assignment, we will need to generate code to push the address of the variable (either the address in the static data area, or in the current Activation Record) onto the stack. Then we will generate code to store the value of the right-hand-side expression into that address.
    Therefore, it seems reasonable to write three different code-generation methods for the IdNode class; for example:

    We use "codeGen" for the second case (fetching the value and pushing it onto the stack) since that is what the codeGen methods of all ExpNodes must do.

    genJumpAndLink

    The genJumpAndLink method will simply generate a jump-and-link instruction (with opcode jal) using the appropriate label as the target of the jump. If the called function is "main", the label is just "main". For all other functions, the label is of the form:

    codeGen

    The codeGen method must copy the value of the global / local variable into a register (e.g., T0 for an int variable and F0 for a double), then push the value onto the stack. Different code will be generated for a global and a local variable. Below are four examples.
    lw $t0 _g // load the value of int global g into T0
    lw $t0 0($fp) // load the value of the int local stored at offset 0 into T0
    l.d $f0 _d // load the value of dbl global d into F0
    l.d $f0 -4($fp) // load the value of the dbl local stored at offset -4 into F0

    Note that this means there must be a way to tell whether an IdNode represents a global or a local variable. There are several possible ways to accomplish this:

    genAddr

    The genAddr method must load the address of the identifier into a register (e.g., T0), then push it onto the stack. The code is very similar to the code to load the value of the identifier; we just use the la (load address) opcode instead of the lw (load word) or l.d (load double) opcode.

    Here is the code you need to generate to load the address of a global variable g into register T0 (this works whether g is int or double, since an address always takes 4 bytes):

    and here is the code for a local, stored at offset -8:

    Assignment Statement

    The AST for an assignment statement looks like this:
    		 ----------------
    		| AssignStmtNode |
    		 ----------------
    		         |
    		    ------------
    	 	   | AssignNode |
    		    ------------
    		 /               \
               --------          ---------
              | IdNode |        | ExpNode |
               --------          ---------	
    
    The AssignStmtNode's codeGen method can call the AssignNode's method to do the assignment, but be careful: an AssignNode is a subclass of ExpNode, so like all nodes that represent expressions, it must leave the value of the expression on the stack. Therefore, the AssignStmtNode must generate code to pop (and ignore) that value.

    Code Generation for Expressions

    The codeGen method for the subclasses of ExpNode must all generate code that evaluates the expression and leaves the value on top of the stack. We have already talked about how to do this for IdNodes; in the subsections below we discuss code generation for other kinds of expressions.

    Assign

    The codeGen method for an assignment expression must generate code to:

    1. Evaluate the right-hand-side expression, leaving the value on the stack.
    2. Push the address of the left-hand-side Id onto the stack.
    3. Store the value into the address.
    4. Leave a copy of the value on the stack.
    Most of the work is done by calling the AssignNode's children's methods: the codeGen method of the right-hand-side ExpNode, and the genAddr method of the left-hand-side IdNode. How to accomplish the rest of the assignment is left to you to figure out (it isn't too difficult).

    Literals

    The codeGen methods for IntLitNodes must simply generate code to push the literal value onto the stack. The generated code will look like this:

               li    $t0, <value>        # load value into T0
               sw    $t0, ($sp)          # push onto stack
               subu  $sp, $sp, 4
    

    The code for a DblLitNode is similar, except that the value must be loaded into a double register using the appropriate opcode, and the "push" code is different for a double value. For example:

               li.d  $f0, <value>    # load value into F0
               s.d   $f0, -4($sp)    # push onto stack
               subu  $sp, $sp, 8
    

    For a StringLitNode, the string literal itself must be stored in the static data area, and its address must be pushed. The code to store a string literal in the static data area looks like this:

               .data
      <label>: .asciiz <string value>
    
    Note:
    1. <label> needs to be a new label; e.g., returned by a call to nextLabel.
    2. The <string value> needs to be a string in quotes. You should be storing string literals that way, so just write out the value of the string literal, quotes and all.
    To avoid storing the same string literal value more than once, keep a hashtable in which the keys are the string literals, and the associated information is the static-data-area label. When you process a string literal, look it up in the hashtable: if it is there, use its associated label; otherwise, generate code to store it in the static data area, and add it to the hashtable.

    The code you need to generate to push the address of a string literal onto the stack looks like this:

               .text
               la   $t0, <label>       # load addr into $t0
               sw   $t0, ($sp)         # push onto stack
               subu $sp, $sp, 4
    

    Function Call

    The AST for a function call looks like:

    		 --------------
    		| CallExpNode |
    		 --------------
    		 /             \
                --------         -------------
               | IdNode |       | ExpListNode |
                --------         -------------
    
    We need to generate code to:
    1. Evaluate each actual parameter, pushing the values onto the stack;
    2. Jump and link (jump to the called function, leaving the return address in the RA register).
    3. Push the returned value (which will be in register V0 or F0 depending on the return type) onto the stack.
    Since the codeGen method for an expression generates code to evaluate the expression, leaving the value on the stack, all we need to do for step 1 is call the codeGen method of the ExpListNode (which will in turn call the codeGen methods of each ExpNode in the list). For step 2, we just call the genJumpAndLink method of the IdNode. For step 3, we just call genPush(V0, 4) or genPush(F0, 8).

    Note that there is also a call statement:

    		 --------------
    		| CallStmtNode |
    		 --------------
                           |
    		 --------------
    		| CallExpNode |
    		 --------------
    		 /             \
                --------         -------------
               | IdNode |       | ExpListNode |
                --------         -------------
    
    In this case, the called function may not actually return a value (i.e., may have return type void). It doesn't hurt to have the CallExpNode's codeGen method push the value in V0 (or F0) after the call (it will just be pushing some random garbage), but it is important for the CallStmtNode's codeGen method to pop that value.

    Non Short-Circuited Operators

    In general, the codeGen methods for the non short-circuited operators (PlusNode, MinusNode, ..., NotNode, LessNode, ..., EqualsNode, etc.) must all do the same basic sequence of tasks:

    1. Call each child's codeGen method to generate code that will evaluate the operand(s), leaving the value(s) on the stack.
    2. Generate code to pop the operand value(s) off the stack into register(s) (e.g., T0 and T1 for ints; F0 and F2 for doubles). Remember that if there are two operands, the right one will be on the top of the stack.
    3. Generate code to perform the operation (see Spim documentation for a list of opcodes).
    4. Generate code to push the result onto the stack.
    The relational and equality operators are more complicated if they are applied to double operands (because there are no opcodes that "directly" compute the desired result). However, implementing those cases is not required for the class project (unless you want extra credit), so we will not discuss the details here.

    To minimize the amount of code you must write, you might want to define subclasses of ExpNode that have very similar code-generation methods. For example, you could define a BinaryExpNode class whose codeGen method does all of the steps above, calling an opCode method (that would be implemented in each BinaryExpNode subclass) to get the appropriate opcode for use in step 3.

    Note that the comparison operators (described in Section 2.6 of the Spim Reference Manual) produce one when the result of the comparison is true. However, the not operator does a bitwise logical negation, so it will not work correctly if you use 0 and 1 to represent true and false. Thus, it is better to use the seq operator instead of the not operator.

    Example: Recall that the AST for an addition looks like this:

    		 ----------
    		| PlusNode |
    		 ----------
    		 /        \
              ---------      ---------
             | ExpNode |    | ExpNode |
              ---------      ---------
    
    Here is the codeGen method for the PlusNode, assuming that both operands are ints.

    To illustrate how code is generated for an expression involving several operators, consider generating code for the expression: b + c * d. Here is the AST for the expression:

                    PlusNode
                    /       \
    	     IdNode   TimesNode
                   b       /     \
                         IdNode  IdNode
                            c      d
    
    
    Below is the sequence of calls that would be made at compile time to generate code for this expression, and a description of what the generated code does.
            Sequence of calls          What the generated code does
    	-----------------	   ----------------------------
    
      +--- PlusNode.codeGen()
      |      IdNode.codeGen()  --------->       push b's value
      | +-   TimesNode.codeGen()
      | |      IdNode.codeGen() --------->      push c's value
      | |      IdNode.codeGen() --------->      push d's value
      | |
      | +------------------------------->   pop d's value into T1
      |                                     pop c's value into T0
      |                                     T0 = T0 * T1
      |                                     push T0's value
      |
      +---------------------------------> pop result of * into T1
                                          pop b's value into T0
                                          T0 = T0 + T1
                                          push T0's value
    

    Short-Circuited Operators

    The short-circuited operators are represented by AndNodes and OrNodes. "Short-circuiting" means that the right operand is evaluated only if necessary. For example, for the expression (j != 0) && (k/j > epsilon), the sub-expression (k/j > epsilon) is evaluated only if variable j is not zero. Therefore, the code generated for an AndNode must work as follows:

      evaluate the left operand
      if the value is true then
         evaluate the right operand;
         that value is the value of the whole expression
      else
         don't bother to evaluate the right operand
         the value of the whole expression is false
    
    Similarly, for an OrNode:
      evaluate the left operand
      if the value is false then
         evaluate the right operand;
         that value is the value of the whole expression
      else
        don't bother to evaluate the right operand
        the value of the whole expression is true
    
    This means that the code generated for the logical operators will need to involve some jumps depending on the values of some expressions.


    TEST YOURSELF #2

    Expand the outlines given above for the code generated for AndNodes and OrNodes, giving a lower-level picture of the generated code. Use the outline of the code generated for an if-then statement as a model of what to write.

    solution


    Control-Flow Code

    As mentioned above in the section on If-Then Statements, there are actually two different approaches to generating code for statements with conditions (like if statements and while loops):

    1. The numeric approach: This is the approach that we have assumed so far. Using the numeric approach, the codeGen method for a statement with a condition generates code to evaluate the condition, leaving the value on the stack. That value is then popped, and a jump is executed if it has a particular value.
    2. The control-flow or jump-code approach: In this case, the code-generation method for the condition has two parameters, both of which are labels (i.e., Strings) named TrueLabel and FalseLabel. Instead of leaving the value of the condition on the stack, the code generated to evaluate the condition jumps to the TrueLabel if the condition is true, and jumps to the FalseLabel if the condition is false.
    We will assume that the new code-generation method for the condition is called genJumpCode. As we will see, the reason to prefer the control-flow approach over the numeric approach is that, using the control-flow approach, we will generate fewer instructions for statements that involve conditions (so the generated code will be smaller and will run faster).

    First, let's reconsider code generation for an if-then statement; this time we'll use the control-flow method instead of the numeric method for evaluating the condition part of the statement. Under this new assumption, the code generated by the IfStmtNode's codeGen method will have the following form:

               -- code to evaluate the condition, jumping to TrueLab if it is true,
    	      and to DoneLab if it is false --
      TrueLab:
               -- code for the statement list --
      DoneLab:
    
    The actual code written for the IfStmtNode's codeGen method will be:

    To implement the control-flow approach, in addition to changing the codeGen method for the statements that have conditions, we must also write a new genJumpCode method for each AST node that could represent a condition. Below, we look at two representative cases, for IdNode and LessNode. We give the code generated for the (old) numeric approach, and for the (new) control-flow approach so that we can see which code is better (in terms of the number of instructions). (The code given below assumes that all operands are int, and it is not quite assembly code -- for example, for clarity, we use "push" instead of the actual two instructions that implement a push operation.)

      IdNode:  numeric                            control-flow
               -------                            ------------
    
                lw  $t0, <var's addr>             lw  $t0, <var's addr>
                push $t0                          beq $t0, FALSE, falseLab
                                                  b   trueLab
    
    Note that in both approaches to generating code for an IdNode, 3 instructions are generated (because "push" is actually 2 instructions). However, while all 3 of the numeric-approach instructions will always execute, the last instruction generated by the control-flow approach will execute only if the value of the Id is true.
      LessNode:  numeric                                     control-flow
                 -------                                     ------------
    
                 -- code to evaluate both operands           -- ditto
                 -- pop values into T1, T0                   -- ditto
                 slt  $t2, $t0, $t1                          blt $t0, $t1, trueLab
                 push $t2                                    b   falseLab
    
    The operands of a LessNode are integer expressions, not boolean expressions, so both approaches start by generating (the same) code to evaluate the operands and to pop their values into registers T0 and T1. After that, however, the numeric approach will generate 3 instructions (to set $t2 to TRUE or FALSE as appropriate, then to push that value onto the stack -- remember that "push" is really two instructions), while the control-flow code will generate only 2 instructions. Furthermore, as was the case for the IdNode, all three of the numeric-approach instructions will always execute, while the last instruction generated by the control-flow approach will only execute if the comparison evaluates to FALSE.

    Now let's consider how to write the new genJumpCode method for the short-circuited operators (AndNodes and OrNodes).

    Recall that the AST for an && expression looks like this:

    		 ---------
    		| AndNode |
    		 ---------
    		 /        \
               ---------       ---------
              | ExpNode |     | ExpNode |
               ---------       ---------	
    
    Here's how the genJumpCode method of the AndNode works:

    The AndNode's genJumpCode method would be:

    Example: Consider the code that would be generated for the statement: if (a && b>0) { ... }, represented by the AST: The IfStmtNode's codeGen method would create two labels, TrueLab and DoneLab, and would call the AndNode's genJumpCode method, passing those labels as the arguments. The AndNode's genJumpCode method would create one new label (NewLab) and then would call the genJumpCode method of its left child (the IdNode), passing NewLab and DoneLab as the arguments. It would then generate the NewLab label, and then would call its right child's genJumpCode method, passing TrueLab and DoneLab. The code generated for the whole condition would look like this: (Of course, the actual label would be something like L3, not NewLab.) After calling the AndNode's genJumpCode method, the IfStmtNode's codeGen method would call genLabel to print TrueLab, then would call its StmtList child's codeGen method to generate code for the list of statements. Finally, it would call genLabel to print DoneLab. So the code generated for this if statement would be like this:

    TEST YOURSELF #3

    Question 1: What is the form of the code generated by an OrNode's genJumpCode method?

    Question 2: What is the form of the code generated by a NotNode's genJumpCode method?

    solution


    How does the code generated for an AndNode using the control-flow method compare to the code generated using the numeric method? Here are outlines of the code generated in each case:

    Note that the numeric code includes 6 instructions (shown in bold) in addition to the ones generated by the codeGen methods of the two children, while in the control-flow case, no instructions are generated by the AndNode itself (just a label); the instructions are all generated by the genJumpCode methods of its two children. Those children could represent names, boolean literals, comparisons or logical expressions. We have already seen that in the first three cases, better code is generated using the control-flow approach. Now we see that for logical expressions (at least for AndNodes) fewer instructions are generated using the control-flow approach than using the numeric approach.

    TEST YOURSELF #4

    Compare the code generated by the two approaches for an OrNode and for a NotNode.

    solution


    Finally, let's compare the code generated by the numeric and control-flow approaches for an if-then statement. Here are outlines of the two different versions of the code that would be generated:

    Note that much of the code is the same for the two methods; the code generated to evaluate the condition will be different, and the numeric method has three extra instructions: a pop, followed by a conditional goto (those instructions are shown in bold font). So as long as the code generated for the condition is no worse for the control-flow method than for the numeric method, the control-flow method is better (both in terms of the number of instructions generated, and in terms of the number of instructions that will be executed).

    But we have already looked at the code generated for all the different kinds of conditions, for each of the two approaches, and in fact the control-flow method was never worse, and was sometimes better. Thus, the control-flow method is the winner, in terms of generating less and more efficient code!