
Motivation and Definition

Recall that the parser must produce output (e.g., an abstract-syntax tree) for the next phase of the compiler. This involves doing a syntax-directed translation -- translating from a sequence of tokens to some other form, based on the underlying syntax.

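A syntax-directed translation is defined by augmenting the CFG: a translation rule is defined for each production. A translation rule defines the translation of the left-hand-side nonterminal as a function of constants, the translations of the right-hand-side nonterminals, and the values of the right-hand-side tokens (e.g., the integer value associated with an INTLITERAL token).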

To translate an input string:
  1. Build the parse tree.
  2. Use the translation rules to compute the translation of each nonterminal in the tree, working bottom-up (a nonterminal's translation may depend on the translations of the symbols on its right-hand side, so those must be computed first).
The translation of the string is the translation of the parse tree's root nonterminal.

Example 1

Below is the definition of a syntax-directed translation that translates an arithmetic expression to its integer value. When a nonterminal occurs more than once in a grammar rule, the corresponding translation rule uses subscripts to identify a particular instance of that nonterminal. For example, the rule exp → exp PLUS term has two exp nonterminals; exp1 means the left-hand-side exp, and exp2 means the right-hand-side exp. Also, the notation xxx.value is used to mean the value associated with token xxx.
CFG Production                  Translation rule
exp → exp PLUS term             exp1.trans = exp2.trans + term.trans
exp → term                      exp.trans = term.trans
term → term TIMES factor        term1.trans = term2.trans * factor.trans
term → factor                   term.trans = factor.trans
factor → INTLITERAL             factor.trans = INTLITERAL.value
factor → LPARENS exp RPARENS    factor.trans = exp.trans

Consider evaluating these rules on the input 2 * (4 + 5). The result is the following annotated parse tree:
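The bottom-up evaluation for this input can also be sketched in Java. This is only an illustration under stated assumptions: the node classes (ParseNode, IntLitFactor, ParenFactor, CopyNode, PlusExp, TimesTerm) are hypothetical names that do not come from these notes, and the parse tree is built by hand rather than by a parser. Each trans() method is one translation rule from the table; because it asks its children for their translations first, values are computed bottom-up.

```java
// Hypothetical parse-tree node classes (not from the notes), one per kind of
// production in Example 1. Each trans() implements that production's rule.
interface ParseNode {
    int trans();
}

// factor -> INTLITERAL:  factor.trans = INTLITERAL.value
final class IntLitFactor implements ParseNode {
    private final int value;
    IntLitFactor(int value) { this.value = value; }
    @Override public int trans() { return value; }
}

// factor -> LPARENS exp RPARENS:  factor.trans = exp.trans
final class ParenFactor implements ParseNode {
    private final ParseNode exp;
    ParenFactor(ParseNode exp) { this.exp = exp; }
    @Override public int trans() { return exp.trans(); }
}

// exp -> term  and  term -> factor:  the translation is copied from the child
final class CopyNode implements ParseNode {
    private final ParseNode child;
    CopyNode(ParseNode child) { this.child = child; }
    @Override public int trans() { return child.trans(); }
}

// exp -> exp PLUS term:  exp1.trans = exp2.trans + term.trans
final class PlusExp implements ParseNode {
    private final ParseNode exp2;
    private final ParseNode term;
    PlusExp(ParseNode exp2, ParseNode term) { this.exp2 = exp2; this.term = term; }
    @Override public int trans() { return exp2.trans() + term.trans(); }
}

// term -> term TIMES factor:  term1.trans = term2.trans * factor.trans
final class TimesTerm implements ParseNode {
    private final ParseNode term2;
    private final ParseNode factor;
    TimesTerm(ParseNode term2, ParseNode factor) { this.term2 = term2; this.factor = factor; }
    @Override public int trans() { return term2.trans() * factor.trans(); }
}

final class ValueTranslationDemo {
    // Hand-built parse tree for 2 * (4 + 5); a real parser would construct it.
    static ParseNode parseTreeForExample() {
        ParseNode two  = new CopyNode(new IntLitFactor(2));               // term -> factor
        ParseNode four = new CopyNode(new CopyNode(new IntLitFactor(4))); // exp -> term -> factor
        ParseNode five = new CopyNode(new IntLitFactor(5));               // term -> factor
        ParseNode sum  = new ParenFactor(new PlusExp(four, five));        // factor -> LPARENS exp RPARENS
        ParseNode product = new TimesTerm(two, sum);                      // term -> term TIMES factor
        return new CopyNode(product);                                     // exp -> term
    }

    public static void main(String[] args) {
        System.out.println(parseTreeForExample().trans()); // prints 18
    }
}
```

The translation of the root exp is 2 * (4 + 5) = 18, which is the translation of the whole input string.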

Example 2

Consider a language of expressions that use three operators: +, &&, and ==, represented by the terminal symbols PLUS, AND, and EQUALS, respectively. Integer literals are represented by the same INTLITERAL token we've used before, and TRUE and FALSE represent the literals true and false (note that we could just as well have defined a single BOOLLITERAL token that the scanner would populate with either true or false).

Let's define a syntax-directed translation that type checks these expressions; i.e., for type-correct expressions, the translation will be the type of the expression (either int or bool), and for expressions that involve type errors, the translation will be the special value error. We'll use the following type rules:

  1. Both operands of the + operator must be of type int.
  2. Both operands of the && operator must be of type bool.
  3. Both operands of the == operator must have the same (non-error) type.

Here are the CFG and the translation rules:

CFG Production                Translation rule
exp → exp PLUS term           if (exp2.trans == int) and (term.trans == int) then
                                   exp1.trans = int
                              else
                                   exp1.trans = error
exp → exp AND term            if (exp2.trans == bool) and (term.trans == bool) then
                                   exp1.trans = bool
                              else
                                   exp1.trans = error
exp → exp EQUALS term         if (exp2.trans == term.trans) and (term.trans != error) then
                                   exp1.trans = bool
                              else
                                   exp1.trans = error
exp → term                    exp.trans = term.trans
term → TRUE                   term.trans = bool
term → FALSE                  term.trans = bool
term → INTLITERAL             term.trans = int
term → LPARENS exp RPARENS    term.trans = exp.trans

Here's an annotated parse tree for the input (2 + 2) == 4:
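The three type rules can be sketched as small Java functions, applied in the same bottom-up order as the tree for (2 + 2) == 4. This is a minimal sketch under stated assumptions: ExprType and TypeRules are hypothetical names that do not come from these notes, and the copy rules (exp → term, term → LPARENS exp RPARENS) are left implicit because they pass the type through unchanged.

```java
// Hypothetical representation of the three possible translations.
enum ExprType { INT, BOOL, ERROR }

final class TypeRules {
    // exp -> exp PLUS term: both operands must be int
    static ExprType plus(ExprType exp2, ExprType term) {
        return (exp2 == ExprType.INT && term == ExprType.INT) ? ExprType.INT : ExprType.ERROR;
    }

    // exp -> exp AND term: both operands must be bool
    static ExprType and(ExprType exp2, ExprType term) {
        return (exp2 == ExprType.BOOL && term == ExprType.BOOL) ? ExprType.BOOL : ExprType.ERROR;
    }

    // exp -> exp EQUALS term: operands must have the same non-error type
    static ExprType equalsOp(ExprType exp2, ExprType term) {
        return (exp2 == term && term != ExprType.ERROR) ? ExprType.BOOL : ExprType.ERROR;
    }

    public static void main(String[] args) {
        // (2 + 2) == 4: the inner sum is int, and int == int yields bool.
        ExprType sum = plus(ExprType.INT, ExprType.INT);
        System.out.println(equalsOp(sum, ExprType.INT)); // prints BOOL

        // (2 + true) == 4: the inner sum is error, so the whole expression is error.
        ExprType badSum = plus(ExprType.INT, ExprType.BOOL);
        System.out.println(equalsOp(badSum, ExprType.INT)); // prints ERROR
    }
}
```

Note how an error in a subexpression propagates upward: every rule yields error as soon as one operand's translation is error.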


TEST YOURSELF #1

The following grammar defines the language of base-2 numbers:

Define a syntax-directed translation so that the translation of a binary number is its base-10 value. Illustrate your translation scheme by drawing the parse tree for 1001 and annotating each nonterminal in the tree with its translation.



Building an Abstract-Syntax Tree

So far, our example syntax-directed translations have produced simple values (an int or a type) as the translation of an input. In practice, however, we want the parser to build an abstract-syntax tree as the translation of an input program. That is not so different from what we've seen so far; we just need to use tree-building operations in the translation rules instead of, e.g., arithmetic operations.

The AST vs the Parse Tree

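First, let's consider how an abstract-syntax tree (AST) differs from a parse tree. An AST can be thought of as a condensed form of the parse tree: operators appear at internal nodes rather than at leaves, lists are flattened, and syntactic details (e.g., parentheses, curly braces, semi-colons) are omitted.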

In general, the AST is a better structure for later stages of the compiler because it omits details having to do with the source language, and just contains information about the essential structure of the program.

Below is an example of the parse tree and the AST for the expression 3 * (4 + 2) (using the usual arithmetic-expression grammar that reflects the precedences and associativities of the operators). Note that the parentheses are not needed in the AST because the structure of the AST defines how the subexpressions are grouped.

For constructs other than expressions, the compiler writer has some choices when defining the AST -- but remember that lists (e.g., lists of declarations, lists of statements, lists of parameters) should be flattened, that operators (e.g., "assign", "while", "if") go at internal nodes, not at leaves, and that syntactic details are omitted.

Note that in the AST there is just one stmtList node, with a list of three children (the list of statements has been "flattened"). Also, the "operators" for the statements (assign and while) have been "moved up" to internal nodes (instead of appearing as tokens at the leaves). And finally, syntactic details (curly braces and semi-colons) have been omitted.

Translation Rules That Build an AST

To define a syntax-directed translation so that the translation of an input is the corresponding AST, we first need operations that create AST nodes. Let's use Java code, and assume that we have the following class hierarchy:

Now we can define a syntax-directed translation for simple arithmetic expressions, so that the translation of an expression is its AST:
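As an illustration, here is a minimal Java sketch of how such AST-building rules might look. Everything named here is an assumption rather than the notes' own listing: the classes ExpNode, IntLitNode, PlusNode and TimesNode, the show() method, and the helper class AstRules are hypothetical. Each static method plays the role of one translation rule, returning an AST node instead of an integer or a type.

```java
// Hypothetical AST node classes; the names and the show() method are assumptions.
abstract class ExpNode {
    abstract String show(); // prefix rendering, used only to inspect the tree
}

final class IntLitNode extends ExpNode {
    private final int value;
    IntLitNode(int value) { this.value = value; }
    @Override String show() { return Integer.toString(value); }
}

final class PlusNode extends ExpNode {
    private final ExpNode left;
    private final ExpNode right;
    PlusNode(ExpNode left, ExpNode right) { this.left = left; this.right = right; }
    @Override String show() { return "(+ " + left.show() + " " + right.show() + ")"; }
}

final class TimesNode extends ExpNode {
    private final ExpNode left;
    private final ExpNode right;
    TimesNode(ExpNode left, ExpNode right) { this.left = left; this.right = right; }
    @Override String show() { return "(* " + left.show() + " " + right.show() + ")"; }
}

final class AstRules {
    // exp -> exp PLUS term:  exp1.trans = new PlusNode(exp2.trans, term.trans)
    static ExpNode plus(ExpNode exp2, ExpNode term) { return new PlusNode(exp2, term); }

    // term -> term TIMES factor:  term1.trans = new TimesNode(term2.trans, factor.trans)
    static ExpNode times(ExpNode term2, ExpNode factor) { return new TimesNode(term2, factor); }

    // factor -> INTLITERAL:  factor.trans = new IntLitNode(INTLITERAL.value)
    static ExpNode intLit(int value) { return new IntLitNode(value); }

    // factor -> LPARENS exp RPARENS:  factor.trans = exp.trans (no node for parentheses)
    static ExpNode parens(ExpNode exp) { return exp; }

    // exp -> term and term -> factor just pass the child's AST up, so they need no method.

    public static void main(String[] args) {
        // Rules applied bottom-up for the input 3 * (4 + 2).
        ExpNode ast = times(intLit(3), parens(plus(intLit(4), intLit(2))));
        System.out.println(ast.show()); // prints (* 3 (+ 4 2))
    }
}
```

The parentheses leave no trace in the result: the rule for factor → LPARENS exp RPARENS simply passes up the AST of the inner expression, so grouping is expressed by the tree's shape alone.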
