original production:
    stmt -> ident = exp ;

in cup form:
    stmt ::= ident asg exp semi

in cup form with actions:
    stmt ::= ident:i asg exp:e semi
             {: RESULT = new asgNode(i, e, i.linenum, i.colnum); :}
              -----------
    stmt  =   | asgNode |
              -----------
                /     \
               /       \
            ident      exp
Example 2
original production:
    stmts -> stmt stmts

in cup form with actions:
    stmts ::= stmt:s1 stmts:s2
              {: RESULT = new stmtsNode(s1, s2, s1.linenum, s1.colnum); :}
              -------------
    stmts =   | stmtsNode |
              -------------
                /       \
               /         \
             stmt       stmts
The parser produced by the parser generator combines the subtrees built by each production's action into the complete AST for the input program. Later, type checking and code generation can be done node by node. Combining the two examples above gives the following AST:
              -------------
              | stmtsNode |
              -------------
                /       \
               /         \
        -----------      ... (not defined in this example)
        | asgNode |
        -----------
          /     \
         /       \
      ident      exp
For project 3, we will build an unparser that walks the complete AST and prints the original tokens in the form they were entered. This not only facilitates grading but is also a good way to check the structure of the tree while debugging. Each node will have a member function "void Unparse(int indent)" that prints the information it contains.
For example, the Unparse routine for identNode is easy to implement: it prints the serial number for the token, using Registration.toString(). The routine for asgNode is more complex. Its code is as follows:
    void Unparse(int indent) {
        genIndent(indent);           // indent to the current nesting level
        target.Unparse(0);           // left-hand side identifier
        System.out.print(" = ");
        source.Unparse(0);           // right-hand side expression
        System.out.println(";");
    }
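To see the asgNode routine in context, here is a minimal, self-contained sketch of an unparser over the two node kinds from the examples above. Only the Unparse(int) signature and the body of the asgNode routine come from these notes. The base class ASTNode, the tab-based genIndent, the use of null to end a statement list, and the UnparseDemo helper are assumptions made for illustration. The constructors omit the linenum and colnum arguments, and identNode prints a plain name rather than calling Registration.toString().

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical base class; the project's real class hierarchy may differ.
abstract class ASTNode {
    abstract void Unparse(int indent);

    // Assumed helper: emits one tab per indentation level.
    static void genIndent(int indent) {
        for (int i = 0; i < indent; i++) System.out.print("\t");
    }
}

class identNode extends ASTNode {
    private final String name;   // stand-in for the registered token text
    identNode(String name) { this.name = name; }
    void Unparse(int indent) { genIndent(indent); System.out.print(name); }
}

class asgNode extends ASTNode {
    private final ASTNode target, source;
    asgNode(ASTNode target, ASTNode source) { this.target = target; this.source = source; }
    void Unparse(int indent) {
        genIndent(indent);
        target.Unparse(0);
        System.out.print(" = ");
        source.Unparse(0);
        System.out.println(";");
    }
}

class stmtsNode extends ASTNode {
    private final ASTNode thisStmt;
    private final ASTNode moreStmts;   // null ends the list in this sketch
    stmtsNode(ASTNode thisStmt, ASTNode moreStmts) {
        this.thisStmt = thisStmt;
        this.moreStmts = moreStmts;
    }
    void Unparse(int indent) {
        thisStmt.Unparse(indent);                          // first statement
        if (moreStmts != null) moreStmts.Unparse(indent);  // rest of the list
    }
}

class UnparseDemo {
    // Runs Unparse with System.out redirected so the text can be inspected.
    static String unparseToString(ASTNode root, int indent) {
        PrintStream saved = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            root.Unparse(indent);
        } finally {
            System.out.flush();
            System.setOut(saved);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Built by hand here; in the project the CUP actions would build it.
        ASTNode tree = new stmtsNode(
            new asgNode(new identNode("a"), new identNode("b")),
            new stmtsNode(new asgNode(new identNode("c"), new identNode("d")), null));
        System.out.print(unparseToString(tree, 1));
    }
}
```

Each statement is printed on its own line at the requested indentation, so a malformed tree (for example, a missing source child) shows up immediately as missing or misplaced text.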
    S -> A B | x
    B -> b
    A -> a A
    C -> d
As discussed last lecture, this grammar has two structural problems:
    (S) -> A (B) | (x)
    (B) -> (b)
    (A) -> (a) A
    (C) -> (d)

    (S) -> (A) (B) | (x)
    (B) -> (b)
    (A) -> (a) (A)
    C   -> d
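Both kinds of marking can be computed mechanically. The sketch below assumes the two problems are (1) a nonterminal that can never derive a string of terminals and (2) a nonterminal that cannot be reached from the start symbol S, which is what the markings in the two grammars above appear to track. The class GrammarCheck, its method names, and the convention that upper-case symbols are nonterminals are illustrative choices, not part of the course code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

class GrammarCheck {
    // Each production is {lhs, rhs symbols...}.
    static final String[][] PRODUCTIONS = {
        {"S", "A", "B"}, {"S", "x"}, {"B", "b"}, {"A", "a", "A"}, {"C", "d"}
    };

    // Assumed convention: a symbol starting with an upper-case letter is a nonterminal.
    static boolean isNonterminal(String sym) {
        return Character.isUpperCase(sym.charAt(0));
    }

    // Nonterminals that can derive some string of terminals.
    // Repeats until no new nonterminal can be marked (a fixed point).
    static Set<String> productive(String[][] prods) {
        Set<String> marked = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : prods) {
                if (marked.contains(p[0])) continue;
                boolean allMarked = true;
                for (int i = 1; i < p.length; i++) {
                    if (isNonterminal(p[i]) && !marked.contains(p[i])) allMarked = false;
                }
                if (allMarked) { marked.add(p[0]); changed = true; }
            }
        }
        return marked;
    }

    // Nonterminals reachable from the start symbol (worklist traversal).
    static Set<String> reachable(String[][] prods, String start) {
        Set<String> marked = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        marked.add(start);
        work.push(start);
        while (!work.isEmpty()) {
            String nt = work.pop();
            for (String[] p : prods) {
                if (!p[0].equals(nt)) continue;
                for (int i = 1; i < p.length; i++) {
                    if (isNonterminal(p[i]) && marked.add(p[i])) work.push(p[i]);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        System.out.println("productive: " + productive(PRODUCTIONS));     // [B, C, S]
        System.out.println("reachable:  " + reachable(PRODUCTIONS, "S")); // [A, B, S]
    }
}
```

Under these assumptions, A is missing from the first set because its only production, A -> a A, always reintroduces A, and C is missing from the second set because no production reachable from S mentions it.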
    A -> B C
    B -> lambda
    C -> lambda

In this example, A goes to lambda indirectly.
    S   -> A (B) (C)
    A   -> a
    (B) -> (C) (D)
    (D) -> d | lambda
    (C) -> c | lambda
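The marked symbols in the grammar above are the ones that can derive lambda, directly or indirectly. A minimal sketch of how that set could be computed follows. The class name NullableCheck and the encoding of a lambda production as a row with no right-hand-side symbols are assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

class NullableCheck {
    // Each production is {lhs, rhs symbols...}; a row with only an lhs stands for lambda.
    static final String[][] PRODUCTIONS = {
        {"S", "A", "B", "C"},
        {"A", "a"},
        {"B", "C", "D"},
        {"D", "d"}, {"D"},
        {"C", "c"}, {"C"}
    };

    // Returns the nonterminals that can derive lambda.
    // A nonterminal is added once every symbol on some right-hand side is
    // already in the set. Terminals are never added, so any production
    // containing a terminal can never qualify.
    static Set<String> nullable(String[][] prods) {
        Set<String> marked = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : prods) {
                if (marked.contains(p[0])) continue;
                boolean allNullable = true;
                for (int i = 1; i < p.length; i++) {
                    if (!marked.contains(p[i])) allNullable = false;
                }
                if (allNullable) { marked.add(p[0]); changed = true; }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        System.out.println(nullable(PRODUCTIONS));   // [B, C, D]
    }
}
```

B is found only on the second pass, after C and D have been marked, which is the "indirect" case from the previous example. S is never marked because A always produces the terminal a.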
    Derivation Step    Expression
          0            S -> A B C
          1            S => a B C
          2              => a C D C
          3              => a D C
          4              => a C
          5              => a
Some notation and examples:
    =>     one derivation step (ex. see above)
    =>+    one or more derivation steps (ex. S =>+ a)
    =>*    zero or more derivation steps (ex. S =>* S)

(Compare the last two to the use of + and * in regular expressions.)
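The derivation of "a" from S shown earlier can be replayed mechanically, which makes the notation concrete: each call to step below is one => step, and the whole trace is an instance of S =>+ a. This is only a sketch. The class DerivationDemo, the list-of-strings representation of a sentential form, and the choice to always rewrite the leftmost occurrence are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DerivationDemo {
    // One derivation step: replaces the leftmost occurrence of lhs with rhs.
    // An empty rhs models a lambda production.
    static List<String> step(List<String> form, String lhs, String... rhs) {
        int pos = form.indexOf(lhs);
        if (pos < 0) throw new IllegalArgumentException(lhs + " not in " + form);
        List<String> next = new ArrayList<>(form.subList(0, pos));
        next.addAll(Arrays.asList(rhs));
        next.addAll(form.subList(pos + 1, form.size()));
        return next;
    }

    // Applies the productions used in the derivation table, in order,
    // and records the sentential form after each step.
    static List<String> derive() {
        String[][] choices = {
            {"S", "A", "B", "C"},   // S -> A B C
            {"A", "a"},             // A -> a
            {"B", "C", "D"},        // B -> C D
            {"C"},                  // C -> lambda
            {"D"},                  // D -> lambda
            {"C"}                   // C -> lambda
        };
        List<String> trace = new ArrayList<>();
        List<String> form = Collections.singletonList("S");
        for (String[] c : choices) {
            form = step(form, c[0], Arrays.copyOfRange(c, 1, c.length));
            trace.add(String.join(" ", form));
        }
        return trace;
    }

    public static void main(String[] args) {
        for (String line : derive()) System.out.println("=> " + line);
    }
}
```

Stopping the loop before the first step would leave the form as S itself, which is the zero-step case S =>* S.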
    E -> E + T | T
    T -> T * ID | ID

Top Down Parse
    start here ->     E
                    / | \
                   E  +  T
                   |     |
                   T     ID
                   |
                   ID
Bottom Up Parse
                      E
                    / | \
                   E  |  |
                   |  |  |
                   T  |  T
                   |  |  |
    start here ->  ID +  ID
Top-down parsing is simpler but slower, because for each non-terminal in the expression you may have to try every one of its productions. Later, we will learn more precise techniques for finding these parse trees. In general, top-down parsing uses approximately i^3 steps to parse an expression, where i is the number of tokens.
                     Number of Tries
    Expression     to Get Correct Parse     i
    ---------------------------------------------------
    (a]                     7               2
    ((a]]                  17               4
    (((a]]]                37               6

So for an average-size program of 1000 tokens, it would take about 1000^3 = 10^9 steps to parse, which translates to hundreds of seconds on a machine of today's standards.