Benjamin Gavin
CS 536 Section 1
Classnotes 3-24-98

BEGIN

Announcements:
  --> Read Sections 8.3-8.8
  --> Handout - Programming Assignment 4
  --> Handout - "Example for CSX-lite" (in file Handout_3_24_98.txt)

How do yacc & javaCUP parsers work?
  * Any LL(1) grammar is OK
  * Analogy between yacc/javaCUP and betting: it is much better to wait
    until the race or game is just about over to guess the winner.

  Example:  PROG --> XYZ | UVW
    * LL parsers must make a decision immediately (on first input)
    * LALR parsers can wait until a Y or Z is read, then choose

Definition: Configuration -- Production with a progress bar
  --> The DOT (*) represents how far we have proceeded

  Start -> STMT --> *ID = EXPR ;
       |-> STMT --> ID *= EXPR ;
       |-> STMT --> ID = *EXPR ;       ** Add EXPR productions **
           |-> EXPR --> *ID | EXPR --> *( EXPR )
           |-> EXPR --> ID*  (done)    ** Done with EXPR, cross it **
       |-> STMT --> ID = EXPR *;
       |-> STMT --> ID = EXPR ;*  (done)
  ** Crossed STMT, now move across it in higher productions **
  ^^ Above example has no Shift/Reduce Conflicts ^^

Can have a variety of productions live at the same time:

Example 1:
  STMT --> *IF ( EXPR ) STMT FI  |  STMT --> *IF ( EXPR ) STMT ELSE STMT FI
  STMT --> IF *( EXPR ) STMT FI  |  STMT --> IF *( EXPR ) STMT ELSE STMT FI
  .................
  STMT --> IF ( EXPR ) STMT *FI  |  STMT --> IF ( EXPR ) STMT *ELSE STMT FI
  ^^ The choice is now between these two productions; we didn't have ^^
  ^^ to decide until now.  Still no Shift/Reduce Conflicts           ^^

Example 2:
  begin:     S --> *XYZ | S --> *PQR | ...
             ** P is a nonterminal, so do closure **
             S --> *XYZ | S --> *PQR | P --> *A | P --> *B | P --> *C | ...
             ** B is read, so move across all B's **
  shift(B):  S --> *XYZ | S --> *PQR | P --> *A | P --> B* | P --> *C | ...
             ** Now remove all productions that aren't possible,
                Q is non-terminal **
  reduce:    S --> P*QR | Q --> *L | Q --> *M | ...
             ......... (continue)

Procrastination Analogy:
  -> Students put off homework in case the world ends or they die; LALR
     parsers put off deciding until they have to, i.e.
     the night before the assignment is due.

Definition: Set of Configurations => State or Parse State

  ** Side note, on Lambda Productions: move the dot to the right of the
     Lambda immediately, signifying that to match Lambda nothing needs
     to be done. **

Closure
-------
Algorithm:
  ConfigSet Closure (ConfigSet C) {
    Repeat
      If (C contains A --> alpha *B delta && B is non-terminal) {
        Add all configurations of the form (B --> *gamma) to C
      }
    Until (No more configurations are added)
    Return C
  }

Example:
  Grammar          Closure({S --> *Ab})
  S --> Ab         S --> *Ab
  A --> CD         A --> *CD
  C --> D          C --> *D
      | c          C --> *c
  D --> d          D --> *d

GoTo Operation advances the "*" Symbol past a given symbol

Algorithm:
  ConfigSet GoTo (ConfigSet C, Symbol X) {
    B = {Empty Set}    // Result Set
    For Each (Configuration F in C) {
      If (F is of the form (A --> alpha *X gamma)) {
        Add A --> alpha X *gamma to B
      }
    }
    Return Closure(B)
  }

Example:
  Given Set C = {A --> X *Y Z, B --> *Y, C --> *Z}
  GoTo(C,Y)   = {A --> X Y *Z, B --> Y*}
  ** We have a shift/reduce conflict **

Shift/Reduce conflicts can cause the grammar to be unparseable
(this one doesn't)

  Solution: Given B --> Y*, do a reduce:
              If next token can legally follow B
                ALTERNATIVELY
              If next token is in Follow(B)
            However, if Z is in Follow(B), then DIE!!
            ** We don't know which production to use! **

Another Possibility: Reduce/Reduce Conflict

  Example:  A --> A Z*
            B --> Z*
  Compute follow sets; if Follow(A) intersection Follow(B) is NOT equal
  to the empty set, then DIE!!  We don't know which one to reduce!

******** Refer to Handout for remainder **********

Action/GoTo Tables (Last Page)

To build a yacc/javaCUP parser we'll need an Action Table:

ACTION TABLE
------------
Possible Actions:
  REDUCEi ==> Reduce Production i
  SHIFT   ==> Shift across token
  ACCEPT  ==> Parse successful
  ERROR   ==> DIE!
              Represented by empty spaces in the tables

Action[C][T] ==> Defines exactly _one_ action
                 C == State, T == Token (terminal)
GoTo[C][X]   ==> Go to a particular successor state
                 C == Current State, X == terminal or Non-terminal

** These are the two tables that yacc and javaCUP produce and use **

Driver Program for these tables, uses state numbers

Pseudo-Code:
  Driver {
    State S;     // Top of stack state
    Token CT;    // The current token
    Push(State 0);
    CT = Scanner.next_token();     // Read the first token
    While (true) {                 // OR for (;;) {
      S = Top of stack;            // Refresh S after every push/pop
      switch (Action[S][CT]) {
        case ERROR:
          syntax_error(CT);
          return;
        case ACCEPT:
          return;
        case SHIFT:
          Push(GoTo[S][CT]);
          CT = Scanner.next_token();
          break;
        case REDUCE(i):            // i == Production to reduce
          // Assume Production i is A --> Y1 ... Ym
          Pop m States;
          Let S' == new stack top;
          Push(GoTo[S'][A]);
          break;
      } // switch
    } // while
  } // Driver

An Example (Refer to Handout for states/transitions):

  Parse {A = B + C ; } EOF
  ** Stack is represented from left to right **

  Start:      | 0
  Token {:    | 0 1                (shift, push 1)
  Token A:    | 0 1 4              (shift, push 4)
  Token =:    | 0 1 4 8            (shift, push 8)
  Token B:    | 0 1 4 8 12         (shift, push 12)
  Token +:    | 0 1 4 8 11         (reduce 8: EXPR --> ID,
                                    pop 12, push 11)
  Token +:    | 0 1 4 8 11 15      (shift, push 15)
  Token C:    | 0 1 4 8 11 15 18   (shift, push 18)
  Token ;:    | 0 1 4 8 11         (reduce 6: EXPR --> EXPR + ID,
                                    pop (18,15,11), push 11)
  Token ;:    | 0 1 4 8 11 14      (shift, push 14)
  Token }:    | 0 1 3              (reduce 4: STMT --> ID = EXPR ;,
                                    pop (14,11,8,4), push 3)
  Token }:    | 0 1 3 7            (reduce 3: STMTS --> lambda,
                                    pop nothing, push 7)
  Token }:    | 0 1 2              (reduce 2: STMTS --> STMT STMTS,
                                    pop (7,3), push 2)
  Token }:    | 0 1 2 6            (shift, push 6)
  Token EOF:  | 0 1 2 6            (Accept!!)

  ** 4 elements on stack and on right side of production 1, so it
     checks out. **

END OF LECTURE
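
Sketch (not from lecture): the Closure and GoTo algorithms in these notes
can be tried out with a few lines of Python. This is only a minimal
sketch under stated assumptions: the representation of a configuration
as a tuple (lhs, rhs, dot) and the names GRAMMAR, closure, goto and show
are made up for illustration and are not what yacc or javaCUP use
internally. The grammar is the one from the Closure example.

```python
# Grammar from the Closure example: nonterminal -> list of right-hand sides.
# (Hypothetical representation chosen for this sketch.)
GRAMMAR = {
    "S": [("A", "b")],
    "A": [("C", "D")],
    "C": [("D",), ("c",)],
    "D": [("d",)],
}


def closure(configs):
    """Repeat: for each A --> alpha *B delta with B a nonterminal,
    add every B --> *gamma, until nothing new is added."""
    result = set(configs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):  # iterate over a copy
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for gamma in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], gamma, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return frozenset(result)


def goto(configs, symbol):
    """Advance the dot past `symbol` wherever it appears right after
    the dot, then take the closure of the result."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in configs
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


def show(config):
    """Format a configuration with '*' as the dot, as in the notes."""
    lhs, rhs, dot = config
    symbols = list(rhs[:dot]) + ["*"] + list(rhs[dot:])
    return f"{lhs} --> {' '.join(symbols)}"


if __name__ == "__main__":
    start = closure({("S", ("A", "b"), 0)})
    for config in sorted(start):
        print(show(config))
```

Under these assumptions, closure({S --> *A b}) contains the same five
configurations as the right-hand column of the Closure example, and
goto(start, "C") gives {A --> C *D, D --> *d} because D is a
non-terminal and the closure step adds its production.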