NOTES FOR MARCH 1st:

Context Free Grammars:
    Define valid token sequences
    Define structure of a program

Example:
    Rules:   E -> E + T | T
             T -> T * P | P
             P -> ID | (E)

    Parse tree for A+B*C (operator precedence):

                 E
               /   \
              E  +  T
              |    / \
              T   T * P
              |   |   |
              P   P   |
              |   |   |
              ID  ID  ID

    On the right is the entire subtree that does the multiplication;
    its result then gets added to the left side.

    A+B:     Expression
               /   \
              E  +  Term

    -> the grammar captures operator precedence automatically
    -> it also captures operator associativity (tells the order in which
       you do operations of equal precedence)
    -> by default, evaluation is left associative (left to right) in the
       absence of explicit parentheses

    When do you need associativity right to left?
        -> assignment
        -> exponentiation

    To change associativity, just change the first rule:
        from  E -> E + T   (left associative)
        to    E -> T + E   (right associative)

    REMEMBER: operators introduced closest to the identifier (deepest in
    the grammar) have higher priority

    Grammars give a great deal of power!

**Problems you can create and then how to deal with them:

Example:
    Grammar:  E -> E - E | ID

    This can give two different parse trees:

              E                        E
             / \                      / \
            E - E          OR        E - E     (wrong associativity)
           /    |                    |    \
         E-E    ID                   ID   E-E
         | |                              | |
        ID ID                            ID ID

    ID - ID - ID gives two different parse trees (BAD)
    -> Grammars that allow > 1 parse tree for some string are AMBIGUOUS
    -> Context Free Grammars allow more than one derivation for the
       same string

    YACC, CUP flag ambiguous grammars (reported as conflicts)

    NOTE: In our program, IF THEN ELSE is ambiguous
    -> the definition is valid but not what you want; the error cannot
       be detected automatically by machine

Example: (non-terminals in CAPS; terminals in lower case)

    S -> A B      (has bugs independent of the strings the user may
       | x         or may not have meant to generate)
    B -> b
    A -> a A
    C -> d

    has flaws:
    1. Every non-terminal must generate at least one string of all
       terminals (including lambda)
       -> for our example: A -> a A is infinite recursion, no stopping
          criterion
       -> A can't generate a string of only terminals
    2. All non-terminals must be reachable from the start symbol
       -> for our example: no way to get to C; its production is
          never used

**How can we program this?
    Grammar analysis program; essentially a worklist algorithm.

Example: Algorithm to find non-terminals that fail rule #1
    A) mark all terminal symbols
    B) REPEAT:
           if all symbols on the right hand side (RHS) of a production
           are marked, then mark the left hand side (LHS) symbol
       UNTIL NO CHANGES
    -> any non-terminal still unmarked at the end fails rule #1

Example: Algorithm to mark reachable non-terminals (from S)
    A) mark S (trivially reachable)
    B) REPEAT:
           if the LHS of a production is marked, then mark all
           non-terminals on its RHS
       UNTIL NO CHANGES
    -> any non-terminal still unmarked at the end fails rule #2

Example: Algorithm to mark non-terminals that can derive lambda
    (example: optional semicolon -> ; | lambda )
    A) if A -> lambda is a production, then mark A (all occurrences)
    B) REPEAT:
           if B -> C D ... X is a production and the entire RHS is
           marked, then mark the LHS
       UNTIL NO CHANGES

Any questions? email piltch@cs.wisc.edu (Naomi)
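
Sketch (not from lecture): one possible way to code the three marking algorithms above in Python, run on the flawed example grammar from these notes. The data layout (a list of (LHS, RHS) pairs, with an empty RHS standing for lambda), the function names, and the OPT_GRAMMAR example with its symbol names are all assumptions made for illustration. It also assumes that any symbol never appearing as an LHS is a terminal.

```python
# Flawed example grammar from the notes: non-terminals in CAPS, terminals lower case.
# Each production is (LHS, RHS list); an empty RHS list stands for lambda.
GRAMMAR = [
    ("S", ["A", "B"]),
    ("S", ["x"]),
    ("B", ["b"]),
    ("A", ["a", "A"]),
    ("C", ["d"]),
]
START = "S"

# Hypothetical grammar with an optional semicolon, for the lambda algorithm.
OPT_GRAMMAR = [
    ("STMT", ["id", "TAIL"]),
    ("TAIL", ["OPT_SEMI", "OPT_SEMI"]),
    ("OPT_SEMI", [";"]),
    ("OPT_SEMI", []),  # OPT_SEMI -> lambda
]


def nonterminals(grammar):
    """Assumption: every symbol that appears as an LHS is a non-terminal."""
    return {lhs for lhs, _ in grammar}


def generating(grammar):
    """Rule #1: non-terminals that derive at least one all-terminal string."""
    nts = nonterminals(grammar)
    # A) mark all terminal symbols
    marked = {sym for _, rhs in grammar for sym in rhs if sym not in nts}
    changed = True
    while changed:  # B) REPEAT ... UNTIL NO CHANGES
        changed = False
        for lhs, rhs in grammar:
            # An empty RHS (lambda) counts as "all marked".
            if lhs not in marked and all(sym in marked for sym in rhs):
                marked.add(lhs)
                changed = True
    return marked & nts


def reachable(grammar, start):
    """Rule #2: non-terminals reachable from the start symbol."""
    nts = nonterminals(grammar)
    marked = {start}  # A) the start symbol is trivially reachable
    changed = True
    while changed:  # B) REPEAT ... UNTIL NO CHANGES
        changed = False
        for lhs, rhs in grammar:
            if lhs in marked:
                for sym in rhs:
                    if sym in nts and sym not in marked:
                        marked.add(sym)
                        changed = True
    return marked


def nullable(grammar):
    """Non-terminals that can derive lambda."""
    marked = {lhs for lhs, rhs in grammar if not rhs}  # A) X -> lambda
    changed = True
    while changed:  # B) REPEAT ... UNTIL NO CHANGES
        changed = False
        for lhs, rhs in grammar:
            # Terminals are never marked here, so any RHS containing one fails.
            if lhs not in marked and all(sym in marked for sym in rhs):
                marked.add(lhs)
                changed = True
    return marked


nts = nonterminals(GRAMMAR)
print("fail rule #1:", sorted(nts - generating(GRAMMAR)))        # ['A']
print("fail rule #2:", sorted(nts - reachable(GRAMMAR, START)))  # ['C']
print("derive lambda:", sorted(nullable(OPT_GRAMMAR)))           # ['OPT_SEMI', 'TAIL']
```

Note that S passes rule #1 even though A does not, because S -> x gives it an all-terminal string. Rescanning every production on each pass keeps the sketch close to the REPEAT ... UNTIL NO CHANGES wording; a true worklist would revisit only the productions affected by a newly marked symbol.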