CS536 Class Notes
March 3, 1999

Parsing

Overview

    - reads the tokens identified by the scanner and matches them against a context-free grammar
    - needs to process tokens efficiently, because:
            it is used in compilers, which are frequently run programs
            a typical program contains thousands of tokens
    - often faster than a scanner, because:
            it makes fewer system calls
            a parser handles fewer tokens than the scanner handles characters
    - examples of parser generators:
            Yacc - an early parser generator associated with Unix; produces C code to do the parsing
            JavaCup - produces Java code to do the parsing
    - example grammar:
              E -> E + T
                   |  T
              T -> T * ID
                   |  ID
 

Top Down Approach

    this approach discovers a leftmost derivation sequence
    productions are discovered from the top down

    the leftmost derivation of the input ID + ID, i.e. E =>* ID + ID:
    E => E + T
      => T + T
      => ID + T
      => ID + ID
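The leftmost derivation above can be replayed mechanically. This Python sketch is only illustrative: the dictionary GRAMMAR and the function derive are hypothetical names, and each step is supplied as a (nonterminal, alternative index) choice rather than discovered by a parser. At every step it rewrites the leftmost nonterminal of the sentential form (or the rightmost one with leftmost=False).

```python
# Hypothetical encoding of the example grammar from the overview:
# each nonterminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "ID"], ["ID"]],
}


def derive(start, choices, leftmost=True):
    """Replay a derivation. Each choice (nonterminal, alternative index)
    rewrites the leftmost (or rightmost) nonterminal of the current form.
    Returns every sentential form, starting with the start symbol."""
    form = [start]
    steps = [" ".join(form)]
    for nonterminal, alt in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        idx = positions[0] if leftmost else positions[-1]
        if form[idx] != nonterminal:
            raise ValueError(f"expected to expand {form[idx]}, got {nonterminal}")
        # replace the chosen nonterminal by the right-hand side of the production
        form[idx:idx + 1] = GRAMMAR[nonterminal][alt]
        steps.append(" ".join(form))
    return steps


# E -> E + T, then E -> T, then T -> ID twice (always the leftmost nonterminal)
for form in derive("E", [("E", 0), ("E", 1), ("T", 1), ("T", 1)]):
    print(form)
# E
# E + T
# T + T
# ID + T
# ID + ID
```

Passing leftmost=False with the choices [("E", 0), ("T", 1), ("E", 1), ("T", 1)] replays the rightmost derivation of the same input instead.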

                 Parse Tree Sequence

                   E


                   E
                 / | \
                E  +  T


                   E
                 / | \
                E  +  T
                |     |
                T     ID
                |
                ID

    (this shows the general progression only; the snapshots don't necessarily correspond
    step for step to the leftmost derivation)
 

Bottom Up Approach

    this approach discovers a rightmost derivation sequence, in reverse order
    productions are discovered from the bottom up

    the rightmost derivation of the input ID + ID, i.e. E =>* ID + ID:
    E => E + T
      => E + ID
      => T + ID
      => ID + ID
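A bottom-up parser finds the rightmost derivation backwards, by repeatedly reducing a right-hand side on top of its stack to the nonterminal on the left. The Python sketch below is a hand-coded shift-reduce parser for the example grammar only; it is not the table-driven LR parser that Yacc or JavaCup would generate. The name shift_reduce is hypothetical, and its reduction rules are assumptions hard-coded for this one grammar. It records the sentential form after each reduction.

```python
def shift_reduce(tokens):
    """Hand-coded bottom-up parse of the example grammar
         E -> E + T | T        T -> T * ID | ID
    Returns the sentential forms after each reduction; read in reverse,
    they form a rightmost derivation."""
    stack = []
    rest = list(tokens) + ["$"]        # "$" marks the end of the input
    i = 0                              # index of the next unread token

    def snapshot():
        # current sentential form = stack contents followed by unread input
        return " ".join(stack + rest[i:-1])

    forms = [snapshot()]
    while True:
        look = rest[i]
        if stack[-3:] == ["T", "*", "ID"]:        # reduce T -> T * ID
            stack[-3:] = ["T"]
        elif stack[-1:] == ["ID"]:                # reduce T -> ID
            stack[-1] = "T"
        elif stack[-1:] == ["T"] and look != "*":
            if stack[-3:] == ["E", "+", "T"]:     # reduce E -> E + T
                stack[-3:] = ["E"]
            else:                                 # reduce E -> T
                stack[-1] = "E"
        elif look == "$":
            break                                 # nothing to reduce, input used up
        else:
            stack.append(look)                    # shift the next token
            i += 1
            continue
        forms.append(snapshot())
    if stack != ["E"]:
        raise SyntaxError("input is not derivable from E")
    return forms


print(shift_reduce(["ID", "+", "ID"]))
# ['ID + ID', 'T + ID', 'E + ID', 'E + T', 'E']
```

Reading that list from the end back to the start gives E, E + T, E + ID, T + ID, ID + ID, the rightmost derivation above.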

                 Parse Tree Sequence

                T     T
                |     |
                ID +  ID


                   E
                 /   \
                E     \
                      |
                T     T
                |     |
                ID +  ID


                   E
                 /   \
                E     \
                |     |
                T     T
                |     |
                ID +  ID

    (this shows the general progression only; the snapshots don't necessarily correspond
    step for step to the rightmost derivation)
 

Analysis of Top Down Approach

    sample grammar:
    S -> a
        |    (S)
        |    (S]

    (this is typical of if/then/else vs. if/then constructs; I think it refers to closing
    keywords such as "fi")

    Case   Input      Successive # of steps     Total
    1      a          1                         1
    2      (a]        1+1+3+2                   7
    3      ((a]]      1+1+7+1+7                 17
    4      (((a]]]    1+(1+17)+(1+17)           37

    Explanation for test case #2 (one possible trace; each line marked with "." is one step,
    and y/n means the match succeeded/failed)

    input - (a]
    token - (
.   try - a          n
.   try - (S)        y
        token - a
.       try - a      y
            token - ]
.           should be - )    n

    (backtrack)
    token - (
.   try - (S]        y
        token - a
.       try - a      y
            token - ]
.           should be - ]    y
    ---------------------
    7 total steps
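The trace above can be reproduced by a small backtracking parser. In this Python sketch (parse_S and count_steps are hypothetical names), the alternatives of S are tried in the order a, (S), (S], and a step is counted for every terminal comparison, as in the trace. One assumption: once an inner S succeeds it is not re-tried with its other alternatives, which also matches the trace. For case 2 it counts the same 7 steps, but for the deeper cases it counts every closing-token comparison at every level, giving 2 * steps(i-1) + 5 = 6*2^i - 5 (19 and 43) rather than the table's 17 and 37, which appear to use a slightly different counting. Either way the work doubles with each extra level of nesting.

```python
def parse_S(s, pos, counter):
    """Backtracking recursive descent for S -> a | (S) | (S], with the
    alternatives tried in that order. Returns the position just after the
    matched S, or None. counter[0] counts steps: one per terminal comparison."""
    # Alternative S -> a
    counter[0] += 1
    if pos < len(s) and s[pos] == "a":
        return pos + 1
    # Alternatives S -> (S) and S -> (S]: same prefix, different closing token
    for close in (")", "]"):
        counter[0] += 1                          # compare the opening "("
        if pos < len(s) and s[pos] == "(":
            end = parse_S(s, pos + 1, counter)   # first successful inner parse only
            if end is not None:
                counter[0] += 1                  # compare the closing token
                if end < len(s) and s[end] == close:
                    return end + 1
    return None                                  # all alternatives failed; caller backtracks


def count_steps(s):
    """Hypothetical helper: (is s accepted?, number of steps taken)."""
    counter = [0]
    end = parse_S(s, 0, counter)
    return end == len(s), counter[0]


for i in range(4):
    s = "(" * i + "a" + "]" * i
    print(s, count_steps(s))
# a (True, 1)
# (a] (True, 7)
# ((a]] (True, 19)
# (((a]]] (True, 43)
```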

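    For an input of the form (...(a]...] with i ('s and i ]'s, the total number of steps is
    (5*2^i)-3 for i >= 1 (case 1, with i = 0, takes 1 step), so the work roughly doubles with
    each extra level of nesting.  Even with dynamic programming, the algorithm is still O(i^3),
    i.e. cubic in the length of the input.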
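The formula can be checked against the table. Reading the breakdowns for cases 3 and 4 as T(i) = 1 + (1 + T(i-1)) + (1 + T(i-1)), and taking T(1) = 7 from case 2 as the base case, a short derivation is:

```latex
\begin{aligned}
T(i) &= 1 + \bigl(1 + T(i-1)\bigr) + \bigl(1 + T(i-1)\bigr) = 2\,T(i-1) + 3, \qquad T(1) = 7,\\
T(i) + 3 &= 2\bigl(T(i-1) + 3\bigr) = 2^{\,i-1}\bigl(T(1) + 3\bigr) = 10 \cdot 2^{\,i-1},\\
T(i) &= 5 \cdot 2^{\,i} - 3 \qquad (i \ge 1).
\end{aligned}
```

This gives T(2) = 17 and T(3) = 37, matching cases 3 and 4.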
 

A Better Way

    Use predictions.  The predict set is defined as:
    predict(A -> X1 ... Xn) - the set of all initial (first) terminals derivable from X1 ... Xn,
                    or precisely: { a | a is a terminal and X1 ... Xn =>* a ... }
                    (if X1 ... Xn can derive the empty string, the terminals that can follow A
                    are included as well)
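Predict sets can be computed by fixed-point iteration. This Python sketch assumes no production derives the empty string, so predict(A -> X1 ... Xn) is just the set of first terminals of X1. SAMPLE, EXPR, first_of, compute_first, and predict_sets are hypothetical names. Run on the sample grammar, it shows that S -> (S) and S -> (S] have the same predict set, and on the expression grammar from the overview, that both E alternatives predict ID because E -> E + T is left-recursive.

```python
# Hypothetical encodings of the two grammars in these notes.
# Assumption: no production derives the empty string, so only X1 matters.
SAMPLE = {"S": [["a"], ["(", "S", ")"], ["(", "S", "]"]]}
EXPR = {"E": [["E", "+", "T"], ["T"]], "T": [["T", "*", "ID"], ["ID"]]}


def first_of(symbols, grammar, first):
    """First terminals of X1 ... Xn (no empty productions, so just X1)."""
    x = symbols[0]
    return first[x] if x in grammar else {x}


def compute_first(grammar):
    """Fixed-point iteration: keep adding terminals until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, grammar, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def predict_sets(grammar):
    """Map (A, "X1 ... Xn") to predict(A -> X1 ... Xn)."""
    first = compute_first(grammar)
    return {
        (nt, " ".join(alt)): set(first_of(alt, grammar, first))
        for nt, alts in grammar.items()
        for alt in alts
    }


for (nt, rhs), tokens in predict_sets(SAMPLE).items():
    print(f"predict({nt} -> {rhs}) = {sorted(tokens)}")
# predict(S -> a) = ['a']
# predict(S -> ( S )) = ['(']
# predict(S -> ( S ]) = ['(']
```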

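    Using predict sets alleviates the problem of going down wrong paths, as in the first
    algorithm presented: the parser looks at the next input token and only tries productions
    whose predict set contains that token.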
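For the sample grammar, S -> (S) and S -> (S] both have predict set { ( }, so the next token alone cannot choose between them. One way around this, assumed here and not taken from these notes, is to left-factor the grammar as S -> a | ( S R and R -> ) | ]. The Python sketch below (parse_predictive is a hypothetical name) is a predictive recursive-descent parser for that rewritten grammar: every expansion is chosen by looking at one token, so there is no backtracking, and an input with i ('s takes 2i + 1 steps.

```python
def parse_predictive(s):
    """Predictive recursive descent for the left-factored grammar
         S -> a | ( S R        R -> ) | ]
    Returns the number of prediction steps (one per S or R expansion);
    raises SyntaxError if s is not in the language."""
    pos = 0
    steps = 0

    def peek():
        return s[pos] if pos < len(s) else "$"   # "$" = end of input

    def S():
        nonlocal pos, steps
        steps += 1
        tok = peek()
        if tok == "a":            # predict(S -> a) = { a }
            pos += 1
        elif tok == "(":          # predict(S -> ( S R) = { ( }
            pos += 1
            S()
            R()
        else:
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")

    def R():
        nonlocal pos, steps
        steps += 1
        tok = peek()
        if tok in (")", "]"):     # predict(R -> )) = { ) }, predict(R -> ]) = { ] }
            pos += 1
        else:
            raise SyntaxError(f"expected ')' or ']' at position {pos}")

    S()
    if pos != len(s):
        raise SyntaxError(f"extra input at position {pos}")
    return steps


print(parse_predictive("(((a]]]"))   # 7
```

For (((a]]] this takes 7 steps, compared with the 37 listed in the table for the backtracking approach.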



Notes by Brandon Schendel