CS536 Class Notes
March 3, 1999

Parsing

Overview

    - reads the tokens identified by the scanner and matches them against a context-free grammar
    - needs to process tokens efficiently, because:
            it is used in compilers, which are frequently run programs
            a typical program contains thousands of tokens
    - often faster than a scanner, because:
            it makes fewer system calls
            a parser handles fewer tokens than the scanner handles characters
    - examples of parser generators:
            Yacc - an early parser generator associated with Unix; produces C code to do the parsing
            JavaCup - produces Java code to do the parsing
    - example grammar:
              E -> E + T
                   | T
              T -> T * ID
                   | ID

Top Down Approach

this approach discovers a leftmost derivation sequence
productions are discovered from the top down

    the leftmost derivation of the input ID + ID, i.e. E =>* ID + ID:
    E => E + T
      => T + T
      => ID + T
      => ID + ID
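The leftmost derivation can be replayed mechanically: at each step, rewrite the leftmost nonterminal with the next production. A minimal Python sketch follows; the function names are hypothetical, and the production list is simply the order used in the derivation above.

```python
from typing import List, Tuple

# Nonterminals of the example grammar:  E -> E + T | T ;  T -> T * ID | ID
NONTERMINALS = {"E", "T"}


def derive_leftmost(form: List[str], lhs: str, rhs: List[str]) -> List[str]:
    """One leftmost derivation step: rewrite the leftmost nonterminal using lhs -> rhs."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != lhs:
                raise ValueError(f"leftmost nonterminal is {sym}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


def replay(steps: List[Tuple[str, List[str]]], start: str = "E") -> List[str]:
    """Apply each production in order and record every sentential form."""
    form = [start]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        form = derive_leftmost(form, lhs, rhs)
        history.append(" ".join(form))
    return history


# Productions used in the leftmost derivation of ID + ID, in order.
STEPS = [("E", ["E", "+", "T"]), ("E", ["T"]), ("T", ["ID"]), ("T", ["ID"])]

for line in replay(STEPS):
    print(line)
```

This prints E, E + T, T + T, ID + T and ID + ID, one per line. Passing a production whose left-hand side is not the leftmost nonterminal raises an error, which is exactly the constraint that makes the derivation "leftmost".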

Parse Tree Sequence

                  E
                /   \
               E  +  T

                  E
                /   \
               E  +  T
               |     |
               T     ID
               |
               ID

(This shows the general order in which the tree is built; the intermediate trees do not necessarily correspond step for step to the leftmost derivation.)

Bottom Up Approach

this approach discovers a rightmost derivation sequence (in reverse)
productions are discovered from the bottom up

    the rightmost derivation of the input ID + ID, i.e. E =>* ID + ID:
    E => E + T
      => E + ID
      => T + ID
      => ID + ID
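Bottom-up parsing runs this derivation backwards: starting from the tokens, each step replaces a right-hand side (the "handle") with its left-hand side. The sketch below is a hypothetical Python illustration in which the handle positions are supplied by hand; a bottom-up parser generated by a tool such as Yacc finds them itself from its parse tables.

```python
from typing import List, Tuple

# One reduction: (position of the handle, right-hand side, left-hand side).
Reduction = Tuple[int, List[str], str]


def reduce_at(form: List[str], pos: int, rhs: List[str], lhs: str) -> List[str]:
    """Replace form[pos:pos+len(rhs)], which must equal rhs, with the nonterminal lhs."""
    if form[pos:pos + len(rhs)] != rhs:
        raise ValueError(f"{rhs} does not appear at position {pos} in {form}")
    return form[:pos] + [lhs] + form[pos + len(rhs):]


def trace(tokens: List[str], reductions: List[Reduction]) -> List[str]:
    """Apply the reductions in order, recording each sentential form."""
    form = list(tokens)
    history = [" ".join(form)]
    for pos, rhs, lhs in reductions:
        form = reduce_at(form, pos, rhs, lhs)
        history.append(" ".join(form))
    return history


# Handles for ID + ID, chosen by hand for this sketch (an LR parser would
# decide them from its parse table).
REDUCTIONS: List[Reduction] = [
    (0, ["ID"], "T"),
    (0, ["T"], "E"),
    (2, ["ID"], "T"),
    (0, ["E", "+", "T"], "E"),
]

for line in trace(["ID", "+", "ID"], REDUCTIONS):
    print(line)
```

The output is ID + ID, T + ID, E + ID, E + T, E. Read from the last line back to the first, that is the rightmost derivation of ID + ID.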

Parse Tree Sequence

               T     T
               |     |
               ID +  ID

                  E
                /   \
               E     \
                      |
               T      T
               |      |
               ID  +  ID

                  E
                /   \
               E     \
               |      |
               T      T
               |      |
               ID  +  ID

(This shows the general order in which the tree is built; the intermediate trees do not necessarily correspond step for step to the rightmost derivation.)

Analysis of Top Down Approach

    sample grammar:
    S -> a
        |    (S)
        |    (S]

(This is typical of an if/then/else vs. if/then construct; I think the point concerns closing constructs like "fi", where two alternatives share a prefix and differ only in how they close.)

Case  Input      Steps per attempt       Total

1     a          1                       1
2     (a]        1+1+3+2                 7
3     ((a]]      1+1+7+1+7               17
4     (((a]]]    1+(1+17)+(1+17)         37

    Explanation for test case #2 (one possible trace; each numbered line is one
    step, y = match, n = no match)
    input - (a]
    token - (
    1.  try - a                n
    2.  try - (S)              y
            token - a
    3.      try - a            y
                token - ]
    4.          should be - )  n
    (backtrack to the first token)
    token - (
    5.  try - (S]              y
            token - a
    6.      try - a            y
                token - ]
    7.          should be - ]  y
    ---------------------
    7 total steps
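The trace above is what a naive backtracking parser does. Below is a minimal Python sketch of such a parser for S -> a | (S) | (S]; the class and function names are hypothetical. As an assumption of this sketch, it counts one step per alternative attempted and does not count the check against the closing ) or ] as a separate step, so (a] takes 5 steps here rather than the 7 of the hand trace. The counts still obey T(i) = 3 + 2T(i-1), the same pattern as rows 3 and 4 of the table: the inner S is re-parsed from scratch for the second parenthesized alternative, so the work roughly doubles with each level of nesting.

```python
from typing import Optional, Tuple


class BacktrackingParser:
    """Naive top-down parser for  S -> a | (S) | (S]  that backtracks on failure.

    `steps` counts one step per alternative attempted (a convention chosen
    for this sketch)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.steps = 0

    def peek(self, pos: int) -> str:
        # Character at pos, or "" once past the end of the input.
        return self.text[pos] if pos < len(self.text) else ""

    def parse_s(self, pos: int) -> Optional[int]:
        """Try each alternative of S at pos; return the end position of the
        first one that succeeds, or None if all of them fail."""
        # Alternative 1: S -> a
        self.steps += 1
        if self.peek(pos) == "a":
            return pos + 1
        # Alternatives 2 and 3: S -> (S)  then  S -> (S]
        for closer in ")]":
            self.steps += 1
            if self.peek(pos) == "(":
                end = self.parse_s(pos + 1)  # the inner S is re-parsed each time
                if end is not None and self.peek(end) == closer:
                    return end + 1
            # No match: back up to pos and try the next alternative.
        return None


def count_steps(text: str) -> Tuple[bool, int]:
    """Return (accepted, steps) for the whole input."""
    parser = BacktrackingParser(text)
    end = parser.parse_s(0)
    return end == len(text), parser.steps


for i in range(4):
    text = "(" * i + "a" + "]" * i
    print(text, count_steps(text))
```

For a, (a], ((a]] and (((a]]] this reports 1, 5, 13 and 29 steps, all accepted.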

For an input like (a] but with i ('s and i ]'s (i >= 1), the number of steps is (5*2^i)-3, which grows exponentially with i. Even with dynamic programming, the algorithm is O(i^3), i.e. cubic in the length of the input.
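The closed form can be checked against the table. Rows 3 and 4 break down as three single steps plus two full re-parses of the inner input, i.e. T(i) = 3 + 2T(i-1) with T(1) = 7. Substituting the closed form gives 3 + 2(5*2^(i-1) - 3) = 5*2^i - 3, so it satisfies the recurrence. A quick numeric check in Python (table_total is a hypothetical helper name):

```python
def table_total(i: int) -> int:
    """Total steps for i levels of nesting, using the table's accounting:
    7 steps for (a], then 3 + 2 * (previous total) for each extra level."""
    if i < 1:
        raise ValueError("the closed form is stated for i >= 1")
    total = 7
    for _ in range(i - 1):
        total = 3 + 2 * total
    return total


for i in range(1, 6):
    assert table_total(i) == 5 * 2**i - 3  # recurrence agrees with closed form
    print(i, table_total(i))
```

This prints 7, 17, 37, 77 and 157 for i = 1 to 5; the first three match the table.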

A Better Way

    Uses predictions. The predict set is defined as:
    predict(A->X1 ...Xn) - set of all initial (first) terminals derivable from X1 ... Xn
                    or precisely - { a | a element of Terminals and X1 ... Xn =>* a... }

Using this method alleviates the problem of going down wrong paths, as the first
algorithm presented does.
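Predict sets can be computed directly from the grammar. The Python sketch below applies the definition above to the sample grammar S -> a | (S) | (S]. It assumes no production derives the empty string (true here), in which case predict(A -> X1 ... Xn) is just the set of terminals that can begin X1; grammars with empty productions would need more than this. The names first_sets and predict are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Production = Tuple[str, List[str]]

# Sample grammar from the notes:  S -> a | (S) | (S]
GRAMMAR: List[Production] = [
    ("S", ["a"]),
    ("S", ["(", "S", ")"]),
    ("S", ["(", "S", "]"]),
]


def first_sets(grammar: List[Production]) -> Dict[str, Set[str]]:
    """Terminals that can begin each nonterminal, iterated to a fixed point.
    Assumes no production derives the empty string."""
    first: Dict[str, Set[str]] = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            x1 = rhs[0]
            new = first[x1] if x1 in first else {x1}  # nonterminal or terminal
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def predict(rhs: List[str], first: Dict[str, Set[str]]) -> Set[str]:
    """predict(A -> X1 ... Xn): terminals a with X1 ... Xn =>* a...
    Without empty productions this only depends on X1."""
    x1 = rhs[0]
    return set(first[x1]) if x1 in first else {x1}


FIRST = first_sets(GRAMMAR)
for lhs, rhs in GRAMMAR:
    print(f"predict({lhs} -> {''.join(rhs)}) = {sorted(predict(rhs, FIRST))}")
```

For this particular grammar the two parenthesized alternatives both get the predict set {(}, so a single lookahead token rules out S -> a but cannot choose between (S) and (S]; the grammar would have to be left-factored before predictions alone pick a unique alternative.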

Notes by Brandon Schendel