Parsing


Overview

There are algorithms that can be used to parse the language defined by an arbitrary CFG. However, in the worst case, the algorithms take O(n^3) time, where n is the number of tokens. That is too slow!
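
To see where the cubic cost comes from, here is a minimal sketch of one such general algorithm, CYK. It assumes the grammar has already been converted to Chomsky normal form. The balanced-parentheses grammar and the names UNARY, BINARY, and cyk_accepts are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical grammar in Chomsky normal form for balanced parentheses:
#   S -> L R | L X | S S
#   X -> S R
#   L -> "("
#   R -> ")"
UNARY = {"(": {"L"}, ")": {"R"}}   # terminal -> nonterminals that derive it
BINARY = {                          # (B, C) -> nonterminals A with rule A -> B C
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "S"): {"S"},
    ("S", "R"): {"X"},
}


def cyk_accepts(tokens, start="S"):
    """Return True if the token list is in the language of the grammar above."""
    n = len(tokens)
    if n == 0:
        return False  # this grammar has no epsilon rule
    # table[i][length] = set of nonterminals that derive tokens[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = set(UNARY.get(tok, ()))
    for length in range(2, n + 1):          # O(n) substring lengths
        for i in range(n - length + 1):     # O(n) start positions
            for split in range(1, length):  # O(n) split points
                left = table[i][split]
                right = table[i + split][length - split]
                for b, c in product(left, right):
                    table[i][length] |= BINARY.get((b, c), set())
    return start in table[0][n]
```

The three nested loops over length, start position, and split point are what give the O(n^3) bound (for a fixed grammar). For example, cyk_accepts(list("(())")) is True and cyk_accepts(list("(()")) is False.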

Fortunately, there are classes of grammars for which O(n) parsers can be built (and given a grammar, we can quickly test whether it is in such a class). Two such classes are:

	LL(1)
	^^ ^
	|| |___ one token of look-ahead
	||_____ do a leftmost derivation
	|______ scan the input left-to-right

	LALR(1)
	^ ^^ ^
	| || |__ one token of look-ahead
	| ||____ do a rightmost derivation in reverse
	| |_____ scan the input left-to-right
	|_______ LA means "look-ahead"; this has nothing to do with the
	         number of tokens the parser can look at before it chooses
	         what to do -- it is a technical term that only means
	         something when you study how LR parsers work...

LALR(1) grammars are the class of grammars that Java Cup can handle. We will learn about LL(1) grammars. Remember that an LL(1) grammar is almost always LALR(1) as well (the rare exceptions are still LR(1)), so when using Java Cup, if your grammar is not LALR(1), rewriting it as an LL(1) grammar will usually make it work.
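
To make "leftmost derivation with one token of look-ahead" concrete, here is a minimal sketch of a parser that chooses every production by looking only at the next token. The expression grammar, the token names "int" and "$", and the function names are all assumptions made for this example; they do not come from these notes.

```python
# Hypothetical LL(1) grammar used for illustration:
#   E  -> T E'
#   E' -> + T E' | epsilon
#   T  -> int | ( E )


class ParseError(Exception):
    pass


def parse(tokens):
    """Return the productions used, in leftmost-derivation order."""
    toks = list(tokens) + ["$"]  # "$" is an assumed end-of-input marker
    pos = 0
    used = []

    def peek():
        # The single token of look-ahead.
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise ParseError(f"expected {expected!r}, got {toks[pos]!r}")
        pos += 1

    def parse_E():
        used.append("E -> T E'")
        parse_T()
        parse_E_prime()

    def parse_E_prime():
        if peek() == "+":
            used.append("E' -> + T E'")
            match("+")
            parse_T()
            parse_E_prime()
        elif peek() in (")", "$"):
            # These are the tokens that may follow E', so choose epsilon.
            used.append("E' -> epsilon")
        else:
            raise ParseError(f"unexpected token {peek()!r}")

    def parse_T():
        if peek() == "int":
            used.append("T -> int")
            match("int")
        elif peek() == "(":
            used.append("T -> ( E )")
            match("(")
            parse_E()
            match(")")
        else:
            raise ParseError(f"unexpected token {peek()!r}")

    parse_E()
    match("$")
    return used
```

Calling parse(["int", "+", "int"]) returns the productions E -> T E', T -> int, E' -> + T E', T -> int, E' -> epsilon, in that order. Each choice was made by inspecting peek() alone, and the parser never backs up, which is why it runs in O(n) time.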

LL(1) Grammars and Predictive Parsers

LL(1) grammars are parsed by top-down parsers. They construct the derivation tree starting with the start nonterminal and working down. One kind of parser for LL(1) grammars is the predictive parser. The idea is as follows:

Here's how the predictive parser works: