There is a link to the Java Cup User's Manual under "Useful Programming
Tools" on the class web page. Here is the same link.
Java Cup is a parser generator that produces a parser written in Java.
Here's a picture illustrating how to create a parser using Java Cup:
The output of Java Cup includes a Java source file named parser.java,
which defines a class named parser with a method named parse.
Java Cup also produces a Java source file named sym.java, which
contains a class named sym that declares one public final static
int for each terminal declared in the Java Cup specification.
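For the terminals used in the examples in these notes, the generated sym class might look roughly like the sketch below. The numeric values are assigned by Java Cup, so the ones shown here are only an assumption; the generated class is public and lives in sym.java, and it also contains entries for the built-in EOF and error symbols.

```java
// Illustrative sketch of the class Java Cup writes to sym.java.
// The actual numbers are chosen by Java Cup and may differ.
class sym {
    public static final int EOF = 0;        // built-in end-of-input terminal
    public static final int error = 1;      // built-in error-recovery terminal
    public static final int SEMICOLON = 2;
    public static final int INT = 3;
    public static final int BOOL = 4;
    public static final int ID = 5;
}
```

The scanner uses these constants when it builds the Symbol for each token, so the scanner and the parser agree on how terminals are numbered.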
The parser class has a one-argument constructor;
the argument is of type Yylex (i.e., a scanner).
The parse method of the parser class uses the given scanner
to translate the input (the input stream is an argument passed to the
scanner's constructor) to a sequence of tokens.
It parses the tokens according to the given grammar, and does a
syntax-directed translation of the input using
the actions associated with the grammar productions.
If the input is not syntactically correct, the parser gives an error
message and quits (i.e., it only finds the first syntax error);
otherwise, it returns a Symbol whose value
field contains the translation of the root nonterminal (as defined by
the actions associated with the grammar rules).
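The calling pattern described above can be sketched as follows. So that the example runs on its own, the classes sym, Symbol, Yylex, and parser below are small hand-written stand-ins (assumptions, not generated code): the stand-in grammar accepts one or more integers and its translation is their sum. In a real project those classes come from the scanner generator, Java Cup, and the java_cup.runtime package; only the ParseDriver class shows code you would write yourself.

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in for the generated sym class (numbers are hypothetical).
class sym {
    static final int EOF = 0;
    static final int INTLIT = 2;
}

// Stand-in for java_cup.runtime.Symbol: a symbol code plus a value.
class Symbol {
    final int sym;        // which terminal or nonterminal this is
    final Object value;   // token value, or translation of a nonterminal
    Symbol(int sym, Object value) { this.sym = sym; this.value = value; }
}

// Stand-in scanner: one INTLIT token per whitespace-separated integer.
class Yylex {
    private final java.util.Scanner in;
    Yylex(Reader input) { in = new java.util.Scanner(input); }
    Symbol next_token() {
        if (!in.hasNext()) return new Symbol(sym.EOF, null);
        // A real scanner would report a lexical error for bad input.
        return new Symbol(sym.INTLIT, Integer.valueOf(in.next()));
    }
}

// Stand-in parser: accepts one or more INTLIT tokens; translation = their sum.
class parser {
    private final Yylex scanner;
    parser(Yylex scanner) { this.scanner = scanner; }
    Symbol parse() throws Exception {
        int sum = 0, count = 0;
        for (Symbol t = scanner.next_token(); t.sym != sym.EOF; t = scanner.next_token()) {
            sum += (Integer) t.value;
            count++;
        }
        if (count == 0) throw new Exception("Syntax error: expected an integer");
        return new Symbol(-1, sum);   // -1: hypothetical code for the root nonterminal
    }
}

class ParseDriver {
    // Build the scanner on the input, hand it to the parser, and
    // pull the translation out of the returned Symbol's value field.
    static Object translate(Reader input) throws Exception {
        parser P = new parser(new Yylex(input));
        Symbol root = P.parse();
        return root.value;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(translate(new StringReader("1 2 3")));   // prints 6
    }
}
```

With a real Java Cup parser, root.value would typically be cast to the type declared for the start nonterminal (for example, an AST node class).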
See the Java Cup Reference Manual for a description of the user code
part of the specification.
All terminal and nonterminal symbols that appear in the grammar must be
declared.
If you want to make use of the value associated with a terminal
(the value field of the Symbol object returned by the
scanner for that token) in your syntax-directed translation, then
you must also declare the type of that value field.
Similarly, you must declare the types of the translations associated
with all of the nonterminals.
Sometimes the same operator is used as both a unary and a binary operator, and
the two uses have different precedence levels (for example,
binary minus usually has a low precedence, while unary minus has a high
precedence).
This case can be handled either by rewriting the grammar, or by
declaring a "phony" terminal symbol (e.g., UMINUS), giving it the
appropriate precedence, and using it in the grammar rules part of the
specification to specify the precedence of the operator in a particular
rule (see below).
The heart of the Java Cup specification is the set of grammar rules.
First, there is an optional declaration of the start nonterminal; e.g.:
Below are three example grammar rules, preceded by the appropriate
terminal and nonterminal declarations.
Note that IdTokenVal is a type that was defined in the
scanner specification;
VarDeclNode, TypeNode,
and IdNode are all subclasses of an ASTnode class,
all defined in some other file;
and IntNode and BoolNode are subclasses of
TypeNode (defined in that same file).
The symbols "{:" and ":}" are used to delimit the action associated
with the rule.
An action can contain arbitrary Java code (including declarations
and uses of local variables).
If the left-hand-side nonterminal has been declared to have a type,
the action must include an assignment to the special variable
RESULT;
this assignment sets the value of the nonterminal (its translation).
To use the translations of the right-hand-side nonterminals, and the
values of the right-hand-side tokens, follow those symbols with
a colon and a name.
For example, using type:t makes t the name of
the translation of nonterminal type, and using ID:i
makes i the name of the value field of the
Symbol returned by the scanner for the ID token.
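To make the actions concrete, here is a minimal sketch of what the classes they refer to might look like, and of what the three actions compute for the input "int x;". The field names and the unparse method are assumptions made for illustration; the notes only say that these classes exist in some other file, and that IdTokenVal has an idVal field.

```java
// Hypothetical versions of the classes used in the grammar actions.
class IdTokenVal {                 // value carried by an ID token
    final String idVal;
    IdTokenVal(String idVal) { this.idVal = idVal; }
}

abstract class ASTnode {
    abstract String unparse();     // hypothetical helper: print the tree as source text
}

abstract class TypeNode extends ASTnode { }

class IntNode extends TypeNode {
    String unparse() { return "int"; }
}

class BoolNode extends TypeNode {
    String unparse() { return "bool"; }
}

class IdNode extends ASTnode {
    private final String name;
    IdNode(String name) { this.name = name; }
    String unparse() { return name; }
}

class VarDeclNode extends ASTnode {
    private final TypeNode type;
    private final IdNode id;
    VarDeclNode(TypeNode type, IdNode id) { this.type = type; this.id = id; }
    String unparse() { return type.unparse() + " " + id.unparse() + ";"; }
}

class AstDemo {
    public static void main(String[] args) {
        // The scanner would supply this value for the ID token.
        IdTokenVal tokenVal = new IdTokenVal("x");

        // Each line mirrors the RESULT assignment of one grammar rule.
        TypeNode t = new IntNode();                 // type    ::= INT
        IdNode i = new IdNode(tokenVal.idVal);      // id      ::= ID:i
        ASTnode decl = new VarDeclNode(t, i);       // varDecl ::= type:t id:i SEMICOLON

        System.out.println(decl.unparse());         // prints int x;
    }
}
```

The parser runs the actions bottom-up, so the values named t and i already exist by the time the varDecl action uses them.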
Overview
                          +---------------+
Parser specification ---> | java_cup.Main | ---> Java source code
     (xxx.cup)            +---------------+      (parser.java and sym.java)
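The input to Java Cup is a specification that includes:
  - optional user code,
  - terminal and nonterminal declarations,
  - optional precedence declarations, and
  - grammar rules with associated actions.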
The key part of the specification is the last part: the grammar rules
with associated actions.
Those actions are like the syntax-directed translation
rules that we have studied; i.e., they define how to translate
an input sequence of tokens into some value (e.g., an abstract-syntax
tree).
User Code
Terminal and Nonterminal Declarations
terminal name1, name2, ... ;             /* terminals without values */
terminal type name1, name2, ... ;        /* terminals with values */
non terminal type name1, name2, ... ;    /* nonterminals */
Note that Java Cup has some reserved words (e.g., action, parser, import);
these cannot be used as terminal or nonterminal names.
Precedence Declarations
A grammar like:
exp -> exp PLUS exp | exp MINUS exp | exp TIMES exp | exp EQUALS exp | ...
is ambiguous, and will cause conflicts:
the parser will not always know how to parse an input.
One way to fix the problem is to rewrite the grammar by adding new
nonterminals;
however, this can make the grammar less clear (and the parser less efficient).
Another option is to include precedence declarations that specify
the relative precedences of the operators, as well as their associativities.
For example:
precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE;
precedence nonassoc EQUALS;
The order of precedence is low to high (i.e., in this example, PLUS and
MINUS are given the lowest precedence, then TIMES and DIVIDE, then EQUALS).
The left, right, and nonassoc declarations specify the
associativity of the operators.
Declaring an operator nonassoc means that
it is not legal to have two consecutive occurrences of such
operators with the same precedence (so for example, given the above
declarations, the expression: a == b == c would cause a syntax
error).
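The effect of these declarations on how expressions are grouped can be illustrated with a small stand-alone sketch. Note that this is not how Java Cup works internally: Java Cup uses the declarations to resolve conflicts in its parse table, whereas the code below uses precedence climbing, a different technique that produces the same groupings for these cases. The class and method names are made up for this example, and operands and operators must be separated by spaces.

```java
class PrecedenceSketch {
    private final String[] toks;
    private int pos = 0;

    PrecedenceSketch(String input) { toks = input.trim().split("\\s+"); }

    // Levels in declaration order (low to high), mirroring:
    //   precedence left PLUS, MINUS;
    //   precedence left TIMES, DIVIDE;
    //   precedence nonassoc EQUALS;
    static int level(String op) {
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            case "==":          return 3;
            default:            return 0;   // not a binary operator
        }
    }

    static boolean nonassoc(String op) { return op.equals("=="); }

    private String peek() { return pos < toks.length ? toks[pos] : ""; }

    // Parse an expression whose top-level operators all have level >= minLevel,
    // returning it as a fully parenthesized string.
    String parseExp(int minLevel) {
        if (pos >= toks.length || level(toks[pos]) > 0) {
            throw new IllegalStateException("syntax error: operand expected");
        }
        String left = toks[pos++];
        while (level(peek()) >= minLevel) {
            String op = toks[pos++];
            int lvl = level(op);
            // Left-associative and nonassociative operators: the right
            // operand may only contain operators that bind more tightly.
            String right = parseExp(lvl + 1);
            left = "(" + left + " " + op + " " + right + ")";
            if (nonassoc(op) && level(peek()) == lvl) {
                throw new IllegalStateException("syntax error: " + op + " is nonassociative");
            }
        }
        return left;
    }

    static String group(String input) {
        PrecedenceSketch p = new PrecedenceSketch(input);
        String result = p.parseExp(1);
        if (p.pos != p.toks.length) {
            throw new IllegalStateException("syntax error: unexpected " + p.peek());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group("1 - 2 - 3"));   // prints ((1 - 2) - 3)
        System.out.println(group("1 + 2 * 3"));   // prints (1 + (2 * 3))
        try {
            group("1 == 2 == 3");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());   // reports that == is nonassociative
        }
    }
}
```

Left associativity is what makes 1 - 2 - 3 group to the left, and the higher level of TIMES is what makes 2 * 3 group before the addition.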
Grammar Rules
start with program;
If no such declaration is included, the left-hand-side nonterminal of the
first grammar rule is assumed to be the start nonterminal.
terminal            SEMICOLON;
terminal            INT;
terminal            BOOL;
terminal IdTokenVal ID;

non terminal VarDeclNode varDecl;
non terminal TypeNode    type;
non terminal IdNode      id;

varDecl ::= type:t id:i SEMICOLON
            {: RESULT = new VarDeclNode(t, i);
            :}
          ;

type ::= INT
         {: RESULT = new IntNode();
         :}
       | BOOL
         {: RESULT = new BoolNode();
         :}
       ;

id ::= ID:i
       {: RESULT = new IdNode(i.idVal);
       :}
     ;
In these rules, lower-case names are used for nonterminals, and upper-case
names are used for terminals.
The symbol "::=" is used instead of an arrow to separate the left
and right-hand sides of the grammar rule.
Each grammar rule ends with a semicolon.