Java Cup

Overview
User Code
Terminal and Nonterminal Declarations
Precedence Declarations
Grammar Rules
How to Run Java Cup

Overview

The Java Cup User's Manual is linked under "Useful Programming Tools" on the class web page.

Java Cup is a parser generator that produces a parser written in Java. Here's a picture illustrating how to create a parser using Java Cup:

                           +---------------+
Parser specification  ---> | java_cup.Main | ---> Java source code
   (xxx.cup)               +---------------+      (parser.java and sym.java)

The input to Java Cup is a specification that includes:

optional package and import declarations
optional user code
terminal and nonterminal declarations
optional precedence and associativity declarations
grammar rules with associated actions

The key part of the specification is the last part: the grammar rules with associated actions. Those actions are like the syntax-directed translation rules that we have studied; i.e., they define how to translate an input sequence of tokens into some value (e.g., an abstract-syntax tree).

The output of Java Cup includes a Java source file named parser.java, which defines a class named parser with a method named parse. Java Cup also produces a Java source file named sym.java, which contains a class named sym that declares one public final static int for each terminal declared in the Java Cup specification.
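As a rough illustration of the second output file, a specification that declares the terminals SEMICOLON, INT, and ID might yield a sym class along the following lines. The particular numbers are assumptions made for this example (Java Cup chooses them, and it also adds entries for end-of-file and for its special error symbol); only the overall shape is the point. SymDemo is a hypothetical helper class, not something Java Cup produces.

```java
// Illustrative sketch of a generated sym.java. The numeric values are
// assumed for the example and may differ in a real run. The generated
// class is public; the modifier is omitted so the sketch compiles in any file.
class sym {
    public static final int EOF = 0;        // end of input (added by Java Cup)
    public static final int error = 1;      // special error symbol (added by Java Cup)
    public static final int SEMICOLON = 2;
    public static final int INT = 3;
    public static final int ID = 4;
}

// Hypothetical helper, present only to show how the constants are referenced.
class SymDemo {
    public static void main(String[] args) {
        // A scanner tags each token it returns with one of these codes.
        System.out.println("ID has code " + sym.ID);
    }
}
```

The scanner and the parser both refer to these constants, which is how they agree on the identity of each token.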

The parser class has a one-argument constructor; the argument is of type Yylex (i.e., a scanner). The parse method of the parser class uses the given scanner to translate the input (the input stream is an argument passed to the scanner's constructor) to a sequence of tokens. It parses the tokens according to the given grammar, and does a syntax-directed translation of the input using the actions associated with the grammar productions. If the input is not syntactically correct, the parser gives an error message and quits (i.e., it only finds the first syntax error); otherwise, it returns a Symbol whose value field contains the translation of the root nonterminal (as defined by the actions associated with the grammar rules).

User Code

See the Java Cup User's Manual for a description of this part of the specification.

Terminal and Nonterminal Declarations

All terminal and nonterminal symbols that appear in the grammar must be declared. If you want to make use of the value associated with a terminal (the value field of the Symbol object returned by the scanner for that token) in your syntax-directed translation, then you must also declare the type of that value field. Similarly, you must declare the types of the translations associated with all of the nonterminals.

terminal          name1, name2, ... ;  /* terminals without values */
terminal     type name1, name2, ... ;  /* terminals with values */
non terminal type name1, name2, ... ;  /* nonterminals */

Note that Java Cup has some reserved words (e.g., action, parser, import); these cannot be used as terminal or nonterminal names.

Precedence Declarations

A grammar like:

exp -> exp PLUS exp  |  exp MINUS exp  |  exp TIMES exp  | exp EQUALS exp  |  ...

is ambiguous, and will cause conflicts: the parser will not always know how to parse an input. One way to fix the problem is to rewrite the grammar by adding new nonterminals; however, this can make the grammar less clear (and the parser less efficient). Another option is to include precedence declarations that specify the relative precedences of the operators, as well as their associativities.

For example:

precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE;
precedence nonassoc EQUALS;

The order of precedence is low to high (i.e., in this example, PLUS and MINUS are given the lowest precedence, then TIMES and DIVIDE, then EQUALS). The left, right, and nonassoc declarations specify the associativity of the operators. Declaring an operator nonassoc means that it is not legal to have two consecutive occurrences of such operators with the same precedence (so for example, given the above declarations, the expression: a == b == c would cause a syntax error).
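One way to see what these declarations mean is a small hand-written precedence-climbing routine in Java. This is not how Java Cup works internally (it generates an LALR parser and uses the declarations to resolve conflicts); it is only a sketch, with a hypothetical class name PrecDemo and space-separated tokens, that groups operands the way the three declarations above specify.

```java
import java.util.Arrays;
import java.util.List;

class PrecDemo {
    // Levels mirror the declarations above, numbered from low to high.
    static int level(String op) {
        switch (op) {
            case "+": case "-": return 1;   // precedence left PLUS, MINUS
            case "*": case "/": return 2;   // precedence left TIMES, DIVIDE
            case "==":          return 3;   // precedence nonassoc EQUALS
            default:            return 0;   // not an operator
        }
    }

    private final List<String> toks;
    private int pos = 0;

    PrecDemo(String input) { toks = Arrays.asList(input.trim().split("\\s+")); }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    // Parse an expression whose operators all have level >= min and
    // return it fully parenthesized.
    String parse(int min) {
        String left = toks.get(pos++);              // an operand
        while (level(peek()) >= min) {
            String op = toks.get(pos++);
            // Left associativity: the right operand may contain only
            // operators of a strictly higher level.
            String right = parse(level(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
            // nonassoc: a second == directly after the first is an error.
            if (op.equals("==") && peek().equals("=="))
                throw new IllegalStateException("syntax error: == is nonassoc");
        }
        return left;
    }

    static String group(String input) { return new PrecDemo(input).parse(1); }

    public static void main(String[] args) {
        System.out.println(group("a - b - c"));   // ((a - b) - c)
        System.out.println(group("a + b * c"));   // (a + (b * c))
    }
}
```

Calling group("a == b == c") throws the exception, matching the syntax error described above.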

Sometimes the same operator is used as both a unary and a binary operator, and the two uses have different precedence levels (for example, binary minus usually has a low precedence, while unary minus has a high precedence). This case can be handled either by rewriting the grammar, or by declaring a "phony" terminal symbol (e.g., UMINUS), giving it the appropriate precedence, and using it in the grammar rules part of the specification to specify the precedence of the operator in a particular rule (see below).

Grammar Rules

The heart of the Java Cup specification is the set of grammar rules. First, there is an optional declaration of the start nonterminal; e.g.:

start with program;

If no such declaration is included, the left-hand-side nonterminal of the first grammar rule is assumed to be the start nonterminal.

Below are three example grammar rules, preceded by the appropriate terminal and nonterminal declarations. Note that IdTokenVal is a type that was defined in the scanner specification; VarDeclNode, TypeNode, and IdNode are all subclasses of an ASTnode class, all defined in some other file; and IntNode and BoolNode are subclasses of TypeNode (defined in that same file).

terminal                SEMICOLON;
terminal                INT;
terminal                BOOL;
terminal IdTokenVal     ID;

non terminal VarDeclNode      varDecl;
non terminal TypeNode         type;
non terminal IdNode           id;

varDecl     ::= type:t id:i SEMICOLON
            {: RESULT = new VarDeclNode(t, i);
            :}
            ;

type        ::= INT
            {: RESULT = new IntNode();
            :}
            | BOOL
            {: RESULT = new BoolNode();
            :}
            ;

id          ::= ID:i
            {: RESULT = new IdNode(i.idVal);
            :}
            ;

In these rules, lower-case names are used for nonterminals, and upper-case names are used for terminals. The symbol "::=" is used instead of an arrow to separate the left and right-hand sides of the grammar rule. Each grammar rule ends with a semicolon.

The symbols "{:" and ":}" are used to delimit the action associated with the rule. An action can contain arbitrary Java code (including declarations and uses of local variables). If the left-hand-side nonterminal has been declared to have a type, the action must include an assignment to the special variable RESULT; this assignment sets the value of the nonterminal (its translation).

To use the translations of the right-hand-side nonterminals, and the values of the right-hand-side tokens, those symbols are followed with a colon and a name. For example, using type:t makes t the name of the translation of nonterminal type, and using ID:i makes i the name of the value field of the Symbol returned by the scanner for the ID token.
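To make the data flow concrete, here is a minimal sketch of what the supporting classes might look like, and of what the three actions compute for the input int x;. The class bodies, constructor signatures, toString methods, and the ActionDemo class are assumptions made for illustration (the real classes are defined elsewhere); only the class names and the idVal field come from the rules above.

```java
// Hypothetical definitions; the actual AST classes live in another file.
class IdTokenVal {                 // value carried by an ID token
    String idVal;
    IdTokenVal(String idVal) { this.idVal = idVal; }
}

abstract class ASTnode { }

abstract class TypeNode extends ASTnode { }
class IntNode extends TypeNode { public String toString() { return "int"; } }
class BoolNode extends TypeNode { public String toString() { return "bool"; } }

class IdNode extends ASTnode {
    private final String name;
    IdNode(String name) { this.name = name; }
    public String toString() { return name; }
}

class VarDeclNode extends ASTnode {
    private final TypeNode type;
    private final IdNode id;
    VarDeclNode(TypeNode type, IdNode id) { this.type = type; this.id = id; }
    public String toString() { return type + " " + id + ";"; }
}

class ActionDemo {
    // Replays, by hand, the actions the parser would run for "int x;".
    static VarDeclNode build() {
        IdTokenVal i = new IdTokenVal("x");   // value of the ID token, named by ID:i
        TypeNode t = new IntNode();           // type ::= INT
        IdNode id = new IdNode(i.idVal);      // id ::= ID:i
        return new VarDeclNode(t, id);        // varDecl ::= type:t id:i SEMICOLON
    }

    public static void main(String[] args) {
        System.out.println(build());          // prints "int x;"
    }
}
```

Each assignment in build corresponds to one RESULT assignment: the value a rule stores in RESULT is what a later rule sees under the name attached to that nonterminal.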

Precedence Declarations for Grammar Rules

As discussed above, sometimes an operator needs different precedences depending on whether it is being used as a unary or a binary operator. For example, the precedence declarations given above gave MINUS the lowest precedence. This is correct for binary minus, but not for unary minus (which should have the highest precedence). To handle this, a new terminal (e.g., UMINUS) can be declared, and given the highest precedence. Then the grammar rule that uses MINUS as a unary operator can be declared to have the (high) precedence of UMINUS:

exp	::= MINUS exp
	{: RESULT = ...
	:}
	%prec UMINUS
	;
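The effect of giving the unary rule a high precedence can be sketched with a small hand-written precedence-climbing routine in Java. This is not Java Cup's own mechanism, and the class name UminusDemo, the numeric levels, and the space-separated token format are all assumptions made for the illustration. The routine parses the same input twice: once with the unary rule at the level of UMINUS, and once at the level of MINUS (what the rule would get without %prec).

```java
import java.util.Arrays;
import java.util.List;

class UminusDemo {
    // Assumed levels, low to high: binary minus, times, unary minus.
    static final int MINUS = 1, TIMES = 2, UMINUS = 3;

    private final List<String> toks;
    private final int unaryLevel;   // precedence assigned to exp ::= MINUS exp
    private int pos = 0;

    UminusDemo(String input, int unaryLevel) {
        this.toks = Arrays.asList(input.trim().split("\\s+"));
        this.unaryLevel = unaryLevel;
    }

    private static int level(String op) {
        if (op.equals("-")) return MINUS;
        if (op.equals("*")) return TIMES;
        return 0;                   // not a binary operator
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    // Parse an expression whose binary operators all have level >= min.
    String parse(int min) {
        String left;
        if (peek().equals("-")) {   // exp ::= MINUS exp
            pos++;
            // The operand absorbs only operators that bind more tightly
            // than the unary rule itself.
            left = "(-" + parse(unaryLevel + 1) + ")";
        } else {
            left = toks.get(pos++); // an operand
        }
        while (level(peek()) >= min) {
            String op = toks.get(pos++);
            left = "(" + left + " " + op + " " + parse(level(op) + 1) + ")";
        }
        return left;
    }

    static String group(String input, int unaryLevel) {
        return new UminusDemo(input, unaryLevel).parse(1);
    }

    public static void main(String[] args) {
        System.out.println(group("- a * b", UMINUS));  // ((-a) * b)
        System.out.println(group("- a * b", MINUS));   // (-(a * b))
    }
}
```

With the high level, the minus sign applies to a alone; with the low level, it applies to the whole product.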

How to Run Java Cup

To run the parser generator, type:

java java_cup.Main < xxx.cup

where xxx.cup is the name of the parser specification (it can have any name, but using the .cup extension helps to make it clear that it is a Java Cup specification). If the specification is processed without errors, two Java source files, parser.java and sym.java, will be produced.