Overview

JLex is a scanner generator that produces Java code. To create and run a program using JLex, you run JLex on a specification to produce a Java source file, compile that file together with the rest of the program using javac, and then run the program on the input to be scanned.

The input to JLex is a specification that includes a set of regular expressions and associated actions. The output of JLex is a Java source file that defines a class named Yylex. Yylex includes a constructor that takes one argument: the input stream (an InputStream or a Reader). It also includes a method called next_token, which returns the next token in the input.

These steps assume that a class named P2, containing the core program of interest, has also been defined. That program declares an object of type Yylex and calls the Yylex constructor and its next_token method.
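As a rough sketch of what P2 might look like, the Java code below declares a Yylex object, passes a Reader to its constructor, and calls next_token until the input is exhausted. So that the example compiles on its own, Yylex and the token class Yytoken are hand-written stand-ins (hypothetical, not JLex output) that simply split the input at whitespace; returning null at end of input is also an assumption of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical token class used only by this sketch.
class Yytoken {
    final String text;

    Yytoken(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }
}

// Hand-written stand-in for the JLex-generated Yylex class so the sketch
// compiles on its own. It just splits the input at whitespace.
class Yylex {
    private final Reader in;

    Yylex(Reader in) {
        this.in = in;
    }

    // Returns the next token, or null at end of input (assumed convention).
    Yytoken next_token() throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = in.read();
        }
        if (c == -1) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
            c = in.read();
        }
        return new Yytoken(sb.toString());
    }
}

public class P2 {
    // Declares a Yylex object, calls its constructor, then calls next_token
    // until end of input. Prints each token and returns how many there were.
    static int countTokens(Reader input) {
        Yylex scanner = new Yylex(input);
        int count = 0;
        try {
            for (Yytoken tok = scanner.next_token(); tok != null; tok = scanner.next_token()) {
                System.out.println(tok);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        countTokens(new StringReader("x = 42 ;"));
    }
}
```

In a real program the stand-in classes would be replaced by the file JLex generates, and the Reader would typically wrap the input file (for example, new FileReader(args[0])).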

Format of a JLex Specification

A JLex specification has three parts, separated by double percent signs (%%):

  1. User code: This part of the specification will not be discussed here.
  2. JLex directives: This part includes macro definitions (described below). See the JLex Reference Manual for more information about this part of the specification.
  3. Regular expression rules: These rules specify how to divide the input into tokens. Each rule includes an optional state list, a regular expression, and an associated action.
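To make the three-part layout concrete, the Java sketch below holds a small, hypothetical specification in a string and splits it at each line consisting of just %%. The macro name DIGIT and the token class Yytoken are illustrative assumptions, and the splitter only shows the layout; it is not how JLex itself parses a specification.

```java
import java.util.ArrayList;
import java.util.List;

public class SpecParts {
    // A small hypothetical specification. Yytoken is assumed to be defined
    // elsewhere; the rules are only here to show where rules go.
    static final String EXAMPLE =
        "// user code, copied into the generated file\n" +
        "import java.io.*;\n" +
        "%%\n" +
        "DIGIT=[0-9]\n" +
        "%%\n" +
        "{DIGIT}+ { return new Yytoken(yytext()); }\n" +
        "[ \\t\\n] { }\n";

    // Splits a specification into its parts at each line that is exactly "%%".
    static List<String> split(String spec) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n")) {
            if (line.equals("%%")) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = split(EXAMPLE);
        String[] labels = {"user code", "JLex directives", "regular expression rules"};
        for (int i = 0; i < parts.size() && i < labels.length; i++) {
            System.out.println("--- " + labels[i] + " ---");
            System.out.print(parts.get(i));
        }
    }
}
```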
We will discuss the regular expression rules part first.

Regular Expression Rules

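The state-list part of a rule is discussed below. Ignoring state lists for now, the form of a regular expression rule is:

  regular-expression   { action }

where the action is Java code that is executed when the regular expression is matched.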

When the scanner's next_token method is called, it repeats:

  1. Find the longest sequence of characters in the input (starting with the current character) that matches a regular-expression pattern.
  2. Perform the associated action.

until an action causes the next_token method to return. If several patterns match the same (longest) sequence of characters, the pattern that appears first in the specification is the one considered matched, so the order of the regular expression rules can be important.
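The matching loop above can be sketched in plain Java. This is not JLex's implementation (JLex compiles the rules into a finite automaton); it uses java.util.regex patterns as stand-ins for JLex regular expressions, and the rule names and patterns are hypothetical. For each rule it finds the longest prefix of the remaining input that the pattern matches completely, replaces the current best only with a strictly longer match (so ties go to the earlier rule), and loops until an action returns a token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatch {
    // Hypothetical rules in specification order. "if" matches both IF and ID;
    // IF wins the tie because it is listed first.
    static final String[] NAMES = {"IF", "ID", "INT", "WS"};
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[a-z]+"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[ \t\n]+"),
    };

    private final String input;
    private int pos = 0;   // index of the current character

    LongestMatch(String input) {
        this.input = input;
    }

    // Length of the longest prefix of input[start..] that p matches in full,
    // or 0 if none. Tries the longest candidate first.
    static int longestPrefix(Pattern p, String input, int start) {
        Matcher m = p.matcher(input);
        for (int end = input.length(); end > start; end--) {
            if (m.region(start, end).matches()) {
                return end - start;
            }
        }
        return 0;
    }

    // Repeats: find the longest match, perform its action; stops when an
    // action returns a token. Returns null at end of input.
    String nextToken() {
        while (pos < input.length()) {
            int bestRule = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                int len = longestPrefix(RULES[i], input, pos);
                if (len > bestLen) {   // strictly longer, so ties keep the earlier rule
                    bestRule = i;
                    bestLen = len;
                }
            }
            if (bestRule < 0) {
                throw new IllegalStateException(
                    "unmatched character '" + input.charAt(pos) + "' at index " + pos);
            }
            String text = input.substring(pos, pos + bestLen);
            pos += bestLen;
            // Actions: the WS action does nothing, so the loop continues;
            // every other action returns a token.
            if (!NAMES[bestRule].equals("WS")) {
                return NAMES[bestRule] + "(" + text + ")";
            }
        }
        return null;
    }

    static List<String> scanAll(String input) {
        LongestMatch scanner = new LongestMatch(input);
        List<String> tokens = new ArrayList<>();
        for (String t = scanner.nextToken(); t != null; t = scanner.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scanAll("if iffy 42"));
    }
}
```

With these rules, scanning "if iffy 42" yields IF(if), ID(iffy), INT(42): "if" ties between IF and ID and goes to IF because it is listed first, while "iffy" goes to ID because its match is longer. Trying every prefix length is slow, but it avoids relying on java.util.regex's backtracking matcher, which does not always report the longest match for patterns containing alternation.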

If an input character is not matched by any pattern, the scanner throws an exception. A scanner should not "crash" on bad input, so it is important to make sure that no input character can go unmatched!
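The advice above can be checked mechanically, at least approximately. The sketch below, again using java.util.regex patterns as stand-ins for JLex rules (both rule sets are hypothetical), reports every ASCII character that no rule matches on its own. An empty report is a sufficient condition for the scanner never getting stuck: whatever the current character is, some rule matches at least that one character. A common fix is a final catch-all rule whose pattern is . (any character except newline) and whose action reports the error; here Pattern.UNIX_LINES is used so that Java's . likewise excludes only \n, which is assumed to approximate JLex's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CoverageCheck {
    // Hypothetical rules with no catch-all: characters such as '+' are unmatched.
    static final Pattern[] WITHOUT_CATCH_ALL = {
        Pattern.compile("[a-z]+"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[ \t\n]+"),
    };

    // The same rules plus a final catch-all ".". UNIX_LINES makes "." exclude
    // only '\n', which the whitespace rule already covers.
    static final Pattern[] WITH_CATCH_ALL = {
        Pattern.compile("[a-z]+"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[ \t\n]+"),
        Pattern.compile(".", Pattern.UNIX_LINES),
    };

    // Returns the characters with code below limit that no rule matches by itself.
    static List<Character> uncovered(Pattern[] rules, int limit) {
        List<Character> missing = new ArrayList<>();
        for (int code = 0; code < limit; code++) {
            String s = String.valueOf((char) code);
            boolean matched = false;
            for (Pattern p : rules) {
                if (p.matcher(s).matches()) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                missing.add((char) code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("unmatched without catch-all: " + uncovered(WITHOUT_CATCH_ALL, 128).size());
        System.out.println("unmatched with catch-all: " + uncovered(WITH_CATCH_ALL, 128).size());
    }
}
```

Because the catch-all rule matches only a single character, it should come last, so that any earlier rule matching the same single character wins the tie.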

The regular expressions are similar to the ones discussed in the scanner notes. Here's how they are used to match the input: