Scanner Generator Example


Language description

Consider a language consisting of:

where expr is of the form:

and ID is an identifier following C/C++ rules: it can contain
only letters, digits, and underscores, and it cannot start with a digit.
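As a sketch of the ID rule, the Java code below expresses it as the regular expression [a-zA-Z_][a-zA-Z0-9_]*, which is the usual shape of an identifier rule in a JLex specification. It assumes that "letters" means ASCII letters only. The class IdentifierCheck is hypothetical and is not part of example.jlex.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: tests whether a string is a legal ID under C/C++ rules (ASCII letters assumed). */
class IdentifierCheck {
    // First character: a letter or underscore; remaining characters: letters, digits, or underscores.
    private static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    static boolean isIdentifier(String s) {
        // matches() succeeds only if the entire string matches the pattern
        return ID.matcher(s).matches();
    }
}
```

With this pattern, "_tmp1" and "x" are identifiers, but "1abc" (leading digit) and "a-b" (illegal character) are not.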


Files

Source files:

To avoid dependency issues, download jlexEx.zip, which includes the files listed above as well as everything necessary to run JLex on a CS Department Linux machine.


Modifications and updates

The provided example.jlex includes the framework to allow the following additions to our language:

State declaration in the directives section

The SPECIALINITSTATE state is used to identify when we are reading an integer literal specified in octal or hexadecimal format. In both cases, the integer literal starts with the digit 0 (zero): octal literals look like regular integer literals but start with a 0, e.g., 0123; hexadecimal literals start with "0x" followed by hexadecimal digits.

Additions to regular expression rules

Rules have been added for the INTLIT, OCTALLIT, and HEXLIT tokens; these rules use the state information.

Note that these rules for identifying integer literals (INTLIT, OCTALLIT, and HEXLIT) are not quite correct: as written, 0989 is recognized as an octal literal (when it shouldn't be) and 0x12a8b is not recognized as a hexadecimal literal (when it should be). Moreover, 0 (zero) is not recognized as a legal integer literal. It is left as an exercise to fix the rules so that all integer literals are appropriately recognized.
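One possible set of corrected patterns is sketched below, written as java.util.regex patterns rather than JLex rules, so it only checks whole lexemes against the intended token definitions. The class IntLiteralClassifier and the string token names are hypothetical, and the sketch assumes hexadecimal literals use the lowercase "0x" prefix described above. Expressing the same patterns as JLex rules that use SPECIALINITSTATE remains the exercise.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical classifier: decides which integer-literal token (if any) a whole
 * lexeme should be, using java.util.regex patterns instead of JLex rules.
 */
class IntLiteralClassifier {
    // Decimal: the single digit 0, or a nonzero digit followed by any digits.
    private static final Pattern INTLIT = Pattern.compile("0|[1-9][0-9]*");
    // Octal: a leading 0 followed by one or more octal digits (0-7 only).
    private static final Pattern OCTALLIT = Pattern.compile("0[0-7]+");
    // Hexadecimal: "0x" followed by one or more hex digits in either case.
    private static final Pattern HEXLIT = Pattern.compile("0x[0-9a-fA-F]+");

    /** Returns "INTLIT", "OCTALLIT", or "HEXLIT", or null if s is none of them. */
    static String classify(String s) {
        if (INTLIT.matcher(s).matches()) return "INTLIT";
        if (OCTALLIT.matcher(s).matches()) return "OCTALLIT";
        if (HEXLIT.matcher(s).matches()) return "HEXLIT";
        return null;
    }
}
```

With these patterns, classify("0989") returns null, classify("0x12a8b") returns "HEXLIT", and classify("0") returns "INTLIT". The three patterns are disjoint: a lone 0 is a decimal INTLIT because OCTALLIT requires at least one digit after the leading 0. How a real scanner should react to input such as 0989 (report an error or split it into several tokens) is a separate design decision.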

Note also that a scanner must ultimately address another issue: obtaining the integer value from the sequence of characters it has recognized as an integer literal.
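A minimal Java sketch of that conversion follows. It assumes the lexeme has already been recognized as a legal INTLIT, OCTALLIT, or HEXLIT and that values are stored as Java ints. The class IntLiteralValue is hypothetical. It relies on the standard Integer.parseInt(String, int), which parses a string of digits in the given radix.

```java
/**
 * Hypothetical helper: converts the lexeme of an INTLIT, OCTALLIT, or HEXLIT
 * token into its int value.
 */
class IntLiteralValue {
    /**
     * Assumes s is already a legal literal. Throws NumberFormatException
     * if the value does not fit in an int.
     */
    static int valueOf(String s) {
        if (s.startsWith("0x")) {
            return Integer.parseInt(s.substring(2), 16); // hex digits after "0x"
        } else if (s.length() > 1 && s.charAt(0) == '0') {
            return Integer.parseInt(s.substring(1), 8);  // octal digits after the leading 0
        } else {
            return Integer.parseInt(s, 10);              // plain decimal, including "0"
        }
    }
}
```

For example, valueOf("0123") is 83 and valueOf("0x12a8b") is 76427. A value too large for an int makes Integer.parseInt throw NumberFormatException, which a real scanner would probably turn into an error message rather than let propagate. The standard Integer.decode method accepts the same three forms (plus extras such as a 0X or # prefix and a leading sign), so it is an alternative to the explicit branches.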


Using JLex

To compile and run on a Linux machine (after downloading and unzipping jlexEx.zip):

make
make test