For this assignment you will use JLex to write a scanner for a language called C--, a small subset of the C++ language. Features of C-- that are relevant to this assignment are described below. You will also write a main program and an input file to test your scanner. You will be graded both on the correctness of your scanner and on how thoroughly your input file tests the scanner.
Skeleton files on which you should build are in: ~cs536-1/public/prog2
The files are:
Use the on-line JLex reference manual, and/or the on-line JLex notes for information about writing a JLex specification.
To run JLex you'll need to modify your CLASSPATH environment variable. Taking into account the change needed to use jikes, your .cshrc.local file should include the line:
int bool void true false if
else while return cin cout
""
"&!#"
"use \n to denote a newline character"
"include a quote like this \" and a backslash like this \\"
Examples of things that are not legal string literals:
"unterminated
"also unterminated \"
"backslash followed by space: \ is not allowed"
"bad escaped character: \a AND not terminated
{ } ( ) [ ] , = ;
+ - * / ! && || == !=
< > <= >= << >>
// this is a comment
# and so is this
The scanner should recognize and ignore comments (but there is no
COMMENT token).
Your scanner will return this information by creating a new Symbol object in the action associated with each regular expression that defines a token (the Symbol type is defined in java_cup.runtime). A Symbol includes a field of type int for the token name, and a field of type Object (named value), which will be used for the line and character numbers, as well as the token value (for identifiers and literals). See c.jlex for examples of how to call the Symbol constructor.
In your compiler, the value field of a Symbol will actually be of type TokenVal; that type is defined in c.jlex. Every TokenVal includes a linenum field, and a charnum field. Subtypes of TokenVal with more fields will be used for the values associated with identifier, integer literal, and string literal tokens. One subtype, IntLitTokenVal, is defined in c.jlex. You will need to add definitions for the subtypes to be used for identifier and string literal tokens.
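One possible shape for the two missing subtypes is sketched below. The TokenVal class here is only a stand-in so that the example compiles on its own; the real TokenVal and IntLitTokenVal are the ones in c.jlex and may differ. The names IdTokenVal, StrLitTokenVal, idVal, and strVal are assumptions, not names required by the assignment.

```java
// Demo driver comes first so the file can be run directly.
class TokenValDemo {
    public static void main(String[] args) {
        IdTokenVal id = new IdTokenVal(3, 7, "count");
        StrLitTokenVal str = new StrLitTokenVal(4, 1, "\"hello\"");
        System.out.println(id.linenum + ":" + id.charnum + " " + id.idVal);
        System.out.println(str.linenum + ":" + str.charnum + " " + str.strVal);
    }
}

// Stand-in for the TokenVal defined in c.jlex: just the two fields the text mentions.
class TokenVal {
    int linenum;
    int charnum;

    TokenVal(int linenum, int charnum) {
        this.linenum = linenum;
        this.charnum = charnum;
    }
}

// Hypothetical subtype for identifier tokens: adds the identifier's text.
class IdTokenVal extends TokenVal {
    String idVal;

    IdTokenVal(int linenum, int charnum, String idVal) {
        super(linenum, charnum);
        this.idVal = idVal;
    }
}

// Hypothetical subtype for string literal tokens: adds the literal's text.
class StrLitTokenVal extends TokenVal {
    String strVal;

    StrLitTokenVal(int linenum, int charnum, String strVal) {
        super(linenum, charnum);
        this.strVal = strVal;
    }
}
```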
Line counting is done by the scanner generated by JLex (the variable yyline holds the current line number, counting from 0), but you will have to include code to keep track of the current character number on that line. The code in c.jlex does this for the tokens that it defines, and you should be able to figure out how to do the same thing for the new tokens that you add.
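The bookkeeping amounts to a counter that advances by the length of each matched text and resets at each newline. The sketch below shows that logic in isolation. The class CharCounter and its methods are hypothetical, and starting the count at 1 is an assumption; c.jlex remains the authority on how the skeleton actually does it.

```java
// Hypothetical helper, not part of the skeleton files.
class CharCounter {
    private int num = 1;  // ASSUMPTION: the first character on a line is number 1

    // Returns the position where the matched text starts, then moves past it.
    int advance(String matchedText) {
        int start = num;
        num += matchedText.length();
        return start;
    }

    // Called when a newline is matched: the next token starts a fresh line.
    void newline() {
        num = 1;
    }

    public static void main(String[] args) {
        CharCounter counter = new CharCounter();
        // Pieces of the line "int x = 42;" in the order a scanner would match them.
        String[] pieces = {"int", " ", "x", " ", "=", " ", "42", ";"};
        for (String piece : pieces) {
            int start = counter.advance(piece);
            if (!piece.trim().isEmpty()) {
                System.out.println(start + " " + piece);
            }
        }
    }
}
```

In a JLex action the argument to advance would be the result of yytext(). Whitespace still has to advance the counter even though it produces no token.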
The JLex scanner also provides a method yytext that returns the actual text that matches a regular expression. You will find it useful to use this method in the actions you write in your JLex specification.
Note that, for the integer literal token, you will need to convert a String (the value scanned) to an int (the value to be returned). You should use code like the following:
double d = (new Double(yytext())).doubleValue(); // convert String to double
// INSERT CODE HERE TO CHECK FOR BAD VALUE -- SEE ERRORS AND WARNINGS BELOW
int k = (new Integer(yytext())).intValue(); // convert to int
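The check matters because the Integer conversion throws a NumberFormatException when the digits do not fit in an int, so an oversized literal has to be caught while it is still a double. A minimal sketch of one way to do that follows, written as a standalone method so it can be run outside the scanner. It uses Double.parseDouble and Integer.parseInt in place of the constructor calls. Substituting Integer.MAX_VALUE for an oversized literal is an assumption, as is the placeholder message; the required behavior and exact wording are whatever the errors and warnings section specifies.

```java
// Hypothetical helper, not part of the skeleton files.
class IntLitConvert {
    // Converts the text of an integer literal (digits only) to an int.
    static int toInt(String text) {
        double d = Double.parseDouble(text);  // a double can hold values far beyond int range
        if (d > Integer.MAX_VALUE) {
            // ASSUMPTION: report the problem and fall back to the largest int.
            // In the scanner this would go through the Errors class instead.
            System.err.println("placeholder warning: integer literal too large");
            return Integer.MAX_VALUE;
        }
        return Integer.parseInt(text);  // safe: the value is known to fit
    }

    public static void main(String[] args) {
        System.out.println(toInt("42"));
        System.out.println(toInt("2147483647"));
        System.out.println(toInt("99999999999"));
    }
}
```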
Use the fatal and warn methods of the Errors class
to print error and warning messages.
Be sure to use exactly the wording given above for each message
so that the output of your scanner will match the output that we expect
when we test your code.
The Main Program
In addition to specifying a scanner, you should extend the main
program in P2.java.
The main program expects one command-line argument: the name of the file
to be scanned.
That file is opened for reading;
then the program loops, calling the scanner's next_token method
until the special end-of-file token is returned.
The tokens returned by the scanner should be printed (to System.out),
one per line, preceded by the token's line and character numbers:
<linenum>:<charnum> <token>
where <token> means the token name as defined in sym.java. For ID, STRINGLITERAL, and INTLITERAL tokens, the main program should also print the value returned (on the same line as the token, enclosed in parentheses):
<linenum>:<charnum> <token> (<value>)
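Both output formats can be produced by one small helper. The sketch below is independent of JLex and java_cup; the class TokenLine and its format method are hypothetical, and in P2.java the arguments would come from the Symbol returned by next_token and from its TokenVal.

```java
// Hypothetical helper, not part of the skeleton files.
class TokenLine {
    // Builds one output line. Pass null as value for tokens that carry no value.
    static String format(int linenum, int charnum, String tokenName, String value) {
        String line = linenum + ":" + charnum + " " + tokenName;
        return (value == null) ? line : line + " (" + value + ")";
    }

    public static void main(String[] args) {
        // ID and INTLITERAL are named in the text; ASSIGN is an assumed
        // token name used only to show the format without a value.
        System.out.println(format(1, 1, "ID", "x"));
        System.out.println(format(1, 3, "ASSIGN", null));
        System.out.println(format(1, 5, "INTLITERAL", "42"));
    }
}
```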
You are to write one test file named test.C.
Use comments in the file to explain what aspects of the scanner are
being tested.
To test your scanner on a file that has text on the last line but no
final newline, you can either use emacs to create the file (first make
sure that your .emacs file does not include
(setq require-final-newline t), and do not give the file
any extension),
or you can create the file by writing a Java program that uses
System.out.print, and redirecting the output to a file. An
example file with no final newline is in:
~cs536-1/public/prog2/eof.txt.
You can tell that there is no newline by typing: cat eof.txt
You should see your command-line prompt at the end of the
last line of the output instead of at the beginning of the following
line.
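A minimal version of the Java approach is sketched below. The class name and the file contents are placeholders; the only point is that the output is written with print rather than println, so nothing follows the last character.

```java
// Hypothetical generator for a test file whose last line has no newline.
class NoFinalNewline {
    // The text to write: two lines, the second one unterminated.
    static String contents() {
        return "int x;\n" + "x = 1;";
    }

    public static void main(String[] args) {
        System.out.print(contents());  // print, not println: no trailing newline
    }
}
```

After compiling, running the program with its output redirected to a file (for example, java NoFinalNewline > mytest, where mytest is an arbitrary name) produces a file that can be checked with cat as described above.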
Working in Pairs
Graduate and special students must work alone on this assignment. Undergraduates may work alone or in pairs. Please send mail to hasti@cs.wisc.edu saying whether you want to work with a partner. If you want to work with a particular person, send that person's name (note: both partners must request each other); otherwise, mention that you need to be matched with a partner.
Below is some advice on how to work in pairs.
This assignment involves three main tasks:
I suggest that you proceed as follows:
It is very important to set deadlines and to stick to them. I suggest that you choose one person to be the "project leader" (plan to switch off on future assignments). The project leader should propose a division of tokens, as well as deadlines for completing phases of the program, and should be responsible for keeping the most recent version of the combined code (be sure to keep back-up versions, too, perhaps in another directory or using RCS).
To share your code, you can either use e-mail, or the project leader can create a directory for the combined code (not the directory in which that person develops the code). I suggest that you create a new top-level directory (i.e., at the same level as your public and private directories), named something like cs536-P2. To set the permissions of the directory for the combined code to allow your partner to write into it, change to that directory and type:
fs setacl . <login> write
fs setacl . system:anyuser none
fs listacl
| 9/30/2004 | Program released. |
See the assignments page for information about how to submit your code. The late policy is also found on the assignments page.
Electronically submit all of the files that are needed to create and run your main program as well as your Makefile and your test.C. Do not copy any ".class" files, and do not create any subdirectories in your handin directory.
If you are working with a partner only one of you should hand in files. Include a comment at the top of P2.java with the names of both partners.
General information on program grading criteria can be found on the Grading Criteria for Programs page.