Project 1: Warm-up Project

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu.

Due: Friday, 9/12, by 5pm.

Clarifications

9/11: End of Clarifications. Due to popular demand, there will be no more clarifications for this project.

9/10: Errors after FILE_TOO_LONG. If there are more lines after you reach the 1024th valid line, you should print the FILE_TOO_LONG message for the next line, stop reading the file (i.e. ignore the rest of the file), and sort what you have read in so far.

9/9: BAD_KEY and LINE_TOO_LONG. If you encounter a line that has both of these errors, you should print both error messages.

9/8: Blank line. A blank line is invalid. For example, you should print BAD_KEY for the third line of this input:

10 abc
11 abcd

22 abcde

9/8: 1024 valid lines. The original spec says that the input has to be 1024 lines or fewer. The more accurate way to say that is 1024 valid lines. Thus, if you have one or more bad keys in your file, you should not count those lines as valid lines. For example, a file that has 1026 lines (consisting of 1024 valid lines and 2 bad keys) should not generate the FILE_TOO_LONG error message.

9/6: Whitespace. Assume that there can be an arbitrary amount of whitespace before and after the integer key. Here, for simplicity, we assume whitespace is just made up of spaces (not tabs or other whitespace characters). The following are thus all legal:

10 abc
 12 abcd
  9   abcde 
This should produce the output:
  9   abcde 
10 abc
 12 abcd
Note that the whitespace is intact in the output.

9/6: 32-bit integer range. You may assume that the numbers are 32-bit integers, i.e. we will not test integers that do not fit in 32 bits.

9/5: Invalid file. If the user specifies exactly one file (as desired) but it can't be opened (for whatever reason), your program should print:

Error: Cannot open file FILE
where FILE is what the user passed in.
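
A minimal sketch of this check is shown here. The helper name open_input is made up for illustration, and sending this message to stderr follows the general rule for errors given later in this document.

```c
#include <stdio.h>

/* Hypothetical helper: opens path for reading. On failure it prints the
 * required message to stderr and returns NULL. */
FILE *open_input(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        fprintf(stderr, "Error: Cannot open file %s\n", path);
    return fp;
}
```

The caller would pass in argv[1] and stop if the result is NULL.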

9/4: Line length. Let's say the max size of an input line is 8 characters (and not 512). Which of these lines should be in the final output?

10 abc
11 abcd
22 abcde
The first seemingly has 6 characters (10, space, abc), the second has 7, and the third 8, so you might naively think all three should be accepted. However, that count leaves out the newline character (\n) at the end of each input line. With the newline included, the lines have 7, 8, and 9 characters, so only the first two are accepted as they are. For the third line, you should keep only '22 abcd', put a \n after it, skip the rest of the line, and print the LINE_TOO_LONG message.
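
One way to get this behavior out of fgets() is sketched below. The function name read_line and its return codes are hypothetical, and the sketch relies on the assumption (stated under Details) that every input line ends with a newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Reads one line of at most max_len characters
 * (newline included) into buf, which must hold max_len + 1 bytes.
 * Returns 0 at end of file, 1 for a line that fit, and 2 for a line
 * that was too long and has been truncated. */
int read_line(FILE *fp, char *buf, int max_len)
{
    if (fgets(buf, max_len + 1, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;                     /* whole line, newline included */
    if (len < (size_t)max_len)
        return 1;                     /* short final line without a newline */

    /* The buffer filled up before a newline was seen: the line is too long.
     * Keep the first max_len - 1 characters and put a newline after them. */
    buf[max_len - 1] = '\n';

    /* Skip the rest of the overlong line. */
    int c;
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    return 2;
}
```

With max_len set to 8 and the three lines above as input, this returns 1, 1, and 2, and the third call leaves "22 abcd\n" in buf. Printing the LINE_TOO_LONG message is left to the caller.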

9/4: Only a number. Let's say there is a line like this:

10
This line only has a number in it (and a newline character). It is a valid line, and should be included in the final output as such.

9/2: Picture required. You now have to also turn in a picture of yourself along with source code. Why? Because we want to learn your names!

Notes

Before beginning: Read this tutorial. It has some useful tips for programming in the C environment.

This project must be done alone. You can talk to your colleagues about it, but every line of code must be written and understood by you. Of course, you can always ask the TAs and professors for help too.

Overview

The first project is simply a warm-up to get you used to how this whole project thing will go. It also serves to get you into the mindset of a C programmer, something you will become quite familiar with over the next few months. Good luck!

You will write a simple sorting program. This program should be invoked as follows:

shell% ./mysort test.txt

The above line means the user typed in the name of the sorting program ./mysort and gave it a single file test.txt as input.

Proper input files should look like this:

10 should be first
20 should be third
15 should be second

The goal of the sorting program is to read in the data from the specified input file, sort it based on the key value in the first column, and output the sorted result (keys and text) to the screen (standard output, or stdout as it is called). Thus, for the aforementioned example, the output to the screen should be:

10 should be first
15 should be second
20 should be third

You should observe how the output is sorted on the first column of integers, and the accompanying text is kept with the integer key that it was read in with. Sounds easy, right? It should. But there are a few details...

Details

Assumptions

String length: You may assume no line in the input file is longer than 512 bytes. If you encounter a line that is too long, you should print error message LINE_TOO_LONG (as detailed below) and skip the rest of this line.
File length: You may assume the input file has 1024 valid lines of input or fewer (lines with bad keys do not count toward this limit). If you encounter more lines, print the FILE_TOO_LONG error message (as detailed below) and sort and output whatever input you currently have read in.
Bad key: You may encounter a key that is not strictly an integer. If so, you should print the BAD_KEY error message (as detailed below) and simply skip this line, but keep processing the rest of the file.
Keys per line: You can assume that each line of the input file will end with a newline character (“\n”). Thus, two integers that need to be sorted will never be on the same input line.
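
One possible way to check a key is sketched here. The helper name parse_key is made up, and the sketch reads "strictly an integer" to mean that the digits must be followed by a space, a newline, or the end of the string. That is an interpretation, not something stated above. It also does no range checking, and it accepts a leading + sign because strtol() does.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the key in *key if the line
 * starts with optional spaces followed by an integer that is terminated
 * by a space, a newline, or the end of the string. Returns 0 otherwise. */
int parse_key(const char *line, long *key)
{
    const char *p = line;
    char *end;

    while (*p == ' ')           /* skip leading spaces */
        p++;

    long value = strtol(p, &end, 10);
    if (end == p)               /* no digits consumed: blank line or text */
        return 0;
    if (*end != ' ' && *end != '\n' && *end != '\0')
        return 0;               /* trailing junk, e.g. "12abc" or "1.5" */

    *key = value;
    return 1;
}
```

A line for which parse_key returns 0 would get the BAD_KEY message and be skipped.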

Error Messages

All error messages encountered while reading the input file should be of this format:

Error in line XXX: specific error message
where the specific error messages are:
FILE_TOO_LONG: You should print the following message: File too long
LINE_TOO_LONG: You should print the following message: Line too long
BAD_KEY: You should print the following message: Bad key. Valid keys are any integer (including negatives).
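
As a rough sketch, the format above could be produced by a small helper like the one below. The name report_error is hypothetical, and the sketch assumes XXX stands for the number of the offending line in the input file, counting from 1.

```c
#include <stdio.h>

/* Hypothetical helper: prints one input-file error to stderr in the
 * required format and returns the number of characters written. */
int report_error(int line_no, const char *msg)
{
    return fprintf(stderr, "Error in line %d: %s\n", line_no, msg);
}
```

For example, report_error(3, "Bad key") prints: Error in line 3: Bad key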

There are some other possible errors too. For example, if the user doesn't properly specify the input file on the command line (by say, not giving one, or by giving too many files), you should print:

usage: sort <file>
and then exit.

Important: On any error, you should print the message using fprintf(), and send it to stderr (standard error), not stdout (standard output). This is accomplished in your C code as follows:

fprintf(stderr, "usage: ... ");
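
Putting the usage check and the stderr rule together might look like the sketch below. The helper name check_usage is made up; the program would call it with argc from main() and exit if it returns 0.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if exactly one file name was given
 * (argc counts the program name too, so that means argc == 2).
 * Otherwise prints the usage message to stderr and returns 0. */
int check_usage(int argc)
{
    if (argc != 2) {
        fprintf(stderr, "usage: sort <file>\n");
        return 0;
    }
    return 1;
}
```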

Useful Routines

For reading in the input file, the following routines will make your life easy: fopen(), fgets(), fclose().

For parsing each line, you may find strtok() useful, as well as isdigit() for checking whether a particular character is a digit and atoi() or strtol() for converting a string into an integer.

The routine strlen() is good to know for getting the length of a string.

Finally, qsort() is great to use for this assignment. No need to write your own sorting code!

If you don't know how to use these functions, use the man pages. For example, typing man qsort at the command line will give you a lot of information on how to use the library sorting routine.
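
To give a feel for qsort(), here is a small sketch of a comparison function. The struct record layout and the name compare_records are assumptions made for illustration; you are free to store your lines however you like.

```c
#include <stdlib.h>

/* Hypothetical record: the parsed key plus the original line text
 * (512 bytes of line plus a terminating '\0'). */
struct record {
    long key;
    char text[513];
};

/* Comparison function for qsort(): negative if a sorts before b,
 * zero if the keys are equal, positive if a sorts after b. */
int compare_records(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;

    /* Comparing instead of subtracting avoids overflow for keys that
     * are far apart. */
    return (ra->key > rb->key) - (ra->key < rb->key);
}
```

With an array records holding count entries, the call would be qsort(records, count, sizeof records[0], compare_records). Note that qsort() makes no promise about the relative order of lines with equal keys.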

Other Tips

Start small, and get things working incrementally. For example, first get a program that simply reads in the input file, one line at a time, and prints out what it reads in. Then, slowly add features and test them as you go.

Testing is critical. One great programmer I once knew said you have to write 5-10 lines of test code for every line of code you produce; testing your code to make sure it works is crucial. Write tests to see if your code handles all the cases you think it should. Be as comprehensive as you can be. Of course, when grading your projects, we will be. Thus, it is better if you find your bugs first, before we do.

Keep old versions around. Keep copies of older versions of your program around, as you may introduce bugs and not be able to easily undo them. A simple way to do this is to explicitly make copies of the file at various points during development. For example, let's say you get a simple version of mysort.c working (say, one that just reads in the file); type cp mysort.c mysort.v1.c to make a copy into the file mysort.v1.c. More sophisticated developers use version control systems like CVS, but we'll not get into that here (yet).

Keep your source code in a private directory. An easy way to do this is to log into your account, change into private/ by typing cd private/, and then make a directory there (say p1, by typing mkdir p1). You can always check who can read the contents of an AFS directory by using the fs command. For example, typing fs listacl . shows who can access files in your current directory. If you see that system:anyuser can read (r) files, your directory contents are readable by anybody. To fix this, type fs setacl . system:anyuser "" in the directory you wish to make private. The dot "." in both of these examples is just shorthand for the current working directory.

Handing It In

You should turn in THREE files. The first, containing your code, should be called mysort.c. We will compile it in the following way:

shell% gcc -Wall -std=c99 -o mysort mysort.c
so make sure it compiles in such a manner.

You should also include a file called README which includes any notes on your program that you think are important.

Finally, you should include a picture of yourself, titled FIRSTNAME.LASTNAME.jpg (or .gif or whatever). This should be a clear picture of yourself such that we can identify you.

You should copy these three files into your handin directory. If you are in section 1 of CS-537, this will be located at ~cs537-1/handin/login/p1 where login is your login. For example, Remzi's login is remzi, and thus he would copy his beautiful code, README, and picture into ~cs537-1/handin/remzi/p1. If in section 2, copy your stuff into ~cs537-2/handin/login/p1, of course. Copying of these files is accomplished with the cp program, as follows:

shell% cp mysort.c ~cs537-1/handin/remzi/p1/
shell% cp README ~cs537-1/handin/remzi/p1/
shell% cp remzi.arpaci.jpg ~cs537-1/handin/remzi/p1/
or more succinctly:
shell% cp mysort.c README remzi.arpaci.jpg ~cs537-1/handin/remzi/p1/
(the copy utility knows that if the last thing specified is a directory, it should just copy the files into that directory and give them the same names. Read the man page for cp for more details.)