Project 1: Warm-up Project

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu.

Due: Wednesday, 9/16, by 9pm.

Notes

Before beginning: Read this tutorial. It has some useful tips for programming in the C environment.

This project must be done alone. You can talk to your colleagues about it, but every line of code must be written and understood by you. Of course, you can always ask the TAs and professors for help too.

Overview

The first project is simply a warm-up to get you used to how this whole project thing will go. It also serves to get you into the mindset of a C programmer, something you will become quite familiar with over the next few months. Good luck!

You will write a simple sorting program. This program should be invoked as follows:

shell% ./mysort test.txt

The above line means the user typed the name of the sorting program, ./mysort, and gave it a single input: the file to sort, test.txt.

Proper input files should look like this:

10 hello
20 goodbye
17 working on the project?
15 are you

The goal of the sorting program is to read in the data from the specified input file, sort it based on the key value in the first column, and output the sorted result (keys and text) to the screen (standard output, or stdout as it is called). Thus, for the aforementioned example, the output to the screen should be:

10 hello
15 are you
17 working on the project?
20 goodbye

You should observe how the output is sorted on the first column of numbers (10, 15, 17, 20), and the accompanying text is kept with the key that it was read in with. Sounds easy, right? It should. But there are a few details...

Details

Assumptions and Errors

String length: You may assume no line in the input file is longer than 80 bytes. If you encounter a line that is too long, you should print the error message Error: Line too long and exit.
32-bit integer range: You may assume that the keys are 32-bit integers, i.e., we will not test keys that do not fit in 32 bits.
File length: You may not assume anything about the length of the file, i.e., it may be VERY long.
Invalid files: If the user specifies exactly one file (as desired) but it can't be opened (for whatever reason), the sort should print: Error: Cannot open file and then exit.
Too few or many arguments passed to program: If the user runs mysort without any arguments, or passes more than one file name to mysort, you should print Usage: mysort <filename> and exit.
Only a number: Let's say there is a line that only has a number on it and nothing else. It is a valid line.
Lines with funny keys or empty lines: Empty lines, or lines with a funny key (e.g., a string, not an integer in the first column) should all be included in the final output as if they have a key value of 0. Thus, if you have an input file with a valid line with -1 as the key, an empty line, and then a valid line with 1 as the key, you should print out the -1 line, the empty line, and then the 1 line.
Negative numbers as keys: Should work.
Lines with leading space: Should work. The key should be preserved and not transformed into a zero. As always, the exact line should be preserved in the final output (with leading spaces and everything).
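
The argument and file checks above can be sketched as follows. This is only one possible arrangement, not a required structure: the enum and the function name open_input() are made up for illustration, and the caller is assumed to call exit(1) whenever the result is not STARTUP_OK.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch; the names are not from the spec. */
enum startup_status { STARTUP_OK, STARTUP_USAGE, STARTUP_CANNOT_OPEN };

/*
 * Validate the command line and try to open the input file.
 * On success, *fp_out holds the open stream and STARTUP_OK is returned.
 * On failure, the required message is written to stderr and *fp_out is NULL.
 */
enum startup_status open_input(int argc, char *argv[], FILE **fp_out)
{
    *fp_out = NULL;

    /* argv[0] is the program name, so exactly one file means argc == 2. */
    if (argc != 2) {
        fprintf(stderr, "Usage: mysort <filename>\n");
        return STARTUP_USAGE;
    }

    *fp_out = fopen(argv[1], "r");
    if (*fp_out == NULL) {
        fprintf(stderr, "Error: Cannot open file\n");
        return STARTUP_CANNOT_OPEN;
    }
    return STARTUP_OK;
}
```

Returning a status instead of exiting inside the helper keeps the check easy to test on its own; calling exit(1) directly at each error would work just as well.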

Line length: You should accept lines that fit into 80 characters, including end-of-line (\n) but not including the end-of-string delimiter (\0). For example, if the maximum line size was 6 (not 80), and you had this input file:
1
22
333
4444
55555
666666
You should accept 1 ... 55555 but not 666666. Seeing 666666 should cause your program to print an error and exit.
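
One way to detect an over-long line with fgets() is to give it room for one character more than the limit and then check the length of what came back. The sketch below assumes that approach; read_line() and MAX_LINE are illustrative names, and the caller is assumed to print Error: Line too long to stderr and exit when -1 is returned.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 80  /* longest accepted line, counting '\n' but not '\0' */

/*
 * Read one line into buf, which must hold at least max_len + 2 bytes:
 * max_len characters, one extra character so an over-long line can be
 * detected, and the terminating '\0'.
 * Returns 1 on success, 0 at end of file, -1 if the line is too long.
 */
int read_line(FILE *fp, char *buf, int max_len)
{
    if (fgets(buf, max_len + 2, fp) == NULL)
        return 0;
    /* fgets stored up to max_len + 1 characters; more than max_len is too long. */
    if (strlen(buf) > (size_t)max_len)
        return -1;
    return 1;
}
```

For the real limit, the buffer would be declared as char buf[MAX_LINE + 2] and the call would be read_line(fp, buf, MAX_LINE). With a limit of 6, as in the example above, 55555 plus its newline is accepted and 666666 is rejected.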
Valid key: A key of the form 123abc is not valid. Thus, after parsing the line with strtok(), you may have to write a routine that checks that each character of the key is valid, using a function like isdigit(). All invalid keys should be treated as if they were zero. Remember, though, that you can also have negative numbers.
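
A key check along these lines might look like the sketch below. The name parse_key() and the 128-byte scratch buffer are assumptions made for illustration, and so is the choice to accept only a leading minus sign (the spec does not say whether +5 should count as valid).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the sort key for a line, or 0 if the first token is not a valid
 * integer. The line is copied first because strtok() modifies its argument
 * and the original text must be printed unchanged later.
 */
long parse_key(const char *line)
{
    char copy[128];                       /* assumed large enough for an 80-byte line */
    strncpy(copy, line, sizeof(copy) - 1);
    copy[sizeof(copy) - 1] = '\0';

    char *tok = strtok(copy, " \t\n");    /* first whitespace-delimited token */
    if (tok == NULL)
        return 0;                         /* empty or blank line */

    const char *p = tok;
    if (*p == '-')                        /* allow one leading minus sign */
        p++;
    if (*p == '\0')
        return 0;                         /* "-" alone is not a number */
    for (; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return 0;                     /* e.g. "123abc" or "hello" */
    }
    return strtol(tok, NULL, 10);
}
```

Because strtok() skips leading delimiters, a line with leading spaces still yields its real key, and a line holding only a number parses normally.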

Important: On any error, you should print the error message using fprintf(), and send it to stderr (standard error) and not stdout (standard output). This is accomplished in your C code as follows:

fprintf(stderr, "whatever the error message is\n");

Useful Routines

To exit, call exit(1). The number you pass to exit(), in this case 1, is then available to the user to see if the program returned an error (i.e., returned non-zero) or exited cleanly (i.e., returned 0).

For reading in the input file, the following routines will make your life easy: fopen(), fgets(), fclose().
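
As a rough illustration of how these three routines fit together, the sketch below copies a file to an output stream one line at a time. The function name echo_file() and the 128-byte buffer are assumptions for this example only; it does no sorting and no line-length checking.

```c
#include <stdio.h>

/*
 * Copy the file at path to out, one fgets() read at a time.
 * Returns the number of reads performed (one per line when every line
 * fits in the buffer), or -1 if the file cannot be opened.
 */
long echo_file(const char *path, FILE *out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char buf[128];            /* assumed buffer size for this sketch */
    long count = 0;
    while (fgets(buf, sizeof(buf), fp) != NULL) {
        fputs(buf, out);      /* buf keeps its '\n', so no extra newline is added */
        count++;
    }

    fclose(fp);
    return count;
}
```

Calling echo_file("test.txt", stdout) would print the file unchanged, which is a reasonable first milestone before adding parsing and sorting.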

For parsing each line, you may find strtok() useful. You may also find atoi() or strtol() useful for converting a string into an integer.

The routine strlen() is good to know for getting the length of a string.

The routine malloc() is useful for memory allocation. Perhaps for adding elements to a list?
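
If you do go the list route, a node allocation could look something like the sketch below. The struct layout and the names push_line() and free_list() are hypothetical; a growable array would be an equally reasonable choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one input line plus a link to the next node. */
struct node {
    char *line;
    struct node *next;
};

/*
 * Allocate a node holding a private copy of line and put it at the front
 * of the list. Returns the new head, or NULL if malloc() fails (the old
 * list is left untouched in that case).
 */
struct node *push_line(struct node *head, const char *line)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return NULL;

    n->line = malloc(strlen(line) + 1);   /* +1 for the terminating '\0' */
    if (n->line == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->line, line);
    n->next = head;
    return n;
}

/* Release every node and the line copy it owns. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->line);
        free(head);
        head = next;
    }
}
```

Pushing at the front stores the lines in reverse order, which is harmless here because they get sorted afterwards. Since the file may be very long, checking every malloc() result matters.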

Finally, qsort() is great to use for this assignment. No need to write your own sorting code!
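
To give a feel for the qsort() interface, here is a minimal sketch that sorts an array of records by key. The struct record type and both function names are assumptions for illustration; your own data layout may differ.

```c
#include <stdlib.h>

/* Hypothetical record type: the parsed key and the original line text. */
struct record {
    int key;
    const char *line;
};

/* Comparison callback for qsort(): orders records by ascending key. */
int compare_records(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;
    /* Avoid ra->key - rb->key, which can overflow for large 32-bit keys. */
    return (ra->key > rb->key) - (ra->key < rb->key);
}

/* Sort an array of n records in place. */
void sort_records(struct record *recs, size_t n)
{
    qsort(recs, n, sizeof(recs[0]), compare_records);
}
```

Note that qsort() is not guaranteed to be stable, so lines that share a key may come out in any relative order.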

If you don't know how to use these functions, use the man pages. For example, typing man qsort at the command line will give you a lot of information on how to use the library sorting routine.

Other Tips

Start small, and get things working incrementally. For example, first get a program that simply reads in the input file, one line at a time, and prints out what it reads in. Then, slowly add features and test them as you go.

For example, the way I wrote this code was first to write some code that used fopen(), fgets(), and fclose() to read the input file and print it to the screen. Then, I wrote code to store each input line into a linked list and made sure that worked. Then I dumped the contents of the list into a contiguous buffer so that qsort() could be used upon it. Then I used qsort().

Testing is critical. One great programmer I once knew said you have to write 5-10 lines of test code for every line of code you produce; testing your code to make sure it works is crucial. Write tests to see if your code handles all the cases you think it should. Be as comprehensive as you can be. Of course, when grading your projects, we will be. Thus, it is better if you find your bugs first, before we do.

Keep old versions around. Keep copies of older versions of your program around, as you may introduce bugs and not be able to easily undo them. A simple way to do this is to keep copies around, by explicitly making copies of the file at various points during development. For example, let's say you get a simple version of mysort.c working (say, that just reads in the file); type cp mysort.c mysort.v1.c to make a copy into the file mysort.v1.c. More sophisticated developers use version control systems like CVS, but we'll not get into that here (yet).

Keep your source code in a private directory. An easy way to do this is to log into your account, change into private/ by typing cd private/, and then make a directory there (say p1, by typing mkdir p1). You can always check who can read the contents of an AFS directory by using the fs command. For example, typing fs listacl . shows who can access files in your current directory. If you see that system:anyuser can read (r) files, your directory contents are readable by anybody. To fix this, type fs setacl . system:anyuser "" in the directory you wish to make private. The dot "." in both of these examples is just shorthand for the current working directory.

Handing It In

You should turn in TWO files. The first, containing your code, should be called mysort.c. We will compile it in the following way:

shell% gcc -Wall -o mysort mysort.c -O
so make sure it compiles in such a manner.

You should also include a file called README which includes any notes on your program that you think are important.

You should copy these two files into your handin directory, which is located at ~cs537-1/handin/login/p1, where login is your login. For example, Remzi's login is remzi, and thus he would copy his beautiful code and README into ~cs537-1/handin/remzi/p1. Copying of these files is accomplished with the cp program, as follows:

shell% cp mysort.c ~cs537-1/handin/remzi/p1/
shell% cp README ~cs537-1/handin/remzi/p1/
or more succinctly:
shell% cp mysort.c README ~cs537-1/handin/remzi/p1/
(The copy utility knows that if the last thing specified is a directory, it should just copy the files into that directory and give them the same names. Read the man page for cp for more details.)