Project 1a: Text Sorting
You will write a simple sorting program. This program should be invoked as follows:
shell% ./fastsort [ -3] file
The above line means the user typed in the name of the sorting program
If the optional argument is included (
Let's say you have the following file:
this line is first
but this line is second
finally there is this line
If you run your fastsort and give it this file as input, it should print:
but this line is second
finally there is this line
this line is first
because
If, however, you pass in a flag to sort a different key, you'll get a
different output. For example, if you call
this line is first
finally there is this line
but this line is second
because
In your sorting program, you should just use
If you want to figure out how big the input file is before reading it
in, use the
To compare strings, use the
To sort the data, use any sort that you'd like to use. An easy way to go is
to use the library routine
To chop lines into words, you could use
To exit, call
If you don't know how to use these functions, use the man pages. For
Assumptions and Errors
The return code upon success is zero. When the program runs normally and no errors are encountered, it should exit with a return code of 0.
Only space characters (i.e., what you get when you hit spacebar) will be used to separate words in the input. Thus, you don't have to worry about tabs or other whitespace. However, your program should correctly handle the case where there are two or more spaces between words, i.e., it should treat that as one big separator between the words.
Max line length will be 128. If you get a line longer than this (detected by the lack of a newline character in the last position), please print
You should check the arguments of fastsort carefully. If more than two arguments are passed, or two are passed but the second does not fit the format of a dash followed by a number, you should EXACTLY print (to standard error):
Key does not exist on one line of input file: If the specified key does not exist on a particular line of the input file, you should just use the last word of that line as the key. For example, if the user wants to sort on the 4th word (
Empty line: You should use an empty string to sort any empty lines (i.e., lines that are just a newline or spaces and a newline character).
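The word-separation, missing-key, and empty-line rules above can be sketched as a single helper. This is only one possible approach, not a required design: the name extract_key, its parameters, and the choice to copy the key into a caller-supplied buffer are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the key-th word (1-based) of `line` into `out`.
 * Words are separated by runs of one or more spaces. If the line has fewer
 * than `key` words, the last word is used; if it has no words at all,
 * `out` becomes the empty string. Returns the length of the copied key.
 */
size_t extract_key(const char *line, int key, char *out, size_t out_size)
{
    const char *start = NULL;   /* start of the most recent word seen */
    size_t len = 0;             /* length of that word */
    int seen = 0;               /* number of words seen so far */
    const char *p = line;

    while (*p != '\0' && *p != '\n') {
        while (*p == ' ')
            p++;                          /* skip a run of spaces */
        if (*p == '\0' || *p == '\n')
            break;                        /* only trailing spaces remained */
        start = p;
        while (*p != '\0' && *p != '\n' && *p != ' ')
            p++;                          /* walk to the end of the word */
        len = (size_t)(p - start);
        if (++seen == key)
            break;                        /* found the requested word */
    }

    if (out_size == 0)
        return 0;
    if (start == NULL)
        len = 0;                          /* empty line: empty key */
    if (len >= out_size)
        len = out_size - 1;               /* truncate to fit the buffer */
    if (len > 0)
        memcpy(out, start, len);
    out[len] = '\0';
    return len;
}
```

With this sketch, asking for word 2 of "this line is first" yields "line", and asking for word 4 of a two-word line yields that line's last word.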
File length: May be pretty long! However, no need to implement a fancy two-pass sort or anything like that; the data set will fit into memory and you shouldn't have to do anything special to handle this. However, if malloc() does fail, please print
Invalid files: If the user specifies an input file that you cannot open (for whatever reason), the sort should EXACTLY print (to standard error):
Important: On any
error code, you should print the error to the screen using
History and a Contest
This sorting assignment is reminiscent of a yearly competition to make the fastest disk-to-disk sort in the world. See the sort home page for details. If you look closely, you will see that your professor was once -- yes, wait for it -- the fastest sorter in the world.
To continue in this tradition, we will also be holding a sorting competition. Whoever turns in the fastest sorting program on a few different inputs will win a fancy 537 T-shirt. Read more about sorting, including perhaps the NOW-Sort paper, for some hints on how to make a sort run really fast. Or just use your common sense! Hint: you'll have to think a bit about hardware caches.
Restriction: No threads. You cannot implement a multi-threaded sort for this assignment or competition. Just make the fastest single-threaded sort that you can!
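As a single-threaded starting point to time faster versions against, one option is to sort small records with the standard C library's qsort(). This is a minimal sketch under assumptions: the struct record layout and the names compare_records and sort_records are hypothetical, and qsort() and strcmp() are chosen here for illustration rather than prescribed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: the sort key and the full line it came from. */
struct record {
    const char *key;
    const char *line;
};

/* Comparator for qsort(): order records by key using strcmp(). */
static int compare_records(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;
    return strcmp(ra->key, rb->key);
}

/* Sort `n` records in place, on the calling thread. */
void sort_records(struct record *recs, size_t n)
{
    qsort(recs, n, sizeof recs[0], compare_records);
}
```

Once the array is sorted, printing each record's line field in order gives the program's output.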
Start small, and get things working incrementally. For example, first get a program that simply reads in the input file, one line at a time, and prints out what it reads in. Then, slowly add features and test them as you go.
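The first step described above might look like the following sketch. The function name echo_file, the LINE_BUF size, and the use of fopen() and fgets() are assumptions for illustration. No error message is printed here; the caller decides what to report.

```c
#include <stdio.h>

#define LINE_BUF 130  /* assumption: 128 characters + '\n' + '\0' */

/*
 * Hypothetical first step: copy every line of `path` to `out`.
 * Returns the number of successful fgets() reads, or -1 if the
 * file cannot be opened.
 */
long echo_file(const char *path, FILE *out)
{
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;

    char buf[LINE_BUF];
    long count = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        fputs(buf, out);      /* print exactly what was read */
        count++;
    }
    fclose(in);
    return count;
}
```

Once this works, later steps such as key extraction and sorting can be added and tested one at a time.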
Testing is critical. One great programmer I once knew said you have to write 5-10 lines of test code for every line of code you produce; testing your code to make sure it works is crucial. Write tests to see if your code handles all the cases you think it should. Be as comprehensive as you can be. Of course, when grading your projects, we will be. Thus, it is better if you find your bugs first, before we do.
Keep old versions around. Keep copies of older versions of your program
around, as you may introduce bugs and not be able to easily undo them. A
simple way to do this is to keep copies around, by explicitly making copies of
the file at various points during development. For example, let's say you get
a simple version of
Keep your source code in a private directory. An easy way to do this is
to log into your account and first change directories into