This is a two-part introductory project to familiarize yourself with programming in C in a Unix environment (Linux), and also with some basic xv6 kernel programming.
For the first part you'll be writing a simple sorting program; for the second you'll be modifying the xv6 kernel to add a new system call of your own.
Both parts of this project are to be done individually. While discussion with classmates of things like library functions and such is fine, sharing of code is not allowed. A good page on academic (mis)conduct and how to maintain appropriate permissions on your code can be found here (originally by Remzi for a previous CS537; lightly edited and repurposed for this semester). Please read it no matter how pure at heart you are.
For this sub-project you are to write a simple program that roughly resembles the standard sort program.
Your program will be called mysort (since the name sort is already taken!), and will read lines of text from either the standard input (stdin) or a named file, sort them, and then print them to standard output (stdout). It will also take two optional command-line flags to modify its behavior. If the -r flag is given, sorting should be done in reverse order. The -n flag, which itself takes an additional numeric argument, causes only the first N lines of output (after sorting) to be printed (where N is the argument to the -n flag). If, after the optional flags, one more argument is given, it is the name of a file from which your input lines should be read. If no additional argument is given after the (optional) flags, input lines should be read from stdin.
Your mysort program should not output anything but reordered input lines (i.e. no "Enter input lines here" user prompt if reading from stdin or anything else).
You are not expected to write the core sorting algorithm yourself (though if you enjoy that kind of thing you certainly can -- but if you do please make sure you use an O(n log n) [or better?] algorithm). You are instead encouraged to look into the standard library routines qsort() and strcmp() to do the bulk of that work for you.
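As a rough illustration of how qsort() and strcmp() fit together, here is a minimal sketch. It assumes the lines are held as an array of char * (one pointer per line); the names compare_lines, compare_lines_reverse, and sort_lines are made up for this example and are not part of any provided code.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort() over an array of char * (one pointer per line).
 * qsort() hands the comparator pointers to the array elements, so each
 * argument is really a pointer to a char *. */
static int compare_lines(const void *a, const void *b)
{
    const char *left = *(const char *const *)a;
    const char *right = *(const char *const *)b;
    return strcmp(left, right);
}

/* Same comparison with the operands swapped, giving reverse order. */
static int compare_lines_reverse(const void *a, const void *b)
{
    return compare_lines(b, a);
}

/* Hypothetical helper: sort `count` lines in place. */
void sort_lines(char **lines, size_t count, int reverse)
{
    qsort(lines, count, sizeof lines[0],
          reverse ? compare_lines_reverse : compare_lines);
}
```

The detail that most often goes wrong is the extra level of indirection in the comparator: passing strcmp() directly to qsort() would compare the pointer storage itself rather than the strings the pointers refer to.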
Here are some examples of how mysort should work:
Read lines from stdin, write sorted lines to stdout:
# mysort
Same as the above, but in reverse order:
# mysort -r
Read lines from the file my_input_file.txt and write the first five lines (after sorting) to stdout:
# mysort -n 5 my_input_file.txt
mysort should follow standard Unix command-line conventions, so it should be able to interpret option flags even when they're all squashed together. The following should thus read and reverse-sort lines from my_input_file.txt and then output the first three:
# mysort -rn3 my_input_file.txt
Flags may appear in any order, but if an -n flag appears it consumes the entire remainder of that element of argv as its argument (or the entire next element if it is the final character of one). This would thus be invalid:
# mysort -n3r
(because "3r" is not a valid argument to -n).
All this flexible command-line argument processing might sound like a lot of tedious work -- and it is! Or it would be, if you had to do it all manually. Fortunately for you, there is a standard Unix library function called getopt() that does the hard work for you. You can read about how to use it by running man 3 getopt.
The man page includes a small example program demonstrating how to use it, and you are strongly encouraged to look at it. If you follow a similar structure, getopt() will take care of all the concerns with flag ordering and such necessary to achieve the behavior described above. Note how the optarg variable is used to access the argument to a command-line flag and the optind variable is used to check for further arguments after the optional flags.
Carefully checking your inputs and handling errors appropriately is a very important part of systems programming! Here are some things you should make sure to check:
What if something other than a non-negative integer is passed as the argument to the -n flag? (Note that zero is a valid value for this argument, if a somewhat strange one.)
What if you receive more than one input filename argument?
What if the input file doesn't exist, or otherwise can't be opened?
If any of these conditions arise (though note that there are probably others!), you should print an error message explaining what's wrong to stderr (not stdout!), and then exit. Which brings us to...
The exit status of a program (i.e. the value you pass to the exit() function or return from your main() function) is an important part of a well-behaved Unix program: it lets other programs know whether or not it succeeded. The standard convention for indicating success is an exit status of zero; any other value indicates that an error occurred. Your mysort program should thus exit with a non-zero status (1 would be a good choice, say) if something goes wrong, and zero if all the inputs look OK and everything proceeds as desired.
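Putting the error-message and exit-status conventions together, a failed file open might be handled along the following lines. This is a sketch under assumptions: open_input() and mysort_status() are hypothetical names, and the message wording is only a suggestion. The value returned by mysort_status() is meant to be what main() returns.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: open the input for reading. A NULL path means
 * "read from stdin". On failure, explain why on stderr and return NULL. */
FILE *open_input(const char *path)
{
    FILE *input;

    if (path == NULL)
        return stdin;

    input = fopen(path, "r");
    if (input == NULL) {
        int saved_errno = errno;  /* set by the failed fopen() */
        fprintf(stderr, "mysort: cannot open '%s': %s\n",
                path, strerror(saved_errno));
    }
    return input;
}

/* Hypothetical body of main(): returns the program's exit status. */
int mysort_status(const char *path)
{
    FILE *input = open_input(path);

    if (input == NULL)
        return EXIT_FAILURE;  /* non-zero: something went wrong */

    /* ... read, sort, and print the lines here ... */

    if (input != stdin)
        fclose(input);
    return EXIT_SUCCESS;      /* zero: everything went fine */
}
```

Note that the diagnostic goes to stderr, so it never mixes with the sorted lines on stdout, and that a shell can check the result afterwards with echo $?.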
No input line will be longer than 1024 characters (including the line feed character at the end).
Input data will be plain text only -- in particular, there will be no NUL ('\0') bytes.
The amount of data to sort will fit easily in memory (no external sorting required).
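These three guarantees allow a fairly simple reading loop: a fixed-size buffer is enough for any single line, and all the lines can be kept in memory at once. The following is a minimal sketch of that idea; read_lines() and MAX_LINE are names invented for this example, and the growth strategy (doubling) is just one reasonable choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024  /* longest line, including the trailing '\n' */

/* Hypothetical helper: read every line from `input` into a newly
 * allocated array of newly allocated strings. On success returns 0 and
 * stores the array in *lines_out and its length in *count_out.
 * Returns -1 if memory runs out. */
int read_lines(FILE *input, char ***lines_out, size_t *count_out)
{
    char buffer[MAX_LINE + 1];  /* +1 for the terminating '\0' */
    char **lines = NULL;
    size_t count = 0, capacity = 0;

    while (fgets(buffer, sizeof buffer, input) != NULL) {
        size_t length;

        if (count == capacity) {
            size_t new_capacity = capacity ? capacity * 2 : 64;
            char **grown = realloc(lines, new_capacity * sizeof *grown);
            if (grown == NULL)
                goto fail;
            lines = grown;
            capacity = new_capacity;
        }
        length = strlen(buffer) + 1;  /* include the '\0' */
        lines[count] = malloc(length);
        if (lines[count] == NULL)
            goto fail;
        memcpy(lines[count], buffer, length);
        count++;
    }
    *lines_out = lines;
    *count_out = count;
    return 0;

fail:
    while (count > 0)
        free(lines[--count]);
    free(lines);
    return -1;
}
```

fgets() keeps the trailing newline, so the stored lines can be printed back unchanged with fputs(). One case worth thinking about is a final line that has no newline at the end of the input; this sketch stores it as-is.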
If you want nice, informative, user-friendly error messages (which you should!), look into errno and strerror().
gdb and man are your friends.
valgrind is also a very useful tool for tracking down memory bugs (such as leaks and buffer overruns) in your code. For a particularly powerful debugging tool, you can combine it with gdb, using valgrind to detect and stop on memory errors, and then attaching gdb to investigate what went wrong -- see here for details on how to do that.
Compile with the -g flag so that gcc includes debugging information in your executables. This will make tools like gdb and valgrind much more useful.
Compile with warnings enabled (e.g. the -Wall flag as used in the provided Makefile), and pay attention to them! (i.e. fix your code so the warnings go away.)
Make sure you test that your code handles edge cases and error conditions appropriately.
Once you've made some progress, save a copy so that you can return to that state later if things start to go downhill. You can do this the simple way using the cp command (e.g. cp mysort.c mysort_STEP1.c), or if you know how you can use a full-blown revision control system like git or mercurial.
You are responsible for your own testing (you will not be provided with a full test suite), but a good starting point might involve the shuf command and the file /usr/share/dict/words.
It is not necessary to use bare file descriptors and the Unix read() and write() calls to do your input and output. The C library's FILE * type and the corresponding functions (e.g. fgets(), etc.) are fine for this project.
Be aware that strcmp() performs a case-sensitive string comparison, so if you use it for sorting you will end up with capitalized characters sorted before lower case ones (i.e. 'Z' comes before 'a'). This is both accepted and expected (i.e. you should not perform a case-insensitive sort!).
While emulating the basic form of the example program in the getopt() man page will do almost all of what you need for parsing command-line flags, you should not emulate its use of atoi(), since that function cannot detect invalid inputs, and mysort must not accept anything but a non-negative integer argument to -n. strtol() is an alternate string-to-integer conversion function that allows you to detect invalid inputs; use it instead.
The OSTEP lab tutorial has a collection of useful information on using tools like gcc, gdb, and make; reading (or at least skimming) through it is recommended, particularly if you don't already have a fair amount of experience with C.
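Returning to the hint above about preferring strtol() over atoi(): the checks needed to accept only a non-negative integer might look something like the following. Treat it as a sketch; parse_count() is a hypothetical helper, not something the skeleton provides.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert `text` to a non-negative count.
 * Returns 0 and stores the value in *out on success. Returns -1 if
 * `text` has no digits, has trailing junk (e.g. "3r"), is negative,
 * or does not fit in a long. */
int parse_count(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;  /* so a later ERANGE is known to come from strtol() */
    value = strtol(text, &end, 10);

    if (end == text)       /* nothing was converted at all */
        return -1;
    if (*end != '\0')      /* leftover characters after the number */
        return -1;
    if (errno == ERANGE)   /* value out of range for a long */
        return -1;
    if (value < 0)
        return -1;

    *out = value;
    return 0;
}
```

The end pointer is what atoi() lacks: it shows where conversion stopped, so 3r can be told apart from 3. Be aware that strtol() itself skips leading whitespace and accepts a leading + sign, so whether inputs like +5 should count as valid is a decision left to you.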
A skeleton project template can be downloaded here; after downloading it you can unpack it by running tar xzf p1a.tar.gz. You may modify the provided Makefile if you wish, though it should not be necessary.
While you must do more testing of your own, you may use this shell script to run some basic initial testing on your mysort program. Put it in the directory with your mysort program and run bash test-mysort.sh -- if one or more tests fail, look in the script to see which inputs triggered the failure.
For this sub-project you get to start getting your hands dirty working on a (sort of) real OS kernel! xv6 is a simple Unix-like kernel developed for educational purposes at MIT. This project is just for you to get comfortable working with it; we'll be making more intensive use of it for the remaining projects this semester.
In light of that, your task is a relatively small one: you are to modify the xv6 kernel and add a new system call of your own. This new syscall will be called getforkcount(), and its purpose is very simple: it returns the number of (successful!) fork() calls that have been made since the system booted. In C, its declaration would look like this:
int getforkcount(void);
Implementing this system call should be a relatively straightforward affair: look at how existing system calls are implemented (getpid() would be a good simple one to start with), and create corresponding declarations and so forth to implement getforkcount(). You will also need to make a small modification to the fork system call to increment (in the right place!) a counter variable that you will have to add; getforkcount() can then simply read this variable and return it.
You should also write a simple xv6 userspace program (also called getforkcount) to demonstrate your new system call. It should simply perform the syscall and then print the result to standard output. If you run it multiple times, you should be able to see it print successively greater numbers, as each execution of it implies a fork() call. This program should output nothing but the number returned by your syscall followed by a linefeed ('\n') character (i.e. no extra text, labels, punctuation, etc).
This part of the project (the syscall and the user program) shouldn't take much more than about 20 lines of code in total; most of the effort here is in reading xv6 code and getting a handle on how things are organized.
Poking around and exploring other corners of xv6 is encouraged! You'll be doing plenty more work with it, so a little extra familiarity with it early can't hurt. If you're looking for more reading material, a book of sorts on xv6 (written by its authors) can be found here (though note that there may be some small differences between the version it describes and the version we use; it has undergone some updates over the years).
The xv6 source code can be downloaded here; after downloading it you can unpack the archive by running tar xzf xv6.tar.gz.
Once you've got your mysort program and modified xv6 ready to go, hand in your code by copying the relevant files to your project 1 handin directory, located at ~cs537-2/handin/$USER/p1 (where $USER is your CS username). Your p1 directory has two subdirectories, linux for your mysort code, and xv6 for your xv6 code. Please hand in only source files, not binaries (e.g. compiled .o files and executables), so do a make clean before handing in your code.
To hand in your mysort code, cd into the directory where your .c file(s) and Makefile are and run:
$ make clean
$ cp -r . ~cs537-2/handin/$USER/p1/linux
To hand in your xv6 code, cd into your xv6 directory (the top level, not the user/ or kernel/ subdirectories!) and run:
$ make clean
$ cp -r . ~cs537-2/handin/$USER/p1/xv6
If you have been working on your project in another environment (e.g. on your own computer instead of the lab machines), make sure your code compiles and runs cleanly on the lab machines before handing it in! (i.e. make sure a plain make will successfully compile your code.) We will be testing and grading your projects on those systems, so if it doesn't work there you will lose marks.
Finally, write a small README file and copy it into your p1 handin directory (not the linux or xv6 subdirectories). This should be a short plain text file (not a Word .doc, not rich text, etc.; just simple plain text) describing anything we should know about your project, such as things you struggled with or didn't quite finish (or maybe bonus features you implemented?).
$ cp README ~cs537-2/handin/$USER/p1
Your README doesn't have to be long; a few sentences (or fewer?) is fine.