lightswitch

In which toes are dipped into the waters of C and xv6.

This is a two-part introductory project to familiarize yourself with programming in C in a Unix environment (Linux), and also with some basic xv6 kernel programming.

For the first part you'll be writing a simple sorting program; for the second you'll be modifying the xv6 kernel to add a new system call of your own.

Both parts of this project are to be done individually. While discussing things like library functions with classmates is fine, sharing code is not allowed. A good page on academic (mis)conduct and how to maintain appropriate permissions on your code can be found here (originally by Remzi for a previous CS537; lightly edited and repurposed for this semester). Please read it no matter how pure at heart you are.

Due: Tuesday, Jan 31 at 11:59 PM. (Late policy)

Hand-in instructions are at the bottom of this page.

Part 1: Sorting

For this sub-project you are to write a simple program that roughly resembles the standard sort program.

Your program will be called mysort (since the name sort is already taken!). It will read lines of text from either the standard input (stdin) or a named file, sort them, and then print them to standard output (stdout). It also takes two optional command-line flags that modify its behavior. If the -r flag is given, sorting should be done in reverse order. The -n flag takes a numeric argument N and causes only the first N lines of output (after sorting) to be printed. If one more argument is given after the optional flags, it is the name of a file from which your input lines should be read; if no argument follows the flags, input lines should be read from stdin.
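Because nothing can be printed until every line has been seen, one approach is to read all input lines into a growable array first. The sketch below assumes POSIX getline() (available on Linux) and strips each trailing newline so that a final line without one is handled like the others; read_lines() is a hypothetical helper name, not something provided by the skeleton.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read every line of `in` into a malloc'd array of
   malloc'd strings (newlines stripped) and store the count in *count.
   On a read or allocation error it reports to stderr and exits with 1. */
char **read_lines(FILE *in, size_t *count)
{
    char **lines = NULL;
    size_t n = 0, cap = 0;
    char *buf = NULL;
    size_t bufsize = 0;
    ssize_t len;

    while ((len = getline(&buf, &bufsize, in)) != -1) {
        if (len > 0 && buf[len - 1] == '\n')
            buf[len - 1] = '\0';             /* drop the trailing newline */

        if (n == cap) {                      /* grow the array geometrically */
            cap = cap ? cap * 2 : 16;
            char **tmp = realloc(lines, cap * sizeof *lines);
            if (tmp == NULL) {
                perror("realloc");
                exit(1);
            }
            lines = tmp;
        }
        lines[n++] = buf;   /* keep getline()'s buffer as this line */
        buf = NULL;         /* so the next getline() allocates a fresh one */
        bufsize = 0;
    }
    free(buf);              /* getline() may allocate even when it returns -1 */

    if (ferror(in)) {       /* distinguish a read error from plain EOF */
        perror("read");
        exit(1);
    }
    *count = n;
    return lines;
}
```

Handing ownership of each getline() buffer to the array avoids a second copy of every line.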

Your mysort program should not output anything but reordered input lines (i.e. no "Enter input lines here" user prompt if reading from stdin or anything else).

You are not expected to write the core sorting algorithm yourself (though if you enjoy that kind of thing you certainly can -- but if you do please make sure you use an O(n log n) [or better?] algorithm). You are instead encouraged to look into the standard library routines qsort() and strcmp() to do the bulk of that work for you.
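As a rough sketch of how qsort() and strcmp() fit together: qsort() passes its comparison function pointers to the array elements, so when sorting an array of char * each argument is really a char ** that must be dereferenced before calling strcmp(). The helper names, and the convention that a negative limit means "no -n given", are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() gives us pointers to elements; each element is a char *. */
static int cmp_forward(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Reverse order: swap the operands instead of negating the result. */
static int cmp_reverse(const void *a, const void *b)
{
    return cmp_forward(b, a);
}

/* Hypothetical helper: sort n lines in place, optionally reversed (-r). */
void sort_lines(char **lines, size_t n, int reverse)
{
    qsort(lines, n, sizeof *lines, reverse ? cmp_reverse : cmp_forward);
}

/* Hypothetical helper: print at most `limit` lines (limit < 0: print all). */
void print_lines(char **lines, size_t n, long limit)
{
    for (size_t i = 0; i < n && (limit < 0 || i < (size_t)limit); i++)
        printf("%s\n", lines[i]);
}
```

Swapping the comparator's arguments, rather than negating strcmp()'s result, sidesteps the (unlikely) overflow from negating INT_MIN. Note also that strcmp() compares raw bytes, so its order can differ from the system sort command in non-C locales.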

Some examples of how mysort should work:

Read lines from stdin, write sorted lines to stdout:

# mysort

Same as the above, but in reverse order:

# mysort -r

Read lines from the file my_input_file.txt and write the first five lines (after sorting) to stdout:

# mysort -n 5 my_input_file.txt

mysort should follow standard Unix command-line conventions, so it should be able to interpret option flags even when they're all squashed together. The following should thus read and reverse-sort lines from my_input_file.txt and then output the first three:

# mysort -rn3 my_input_file.txt

Flags may appear in any order, but if an -n flag appears, it consumes the entire remainder of its argv element as its argument (or the entire next element, if the n is the last character of its own element). The following would thus be invalid:

# mysort -n3r

(because "3r" is not a valid argument to -n).

All this flexible command-line argument processing might sound like a lot of tedious work -- and it is! Or it would be, if you had to do it all manually. Fortunately for you, there is a standard Unix library function called getopt() that does the hard work for you. You can read about how to use it by running man 3 getopt.

The man page includes a small example program demonstrating how to use it, and you are strongly encouraged to look at it. If you follow a similar structure, getopt() will take care of flag ordering, grouping, and the other concerns needed to achieve the behavior described above. Note how the optarg variable is used to access the argument to a command-line flag, and how the optind variable is used to check for further arguments after the optional flags.

Error checking

Carefully checking your inputs and handling errors appropriately is a very important part of systems programming! Here are some things you should make sure to check:

If any of these conditions arise (though note that there are probably others!), you should print an error message explaining what's wrong to stderr (not stdout!), and then exit. Which brings us to...

Exit statuses

The exit status of a program (i.e. the value you pass to the exit() function or return from your main() function) is an important part of a well-behaved Unix program: it lets other programs know whether or not it succeeded. The standard convention for indicating success is an exit status of zero; any other value indicates that an error occurred. Your mysort program should thus exit with a non-zero status (1 would be a good choice, say) if something goes wrong, and zero if all the inputs look OK and everything proceeds as desired.
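For example, failing to open the named input file is one error worth checking. A minimal sketch, assuming a hypothetical open_input() helper, reports the problem on stderr using strerror() and exits with status 1; main() would then return 0 once everything has succeeded.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return stdin if path is NULL, else the opened file.
   On failure, print a diagnostic to stderr (never stdout) and exit(1). */
FILE *open_input(const char *prog, const char *path)
{
    if (path == NULL)
        return stdin;

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        fprintf(stderr, "%s: cannot open '%s': %s\n", prog, path, strerror(errno));
        exit(1);
    }
    return f;
}
```

A missing file would then produce a message like mysort: cannot open 'nope.txt': No such file or directory, and the shell would see a non-zero exit status. perror() is a shorter alternative that prints a similar message.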

Things you may assume

Hints, tips

The OSTEP lab tutorial has a collection of useful information on using tools like gcc, gdb, and make; reading (or at least skimming) through it is recommended, particularly if you don't already have a fair amount of experience with C.

A skeleton project template can be downloaded here; after downloading it you can unpack it by running tar xzf p1a.tar.gz. You may modify the provided Makefile if you wish, though it should not be necessary.

While you must do more testing of your own, you may use this shell script to run some basic initial testing on your mysort program. Put it in the directory with your mysort program and run bash test-mysort.sh -- if one or more tests fail, look in the script to see what the inputs that triggered the failure are.

Part 2: xv6 Syscall

For this sub-project you'll get your hands dirty working on a (sort of) real OS kernel! xv6 is a simple Unix-like kernel developed for educational purposes at MIT. This project is just for you to get comfortable working with it; we'll be making more intensive use of it for the remaining projects this semester.

In light of that, your task is a relatively small one: you are to modify the xv6 kernel and add a new system call of your own. This new syscall will be called getforkcount(), and its purpose is very simple: it returns the number of (successful!) fork() calls that have been made since the system booted. In C, its declaration would look like this:

int getforkcount(void);

Implementing this system call should be a relatively straightforward affair: look at how existing system calls are implemented (getpid() would be a good, simple one to start with), and create the corresponding declarations and so forth to implement getforkcount(). You will also need to add a counter variable and make a small modification to the fork system call to increment it (in the right place!); getforkcount() can then simply read this variable and return it.
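The moving parts can be illustrated with a small userspace model. This is not xv6 code and will not drop into the kernel as-is; it only mirrors the general shape, in which each syscall has a number (xv6 keeps these in syscall.h), a sys_ handler, and an entry in a dispatch table indexed by that number (in syscall.c), plus the user-level stub and declaration you will also need. All names and numbers here (fork_should_fail, set_fork_failure(), syscall_dispatch()) are hypothetical. The point to notice is placement: the counter is incremented only after every failure path in fork has been passed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical syscall numbers for this model only. */
#define SYS_fork          1
#define SYS_getforkcount  2
#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

static int forkcount;          /* the new counter: zero at "boot" */
static int fork_should_fail;   /* model knob: simulate e.g. no free proc slot */
static int next_pid = 2;

/* Model of fork(): all error returns come before the increment,
   so only successful forks are counted. */
static int sys_fork(void)
{
    if (fork_should_fail)
        return -1;
    int pid = next_pid++;
    forkcount++;   /* real xv6 runs on several CPUs: consider a lock here */
    return pid;
}

/* Model of the new syscall: just report the counter. */
static int sys_getforkcount(void)
{
    return forkcount;
}

/* Dispatch table indexed by syscall number, in the style of xv6. */
static int (*syscalls[])(void) = {
    [SYS_fork]         = sys_fork,
    [SYS_getforkcount] = sys_getforkcount,
};

/* Model of the kernel's dispatcher: validate the number, then call it. */
int syscall_dispatch(int num)
{
    if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num] != NULL)
        return syscalls[num]();
    fprintf(stderr, "unknown sys call %d\n", num);
    return -1;
}

void set_fork_failure(int fail)
{
    fork_should_fail = fail;
}
```

In the model, a failed fork leaves the count unchanged, which is the behavior the "(successful!)" requirement above asks for.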

You should also write a simple xv6 userspace program (also called getforkcount) to demonstrate your new system call. It should simply perform the syscall and then print the result to standard output. If you run it multiple times, you should see it print successively greater numbers, since the shell calls fork() to launch each execution of it. This program should output nothing but the number returned by your syscall followed by a linefeed ('\n') character (i.e. no extra text, labels, punctuation, etc).

This part of the project (the syscall and the user program) shouldn't take much more than about 20 lines of code in total; most of the effort here is in reading xv6 code and getting a handle on how things are organized.

Poking around and exploring other corners of xv6 is encouraged! You'll be doing plenty more work with it, so a little extra familiarity with it early can't hurt. If you're looking for more reading material, a book of sorts on xv6 (written by its authors) can be found here (though note that there may be some small differences between the version it describes and the version we use; it has undergone some updates over the years).

The xv6 source code can be downloaded here; after downloading it you can unpack the archive by running tar xzf xv6.tar.gz.

Handing in your code

Once you've got your mysort program and modified xv6 ready to go, hand in your code by copying the relevant files to your project 1 handin directory, located at ~cs537-2/handin/$USER/p1 (where $USER is your CS username). Your p1 directory has two subdirectories: linux for your mysort code, and xv6 for your xv6 code. Please hand in only source files, not binaries (e.g. compiled .o files and executables), so run make clean before handing in your code.

To hand in your mysort code, cd into the directory where your .c file(s) and Makefile are and run:

$ make clean
$ cp -r . ~cs537-2/handin/$USER/p1/linux

To hand in your xv6 code, cd into your xv6 directory (the top level, not the user/ or kernel/ subdirectories!) and run:

$ make clean
$ cp -r . ~cs537-2/handin/$USER/p1/xv6

If you have been working on your project in another environment (e.g. on your own computer instead of the lab machines), make sure your code compiles and runs cleanly on the lab machines before handing it in! (i.e. make sure a plain make will successfully compile your code.) We will be testing and grading your projects on those systems, so if it doesn't work there you will lose marks.

Finally, write a small README file and copy it into your p1 handin directory (not the linux or xv6 subdirectories). This should be a short plain text file (not a Word .doc, not rich text, etc.; just simple plain text) describing anything we should know about your project, such as things you struggled with or didn't quite finish (or maybe bonus features you implemented?).

$ cp README ~cs537-2/handin/$USER/p1

Your README doesn't have to be long; a few sentences (or fewer?) is fine.