In which xv6 takes baby steps toward modernity, and a shell is born.

This project aims to give you some familiarity with the unix model of process orchestration.

For the Linux component you'll be writing your very own shell!

For the xv6 component, you'll be adding a new kernel facility: process exit statuses. You'll then modify the xv6 userspace C runtime to make it slightly easier for user programs to return their exit statuses, and modify the xv6 shell to make use of them.

As with our first project, both parts of this project are to be done individually. The same policy of course applies.

Due: Friday, February 17 at 11:59 PM. (Late policy)

Hand-in instructions are at the bottom of this page.

`sqysh`: a simple shell

For this project you will implement a unix shell, which we'll call sqysh (pronounced "squish"). A shell is a special kind of program built primarily for one task: running other programs.

Like any "real" shell, sqysh will support two modes of operation: interactive and non-interactive (a.k.a. batch or script mode). In either mode, it repeatedly reads a command and executes it until no more input is available (i.e. until it reaches EOF on its input). The only difference between the two modes is that interactive mode prints a prompt to the user before reading each command. Your prompt should be the string "sqysh$ " (note the space after the $).

sqysh should enter interactive mode if and only if both of the following conditions are met:

There are no command-line arguments, and
A terminal (or TTY) is attached to standard input (this can be tested with the C library function isatty()).

In interactive mode, commands are read from standard input.

In non-interactive mode, the input source should be the file named by the first command-line argument if there is one (a script, in essence), or stdin if there isn't a command-line argument.
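A minimal sketch of the mode decision, assuming a helper named open_input() (a hypothetical name, not part of the skeleton): interactive mode is chosen only when there are no arguments and isatty() reports a terminal on stdin.

```c
#include <stdio.h>
#include <unistd.h>

/* Choose sqysh's mode and input source (hypothetical helper).
 * Interactive mode requires BOTH no command-line arguments AND a terminal
 * on stdin. Returns the stream to read commands from, or NULL if the script
 * named by argv[1] cannot be opened (fopen() sets errno). */
FILE *open_input(int argc, char **argv, int *interactive)
{
    *interactive = (argc == 1 && isatty(STDIN_FILENO));

    if (argc > 1)
        return fopen(argv[1], "r");   /* batch mode: read the script file */
    return stdin;                     /* interactive, or batch from a pipe/file */
}
```

Your read loop can then print "sqysh$ " (followed by fflush(stdout), since the prompt has no newline) only when interactive is set, and stop when fgets() returns NULL, which is the EOF case the hints below treat as an implicit exit. A 257-byte buffer fits the longest allowed line (256 characters including '\n') plus the terminating NUL that fgets() adds.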

Each command will appear as a single line of input. After reading one, you should trim off any leading or trailing whitespace and then split the line into whitespace-separated "words". The first of these words is the command to execute; remaining words are either arguments to the command or special shell syntax modifying the way in which the command is run (see below).
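One way to do the trimming and splitting at once, sketched under the assumption that tabs, carriage returns, and newlines also count as whitespace: strtok() skips runs of delimiters, so leading and trailing whitespace never produce empty words. split_words() and MAX_WORDS are hypothetical names.

```c
#include <string.h>

/* 255 usable characters hold at most 128 one-letter words, plus a NULL. */
#define MAX_WORDS 129

/* Split line in place into whitespace-separated words. Leading and trailing
 * whitespace (including the trailing '\n') is skipped by strtok() itself.
 * Returns the number of words and sets words[n] = NULL, execvp()-style. */
int split_words(char *line, char *words[], int max_words)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max_words - 1) {
        words[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    words[n] = NULL;
    return n;
}
```

Because words[n] is NULL, the array (once the special-syntax words are cut off) can be passed directly to execvp(). A return value of 0 means an empty or all-whitespace line, which should simply be skipped.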

In most cases you will need to execute the command as an external program. You will need to fork() to create a new process, and then execve() (or more likely execvp()) in the child process to execute the desired command, and wait() (or more likely waitpid()) in the parent to wait for the child to exit.
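A sketch of the fork()/execvp()/waitpid() sequence for a foreground command. The function name run_foreground() is hypothetical, and the exit status 1 used when execvp() fails is an arbitrary non-zero choice, since the project text doesn't specify one.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv (NULL-terminated, argv[0] = command name) in a child process and
 * wait for that specific child. Returns the child's exit status, or -1 if
 * fork()/waitpid() failed or the child didn't exit normally. */
int run_foreground(char *argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        fprintf(stderr, "%s: %s\n", argv[0], strerror(errno));
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached if execvp() failed. */
        fprintf(stderr, "%s: %s\n", argv[0], strerror(errno));
        _exit(1);
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {   /* wait for THIS child only */
        fprintf(stderr, "%s: %s\n", argv[0], strerror(errno));
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The child uses _exit() rather than exit() after a failed execvp() so that it doesn't flush a second copy of any stdio output the parent had buffered before fork().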

There are a few commands, however, that must be implemented as so-called "built-ins": cd, pwd, and exit. Their execution should not involve any of the fork()/exec()/wait() system calls; they must be implemented within the shell itself. (Technically pwd could be implemented as an external command, but cd and exit cannot be.)

You should already be familiar with cd from using "real" shells: it uses the chdir() system call to change the shell's current working directory to the path given as its single (optional) argument. If more than one argument is given, it should call:

fprintf(stderr, "cd: too many arguments\n");

If the chdir() call for cd fails, it should call:

fprintf(stderr, "cd: %s: %s\n", path, strerror(errno));

where path is the argument that was passed to the failed chdir().

If no argument is given to cd, it should call chdir() to switch to the user's home directory, the path of which can be obtained by calling getenv("HOME").
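The cd rules above, sketched as a function that takes the already-split words (builtin_cd() is a hypothetical name). The spec doesn't say what to do if HOME is unset; printing a message and returning is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* The cd built-in. words[0] is "cd"; nwords counts all words.
 * Returns 0 on success, -1 after printing the required error message. */
int builtin_cd(char *words[], int nwords)
{
    if (nwords > 2) {
        fprintf(stderr, "cd: too many arguments\n");
        return -1;
    }

    const char *path = (nwords == 2) ? words[1] : getenv("HOME");
    if (path == NULL) {
        /* Assumed behavior: HOME unset is not covered by the spec. */
        fprintf(stderr, "cd: HOME not set\n");
        return -1;
    }
    if (chdir(path) != 0) {
        fprintf(stderr, "cd: %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```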

The pwd built-in simply prints to stdout the shell's current working directory, which can be obtained by calling getcwd() (or a related function).

The exit built-in simply terminates the sqysh process (i.e. calls the exit() system call with an exit status of zero).
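The remaining built-ins are short. This sketch (hypothetical names again) also shows a check the main loop can use to route cd, pwd, and exit away from fork(); the 4 KiB getcwd() buffer is an assumed size.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if name must run inside the shell (no fork/exec/wait). */
int is_builtin(const char *name)
{
    return strcmp(name, "cd") == 0 || strcmp(name, "pwd") == 0 ||
           strcmp(name, "exit") == 0;
}

/* The pwd built-in: print the shell's own working directory on stdout. */
int builtin_pwd(void)
{
    char buf[4096];   /* assumption: paths fit in 4 KiB */
    if (getcwd(buf, sizeof buf) == NULL) {
        fprintf(stderr, "pwd: %s\n", strerror(errno));
        return -1;
    }
    printf("%s\n", buf);
    return 0;
}

/* The exit built-in (also what EOF on the input should do). */
void builtin_exit(void)
{
    exit(0);
}
```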

In general, if you encounter any errors from system calls or library functions (which will generally set errno to indicate what went wrong) in the process of attempting to execute a command, you should call:

fprintf(stderr, "%s: %s\n", cmdname, strerror(errno));

where cmdname is the first word of the command.

Special syntax

sqysh must also support three common features of unix shell syntax: < (input redirection), > (output redirection), and & (background execution). All of these will only ever appear after any words to be used as arguments to a command (i.e. only words that appear before the first of these special characters should be passed as arguments to the command). Input and output redirections may occur in either order, so both of the following are valid syntax:

$ some_command < inputfile > outputfile
$ some_command > outputfile < inputfile

If > appears, the word following it is the name of a file to use as the standard output for the command to be executed. This file should be opened with the open() system call using the flags O_WRONLY, O_CREAT, and O_TRUNC (see man 2 open). You should then use the dup2() system call to make the resulting file descriptor the standard output (file descriptor number 1) of the child process in which the command is executed.

Input redirection with < is similar, but for standard input. Open the file named in the next word with the O_RDONLY flag and use the resulting file descriptor (again via dup2()) as the child process's standard input (file descriptor 0).
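A sketch of the redirection step that runs in the child between fork() and execvp(). The helper names are hypothetical; note the 0644 mode passed to open() for the O_CREAT case, and that the original descriptor is closed once dup2() has copied it onto 0 or 1.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open path with flags and make it file descriptor `target` (0 or 1). */
static int redirect(const char *path, int flags, int target)
{
    int fd = open(path, flags, 0644);   /* mode only matters with O_CREAT */
    if (fd < 0)
        return -1;
    if (fd != target) {
        if (dup2(fd, target) < 0)
            return -1;
        close(fd);                      /* target now refers to the file */
    }
    return 0;
}

/* Call in the child after fork(); infile/outfile are NULL when absent.
 * On failure, prints "cmdname: error" and returns -1. */
int apply_redirections(const char *cmdname, const char *infile,
                       const char *outfile)
{
    if ((infile != NULL &&
         redirect(infile, O_RDONLY, STDIN_FILENO) < 0) ||
        (outfile != NULL &&
         redirect(outfile, O_WRONLY | O_CREAT | O_TRUNC, STDOUT_FILENO) < 0)) {
        fprintf(stderr, "%s: %s\n", cmdname, strerror(errno));
        return -1;
    }
    return 0;
}
```

If apply_redirections() returns -1, the child should _exit() with a non-zero status instead of calling execvp(), so the command never runs with the wrong stdin or stdout.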

If & is present, it will always be the last word of the command. In that case, you should not call wait() (or anything similar) to wait for the child process to exit before moving on and printing the next command prompt (or reading the next command if in non-interactive mode). Processes started and left running in the background this way must be tracked by your shell, however, so that you can collect their exit statuses and avoid leaving zombie child processes floating around. In interactive mode, you should check on them both immediately before each time you issue a command prompt and immediately after receiving a line of input from the user; in batch mode, do so before reading each line of input. You should use waitpid() with WNOHANG to check whether any of your currently running background processes have exited (see man 2 waitpid). For any background process that has exited, your shell should call:

fprintf(stderr, "[%s (%d) completed with status %d]\n", cmdname, pid, status);

where cmdname is the first word of the command whose process exited, pid is its process ID number, and status is its exit status.
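One possible way to track background jobs, assuming a fixed-size table (MAX_JOBS, struct job, and the "too many background jobs" message are all hypothetical; the project sets no limit). Each pid is polled individually with WNOHANG, so reap_jobs() never blocks and never collects the foreground child.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_JOBS 64              /* hypothetical limit */

struct job {
    pid_t pid;
    char cmdname[257];           /* a word can't exceed the 256-char line */
};

static struct job jobs[MAX_JOBS];
static int njobs;

/* fork() + execvp() as usual, but record the child instead of waiting. */
pid_t launch_background(char *argv[])
{
    if (njobs == MAX_JOBS) {
        fprintf(stderr, "%s: too many background jobs\n", argv[0]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        fprintf(stderr, "%s: %s\n", argv[0], strerror(errno));
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        fprintf(stderr, "%s: %s\n", argv[0], strerror(errno));
        _exit(1);
    }
    jobs[njobs].pid = pid;
    snprintf(jobs[njobs].cmdname, sizeof jobs[njobs].cmdname, "%s", argv[0]);
    njobs++;
    return pid;
}

/* Poll every tracked job without blocking; report and forget the ones
 * that have exited. Returns the number of jobs reported. */
int reap_jobs(void)
{
    int reaped = 0;
    int i = 0;

    while (i < njobs) {
        int wstatus;
        pid_t r = waitpid(jobs[i].pid, &wstatus, WNOHANG);
        if (r == 0) {            /* still running: keep it */
            i++;
            continue;
        }
        if (r == jobs[i].pid) {
            fprintf(stderr, "[%s (%d) completed with status %d]\n",
                    jobs[i].cmdname, (int)r, WEXITSTATUS(wstatus));
            reaped++;
        }
        /* Exited (or waitpid failed): move the last job into slot i. */
        jobs[i] = jobs[--njobs];
    }
    return reaped;
}
```

In a full shell, launch_background() would also apply any < or > redirections in the child before execvp(), just as a foreground command does.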

None of the words involved in these syntax features (&, <, >, or the names of files to be used as stdin or stdout) should be passed as arguments to the program executed in the child process.
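A sketch of stripping the special-syntax words, assuming the words array came from a splitter that NULL-terminates it (as execvp() requires). Everything from the first <, >, or & onward is excluded from argv, while the file names after < and > are remembered. struct command and parse_command() are hypothetical names.

```c
#include <string.h>

/* One parsed command line; all pointers point into the caller's words. */
struct command {
    char **argv;      /* words before the first special word, NULL-terminated */
    int argc;         /* number of entries in argv (0 means "empty command") */
    char *infile;     /* word following "<", or NULL */
    char *outfile;    /* word following ">", or NULL */
    int background;   /* 1 if "&" was present */
};

/* words holds nwords entries and words[nwords] == NULL. The argument list is
 * cut off by writing NULL over the first special word. */
void parse_command(char *words[], int nwords, struct command *cmd)
{
    int first = nwords;   /* index of the first special word */

    cmd->infile = NULL;
    cmd->outfile = NULL;
    cmd->background = 0;

    for (int i = 0; i < nwords; i++) {
        int is_in = strcmp(words[i], "<") == 0;
        int is_out = strcmp(words[i], ">") == 0;
        if (!is_in && !is_out && strcmp(words[i], "&") != 0)
            continue;                   /* ordinary word */
        if (i < first)
            first = i;
        if (is_in && i + 1 < nwords)
            cmd->infile = words[++i];   /* skip over the file name */
        else if (is_out && i + 1 < nwords)
            cmd->outfile = words[++i];
        else if (!is_in && !is_out)
            cmd->background = 1;
    }

    words[first] = NULL;
    cmd->argv = words;
    cmd->argc = first;
}
```

If cmd.argc ends up 0 (e.g. a line consisting only of "&"), the command is empty and should be ignored, per the simplifications below.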

Simplifications, assumptions you may make

No "special syntax" will be used with built-in commands
No mechanism for escaping or quoting whitespace (or anything else) is needed
Input lines will be at most 256 characters (including the trailing '\n')
No input line will include more than one output redirection (>), more than one input redirection (<), or more than one background indicator (&). (Any combination of these three must be supported, however.)
Neither > nor < will ever appear as the last word of a command.
The <, >, and & special syntax characters will always appear as their own entire word.
You do not have to support ~ for home-directory substitution.
A command that is empty (or all-whitespace) after removing any "special syntax" words should be ignored -- don't try to execute anything or print an error, just move on to the next command (after re-issuing the prompt if in interactive mode).

Hints, tips, bugs to watch out for

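When handling >, make sure you remember to pass the mode argument to open(), since O_CREAT is involved -- mode should be 0644 (note the leading zero: it makes the literal octal, which gives rw-r--r-- permissions; without it you get decimal 644, which is something else entirely).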
Make sure you use waitpid() to wait for the right process. If you start a background job A and then a foreground job B, you should not move on to the next command if A completes and exits while B is still running (i.e. don't use plain wait()).
Unless you enjoy making life needlessly difficult for yourself, I recommend using the convenient C library wrapper function execvp() instead of the actual underlying execve() system call. This saves you from having to manually search through the elements of your $PATH environment variable to find the command to execute (execvp() does that for you).
Encountering an error while executing a command (built-in or otherwise) should not cause your shell to exit -- you should instead just abort further processing of the current command and move on to the next one.
Note that the int that waitpid() stores through its wstatus pointer encodes more than just the exit status itself -- you'll need to use the WEXITSTATUS() macro on it to extract the actual status value to print when a background job completes.
You should interpret EOF (end of file) on your input source as an implicit exit command.
You may not use the system() function (or popen() or anything similar).

A skeleton sqysh project can be downloaded here as a starting point.

While you must do more testing of your own, you may use this shell script to run some very basic initial tests on your shell. Put it in the directory with your sqysh executable and run bash test-sqysh.sh -- if the test fails, compare your shell's behavior to bash's on the commands executed by the test script (they should be the same).

Spiffing up xv6

There are three small xv6 sub-tasks for this project:

kernel support for process exit statuses,
shell support for checking exit statuses, and
C runtime support for returning exit statuses from main().

Process exit-status support

Our baseline version of xv6 has exit() and wait() system calls that take no parameters. If we look at the man pages for these system calls on a less primitive unix system, we see that they each take one parameter:

exit() takes an int to be used as the exiting process's exit status.
wait() takes a pointer to an int where the system call stores the exit status of the waited-for child process. (Note that this is simpler than a "real" unix wait(), which provides more than just the exit status.)

This allows a child process (the one calling exit()) to indicate to its parent process (the one calling wait()) a basic notion of success or failure.

You will need to modify the xv6 versions of these system calls so that they support these parameters and correctly pass the exit status of the child process back to its parent.
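The kernel change is mostly bookkeeping: store the status in the exiting process's proc entry, then copy it out in wait(). The sketch below is a userspace model of that data flow, not xv6 code; the struct, table, and function names are simplified stand-ins. In an actual xv6 tree the new field would go in struct proc, exit() and wait() would be updated where they are defined, and the system-call wrappers would fetch the new arguments with argint() and (for wait()'s pointer) something like argptr(), so that the kernel validates the user address before writing through it. Letting a null pointer mean "don't care about the status" is an assumption, not a requirement.

```c
#include <stddef.h>

/* Simplified model of a process table (NOT the real xv6 definitions). */
enum procstate { UNUSED, RUNNING, ZOMBIE };

struct proc {
    int pid;
    enum procstate state;
    struct proc *parent;
    int xstatus;              /* new field: status saved by exit() */
};

#define NPROC 8
static struct proc ptable[NPROC];

/* Model of exit(status): remember the status, then become a zombie. */
void model_exit(struct proc *p, int status)
{
    p->xstatus = status;
    p->state = ZOMBIE;
}

/* Model of wait(int *status): find a zombie child of parent, copy out its
 * status (if status is non-NULL), free the slot, and return the child's pid.
 * Returns -1 if there is no zombie child (real xv6 would sleep instead if
 * children were still running). */
int model_wait(struct proc *parent, int *status)
{
    for (int i = 0; i < NPROC; i++) {
        struct proc *p = &ptable[i];
        if (p->parent == parent && p->state == ZOMBIE) {
            int pid = p->pid;
            if (status != NULL)
                *status = p->xstatus;
            p->state = UNUSED;
            p->parent = NULL;
            return pid;
        }
    }
    return -1;
}
```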

You will also need to update the existing xv6 userspace code to pass these new parameters to these system calls. When you do this, examine the surrounding code at each exit() call to determine whether the argument you add should be zero or non-zero. (This should usually be fairly obvious -- for example if the exit() call is right after a call to printf() to print an error message, you can safely infer that the exit status should be non-zero.)

Because the xv6 usertests program is large and has lots of exit() calls (and isn't very relevant here), you may simply disable it for this project (remove the appropriate line from user/makefile.mk so that it isn't compiled).

Checking exit statuses in the shell

Exit statuses are nice to have and a good thing to check. Why? Because it's good (for both users and other programs) to know whether something you've tried to run has succeeded or failed. (Recall project 1's requirement of proper exit statuses.)

In a "real" shell, the exit statuses of commands run by the shell are available for use in scripts via features like conditionals (if statements) and shell variables ($?, specifically). Unfortunately, xv6's shell supports none of these features.

We would, however, still like xv6's shell to have some way of reporting the success/failure status of commands that it has run. So to that end, you will modify it to simply print a message to stderr (file descriptor 2) whenever one of its child processes exits with a non-zero status. This message should be of the form:

"[pid %d exited with status %d]\n"

with the process ID (pid) and exit status of the child process filled in at the appropriate locations (i.e. you should copy & paste this format string).
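On Linux, the same check can be sketched with POSIX wait(), which needs WIFEXITED()/WEXITSTATUS() to unpack the status. In xv6, with your new wait(int*), the status comes back directly, and since xv6's user-level printf() takes a file descriptor as its first argument, the message would be written with printf(2, ...). wait_and_report() is a hypothetical name.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for any child; if it exited with a non-zero status, report it on
 * stderr in the required format. Returns the exit status, or -1 if there
 * was no child to wait for or it was killed by a signal. */
int wait_and_report(void)
{
    int wstatus;
    pid_t pid = wait(&wstatus);
    if (pid < 0 || !WIFEXITED(wstatus))
        return -1;

    int status = WEXITSTATUS(wstatus);
    if (status != 0)
        fprintf(stderr, "[pid %d exited with status %d]\n", (int)pid, status);
    return status;
}
```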

`return`-from-`main()` support

Some of you may recall encountering a minor stumbling block in P1: a strange-looking error message when a user program simply returned from main() instead of calling exit(). While returning from main() is, in a "normal" C environment, a perfectly legitimate thing to do, xv6's C runtime environment sadly lacks the necessary code to support it. You now get to fix this!

The entry point of an executable is the location at which execution begins when a process calls execve() to execute it. It is determined by the linker, which creates the final executable from the object code (.o files) generated by the compiler. xv6 simply sets main() as the entry point, which is what leads to the quirk described above: since no function called main(), there is nothing for it to return to, so its return instruction ends up jumping to an invalid address.

Look through the xv6 user makefile to find the flag passed to the linker to tell it what symbol (function, essentially) to use as its entry point (you can also find this in the man page for the linker, ld). Modify the makefile to instead use a function called _start (note the leading underscore) as the entry point for user programs. You must then define _start in an appropriate location (so that it gets included with all user programs) and in an appropriate way. Specifically, it must ensure that main() is invoked with the proper argc and argv arguments and that, if main() returns, its return value is used as the exit status of the process.
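The idea can be modeled as an ordinary Linux program: a shim calls main() with argc and argv and passes its return value straight to exit(). The names below (start_shim(), true_main(), false_main(), status_of()) are hypothetical; a normal Linux program can't define its own _start without special linker options, because the C library already provides one. Assuming your xv6 is the x86 version, whose exec() sets up the new stack as a fake return address followed by argc and argv, the real entry point could plausibly be as small as void _start(int argc, char **argv) { exit(main(argc, argv)); }, but check this against your tree's exec() before relying on it.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*main_fn)(int argc, char **argv);

/* Model of an entry point: call main() with argc/argv, then hand its return
 * value to exit(), so returning from main() ends the process with that
 * status instead of jumping to a bogus return address. */
void start_shim(main_fn user_main, int argc, char **argv)
{
    exit(user_main(argc, argv));
}

/* Stand-ins for the true and false programs' main() functions. */
int true_main(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
int false_main(int argc, char **argv) { (void)argc; (void)argv; return 1; }

/* Run user_main through the shim in a child process and return the exit
 * status the parent observes (or -1 on error). */
int status_of(main_fn user_main)
{
    char *argv[] = { "prog", NULL };
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        start_shim(user_main, 1, argv);   /* never returns */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

status_of(false_main) returning 1 mirrors what the xv6 shell should then report when you run false.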

Tying it all together

To exercise all three xv6 components, create two trivial new xv6 user programs: true and false.

The code for true should be:

int main(int argc, char** argv) { return 0; }

The code for false should be:

int main(int argc, char** argv) { return 1; }

(These are actual commands that exist on any real unix system -- and believe it or not, they're actually useful!)

If all three xv6 modifications have been done correctly, upon running these you should see something like the following as your thrilling conclusion:

$ true
$ false
[pid 4 exited with status 1]

The same base xv6 code as used in P1 can be found here.

Handing in your code

Handin instructions are similar to those for the first project:

$ cd $YOUR_SQYSH_CODE_DIRECTORY
$ make clean
$ cp -r . ~cs537-2/handin/$USER/p2/linux

$ cd $YOUR_XV6_DIRECTORY
$ make clean
$ cp -r . ~cs537-2/handin/$USER/p2/xv6

# create a brief README file
$ cp README ~cs537-2/handin/$USER/p2

After doing this please run the handin checker script (bash p2-handin-check.sh) to make sure you've done the handin process correctly.

Project 2