Project 2a: The Unix Shell

Objectives

There are three objectives to this assignment:

Familiarize yourself with the Linux programming environment.
Learn how processes are created, destroyed, and managed.
Gain exposure to the necessary functionality in shells.

Overview

In this assignment, you will implement a command line interpreter (aka a shell). The shell should operate in this basic way: when you type in a command (in response to its prompt), the shell creates a child process that executes the command you entered and then prompts for more user input when it has finished.

The shell you implement will be similar to, but simpler than, the one you run every day in Unix. You can find out which shell you are running by typing "echo $SHELL". You may wish to look at the man pages for csh, bash, or whatever shell you use to learn more about its functionality. For this project, you do not need to implement too much functionality.

Program Specifications

Basic Shell

Your shell is basically an interactive loop: it repeatedly prints a prompt "mysh>", parses the input, executes the command specified on that line of input, and waits for the command to finish. Example (the text after each prompt is typed by the user):

prompt> ./mysh
mysh> pwd
/home/tyler/cs537/p1
mysh> echo hello > out
mysh> echo world >> out
mysh> ls
out
...
mysh> cat out
hello
world
mysh> cat out | wc
       2       2      12
mysh> exit
prompt>

You should structure your shell such that it creates a new process for each new command (there are a few exceptions to this, which we will discuss below). Running commands in a new process protects the main shell process from any errors that occur in the new command.

For reading lines of input, you may want to look at fgets(). To open a file and get a handle with type FILE *, look into fopen(). Be sure to check the return code of these routines for errors! (If you see an error, the routine perror() is useful for displaying the problem.) You may find the strtok() routine useful for parsing the command line (i.e., for extracting the arguments within a command, which are separated by spaces or tabs). Some have found strchr() useful as well.
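As a rough illustration of the strtok() approach, the sketch below splits one input line into whitespace-separated tokens. The names split_line and MAX_ARGS are hypothetical (the specification does not define them), and the limit of 64 arguments is an arbitrary assumption.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 64 /* hypothetical limit; the spec only bounds line length */

/* Split line in place on spaces, tabs, and newlines.
 * Stores a pointer to each token in argv, terminates the array with NULL,
 * and returns the number of tokens found. */
int split_line(char *line, char *argv[], int max_args)
{
    int argc = 0;
    char *tok = strtok(line, " \t\n");

    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}
```

A caller would typically fill a buffer with fgets(buf, sizeof buf, stdin) and pass buf to split_line. Because strtok() writes '\0' bytes into the buffer, the tokens point into buf and stay valid only until the next line is read. A return value of 0 means the line was empty or contained only whitespace.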

Your shell must be able to run any program in your path. The execvp() function searches the directories listed in your $PATH environment variable for the binary. For example, if the command is myfoo a b, execvp will look for a binary named myfoo in each directory listed in $PATH (e.g., /usr/bin:/bin). This means that you do not need to add /bin/ or similar paths to your command name before passing it to execvp().

Running Programs

Most commands will instruct the shell to run a program with some specified arguments. For example:

mysh> prog arg1 arg2 ...

In order to execute the programs given as commands to your shell, look into the fork, execvp, and wait/waitpid calls. See the UNIX man pages for these functions, and also read Advanced Programming in the UNIX Environment, Chapter 8 (specifically, 8.1, 8.2, 8.3, 8.6, 8.10). Before starting this project, you should definitely play around with these functions.

You will note that there are a variety of commands in the exec family; for this project, you must use execvp. You should not use the system() call to run a command. Remember that if execvp() is successful, it will not return; if it does return, there was an error (e.g., the command does not exist). The most challenging part is getting the arguments correctly specified. The first argument specifies the program that should be executed; because execvp searches $PATH, the command name as the user typed it is enough, so this is straightforward. The second argument, char *argv[], matches the arguments that the program sees in its function prototype:

int main(int argc, char *argv[]);

Note that this argument is an array of strings, or an array of pointers to characters. For example, if you invoke a program with:

foo 205 535

then argv[0] = "foo", argv[1] = "205" and argv[2] = "535".

Important: the list of arguments must be terminated with a NULL pointer; that is, argv[3] = NULL. We strongly recommend that you carefully check that you are constructing this array correctly!
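The fork, execvp, and waitpid sequence described above might look like the following sketch. The function name run_command is hypothetical, and the child's exit status of 127 after a failed execvp is an assumption borrowed from common shell convention; the specification only requires printing "Error!\n".

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] in a child process and wait for it.
 * argv must be terminated by a NULL pointer.
 * Returns the child's exit status, or -1 if the child could not be
 * created or did not exit normally. */
int run_command(char *argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: execvp searches $PATH and only returns on failure. */
        execvp(argv[0], argv);
        fprintf(stderr, "Error!\n");
        _exit(127); /* assumed status; not named in the spec */
    }

    /* Parent: block until this particular child finishes. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the example above, the array would be built as char *argv[] = {"foo", "205", "535", NULL}. The child calls _exit() rather than exit() after a failed execvp so that it does not flush a copy of the parent's stdio buffers.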

Built-in Commands

There are three special cases where your shell should execute a command directly itself instead of running a separate process.

First, if the user enters "exit" as a command, the shell should terminate (either by returning from main, or with a call to exit(0)).

Second, if the user enters "cd dir", you should change the current directory to "dir" by using the chdir system call. Users can run programs with paths relative to the working directory without specifying an absolute path. For example, instead of typing "/a/b/c/myprog", a user could type two commands: "cd /a/b/c" followed by "myprog". If the user simply types "cd" (no dir specified), change to the user's home directory. The $HOME environment variable stores the desired path; use getenv("HOME") to obtain this.

Third, if the user enters "pwd", print the current working directory. This can be obtained with getcwd().
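One possible way to dispatch the three built-ins is sketched below. The name try_builtin and the 4096-byte path buffer are assumptions made for illustration, not part of the specification.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Handle exit, cd, and pwd inside the shell process itself.
 * Returns 1 if argv named a built-in (whether or not it succeeded),
 * or 0 if the caller should run argv as an external program. */
int try_builtin(char *argv[])
{
    if (argv[0] == NULL)
        return 0;

    if (strcmp(argv[0], "exit") == 0)
        exit(0);

    if (strcmp(argv[0], "cd") == 0) {
        /* With no argument, fall back to the home directory. */
        const char *dir = (argv[1] != NULL) ? argv[1] : getenv("HOME");
        if (dir == NULL || chdir(dir) != 0)
            fprintf(stderr, "Error!\n");
        return 1;
    }

    if (strcmp(argv[0], "pwd") == 0) {
        char cwd[4096]; /* assumed large enough for any tested path */
        if (getcwd(cwd, sizeof cwd) != NULL)
            printf("%s\n", cwd);
        else
            fprintf(stderr, "Error!\n");
        return 1;
    }

    return 0;
}
```

These commands have to run in the shell process: chdir() only changes the working directory of the process that calls it, so a "cd" executed in a forked child would have no effect on the shell.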

Special Features

Your shell should have three special features: overwrite redirection (">"), append redirection (">>"), and pipes ("|").

Instead of writing a program's output to the terminal, a user may want to write the output to a file (redirection) or use the output as the input to another program (piping).

Overwrite redirection: if the user types "program args > outfile", save the output from running the program to outfile, overwriting any file that already exists with that name.

Append redirection: if the user types "program args >> outfile", append the output of the program to the end of outfile, creating it if it does not already exist.

Pipes: if the user types "program1 args1 | program2 args2", use the output from program1 as the input to program2.

These features are relatively easy to implement. After fork (but before exec), the STDIN and STDOUT file descriptors are already set up to refer to user-typed input and output to the terminal respectively. The dup2 system call is useful for setting STDIN and STDOUT to refer to other sources/destinations of data. The pipe system call may be useful for setting up a pair of file descriptors for piping. You might also want to learn about flags for open() like O_TRUNC and O_APPEND. More on these calls during discussion.
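The sketch below shows one way the dup2, pipe, and open calls could fit together. All function names (wait_for, exec_or_die, run_redirected, run_pipe) are hypothetical, the file mode 0644 and the exit status 127 are assumptions, and the code handles only the single-operator cases that the assignment requires.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for one child; return its exit status, or -1 on failure. */
static int wait_for(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Child-side helper: exec argv, and report failure if exec returns. */
static void exec_or_die(char *argv[])
{
    execvp(argv[0], argv);
    fprintf(stderr, "Error!\n");
    _exit(127); /* assumed status; the spec does not name one */
}

/* Run argv with stdout sent to path; append != 0 means ">>", else ">". */
int run_redirected(char *argv[], const char *path, int append)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
        int fd = open(path, flags, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0) {
            fprintf(stderr, "Error!\n");
            _exit(127);
        }
        close(fd); /* STDOUT_FILENO now refers to the file */
        exec_or_die(argv);
    }
    return wait_for(pid);
}

/* Run "left | right"; return the exit status of right, or -1. */
int run_pipe(char *left[], char *right[])
{
    int fds[2]; /* fds[0] is the read end, fds[1] the write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid == 0) {
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fds[0]);
        close(fds[1]);
        exec_or_die(left);
    }
    pid_t rpid = (lpid < 0) ? -1 : fork();
    if (rpid == 0) {
        if (dup2(fds[0], STDIN_FILENO) < 0)
            _exit(127);
        close(fds[0]);
        close(fds[1]);
        exec_or_die(right);
    }

    /* The parent must close both ends, or right never sees end-of-file. */
    close(fds[0]);
    close(fds[1]);
    if (lpid > 0)
        wait_for(lpid);
    return (rpid > 0) ? wait_for(rpid) : -1;
}
```

The descriptor changes happen in the child, after fork and before execvp, so the shell's own STDIN and STDOUT are left untouched and the next prompt still goes to the terminal.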

Assumptions

Commands are no longer than 1024 characters.
We will not test paths containing tildes (in most shells, "~username" is a shorthand for a user's home directory).
You don't need to support pipe chains of more than 2 commands (e.g., you don't need to worry about commands such as "cat file | grep test | wc").
Whitespace may appear before or after commands, or between command arguments.
If any problems occur (e.g., a command doesn't follow proper syntax or a program doesn't exist), print "Error!\n" to standard error and continue running.
You do not need to pipe or redirect built-in commands.
Each command uses at most one of |, >, or >>. For example, you don't have to handle "ls | wc > file".
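Because each command contains at most one operator, locating it can be done with strchr(). The following is a minimal sketch under that assumption; the enum op type and the split_operator name are hypothetical, and syntax errors such as a missing file name are left for the caller to detect.

```c
#include <string.h>

enum op { OP_NONE, OP_PIPE, OP_OVERWRITE, OP_APPEND };

/* Find the single operator in line, if there is one.
 * The operator's first character is overwritten with '\0', so line is
 * left holding only the left-hand side, and *rhs is set to the text that
 * follows the operator (or to NULL when there is no operator). */
enum op split_operator(char *line, char **rhs)
{
    char *p = strchr(line, '|');
    if (p != NULL) {
        *p = '\0';
        *rhs = p + 1;
        return OP_PIPE;
    }

    p = strchr(line, '>');
    if (p == NULL) {
        *rhs = NULL;
        return OP_NONE;
    }

    *p = '\0';
    if (p[1] == '>') { /* ">>" is two characters long */
        *rhs = p + 2;
        return OP_APPEND;
    }
    *rhs = p + 1;
    return OP_OVERWRITE;
}
```

Both halves still carry their surrounding whitespace, so each would then be tokenized separately. If a half turns out to be empty after tokenizing (for example "ls >" with no file name), the command does not follow proper syntax and the shell should print "Error!\n".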

Handin

Submit a Makefile so that we can simply run "make" to compile the mysh binary with the correct flags. If you don't know how to write a makefile, there are plenty of tutorials online. Copy all of your .c source files into the appropriate subdirectory. Make sure that your code runs correctly on the Linux machines in the CSL labs.

Grading

We will release some (80%) of the test cases we will use for grading before the project due date. If your program passes these tests, you will get at least 80% of the grade unless you are not following specifications (for example, you will receive a very low score if you use the system() function instead of fork/exec). Programs clearly written to just pass the test cases (instead of being built for general input) will also receive little or no credit.

The remaining 20% of your grade will be based on tests not released. These will test whether you have implemented all the details in the specifications - so if you miss some corner case, you will likely lose points here.

Hints

Writing your shell in a simple manner is a matter of finding the relevant library routines and calling them properly. To simplify things for you in this assignment, we have suggested a few library routines you may want to use to make your coding easier. (Do not expect advice this detailed for future assignments!) You are free to use these routines if you want or to disregard our suggestions (the one exception is that you must use the execvp call, not system). To find information on these library routines, look at the manual pages (using the Unix command man).

Remember to get the basic functionality of your program working before worrying about all of the error conditions and corner cases. For example, for your shell, first get a single command running (probably first a command with no arguments, such as "ls"). Then try adding more arguments. Next, try working on multiple commands. Make sure that you are correctly handling all of the cases where there is miscellaneous white space around commands or missing commands. Finally, add support for built-in commands, redirection, and pipes.

We strongly recommend that you check the return codes of all system calls from the very beginning of your work. This will often catch errors in how you are invoking these new system calls. And, it's just good programming sense.

Keep versions of your code. More advanced programmers will use a source control system such as CVS. Minimally, when you get a piece of functionality working, make a copy of your .c file (perhaps a subdirectory with a version number, such as v1, v2, etc.). By keeping older, working versions around, you can comfortably work on adding new functionality, safe in the knowledge you can always go back to an older, working version if need be.