Project 1: The Shell

In theory, there's no difference between theory and practice,
but in practice, there is. -Anonymous

Due: Part 1 is due Monday January 31st, 5PM. Part 2 is due Sunday February 13th, at Midnight.

Etc: This project is to be done by yourself in C (not C++). Here are some clarifications for part 2.

Goals

Understand the creation, and lifestyles of processes
Develop your C programming skills
Exposure to the UNIX (POSIX) API
Design and implement a simple shell

Background

You will implement a command line interpreter, the program people interact with to execute and control programs. On UNIX systems this is a user-level program that is referred to as the shell. Typically, different flavors of shells are available (e.g. sh, csh, tcsh, bash), you can see what shell you are currently running by typing echo $SHELL.

To learn more about a shell or any programs/functions listed in this assignment use the man pages, e.g. type man tcsh. Type man man to learn about man (by using man). Man pages are organized by sections, section 1 contains programs (e.g. find), section 2 contains system calls (e.g. open), section 3 contains subroutine libraries (e.g. strtok), and so forth. It helps to specify the section number (e.g. man 2 open) when you use man because some programs/functions may appear in multiple sections. You may see a program or function listed on the web or in papers as foo(3), this means information about foo appears in manual section 3. Man pages often contain platform specific information so make sure to use the man pages on the system you are using. Back to shells...

A shell is generally structured as follows:

print a prompt
read a line from the user
parse the line to determine the name and parameters of program(s) to execute
use the fork system call to create a new child process
- the child process calls the exec system call to execute the specified program
- the parent process (the shell) either continues if the child is to be executed in the background, or calls the wait system call to wait for it to terminate.
goto step 1

Most commands people type are the name of other UNIX programs (e.g. ls, date). Shells also recognize internal commands (e.g. cd, exit) that are directly implemented within the shell process (shells do not fork a process to handle built-in commands).

emperor7(1)% which date
/s/std/bin/date
emperor7(2)% which cd
cd: shell built-in command.

Most shells include an extensive set of internal commands, that typically include a simple programming language (you've probably heard of shell scripting).

emperor7(3)% if (-e /bin/ls) echo "phew... glad we have ls"
phew... glad we have ls
emperor7(4)%

When executing a program in a shell, both the program's input (standard in, or stdin), output (standard out, or stdout) and error output (standard error, or stderr) are set to be read from/written to the terminal, by default. It is often useful to specify that a program's input should instead come from a file, or that its output should be redirected to a file. Input/output redirection lets us do just that. The following example sorts a list of words in file words and writes the output to a new file sorted_words.

emperor7(5)% sort < words > sorted_words
emperor7(6)%

The output of a program executed in a shell can be directed to another program executed in the shell using pipes (the | character denotes a pipe in the shell). Pipes allow the standard ouput of one process to act as the standard input of another. For example, to find the number of files with extension '.c' in my src directory I would type:

emperor7(7)% find ~/src | grep '.c$' | wc -l
      2
emperor7(8)%

Pipes and input/output redirection reflect the UNIX philosophy, programs should be written to do one thing well, and to work together with other programs. For example, rather than write a program that searches for files based on their contents we combine existing programs: find, xargs and grep.

emperor7(9)% find ~/src -type f | xargs grep exec
/u/e/l/eli/src/test.c:        if (-1 == execv(command, args))
emperor7(10)%

Note that the programs themselves do not assume where they get their input/output from; grep may get its input from a file or from the output of another command (the source of grep's input and its functionality should be orthogonal issues). It would be bad if we had to code a version of grep for each of these cases. To further investigate how the above examples work, use the man pages (e.g. man find).

The Assignment

Part 1

You are to implement a simple shell in C. Follow the basic algorithm above, use the fork, execv, wait and waitpid system calls. Child processes should exit with _exit rather than exit. Here is some code which illustrates fork and a Makefile (just some code to get you started, not a template for the project). You can assume the full path of the program will be supplied, e.g. '/bin/ls' rather than just 'ls'. A command that ends with an ampersand (e.g. '/bin/ls &') should be executed in the background, otherwise the shell should wait for the command to complete (an ampersand that does not appear at the end should be considered a parameter). To make your life easier you can assume that a line will be no longer than 1024 characters and that no program will take more than 25 arguments. If you allocate resources when you create a command that executes in the background you should ensure they are deallocated when the command completes.

Use 'prompt> ' as your prompt string. E.g. your shell output should match the following exactly:

prompt> /s/std/bin/cat /dev/null
prompt>

Add a built-in 'exit' command. Your shell should exit when the user enters the exit command, or the end of the input stream is reached (e.g. the user enters Ctrl-D). Your shell should wait for any background processes to exit before it exits if the 'exit' command is used.

You will find the fgets and strtok C library calls handy for obtaining and parsing commands (look at their man pages).

It is a good idea to always check the return values of system and library calls. Use the perror function when you've detected that an erroneous value has been returned. For example, the following illustrates correct output if user elsa tries to execute a non-executable file foo in her directory:

prompt> /u/e/l/elsa/537/p1/foo
/u/e/l/elsa/537/p1/foo: Permission denied
prompt>

Your shell should be robust -- consider arbitrary input. Your shell should handle errors gracefully (not segfault).

Check out the programming resources section on the class homepage. Once you get part 1 working proceed to part 2.

Part 2

Add input/output redirection. The command prog < infile > outfile should cause prog to read its input from file infile and redirect its output to file outfile. Put differently, infile becomes prog's stdin, outfile becomes prog's stdout. Input/output redirection do not have to be used together, e.g. prog < infile is valid; standard out should go to the terminal as usual. Do not worry about redirecting standard error or binary files.

The sort command (if not given a file name) sorts lines it reads from stdin and prints the lines in sorted order to stdout. The following interaction illustrates the correct shell output:

prompt> /bin/cat /u/r/o/romi/letters
c
b
a
prompt> /bin/sort < /u/r/o/romi/letters > /u/r/o/romi/sorted_letters
prompt> /bin/cat /u/r/o/romi/sorted_letters
a
b
c
prompt>

Your output should match exactly. You may find using output and input redirection in your 'real' shell useful for testing the shell you are writing. In general, if you find yourself doing (typing) something over and over again, automate it.

Implement pipes. As mentioned previously, a pipe makes the standard output of one process become the standard input of another. For example, if your current directory contains three files with a '.c' extension named cat.c, dog.c, and mouse.c then your shell should behave as follows:

prompt> /s/std/bin/find | /s/std/bin/grep '.c$' | /s/std/bin/sort
./cat.c
./dog.c
./mouse.c
prompt>

For part 2 use pipe, dup2, open, close, and the STDOUT_FILENO and STDIN_FILENO constants. As always, consult the man pages. Again, your shell should be robust -- consider arbitrary input. Your shell should handle errors gracefully (not segfault).

If you are having dificulty or want to discuss your design, use the TA office hours. You may also setup a meeting with me via email.

Handin

For part 1 copy your source to directory ~cs537-2/handin/NAME/p1-1 where NAME is your login name, e.g. if your login is zissou then your handin directory would be ~cs537-2/handin/zissou/p1-1. For part 2 use directory ~cs537-2/handin/NAME/p1-2.

Copy all of your .c files (you probably just need one, shell.c) and your Makefile. Do NOT copy your binary or .o files. The command 'make shell' (executed in the same directory as your Makefile) should compile your source, resulting in a binary named 'shell', otherwise you will lose points. After the deadlines you will lose permissions to write to this directory. Remember that late projects are not accepted. Plan accordingly.

Most of your grade will be determined by how well your implementation works. We will run your code on a suite of test cases, some will test correct execution, other error cases. Exercise your program's capabilities thoroughly (read try to break it). Develop on whichever system you like, though make sure your shell works on an emperor machine as that is where it will be tested.