CS 537: Introduction to Operating Systems
Project 1: The Shell Game
Executive summary: In this project, you will write a simple shell (command line interpreter) and further familiarize yourself with C programming and the UNIX environment.
Assigned: June 26
Due: July 11 at 5:00 PM (note change from syllabus)
You must work in groups (of two or three students) for this project. If you have any questions, e-mail them to the class list. Then everyone will benefit from your question!
Background
A command-line interpreter is the program you interact with in a terminal window; its primarily functionality is to execute and control programs. On UNIX systems, the command-line interpreter is called the shell. There are many shells; as you might expect, different people prefer different ones. (Some examples include sh, csh, bash, tcsh, ksh, and -- my personal favorite -- zsh.) Most shells include not only basic program control functionality but also programming features like variables, looping and branching constructs, and sophisticated multi-stage evaluation facilities.
What you're going to do
For this project, you will implement a simple shell. (Don't worry -- you won't have to implement any of the programming features!) The basic structure of a simple shell is an endless loop:- print a prompt
- read a line of input
- parse the line to determine
- the name and parameters of programs to execute
- any pipes or input/output redirections
- set up any pipes or input/output redirections
- use the fork system call to create a child process:
- the child process calls the exec system call to execute the specified program
- the parent process (your shell) waits for the child process to complete (with the waitpid system call) or continues, if the child process is to be run in the background
- repeat
Since most of you don't have prior experience writing lexers and parsers (programs that convert from human-readable text into machine-readable data structures), I will provide cllex, a routine to convert a command line into tokens, which will prove easier to deal with than raw strings. More details on this routine will follow soon.
Most shells support built-in commands, such as cd, time, and the aforementioned programming language features. (Read the man page for the chdir system call to see why it wouldn't make sense for cd to be an external command.) Your shell, on the other hand, will only need to support exit, which terminates your shell program. (Although you may find it handy to implement a cd built-in!)
Your shell will also support input/output redirection and pipes. Redirections allow us to specify that the input for a program should be read from a file, or that the output of a program should be written to a file. The following example shows using redirection to provide the sort command with input from a file called words, and writing the output of sort to a file called sorted_words.
sort < words > sorted_words
Pipes provide a mechanism to use the output of one program as the input to another program. Pipes encourage modularity; instead of writing one program that does everything, we can write many small programs that each do one thing well and enable them to cooperate by sharing data via pipes. For example, let's say I want to see how many PDF files are in my home directory:
[willb@chaconne ~]$ find ~ -type f | grep '.pdf$' | wc -l 1982
The grep and wc commands will take input from the terminal, from a file, or from another program -- in fact, they don't make any assumptions about where their inputs come from. (Some programs check to see what kinds of outputs they have -- for example, the gzip program won't write compressed data to your terminal unless you tell it to.) This state of affairs is far preferable to the alternative: a world in which we had to write one version of a program to read input from a terminal, one to read from a pipe, and one to read from a file!
How you're going to do it
The bare essentials
First, implement an extremely simple shell in C. You will follow the basic algorithm given above, and should use the cllex routine that I'll provide, as well as the fork, exec, wait, and waitpid system calls.
Your shell should display 'prompt> ' as the prompt string. You will use the fgets library routine to read a line from standard input, the cllex routine to split this line into tokens, and then fork and exec.
If the user enters ^D (you can tell this because fgets will return NULL and feof(stdin) will return true) or exit, your shell should terminate with the exit call.
A command line ending with an ampersand (&) indicates that the command is to run in the background. Recall that this means your shell will not waitpid for the child process. When a background process exits, your shell will receive the SIGCHLD signal; you should install a signal handler in order to process this notification. Here is the code you'll need to add to install a signal handler:
#include <signal.h>
void handle_sigchld(int s)
{
/* execute non-blocking waitpid, loop because we may only receive
* a single signal if multiple processes exit around the same time.
*/
while (waitpid(0, NULL, WNOHANG) > 0);
}
int main() {
...
/* register the handler */
signal(SIGCHLD, handle_sigchld);
...
}
Real shells print information about exited background processes, but yours does not have to. Your shell should still execute a blocking waitpid to wait on a process that is not to be executed in the background. When it does, it should ignore the error return code (indicating the process was already waited on) because it is possible that the signal handler waited on the child first.
Easy, right? Let's move on to pipes and redirection, then! (Ensure that your solution to this part of the project works properly before moving on.)
Interacting with processes and files
Next, you'll implement input/output redirection and pipes.
Commands and redirections
Each command you'll execute is the path to a program, optionally followed by arguments and input/output redirection. Redirections may appear before, after, or in between arguments; the following three commands are all equivalent:
- cmd arg1 arg2 < infile
- cmd < infile arg1 arg2
- cmd arg1 < infile arg2
The command can not redirect input or output without specifying a file. (In the case of input redirection, the file must exist.) You should print the exact error messages as follows, and your code must not crash on the following examples.
prompt> /bin/ls < Missing name for redirect. prompt> /bin/ls > Missing name for redirect. prompt>
prompt> /bin/cat < bogus bogus: No such file or directory prompt>
If the output file does not exist you should create it:
prompt> /bin/ls newfile /bin/ls: newfile: No such file or directory prompt> /bin/ls > newfile prompt> /bin/cat newfile shell.c prompt>
If the output file already exists you should not overwrite it:
prompt> /bin/ls > shell.c shell.c: File exists prompt>
To open (and optionally create) files you should use the open system call; you read its man page as part of the assignment for Project 0, but feel free to do so again.
Input or output redirection should not be specified more than once for the same command:
prompt> /bin/cat < shell.c < shell.c Ambiguous input redirect. prompt>
You will need to use the dup2 system call and the STDOUT_FILENO and STDIN_FILENO constants to implement input and output redirection. For more information on these calls, consult the man pages and the Stevens book.
Pipes
Pipes are the oldest form of UNIX interprocess communication. They are half-duplex, meaning that data only flows in one direction. When using a shell, you will use the vertical bar operator (|, pronounced "pipe") to indicate that two processes are connected by a pipe. The output of a command on the left side of a pipe becomes the input to the command on the right side of a pipe.
When you implement your shell, you will use the pipe system call to create a pipe. pipe takes an array of two integers and puts the file descriptor for reading from the pipe in the first element and the file descriptor for writing to the pipe in the second.
If a process forks a child (or children) after creating a pipe then both processes have copies of the pipe file descriptors. The producing process closes the 1st file descriptor and writes to the 2nd file descriptor. The consuming process closes the 2nd file second file descriptor and reads from the 1st file descriptor.
A command line may consist of several commands linked together by pipes. Note that if a program is on the left side of a pipe, it may not redirect its output to a file (and vice versa); if a program is on the right side of a pipe, it may not redirect its input from a file.
This code illustrates how your shell could set up a pipe between the two processes in the command line ls | grep foo.
Tips
Coming soon!
