Project 2: The Unix Shell

Important Dates

Questions about the project? Send them to
Regrade Deadline: Monday, October 13th, by 9 pm.

Clarifications

9/30: Send questions to our Gmail accounts. Since the IMAP server is still down, if you have any questions regarding the project, you should write an email to all four of these addresses: 537-help @cs.wisc.edu, and [gunawi, jingl1345, vandhana.prakash] @gmail.com.

9/28: End of Clarifications. There will be no more clarifications for this project.

9/28: Sample test files ready! The test files are ready. They
are located in

9/27: Too-long command line (part 2). When you encounter a line that is too long, print the whole line, print the error message, then throw the line away (i.e. do not execute any command in this line), and continue to the next command line. If the too-long command line consists only of whitespace, you still need to print the whole line.

9/27: Whitespace and batch files. In the batch file, if there is a blank line or a valid line containing only whitespace (i.e. at most 512 whitespace characters), you should not print that line to stdout. In other words, you only print a command line if it contains one or more non-whitespace characters, or if it is a too-long run of whitespace (see the 9/27 clarification above).

9/26: README. Please also submit a README file (i.e. filename: "README"). Your README file should describe what functionality is not working. If you think your code is perfect, simply write "All good". Your handin directory should contain only these files: Makefile, README, and all *.c and *.h (if any) files. We will run a script that wipes out all unrelated files.

9/26: No printf! When you want to print a command line or error message, your shell code should never use printf. Instead, use write(STDOUT_FILENO, ...). For example:

    write(STDOUT_FILENO, cmdline, strlen(cmdline));

    char error_message[30] = "An error has occurred\n";
    write(STDOUT_FILENO, error_message, strlen(error_message));

The reason is quite complicated. In short, you will test your shell in an automated way (see the next clarification below), and because printf buffers its output in the C library, using it in your shell code will make the output nondeterministic. If you want to know more, email us.

9/26: How to do automated testing. We will provide test files soon (by Sunday), so make sure your program is ready! For each test file (e.g. sample1.txt), we will provide the expected stdout and stderr files (e.g. stdout1.txt and stderr1.txt). Here is one way to perform automated testing.
First, copy these files into your working directory. Then simply run:

    % bash
    % ./myshell sample1.txt 1> my_stdout1.txt 2> my_stderr1.txt
    % diff my_stdout1.txt stdout1.txt
    % diff my_stderr1.txt stderr1.txt

The first command (bash) starts a new bash shell. Bash is used so that you can do the automatic redirection

9/26: Shell-related vs. program-related errors. Note that there is a difference between errors that your shell catches and those that the program catches. Your shell should catch all the syntax errors specified on this project page. If the syntax of the command looks correct, you simply run the specified program. If there are any program-related errors (e.g. invalid arguments), let the program print its specific error messages wherever it wants (e.g. stdout or stderr). For example, if you run

9/25: Redirection, non-existing program, and error message.
When you run a non-existing program with redirection, e.g.

More details: When you create a child process with redirection, you need to redirect stdout to your output fd before you call execvp. Then, you run execvp(noprogram), and it will return an error because noprogram cannot be found. Then, your child process will call

9/24: Tabs. A tab is considered whitespace.

9/24: exit, cd, and pwd formats. The formats for exit, cd, and pwd are:

    [optionalSpace]exit[optionalSpace]
    [optionalSpace]pwd[optionalSpace]
    [optionalSpace]cd[optionalSpace]
    [optionalSpace]cd[oneOrMoreSpace]dir[optionalSpace]

Any other format should not be accepted, i.e. do not run the command, print the error message, and continue processing the next command. Note that the UNIX shell allows you to redirect pwd output to a file (e.g.

9/24: Too-long command line. A command line is too long if it consists of more than 512 characters of any kind, excluding the newline character (Hint: you must therefore create an array of 514 characters to hold the newline and the null terminator). If you type more than 512 whitespace characters, the line is considered an invalid command line. Whenever you encounter a too-long command line, you should print the one and only error message, throw away that line, and continue processing the next command line. Hint: if a command line is too long, you need a special routine that consumes the rest of the characters in the invalid command line and throws them away. For example, if a command line is 520 characters long, your special handling routine should take the 8 remaining characters and throw them away. Not doing this will make your shell erratic.

9/24: Redirection format. Whenever you find a redirection character '>' in a command, you should check whether the format of the command is correct before running the program. The format of a valid redirection command is:

    [optSpace]progAndArgs[optSpace]>[optSpace]outFile[optSpace]

progAndArgs is the program and all the arguments that a user types in. optSpace means there can be 0 or more whitespace characters (including tabs). The output file, outFile, should consist of characters without any whitespace. Also note that in a valid redirection command, the '>' character appears only once. Whenever you encounter an invalid redirection command, you should print the one and only error message (you should know it by now), and continue processing the next command. Here are some examples of invalid redirection commands:

    ls > out1 out2
    ls > out1 out2 out3
    ls > out1 > out2

9/22: No tilde (~). You do not have to support tilde (~). Although in the UNIX shell you could go to a user's directory by typing "cd ~mjordan", in this project you do not have to deal with tilde. You should treat it like a common character, i.e.
you should just pass the whole word (e.g. "~mjordan") to chdir(), and chdir() will return an error.

9/22: pwd clarification. In the UNIX shell, when you run "echo $PWD" and "pwd", you sometimes get different outputs. For example:

    % cd
    % pwd
    /afs/cs.wisc.edu/u/m/j/mjordan
    % echo $PWD
    /u/m/j/mjordan

The reason is that there are two places where you can get the current working directory. The pwd command gets the string from the getcwd() call, which gives you the absolute path, while echo $PWD prints the value of the PWD environment variable (the same string that getenv("PWD") returns; note that getenv() is a C library function, not a system call). So which string should you use? The answer is the absolute path. In that case, you do not have to manage the $PWD variable, which reduces the code you need to write. When a user types pwd, you simply call getcwd(). When a user changes the current working directory (e.g. "cd somepath"), you simply call chdir(). Hence, if you run your shell and then run pwd, it should look like this:

    % cd
    % pwd
    /afs/cs.wisc.edu/u/m/j/mjordan
    % echo $PWD
    /u/m/j/mjordan
    % ./myshell
    myshell> pwd
    /afs/cs.wisc.edu/u/m/j/mjordan

9/21: The one and only error message. A section about Error Message has been added. In summary, you should print this one and only error message whenever you encounter an error of any type:

    char error_message[30] = "An error has occurred\n";
    write(STDOUT_FILENO, error_message, strlen(error_message));

The error message should be printed to stdout (not stderr!). Also, do not attempt to add whitespace, tabs, or extra error messages.

9/20: $HOME. When you run "cd" (without arguments), your shell should change the working directory to the path stored in the $HOME environment variable.

9/18: No pipes. We have rewritten the hints for Redirection. Basically, you do not need to use pipes (yay!). This should make your life easier.

9/17: More hints. We have added more hints for Built-in Commands

Notes

This project must be done alone. You can talk to your colleagues about it, but every line of code must be written and understood by you. Of course, you can always ask the TAs and professors for help too.

Objectives

There are four objectives to this assignment:
Overview

In this assignment, you will implement a command line interpreter, or shell. The shell should operate in this basic way: when you type in a command (in response to its prompt), the shell creates a child process that executes the command you entered and then prompts for more user input when it has finished.

The shell you implement will be similar to, but much simpler than, the one you run every day in Unix. You can find out which shell you are running by typing

Program Specifications

Basic Shell

Your basic shell is essentially an interactive loop: it repeatedly prints a prompt

    emperor1% ./myshell
    myshell>

You should structure your shell such that it creates a new process for each new command. There are two advantages of creating a new process. First, it protects the main shell process from any errors that occur in the new command. Second, it allows easy concurrency; that is, multiple commands can be started and allowed to execute simultaneously. In this project, you are not required to deal with concurrency.

Your basic shell should be able to parse a command and run the program corresponding to the command. For example, if the user types

The maximum length of a command line your shell can take is 512 bytes (excluding the newline character).

Multiple Commands

After you get your basic shell running, your shell is not much fun if you cannot run multiple jobs on a single command line. To do that, we use the

For example, if the user types

Built-in Commands
Whenever your shell accepts a command, it should check whether the command is a built-in command or not. If it is, it should not be executed like other programs. Instead, your shell will invoke your implementation of the built-in command. For example, to implement the

So far, you have added your own

Redirection

Many times, a shell user prefers to send the output of his/her program to a file rather than to the screen. The UNIX shell provides this nice feature with the

For example, if a user types

If the

White Spaces

Zero or more spaces can exist between a command and the shell
special characters (i.e.

    myshell> ls;ls;ls
    myshell> ls ; ls ; ls
    myshell> ls>a; ls > b; ls> c; ls >d

If you are unsure whether a particular command is valid or not, the rule of thumb is to try it in the UNIX shell. If the UNIX shell accepts that command, your shell should accept the same command.

Batch Mode

So far, you have run the shell in interactive mode. Most of the time, testing your shell in interactive mode is time-consuming. To make testing much faster, your shell should support batch mode.

In interactive mode, you display a prompt and the user of the shell types in one or more commands at the prompt. In batch mode, your shell is started by specifying a batch file on its command line; the batch file contains the same list of commands as you would have typed in interactive mode.

In batch mode, you should not display a prompt. In batch mode, you should print each line you read from the batch file back to the user before executing it; this will help you when you debug your shell (and us when we test your programs). To print the command line, do not use printf, because printf buffers the string in the C library and will not work as expected when you perform automated testing. Instead, use write(STDOUT_FILENO, ...) this way:

    write(STDOUT_FILENO, cmdline, strlen(cmdline));

In both interactive and batch mode, your shell terminates when it sees the

To run the batch mode, your C program must be invoked exactly as follows:
The command line arguments to your shell are to be interpreted as follows.

batchFile: an optional argument (often indicated by square brackets as above). If present, your shell will read each line of the batchFile for commands to be executed. If the specified file does not exist or is not readable, you should print the one and only error message (see the Error Message section below).

Implementing batch mode should be very straightforward if your shell code is nicely structured. The batch file contains exactly the same lines that you would have typed interactively in your shell. For example, if in interactive mode you test your program with these inputs:

    emperor1% ./myshell
    myshell> ls ; who ; ps
    some output printed here
    myshell> ls > /tmp/ls-out;;;; ps > /non-existing-dir/file;
    some output and error printed here
    myshell> ls-who-ps
    some error printed here

then you could cut your testing time by putting the same input lines into a batch file (for example myBatchFile):

    ls ; who ; ps
    ls > /tmp/ls-out;;;; ps > /non-existing-dir/file;
    ls-who-ps

and running your shell in batch mode:

    emperor1% ./myshell myBatchFile

In this example, the output of batch mode should look like this:

    ls ; who ; ps
    some output printed here
    ls > /tmp/ls-out;;;; ps > /non-existing-dir/file;
    some output and error printed here
    ls-who-ps
    some error printed here

Important Note: To automate grading, we will heavily use batch mode. If you do everything correctly except batch mode, you could be in trouble. Hence, make sure you can read and run the commands in the batch file. Soon, we will provide some batch files for you to test your program.

Defensive Programming and Error Message

As in the first project, defensive programming is required in this project. Your program should check all parameters, error codes, etc. before it trusts them. In general, there should be no circumstances in which your C program will core dump, hang indefinitely, or prematurely terminate. Therefore, your program must respond to all input in a reasonable manner; by "reasonable", we mean print the error message (as specified in the next paragraph) and either continue processing or exit, depending upon the situation.

Since your code will be graded with automated testing, you should print this one and only error message whenever you encounter an error of any type:

    char error_message[30] = "An error has occurred\n";
    write(STDOUT_FILENO, error_message, strlen(error_message));

The error message should be printed to stdout (not stderr!). Also, do not attempt to add whitespace, tabs, or extra error messages.

You should consider the following situations as errors; in each case, your shell should print the error message to stdout and exit gracefully:

For the following situation, you should print the error message to stdout and continue processing:

Your shell should also be able to handle the following scenarios, which are not errors. The best way to check whether something is not an error is to run the command line in the real Unix shell.
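The following is a minimal sketch of how the single error message and the 512-character limit could fit together, assuming lines are read with fgets() into a 514-byte buffer as the clarifications suggest. The names print_error, read_command_line, MAX_CMD_LEN, and LINE_BUF_SIZE are hypothetical and not part of the assignment. Echoing an over-long line back to stdout, which the 9/27 clarification asks for, is left out to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_CMD_LEN   512                 /* limit from the spec, newline excluded */
#define LINE_BUF_SIZE (MAX_CMD_LEN + 2)   /* room for '\n' and '\0' */

/* Print the one and only error message to stdout using write(), not printf(). */
void print_error(void)
{
    char error_message[30] = "An error has occurred\n";
    ssize_t n = write(STDOUT_FILENO, error_message, strlen(error_message));
    (void)n;  /* nothing sensible to do if writing the error itself fails */
}

/*
 * Hypothetical helper: read one line from `in` into `buf`, which must hold
 * LINE_BUF_SIZE bytes.
 * Returns  1 if a valid line was read (the trailing newline is stripped),
 *          0 if the line was too long (the rest of it has been discarded),
 *         -1 at end of file.
 */
int read_command_line(FILE *in, char *buf)
{
    if (fgets(buf, LINE_BUF_SIZE, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* at most 512 characters plus newline */
        return 1;
    }
    if (len < LINE_BUF_SIZE - 1)
        return 1;                     /* last line of the input, no newline */

    /* Buffer is full and no newline was seen: the line is too long.
     * Consume and throw away the remaining characters of this line. */
    int c;
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return 0;
}
```

A caller would loop on read_command_line(), call print_error() whenever it returns 0, and stop when it returns -1.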
Hints

Writing your shell in a simple manner is a matter of finding the relevant library routines and calling them properly. To simplify things for you in this assignment, we will suggest a few library routines you may want to use to make your coding easier. (Do not expect advice this detailed for future assignments!) You are free to use these routines if you want or to disregard our suggestions. To find information on these library routines, look at the manual pages (using the Unix command man).

Basic Shell

Parsing: For reading lines of input, you may want to look at fgets(). To open a file and get a handle of type FILE *, look into fopen(). Be sure to check the return code of these routines for errors! (If you see an error, the routine perror() is useful for displaying the problem. But do not print the error message from perror() to the screen. You should only print the one and only error message that we have specified above.) You may find the strtok() routine useful for parsing the command line (i.e., for extracting the arguments within a command separated by whitespace or a tab or ...).

Executing Commands: Look into fork, execvp, and wait/waitpid. See the UNIX man pages for these functions, and also read Advanced Programming in the UNIX Environment, Chapter 8 (specifically, 8.1, 8.2, 8.3, 8.6, 8.10). Before starting this project, you should definitely play around with these functions.

You will note that there are a variety of commands in the

    int main(int argc, char *argv[]);

Note that this argument is an array of strings, or an array of pointers to characters. For example, if you invoke a program with:

    foo 205 535

then argv[0] = "foo", argv[1] = "205" and argv[2] = "535".

Important: the list of arguments must be terminated with a NULL pointer; that is, argv[3] = NULL. We strongly recommend that you carefully check that you are constructing this array correctly!
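The hints above can be sketched as follows: strtok() splits a command into a NULL-terminated argument array, and fork(), execvp(), and waitpid() run it in a child process. This is only an illustration under stated assumptions. The names split_args, run_command, and MAX_ARGS are hypothetical, and returning the child's exit status to the caller is a choice of this sketch, not something the assignment requires.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: 512 characters hold at most 256 one-character words,
 * plus one slot for the terminating NULL pointer. */
#define MAX_ARGS 257

/*
 * Split `line` in place on spaces, tabs, and newlines. Fills `args` with
 * pointers into `line` and terminates the list with NULL.
 * Returns the number of arguments found.
 */
int split_args(char *line, char *args[], int max_args)
{
    int count = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && count < max_args - 1) {
        args[count++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    args[count] = NULL;               /* execvp() requires this terminator */
    return count;
}

/*
 * Run args[0] in a child process and wait for it to finish.
 * Returns the child's exit status, or -1 if fork/waitpid failed or the
 * child did not exit normally.
 */
int run_command(char *args[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(args[0], args);
        /* execvp() returns only on failure, e.g. program not found. */
        char error_message[30] = "An error has occurred\n";
        ssize_t n = write(STDOUT_FILENO, error_message, strlen(error_message));
        (void)n;
        _exit(1);                     /* leave the child without running shell code */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent waits for each child before returning, calling run_command() once per ';'-separated command runs the commands one after another.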
Multiple Commands

If you get your basic shell running, supporting multiple commands should be straightforward. The only difference here is that you need to wait for the previous process to finish before creating a new one. To do that, you simply use waitpid() again.

Built-in Commands

For the 'exit' built-in command, you should simply call 'exit(0);'. The shell process will exit, and its parent (i.e. the shell from which you started it) will be notified.

For managing the current working directory, you should use getenv, chdir, and getcwd. The getenv() library call is useful when you want to go to your $HOME directory. You do not have to manage the $PWD environment variable. The getcwd() call is useful for finding the current working directory; i.e. if a user types pwd, you simply call getcwd(). And finally, chdir() is useful for changing directories. For more, read the Advanced UNIX Programming book Chapter 4.22 and 7.9.

Redirection

Redirection is probably the trickiest part of this project. For this you only need dup2(). You do not need to use pipe() (unless you want to be fancy). (In fact, you don't (really) even need dup2(), but could just use close() on stdout and then open() on a file. More on this during discussion.)

The idea of using dup2 is to intercept the byte stream going to the standard output (i.e. your screen) and redirect the stream to your designated file. dup2 uses file descriptors, which implies that you need to understand what a file descriptor is. You should read the Advanced UNIX Programming book Chapter 1.5 to get an initial understanding of what a file descriptor is.

With a file descriptor, you can read from and write to a file. Maybe in your life so far, you have only used fopen(), fread(), and fwrite() for reading and writing to a file. Unfortunately, these functions work on FILE *, which is a C library abstraction; the file descriptors are hidden. Hence, you cannot use dup2 directly with these particular functions.
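The dup2() idea described here can be sketched as follows: the child opens the output file with open(), points STDOUT_FILENO at it, and only then calls execvp(). This is a sketch under assumptions, not the reference solution. The name run_redirected is hypothetical, the open() flags and the 0644 mode (create or truncate the file) are choices made for this example, and error reporting is reduced to an exit code.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run args[0] in a child process with its stdout redirected to `out_path`.
 * Returns the child's exit status, or -1 if fork/waitpid failed or the
 * child did not exit normally.
 */
int run_redirected(char *args[], const char *out_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Assumption: create the file if needed and truncate it otherwise. */
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(1);

        /* Make descriptor 1 (stdout) refer to the output file. */
        if (dup2(fd, STDOUT_FILENO) < 0)
            _exit(1);
        close(fd);                    /* stdout now holds the only needed copy */

        /* The new program inherits the redirected stdout. */
        execvp(args[0], args);
        _exit(1);                     /* reached only if execvp() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Doing the redirection inside the child, after fork() and before execvp(), leaves the shell's own stdout untouched, so the next prompt or echoed command line still goes to the screen.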
To work with file descriptors, you should use the creat(), open(), read(), and write() system calls. These functions do their work using file descriptors. To understand more about file I/O and file descriptors, you should read the Advanced UNIX Programming book Section 3 (specifically, 3.2 to 3.5, 3.7, 3.8, and 3.12). Before reading further, you should make yourself familiar with file descriptors.

The idea of redirection is to make the stdout descriptor point to your output file descriptor. First of all, let's understand the STDOUT_FILENO file descriptor. When a command

To give yourself some practice, create a simple program where you create an output file, intercept stdout, and call printf("hello"). When you create your output file, you should get the corresponding file descriptor. To intercept stdout, you should call

In short, to intercept your 'ls' output, you should redirect stdout before you execute ls, i.e. make the dup2() call before the exec call that runs ls.

Miscellaneous Hints

Remember to get the basic functionality of your shell working before worrying about all of the error conditions and end cases. For example, first get a single command running (probably first a command with no arguments, such as "ls"). Then try adding more arguments. Next, try working on multiple commands. Make sure that you are correctly handling all of the cases where there is miscellaneous whitespace around commands or missing commands. Finally, add built-in commands and redirection support.

We strongly recommend that you check the return codes of all system calls from the very beginning of your work. This will often catch errors in how you are invoking these new system calls.

Beat up your own code! You are the best (and in this case, the only) tester of this code. Throw lots of junk at it and make sure the shell behaves well. Good code comes through testing -- you must run all sorts of different tests to make sure things work as desired. Don't be gentle -- other users certainly won't be. Break it now so we don't have to break it later.

Keep versions of your code. More advanced programmers will use a source control system such as CVS. Minimally, when you get a piece of functionality working, make a copy of your .c file (perhaps in a subdirectory with a version number, such as v1, v2, etc.). By keeping older, working versions around, you can comfortably work on adding new functionality, safe in the knowledge you can always go back to an older, working version if need be.

Handin

To ensure that we compile your C correctly for the demo, you will need to create a simple makefile; this way our scripts can just run

The name of your final executable should be

    emperor1% ./myshell
    emperor1% ./myshell inputTestFile

Please also submit a README file (i.e. filename: "README"). Your README file should describe what functionality is not working. If you think your code is perfect, simply write "All good".

Hand in your source code (including the README and Makefile). We have created a directory ~cs537-SECTION/handin/NAME/p2, where SECTION is either 1 or 2 and NAME is your login name. For example, if you are in section 2 and your login is jimmyv, your handin directory is ~cs537-2/handin/jimmyv/p2. Copy all of your .c source files into the appropriate subdirectory. Do not submit any .o files. Make sure that your code runs correctly on the Linux machines in the 13XX labs (e.g., the royal or emperor clusters).

Grading

We will run your program on a suite of test cases, some of which will exercise your program's ability to correctly execute commands and some of which will test your program's ability to catch error conditions. Be sure that you thoroughly exercise your program's capabilities on a wide range of test suites, so that you will not be unpleasantly surprised when we run our tests.

Other Shell Fun

The shell you are building is very simplistic; it doesn't have a PATH variable, there is no shell history of previous commands you have run, the user can't customize the prompt, etc. Feel free to play around with adding such functionality; it will give you some more insight into how things really work. However, please don't let your fun ruin your code -- keep a working copy of the shell before playing around.