Unix Scripting Notes

  1. Introduction
  2. Hello World - Writing to standard output, Hash bang, Comments
  3. Using Unix Commands - Simple Redirects, Pipes, Multiline scripts
  4. Using Variables - Defining variables, Accessing variables, String replacement in commands
  5. Using Backticks - Assigning output to a variable
  6. Special Variables - Command line arguments, Exit status, Process ID
  7. The if Statement - Using an if statement, test and square brackets
  8. Numbers In Shell Scripts - The 'expr' command, Use of backticks, Globbing and using the escape character
  9. The while Loop - Syntax of a while loop, Using a while loop
  10. The for Loop - Syntax of a for loop, Lists
  11. The case Statement - Syntax for case statement, Magic 8-Ball, Wildcards in case statements
  12. Unix Redirects In Detail - Redirect input from scripts, Redirect one stream to another
  13. Conditional Execution - && and ||
  14. Variables In Detail - Quoting (' and "), Exporting variables, Variable Constructs
  15. Useful Commands - basename, dirname, exit, wait, read, trap, head, tail, grep, shift

Introduction


A shell script, in its simplest form, is a series of shell commands put together in a text file, which can then be executed in order. The commands used in the scripting language are the same ones the user types in when they are using Unix through a command line. To those of you who are used to writing code in languages like C and Java, shell scripting can seem strange because it doesn't need to be compiled before it is run and because of its relative lack of structure, but it can prove to be a surprisingly powerful tool.

Shell scripting is useful in situations where you want to get a program up and running without wasting too much time. Often a shell script will only have a fraction of the number of lines of an equivalent C or Java program, and hence, it only takes a fraction of the time to write. In this tutorial, we will look at how to use the Bourne shell, sh, but there are many other shell scripting languages available, such as ksh, csh and bash.

In this tutorial, code is set apart from the surrounding text, and anything following a # on a line is a comment, as shown below:
 
#!/bin/sh
ls            # Comment


Hello World


In keeping with Computer Science tradition, the first Unix script we will look at is the "Hello World" example. This is a very simple example, but it is worth seeing because it demonstrates a number of interesting aspects of Unix scripting.
 
#!/bin/sh
printf "Hello World\n"  # Writes string to standard output

The program above is a valid shell script. You can run it by typing it into a text file, making the text file executable (for example with chmod +x), and then running it just like any other program. Note that you do not need to compile it, as you would a C program, or explicitly run an interpreter, as you would with Java; it runs as is.

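When you look at the program, you will see three distinct parts to it:

  1. The hash bang line, #!/bin/sh, which tells Unix which program should be used to run the script. Here it is the Bourne shell, sh.
  2. The printf command, which writes the string given to it to standard output.
  3. The comment. Everything from a # to the end of the line is ignored by the shell.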


Using Unix Commands


One of the features of Unix scripting is that any command you can use in Unix is valid code in a shell script. All of the commands you have learnt in Unix can be used in a shell script, for example:
 
#!/bin/sh
ls -alp | more    # Displays the contents of the current directory, and displays them one page at a time 

Try writing this code into a file and executing it. Compare the output from this program with the output produced when you type ls -alp | more into the command line. You should see that they produce the same output.

One thing to note in the above example is the use of the pipe (|). All of the usual Unix redirects are available for use in a shell script. As an example of how these might be used, here is a program which writes out the login names of everyone connected to our machine in lexicographic order.
 
#!/bin/sh
who > /tmp/tempfile  # Writes the names of who is logged in to the file /tmp/tempfile
sort /tmp/tempfile   # Sorts the file and writes it to standard output
rm /tmp/tempfile    # Deletes the temporary file

This program runs the 'who' command, which displays a list of everyone logged in, and redirects its output to a temporary file. Note that the temporary file is put in the /tmp directory; it is common practice to put any temporary files in this directory when writing a Unix script. The next line writes a sorted version of this file to standard output, and the final line removes the temporary file which was created by 'who'. We could write this without needing to create a temporary file by piping the output from who into sort, as shown below.
 
#!/bin/sh
who | sort    # Displays logged in users in lexicographic order

If you want a command to continue onto the next line, you can put a '\' at the end of the line. You can use this as many times as you want, and it is possible to have commands which run over many lines.
 
#!/bin/sh
ls \
-alp

One final thing to remember is that if you don't want error messages from a command to appear on the screen, you can close the error stream by writing 2>&- after the command. This will stop error messages from the command from reaching the terminal. (Writing 2>/dev/null, which sends the messages to the null device instead of closing the stream, has the same visible effect.) Take, for example, this script which removes any temporary files which may have been created by a text editor. Temporary filenames generally end with a ~, so the script will delete anything ending in ~, but if there are no temporary files, we don't want to display an error message, we should just quit normally.
 
#!/bin/sh
rm *~ 2>&-    # The error stream is closed, ie. error messages are not displayed

Try running this script with and without the redirect, and notice the difference when there is nothing to remove.
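
As a rough sketch of the two ways of hiding error messages, the script below tries to list a file which is assumed not to exist (/no/such/file is a made-up name). Neither line should display an error message.

```shell
#!/bin/sh
# /no/such/file is a made-up path which is assumed not to exist.

ls /no/such/file 2>&-          # The error stream is closed, so no message is displayed
ls /no/such/file 2>/dev/null   # The error stream is sent to the null device and discarded

printf "Finished\n"
```

The only thing written to the terminal should be the word Finished.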

We will take a closer look at redirecting input and output later on.


Using Variables


By now you are probably wondering how you can declare a variable in a shell script. Creating a variable in a shell script is very easy, much easier than in C or Java: all you need to do is write VARIABLENAME=VALUE. Note that there are no spaces on either side of the equals sign; if you put in a space, the shell will try to run the line as a command instead of treating it as an assignment. To access a variable, put a dollar sign, $, before the variable name, eg. $VARIABLENAME.

Variables in shell scripts operate in a very different way to variables in other computer languages. Some of the things you must remember are:

  1. In shell scripts, all variables can be thought of as strings. Even if you say that a variable contains a number, eg. VAR=5, you can still treat it like a string.
  2. You do not need to declare a variable. The variable is made as soon as you start using it.
  3. If you have a line in a shell script where you access a variable, the line is executed with the variable name replaced by its value. This aspect may seem confusing, so it is best to look at some examples to get a better idea of how they work.

Here we have the example from the previous section, altered to include a variable.
 
#!/bin/sh

TMPFILE=/tmp/tempfile

who > $TMPFILE   # Writes the names of who is logged in to the file /tmp/tempfile
sort $TMPFILE    # Sorts the file and writes it to standard output
rm $TMPFILE     # Deletes the temporary file

When the shell gets to a line in this script with '$TMPFILE' in it, it replaces '$TMPFILE' with '/tmp/tempfile', and then executes the command. Hopefully you can see that if we replace '$TMPFILE' with '/tmp/tempfile' on each line, the same three commands are executed as in the previous section. One important thing to remember about variables is that they can be used anywhere in a command. For example, a script which simply executes the who command might look like this:
 
#!/bin/sh

COMMAND=who
$COMMAND

When accessing variables, you can optionally put the variable name in { }, for example, the code below runs the ls command, but does so in a strange way. If you can't see how putting the variable name in braces would be useful, don't worry, its purpose will become clear later in the tutorial.
 
#!/bin/sh

COMMAND=l
${COMMAND}s -alp
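
One case where the braces matter is when a variable is followed directly by more text. The sketch below uses made-up names (NAME, report) to show the difference: without braces the shell looks for a variable called NAME_old, which has never been set.

```shell
#!/bin/sh

NAME=report

WITHOUT_BRACES=$NAME_old      # The shell looks for a variable called NAME_old, which does not exist
WITH_BRACES=${NAME}_old       # The braces mark where the variable name ends

printf "Without braces: $WITHOUT_BRACES\n"
printf "With braces:    $WITH_BRACES\n"
```

The first line written should end with nothing after the colon, and the second should end with report_old.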


Using Backticks


One of the more interesting things you can do with variables is to assign the output of a command to a variable. This is done by enclosing the command in backticks (`). The backtick is usually the key at the top left of your keyboard, next to the number 1. When a script reaches a line containing a command inside backticks, it executes this command, replaces that command with its output, and then executes the line as per normal. As with most aspects of Unix scripting, this is best explained with an example.
 
#!/bin/sh

OUTPUT=`echo HELLO`

printf "$OUTPUT\n"

This example executes in the following order:

  1. When the script is on the first line, it encounters a command in backticks. This is the first command that is executed.
  2. Then, the command `echo HELLO` is replaced by its output HELLO, so now the line looks like this 'OUTPUT=HELLO'
  3. The first line is now executed, so now the variable OUTPUT has the value HELLO.
  4. Now we are up to the second line. Before this line is executed, $OUTPUT is replaced by the value of the variable OUTPUT, HELLO, so now it looks like 'printf "HELLO\n"'
  5. This line is executed, writing HELLO to standard output.

This was a fairly long winded method of getting something written to the screen, but it does demonstrate how backticks work. Later we will look at more interesting and useful ways of using backticks.
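
As a slightly more practical sketch, backticks can capture the output of a whole pipeline. Here the made-up variable WORDS receives the word count reported by wc -w. POSIX shells also accept the form $(command) in place of backticks, but this tutorial uses backticks throughout.

```shell
#!/bin/sh

WORDS=`printf "one two three\n" | wc -w`   # wc -w counts the words it reads

printf "Counted %d words\n" $WORDS         # %d writes WORDS as a number
```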


Special Variables


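When you start a Unix script, there are already a number of special variables which you can use. The first ones we will look at are the ones relating to command line arguments. The important variables are:

  1. $0 - The name of the script, as it was typed on the command line.
  2. $1, $2, ... $9 - The first, second, ... ninth command line arguments.
  3. $# - The number of command line arguments.
  4. $* - The full list of command line arguments, joined into a single string.
  5. $@ - The full list of command line arguments, kept as separate items.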

As an example, consider the script below. This script simply writes out some of these variables to standard output.
 
#!/bin/sh
# This script requires at least two command line arguments to run properly

printf "The scripts name is $0\n"
printf "The first argument is $1\n"
printf "The second argument is $2\n"
printf "The script was executed with $# command line arguments\n"
printf "The full list of arguments is : $*\n"

Type in and execute this script, with different numbers of command line options.
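
If you want to experiment without retyping arguments each time, one option is the set -- command, which replaces the command line arguments from inside the script. The sketch below uses it purely for demonstration; the argument values are made up.

```shell
#!/bin/sh
# set -- replaces the command line arguments. It is used here only so that
# the sketch can be run without typing any arguments.

set -- apple banana cherry

COUNT=$#
FIRST=$1

printf "There are $COUNT arguments, the first is $FIRST\n"
printf "All of them: $*\n"
```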

Another important special variable is the Process ID (PID). In Unix, each process, or running program, is allocated a unique PID (traditionally a number between 1 and 32767, although the upper limit varies between systems). You can find out the value of the PID from the $$ variable.
 
#!/bin/sh
printf "The Process ID is $$\n"

Because the PID is unique, it is often used to help name temporary files. Here we have the example given in the Using Variables section, where we used a temporary file.
 
#!/bin/sh

TMPFILE=/tmp/tempfile

who > $TMPFILE   # Writes the names of who is logged in to the file /tmp/tempfile
sort $TMPFILE    # Sorts the file and writes it to standard output
rm $TMPFILE     # Deletes the temporary file

The problem with this is that if two copies of this script are run at once, they will both try to access the same temporary file, and errors could occur. A way to avoid this problem would be to ensure that each invocation of the script gave its temporary file a unique name. This can be done using the Process ID as shown below:
 
#!/bin/sh

TMPFILE=/tmp/tempfile-$$  # This is a unique filename

who > $TMPFILE   # Writes the names of who is logged in to the temporary file
sort $TMPFILE    # Sorts the file and writes it to standard output
rm $TMPFILE     # Deletes the temporary file

The final variable which you should know about is $?. This contains the exit status of the last command executed. When a program finishes, it passes a number back to the shell telling it how it exited. Generally, if the program finishes normally, the program will return a value of 0, and if the program ended due to an error, it will return a non-zero value. This example below uses cat to display the file $1, and then writes out the exit status of cat.
 
#!/bin/sh
# Requires at least 1 command line argument

cat $1
EXIT_STAT=$?    # EXIT_STAT is the exit status of cat 

printf "\n\ncat had an exit status of $EXIT_STAT\n"

Try running this script with different command line arguments. If the file exists and cat can display it, you will find that the exit status is 0, but if you give it a file it can't read, or one which doesn't exist, the exit status will be something else.
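
To see a few exit statuses side by side, here is a minimal sketch using the standard true and false commands and a made-up path which is assumed not to exist. The exact non-zero values can differ between systems.

```shell
#!/bin/sh

true                           # true is a command which always succeeds
TRUE_STATUS=$?

false                          # false is a command which always fails
FALSE_STATUS=$?

ls /no/such/file 2>/dev/null   # Made-up path, assumed not to exist
LS_STATUS=$?

printf "true: $TRUE_STATUS, false: $FALSE_STATUS, ls: $LS_STATUS\n"
```

Note that $? must be saved straight after the command you are interested in, because every command that runs overwrites it.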


The if Statement


Almost every programming language has a feature that allows you to execute code based on the truth of a condition, and shell scripting is no different. The if statement allows you to check if a given expression is true, and if it is, execute a certain piece of code. The syntax for the if statement is as follows:
 
if EXPRESSION ; then
  DO THIS
  ...
elif EXPRESSION ; then        # else if is optional, and there can be more than 1 of them
  DO THIS
  ...
elif ...

else                        # else is optional, only one per if statement
  DO THIS
  ...

fi                          # fi = end of the if statement (fi is if backwards)

The first important part of the if statement is the expression. The expression is any valid Unix command, and the expression is considered to be true if and only if it has an exit status of zero. The if statement can be a good way to check if a program has worked. An example of how this might work is:
 
#!/bin/sh

if cat $1 2>&- ; then
  printf "\n\ncat finished successfully\n"
else
  printf "\n\ncat failed\n"
fi

This program will attempt to display the file given to it as an argument. If it was displayed successfully, it will write a message to standard output, and if it failed, a different message is written.

While it is nice to see if a program completed successfully, often when we write an if statement we want to perform some sort of check on the value of a variable. The test command provides us with this ability. We can run the test command by typing the word test but, to make the code a little more readable, we can just put the thing we want to test in square brackets. In other words, the commands 'test EXPRESSION' and '[ EXPRESSION ]' are equivalent. Note that there must be a space after the [ and before the ].

The test command will return 0 if the test evaluated to true, and it will return a non-zero value if the test evaluated to false, making it very useful as the expression part of an if statement.

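There are many different ways to use the test command (run man test to see them all), but some of the more useful ones include:

  1. [ N1 -eq N2 ] - True if the numbers N1 and N2 are equal. -ne, -gt, -ge, -lt and -le test for not equal, greater than, greater than or equal, less than, and less than or equal.
  2. [ S1 = S2 ] - True if the strings S1 and S2 are the same. != tests that they differ.
  3. [ -z S ] - True if the string S is empty. -n tests that it is not empty.
  4. [ -r FILE ] - True if FILE exists and is readable. -w and -x test for writable and executable.
  5. [ -f FILE ] - True if FILE exists and is a regular file. -d tests for a directory.
  6. [ ! EXPRESSION ] - True if EXPRESSION is false.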

The script below uses the test command to perform error checking on the input to a program.
 
#!/bin/sh
# This script will display the file given as the first argument.

if [ $# -eq 0 ] ; then                 # If no command line arguments were given...
  printf "Usage: $0 filename\n"        # write the appropriate error message to standard output 
  exit 1                               # and then exit with a non-zero exit status

elif [ $# -gt 1 ] ; then
  printf "Too many arguments, only the first argument will be displayed\n"
fi

if [ -r $1 ] ; then                    # If the file to be displayed is readable
  cat $1                               # display the file
else
  printf "$0: cannot open $1\n"        # Could not open file, error message and exit
  exit 2
fi
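
The sketch below shows the word test and the square brackets side by side, along with a string comparison and a file test. The variable names and values (AGE, NAME) are made up for illustration.

```shell
#!/bin/sh

AGE=21
NAME=fred

if test $AGE -ge 18 ; then          # Written with the word test
  ADULT=yes
else
  ADULT=no
fi

if [ $AGE -ge 18 ] ; then           # The same test written with square brackets
  ADULT_AGAIN=yes
else
  ADULT_AGAIN=no
fi

printf "test says $ADULT, square brackets say $ADULT_AGAIN\n"

if [ "$NAME" = "fred" ] ; then      # String comparison
  printf "Hello fred\n"
fi

if [ -d /tmp ] ; then               # File test: is /tmp a directory?
  printf "/tmp is a directory\n"
fi
```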


Numbers In Shell Scripts


At this stage, we have learnt how we can treat variables as strings, but often we need to treat a variable like it is a number. The expr command allows us to evaluate integer arithmetic expressions, and by using backticks, we can store the result. For example:
 
#!/bin/sh

NUM=5

printf "$NUM + 1 = `expr $NUM + 1`\n"    # Prints NUM + 1 to the screen

NUM=`expr $NUM \* 2`                     # NUM equals 2 times the old value of NUM (must put a backslash before the *)
printf "NUM is now $NUM\n"               

Before we continue, you should make a note of the '\*' on the line which doubles NUM. A star on its own in a command line is known as the 'glob' character, and if this is in a command, it will be replaced by all of the filenames in the directory. If you want to see how this is done, write a script which simply writes out the number of arguments ($#) to the screen and run this script with * as the argument. You will see that the number of arguments isn't necessarily 1, it is the number of visible files in the directory. To stop Unix from replacing the * with filenames, we put the escape character \ before it. You will also need to put a \ before question marks, brackets, less than and greater than signs, dollar signs, etc. if you want them to be treated as normal characters.
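
The effect of the escape character can be sketched without typing any arguments by using set --, which replaces the command line arguments from inside a script. The scratch directory and filenames below are made up for the demonstration.

```shell
#!/bin/sh

DIR=/tmp/glob-demo-$$          # Hypothetical scratch directory
mkdir $DIR
touch $DIR/one $DIR/two $DIR/three

set -- $DIR/*                  # The * is replaced by the three filenames
UNESCAPED=$#

set -- $DIR/\*                 # The \ stops the replacement, leaving a single argument
ESCAPED=$#

printf "Without the backslash: $UNESCAPED arguments\n"
printf "With the backslash:    $ESCAPED argument\n"

rm -r $DIR
```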

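Now that that's out of the way, let's look at the above example line by line:

  1. NUM=5 creates the variable NUM and gives it the value 5.
  2. The first printf contains a command in backticks. Before the line is executed, $NUM is replaced by 5, and `expr 5 + 1` is run and replaced by its output, 6, so the line writes '5 + 1 = 6' to standard output.
  3. On the next line, `expr 5 \* 2` is run and replaced by its output, 10, so NUM now has the value 10.
  4. The final printf writes 'NUM is now 10' to standard output.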

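You can also use the expr command to conduct comparisons between two expressions. If the comparison is true, expr writes 1 to standard output and has an exit status of 0, otherwise it writes 0 and has an exit status of 1. Because the if statement looks at the exit status, expr can be used directly as its expression.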
 
#!/bin/sh

# Compares (5 * 6) + 3 to 5 + (6 * 3)

if expr 5 \* 6 + 3 \> 5 + 6 \* 3 ; then
  printf "Expression evaluated to true\n"
else
  printf "Expression evaluated to false\n"
fi
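
Because expr both writes a result and sets an exit status, it is easy to confuse the two. The following sketch captures each separately for one true and one false comparison; the variable names are made up.

```shell
#!/bin/sh

TRUE_OUTPUT=`expr 7 \> 3`      # 7 > 3 is true
TRUE_STATUS=$?

FALSE_OUTPUT=`expr 3 \> 7`     # 3 > 7 is false
FALSE_STATUS=$?

printf "True comparison:  output $TRUE_OUTPUT, exit status $TRUE_STATUS\n"
printf "False comparison: output $FALSE_OUTPUT, exit status $FALSE_STATUS\n"
```

The output and the exit status are opposites of each other: the output follows the C convention where 1 means true, while the exit status follows the Unix convention where 0 means success.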


The while Loop


The only major topic we haven't yet covered in shell scripts is looping. The first type we will look at is the while loop. Those of you who have programmed in Java or C will already be familiar with this type of loop. When a while loop is reached, the condition is tested. If the condition is false, we continue running the script from below the end of the loop, however if it evaluates to true, the code in the body of the loop is executed. Once the body has been executed, we return to the top of the loop and test the condition again. The while loop is useful in situations where we must execute a piece of code a certain number of times. The syntax for the while loop is as follows:
 
while EXPRESSION; do
  BODY OF LOOP
  .
  .
  .
done

Consider the following piece of code which writes out the numbers from 1 to $1 to the terminal.
 
#!/bin/sh

COUNT=1

while [ $COUNT -le $1 ]; do
  printf "${COUNT}\n"         # Write COUNT to standard output
  COUNT=`expr $COUNT + 1`     # Increment COUNT
done

printf "FINISHED\n"

On the first iteration through the loop, COUNT has a value of 1. If COUNT is greater than $1, the test will evaluate to false, the loop is finished and the program continues from after the done. If COUNT is less than or equal to $1, we execute the two commands in the body of the loop and try the test again.
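
The same pattern can be used to accumulate a result rather than just write out the counter. This sketch adds up the numbers from 1 to N, where N is fixed at 5 so that the script can be run without arguments.

```shell
#!/bin/sh

N=5
SUM=0
COUNT=1

while [ $COUNT -le $N ]; do
  SUM=`expr $SUM + $COUNT`    # Add the current number to the running total
  COUNT=`expr $COUNT + 1`     # Increment COUNT
done

printf "The sum of the numbers from 1 to $N is $SUM\n"
```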


The for Loop


The other way to write a loop in a shell script is to use a for loop. The for loop in a shell script allows you to loop over a list of items, and it works in a completely different way to the for loop in languages like Java and C. The syntax for the for loop is as follows:
 
for VARIABLE in LIST; do
  BODY OF LOOP
  .
  .
  .
done

When a for loop is executed, on the first iteration through the for loop, VARIABLE is the first item in the list. On the next iteration, VARIABLE is the second item in the list, on the third iteration it is the third item, and so on until the end of the list is reached. When the end of the list is reached, the loop is finished and we continue executing from after the end of the loop. One question you may have at this point is "How do I make a list?". When a variable is used without quotes, the shell splits its value at the spaces, so any variable with spaces in it can be treated as a list. For example, in the code fragment below, NAMES is a list with three elements, where 'John' is the first element of NAMES, 'Bill' is the second element and 'Fred' is the third.
 
NAMES="John Bill Fred"       # A list with three elements

A simple example of how a for loop might be used is shown below.
 
#!/bin/sh

KEYBOARD="Q W E R T Y U I O P A S D F G H J K L Z X C V B N M"    # Letters from a QWERTY keyboard

COUNT=1
for LETTER in $KEYBOARD; do
  printf "Letter number $COUNT is $LETTER\n"
  COUNT=`expr $COUNT + 1`
done

In this example, KEYBOARD is a list which contains all of the letters on a QWERTY keyboard. The for loop is used to loop over each item in this list, and while we are inside the loop, the variable LETTER contains the current list item.

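The for loop can be a very powerful tool when used in conjunction with ls. ls can provide you with a list of filenames matching a wildcard pattern (the shell replaces the pattern with the matching filenames before ls is run). Some examples of how you might use this include:

  1. ls *.txt - Files in the current directory whose names end in .txt.
  2. ls ?.txt - Files whose names are a single character followed by .txt.
  3. ls *.c *.h - Files whose names end in either .c or .h.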

An example of how you might use this is shown below. This script writes out the names of all files in the current directory which end in .c (C source files), .a (C archive files) and .o (C object files).
 
#!/bin/sh

printf "C files in the current directory are:\n"
COUNT=0

for FILENAME in `ls *.c *.a *.o 2>&-`; do    # If ls gives an error message, do not display it.
  printf "${FILENAME}\n"
  COUNT=`expr $COUNT + 1`
done

printf "\nCounted $COUNT C files\n"
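
One limitation of looping over the output of ls is that a filename containing a space is split into two list items. A common alternative, sketched below, is to put the wildcard pattern directly in the for loop. The scratch directory and filenames are made up for the demonstration.

```shell
#!/bin/sh

DIR=/tmp/for-demo-$$                 # Hypothetical scratch directory
mkdir $DIR
touch $DIR/main.c $DIR/util.c "$DIR/my notes.c"

LS_COUNT=0
for FILENAME in `ls $DIR/*.c`; do    # The output of ls is split at every space
  LS_COUNT=`expr $LS_COUNT + 1`
done

GLOB_COUNT=0
for FILENAME in $DIR/*.c; do         # The shell produces one list item per file
  GLOB_COUNT=`expr $GLOB_COUNT + 1`
done

printf "Using ls: $LS_COUNT items, using the pattern directly: $GLOB_COUNT items\n"

rm -r $DIR
```

Bear in mind that when no files match, the pattern is left in the list unchanged, so a loop written this way may need to check that each item really exists, for example with [ -r ].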


The case Statement


The final control structure available in a Unix script is the case statement. This is similar to the switch statement in Java and C, and it allows you to choose between multiple paths depending upon the value of a variable. The syntax for a case statement is:
 
case VARIABLE in
OPTION1)
  DO THIS
  ;;
OPTION2)
  DO THIS
  ;;
OPTION3)
.
.
.
esac          # esac = case spelled backwards

When a case statement is executed, the variable is compared to each of the options in order. If the variable matches one of the options, the code after the option is executed, up until the two semicolons. Once the two semicolons are reached, the case statement is completed and we continue from after esac. If two options are found which match the variable, only the first of these is executed. If none of the options match, no code is executed inside the case statement.

Below is an example of how a case statement can be used. This script, like a Magic 8-Ball, will produce an answer to a yes or no question. The random number is generated from the Process ID.
 
#!/bin/sh

printf "Ask the Magic 8-Ball a yes or no question out loud...\n"
sleep 8                # Wait for 8 seconds

printf "Shaking the eight ball...\n"
sleep 2

case `expr $$ % 6` in    # $$ is virtually a random number, $$ mod 6 is a random number from 0 to 5 inclusive
0)
  printf "All signs point to yes\n"
  ;;
1)
  printf "Looks unlikely\n"
  ;;
2)
  printf "Vision is a little hazy, try again\n"
  ;;
3)
  printf "A definite maybe\n"
  ;;
4)
  printf "It looks doubtful\n"
  ;;
5)
  printf "I think so\n"
  ;;
esac

An interesting feature of case statements is that you can use wildcards like ? and * in the options. This next example is an edited version of the script from 'The for Loop' which, as well as writing out all of the C files, writes a comment next to them about what type of file they are.
 
#!/bin/sh

printf "C files in the current directory are:\n"
COUNT=0

for FILENAME in `ls *.c *.a *.o 2>&-`; do    # If ls gives an error message, do not display it.
  printf "${FILENAME}     "

  case $FILENAME in
  *.c)
    printf "C Source code file\n"
    ;;
  *.o)
    printf "C Object file\n"
    ;;
  *.a)
    printf "C Library file\n"
    ;;
  esac

  COUNT=`expr $COUNT + 1`
done

printf "\nCounted $COUNT C files\n"
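
Two further idioms are worth a brief sketch, neither of which appears in the examples above: several patterns can share one option if they are separated by |, and a final option of * matches anything the earlier options did not, much like default in a C switch. The answers being classified are made up.

```shell
#!/bin/sh

RESULTS=""

for ANSWER in yes Y no maybe; do
  case $ANSWER in
  y*|Y*)                   # Anything starting with y or Y
    MEANING=yes
    ;;
  n*|N*)                   # Anything starting with n or N
    MEANING=no
    ;;
  *)                       # Anything not matched above
    MEANING=unknown
    ;;
  esac

  printf "$ANSWER means $MEANING\n"
  RESULTS="$RESULTS$MEANING "
done
```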


Unix Redirects In Detail


We have already had a brief look at how we can use pipes and redirect output in Unix scripts, and how we can suppress error messages using a redirect like 2>&-. Now we will take a deeper look at how streams can be redirected.

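In Unix, each process starts with 3 standard file descriptors. These are:

  1. Standard input (file descriptor 0) - Where the process reads its input from.
  2. Standard output (file descriptor 1) - Where the process writes its normal output.
  3. Standard error (file descriptor 2) - Where the process writes its error messages.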

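While all of these files default to the user's terminal, it is possible to redirect them to other places. Some ways of doing this include:

  1. command > file - Write standard output to file, replacing its contents.
  2. command >> file - Append standard output to the end of file.
  3. command < file - Read standard input from file.
  4. command 2> file - Write standard error to file.
  5. command1 | command2 - Use the standard output of command1 as the standard input of command2.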

Below is an example which demonstrates how redirecting input and output can be used. This is a script which tells the user how many times it has been executed by keeping a counter in a file.
 
#!/bin/sh

COUNTFILE=$HOME/.countfile          # This file contains the number of times the script has been executed.

if [ ! -r $COUNTFILE ] ; then        # If the count file doesn't exist, the script has never been run before,
  NUM_TIMES=1                       # so this is the first execution of the script.
else
  NUM_TIMES=`cat < $COUNTFILE`       # cat < $COUNTFILE will write out the contents of the file $COUNTFILE
  NUM_TIMES=`expr $NUM_TIMES + 1`
fi

printf "$NUM_TIMES\n" > $COUNTFILE   # Updating $COUNTFILE

printf "This script has been executed $NUM_TIMES times\n"

The next example demonstrates how we can use << to redirect input to come from the script itself. This seems quite strange because it has C code written directly into a Unix script, but it is valid because the C code is being used as input to another program.
 
#!/bin/sh

SRC=/tmp/temp-$$.c
EXE=/tmp/temp-$$

cat > $SRC << END_C_CODE       # The script itself is used as input to cat, until the line END_C_CODE is reached
#include <stdio.h>

int main()
{
    printf("Hello!!!");
    return 0;
}
END_C_CODE

gcc -o $EXE $SRC

$EXE

rm $EXE $SRC 2>&-

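Some other useful redirection commands include:

  1. command 2>&1 - Send standard error to wherever standard output is currently going.
  2. command >&2 - Send standard output to wherever standard error is currently going. This is useful for writing your own error messages.
  3. command 2>&- - Close standard error, so that error messages are not displayed.
  4. command > /dev/null - Discard standard output by writing it to the null device.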
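
A minimal sketch of redirecting one stream to another is shown below. It relies on a made-up path, /no/such/file, which is assumed not to exist, so that ls produces an error message to work with.

```shell
#!/bin/sh
# /no/such/file is a made-up path which is assumed not to exist.

CAPTURED=`ls /no/such/file 2>&1`           # Standard error joins standard output, so the message is captured
DISCARDED=`ls /no/such/file 2>/dev/null`   # Standard error is thrown away, so nothing is captured

printf "Captured: %s\n" "$CAPTURED"
printf "Discarded: %s\n" "$DISCARDED"

printf "This is an error message\n" >&2    # Written to standard error instead of standard output
```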


Conditional Execution


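In shell scripts it is possible to use the logical operators && and || to group together commands. These logical operators also allow us to execute commands based on exit values much like an if statement. These logical operators work as follows:

  1. command1 && command2 - command2 is executed only if command1 has an exit status of zero (it succeeded).
  2. command1 || command2 - command2 is executed only if command1 has a non-zero exit status (it failed).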

One use for these commands is to link together conditions in an if statement, for example:
 
#!/bin/sh

if [ 5 -eq $1 ] || [ $1 -gt 7 ] ; then # $1 must be either 5 or greater than 7.
  printf "Good choice.\n"
fi

But another way to use them is as a replacement for the if statement.
 
#!/bin/sh
# Tells the user if $1 is a readable file

[ -r $1 ] && printf "$1 is a readable file.\n" || printf "$1 is not a readable file. That's a shame...\n"

If the file $1 is readable, then [ -r $1 ] returns true, so the part after the && is executed. If the file wasn't readable, the test returns false, so the part after the && is skipped and the part after the || is executed. This piece of code behaves like an if-then-else statement, as long as the command after the && succeeds. If that command fails, the part after the || is executed as well.
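
The behaviour of a chain of the form A && B || C can be sketched with the standard true and false commands standing in for real commands. The variable names are made up. The third line shows the one case where the chain differs from an if-then-else statement.

```shell
#!/bin/sh

true  && FIRST=then  || FIRST=else     # A succeeds, so B runs and C is skipped
false && SECOND=then || SECOND=else    # A fails, so B is skipped and C runs

THIRD=skipped
true && false || THIRD=else            # A succeeds but B fails, so C runs as well

printf "FIRST=$FIRST SECOND=$SECOND THIRD=$THIRD\n"
```

For this reason a full if statement is the safer choice whenever the command after the && could fail.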


Variables In Detail


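We already know how to create and use variables, but there are still some aspects of them which we haven't covered. The first thing we will look at is variable constructs. These provide different ways of accessing a variable. The constructs we can use are:

  1. ${VAR:-value} - If VAR is set and not empty, use its value, otherwise use value. VAR itself is not changed.
  2. ${VAR:=value} - If VAR is set and not empty, use its value, otherwise set VAR to value and use that.
  3. ${VAR:?message} - If VAR is set and not empty, use its value, otherwise write message to standard error and exit the script.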

These variable constructs can help to make code much shorter and often remove the need for cumbersome if statements. To see how they work, type in and execute the following example.
 
#!/bin/sh

MY_VAR="Value"

printf "MY_VAR = ${MY_VAR}\n"                # Write out MY_VAR

printf "NEW_VAR = ${NEW_VAR:-New}\n"         # NEW_VAR doesn't exist yet, should write New

printf "NEW_VAR = ${NEW_VAR:=Variable}\n"    # Create NEW_VAR

printf "NEW_VAR = ${NEW_VAR:-New}\n"         # NEW_VAR does exist now, should write Variable
printf "NEW_VAR = ${NEW_VAR}\n"              # See that it has been set to Variable

printf "Checking if MY_VAR exists..."
: "${MY_VAR:?No}"                            # MY_VAR does exist, so nothing happens (: is a command which does nothing)
printf "It does exist\n"

printf "Checking if SOME_VAR exists..."
: "${SOME_VAR:?No}"                          # SOME_VAR doesn't exist, should print No and exit
printf "It does exist\n"                     # We won't get to this line
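
A common use of these constructs is to give a command line argument a default value. The sketch below uses set -- to simulate running the script first with no arguments and then with one; the names World and Fred are made up.

```shell
#!/bin/sh

set --                               # Simulate running with no command line arguments
WITHOUT_ARG="Hello ${1:-World}"      # $1 is not set, so the default is used

set -- Fred                          # Simulate running with one command line argument
WITH_ARG="Hello ${1:-World}"         # $1 is set, so its value is used

printf "$WITHOUT_ARG\n"
printf "$WITH_ARG\n"
```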

Another interesting feature of variables is the ability to export them. When a variable is exported, any process started by the shell in which the script is running can also see the variable. Consider the following two scripts:
 
#!/bin/sh
# script1

VAR=abc
script2

#!/bin/sh
# script2

printf "$VAR\n"

When we run script1, it creates a variable VAR and then executes script2. Even though script2 is called from inside script1, it cannot read the variables in script1, but if we alter script1 so that it exports the variable, script2 will be able to see it, and we will get abc written to standard output.
 
#!/bin/sh
# script1

VAR=abc
export VAR        # Exporting the variable, now script2 can see it
script2

#!/bin/sh
# script2

printf "$VAR\n"
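
The same effect can be sketched in a single file by using sh -c to start a child shell in place of script2. DEMO_VAR is a made-up name which is assumed not to be in the environment already. The single quotes stop the parent shell from replacing $DEMO_VAR itself, so that the child shell does the lookup.

```shell
#!/bin/sh

DEMO_VAR=abc

BEFORE=`sh -c 'printf "$DEMO_VAR"'`    # The child shell cannot see DEMO_VAR yet
export DEMO_VAR
AFTER=`sh -c 'printf "$DEMO_VAR"'`     # Now the child shell can see it

printf "Before export: '$BEFORE'\n"
printf "After export:  '$AFTER'\n"
```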

The final thing we will look at in variables is the difference between single and double quotes. When a variable is written inside double quotes, it is replaced with its value, but when a variable is written in single quotes, it is not replaced. To get an idea of how this works, type in and execute this script.
 
#!/bin/sh

MY_VAR="Value"

printf "Variable replacement...\n\n"
printf "Using double quotes : $MY_VAR \n"
printf 'Using single quotes : $MY_VAR \n'

printf "\n\nUsing backticks...\n\n"
printf "Using double quotes : `echo HELLO`\n"
printf 'Using single quotes : `echo HELLO`\n'

You will see that when something is written in single quotes, Unix doesn't replace commands in backticks or variables. It will only do the replacement when it is in double quotes.
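
Quotes also control whether a value containing spaces is treated as one item or several. The sketch below counts the arguments produced in each case, using set -- to replace the command line arguments; the variable names are made up.

```shell
#!/bin/sh

TITLE="Unix Scripting Notes"

set -- $TITLE          # No quotes: the value is split into three arguments
NO_QUOTES=$#

set -- "$TITLE"        # Double quotes: the value is kept together as one argument
DOUBLE_QUOTES=$#

set -- '$TITLE'        # Single quotes: one argument, and the variable is not replaced
SINGLE_QUOTES=$1

printf "No quotes: $NO_QUOTES arguments\n"
printf "Double quotes: $DOUBLE_QUOTES argument\n"
printf "Single quotes: the argument is $SINGLE_QUOTES\n"
```

This is why filenames which may contain spaces are usually written as "$FILENAME" rather than $FILENAME.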


Useful Commands


There are a number of commands and built-in functions which often prove useful when writing a Unix script. Here we will look at a few of these commands, describe what they do, and give examples of how they can be used.