Terminology
The Dining Philosophers problem isn't just a silly exercise.
It is a scale-model example of a very important problem in operating systems:
resource allocation.
A “resource” can be defined as something that costs money.
The philosophers represent processes, and the forks represent
resources.
There are several kinds of resources:
A process requests a (serially reusable) resource from the OS and holds it until it's done with it; then it releases the resource. The OS may delay responding to a request for a resource. The requesting process is blocked until the OS responds. Sometimes we say the process is “blocked on the resource.” In actual systems, resources might be represented by semaphores, monitors, or condition variables in monitors--anything a process may wait for.
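As a concrete illustration (not from the original notes), a one-unit serially reusable resource can be modeled in Java with a counting semaphore; java.util.concurrent.Semaphore and the name "printer" are assumptions of this sketch. A process requests the resource with acquire (blocking if it is held) and releases it with release.

```java
import java.util.concurrent.Semaphore;

public class ResourceDemo {
    // One unit of a serially reusable resource, e.g. a printer.
    public static final Semaphore printer = new Semaphore(1);

    public static void main(String[] args) throws InterruptedException {
        printer.acquire();    // request the resource; blocks if it is held
        // ... the process uses the printer while holding it ...
        System.out.println(printer.availablePermits());  // 0: resource is held
        printer.release();    // done with it; a blocked requester may proceed
        System.out.println(printer.availablePermits());  // 1: available again
    }
}
```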
A resource might be preemptable, meaning that the resource can be “borrowed” from the process without harm. Sometimes a resource can be made preemptable by the OS, at some cost. For example, memory can be preempted from a process by suspending the process, and copying the contents of the memory to disk. Later, the data is copied back to the memory, and the process is allowed to continue. Preemption effectively makes a serially reusable resource look sharable.
There are three ways of dealing with deadlocks:
detection and recovery, prevention, or avoidance.
Deadlock Detection
[Silb., 6th ed, Section 8.6]
[Tanenbaum, Section 3.4]
The formal definition of deadlock:
A set of processes is deadlocked if each process in the set is waiting for an event that only a process in the set can cause.
We can show deadlock graphically by building the waits-for graph. Draw each process as a little circle, and draw an arrow from P to Q if P is waiting for Q. The picture is called a graph, the little circles are called nodes, and the arrows connecting them are called arcs [Silb., 6th ed, Figure 8.7 (b), page 261]. We can find out whether there is a deadlock as follows:
    for (;;) {
        find a node n with no arcs coming out of it;
        if (no such node can be found)
            break;
        erase n and all arcs coming into it;
    }
    if (any nodes are left)
        there is a deadlock;

This algorithm simulates a best-case scenario: Every runnable process runs and causes all events that are expected from it, and no process waits for any new events. A node with no outgoing arcs represents a process that isn't waiting for anything, so is runnable. It causes all events other processes are waiting for (if any), thereby erasing all incoming arcs. Then, since it will never wait for anything, it cannot be part of a deadlock, and we can erase it.
Any processes that are left at the end of the algorithm are deadlocked, and will wait forever. The graph that's left must contain a cycle (a path starting and ending at the same node and following the arcs). It may also contain processes that are not part of the cycle but are waiting for processes in the cycle, or for processes waiting for them, etc. The algorithm will never erase any of the nodes in a cycle, since each one will always have an outgoing arc pointing to the next node in the cycle.
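The erasing algorithm can be sketched in Java over an adjacency matrix; the representation (waitsFor[p][q] is true when process p is waiting for process q) is an assumption of this sketch, not something fixed by the notes.

```java
public class WaitsForGraph {
    /** Run the reduction algorithm: repeatedly erase any node with no
     *  outgoing arcs, together with its incoming arcs.  Any nodes
     *  left at the end are deadlocked. */
    public static boolean deadlocked(boolean[][] waitsFor) {
        int n = waitsFor.length;
        boolean[] erased = new boolean[n];
        for (;;) {
            // Find a node that is not erased and has no outgoing arcs.
            int victim = -1;
            for (int p = 0; p < n && victim < 0; p++) {
                if (erased[p]) continue;
                boolean waiting = false;
                for (int q = 0; q < n; q++)
                    if (!erased[q] && waitsFor[p][q]) waiting = true;
                if (!waiting) victim = p;
            }
            if (victim < 0) break;    // no such node can be found
            erased[victim] = true;    // arcs into it are now ignored
        }
        for (int p = 0; p < n; p++)
            if (!erased[p]) return true;   // some nodes are left: deadlock
        return false;
    }

    public static void main(String[] args) {
        // P0 waits for P1 and P1 waits for P0: a cycle, hence deadlock.
        boolean[][] cycle = { { false, true }, { true, false } };
        // P0 waits for P1, but P1 is runnable: no deadlock.
        boolean[][] chain = { { false, true }, { false, false } };
        System.out.println(deadlocked(cycle));  // true
        System.out.println(deadlocked(chain));  // false
    }
}
```

Erasing a node's incoming arcs is handled implicitly: arcs pointing at an erased node are simply ignored when looking for the next victim.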
The simplest cycle is an arc from a node to itself. This represents a process that is waiting for itself, and usually represents a simple programming bug:
    Semaphore s = 0;
    ...
    s.down();
    s.up();

If no other process can do s.up(), this process is deadlocked with itself.
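The same bug can be reproduced with java.util.concurrent.Semaphore, where down and up are called acquire and release. Since a real acquire() would hang forever here, this sketch probes with tryAcquire() instead so that it can report the self-deadlock and terminate.

```java
import java.util.concurrent.Semaphore;

public class SelfDeadlock {
    /** Return true if acquire() (i.e. down()) would block right now. */
    public static boolean wouldBlock(Semaphore s) {
        if (s.tryAcquire()) {   // got a permit, so down() would not block
            s.release();        // undo the probe
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Semaphore s = new Semaphore(0);  // Semaphore s = 0;
        // The process is about to do s.down(); the matching s.up() comes
        // only later in the same process, so down() can never complete.
        System.out.println(wouldBlock(s));  // true: deadlocked with itself
    }
}
```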
Usually, processes block waiting for (serially reusable) resources. The “events” they are waiting for are release of resources. In this case, we can put some more detail into the graph. Add little boxes representing resources. Draw an arc from a process to a resource if the process is waiting for the resource, and an arc from the resource to the process if the process holds the resource. [Silb., 6th ed, Figure 8.7 (a), page 261] [Tanenbaum, Figures 3-3, 3-4, and 3-5, pp. 165-9]. The same algorithm as before will tell whether there is a deadlock. As before, deadlock is associated with cycles: If there is no cycle in the original graph, there is no deadlock, and the algorithm will erase everything. If there is a cycle, the algorithm will never erase any part of it, and the final graph will contain only cycles and nodes that have paths from them to cycles.
[Silb., 6th ed, Section 8.2.2] [Tanenbaum, Section 3.1]
Often, a request from a process is not for a particular resource, but
for any resource of a given type.
For example, a process may need a block of memory.
It doesn't care which block of memory it gets.
To model this, we will assume that there are some number m
of resource types, and some number U[r] of
units
of resource r, for each r between 1 and m.
To be very general, we will allow a process to request multiple resources
at once:
Each request will tell how many units of each resource the process needs
to continue.
The graph gets a bit more complicated
[Silb., 6th ed, Figure 8.1],
but essentially the same algorithm
can be used to determine whether there is a deadlock.
We will need a few arrays for bookkeeping.
U[r] = total number of units of resource r in the system
curAlloc[p][r] = number of units of r currently allocated to process p
available[r] = number of units of r that have not been allocated to any process
request[p][r] = number of units of r requested by p but not yet allocated
As before, the algorithm works by simulating a best-case scenario.
We add an array of boolean done[] with one element for each
process, and initially set all elements to false.
In this, and later algorithms, we will want to compare arrays of numbers.
If A and B are arrays, we say that A <= B if
A[i] <= B[i] for all subscripts i. [1]
    boolean lessOrEqual(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++)
            if (a[i] > b[i])
                return false;
        return true;
    }

Similarly, when we add together two arrays, we add them element by element. The following methods increment or decrement each element of one array by the corresponding element of the second.
    void incr(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++)
            a[i] += b[i];
    }

    void decr(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++)
            a[i] -= b[i];
    }
We will sometimes need to make a temporary copy of an array:

    int[] copy(int[] a) {
        return (int[]) a.clone();
    }

    int[][] copy(int[][] a) {
        int[][] b = new int[a.length][];
        for (int i = 0; i < a.length; i++)
            b[i] = copy(a[i]);
        return b;
    }

Finally, note that request is a two-dimensional array, but for any particular value of p, request[p] is a one-dimensional array corresponding to the pth row of request and representing the pending request of process p: for each resource r, request[p][r] is the amount of resource r requested by process p. Similar remarks apply to curAlloc and other two-dimensional arrays we will introduce later.
With this machinery in place, we can easily write a procedure to test
for deadlock.
    /** Check whether the state represented by request[][] and the
     ** global arrays curAlloc[][] and available[] is deadlocked.
     ** Return true if there is a deadlock.
     */
    boolean deadlocked(int[][] request) {
        int[] save = copy(available);
        boolean[] done = new boolean[numberOfProcesses];
        for (int i = 0; i < done.length; i++)
            done[i] = false;
        for (int i = 0; i < numberOfProcesses; i++) {
            // Find a process that hasn't finished yet, but
            // can get everything it needs.
            int p;
            for (p = 0; p < numberOfProcesses; p++) {
                if (!done[p] && lessOrEqual(request[p], available))
                    break;
            }
            if (p == numberOfProcesses) {
                // No process can continue.  There is a deadlock.
                available = save;
                return true;
            }
            // Assume process p finishes and gives back everything it has
            // allocated.
            incr(available, curAlloc[p]);
            done[p] = true;
        }
        available = save;
        return false;
    }
The algorithm looks for a process whose request can be satisfied immediately.
If it finds one, it assumes that the process could be given all the resources
it wants, would do what ever it wanted with them, and would eventually
give them back, as well as all the resources it previously got.
It can be proved that it doesn't matter what order we consider the
processes;
either we succeed in completing them, one at a time, or there is a deadlock.
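To make the procedure above easy to experiment with, here is a self-contained variant; bundling the global arrays into parameters is an assumption of this sketch, but the logic is the detection algorithm just described. The example is the classic two-process case: each process holds one unit of a two-unit resource and requests one more.

```java
public class Detect {
    static boolean lessOrEqual(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++)
            if (a[i] > b[i]) return false;
        return true;
    }

    static void incr(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) a[i] += b[i];
    }

    /** The detection algorithm, with the global arrays passed in. */
    public static boolean deadlocked(int[][] request, int[][] curAlloc,
                                     int[] available) {
        int n = request.length;
        int[] avail = (int[]) available.clone();
        boolean[] done = new boolean[n];
        for (int i = 0; i < n; i++) {
            int p;
            for (p = 0; p < n; p++)
                if (!done[p] && lessOrEqual(request[p], avail)) break;
            if (p == n) return true;   // nobody can continue: deadlock
            incr(avail, curAlloc[p]);  // p finishes, returns its resources
            done[p] = true;
        }
        return false;
    }

    public static void main(String[] args) {
        // One resource type with 2 units, both allocated.  Each process
        // holds one unit and requests one more: deadlock.
        int[][] req   = { { 1 }, { 1 } };
        int[][] alloc = { { 1 }, { 1 } };
        System.out.println(deadlocked(req, alloc, new int[] { 0 }));  // true
        // If process 0 requests nothing, it can finish and free its unit.
        int[][] req2  = { { 0 }, { 1 } };
        System.out.println(deadlocked(req2, alloc, new int[] { 0 })); // false
    }
}
```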
How expensive is this algorithm?
Let n denote the number of processes and m denote the number
of resources. The body of the third for loop (the line containing
the call to lessOrEqual) is executed at most n² times,
and each call requires m comparisons. Thus the entire method may
make up to n²m comparisons. Everything else in the
procedure has a lower order of complexity, so the running time of the procedure
is O(n²m).
If there are 100 processes and 100 resources, n²m =
1,000,000, so if each iteration takes about a microsecond (a reasonable guess
on current hardware), the procedure will take about a second.
If, however, the number of processes and resources each increase to 1000,
the running time would be more like 1000 seconds (16 2/3 minutes)!
We might want to use a more clever coding in such a situation.
Deadlock Recovery
Once you've discovered that there is a deadlock, what do you do about it? One thing to do is simply re-boot. A less drastic approach is to yank back a resource from a process to break a cycle. As we saw, if there are no cycles, there is no deadlock. If the resource is not preemptable, snatching it back from a process may do irreparable harm to the process. It may be necessary to kill the process, under the principle that at least that's better than crashing the whole system.
Sometimes, we can do better. For example, if we checkpoint a process from time to time, we can roll it back to the latest checkpoint, hopefully to a time before it grabbed the resource in question. Database systems use checkpoints, as well as a technique called logging, allowing them to run processes “backwards,” undoing everything they have done. It works like this: Each time the process performs an action, it writes a log record containing enough information to undo the action. For example, if the action is to assign a value to a variable, the log record contains the previous value of the variable. When a database discovers a deadlock, it picks a victim and rolls it back.
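A minimal sketch of the logging idea (the class and method names here are made up for illustration, and real databases are far more elaborate): before each assignment, write an undo record holding the variable's previous value; rolling back replays the log in reverse.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class UndoLog {
    public final Map<String, Integer> vars = new HashMap<>();
    // Each log record holds enough information to undo one assignment:
    // the variable's name and its previous value (null if it had none).
    private final Deque<String[]> log = new ArrayDeque<>();

    /** Perform an assignment, writing an undo record first. */
    public void assign(String name, int value) {
        Integer old = vars.get(name);
        log.push(new String[] { name, old == null ? null : old.toString() });
        vars.put(name, value);
    }

    /** Run the process "backwards", undoing everything it has done. */
    public void rollback() {
        while (!log.isEmpty()) {
            String[] rec = log.pop();
            if (rec[1] == null) vars.remove(rec[0]);
            else vars.put(rec[0], Integer.valueOf(rec[1]));
        }
    }
}
```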
Rolling back processes involved in deadlocks can lead to a form of starvation, if we always choose the same victim. We can avoid this problem by always choosing the youngest process in a cycle. After being rolled back enough times, a process will grow old enough that it never gets chosen as the victim--at worst by the time it is the oldest process in the system. If deadlock recovery involves killing a process altogether and restarting it, it is important to mark the “starting time” of the reincarnated process as being that of its original version, so that it will look older than new processes started since then.
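The victim-selection rule is simple enough to state in code; this sketch (names assumed) picks the process in the cycle with the latest start time, where a restarted process keeps its original incarnation's start time, as just described.

```java
public class VictimChooser {
    /** startTime[p] = the time process p (or its original incarnation,
     *  if it has been killed and restarted) entered the system. */
    public static int chooseVictim(int[] cycle, long[] startTime) {
        int victim = cycle[0];
        for (int p : cycle)
            if (startTime[p] > startTime[victim])
                victim = p;   // a later start time means a younger process
        return victim;
    }

    public static void main(String[] args) {
        long[] start = { 100, 300, 200 };
        // Processes 0, 1, 2 form a cycle; process 1 started last
        // (time 300), so it is the youngest and gets rolled back.
        System.out.println(chooseVictim(new int[] { 0, 1, 2 }, start));  // 1
    }
}
```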
When should you check for deadlock? There is no one best answer to this question; it depends on the situation. The most “eager” approach is to check whenever we do something that might create a deadlock. Since a process cannot create a deadlock when releasing resources, we only have to check on allocation requests. If the OS always grants requests as soon as possible, a successful request also cannot create a deadlock. Thus we only have to check for a deadlock when a process becomes blocked because it made a request that cannot be immediately granted. However, even that may be too frequent. As we saw, the deadlock-detection algorithm can be quite expensive if there are a lot of processes and resources, and if deadlock is rare, we can waste a lot of time checking for deadlock every time a request has to be blocked.
What's the cost of delaying detection of deadlock? One possible cost is poor CPU utilization. In an extreme case, if all processes are involved in a deadlock, the CPU will be completely idle. Even if there are some processes that are not deadlocked, they may all be blocked for other reasons (e.g. waiting for I/O). Thus if CPU utilization drops, that might be a sign that it's time to check for deadlock. Besides, if the CPU isn't being used for other things, you might as well use it to check for deadlock!
On the other hand, there might be a deadlock, but enough non-deadlocked
processes to keep the system busy.
Things look fine from the point of view of the OS, but from the
selfish point of view of the deadlocked processes, things are definitely
not fine.
The processes may represent interactive users, who can't understand why
they are getting no response.
Worse still, they may represent time-critical processes (missile defense,
factory control, hospital intensive care monitoring, etc.) where
something disastrous can happen if the deadlock is not detected and
corrected quickly.
Thus another reason to check for deadlock is that a process has been
blocked on a resource request “too long.”
The definition of “too long” can vary widely from process to process.
It depends both on how long the process can reasonably expect to wait
for the request, and how urgent the response is.
If an overnight run deadlocks at 11pm and nobody is going to look
at its output until 9am the next day, it doesn't matter whether the
deadlock is detected at 11:01pm or 8:59am.
If all the processes in a system are sufficiently similar, it may
be adequate simply to check for deadlock at periodic intervals (e.g.,
once every 5 minutes in a batch system; once every millisecond in
a real-time control system).
Deadlock Prevention
There are four necessary conditions for deadlock:
(1) mutual exclusion: resources cannot be shared;
(2) no preemption: a resource, once granted, cannot be taken back from a process;
(3) hold and wait: a process may request new resources while continuing to hold resources it was granted earlier;
(4) circular wait: there is a cycle of processes, each waiting for a resource held by the next.
To prevent deadlock, it suffices to attack any one of these conditions.
There's not much hope of getting rid of condition (1)--some resources are inherently non-sharable--but attacking (2) can be thought of as a weak form of attack on (1). By borrowing back a resource when another process needs to use it, we can make it appear that the two processes are sharing it. Unfortunately, not all resources can be preempted at an acceptable cost. Deadlock recovery, discussed in the previous section, is an extreme form of preemption.
We can attack condition (3) either by forcing a process to allocate all the resources it will ever need at startup time, or by making it release all of its resources before allocating any more. The first approach fails if a process needs to do some computing before it knows what resources it needs, and even if it is practical, it may be very inefficient, since a process that grabs resources long before it really needs them may prevent other processes from proceeding. The second approach (making a process release resources before allocating more) is in effect a form of preemption and may be impractical for the same reason preemption is impractical.
An attack on the fourth condition is the most practical. The algorithm is called hierarchical allocation. If resources are given numbers somehow (it doesn't matter how the numbers are assigned), and processes always request resources in increasing order, deadlock cannot occur.
More precisely stated, the hierarchical allocation algorithm is as follows:
When a process requests resources, the requested resources must all have numbers strictly greater than the number of any resource currently held by the process.

This algorithm will work even if some of the resources are given the same number. In fact, if they are all given the same number, this rule reduces to the “no-hold-wait” condition, so hierarchical allocation can also be thought of as a relaxed form of the no-hold-wait condition.
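The rule can be checked mechanically. In this sketch (all names are assumptions made for illustration), a request is legal only if every requested resource's number is strictly greater than the number of every resource the process already holds:

```java
import java.util.Set;
import java.util.TreeSet;

public class Hierarchical {
    /** Return true if a request for the resources whose numbers are in
     *  'requested' obeys the hierarchical rule, given the numbers of
     *  the resources the process currently holds. */
    public static boolean legal(Set<Integer> held, Set<Integer> requested) {
        int maxHeld = held.isEmpty() ? Integer.MIN_VALUE
                                     : new TreeSet<>(held).last();
        for (int r : requested)
            if (r <= maxHeld) return false;  // not strictly greater
        return true;
    }

    public static void main(String[] args) {
        Set<Integer> held = Set.of(1, 3);
        System.out.println(legal(held, Set.of(4, 5)));  // true: 4 and 5 > 3
        System.out.println(legal(held, Set.of(2)));     // false: 2 <= 3
    }
}
```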
These ideas can be applied to the Dining Philosophers problem. Dijkstra's solution to the dining philosophers problem gets rid of hold-wait. The mutex semaphore allows a philosopher to pick up both forks “at once.” Another algorithm would have a philosopher pick up one fork and then try to get the other one. If he can't, he puts down the first fork and starts over. This is a solution using preemption. It is not a very good solution (why not?).
If each philosopher always picks up the lower numbered fork first, there
cannot be any deadlock.
This algorithm is an example of hierarchical allocation.
It is better than Dijkstra's solution because it prevents starvation.
(Can you see why starvation is impossible?)
The forks don't have to be numbered 0 through 4; any numbering that doesn't
put any philosopher between two forks with the same number would do.
For example, we could assign the value 0 to fork 0, 1 to all other
even-numbered forks, and 2 to odd-numbered forks.
(One numbering is better than the other. Can you see why?)
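With the usual seating arrangement, where philosopher i sits between forks i and (i+1) mod 5 (an assumption of this sketch), the hierarchical rule reduces to: pick up the fork with the smaller number first.

```java
public class ForkOrder {
    /** Return the order in which philosopher i (of n) picks up his two
     *  forks: lower-numbered fork first, per the hierarchical rule. */
    public static int[] order(int i, int n) {
        int left = i, right = (i + 1) % n;
        return new int[] { Math.min(left, right), Math.max(left, right) };
    }

    public static void main(String[] args) {
        // Philosophers 0 through 3 pick up their left fork first, but
        // philosopher 4 sits between forks 4 and 0, so he starts with 0.
        for (int i = 0; i < 5; i++) {
            int[] o = order(i, 5);
            System.out.println("philosopher " + i + ": fork " + o[0]
                + " then fork " + o[1]);
        }
    }
}
```

Note that one philosopher (number 4) acquires his forks in the opposite left/right order from the others, which is exactly what breaks the circular wait.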
Deadlock Avoidance
The final approach we will look at is called deadlock avoidance. In this approach, the OS may delay granting a resource request, even when the resources are available, because doing so will put the system in an unsafe state where deadlock may occur later. The best-known deadlock avoidance algorithm is called the “Banker's Algorithm,” invented by the famous E. W. Dijkstra.
This algorithm can be thought of as yet another relaxation of the no-hold-wait restriction. Processes do not have to allocate all their resources at the start, but they have to declare an upper bound on the amount of resources they will need. In effect, each process gets a “line of credit” that it can draw on when it needs it (hence the name of the algorithm).
When the OS gets a request, it “mentally” grants the request, meaning that it updates its data structures to indicate it has granted the request, but does not immediately let the requesting process proceed. First it checks to see whether the resulting state is “safe”. If not, it undoes the allocation and keeps the requester waiting.
To check whether the state is safe, it assumes the worst case: that all running processes immediately request all the remaining resources that their credit lines allow. It then checks for deadlock using the algorithm above. If deadlock occurs in this situation, the state is unsafe, and the resource allocation request that lead to it must be delayed.
To implement this algorithm in Java, we will need one more table beyond those defined above.
creditLine[p][r] = number of units of r reserved by process p but not yet allocated to it

Here's the procedure:
    /** Try to satisfy a particular request in the state indicated by the
     ** global arrays curAlloc, creditLine, and available.
     ** If the request can be safely granted, update the global state
     ** appropriately and return true.
     ** Otherwise, leave the state unchanged and return false.
     */
    boolean tryRequest(int p, int[] req) {
        if (!lessOrEqual(req, creditLine[p])) {
            System.out.println("process " + p
                + " is requesting more than it reserved!");
            return false;
        }
        if (!lessOrEqual(req, available)) {
            System.out.println("process " + p
                + " is requesting more than there is available!");
            return false;
        }
        int[] saveAvail = copy(available);
        int[][] saveAlloc = copy(curAlloc);
        int[][] saveLine = copy(creditLine);
        // Tentatively give him what he wants
        decr(available, req);
        decr(creditLine[p], req);
        incr(curAlloc[p], req);
        if (safe()) {
            return true;
        } else {
            curAlloc = saveAlloc;
            available = saveAvail;
            creditLine = saveLine;
            return false;
        }
    }

    /** Check whether the current state is safe. */
    boolean safe() {
        // Assume everybody immediately calls in their credit.
        int[][] request = copy(creditLine);
        // See whether that causes a deadlock.
        return !deadlocked(request);
    }

When a process p starts, creditLine[p][r] is set to p's declared maximum claim on resource r. Whenever p is granted some resource, not only is the amount deducted from available, it is also deducted from creditLine[p].
When a new request arrives, we first see if it is legal (it does not exceed the requesting process' declared maximum allocation for any resources), and if we have enough resources to grant it. If so, we tentatively grant it and see whether the resulting state is safe. To see whether a state is safe, we consider a “worst-case” scenario. What if all processes suddenly requested all the resources remaining in their credit lines? Would the system deadlock? If so, the state is unsafe, so we reject the request and “ungrant” it.
The code written here simply rejects requests that cannot be granted because they would lead to an unsafe state or because there are not enough resources available. A more complete version would record such requests and block the requesting processes. Whenever another process released some resources, the system would update the state accordingly and reconsider all the blocked processes to see whether it could safely grant the request of any of them.
For example, suppose there are five processes and three resource types A, B, and C, with U = { 8, 7, 7 } total units, and the following maximum demands and current allocations:

    Process     Maximum Demand     Current Allocation
                  A    B    C         A    B    C
       1          4    3    6         1    1    0
       2          0    4    4         0    2    1
       3          4    2    2         1    1    1
       4          1    6    3         0    0    2
       5          7    3    2         2    1    0
To run the Banker's Algorithm, we need to know the amount of remaining credit available for each process (creditLine[p][r]), and the amount of resources left in the bank after the allocations (available[r]). The credit line for a process and resource type is computed by subtracting the current allocation for that process and resource from the corresponding maximum demand.
    Process     Remaining Credit
                  A    B    C
       1          3    2    6
       2          0    2    3
       3          3    1    1
       4          1    6    1
       5          5    2    2
The value available[r] is calculated by subtracting from U[r] the sum of the rth column of curAlloc: available = { 4, 2, 3 }.
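That subtraction is easy to check in code; the sketch below takes U = { 8, 7, 7 }, which is consistent with available = { 4, 2, 3 } and the allocations in the table above.

```java
public class Available {
    /** available[r] = U[r] minus the sum of column r of curAlloc. */
    public static int[] available(int[] U, int[][] curAlloc) {
        int[] avail = (int[]) U.clone();
        for (int[] row : curAlloc)
            for (int r = 0; r < avail.length; r++)
                avail[r] -= row[r];
        return avail;
    }

    public static void main(String[] args) {
        int[] U = { 8, 7, 7 };  // total units of A, B, and C
        int[][] curAlloc = {    // the Current Allocation column above
            { 1, 1, 0 }, { 0, 2, 1 }, { 1, 1, 1 }, { 0, 0, 2 }, { 2, 1, 0 }
        };
        int[] a = available(U, curAlloc);
        System.out.println(a[0] + " " + a[1] + " " + a[2]);  // 4 2 3
    }
}
```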
If process 4 were to request two units of resource C, the request would be rejected as an error because process 4 initially declared that it would never need more than 3 units of C and it has already been granted 2.
A request of five units of resource A by process 5 would be delayed, even though it falls within his credit limit, because 4 of the original 8 units of resource A have already been allocated, leaving only 4 units remaining.
Suppose process 1 were to request 1 unit each of resources B and C. To see whether this request is safe, we grant the request by subtracting it from process 1's remaining credit and adding it to his current allocation, yielding
    Process     Current Allocation     Remaining Credit
                  A    B    C            A    B    C
       1          1    2    1            3    1    5
       2          0    2    1            0    2    3
       3          1    1    1            3    1    1
       4          0    0    2            1    6    1
       5          2    1    0            5    2    2
To see whether the resulting state is safe, we treat the Remaining Credit array as a Request array and check for deadlock. We note that the amounts in available (now { 4, 1, 2 }) are not enough to satisfy the request of process 1, because it wants 5 more units of C and we have only 2. Similarly, we cannot satisfy 2, 4, or 5, because we have only one unit remaining of B and they all want more than that. However, we do have enough to grant 3's request. Therefore, we assume that we will give process 3 its request, and that it will finish and return those resources, along with the resources previously allocated to it, increasing our available holdings to { 5, 2, 3 }.

Now we can satisfy the request of either 2 or 5. Suppose we choose 2 (it doesn't matter which process we choose first). After 2 finishes we will have { 5, 4, 4 }, and after 5 finishes, our available will increase to { 7, 5, 4 }. However, at this point, we do not have enough to satisfy the request of either of the remaining processes 1 or 4, so the worst-case state is deadlocked, and we conclude that the original request was unsafe.
If the original request (1 unit each of B and C) came from process 2 rather than 1, however, the state would be found to be safe (try it yourself!) and so it would be granted immediately.
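The whole example can be replayed in code. The sketch below bundles the text's procedures into one self-contained class (the global arrays become fields, processes are numbered from 0, and the detection loop is inlined; otherwise the logic is as described) and checks both requests: 1 unit each of B and C is unsafe coming from process 1 but safe coming from process 2.

```java
public class Banker {
    int[] available;
    int[][] curAlloc;
    int[][] creditLine;

    Banker(int[] available, int[][] curAlloc, int[][] creditLine) {
        this.available = available;
        this.curAlloc = curAlloc;
        this.creditLine = creditLine;
    }

    static boolean lessOrEqual(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++)
            if (a[i] > b[i]) return false;
        return true;
    }

    /** Would granting req to process p leave the system in a safe state? */
    public boolean safeToGrant(int p, int[] req) {
        if (!lessOrEqual(req, creditLine[p])) return false;  // exceeds claim
        if (!lessOrEqual(req, available)) return false;      // not enough
        int n = curAlloc.length, m = available.length;
        // Tentatively grant the request, working on copies of the state.
        int[] avail = (int[]) available.clone();
        int[][] alloc = new int[n][];
        int[][] need = new int[n][];   // remaining credit = worst-case request
        for (int q = 0; q < n; q++) {
            alloc[q] = (int[]) curAlloc[q].clone();
            need[q] = (int[]) creditLine[q].clone();
        }
        for (int r = 0; r < m; r++) {
            avail[r] -= req[r];
            alloc[p][r] += req[r];
            need[p][r] -= req[r];
        }
        // Worst case: everyone immediately calls in the rest of their
        // credit line.  Run the deadlock-detection algorithm on that state.
        boolean[] done = new boolean[n];
        for (int i = 0; i < n; i++) {
            int q;
            for (q = 0; q < n; q++)
                if (!done[q] && lessOrEqual(need[q], avail)) break;
            if (q == n) return false;  // deadlock in the worst case: unsafe
            for (int r = 0; r < m; r++) avail[r] += alloc[q][r];
            done[q] = true;
        }
        return true;
    }

    /** The state from the example tables above. */
    public static Banker example() {
        return new Banker(
            new int[] { 4, 2, 3 },
            new int[][] { {1,1,0}, {0,2,1}, {1,1,1}, {0,0,2}, {2,1,0} },
            new int[][] { {3,2,6}, {0,2,3}, {3,1,1}, {1,6,1}, {5,2,2} });
    }

    public static void main(String[] args) {
        int[] req = { 0, 1, 1 };  // one unit each of B and C
        System.out.println(example().safeToGrant(0, req));  // process 1: false
        System.out.println(example().safeToGrant(1, req));  // process 2: true
    }
}
```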
[1] Note that, unlike numbers, it is possible to have arrays A and B such that neither A <= B nor B <= A. This will happen if some of the elements of A are smaller than the corresponding elements of B and some are bigger.