** Locks **

From the last note, we saw that we had a fundamental problem in concurrent programming: we would like to execute a series of instructions atomically, but due to the presence of interrupts, we couldn't. In this note, we thus attack the problem of how to provide what is called a *lock*: basically, we put some code around critical sections and thus enable them to appear to execute as if they were a single atomic instruction.

As an example, assume our critical section was code that looked like this:

    balance = balance + 1;

We would then add some code around it to achieve the desired effect:

    lock();
    balance = balance + 1;
    unlock();

Note that sometimes a lock is called a *mutex*, as it is used to provide mutual exclusion. Thus, when you see the following POSIX threads code, you should understand that it is doing the same thing as above:

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    pthread_mutex_lock(&lock);
    balance = balance + 1;
    pthread_mutex_unlock(&lock);

You might also notice here that the POSIX version passes a variable to lock and unlock, as we may be using *different* locks to protect different variables. Doing so increases concurrency: instead of one big lock that is used any time any critical section is accessed (a *coarse-grained* locking strategy), one will often protect different data and data structures with different locks, thus allowing more threads to be in locked code at once (a more *fine-grained* approach; a small sketch of this idea appears at the end of this section).

[REQUIREMENTS]

Before building some synchronization primitives (such as locks), we first list three requirements that we'd like for our solution:

*Mutual exclusion:* Obviously, we'd like to ensure that only one thread at a time can enter a critical section. Thus, our most basic requirement is that our solution does so.

*Deadlock freedom:* When using locks and other synchronization primitives, we are basically allowing one thread to enter a critical section while preventing others from doing so. Thus, when other threads run and try to enter a critical section, they may be forced to *wait*. In more complex locking scenarios, we may accidentally cause all threads to wait, thus prohibiting the program from making progress. We call such a situation *deadlock*, and we'll be discussing it in some detail later on. Any solution we have should not lead to deadlock when used properly; thus, our solution should ensure that threads can make *progress*.

*Starvation freedom:* Because we have multiple threads potentially trying to enter a critical section, and many of them may wait, what our ideal solution would guarantee is that eventually, all of the waiters will enter the critical section. We sometimes call this requirement *bounded wait*, as we would like to ensure that a waiting thread only needs to wait for a bounded, finite amount of time before entering a critical section.

(NEED: some references here to the early papers that lay out these properties)

Beyond these requirements, we of course will be aware of the usual things. For example, simplicity is good; the primitives that we develop should be easy to use. Performance is also paramount; paying a high cost to enter and leave critical sections will defeat one of the purposes of using threads, which is higher performance through parallelism.
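As promised, here is a small sketch of the fine-grained idea using the POSIX interface from above. The account names and functions are made up for illustration: with one lock per account, two threads updating *different* accounts can proceed in parallel, whereas a single coarse-grained lock would serialize them.

--------------------------------------------------------------------------------
#include <pthread.h>

// Hypothetical example (accounts and function names are ours):
// one lock per account (fine-grained) lets deposits to different
// accounts proceed concurrently; one global lock (coarse-grained)
// would force them to take turns.
int checking = 0, savings = 0;
pthread_mutex_t checking_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t savings_lock  = PTHREAD_MUTEX_INITIALIZER;

void deposit_checking(int amount) {
    pthread_mutex_lock(&checking_lock);
    checking = checking + amount;   // critical section for checking
    pthread_mutex_unlock(&checking_lock);
}

void deposit_savings(int amount) {
    pthread_mutex_lock(&savings_lock);
    savings = savings + amount;     // critical section for savings
    pthread_mutex_unlock(&savings_lock);
}
--------------------------------------------------------------------------------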
[FIRST TRY: TURN OFF THOSE PESKY INTERRUPTS]

Given that interrupts were the problem, we could attack them directly in our solution. Thus, here is one way to implement lock() and unlock():

--------------------------------------------------------------------------------
void lock() {
    DisableInterrupts();
}

void unlock() {
    EnableInterrupts();
}
--------------------------------------------------------------------------------

By turning off interrupts before entering a critical section, we ensure that the code inside the critical section will *not* be interrupted, and thus will execute as if it were atomic. When we are finished, we reenable interrupts and the program proceeds as usual.

The main positive of this approach is its simplicity. You certainly don't have to scratch your head too hard to figure out why this works. The negatives, unfortunately, are many.

First, this approach requires us to allow any calling thread to perform a *privileged* operation (turning interrupts on and off), and thus to *trust* that this facility is not abused. As you already know, any time we are required to trust an arbitrary program, we are probably in trouble. Here, we could get in trouble in a few ways: a greedy program could call lock() at the beginning of its execution and thus monopolize the processor; worse, an errant or malicious program could call lock() and go into an endless loop. In this latter case, the OS never regains control of the system, and the only way to address the problem is to restart the machine.

Second, the approach does not work on multiprocessors. Each CPU typically has its own interrupt mask, and thus disabling interrupts on one CPU does not prevent a thread running on another processor from entering the critical section.

Third, and probably least important, this approach can be inefficient. Compared to normal instruction execution, code that masks or unmasks interrupts tends to be executed slowly by modern processors.

For these reasons, turning off interrupts is only used in limited contexts as a mutual-exclusion primitive. For example, on a single-CPU system, the operating system itself will sometimes use interrupt masking to guarantee atomicity when accessing its own data structures. This usage makes sense, as the OS always trusts itself to perform privileged operations anyhow.
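One subtlety, as an aside: if the OS ever called lock() while interrupts were already disabled, the unconditional EnableInterrupts() in unlock() would turn them back on too early. Kernels therefore typically save the old interrupt state and restore it rather than enabling unconditionally. A minimal sketch, assuming hypothetical SaveAndDisableInterrupts() and RestoreInterrupts() primitives (these names are ours, standing in for machine-specific operations):

--------------------------------------------------------------------------------
// Sketch only: save the previous interrupt state on lock, and restore
// (rather than unconditionally enable) on unlock, so that nested
// critical sections leave interrupts the way they found them.
unsigned long lock() {
    return SaveAndDisableInterrupts(); // returns the prior state
}

void unlock(unsigned long prior_state) {
    RestoreInterrupts(prior_state);    // re-enable only if previously enabled
}
--------------------------------------------------------------------------------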
[NEXT TRY: PETERSON'S ALGORITHM, OR WHAT TO DO WITHOUT HARDWARE SUPPORT]

We next develop an approach that builds a locking primitive without resorting to the interrupt disabling of above. This approach was developed by G. Peterson of the University of Rochester in 1981; hence the moniker Peterson's algorithm. The neat thing about this approach is that it does not assume much of the hardware. Specifically, it assumes that loads and stores are atomic and that they execute in order. No special synchronization instructions are required (below, we will add some new hardware instructions to help us with synchronization).

[FIRST ATTEMPT: TEST AND SET]

We now start to sketch out key pieces of Peterson's approach. The first component is a *flag* variable. Assume in this example that we only have two threads that may (or may not) enter our critical section. We thus add a flag which can (hopefully) be used to test whether or not a thread is in the critical section.

Note that in all of these examples, we will always write three pieces of pseudocode: an init() routine to set any variables we need to their desired initial values, and of course lock() and unlock(). Also note that we will not write this code using a lock variable (as in the pthread example above); it is a straightforward exercise to add such a variable to the code snippets if so desired (a sketch appears at the end of this attempt).

--------------------------------------------------------------------------------
void init() {
    // 0 indicates that lock is available, 1 that it is held by a thread
    flag = 0;
}

void lock() {
    while (flag == 1)
        ; // spin-wait (do nothing)
    flag = 1;
}

void unlock() {
    flag = 0;
}
--------------------------------------------------------------------------------
[FIGURE: TRY #1 WITH A SINGLE FLAG]

In this first attempt, the idea is quite simple: use a simple variable to indicate whether some thread has the lock. The first thread that enters the critical section will call lock(), which *tests* whether the flag is equal to 1 (in this case, it is not), and then *sets* the flag to 1 to indicate that the thread now *holds* the lock. When finished with the critical section, the thread calls unlock() and clears the flag, thus indicating that the lock is no longer held.

If another thread happens to call lock() while the first thread is in the critical section, it will simply *spin-wait* in the while loop for that thread to call unlock() and clear the flag. Once the first thread does so, the waiting thread will fall out of the while loop, set the flag to 1 for itself, and proceed into the critical section.

Unfortunately, this piece of code has two problems: one of correctness, and another of performance. The correctness problem is simple to see once you get used to thinking about concurrent programming. Imagine the following interleaving (assume we start in the state flag=0):

--------------------------------------------------------------------------------
Thread 0                              Thread 1
call lock()
while (flag == 1)
  // it doesn't, so continue
[INTERRUPT, SWITCH TO THREAD 1]
                                      call lock()
                                      while (flag == 1)
                                        // it doesn't, so continue
                                      flag = 1; // set flag to 1
[INTERRUPT, SWITCH TO THREAD 0]
flag = 1; // also sets flag to 1(!)
--------------------------------------------------------------------------------
[FIGURE: WHY TRY #1 FAILS]

As you can see from this interleaving, with timely (untimely?) interrupts, we can easily produce a case where *both* threads set the flag to 1 and both threads are thus able to enter the critical section. This is bad! We have obviously failed to provide the most basic requirement: mutual exclusion.

The performance problem, which we will address more later on, lies in the way a thread waits to acquire a lock that is already held: it endlessly checks the value of flag, a technique known as *spin-waiting*. Spin-waiting wastes time waiting for another thread to release a lock. The waste is exceptionally high on a uniprocessor, where the thread that the waiter is waiting for cannot even run! Thus, as we move forward and develop more sophisticated solutions, we should also consider ways to avoid this kind of waste.

[THE MALICIOUS SCHEDULER: THE WAY YOU SHOULD THINK]

What we also get from this example is a sense of the approach we need to take when trying to understand concurrent execution. What you are really trying to do is to pretend you are a *malicious scheduler*, one that interrupts threads at the most inopportune of times in order to foil their feeble attempts at building synchronization primitives. Although the exact sequence of interrupts may be *improbable*, it is *possible*, and that is all we need to show to demonstrate that a particular approach does not work.
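As an aside, here is the straightforward exercise mentioned above: the same (still broken) flag-based lock, rewritten to take an explicit lock variable in the style of the pthread interface. The lock_t type here is our own invention for the sketch:

--------------------------------------------------------------------------------
typedef struct { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1)
        ; // spin-wait (do nothing)
    mutex->flag = 1;  // still broken: the same race as shown above
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
--------------------------------------------------------------------------------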
[SECOND ATTEMPT: PER-THREAD FLAGS]

Our next try will avoid this problem by adding a flag per thread. The code for this attempt is found here:

--------------------------------------------------------------------------------
void init() {
    // 1 indicates that the thread wants to enter the critical section
    flag[0] = flag[1] = 0;
}

void lock() {
    flag[self] = 1;
    while (flag[1-self] == 1)
        ; // spin-wait
}

void unlock() {
    flag[self] = 0;
}
--------------------------------------------------------------------------------
[FIGURE: TRY #2 WITH PER-THREAD FLAGS]

A small note about the code: we still assume that there are only two threads that may enter this code. We further assume that each has access to some kind of thread identification number, which we will call *self*. For thread 0, self=0, and for thread 1, self=1. To refer to the other thread, one simply uses 1 minus self (for thread 1, 1-self is 0; for thread 0, it is 1).

This approach tries to get around the problem of the test-then-set approach by first setting one's own flag and then testing the other thread's flag (i.e., set-then-test). Thus, it provides mutual exclusion: there is no way for two threads to be in the critical section at the same time. Excellent! However, it has a different problem: deadlock. Not excellent. Check out this interleaving:

--------------------------------------------------------------------------------
Thread 0                              Thread 1
call lock()
flag[0] = 1;
[INTERRUPT, SWITCH TO THREAD 1]
                                      call lock()
                                      flag[1] = 1;
                                      while (flag[0] == 1)
                                        ; // spins forever!
[INTERRUPT, SWITCH TO THREAD 0]
while (flag[1] == 1)
  ; // spins forever too!
--------------------------------------------------------------------------------
[FIGURE: WHY TRY #2 FAILS]

As you can see, it is possible for thread 0 to set its own flag and for an interrupt to occur *before* the test of thread 1's flag. In that case, thread 1 sets its own flag and then tests to see if thread 0 has set its flag, which it already has! Thus, thread 1 spins, waiting for thread 0 to release the lock. Unfortunately, when thread 0 runs again (eventually, after the time slice is up and the timer interrupt goes off), it too hits the while loop, tests the other thread's flag, finds that it is set as well, and also spins. Thus, we have a *deadlock*; both threads spin indefinitely.

[THIRD ATTEMPT: WHOSE TURN IS IT?]

We will now try a different approach to getting into the critical section, using something we will call a *turn* variable. The basic idea here is to use a single variable to determine whose turn it is to enter the critical section. Because setting the turn to 1 or 0 is atomic, we should be able to use the turn to make sure only one thread gets into the critical section. The code for this approach is found here:

--------------------------------------------------------------------------------
void init() {
    turn = 0;
}

void lock() {
    // wait for my turn (or rather, for it NOT to be the other thread's turn)
    while (turn == (1 - self))
        ; // spin-wait
}

void unlock() {
    // now I am done, so I will make it the other thread's turn
    turn = 1 - self;
}
--------------------------------------------------------------------------------
[FIGURE: TRY #3 WITH A TURN VARIABLE]

This approach also provides mutual exclusion, because the turn is set atomically (i.e., the C statement "turn = 1 - self" boils down to a single store, without worry of a race condition), and the test is quite simple: just wait for it to be your turn. In this way, we achieve mutual exclusion.
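To make these sketches concrete, here is one way the turn-based lock might be exercised with pthreads; the harness is our own construction, not from the text. The pseudocode's implicit self becomes an explicit argument, and volatile merely discourages the compiler from caching turn in a register inside the spin loop (real hardware reordering issues are discussed at the end of this note).

--------------------------------------------------------------------------------
#include <pthread.h>
#include <stdio.h>

// Our own harness: the turn-based lock above, with the implicit "self"
// made an explicit argument.
volatile int turn = 0;
int balance = 0;

void lock(int self)   { while (turn == (1 - self)) ; /* spin-wait */ }
void unlock(int self) { turn = 1 - self; }

void *worker(void *arg) {
    int self = *(int *)arg;
    for (int i = 0; i < 100000; i++) {
        lock(self);
        balance = balance + 1;  // the critical section
        unlock(self);
    }
    return NULL;
}

int main() {
    pthread_t t0, t1;
    int id0 = 0, id1 = 1;
    pthread_create(&t0, NULL, worker, &id0);
    pthread_create(&t1, NULL, worker, &id1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    // because both threads loop the same number of times, the strict
    // alternation forced by turn happens to complete here
    printf("balance = %d (expected 200000)\n", balance);
    return 0;
}
--------------------------------------------------------------------------------

Note that this harness only terminates because the two threads request the lock the same number of times, taking strictly alternating turns.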
Unfortunately, the turn variable has a problem. While it will work well when two threads alternate entering the critical section, it does *not* work when one thread tries to enter the critical section twice in a row:

--------------------------------------------------------------------------------
Thread 0                              Thread 1
call lock()
while (turn == (1 - self))
  // it doesn't, so continue
... // do whatever it is you do in critical section
call unlock()
turn = 1 - self; // sets turn to 1
call lock()
while (turn == (1 - self))
  ; // it does, alas; spin forever!
--------------------------------------------------------------------------------
[FIGURE: WHY TRY #3 FAILS]

As you can see from the diagram, although thread 0 can acquire the lock once, the next time it tries, the turn is set to 1, and thus thread 0 waits forever for its turn to come. Clearly, a desirable solution will allow a critical section to be entered twice in a row by the same thread.

[FINALLY: PETERSON'S ALGORITHM]

We now have all the ingredients to assemble Peterson's algorithm. The idea is simple: combine the per-thread flags to indicate intent to enter the lock, and use the turn variable to determine which thread should enter in the rare case that both wish to enter at the same time. Here is the code:

--------------------------------------------------------------------------------
void init() {
    flag[0] = flag[1] = 0; // 1 -> thread wants to acquire lock (intent)
    turn = 0;              // whose turn is it? (thread 0 or thread 1?)
}

void lock() {
    flag[self] = 1;
    turn = 1 - self;       // be generous: make it the other thread's turn
    while ((flag[1-self] == 1) && (turn == 1 - self))
        ; // spin-wait while other thread has intent AND it is its turn
}

void unlock() {
    flag[self] = 0;        // simply undo your intent
}
--------------------------------------------------------------------------------
[FIGURE: TRY #4, A.K.A. PETERSON'S ALGORITHM]

To understand why this approach works, let us go through a few interesting cases, examining where previous approaches failed and why this one will succeed. First, let us assume we just have one of our two threads repeatedly entering the critical section. With just the turn variable, we saw this was a problem before, as the turn was sometimes set to the other thread and thus required threads to alternate in acquiring the lock. Here, it seems like it could be worse, as the calling thread always sets the turn variable to the other thread's turn. Fortunately, we have the intent flag to help us out:

--------------------------------------------------------------------------------
Thread 0                              Thread 1
call lock()
flag[0] = 1;
turn = 1; // set it to other thread's turn
while ((flag[1] == 1) && (turn == 1))
  // does not spin: other thread has no intent (flag[1] == 0)
... critical section ...
flag[0] = 0;
// the next call to lock() by thread 0 will proceed the same way
--------------------------------------------------------------------------------
[FIGURE: REPEATED CALLS TO LOCK WORK FINE]

Thus, we can see that the problem with turn variables is addressed with the addition of per-thread flags. What about mutual exclusion? See if you can prove to yourself that Peterson's algorithm does what we need it to do, assuming (again) that loads and stores to memory occur in order and are atomic.
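For the curious, here is a compilable rendering of Peterson's algorithm (our own, not from the text): it drops into the harness sketched earlier, replacing that harness's turn-only lock() and unlock(). As before, volatile only restrains the compiler; it does not address the hardware reordering issue discussed next.

--------------------------------------------------------------------------------
// Peterson's algorithm with "self" made explicit, as in the harness above.
volatile int flag[2] = {0, 0};
volatile int turn = 0;

void lock(int self) {
    flag[self] = 1;          // announce intent
    turn = 1 - self;         // be generous: give the other thread the turn
    while (flag[1-self] == 1 && turn == 1 - self)
        ; // spin only while the other thread has intent AND holds the turn
}

void unlock(int self) {
    flag[self] = 0;          // withdraw intent
}
--------------------------------------------------------------------------------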
[THE PROBLEM WITH PETERSON'S]

Peterson's algorithm is a great way to start thinking about multi-threaded programming. Unfortunately, it has a few problems that prevent us from using it in practice.

First, spin-waiting can be highly wasteful, as a thread can spend a great deal of CPU time waiting for another thread to release a lock.

Second, for reasons that are outside the scope of this note, Peterson's algorithm does not actually work on modern out-of-order processors. It turns out modern CPUs do some crazy things behind the scenes to improve performance; those things cause instructions to execute in a different order than the one in which they were issued, which breaks the in-order assumption that algorithms such as this one depend upon.

Thus, we are going to need some help from the hardware and the OS itself to get locking to work properly.
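As a preview of that hardware help (an aside of ours, using C11's standard atomics rather than anything introduced in this note): an atomic test-and-set closes the test-then-set window that broke our first attempt, and makes a correct spin lock nearly trivial:

--------------------------------------------------------------------------------
#include <stdatomic.h>

// C11's atomic_flag_test_and_set() atomically returns the old value
// and sets the flag, so only one thread can ever observe "was clear".
atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void lock() {
    while (atomic_flag_test_and_set(&lock_flag))
        ; // spin until the flag was previously clear
}

void unlock() {
    atomic_flag_clear(&lock_flag);
}
--------------------------------------------------------------------------------

Note that this only addresses correctness; the waste of spin-waiting remains, a topic we return to shortly.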