** Monitors **

Around the time concurrent programming was becoming a big deal,
object-oriented programming was also gaining ground. Not surprisingly, people
started to think about ways to merge synchronization into a more structured
programming environment.

One such approach that emerged was the *monitor*. First described by Per
Brinch Hansen [1] and later refined by Tony Hoare [2], the idea behind a
monitor is quite simple. Consider the following pretend monitor written in C++
notation:

--------------------------------------------------------------------------------
monitor class account {
private:
    int balance = 0;

public:
    void deposit(int amount) {
        balance = balance + amount;
    }
    void withdraw(int amount) {
        balance = balance - amount;
    }
};
--------------------------------------------------------------------------------
                    [FIGURE: A PRETEND MONITOR CLASS]

(Note: it is pretend because C++ does not support monitors, and hence the
*monitor* keyword does not exist. However, Java does support monitors, with
what are called *synchronized* methods. Below, we will examine both how to
make something quite like a monitor in C/C++, as well as how to use Java
synchronized methods).

In this example, you may notice we have our old friend the account and some
routines to deposit and withdraw an amount from the balance. As you also may
notice, these are *critical sections*; if they are called by multiple threads
concurrently, you have a race condition and the potential for an incorrect
outcome. 

In a monitor class, you don't get into trouble, though, because the monitor
guarantees that *only one thread can be active within the monitor at a time*.
Thus, our above example is a perfectly safe and working piece of code;
multiple threads can call deposit() or withdraw() and know that mutual
exclusion is preserved.

How does the monitor do this? Simple: with a lock. Whenever a thread tries to
call a monitor routine, it implicitly tries to acquire the monitor lock. If it
succeeds, then it will be able to call into the routine and run the method's
code. If it does not, it will block until the thread that is in the monitor
finishes what it is doing. Thus, if we wrote a C++ class that looked like the
following, it would accomplish the exact same goal as the monitor class above:

--------------------------------------------------------------------------------
class account {
private:
    int balance = 0;
    pthread_mutex_t monitor;

public:
    void deposit(int amount) {
        pthread_mutex_lock(&monitor);
        balance = balance + amount;
        pthread_mutex_unlock(&monitor);
    }
    void withdraw(int amount) {
        pthread_mutex_lock(&monitor);   
        balance = balance - amount;
        pthread_mutex_unlock(&monitor);
    }
};
--------------------------------------------------------------------------------
               [FIGURE: A C++ CLASS THAT ACTS LIKE A MONITOR]

Thus, as you can see from this example, the monitor isn't doing too much for
you automatically. Basically, it is just acquiring a lock and releasing it. By
doing so, we achieve what the monitor requires: only one thread will be active
within deposit() or withdraw(), as desired.

[WHY BOTHER WITH MONITORS?]

You might wonder why monitors were invented at all, instead of just using
explicit locking. At the time, object-oriented programming was just coming
into fashion. Thus, the idea was to gracefully blend some of the key concepts
in concurrent programming with some of the basic approaches of object
orientation. Nothing more than that.

[DO WE GET MORE THAN AUTOMATIC LOCKING?]

Back to business. As we know from our discussion of semaphores, just having
locks is not quite enough; for example, to implement the producer/consumer
solution, we previously used semaphores to both put threads to sleep when
waiting for a condition to change (e.g., a producer waiting for a buffer to be
emptied), as well as to wake up a thread when a particular condition has
changed (e.g., a consumer signaling that it has indeed emptied a buffer).

Monitors support such functionality through an explicit construct known as a
*condition variable*. Let's take a look at the code for the producer/consumer
solution, but written with monitors and condition variables. 

--------------------------------------------------------------------------------
monitor class BoundedBuffer {
  private:
    int buffer[MAX];
    int fill, use;
    int fullEntries = 0;
    cond_t empty;
    cond_t full;

  public:
    void produce(int element) {
        if (fullEntries == MAX)    // line P0
            wait(&empty);          // line P1
        buffer[fill] = element;    // line P2
        fill = (fill + 1) % MAX;   // line P3
        fullEntries++;             // line P4
        signal(&full);             // line P5
    }

    int consume() {
        if (fullEntries == 0)      // line C0
            wait(&full);           // line C1
        int tmp = buffer[use];     // line C2
        use = (use + 1) % MAX;     // line C3
        fullEntries--;             // line C4
        signal(&empty);            // line C5
        return tmp;                // line C6
    }
}
--------------------------------------------------------------------------------
       [FIGURE: PRODUCER/CONSUMER WITH MONITORS AND HOARE SEMANTICS]

In this monitor class, we have two routines, produce() and consume(). A
producer thread would repeatedly call produce() to put data into the bounded
buffer, while a consumer() would repeatedly call consume(). The example is a
modern paraphrase of Hoare's solution [2].

You should notice some similarities betewen this code and the semaphore-based
solution in the previous note. One major difference is how condition variables
must be used in concert with an explicit *state variable*; in this case, the
integer *fullEntries* determines whether a producer or consumer must wait,
depending on its state. Semaphores, in contrast, have an internal numeric
value which serves this same purpose. Thus, condition variables must be paired
with some kind of external state value in order to achieve the same end.

The most important aspect of this code, however, is the use of the two
condition variables, empty and full, and the respective *wait()* and
*signal()* calls that employ them. These operations do exactly what you might
think: wait() blocks the calling thread on a given condition; signal() wakes
one waiting thread that is waiting on the given condition. 

However, there are some subtleties in how these calls operate; understanding
the semantics of these calls is critically important to understanding why this
code works. In what researchers in operating systems call *Hoare semantics*
(yes, a somewhat unfortunate name), the signal() immediately wakes one waiting
thread and runs it; thus, the monitor lock, which is implicitly held by the
running thread, immediately is transferred to the woken thread which then runs
until it either blocks or exits the monitor. Note that there may be more than
one thread waiting; signal() only wakes one waiting thread and runs it, while
the others must wait for a subsequent signal.

A simple example will help us understand this code better. Imagine there are
two threads, one a producer and the other a consumer. The consumer gets to run
first, and calls consume(), only to find that fullEntries = 0 [line C0], as
there is nothing in the buffer yet. Thus, it calls wait(&full) [C1], and waits
for a buffer to be filled. The producer then runs, finds it doesn't have to
wait [line P0], puts an element into the buffer [P2], increments the fill
index [P3] and the fullEntries count [P4], and calls signal(&full) [P5]. In
Hoare semantics, the producer does not continue running after the signal;
rather, the signal immediately transfers control to the waiting consumer,
which returns from wait() [C1] and immediately consumes the element produced
by the producer [C2 and so on]. Only after the consumer returns will the
producer get to run again and return from the produce() routine.

[WHERE THEORY MEETS PRACTICE]

Tony Hoare, who wrote the solution above and came up with the exact semantics
for signal() and wait(), was a theoretician. Clearly a smart guy, too; he came
up with quicksort() after all [3]. However, the semantics of signaling and
waiting, as it turns out, were not ideal for a real implementation. As the old
saying goes, in theory, there is no difference between theory and practice,
but in practice, there is.

A few years later, Butler Lampson and David Redell of Xerox PARC were building
a concurrent language known as *Mesa*, and decided to use monitors as their
basic concurrency primitive [4]. They were well-known systems researchers, and
they soon found that Hoare semantics, while more amenable to proofs, were hard
to realize in a real system (there are a lot of reasons for this, but perhaps
too detailed to go through here).

In particular, to build a working monitor implementation, Lampson and Redell
decided to change the meaning of signal() in a subtle but critical way. The
signal() routine now was just considered a *hint* [5]; it would move a single
waiting thread from the blocked state to a runnable state, but it would not
run it immediately. Rather, the signaling thread would retain control until it
exited the monitor and was naturally descheduled.

[OH OH, A RACE]

Given these new *Mesa semantics*, let us again reexamine the code
above. Imagine again a consumer (consumer 1) who enters the the monitor and
finds the buffer empty and thus waits [C1]. Now the producer comes along and
fills the buffer and signals that a buffer has been filled, moving the waiting
consumer from *blocked* on the full condition variable to *ready*. The
producer keeps running for a while, and eventually gives up the CPU.

But Houston, we have a problem. Can you see it? Imagine a different consumer
(consumer 2) now calls into the consume() routine; it will find a full buffer,
consume it, and return, setting fullEntries to 0 in the meanwhile. Can you see
the problem yet? Well, here it comes. Our old friend consumer 1 now finally
gets to run, and returns from wait(), expecting a buffer to be full [C1...];
unfortunately, this is no longer true, as consumer 2 snuck in and consumed the
buffer before consumer 1 had a chance to consume it. Thus, the code doesn't
work, because in the time between the signal() by the producer and the return
from wait() by consumer 1, the condition has changed. This timeline
illustrates the problem:

--------------------------------------------------------------------------------
 Producer                     Consumer1                     Consumer2
                                 C0 (fullEntries=0)
                                 C1 (Consumer1 -> blocked)
  P0 (fullEntries=0)
  P2 
  P3
  P4 (fullEntries=1)
  P5 (consumer1 -> ready)
                                                              C0 (fullEntries=1)
                                                              C2
                                                              C3
                                                              C4 (fullEntries=0)
                                                              C5
                                                              C6
                                 C2 (oh oh, using a buffer,
                                     but fullEntries=0!)
--------------------------------------------------------------------------------
        [FIGURE: WHY THE CODE DOESN'T WORK WITH HOARE SEMANTICS]

Fortunately, the switch from Hoare semantics to Mesa semantics requires only a
small change by the programmer to realize a working solution. Specifically,
when woken, a thread should *recheck* the condition it was waiting on; because
signal() is only a hint, it is possible that the condition has changed (even
multiple times) and thus may not be in the desired state when the waiting
thread runs. In our example, two lines of code must change, lines P0 and C0:

--------------------------------------------------------------------------------
  public:
    void produce(int element) {
        while (fullEntries == MAX) // line P0 (CHANGED IF->WHILE TO WORK W/ MESA)
            wait(&empty);          // line P1
        buffer[fill] = element;    // line P2
        fill = (fill + 1) % MAX;   // line P3
        fullEntries++;             // line P4
        signal(&full);             // line P5
    }

    int consume() {
        while (fullEntries == 0)   // line C0 (CHANGED IF->WHILE TO WORK W/ MESA)
            wait(&full);           // line C1
        int tmp = buffer[use];     // line C2
        use = (use + 1) % MAX;     // line C3
        fullEntries--;             // line C4
        signal(&empty);            // line C5
        return tmp;                // line C6
    }
}
--------------------------------------------------------------------------------
        [FIGURE: PRODUCER/CONSUMER WITH MONITORS AND MESA SEMANTICS]

Not too hard after all. Because of the ease of this implementation, virtually
any system today that uses condition variables with signaling and waiting uses
Mesa semantics. Thus, if you remember nothing else at all from this class, you
can just remember: *always recheck the condition after being woken!* Put in
even simpler terms, *use while loops* and not if statements when checking
conditions. Note that this is always correct, even if somehow you are running
on a system with Hoare semantics; in that case, you would just needlessly
retest the condition an extra time.

[PEEKING UNDER THE HOOD A BIT]

To understand a bit better why Mesa semantics are easier to implement, let's
understand a little more about the implementation of Mesa monitors. In their
work [7], Lampson and Redell describe three different types of queues that a
thread can be a part of at a given time: the *ready* queue, a *monitor lock*
queue, and a *condition variable* queue. Note that a program might have
multiple monitor classes and multiple condition variable instances; there is a
queue per instance of said items.

With a single bounded buffer monitor, we thus have four queues to
consider: the ready queue, a single monitor queue, and two condition variable
queues (one for the full condition and one for the empty). To better
understand how a thread library manages these queues, what we will do is show
how a thread transitions through these queues in the producer/consumer
example.

In this example, we walk through a case where a consumer might be woken up but
find that there is nothing to consume. Let us consider the following
timeline. On the left are two consumers (Con1 and Con2) and a producer (Prod)
and which line of code they are executing; on the right is the state of each
of the four queues we are following for this example: the ready queue of
runnable processes, the monitor lock queue called Monitor, and the empty and
full condition variable queues. We also track time (t), the thread that is
running (square brackets around the thread on the ready queue that is
running), and the value of fullEntries (FE).

----------------------------------------------------------------------------------------------------------------
 t | Con1 Con2 Prod | Ready             | Monitor | Empty | Full | FE | Comment
----------------------------------------------------------------------------------------------------------------
 0 | C0             | [Con1],Prod,Con2  |         |       |      |  0 | 
 1 | C1             | [Con1],Prod,Con2  |         |       | Con1 |  0 | Con1 waiting on full (remove self from ready)
 2 |<Context switch>|       [Prod],Con2 |         |       | Con1 |  0 | switch from Con1 to Prod
 3 |           P0   |       [Prod],Con2 |         |       | Con1 |  0 | 
 4 |           P2   |       [Prod],Con2 |         |       | Con1 |  0 | Prod doesn't wait because FE=0
 5 |           P3   |       [Prod],Con2 |         |       | Con1 |  0 | 
 6 |           P4   |       [Prod],Con2 |         |       | Con1 |  1 | Prod changes value of fullEntries
 7 |           P5   | [Prod],Con2,Con1  |         |       |      |  1 | Prod signals, moving Con1 back to ready queue
 8 |<Context switch>| Prod,[Con2],Con1  |         |       |      |  1 | switch from Prod to Con2
 9 |       C0       | Prod,[Con2],Con1  |         |       |      |  1 | switch to Con2 (next on ready queue)
10 |       C2       | Prod,[Con2],Con1  |         |       |      |  1 | Con2 doesn't wait because FE=1
11 |       C3       | Prod,[Con2],Con1  |         |       |      |  1 | 
12 |       C4       | Prod,[Con2],Con1  |         |       |      |  0 | Con2 changes value of fullEntries
13 |       C5       | Prod,[Con2],Con1  |         |       |      |  0 | Con2 signals empty (no one is waiting on it)
14 |       C6       | Prod,[Con2],Con1  |         |       |      |  0 | Con2 all done
15 |<Context switch>| Prod,Con2,[Con1]  |         |       |      |  0 | switch from Con2 to Con1
16 |  C0            | Prod,Con2,[Con1]  |         |       |      |  0 | recheck fullEntries, still 0!
17 |  C1            | Prod,Con2,[Con1]  |         |       | Con1 |  0 | wait on full again (remove self from ready)
----------------------------------------------------------------------------------------------------------------
                 [FIGURE: TRACING QUEUE STATUS DURING THE PRODUCER/CONSUMER PROBLEM]

As you can see from the timeline, consumer 2 (Con2) sneaks in and consumes the
available data (t=9..14) before consumer 1 (Con1), who was waiting on the full
condition to be signaled (since t=1), gets a chance to do so. However, Con1
does get woken by the producer's signal (t=7), and thus runs again even though
the buffer is empty by the time it does so. If Con1 didn't recheck the state
variable fullEntries (t=16), it would have erroneously tried to consume data
when no data was present to consume. Thus, this natural implementation is
exactly what leads us to Mesa semantics (and not Hoare).

[OTHER USES OF MONITORS]

In their paper on Mesa, Lampson and Redell also point out a few places where a
different kind of signaling is needed. For example, consider the following
memory allocator:

--------------------------------------------------------------------------------
monitor class allocator {
    int available; // how much memory is available?
    cond_t c;

    void *allocate(int size) {
        while (size > available) 
            wait(&c);
        available -= size;
        // and then do whatever the allocator should do 
        // and return a chunk of memory
    }

    void free(void *pointer, int size) {
        // free up some memory
        available += size;
        signal(&c);
    }
};
--------------------------------------------------------------------------------
                       [A SIMPLE MEMORY ALLOCATOR]

Many details are left out of this example, in order to allow us to focus on
the conditions for waking and signaling. It turns out the signal/wait code
above does not quite work; can you see why?

Imagine two threads call allocate. The first calls allocate(20) and the second
allocate(10). No memory is available, and thus both threads call wait() and
block. Some time later, a different thread comes along and calls free(p, 15),
and thus frees up 15 bytes of memory. It then signals that it has done
so. Unfortunately, it wakes the thread waiting for 20 bytes; that thread
rechecks the condition, finds that only 15 bytes are available, and calls
wait() again. The thread that could have benefitted from the free of 15 bytes,
i.e., the thread that called allocate(10), is not woken.

Lampson and Redell suggest a simple solution to this problem. Instead of a
signal() which wakes a single waiting thread, they employ a *broadcast()*
which wakes *all* waiting threads. Thus, all threads are woken up, and in the
example above, the thread waiting for 10 bytes will find 15 available and
succeed in its allocation. In this way, 

In Mesa semantics, using a broadcast() is *always* correct, as all threads
should recheck the condition of interest upon waking anyhow. However, it may
be a performance problem, and thus should only be used when needed. In this
example, a broadcast() might wake hundreds of waiting threads, only to have
one successfully continue while the rest immediately block again; this
problem, sometimes known as a *thundering herd* [6], is costly, due to all the
extra context switches that occur.

[USING MONITORS TO IMPLEMENT SEMAPHORES]

You can probably see a lot of similarities between monitors and
semaphores. Not surprisingly, you can use one to implement the other. Here, we
show how you might implement a semaphore class using a monitor.

--------------------------------------------------------------------------------
monitor class Semaphore {
    int s; // value of the semaphore
   
    Semaphore(int value) {
        s = value;
    }

    void wait() {
        while (s <= 0) 
            wait();
        s--;
    }

    void post() {
        s++;
        signal();
    }
};
--------------------------------------------------------------------------------
           [FIGURE: IMPLEMENTING A SEMAPHORE WITH A MONITOR]

As you can see, wait() simply waits for the value of the semaphore to be
greater than 0, and then decrements its value, whereas post() increments the
value and wakes one waiting thread (if there is one). It's as simple as that. 

To use this class as a binary semaphore (i.e., a lock), you would just do
the following:

--------------------------------------------------------------------------------
Semaphore s(1);

s.wait(); // grab lock (value of semaphore goes from 1 -> 0
...       // do the critical section
s.post(); // release lock (value of semaphore goes from 0 -> 1)
--------------------------------------------------------------------------------
               [FIGURE: USING THE SEMAPHORE CLASS]

And thus we have shown that monitors can be used to implement semaphores. 

[MONITORS IN THE REAL WORLD]

We already mentioned above that we were using "pretend" monitors in that C++
has no such concept. We now show how can make a monitor-like class in C++, and
how Java uses synchronized methods to achieve a similar end.

[A C++ MONITOR OF SORTS]

Here is the producer/consumer code written in C++ with locks and condition
variables:

--------------------------------------------------------------------------------
class BoundedBuffer {
  private:
    int buffer[MAX];
    int fill, use;
    int fullEntries;
    pthread_mutex_t monitor; // monitor lock
    pthread_cond_t empty;
    pthread_cond_t full;

  public:
    BoundedBuffer() {
        use = fill = fullEntries = 0;
    }
    void produce(int element) {
        pthread_mutex_lock(&monitor);
        while (fullEntries == MAX) 
            pthread_cond_wait(&empty, &monitor); 
        buffer[fill] = element;    
        fill = (fill + 1) % MAX;   
        fullEntries++;             
        pthread_cond_signal(&full);             
        pthread_mutex_unlock(&monitor);
    }

    int consume() {
        pthread_mutex_lock(&monitor);
        while (fullEntries == 0)   
            pthread_cond_wait(&full, &monitor);           
        int tmp = buffer[use];     
        use = (use + 1) % MAX;
        fullEntries--;             
        pthread_cond_signal(&empty);            
        pthread_mutex_unlock(&monitor);
        return tmp;                
    }
}
--------------------------------------------------------------------------------
      [FIGURE: C++ PRODUCER/CONSUMER WITH LOCKS AND CONDITION VARIABLES]

You can see in this code example that there is little difference between the
pretend monitor code and the working C++ class we have above. Of course, one
obvious difference is the explicit use of a lock "monitor". More subtle is the
switch to the POSIX standard *pthread_cond_signal()* and *pthread_cond_wait()*
calls. In particular, notice that when calling pthread_cond_wait(), one also
passes in the lock that is held at the time of waiting. The lock is needed
inside pthread_cond_wait() because it must be released when this thread is put
to sleep and reacquired before it returns to the caller (the same behavior as
within a monitor but again with explicit locks).

[A JAVA MONITOR]

Interestingly, the designers of Java decided to use monitors as they thought
they were a graceful way to add synchronization primitives into a language. 
To use them, you just use add the keyword *synchronized* to the method or set
of methods that you wish to use as a monitor (here is an example from Sun's
own documentation site [7]):

--------------------------------------------------------------------------------
public class SynchronizedCounter {
    private int c = 0;
    public synchronized void increment() {
        c++;
    }
    public synchronized void decrement() {
        c--;
    }
    public synchronized int value() {
        return c;
    }
}
--------------------------------------------------------------------------------
           [FIGURE: A SIMPLE JAVA CLASS WITH SYNCHRONIZED METHODS]

This code does exactly what you think it should: provide a counter that is
thread safe. Because only one thread is allowed into the monitor at a time,
only one thread can update the value of "c", and thus a race condition is
averted. 

[JAVA AND THE SINGLE CONDITION VARIABLE PROBLEM]

In the original version of Java, a condition variable was also supplied with
each synchronized class. To use it, you would call either *wait()* or
*notify()* (sometimes the term notify is used instead of signal, but they mean
the same thing). Oddly enough, in this original implementation, there was no
way to have two (or more) condition variables. You may have noticed in the
producer/consumer solution, we always use two: one for signaling a buffer has
been emptied, and another for signaling that a buffer has been filled.

To understand the limitations of only providing a single condition variable,
let's imagine the producer/consumer solution with only a single condition
variable. Imagine two consumers run first, and both get stuck waiting. Then, a
producer runs, fills a single buffer, wakes a single consumer, and then tries
to fill again but finds the buffer full (MAX=1). Thus, we have a producer
waiting for an empty buffer, a consumer waiting for a full buffer, and a
consumer who had been waiting about to run because it has been woken.

The consumer then runs and consumes the buffer. When it calls notify(),
though, it wakes a single thread that is waiting on the condition. Because
there is only a single condition variable, the consumer might wake the waiting
*consumer*, instead of the waiting producer. Thus, the solution does not work.

To remedy this problem, one can again use the broadcast solution. In Java, one
calls *notifyAll()* to wake all waiting threads. In this case, the consumer
would wake a producer and a consumer, but the consumer would find that
fullEntries is equal to 0 and go back to sleep, while the producer would
continue. As usual, waking all waiters can lead to the thundering herd problem.

Because of this deficiency, Java later added an explicit Condition class, thus
allowing for a more efficient solution to this and other similar concurrency
problems [8]. 

[SUMMARY]

We have seen the introduction of monitors, a structuring concept developed by
Brinch Hansen and and subsequently Hoare in the early seventies. When running
inside the monitor, a thread implicitly holds a monitor lock, and thus
prevents other threads from entering the monitor, allowing the ready
construction of mutual exclusion.

We also have seen the introduction of explicit condition variables, which
allow threads to signal() and wait() much like we saw with semaphores in
the previous note. The semantics of signal() and wait() are critical; because
all modern systems implement *Mesa* semantics, a recheck of the condition that
the thread went to sleep on is required for correct execution. Thus, signal()
is just a *hint* that something has changed; it is the responsibility of the
woken thread to make sure the conditions are right for its continued
execution. 

Finally, because C++ has no monitor support, we saw how to emulate monitors
with explicit pthread locks and condition variables. We also saw how Java
supports monitors with its synchronized routines, and some of the limitations
of only providing a single condition variable in such an environment.

[REFERENCES]

[1] "Operating System Principles", Per Brinch Hansen. 
Prentice-Hall. 1973.
Available: http://portal.acm.org/citation.cfm?id=540365
One of the first books on operating systems; certainly ahead of its time.

[2] "Monitors: An Operating System Structuring Concept", C.A.R. Hoare.
Communications of the ACM. Volume 17, Number 10. pages 549-557. October 1974.

[3] "Quicksort: Algorithm 64", C.A.R. Hoare.
Communications of the ACM. Volume 4, Number 7. pages 321. July 1961.

[4] "Experience with Processes and Monitors in Mesa", B.W. Lampson, D.R. Redell.
Communications of the ACM. Volume 23, Number 2. pages 105-117. February 1980.

[5] Lampson, a famous systems researcher, loved using hints in the design of
computer systems. A hint is something that is often correct but can be wrong;
in this use, a signal() is telling a waiting thread that it changed the
condition that the waiter was waiting on, but not to trust that the condition
will be in the desired state when the waiting thread wakes up. 
If interested, you should read Lampson's paper "Hints on Computer Systems Design" 
(ACM Operating Systems Review 15, 5, October 1983, pages 33-48, and
available here: http://research.microsoft.com/Lampson/33-Hints/WebPage.html). 
In this paper about hints for designing systems, one of Lampson's general
hints is that you should use hints. It is not as confusing as it sounds. 

[6] Origin of the term is not known (by me, at least). If you know who first
called threads that behave like this a thundering herd, let me know!

[7] "Synchronized Methods", Sun documentation.
http://java.sun.com/docs/books/tutorial/essential/concurrency/syncmeth.html

[8] "Condition Interface", Sun documentation.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/Condition.html