** Address Spaces **

In the early days, building computer systems was easy. Why, you ask? Because users didn't expect much. It is those darned users with their expectations of "ease of use", "high performance", "reliability", and so forth that have really caused us all these headaches. So next time you meet one of those computer users, you can thank them for all the problems they have caused.

From the perspective of memory, early machines didn't provide much of an abstraction to users. Basically, the physical memory of the machine looked something like this:

  0KB  |---------------------|
       |  Operating System   |
       |  (code, data, etc.) |
       |                     |
 64KB  |---------------------|
       |                     |
       | The current running |
       |       program       |
       |  (code, data, etc.) |
       |                     |
       |                     |
  max  |---------------------|

       FIGURE: EARLY DAYS

Basically, the OS was a set of routines (a library, really) that sat in memory (starting at physical address 0 in this example), and there would be one running program (a process) that sat in physical memory (starting at physical address 64KB in this example) and used the rest of memory. There were few illusions here, and the user didn't expect much from the OS.

After a time, because machines were expensive, people began to develop the desire to share machines. The era of *multiprogramming* was upon us, where multiple jobs would be run at once, with the OS deciding which one should run at any given time. Of course, one way to implement multiprogramming would be to run one process for a while, giving it full access to all memory (like the picture above), then stop it, save all of its state to disk (including all of physical memory!), load some other process's state, run it for a while, and thus implement some kind of crude sharing of the machine.

Unfortunately, this approach has a big problem: it is way too slow. While saving and restoring register-level state (e.g., the PC, general-purpose registers, etc.) to the PCB is fast, saving the entire contents of memory to disk is brutally slow. Thus, what we'd rather do is leave the contents of each currently active process in memory while we context-switch between them, allowing the OS to quickly switch between a large number of jobs. Conceptually, we can think of this as follows:

  0KB  |---------------------|
       |  Operating System   |
       |  (code, data, etc.) |
 64KB  |---------------------|
       |      process A      |
       |  (code, data, etc.) |
       |                     |
128KB  |---------------------|
       |      process B      |
       |  (code, data, etc.) |
192KB  |---------------------|
       |                     |
       |       (free)        |
256KB  |---------------------|
       |      process C      |
       |  (code, data, etc.) |
       |                     |
320KB  |---------------------|
       |                     |
       |       (free)        |
       |                     |
       |                     |
512KB  |---------------------|

       FIGURE: SHARING MEMORY

In the diagram, there are three processes (A, B, and C), and each of them has a small part of the 512KB of physical memory carved out for it. Assuming a single CPU, the OS will choose at any one time to run one of the processes (say A), while the others (B and C) sit in the scheduler's ready queue waiting to be run. The goal of such sharing is to allow the machine to be used as *efficiently* as possible; for example, when process A initiates an I/O and thus moves from the running to the blocked state, the OS can quickly switch to B or C and thus better utilize the CPU.

However, we have to keep those pesky users in mind, and doing so requires the OS to create an *easy to use* abstraction of physical memory. We call this abstraction the *address space*, and it is the running program's view of memory in the system.
Understanding this fundamental OS abstraction of memory is key to understanding how memory is virtualized by the OS.

The address space of a process contains all of the memory state of the running program. For example, the *code* of the program (the instructions) has to live in memory somewhere, and thus it is in the address space. The program, while it is running, uses a *stack* to keep track of where it is in the function call chain as well as to allocate local variables and pass parameters and return values to and from routines. Finally, the *heap* is used for dynamically-allocated, user-managed memory, such as that you might receive from a call to malloc() in C or new in an object-oriented language such as C++ or Java. Of course, there are other things in there too (like statically-initialized variables, and a few other details), but for now let us just assume these three components: code, stack, and heap. Thus, our simple view of the address space is as follows:

  0KB  |---------------------|
       |    program code     |   all the instructions
       |                     |   live up in this part
  1KB  |---------------------|
       |        heap         |   the heap contains all
       |                     |   malloc'd (new'd) data
  2KB  |---------------------|   structures
       |          |          |   (it grows downward)
       |          v          |
       |                     |
       |                     |
       |       (free)        |
       |                     |
       |                     |
       |          ^          |
       |          |          |   (it grows upward)
 15KB  |---------------------|   the stack contains local
       |        stack        |   (stack-allocated) variables,
       |                     |   arguments to routines,
 16KB  |---------------------|   return values, etc.

       FIGURE: ADDRESS SPACE

In this example, we have a tiny address space of only 16KB (we will often use small examples like this because it is a pain to represent a 32-bit address space and the numbers quickly become hard to deal with). The program code lives at the top of the address space (starting at 0 in this example, and packed into the first 1KB). Code is static (and thus easy to deal with), so we can place it at the top of the address space and know that it won't need any more space as the program runs.

Next, we have the two regions of the address space that may grow (and shrink) while the program runs: the heap (at the top) and the stack (at the bottom). We place them like this because each wishes to be able to grow, and by putting them at opposite ends of the address space, we can allow such growth: they just have to grow in opposite directions. The heap thus starts just after the code (at 1KB) and grows downward (say when a user requests more memory via malloc()); the stack starts at 16KB and grows upward (say when a user makes a procedure call).

Of course, when we describe the address space, what we are describing is the *abstraction* that the OS is providing to the running program. The program really isn't in memory at physical addresses 0 through 16KB; rather, it is loaded at some arbitrary physical address(es). Examine processes A, B, and C in [FIGURE: SHARING MEMORY]; there you can see how each process is loaded into memory at a different address. And now, hopefully, you can see the problem.

[THE CRUX OF THE PROBLEM]
The problem is this: how can the OS build this abstraction of a private, potentially large address space for multiple running processes (all sharing memory) on top of a single, physical memory?

When the OS does this, we say the OS is *virtualizing* memory, because the running program thinks it is loaded at a particular address (say 0) and has a potentially very large address space (say 32 bits), but the reality is quite different.
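To get a feel for the three regions described above, here is a minimal C sketch (an illustrative program of our own, not part of the text): it prints the location of a routine (code), of memory returned by malloc() (heap), and of a local variable (stack). The exact values depend on your machine and OS, and, as the next paragraphs argue, every address it prints is a virtual address.

    /* Illustrative sketch: print rough locations of code, heap, and stack. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int x = 3;                            /* a local variable: lives on the stack */
        int *h = (int *) malloc(sizeof(int)); /* dynamically allocated: lives on the heap */
        printf("location of code : %p\n", (void *) main);
        printf("location of heap : %p\n", (void *) h);
        printf("location of stack: %p\n", (void *) &x);
        free(h);
        return 0;
    }

On a typical modern Unix-like system you will usually see the code address lowest, the heap above it, and the stack much higher up; the key point is that these are locations within the process's address space, not locations in physical memory.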
When, for example, process A in [FIGURE: SHARING MEMORY] tries to perform a load at address 0 (which we will call a *virtual address*), somehow the OS, in tandem with some hardware support, will have to make sure the load doesn't actually go to physical address 0 but rather to physical address 64KB (where A is loaded into memory).

Thus we have the job of the OS: to virtualize memory. Some goals:

- Transparency: the OS should do this in a way that is *transparent* to the running program. The program behaves as if it has its own private memory; the OS (with hardware support) does all the work to multiplex memory among many different jobs.

- Efficiency: the OS should strive to make the virtualization as *efficient* as possible. As we will see, the OS will have to rely on hardware support for this, including hardware features such as TLBs (which we will learn about in due course).

- Protection: finally, the OS should make sure to *protect* processes from one another, and the OS itself from processes. When one process performs a load, a store, or an instruction fetch, it should not be able to access or affect in any way the memory contents of any other process or the OS itself (that is, anything *outside* its address space). Protection thus enables us to deliver the property of *isolation* among processes; each process should be running in its own isolated cocoon, safe from the ravages of other faulty or even malicious processes.
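To make the example at the start of this section concrete, here is a toy C sketch of one simple way such a translation could work; this section does not define the actual mechanism (later material does), so treat this as an assumption-laden illustration of the idea: relocate every virtual address by a per-process base (64KB for process A in [FIGURE: SHARING MEMORY]) and use a bounds value to provide protection.

    /* Toy sketch of base-and-bounds style relocation (illustrative only).
       The base and bounds values are assumptions matching process A in
       FIGURE: SHARING MEMORY; real systems perform this check and addition
       in hardware on every load, store, and instruction fetch. */
    #include <stdio.h>
    #include <stdlib.h>

    #define KB 1024u

    int main(void) {
        unsigned int base   = 64 * KB; /* where A actually sits in physical memory */
        unsigned int bounds = 64 * KB; /* size of A's address space                */

        unsigned int vaddr = 0;        /* A issues a load at virtual address 0     */
        if (vaddr >= bounds) {         /* protection: out-of-range accesses fault  */
            fprintf(stderr, "fault: address out of bounds\n");
            exit(1);
        }
        unsigned int paddr = base + vaddr; /* relocation: virtual 0 -> physical 64KB */
        printf("virtual address %u maps to physical address %u\n", vaddr, paddr);
        return 0;
    }

The sketch captures the two goals that matter here: the running program never sees base or bounds (transparency), and any access outside its address space is rejected (protection).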