** Relocation **

Our first attempts at virtualizing memory will be very simple, almost laughably so. Go ahead, laugh all you want. Pretty soon it will be the OS laughing as you try to understand the nine thousand things that happen on a page fault, so you might as well enjoy life now (while you can).

Specifically, we will assume for now that the user's address space must be placed *contiguously* in memory. We will also assume, for simplicity, that the size of the address space is not too big; specifically, that it will fit entirely into physical memory. Finally, we will also assume that each address space is exactly the same size. Don't worry if these sound unrealistic; we will relax all of these assumptions as we go, making a more realistic and interesting virtualization of memory possible.

Thus, our goal for now: to be able to realize [FIGURE: SHARING MEMORY] from the discussion on address spaces. Each process A, B, and C in that figure should think it has its own private memory, whereas the reality is that all three are sitting at different locations in physical memory.

[STATIC RELOCATION]

Our first approach is quite simple and requires no hardware support. It is called *static relocation*, as the OS performs a one-time change of the addresses within the program in order to relocate it into a different part of memory. Let's look at an example. Imagine there is a process whose address space looks like this:

0    |---------------------|
4    |   (program code)    |
...  |                     |
     |                     |
128  |  load R1, 0(15KB)   |  // load the value at address 15KB into R1
132  |  add R2, 3, R1      |  // add 3 to that value, put it into R2
136  |  store 0(15KB), R2  |  // store R2 back into the same address
     |                     |
     |                     |
1K   |---------------------|
     |       (heap)        |
     |                     |
2K   |---------------------|
     |          |          |
     |          v          |
     |                     |
     |                     |
     |       (free)        |
     |                     |
     |                     |
     |          ^          |
     |          |          |
14K  |---------------------|
     |                     |
15K  |        2008         |  // the value at 15 KB (a variable)
     |       (stack)       |
     |                     |
16K  |---------------------|

        FIGURE: A PROCESS

What we are going to examine here is a short code sequence that loads a value from memory, increments it by three, and then stores the value back into memory. You can imagine the high-level language representation of this might look like this:

void func() {
    int x = 2008;   // x is "stack allocated", and thus lives on the "stack"
    ...
    x = x + 3;      // this is the line of code we are interested in
    ...
}

The compiler turns this line of code into assembly, which might look something like this:

load R1, 0(15KB)     // load the value at address 15KB into R1
add R2, 3, R1        // add 3 to that value, put it into R2
store 0(15KB), R2    // store R2 back into the same address

In [FIGURE: A PROCESS], you can see how both the code and the stack-allocated data are laid out in the process's address space; the three-instruction code sequence is located at address 128 (in the code section near the top), and the value of the variable x at address 15KB (in the stack near the bottom). Note that the initial value of x is set to 2008 before this code sequence runs.

When these instructions run, from the perspective of the process, the following memory accesses take place:

- Fetch instruction at address 128
- Execute this instruction (a load from address 15KB)
- Fetch instruction at address 132
- Execute this instruction (just an add, no memory references here)
- Fetch the instruction at address 136
- Execute this instruction (a store to address 15KB)

However, the OS wishes to place this process somewhere in physical memory, not necessarily starting at address zero.
Thus, we have the problem: how can we place this process somewhere else in memory in a way that is transparent to the process?

One early solution to this problem, as introduced above, is *static relocation*: when the user submits the program to the system to be run, the OS loader (the part of the OS that gets a process up and running) *rewrites* the addresses of the process so that they refer to the addresses where the process was placed. For example, if the OS wished to place the process above at physical address 2048, it would go through and change all relevant addresses, incrementing them by 2048. The three-instruction sequence above would now become:

2176  load R1, 0(17KB)     // load the value at address 15KB + offset into R1
2180  add R2, 3, R1        // add 3 to that value, put it into R2
2184  store 0(17KB), R2    // store R2 back into the same address (+ offset)

Thus, the process is loaded, the addresses rewritten, and the program runs otherwise unmodified at the desired offset in physical memory. Some form of memory sharing and multiprogramming has been achieved!

The strengths of this approach are clear: it is *simple*. Unfortunately, there are negatives. Most importantly, there is *no protection*; a process could generate an address outside of its address space and potentially read or write values in other processes' address spaces. This is a huge problem for building a robust operating environment, and thus static relocation is simply not appropriate in general-purpose settings.

[DYNAMIC RELOCATION]

To remedy the weaknesses of static relocation (in particular, to add protection), the OS needs some help from our old friend: the hardware. Specifically, we are going to add two registers: one is called the *base* register, and the other the *bounds*. This base/bounds pair is going to allow us to both relocate a process and do so with protection.

The program is still written as if it is loaded at address zero. However, when it starts running, the OS decides where in physical memory it should be loaded and sets the *base* register to that value. In the example above, the OS wishes to load the process at physical address 2KB and thus sets the base register to this value.

Interesting things start to happen when the process is running. Now, when any memory reference is generated by the process, it is *translated* by the processor in the following manner:

    physical address = virtual address + base

The process generates a *virtual* address; the hardware in turn adds the contents of the base register to this address, and the result is a real physical address that can be issued to the memory system. Thus, when the following instruction is fetched and executed:

    128  load R1, 0(15KB)

the processor fetches the instruction from address 128, which the CPU adds to the base register value of 2KB (2048) to get a physical address of 2176, which is issued to the memory system. The contents of that address (the instruction) are fetched, and the processor begins executing it. At some point, the process then issues the load from address 15KB, which the processor takes and again adds to the base register, getting the final physical address of 17KB and thus the desired contents.

This entire process is called *address translation*, in which we take a virtual address the process thinks it is issuing and turn it into a physical address, which is where the data actually resides. Say it again: address translation. We are going to be doing a lot of it.
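To make this concrete, here is a minimal sketch in C — a simulation of the idea, not real MMU hardware, and the translate() function is purely illustrative — of what happens to every address the process generates, using the base value of 2KB from the example above:

#include <stdio.h>

#define KB 1024

/* set by the OS when it decides where to load the process (2KB here) */
static unsigned int base = 2 * KB;

/* every virtual address generated by the process is translated this way */
unsigned int translate(unsigned int vaddr) {
    return vaddr + base;
}

int main(void) {
    /* the instruction fetch at virtual address 128 ...                      */
    printf("fetch  128   -> physical %u\n", translate(128));        /* 2176  */
    /* ... and the load from virtual address 15KB                            */
    printf("load   15360 -> physical %u\n", translate(15 * KB));    /* 17408 (17KB) */
    return 0;
}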
Now you might be asking: what happened to that *bounds* register? After all, isn't this supposed to be the base-and-bounds approach? Indeed, it is. And as you might have guessed, the bounds register is there to help with protection. Specifically, the processor will first add the base to the virtual address to get the desired physical address, but then it will check whether the resulting address is within the base and bounds. If not, the processor will signal some kind of fault and the process will likely be terminated. The point of the bounds is thus to make sure that all addresses generated by the process are legal and within the "bounds" of the process.

We should note that the base and bounds registers are hardware structures kept on the chip and managed by the CPU. Sometimes people call the part of the processor that helps with this type of address translation the *memory management unit* (MMU).

One small note about bounds: you can build this in one of two ways. In the first, the bounds register holds the *size* of the address space, and thus the hardware checks the virtual address against it before adding the base. In the second, it holds the *physical address* of the end of the address space, and thus the hardware first adds the base and then makes sure the resulting address is within bounds.

[AN EXAMPLE]

BASE   : 16KB
BOUNDS : 4KB (in the size form, not a physical address)

What will happen on the following virtual memory references?

400  -> translates to 16KB + 400  = 16784
1024 -> translates to 16KB + 1024 = 17408
2800 -> translates to 16KB + 2800 = 19184
4000 -> translates to 16KB + 4000 = 20384
4100 -> fault (4100 is beyond the 4KB bounds)
4200 -> fault (out of bounds)
5000 -> fault (out of bounds)

[OS ISSUES]

From the perspective of the OS, base and bounds introduces a small change to the context-switch code. There is only one base/bounds pair of registers in the system, after all, and their values differ for each running program, as each program is loaded at a different physical address in memory. Thus, the OS must save and restore the base/bounds pair when it context switches between processes; they are now part of the PCB (process control block) of the process.
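Putting base and bounds together, here is a minimal sketch in C of the translate-and-check logic — again a simulation of what the MMU does, not actual hardware — using the base (16KB) and bounds (4KB, size form) values from the example above:

#include <stdio.h>
#include <stdlib.h>

#define KB 1024

/* per-process values, loaded into the MMU registers by the OS */
static unsigned int base   = 16 * KB;   /* where the address space starts */
static unsigned int bounds = 4 * KB;    /* size of the address space      */

/* translate a virtual address, or "fault" if it is out of bounds */
unsigned int translate(unsigned int vaddr) {
    if (vaddr >= bounds) {              /* size form: check before adding the base */
        fprintf(stderr, "fault: virtual address %u is out of bounds\n", vaddr);
        exit(EXIT_FAILURE);             /* the OS would likely terminate the process */
    }
    return vaddr + base;
}

int main(void) {
    unsigned int refs[] = { 400, 1024, 2800, 4000, 4100 };
    for (int i = 0; i < 5; i++)         /* the last reference (4100) faults */
        printf("virtual %4u -> physical %u\n", refs[i], translate(refs[i]));
    return 0;
}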
[SEGMENTATION]

So far we have been putting the entire address space of each process in memory. With the base and bounds registers, the OS can easily relocate processes to different parts of physical memory. However, you might have noticed something interesting about these address spaces of ours: there is a big chunk of "free" space right in the middle, between the stack and the heap.

     |---------------------|
     |    program code     |
     |                     |
     |                     |
     |---------------------|
     |        heap         |
     |                     |
     |---------------------|
     |          |          |
     |          v          |
     |                     |
     |                     |
     |        free         |
     |                     |
     |                     |
     |          ^          |
     |          |          |
     |---------------------|
     |        stack        |
     |                     |
     |---------------------|

  FIGURE: ADDRESS SPACE (AGAIN)

As you can see in the figure, although it is not being used by the process at the given time, the free space still takes up physical memory when we perform dynamic relocation of the entire address space. Thus, a problem:

[THE CRUX OF THE PROBLEM]

How do we support a large address space with (potentially) a lot of free space between the stack and the heap? Note that in our examples, with tiny (pretend) address spaces, the waste doesn't seem too bad. Imagine, however, a 32-bit address space (4 GB in size); a typical program will only use megabytes of memory, but would still demand that the entire address space be resident in memory. Sounds like a bad idea, no?

[SEGMENTATION: A SLIGHT GENERALIZATION OF BASE/BOUNDS]

Thus, an idea was born, and it is called *segmentation*. Instead of having just one base and bounds pair in our MMU, why not have a base/bounds pair per logical *segment* of the address space? A segment is just a contiguous portion of the address space of a particular length, and in our canonical address space, we have three logically different segments: code, stack, and heap. What segmentation allows the OS to do is to place each one of those segments (or rather, the allocated parts of them) in different parts of physical memory, and thus avoid the problem of vast amounts of unused space sitting around in physical memory.

Let's look at an example. Here is the address space we wish to place in physical memory:

0    |---------------------|
     |   (program code)    |
     |                     |
1K   |---------------------|
     |       (free)        |
     |                     |
     |                     |
     |                     |
4K   |---------------------|
     |       (heap)        |
     |                     |
5K   |---------------------|
     |          |          |
     |          v          |
     |                     |
     |                     |
     |       (free)        |
     |                     |
     |                     |
     |          ^          |
     |          |          |
14K  |---------------------|
     |                     |
     |       (stack)       |
     |                     |
16K  |---------------------|

As you can see, the spaces between the code and the heap (1K and 4K) and between the heap and the stack (5K and 14K) are free. With a base/bounds pair per segment, we can instead place each segment *independently*. For example, here is a picture of physical memory with the three segments placed within it:

0    |---------------------|
     |         OS          |
     |                     |
     |        ....         |
     |                     |
     |                     |
16K  |---------------------|
     |   (program code)    |
     |                     |
17K  |---------------------|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxx free xxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
20K  |---------------------|
     |                     |
     |                     |
     |       (stack)       |
     |                     |
22K  |---------------------|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxx free xxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
28K  |---------------------|
     |                     |
     |       (heap)        |
29K  |---------------------|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxx free xxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
     |xxxxxxxxxxxxxxxxxxxxx|
32K  |---------------------|

[FIGURE: PLACING SEGMENTS IN MEMORY]

As you can see in the diagram, only used memory is allocated space in physical memory, and thus large address spaces with large amounts of unused address space (which we sometimes call *sparse* address spaces) can be accommodated.

The hardware structure in our MMU required to support segmentation is just what you'd expect: a set of three base/bounds pairs known collectively as the *segment table*. The segment table for the example above might look like this:

Segment  |  Base  |  Bounds
---------------------------
Code     |  16K   |  1K
Heap     |  28K   |  1K
Stack    |  20K   |  2K

The hardware would then use this table to perform the same style of translation we saw before. However, a new question arises: how does the hardware know which entry in the segment table to use? (Or, put another way: how does the hardware know whether a given memory reference refers to the stack, the heap, or the code segment?)

It turns out there are a number of ways. One quite common approach is to use the top few bits (say, two) of the virtual address to designate the region of the address space. In our example above, if the top two bits of an address were '00', it would refer to the code segment; '01', the heap; and '11', the stack (leaving '10' unused). Other approaches include using a segment register to explicitly designate which segment is in use, or implicitly knowing the segment from the source of the address (e.g., if the address comes from the PC register, then it must be for the code segment).
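Here is a minimal sketch in C of the top-two-bits scheme just described — a simulation under the assumption of a 16KB (14-bit) virtual address space, with base and bounds values taken from the segment table above. The stack, which sits at the top of the address space and grows backwards, needs extra handling beyond this simple offset check, so the sketch only exercises code and heap references:

#include <stdio.h>
#include <stdlib.h>

#define KB 1024

/* one base/bounds pair per segment; values from the segment table above.
   Index = top two bits of the virtual address: 00 code, 01 heap, 11 stack. */
struct segment { unsigned int base, bounds; };

static struct segment seg_table[4] = {
    [0] = { 16 * KB, 1 * KB },   /* code  */
    [1] = { 28 * KB, 1 * KB },   /* heap  */
    [3] = { 20 * KB, 2 * KB },   /* stack */
};

/* translate a virtual address from a 16KB (14-bit) address space */
unsigned int translate(unsigned int vaddr) {
    unsigned int seg    = (vaddr >> 12) & 0x3;  /* top two bits pick the segment    */
    unsigned int offset = vaddr & 0xFFF;        /* remaining 12 bits are the offset */
    if (offset >= seg_table[seg].bounds) {
        fprintf(stderr, "fault: offset %u out of bounds in segment %u\n", offset, seg);
        exit(EXIT_FAILURE);                     /* the OS would likely kill the process */
    }
    return seg_table[seg].base + offset;
}

int main(void) {
    printf("code fetch at virtual  100 -> physical %u\n", translate(100));   /* 16K + 100 = 16484 */
    printf("heap load  at virtual 4200 -> physical %u\n", translate(4200));  /* 28K + 104 = 28776 */
    return 0;
}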
[OS SUPPORT]

The OS support for segmentation has two components. The first is quite similar to that of dynamic relocation: the segment table must be saved and restored on a context switch.

The second is managing the free space in physical memory. The OS has to be able to find space for a new segment, and this becomes increasingly painful to manage. Previously, we assumed that each address space was the same size, and thus physical memory could be thought of as a bunch of slots into which processes would fit. Now, each segment might be a different size. Thus, we might end up with memory containing a bunch of free spaces that are not contiguous (see [FIGURE: PLACING SEGMENTS IN MEMORY]). This problem is known as *external fragmentation*. To see why it is a problem, imagine a process now comes along and wishes to allocate a 10K segment. In that example, there are 12K free in total, but no 10K contiguous chunk; thus, the OS would have trouble placing the segment in physical memory (perhaps it could spend time moving segments around to *compact* physical memory, but that would be costly). A small sketch at the end of this section works through the arithmetic.

[SEGMENTATION: THE GOOD AND BAD]

Segmentation solves a number of problems and helps us build a more effective virtualization of memory. Beyond just dynamic relocation, segmentation can better support sparse address spaces, avoiding the huge potential waste of space between logical segments of the address space.

A fringe benefit arises too: code sharing. Because the code segment is now separate from the rest of the address space, it could potentially be shared across multiple running programs (e.g., imagine running multiple shell processes at once). A little extra hardware might be required to label each segment read-only (as shared code segments would be), or read-write, and so forth, but the benefit is substantial, reducing the burden on memory.

However, as we saw above, allocating a bunch of variable-sized segments in memory leads to the problem of external fragmentation. External fragmentation makes memory a pain for the OS to manage, and thus we must look for a different way to support sparse virtualized address spaces. The technique we will use is something we call *paging*.
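As promised above, here is a tiny sketch in C of the external-fragmentation arithmetic; the free-chunk sizes (3K, 6K, and 3K) are read off [FIGURE: PLACING SEGMENTS IN MEMORY], and the code simply checks whether a 10K request fits in any single chunk:

#include <stdio.h>

#define KB 1024

int main(void) {
    /* sizes of the non-contiguous free regions in the figure:
       17K-20K (3K), 22K-28K (6K), and 29K-32K (3K)             */
    unsigned int free_chunks[] = { 3 * KB, 6 * KB, 3 * KB };
    unsigned int request = 10 * KB;
    unsigned int total = 0, largest = 0;

    for (int i = 0; i < 3; i++) {
        total += free_chunks[i];
        if (free_chunks[i] > largest)
            largest = free_chunks[i];
    }

    printf("total free: %uK; largest contiguous chunk: %uK\n", total / KB, largest / KB);
    if (request > largest)
        printf("a %uK segment fits nowhere, even though %uK is free\n",
               request / KB, total / KB);
    return 0;
}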