CS 537
Lecture Notes Part 7a
More About Paging


Paging Details

Real-world hardware CPUs have all sorts of “features” that make life hard for people trying to write page-fault handlers in operating systems. Among the practical issues are the following.

Page Size

How big should a page be? This is really a hardware design question, but since it depends on OS considerations, we will discuss it here. If pages are too large, lots of space will be wasted by internal fragmentation: A process only needs a few bytes, but must take a full page. As a rough estimate, about half of the last page of a process will be wasted on average. Actually, the average waste will be somewhat larger if the typical process is small compared to the size of a page. For example, if a page is 8K bytes and the typical process is only 1K, 7/8 of the space will be wasted. Also, the relative amount of waste as a percentage of the space used depends on the size of a typical process. All these considerations imply that as typical processes get bigger and bigger, internal fragmentation becomes less and less of a problem.
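The arithmetic behind the 8K/1K example can be checked with a short sketch. The function names `pages_needed` and `wasted_bytes` are made up for illustration and do not come from any real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to hold `size` bytes, rounding up.
 * (Hypothetical helper, for illustration only.) */
size_t pages_needed(size_t size, size_t page_size) {
    assert(page_size > 0);
    return (size + page_size - 1) / page_size;
}

/* Bytes allocated to the process but not used by it: the unused tail
 * of its last page, i.e. the internal fragmentation. */
size_t wasted_bytes(size_t size, size_t page_size) {
    return pages_needed(size, page_size) * page_size - size;
}
```

With these definitions, `wasted_bytes(1024, 8192)` is 7168, which is 7/8 of the 8192-byte page, matching the example above.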

On the other hand, with smaller pages it takes more page table entries to describe a given process, leading to space overhead for the page tables, but more importantly time overhead for any operation that manipulates them. In particular, it adds to the time needed to switch from one process to another. The details depend on how page tables are organized. For example, if the page tables are in registers, those registers have to be reloaded. A TLB will need more entries to cover the same size “working set,” making it more expensive and increasing the time needed to reload it when changing processes. In short, all current trends point to larger and larger pages in the future.

If space overhead is the only consideration, it can be shown that the optimal size of a page is sqrt(2se), where s is the size of an average process and e is the size of a page-table entry. This calculation is based on balancing the space wasted by internal fragmentation against the space used for page tables. This formula should be taken with a large grain of salt, however, because it overlooks the time overhead incurred by smaller pages.
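One way to see where sqrt(2se) comes from: with page size p, a process of size s needs about s/p page-table entries of e bytes each, and wastes about p/2 bytes in its last page, so the total overhead is roughly se/p + p/2. Setting the derivative with respect to p to zero gives p = sqrt(2se). The sketch below evaluates that overhead and scans power-of-two page sizes for the minimum. It assumes the simple overhead model just described, and the names `space_overhead` and `best_page_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate space overhead in bytes for one process of size s, with
 * page-table entries of e bytes and pages of p bytes: the page table
 * (s/p entries) plus half a page of internal fragmentation. */
double space_overhead(double s, double e, double p) {
    assert(p > 0);
    return s * e / p + p / 2.0;
}

/* Try each power-of-two multiple of min_p up to max_p and return the
 * page size with the smallest overhead under the model above. */
size_t best_page_size(size_t s, size_t e, size_t min_p, size_t max_p) {
    assert(min_p > 0 && min_p <= max_p);
    size_t best = min_p;
    for (size_t p = min_p; ; p *= 2) {
        if (space_overhead(s, e, p) < space_overhead(s, e, best))
            best = p;
        if (p > max_p / 2)   /* doubling again would exceed max_p */
            break;
    }
    return best;
}
```

For a 1 MiB process and 8-byte entries, sqrt(2 * 1048576 * 8) is exactly 4096, and `best_page_size(1048576, 8, 512, 65536)` agrees. As noted above, this ignores time overhead entirely.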

Restarting the instruction

After the OS has brought in the missing page and fixed up the page table, it should restart the process in such a way as to cause it to re-try the offending instruction. Unfortunately, that may not be easy to do, for a variety of reasons.

Variable-length instructions: Some CPU architectures have instructions with varying numbers of arguments. For example, the Motorola 68000 has a move instruction with two arguments (source and target of the move). It can cause faults for three different reasons: the instruction itself or either of the two operands. The fault handler has to determine which reference faulted. On some computers, the OS has to figure that out by interpreting the instruction and in effect simulating the hardware. The 68000 made it easier for the OS by updating the PC as it goes, so the PC will be pointing at the word immediately following the part of the instruction that caused the fault. On the other hand, this makes it harder to restart the instruction: How can the OS figure out where the instruction started, so that it can back the PC up to retry?
Side effects: Some computers have addressing modes that automatically increment or decrement index registers as a side effect, making it easy to simulate in one step the effect of the C statement *p++ = *q++;. Unfortunately, if an instruction faults part-way through, it may be difficult to figure out which registers have been modified so that they can be restored to their original state. Some computers also have instructions such as “move characters,” which work on variable-length data fields, updating a pointer or count register. If an operand crosses a page boundary, the instruction may fault part-way through, leaving a pointer or counter register modified.

Fortunately, most CPU designers know enough about operating systems to understand these problems and add hardware features to allow the OS to recover. Either they undo the effects of the instruction before faulting, or they dump enough information into registers somewhere that the OS can undo them. The original 68000 did neither of these, and so paging was not possible on the 68000. It wasn't that the designers were ignorant of OS issues; there was simply not enough room on the chip to add the features. However, one clever manufacturer built a box with two 68000 CPUs and an MMU chip. The first CPU ran “user” code. When the MMU detected a page fault, instead of interrupting the first CPU, it delayed responding to it and interrupted the second CPU. The second CPU would run all the OS code necessary to respond to the fault and then cause the MMU to retry the storage access. This time, the access would succeed and return the desired result to the first CPU, which never realized there was a problem.
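The "undo the effects before faulting" approach can be illustrated with a toy model of an auto-increment move such as *p++ = *q++. This is only a sketch under invented assumptions: the `ToyCpu` structure, its tiny memory, and the `move_postinc` function are hypothetical and do not describe any real processor. The point is that the source register has already been incremented when the destination faults, so it must be rolled back before the instruction can be retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NREGS     4
#define NPAGES    4
#define PAGE_SIZE 16

/* Hypothetical toy machine: a few address registers, a tiny memory,
 * and one "present" flag per page. */
typedef struct {
    size_t        reg[NREGS];
    bool          present[NPAGES];
    unsigned char mem[NPAGES * PAGE_SIZE];
} ToyCpu;

bool resident(const ToyCpu *c, size_t addr) {
    return addr < NPAGES * PAGE_SIZE && c->present[addr / PAGE_SIZE];
}

/* Execute one byte of "*dst++ = *src++".  Returns true on success.
 * On a fault it restores every register to its value at the start of
 * the instruction and returns false, so a retry starts cleanly. */
bool move_postinc(ToyCpu *c, int src, int dst) {
    size_t saved[NREGS];
    memcpy(saved, c->reg, sizeof saved);     /* snapshot for undo */

    if (!resident(c, c->reg[src]))
        return false;                        /* nothing modified yet */
    unsigned char value = c->mem[c->reg[src]];
    c->reg[src]++;                           /* side effect happens here */

    if (!resident(c, c->reg[dst])) {
        memcpy(c->reg, saved, sizeof saved); /* undo the increment */
        return false;
    }
    c->mem[c->reg[dst]] = value;
    c->reg[dst]++;
    return true;
}
```

Without the rollback, a retry after the destination page is brought in would copy the wrong source byte, because the source register would have been incremented twice.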

Locking Pages

There are a variety of cases in which the OS must prevent certain page frames from being chosen by the page-replacement algorithm. For example, suppose the OS has chosen a particular frame to service a page fault and sent a request to the disk scheduler to read in the page. The request may take a long time to service, so the OS will allow other processes to run in the meantime. It must be careful, however, that a fault by another process does not choose the same page frame! A similar problem involves I/O. When a process requests an I/O operation, it gives the virtual address of the buffer the data is supposed to be read into or written out of. Since DMA devices generally do not know anything about virtual memory, the OS translates the buffer address into a physical memory location (a frame number and offset) before starting the I/O device. It would be very embarrassing if the frame were chosen by the page-replacement algorithm before the I/O operation completes. Both of these problems can be avoided by marking the frame as ineligible for replacement. We usually say that the page in that frame is “pinned” in memory. An alternative way of avoiding the I/O problem is to do the I/O operation into or out of pages that belong to the OS kernel (and are not subject to replacement) and copy between these pages and user pages.
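A minimal sketch of pinning, assuming a small fixed frame table and a simple round-robin choice of victim. The names `FrameTable`, `pin_frame`, `unpin_frame`, and this version of `choose_frame` are hypothetical. A count is used rather than a single flag on the assumption that a frame might be pinned for more than one reason at once.

```c
#include <assert.h>

#define NFRAMES 4

/* Hypothetical frame table: one pin count per frame plus a rotating
 * "hand" that remembers where the last search stopped. */
typedef struct {
    int pin_count[NFRAMES];  /* > 0 means "do not replace this frame" */
    int hand;
} FrameTable;

void pin_frame(FrameTable *t, int f) {
    assert(f >= 0 && f < NFRAMES);
    t->pin_count[f]++;
}

void unpin_frame(FrameTable *t, int f) {
    assert(f >= 0 && f < NFRAMES);
    assert(t->pin_count[f] > 0);
    t->pin_count[f]--;
}

/* Return the next unpinned frame in round-robin order, or -1 if every
 * frame is currently pinned.  A real replacement policy would also
 * look at reference and dirty bits; this sketch shows only the pin
 * check. */
int choose_frame(FrameTable *t) {
    for (int i = 0; i < NFRAMES; i++) {
        int f = (t->hand + i) % NFRAMES;
        if (t->pin_count[f] == 0) {
            t->hand = (f + 1) % NFRAMES;
            return f;
        }
    }
    return -1;
}
```

In this model the OS would call `pin_frame` right after choosing a frame and before scheduling the disk read or DMA transfer, and `unpin_frame` when the I/O completion interrupt arrives.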

Missing Reference Bits

At least one popular computer, the Digital Equipment Corp. VAX computer, did not have any REF bits in its MMU. Some people at the University of California at Berkeley came up with a clever way of simulating the REF bits in software. Whenever the OS cleared the simulated REF bit for a page, it marked the hardware page-table entry for the page as invalid. When the process first referenced the page, it would cause a page fault. The OS would note that the page really was in memory, so the fault handler could return without doing any I/O operations, but the fault would give the OS the chance to turn the simulated REF bit on and mark the page as valid, so subsequent references to the page would not cause page faults. Although the software simulated hardware with a real REF bit, the net result was that there was a rather high cost to clearing the simulated REF bit. The people at Berkeley therefore developed a version of the CLOCK algorithm that allowed them to clear the REF bit infrequently.
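The trick can be sketched as follows. This is a simplified model, not the actual Berkeley code: the `Pte` layout, the `Access` result type, and the function names are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page-table entry for the sketch. */
typedef struct {
    bool resident;  /* OS bookkeeping: the page really is in memory */
    bool hw_valid;  /* the valid bit the MMU actually checks */
    bool soft_ref;  /* simulated REF bit maintained by the OS */
} Pte;

typedef enum { NO_FAULT, SOFT_FAULT, HARD_FAULT } Access;

/* Clearing the simulated REF bit also invalidates the hardware entry,
 * so that the next reference traps to the OS. */
void clear_ref(Pte *p) {
    p->soft_ref = false;
    p->hw_valid = false;
}

/* Model one memory reference to the page. */
Access access_page(Pte *p) {
    if (p->hw_valid)
        return NO_FAULT;     /* the hardware handles it; the OS sees nothing */
    if (!p->resident)
        return HARD_FAULT;   /* a real fault: I/O needed (not modeled here) */
    /* The page is in memory, so no I/O: just record the reference and
     * revalidate so later references run at full speed. */
    p->soft_ref = true;
    p->hw_valid = true;
    return SOFT_FAULT;
}
```

Each call to `clear_ref` costs one extra trap on the next reference to that page, which is the "rather high cost" mentioned above and the reason for clearing the bit infrequently.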

Fault Handling

Overall, the core of the OS kernel looks something like this:


    // This is the procedure that gets called when an interrupt occurs.
    // On some computers, there is a different handler for each "kind"
    // of interrupt.
    void handler() {
        save_process_state(current_PCB);
            // Some state (such as the PC) is automatically saved by the HW.
            // This code copies that info to the PCB and possibly saves some
            // more state.
        switch (what_caused_the_trap) {
            case PAGE_FAULT:
                f = choose_frame();
                if (is_dirty(f))
                    schedule_write_request(f);  // to clean the frame
                else
                    schedule_read_request(f);    // to read in requested page
                record_state(current_PCB);
                    // to indicate what this process is up to
                make_unrunnable(current_PCB);
                current_PCB = select_some_other_ready_process();
                break;
            case IO_COMPLETION:
                p = process_that_requested_the_IO();
                switch (reason_for_the_IO) {
                    case PAGE_CLEANING:
                        schedule_read_request(f);  // to read in requested page
                        break;
                    case BRING_IN_NEW_PAGE:
                    case EXPLICIT_IO_REQUEST:
                        make_runnable(p);
                        break;
                }
                break;  // do not fall through into the IO_REQUEST case
            case IO_REQUEST:
                schedule_io_request();
                record_state(current_PCB);
                    // to indicate what this process is up to
                make_unrunnable(current_PCB);
                current_PCB = select_some_other_ready_process();
                break;
            case OTHER_OS_REQUEST:
                perform_request();
                break;
        }
        // At this point, the current_PCB is pointing to a process that
        // is ready to run.  It may or may not be the process that was
        // running when the interrupt occurred.
        restore_state(current_PCB);
        return_from_interrupt(current_PCB);
            // This hardware instruction restores the PC (and possibly other
            // hardware state) and allows the indicated process to continue.
    }