Virtual Memory

Much of this subject straddles the areas of architecture and operating
  systems.  Must understand both to get it right.


  -- Physical memory is too small.
     (Smaller than the address space of the machine,
      for example, 32-bit address = 4K Mbytes, or 4 Gbytes,
      but main memory is only 16 Mbytes.)

      We want to have just parts of a program in memory, not the
      entire (potentially very large) program

  -- Want the ability to have multiple processes in progress at once.
     Want to increase throughput of the computer by allowing
     multiprogramming, multitasking.

     The programs (processes) need to SHARE main memory.
        need RELOCATION, OS-controlled SWAPPING

  -- Large, inexpensive memory (disk) is slow.


  Make main memory a cache for the larger disk.

  Give OS some "hooks" to ease its job.

  Introduce notion of an ADDRESS SPACE.

Address Space
Associated with every program (process) is a set of addresses
 that it may reference.

  PROCESS -- a program, together with some processor state.

 For a 32-bit address machine, want 32 bits of address space available
 to the program.  And, every process wants access to the full
 address space.

 To do this, introduce the notion of a VIRTUAL ADDRESS SPACE.

     virtual address            easily generated by SW
            |
            |   (translation)
            v
     physical address           used by HW to access memory
     (also called a real address)

  TO REMEMBER:  need the translation to be fast, since it must
                be done for every memory access.
		That means that it must be done by HW.

Base and Bounds
 Given 2 HW registers, we can implement virtual memory.
 base --  base address for a process -- a physical (real) address
 bounds -- last valid virtual address process may access
 virtual addresses generated by process are offsets from the base

  -- for each reference, must check if the address is within bounds.
  -- each process has its own space
  -- each process must be allocated contiguously in memory
  -- cheap to implement in HW.  Addition and comparison can be
     done in parallel.

  -- impossible to share code between 2 programs (while keeping
     other code/data private to a program). This is because there
     is only 1 segment for each process. 
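
  The base-and-bounds scheme above can be sketched as follows (the
  function and the addresses are illustrative, not any particular
  machine):

```python
# Minimal sketch of base-and-bounds translation.  In HW the bounds
# check and the base addition can be done in parallel; here serial.

def translate(vaddr, base, bounds):
    """Map a virtual address to a physical one, or fault."""
    if not (0 <= vaddr <= bounds):       # bounds = last valid vaddr
        raise MemoryError("protection fault: address out of bounds")
    return base + vaddr                  # physical (real) address

# A process relocated to physical address 0x4000, 0x1000 bytes valid:
print(hex(translate(0x0010, base=0x4000, bounds=0x0FFF)))  # 0x4010
```

  Every virtual address the process generates is just an offset from
  its (OS-chosen) base, which is what makes relocation trivial.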


  Permit a process to be split into more than 1 area of memory
  (SEGMENTS)
     one for code
     another for heap
     another for stack

  Use separate base and bounds for each segment.
  Could add some sort of protection bit for each segment, to allow,
  for example, a read-only (shareable) code segment but private,
  writable data segments.

  -- Need table for segments/base/bounds info.  Could get big.
  -- External fragmentation of memory.
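
  A per-segment base/bounds lookup can be sketched like this (segment
  names and addresses are invented for illustration):

```python
# Sketch of segmented translation: one (base, bounds) pair per segment.
# A virtual address is interpreted as (segment, offset).

SEGMENT_TABLE = {              # illustrative values
    "code":  (0x1000, 0x07FF),
    "heap":  (0x8000, 0x0FFF),
    "stack": (0xF000, 0x03FF),
}

def translate(segment, offset):
    base, bounds = SEGMENT_TABLE[segment]
    if offset > bounds:
        raise MemoryError(f"fault: offset beyond {segment} segment")
    return base + offset       # physical address

print(hex(translate("code", 0x10)))   # 0x1010
```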

  Break up main memory into evenly sized PAGES.

  PAGE TABLE gives the address of either
     1) base of physical page in main memory
  or 2) address of page on disk

    each entry also has PRESENT bit (called VALID bit in book,
    to draw an analogy to cache's valid bit).  Present bit
    identifies whether physical page is resident in main memory
    or not.

  Have one page table for each process.  Its base address is kept in
    a register in the CPU.

  Placement of pages in memory is easy.  OS keeps a free list, and
    just takes one from the list.
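
  A single-level page-table lookup with a present bit might look like
  this sketch (the page size and table contents are invented):

```python
PAGE_SIZE = 4096                     # illustrative page size

# Page-table entry: (present bit, physical frame number or disk location).
page_table = {
    0: (True, 7),                    # virtual page 0 is in frame 7
    1: (False, "disk block 0x2A"),   # virtual page 1 is out on disk
}

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split the address
    present, where = page_table[vpn]
    if not present:
        raise MemoryError("page fault")      # OS must fetch the page
    return where * PAGE_SIZE + offset        # physical address

print(translate(100))                # frame 7 -> 7*4096 + 100 = 28772
```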


  -- It takes 2 memory accesses to get one piece of data.
         One to access the page table entry.
         One to get the data.
         TLB (Translation Lookaside Buffer) as a cache for page table.
  -- For reasonable size memory, page table is huge.
       Keep base/bounds for the page table, so only the pages
       actually needed are in the table
  -- Internal fragmentation. (Page size is always wrong for some
     process: on average, half of the last page of each region
     is wasted.)


  To reduce table sizes, use 2 levels of mapping.

  Each segment contains an integral number of pages.  The number of
    pages can vary for each segment.

  Keep a page table for each segment.

  Both external and internal fragmentation are kept at bay.
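
  The paged-segment idea can be sketched as a segment table whose
  entries point at per-segment page tables of varying length (all
  numbers invented):

```python
PAGE_SIZE = 4096

# One page table (list of frame numbers) per segment; lengths vary,
# so the list length doubles as the segment's bounds check.
segment_page_tables = {
    0: [3, 9],       # code segment: 2 pages, in frames 3 and 9
    1: [5],          # heap segment: 1 page, in frame 5
}

def translate(seg, vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    table = segment_page_tables[seg]
    if vpn >= len(table):                # second-level bounds check
        raise MemoryError("segment bounds violation")
    return table[vpn] * PAGE_SIZE + offset

print(translate(0, PAGE_SIZE + 5))   # frame 9 -> 9*4096 + 5 = 36869
```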


  -- If tables are in memory, can take more than one memory access
     to get at a piece of data.
     Solution:   TLB

TLB (Translation Lookaside Buffer)

  A process typically accesses only a few of its pages at a time, and
  accesses them repetitively.  (Showing both spatial and temporal
  locality.)

  TLB contains a mapping of virtual page number (the TAG) to a
    physical page number.

  Also included,
     Present bit
     Referenced bit(s) - To help choose a victim if no free pages remain.
     Protection - to facilitate sharing
     Dirty bit - done at page level, not cache level.
     Process ID - sometimes added so that the TLB does not have
        to be flushed at each context switch.

  The TLB typically covers 98% or more of memory accesses.

  Typical size is 64-128 entries.  A TLB that small can be made
    fully associative.
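
  A TLB in front of the page table can be sketched as a small fully
  associative cache (the sizes and eviction policy here are
  illustrative):

```python
PAGE_SIZE = 4096
TLB_SIZE = 8                          # real TLBs: ~64-128 entries

page_table = {vpn: vpn + 100 for vpn in range(64)}   # vpn -> frame
tlb = {}                              # small cache of the page table

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                    # TLB hit: no page-table access
        frame = tlb[vpn]
    else:                             # TLB miss: walk the page table
        frame = page_table[vpn]
        if len(tlb) >= TLB_SIZE:
            tlb.pop(next(iter(tlb))) # evict the oldest entry (FIFO)
        tlb[vpn] = frame
    return frame * PAGE_SIZE + offset
```

  On a hit, translation costs no extra memory access; the page table
  is consulted only on a miss.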

  There is a difference between a miss in the TLB and a page fault:
    a TLB miss only means the translation is not cached (the page may
    still be resident); a page fault means the page itself is not in
    main memory.


  What happens if the page is not resident in main memory?

 - In translation, present bit is off.

   - Many machines trap (take an exception).

     - OS invoked to allocate free physical page from free list,
      or choose a "victim,"  the physical page to be thrown out
      of main memory to make room for faulted page.  Have to
      write current victim page to disk if dirty.

       - OS initiates disk access (really slow, 1msec or more),
         brings page into memory, updates page table and TLB,
         sets present bit

          - Instruction that caused the page fault is restarted.
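
 The OS steps above can be sketched as a page-fault handler (the free
 list, victim choice, and dirty-page write-back are all simplified
 placeholders):

```python
free_frames = [4, 9]                         # OS free list
resident = {2: {"frame": 1, "dirty": True}}  # vpn -> frame info

def handle_fault(vpn):
    """Bring a faulted page into memory; return its physical frame."""
    if free_frames:
        frame = free_frames.pop()
    else:
        # No free page: choose a victim (here simply the oldest one).
        victim, info = next(iter(resident.items()))
        if info["dirty"]:
            pass                             # write victim back to disk
        del resident[victim]
        frame = info["frame"]
    # Disk read of the faulted page happens here (1 msec or more),
    # then the page table and TLB are updated, present bit set.
    resident[vpn] = {"frame": frame, "dirty": False}
    return frame                             # then restart instruction
```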

 Note that restarting instructions can be a tricky business.
    IBM 370 ran "long" instructions twice,
         First time just to access each memory location (read),
         to generate any potential page faults.
         Second time to do the actual instruction, guaranteed not
         to page fault.
    Z8000 solution: have 2 processors, one just for handling page
         faults.