Memory Management

Allocating Main Memory

Operations

allocate: allocate a virtual memory block to use
deallocate: free a virtual memory block allocated before
reallocate: grow or shrink a virtual memory block allocated before. Tries to return the same address but may return a new one

Situations to use the above operations

user process wants to get a (virtual) memory block, e.g., via malloc
kernel wants to increase/decrease its memory space

General

User programs keep track of allocated blocks
Kernel (memory manager) keeps track of holes (=free blocks)
Data structure
free blocks
doubly-linked together - the links are coded within the free block itself, so no extra space is wasted
headers at both ends of each free block - hold the size of the block and other information
This makes coalescing adjacent free blocks easy
allocated blocks
headers at both ends of the block
the address returned to the user process = address of the block + size of the header
buggy C/C++ code can overwrite the header and corrupt the malloc chain

Algorithms for memory management

best fit: return the smallest hole that is at least as large as the requested size
expensive search
external fragmentation: tends to leave behind small, unusable blocks
first fit: return the first hole that is large enough for the requested size
small holes tend to accumulate near the beginning of the free list
solution - next fit: start each search where the last one left off
buddy system
all blocks and holes have sizes which are powers of two
every block or hole starts at an address that is an exact multiple of its size
Then every block has a buddy of the same size. A freed block is always coalesced with its buddy (if the buddy is also free), producing a free block twice as large

What do you do when you run out of memory?

return error,
compaction, or
garbage collection

Compaction

procedure
new location of each block is calculated
pointers to the block being moved are updated
blocks are actually moved
problems
needs expensive data copying
all pointers into moved blocks must be tracked and updated

Garbage Collection

Pointers to blocks must normally be tracked exactly - but not necessarily: conservative garbage collection
treat values that look like pointers as pointers: may fail to free a garbage block, but never frees a block still in use
Algorithms
Reference counting: free blocks whose reference counts drop to zero
practical only when there is some "higher level" software to keep track of the counts
can NOT reclaim cyclic garbage
Mark-and-Sweep:
for each pointer "from outside" (a root), mark every reachable (non-garbage) block by doing a depth-first search
free every block not marked
problem:
poor locality: jumps around memory
the amount of work is proportional to the number of non-garbage blocks
Generational Collection
memory is divided into spaces
When a space is chosen for garbage collection, all subsequent references to objects in that space cause the referenced object to be copied to a new space. The old space becomes sparse, so mark-and-sweep becomes worth doing.
By carefully moving objects to different spaces, we can put objects with similar lifetimes into the same space

Paging

General

Virtual address, physical address, MMU
Procedure to translate virtual to physical
CPU gives out a virtual address
MMU searches page table of the current process
start address of the page table (kept in a base register) + page number field of the virtual address
if the page table entry is NOT valid,
MMU generates page fault
OS handles the interrupt
get the disk address holding the page
a page is evicted if necessary
bring the page into memory
rerun the instruction that caused the page fault
get physical address
read the data and return it to CPU

Page Table

A page table for each process
An entry for each virtual page: valid bit, protection bits, modified bit, ref bit
Challenges of page table
references must be very fast -> TLB
the table must not be too big -> page table organization issue
Page table organization
page table in registers: feasible only for very small machines
page table in memory: the page table itself may be too big to keep entirely in memory
page table in virtual memory = multiple level of indirection
inverted table (physical to virtual): slowest due to search of the table
TLB: Translation Look-aside Buffer
if the TLB hit ratio is high enough, how a TLB miss is handled does not matter much

Cases when a page fault occurs

Bug in the program
kills the program ("memory fault")
Request to grow a segment (data segment or stack area) - e.g., a reference just beyond the end of the process's stack
allocates a frame and updates page table
The requested page is not in memory but in disk
brings the page into memory

Page Replacement for a single process

Random
FIFO: kick the oldest page out
Does not distinguish hot pages from cold pages
Can be even worse than Random
Belady's MIN: kick the page which will not be used for the longest time in the future
Serves only as a theoretical yardstick
LRU: Least Recently Used - approximate optimum
No hardware supports exact LRU, so it must itself be approximated
NRU: Not recently used
Hardware sets the referenced bit in the page table entry whenever a page is referenced
OS periodically resets the bits
Select a page whose referenced bit is unset
Sampled LRU: similar to NRU
Not Frequently Used: increment a counter if the referenced bit is set, before unsetting it, & select the page with the smallest counter
Better way (aging): shift the counter right, inserting the referenced bit at the left, before unsetting it, & select the page with the smallest value
Clock
If the referenced bit is set, unset the bit and proceed to the next frame, giving the frame a second chance
If the dirty bit is set, schedule a disk write for the frame and proceed to the next frame
otherwise, return the frame
Belady's Anomaly
With some replacement algorithms (e.g., FIFO), more memory causes more page faults for certain page reference sequences

Page Replacement for multiple processes

The problems of naive extension of methods for single process
Wall-clock time is not a correct criterion for replacement
Solution: introduce virtual time (time that advances only while the process runs)
A greedy process can take pages away from other processes, gaining little performance for itself while causing a severe performance decrease for the others
Solution: memory should be partitioned among processes
Working set model
Working Set W(t): the set of pages referenced during the last t memory references
Medium-term scheduling (also called load control): swapping processes out & in so that each runnable process has its working set in memory
The working set is just a model. We need concrete algorithms:
Fixed Allocation
Give fixed number of pages to each process
Problem: difficult to decide how many pages to give a process
Usable in real-time systems, where a fixed number of well-known processes run over a long period
Page-Fault Frequency
Monitor page fault rate for each process
If the rate is too high, give more pages or swap the process out
If the rate is too low, take some pages from the process and give them to other processes or swap another process in
Problem: what are the correct values of "too high" and "too low"? The system could become unstable, oscillating between "too high" (= thrashing) and "too low"
Working Set
Periodically run a checker, which
scans all pages currently in memory
if a page is not in the working set of its owner process, i.e., Virtual_time_of_the_process - Last_reference_time > t
puts the page in the free pool
calculates the size of the working set, w, of each process
When a page fault occurs, allocate a page from the free pool
When no page is available in the free pool, swap a process out
Global Clock
Just another naive extension of a single-process paging algorithm
But the clock hand moving too fast is a sign of thrashing -> swap a process out
Doesn't know which process's pages to kick out!
WSClock
Assumptions:
virtual time of each process is maintained
the owner of a page frame is somehow known
an approximation to the time of the last reference is available
Algorithm

Thrashing situations
no page has been referenced recently, yet every page is in the working set of its owner process
every page is dirty

Segmentation

Introduction

The virtue of paging
gives each process its own virtual memory, which is bigger than actual memory and starts at address zero
simplifies relocation; as a consequence, a program can be compiled without worrying about where it will actually be placed in memory
relieves the programmer from being frugal with memory. If a process defines a very big array and uses only a portion of it, paging will bring only that portion into memory!
Segmentation goes further. It gives a process multiple virtual memories (segments), each of which can:
be any size. Segments can be of different sizes.
start at any address, possibly conflicting with the addresses of other segments
be access-controlled independently
Address Space of a process = union of segments
Unix Segments
text: executable code and read-only data
data: global data
stack: call stack (arguments, local variables, registers)
shared memory
kernel: kernel code and data structure

Difference between paging and segmentation

               paging                                  segmentation
size           fixed                                   variable
start addr     multiple of page size                   any address
physical addr  frame number concatenated with offset   segment start addr + offset

Multics

Intel x86

virtual address ->(by segment)-> linear address ->(by paging)-> physical address
virtual address: 16-bit segment selector + 16- or 32-bit segment offset
segment selector indexes into the segment descriptor table, which is an array of SDW (segment descriptor word)
SDW (64-bit): segment base (32-bit) + segment limit (20-bit) + protection and other fields
linear address = segment base + segment offset
linear address (32-bit): page directory index (10-bit) + page table index (10-bit) + page offset (12-bit)

Other Issues

Capability based addressing
Converting a swap-based system to do paging on an architecture lacking page-referenced bits