WSClcok - A Simple and Effective Algorithm for Virtual Memory Management

R. Carr and J. Hennessy @ Stanford

Proceedings of the Eighth ACM Symposium on Operating System Principles, December 1981, pages 87-95

Introduction

Memory management policies
Local policy: as WS
Make an explicit measurement of each process's locality and allocate sufficient memory to each process
Global policy: global LRU and its approximations
Does NOT make the measurement for each process and consider only global memory requirement and collective behavior of processes. Does NOT guarantee that every process running gets sufficient memory
Problem
Local policy is better at preventing threshing
Global policy is a lot simpler and has less computational overhead
Goal of this paper
Combine the operational advantage of WS with the simplicity and efficiency of Clock

General Model of a Virtual Memory Computer System

Task model
Task is modeled as a memory reference string, which is a sequence of (p, d) such that
p is a page in the virtual address space of the process
d is dirty flag
Virtual time of a task: the number of completed references to pages for the task
Operating system model
A task is either
in the active queue,
in the ready queue, or
dormant (or blocked)
Load control moves tasks between the ready and the active queue
Tasks in the active queue get scheduled according to a scheduling discipline
Working Set
Working set determination: finding pages leaving W is difficult
Needs to scan all the resident pages of the process, or needs to maintain the set of pages in W to scan only pages in W
Load control and replacement
Needs to maintain a free list of pages which do not belong to any W
Page Writing and Reclamation
Pages in the free list should be reclaimed if it is referenced before being kicked out
Global Clock
Naive extension of Clock algorithm but with some feedback mechanism (such as clock hand movement rate) to control multiprogramming level
Comparing WS and Clock
WS requires more memory and complex algorithm
WS-scan procedure
Space for LR(p) (Latest Reference time of the page) doubles page table. Remember that the size of page table does matter
Algorithm to maintain free list of pages, including a method to reclaim pages from the list
Clock requires a fine tuned adaptive feedback load control mechanism
In Clock, a greedy process can monopolize memory

WSClock

Retains the advantages of WS:
thrashing-preventive load control
task isolation property
While avoiding:
WS-scan: Check and update only pages pointed by clock hand
the space for LR(p): LR(p) is stored in page frame not in page table
Remember that Clock checks pages frames (not page table entries) in clockwise order
free list
page reclamation procedure

Algorithm

Load control
WSClock does not maintain w (the size of W of each process), so it has no clue whether to start a new process or not!
If the Clock-scan fails to find a replaceable page, it signals overcommittment
Every page is dirty, or
Every page is in the working set of the owner process
WSClock may detect overcommittment, but still can't decide whether to start a new process or not

LT/RT Control

Process is either in loading phase or running phase
Loading phase: W has not been loaded yet
Problem
Loading time (wall-clock-time to bring pages in W in memory) is proportional to the number of loading processes because all the loading processes contend for a single auxiliary memory
Many loading processes increase the probability that
running processes finish their job while loading processes contending each other and 
they are replaced by even more loading processes
=> Disk is over committed, but memory is under committed!
Solution
Decrease the loading time by controlling the number of loading processes at a moment
A process is in loading phase until
v virtual time elapsed, or
the process issues an I/O request, otherwise I/O bound process can prevent new task activation undefinitely
LT/RT can be used to refrain from loading another process
while some of memory resident processes are still in loading phase, where
the size of the working set is not known yet!

Experiment: compared with WS, but not compared with Global-Clock

LT/RT significantly improves even pure WS
WSClock performs as effective as WS without OS overhead

Evaluation

Pros
Operational strongness of WS and computational strongness of Clock are combined together: Hybrid is always good :-)
Refraining from activating more job while processes are in loading phase is a good idea, which can be used to WS too
Cons
The size of working set of each process can't be computed. Hence the WS's algorithm of when to start a new job cannot be used. Of course we can use the speed of clock hand movement as an criterion:
When the hand moves slow enough, a job can be activated
When too fast, a job should be swapped out
Compared with WS, but not with Clock