Question:
* WSClock and WS accuracy?

According to the paper: the main dissimilarity between WS and WSClock lies in the difference between W and RW and its use in the load control. When a task is activated, |W| (or |RW|) frames of memory are "committed" to the task. Memory frames are "allocated" only as the task executes and demands them by referencing them. If, at activation, a page p ∈ (W ∩ R') is not referenced for θ, the periodic WS-scan of WS will remove p from W, while WSClock lacks a mechanism to perform this operation: WSClock can remove inactive pages only if they are resident. However, this inaccuracy of WSClock is confined to the first θ after each activation. Their claim: once a task has executed for at least θ units of virtual time since activation, W and RW are identical. The largest discrepancy between W and RW occurs immediately after activation and decreases rapidly during the loading phase. Since RW <= W during the loading phase, WSClock underestimates the amount of memory that a loading task may require, so it tends to make additional activations and overcommit memory during this phase. Thus, the LT/RT control not only prevents overcommitment of auxiliary memory, but also prevents overcommitment of main memory by WSClock.

Take away:
- What is the WS model? The algorithm?
- Why is it impractical in its pure form? (keeping track of reference times...)
- What is the approximation? (see the WSClock paper)
- Load control?
- WSClock only approximates the *resident* working set --> hence memory can be overcommitted. Solution: LT/RT. Why LT/RT?

# Discussion: WSClock

# MODEL OF VIRTUAL MEMORY SYSTEM
- A task is modeled by
  + an address space P = {p_i | i = 1..m} of pages
  + a reference string {r_t | t = 1..T}, where each r_t is an ordered pair (p, d) with
    > p in P
    > d a boolean, true if the reference dirties the page
- Configuration model:
  + a central processor
  + main memory with M page frames
  + a set of I/O devices
- OS model: scheduler, memory manager, load control (a small data-structure sketch follows this section)
  + Scheduler
    ~ each task is in either the *ready* queue or the *active* queue
      > a task remains in the active queue until (1) its time-slice is exhausted, (2) it is deactivated by the load control, or (3) it completes and becomes dormant
    ~ uses a multi-level load-balancing queue
  + Memory manager
    ~ allocates main memory frames to each task and requests paging I/O when needed
    ~ at any given time, each task's address space P is partitioned into a resident set R (pages that occupy main memory frames) and a missing set R'
    ~ a page fault occurs when the processor executes a reference (p, d) with p not in R
    ~ demand paging (a page is brought in only when it is needed, i.e. no pre-paging)
  + Load control:
    ~ goal: maximize the number of active tasks without inducing thrashing
    ~ thrashing occurs when so many tasks are active that the sum of their memory needs exceeds the size of main memory --> memory is overcommitted
    ~ *load* control: moving tasks back and forth between the ready queue and the active queue
      > when memory is undercommitted, tasks may be moved from the ready queue to the active queue
      > when memory is overcommitted, tasks are moved from the active queue back to the ready queue
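Not from the paper: a minimal Python sketch of the task model above, just to make P, R, R', VT, and demand paging concrete. The class and field names (Task, address_space, resident, reference) are my own, and the frame limit M is not enforced here.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    address_space: set[str]                            # P: the task's pages
    resident: set[str] = field(default_factory=set)    # R: pages occupying main memory frames
    virtual_time: int = 0                              # VT: number of completed references

    @property
    def missing(self) -> set[str]:                     # R' = P \ R
        return self.address_space - self.resident

    def reference(self, page: str, dirty: bool) -> bool:
        """Execute one reference r_t = (p, d); return True if it page-faults."""
        fault = page not in self.resident
        if fault:
            self.resident.add(page)    # demand paging: loaded only because it was referenced
        self.virtual_time += 1
        return fault

# Usage:
t = Task(address_space={"p1", "p2", "p3"})
print(t.reference("p1", dirty=False))   # True  -> first touch faults the page in
print(t.reference("p1", dirty=True))    # False -> p1 is now resident
print(t.missing)                        # {'p2', 'p3'} (order may vary)
```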
# WORKING SET POLICY

Working set definition
----------------------
- The working set W is the set of all pages p in P referenced in the previous θ units of the task's virtual time (the virtual time of a task is the number of references that have been completed for that task)
- The arrival of a new page p into W is signaled by a page fault
- Detecting the departure of a page from W is more difficult. It requires
  + LR(p): the last-reference time of each page
  + a procedure that detects pages for which VT - LR(p) >= θ
- Two general approaches to maintaining LR(p):
  + Software:
    ~ use an approximation, e.g. the page-frame use-bit: the use-bit is tested and cleared, and if the page was recently referenced, LR(p) is set to the task's current VT (this runs in the background at some regular interval)
    ~ a WS-scan is performed periodically to update the working set
  + Hardware support: each page frame is associated with a counter that approximates the virtual time since last reference directly (i.e. VT - LR(p))
    ~ the counter is cleared when the page is referenced
    ~ the counter is automatically incremented every fixed amount of time
    ~ a periodic WS-scan is still needed

Load Control and Replacement Algorithm
--------------------------------------
- When a task is active, |W| frames are committed to the task
- The total memory commitment W_active is the sum of |W| over all active tasks
- The WS load control will activate a ready task unless the |W| of the first ready task exceeds M - W_active
- A: the set of available frames
- Page replacement policy:
  + if there is a free page frame in A, use it
  + otherwise, deactivate an active task (move it to the ready queue); that task's resident pages are placed in A

Page Writing and Reclaiming
---------------------------
- When a dirty page is placed in A, it must be cleaned before it can be replaced
- Approach 1: couple the writing of the dirty page with the reading of the page that replaces it ==> slow, requires two I/O operations per fault
- Approach 2: replace only the clean pages in A, and clean the dirty pages in A when there are no outstanding page read requests for a device

# CLOCK POLICY
- A simple approximation of the global LRU replacement algorithm
- Why not LRU? More complex: it needs to keep track of the time each page was last referenced
- Detailed algorithm (Figure 1; a sketch of the scan loop follows this section):
  + all main memory page frames are ordered in a fixed circular list
  + a pointer always points to the last frame replaced
  + when a frame is needed to hold a missing page, the pointer is advanced clockwise, scanning the frames in circular order
  + the use-bit of each frame is tested and cleared
  + if the bit was set, the frame was recently used and is not replaced
  + if the bit was clear, the frame is replaceable provided the page is clean
  + if a replaceable page is dirty, it is scheduled for cleaning and not replaced
  + when the clock locates a clean, not-recently-used page, the algorithm halts, leaving the pointer at the chosen frame to mark the starting point for the next scan
- Global policies such as CLOCK let all active tasks compete for main memory allocation. There is no mechanism to determine a task's memory needs independently of the other tasks, hence no explicit estimate of memory commitment ==> requires an *adaptive feedback control mechanism* (e.g. the page fault rate is monitored; if it is too high, deactivate some task, and if it is too low, activate some task)
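Not the paper's Figure 1, just a minimal Python sketch of the clock-scan above; Frame, schedule_cleaning, and the two-circuit guard are my own additions.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    page: str
    use_bit: bool = False    # set (by hardware) whenever the page is referenced
    dirty: bool = False

def clock_select(frames: list[Frame], hand: int, schedule_cleaning) -> tuple[int, int]:
    """One clock-scan: advance the hand until a clean, not-recently-used frame is found.
    Returns (chosen_frame_index, new_hand_position)."""
    scanned = 0
    while scanned < 2 * len(frames):      # guard for the sketch; in reality we would wait on cleaning I/O
        hand = (hand + 1) % len(frames)   # advance clockwise from the last replaced frame
        scanned += 1
        frame = frames[hand]
        if frame.use_bit:                 # test and clear the use-bit
            frame.use_bit = False         # recently used: skip
        elif frame.dirty:
            schedule_cleaning(hand)       # replaceable but dirty: schedule cleaning, keep scanning
        else:
            return hand, hand             # clean and not recently used: replace this frame
    raise RuntimeError("no clean replaceable frame yet (cleaning I/O outstanding)")

# Usage:
frames = [Frame("p0", use_bit=True), Frame("p1"), Frame("p2", use_bit=True), Frame("p3", dirty=True)]
print(clock_select(frames, hand=0, schedule_cleaning=lambda i: print("clean frame", i)))  # -> (1, 1)
```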
# COMPARING WS and CLOCK
- Implementation:
  + WS is more complex, since it requires
    ~ scheduling and executing the WS-scan procedure (to keep the working sets up to date)
    ~ memory to store LR(p) for every page of every task
    ~ an algorithm to maintain the set A of available pages, including methods to reclaim pages from A
  + CLOCK is
    ~ simpler
    ~ less memory-hungry
    ~ but requires an adaptive feedback control mechanism (which may be difficult to tune, compared with WS)
- Policy:
  + WS has the advantages of good task isolation and predictive load control (WS just needs to look at M - W_active, whereas CLOCK needs a heuristic, e.g. the page fault rate) --> a local policy
  + In CLOCK, every task competes for memory globally --> a global policy
    ~ some tasks may monopolize main memory and force other tasks to run slowly
    ~ this can lead to thrashing
- Performance:
  + hard to compare

# WSCLOCK

Basic Features
--------------
Combines the best features of both WS and CLOCK:
- keeps the thrashing-preventive load control and task-isolation properties of WS
- eliminates:
  1. the WS-scan (in fact, the clock-scan is an approximation of the WS-scan)
  2. the space for LR(p) for non-resident pages
  3. the available-frame set A
  4. the page reclamation procedure (which comes with A)
- uses the simple mechanism found in CLOCK, but does not require an adaptive-feedback load control (as CLOCK does)
- simpler than both WS and CLOCK

Data Structure
--------------
- Main memory frames are arranged in a fixed circular CLOCK-like list
- The CLOCK pointer identifies the last frame replaced in the previous clock-scan
- LR(p) is kept only for resident pages (instead of LR(p) for every page p in P)
- When a page fault occurs, a page read request is placed in the paging queue; when that request is served, the replacement algorithm is invoked to obtain a frame containing a clean replaceable page to hold the incoming page

Algorithm
---------
- Detailed algorithm: Figure 2 (one error in this figure!); a rough sketch of the scan appears at the end of this section
  + to examine a frame, WSClock tests and clears the frame's use-bit
  + if the use-bit was set, meaning the page was referenced recently, WSClock updates LR(p) to the owning task's VT
  + otherwise (if the use-bit was clear), if VT - LR(p) > θ, the page is removed from the working set
  + a page p is replaceable if either
    ~ p is not in the working set, or
    ~ the owning task is inactive
  + if a replaceable page is dirty, it is scheduled for cleaning and not replaced
  + when WSClock finds a clean replaceable page, it halts and leaves the pointer at the chosen frame
- No need for the available-page set A, because WSClock simply searches for and finds a replaceable page when one is needed
  + this is a trade-off: with A, you pay the overhead of maintaining the free pool, but obtaining a frame is faster when you need one
- Page reclamation is eliminated because a page is not removed from a task's resident set R until it is selected for replacement

Load Control
------------
- WSClock scans only pages in the resident set R; it cannot account for pages of W that are NOT in R ==> WSClock approximates only the resident working set RW
- In terms of load control, WSClock is similar to WS; the only difference is that WSClock uses |RW| as the memory commitment of each task
  + RW_active: the sum of |RW| over all active tasks
  + the RW of a ready task is the value of RW when that task was deactivated
  (==> so WSClock is a good approximation only when a task's working set is mostly resident)
- Detecting overcommitment: the clock-scan fails to find a clean replaceable page in a full circuit of the frames (some task must then be deactivated to relieve the overcommitment)
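Again, not the paper's Figure 2 — a minimal Python sketch of the WSClock scan just described. Task, Frame, lr, theta, and the Overcommitted exception are invented names; LR(p) is stored on the frame here, matching "LR(p) only for resident pages" above.

```python
from dataclasses import dataclass

@dataclass
class Task:
    virtual_time: int        # VT of the owning task
    active: bool

@dataclass
class Frame:
    owner: Task
    use_bit: bool
    dirty: bool
    lr: int                  # LR(p), kept only for resident pages (i.e. on the frame)

class Overcommitted(Exception):
    """A full circuit found no clean replaceable page: deactivate some task."""

def wsclock_select(frames: list[Frame], hand: int, theta: int, schedule_cleaning) -> tuple[int, int]:
    """One WSClock scan; returns (chosen_frame_index, new_hand_position)."""
    for _ in range(len(frames)):                     # at most one full circuit
        hand = (hand + 1) % len(frames)
        f = frames[hand]
        if f.use_bit:
            f.use_bit = False                        # test and clear the use-bit
            f.lr = f.owner.virtual_time              # recently referenced: update LR(p)
            continue
        in_working_set = f.owner.active and (f.owner.virtual_time - f.lr <= theta)
        if in_working_set:
            continue                                 # not replaceable
        if f.dirty:
            schedule_cleaning(hand)                  # replaceable but dirty: clean it, keep scanning
            continue
        return hand, hand                            # clean replaceable page: replace here
    raise Overcommitted()                            # signal the load control

# Usage:
t = Task(virtual_time=100, active=True)
frames = [Frame(owner=t, use_bit=False, dirty=False, lr=10),   # VT - LR = 90 > θ: left the working set
          Frame(owner=t, use_bit=True,  dirty=False, lr=95)]
print(wsclock_select(frames, hand=1, theta=50, schedule_cleaning=lambda i: None))   # -> (0, 0)
```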
# LT/RT Control

Introduction
------------
- When a task is activated, it enters the *loading* phase: it has few pages p ∈ R and must load its missing pages before it can execute efficiently
- It then enters the *running* phase:
  + few page faults occur
  + the task runs efficiently
- Typically, the virtual time of the loading phase is small, while its real time is large because of paging I/O delays
- If activations occur frequently, many loading tasks contend for access to the I/O devices ==> loading time is proportional to the number of concurrent loading tasks
- This can keep the I/O devices (auxiliary memory) very busy even though main memory is undercommitted
==> Solution: the loading-task/running-task control (LT/RT control)
  + reduces the mean loading time of a task
  + ensures a more consistent balance of loading and running tasks

Implementation of LT/RT
-----------------------
- Idea: limit the number of concurrent loading tasks (a small sketch of the activation check is appended at the end of these notes)
- L: the maximum number of concurrent loading tasks
  + determined empirically
  + close to the number of paging devices that can be accessed simultaneously
- How to detect whether a task is in the loading phase or the running phase? Heuristic: a task is loading until it has executed for t units of virtual time or has requested an I/O operation
- Why consider a task that has requested an I/O operation a running task? Because otherwise an I/O-bound task might prevent new task activations for a long time
- Three effects:
  + reduces the mean delay between the time that main memory becomes available to activate a task and the time that the task enters the productive running phase
  + main memory is used more effectively, since fewer page frames are committed to unproductive loading tasks
  + *improves processor utilization*, because it maintains a more consistent balance of loading tasks and running tasks (i.e. the CPU spends more time serving tasks that already run efficiently rather than waiting for tasks that are still loading)
- Further optimizations:
  + deactivating a task when its time-slice ends:
    ~ delay the deactivation if there are already L loading tasks active. Why? Because otherwise, after the deactivation, we would activate a task from the ready queue, making the number of loading tasks L + 1
    ~ deactivate only when the number of loading tasks is < L and there is insufficient uncommitted memory to activate the first ready task
  + process page reads for running tasks before reads for loading tasks. Why? Because running tasks generate few page faults compared with loading tasks; serving them first improves processor utilization, since we give preference to tasks that execute efficiently
- LT/RT is independent of the WS and CLOCK load controls
  + it still limits the number of loading tasks even when they would not overcommit main memory
  (basically, there is room for a lot of ad-hoc tuning here)

# Experiment
From the graph, we see that:
1) With LT/RT, processor utilization increases
2) WSClock performs equivalently to WS
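As referenced above, a minimal Python sketch of the LT/RT activation check (my own reconstruction, not the paper's mechanism; TaskState, t_threshold, and the field names are invented, and commitment stands for |W| under WS or |RW| under WSClock):

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    virtual_time: int    # VT since activation
    issued_io: bool      # has the task requested an I/O operation since activation?
    commitment: int      # |W| (under WS) or |RW| (under WSClock)

def is_loading(task: TaskState, t_threshold: int) -> bool:
    """Heuristic from the notes: loading until the task has executed for t_threshold
    units of virtual time or has requested an I/O operation."""
    return task.virtual_time < t_threshold and not task.issued_io

def may_activate(first_ready: TaskState, active: list[TaskState],
                 M: int, L: int, t_threshold: int) -> bool:
    """LT/RT-style activation check: activate the first ready task only if
    (1) fewer than L active tasks are still loading, and
    (2) its memory commitment fits in the uncommitted part of main memory."""
    loading = sum(is_loading(t, t_threshold) for t in active)
    committed = sum(t.commitment for t in active)
    return loading < L and first_ready.commitment <= M - committed

# Usage:
active = [TaskState(virtual_time=5,   issued_io=False, commitment=20),   # still loading
          TaskState(virtual_time=500, issued_io=False, commitment=30)]   # running
ready = TaskState(virtual_time=0, issued_io=False, commitment=25)
print(may_activate(ready, active, M=100, L=2, t_threshold=100))          # True
```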