NOTE: 2nd revision
==================

Good:
- less time, less space
- Repeatable execution
- require only access to shared objects to which may synchronization primitives
can reduce to
i.e. does not dependent on any particular form of interprocess communication
- no centralized bottleneck, either during monitoring or replay
- does not required synchronized clocks or globally consistent logical time
- applicable to both loosely coupled and tightly coupled environment


Bad:
- require all processes to execute during replay, since just record shared
objects and accesses (hence require execution to recompute the outputs, which
may be input to other processes)
(vs. Event log)
==> can use together with event log to re-execute a subset of processes
- 

# 1. INTRODUCTION
-----------------
- debug is important
- debug sequential program is easy:
	+ determinism 
	+ provided fixed input, each execution of a program will always follow
	the same execution path, produce same results.
- debug parallel program
	+ harder, due to non-determinism
	+ given fixed input, re-execution does not necessary produce same results
- 2 options to debug a parallel program:
	+ take snapshots of the program state during execution for later examination
		~ e.g. Behavioral Abstraction (BA)
		~ mechanism to define events hierarchy in terms of sequences of 
		  primitive events that occur during program execution
		~ during single execution, a monitoring tool records streams of
		  primitive events and presents user the abstract view of program 
		  behaviors 
		~ 2 disadvantages:
			1. users exhaustively describes interesting events
			   Also, users need to anticipate events that due to an error
			2. amount of information collected is hugh, since the
			technique is not cyclic
	+ guarantee reproducible behavior of parallel programs (this paper)
		~ allow cyclic debugging techniques
		~ Example 1: serialization of P operations (processes using semaphores)
			> Programmer creates P sequences that can be reproduced
			> requires all P operations to be serialized
			> Hence, lose parallelism, not useful in truly parallel environment
			> Not work with message passing (but can similarly use for monitor)
			> Cannot be used for normal operation (slow)
		~ Example 2: Checkpoint each atomic objects and record time stamp 
		  for each atomic action
			> limit to nested atomic actions
			> require lot of space to store checkpoint
		~ Example 3: for message passing model (loosely-couple), record contents
		  of each message in an event log as it is received.
			> Log can be use as input of replay
			> Some disadvantages:
				• not well-suit with tightly coupled system (shared memory)
				where communication is cheaper and more frequent
				• space for event log can be very large
				• hard to examine the global effects, since replay only operates
				on a single process in isolation
- This paper: INSTANT REPLAY
	+ Goal: repeatable execution for highly parallel application in 
	TIGHTLY COUPLED SYSTEMS.
	+ during program execution:
		~ save relative order of significant events
		~ but NOT the data associated with such event
		~ does not require contents of all process interactions (e.g. messages):
		   	Hence: 
			• less time 
			• less space than other approaches.
	+ guarantee repeatable execution, given same input from external 
	  environment, by imposing relative order on events during replay
	  that occurs during original execution
	+ independent of synchronization primitive (e.g. semaphore, monitor, MPI...)
	+ provide replay entire program (not individual processes in isolation)
	+ NO global synchronization of events, NO centralized bottleneck, NO
	  synchronized clock or global-consistent logical time
	
	
# 2. INSTANT REPLAY
-------------------	
Observation:
- individual process (even of a parallel program) produces the same behavior 
  provided the same input and assumption that execution is deterministic
- hence, ensuring that each process sees *the same input value* at every 
  step of execution, all processes will exhibit same execution behavior
  during both monitoring phase and replay
- each process (given condition above) will produce the same output value 
  in same order. Those output values may then serve as input values for some
  other process. ==> need NOT to store some input value, because it can
  be recompute during replay.

Now, comes to High-level idea:
- Synchronization/interactions between processes can be reduced/modeled as
  operation on shared objects. 
- Hence, just need to monitor:
	1. Sequence of write operations on each shared object. The partial order of 
	the accesses is specified by a sequence of *VERSION NUMBERS* for each object
	2. Each process record the version number of each shared object it accesses
- Note: we do not record the data, but just some accesses sequence specified by 
  version number, hence same space and time, compare to traditional approach
  
Example: assume we have 2 processes, P1, and P2, both share same object S
P1's code: S = S * 2 
P2's code: S = S - 2 
During original execution, P1's code is executed before P2's code. Hence
the information record:
S: 1, 2
P1 (S,1) // P1 accesses S at version 1
P2 (S,2) // P2 accesses S at version 2
Given about recored information, we known the order of access to S, hence
and replay the execution correctly. 

Now, the question:
- How to detect shared data structure
- How to implement monitoring? They claim that there is no global structure,
or global synchronization and bottleneck... Let's find out

# 2.A SIMULATING THE EXTERNAL ENVIRONMENT ???
- Real-time programs, in particular, may not be good candidates for Instant 
  Replay because it is so difficult to simulate equivalent virtual machines

# 2.B COMMUNICATION THROUGH SHARED OBJECT
- process interaction is modeled by operations on shared objects
  	+ all communication and synchronization primitive can be reduced to
	+ e.g. P, V in semaphore, entry procedure in monitor
	+ e.g. message passing: access to shared port or memory segment
- values exchanges between processes depends on
	+ initial values
	+ order in which processes are granted access to shared object
	+ deterministic nature of process
- 2 kinds of operations
	+ read: do not change state of an object
	+ write: change
- *Important observation*:
	+ to recreate the proper sequence of state transition, record the
	sequence of write access on each shared object
	+ recording version number of each shared object read by a process helps
	recreating proper input values for that process during replay
- requirement: this method assume some valid serialization on shared object
	+ there should be some kind of locking/mutual exclusion for shared object
	+ e.g.: this paper assumes CREW: concurrent read, exclusive write
	+ NOTE: if there is a bug that lead to data race, Instant replay may be 
	  broken

# 2.C DATA STRUCTURES FOR PROGRAM MONITORING	
- For each shared object:
	+ version number
	+ totalReaders
	+ activeReaders
- For each process, maintain a process history tape:
	+ used to record the version number of each shared object accessed by a 
	  process
  	+ since the relevant info is read and recorded as part of the access to
	  an object ==> the monitoring phase imitates whatever parallelism
	  is exhibited by the applications
	+ NOTE: all history tapes are link together to form a tree, correspond
	  to the hierarchy of processes.
	+ can be augmented to record additional info

# 2.D ACCESS PROTOCOL FOR SHARED OBJECT	
- Primitives: 4 procedure
	+ ReaderEntry: called by each process that reads a shared object
	+ ReaderExit: done reading, exit
	+ WriteEntry
	+ WriterExit
- Best description by looking at the code:

ReaderEntry (object,process); 
	IF mode = MONITOR THEN
		P(object.lock); 
		AtomicAdd(object.activeReaders, 1); 
		V(object.lock); 
		WriteHistoryTape(process,object.version);
	ELSE 
		/*Find out version read during monitoring phase*/ 
		key:= ReadHistoryTape(process); 
 		WHILE object.version != key DO delay
	ENDIF; 
END ReaderEntry; 

ReaderExit(object);
	AtomicAdd(object.totalReaders, 1); /*Ignored in replay mode *I/
	AtomicAdd(object.activeReaders, -1);
END ReaderExit;	

WriterEntry (object, process); 
	IF mode = MONITOR THEN
		P(object.lock); /*Wait for all current readers to finish*/ 
		WHILE object.activeReaders != 0 DO delay; 	
		WriteHistoryTape(process,object.version); 
		WriteHistoryTape(process,object.totalReaders);
	ELSE 
		/*Read version modified during monitoring phase*/ 
		key:= ReadHistoryTape(process); 
		WHILE object.version != key DO delay; 
		/*Read count of readers for previous version*/ 
		key = ReadHistoryTape(process); 
		WHILE object.totalReaders < key DO delay;
	ENDIF; 
endWriterEntry; 

WriterExit(object);
	object.totalReaders := 0; 
	IF mode = MONITOR THEN
		object.version + = 1; 
		V(object.lock);
	ELSE
		AtomicAdd(object.version, 1);
	ENDIF; 
endWriterExit;

# 3. MULTIPROCESSOR PROTOTYPE OF INSTANT REPLAY

- Good example, about the overhead of monitoring for message passing 
  application. 
- Each send/receive message is treated as access to shared object
  (in this case, message queue or shared memory segment).
- The implementation use polling to find incoming messages. 
- Each polling is treated as an access, hence very high overhead
- SOLUTION: polling operation except the last one are necessary for replay
  hence, record only the operation where message is found.

# 4. INSTANT REPLAY IN THE DEBUGGING CYCLE
How to use Instant Replay
- add output message, as long as the message do not affect the sequence of
  interaction
- add breakpoints: this section is really worth readign though hard


**Debugging parallel programs with Instant Replay**
===================================================

# Take away
- does it work with external event? No, like interrupt, IO failures

# 1. Debugging parallel program
- hard, why? because of *non-determinism*
- then how? do something to make execution of parallel program reproducible
hence, able to use iterative process to debug
- what are general approach to this problem?

# 2. Instant Replay (this paper): A bird-eye view
- save relative order of *significant* events (not the data)
	+ less time
	+ less space
- guarantee reproducible program behavior by enforcing
	+ same input
	+ same relative order of significant event
- not depend on any particular form of interprocess communication
- provide replay entire program
- no global synchronization of events

Ok, 2 question:
- what are events? significant events?

# 3. Instant Replay: more detail view
- observation:
	+ individual process (even of a parallel program) produce the same behavior
	provided the same input and assumption that execution is deterministic
	+ hence, ensuring that each process sees *the same input value* at every 
	step of execution, all processes will exhibit same execution behavior
	during both monitoring phase and replay
- What is events? operation on shared object

# 3.1 Simulating external environments 
???
# 3.2 Communication through shared object
- process interaction is modeled by operations on shared objects
	==> all communication and synchronization primitive can be reduced to
- values exchanges between processes depends on
	+ initial values
	+ order in which processes are granted access to shared object
	+ deterministic nature of process
- 2 kinds of operations
	+ read: do not change state of an object
	+ write: change
- *Important observation*:
	+ to recreate the proper sequence of state transition, record the
	sequence of write access on each shared object
	+ recording version number of each shared object read by a process helps
	recreating proper input values for that process during replay
- requirement: operation