NOTE: 2nd revision ================== Good: - less time, less space - Repeatable execution - require only access to shared objects to which may synchronization primitives can reduce to i.e. does not dependent on any particular form of interprocess communication - no centralized bottleneck, either during monitoring or replay - does not required synchronized clocks or globally consistent logical time - applicable to both loosely coupled and tightly coupled environment Bad: - require all processes to execute during replay, since just record shared objects and accesses (hence require execution to recompute the outputs, which may be input to other processes) (vs. Event log) ==> can use together with event log to re-execute a subset of processes - # 1. INTRODUCTION ----------------- - debug is important - debug sequential program is easy: + determinism + provided fixed input, each execution of a program will always follow the same execution path, produce same results. - debug parallel program + harder, due to non-determinism + given fixed input, re-execution does not necessary produce same results - 2 options to debug a parallel program: + take snapshots of the program state during execution for later examination ~ e.g. Behavioral Abstraction (BA) ~ mechanism to define events hierarchy in terms of sequences of primitive events that occur during program execution ~ during single execution, a monitoring tool records streams of primitive events and presents user the abstract view of program behaviors ~ 2 disadvantages: 1. users exhaustively describes interesting events Also, users need to anticipate events that due to an error 2. amount of information collected is hugh, since the technique is not cyclic + guarantee reproducible behavior of parallel programs (this paper) ~ allow cyclic debugging techniques ~ Example 1: serialization of P operations (processes using semaphores) > Programmer creates P sequences that can be reproduced > requires all P operations to be serialized > Hence, lose parallelism, not useful in truly parallel environment > Not work with message passing (but can similarly use for monitor) > Cannot be used for normal operation (slow) ~ Example 2: Checkpoint each atomic objects and record time stamp for each atomic action > limit to nested atomic actions > require lot of space to store checkpoint ~ Example 3: for message passing model (loosely-couple), record contents of each message in an event log as it is received. > Log can be use as input of replay > Some disadvantages: • not well-suit with tightly coupled system (shared memory) where communication is cheaper and more frequent • space for event log can be very large • hard to examine the global effects, since replay only operates on a single process in isolation - This paper: INSTANT REPLAY + Goal: repeatable execution for highly parallel application in TIGHTLY COUPLED SYSTEMS. + during program execution: ~ save relative order of significant events ~ but NOT the data associated with such event ~ does not require contents of all process interactions (e.g. messages): Hence: • less time • less space than other approaches. + guarantee repeatable execution, given same input from external environment, by imposing relative order on events during replay that occurs during original execution + independent of synchronization primitive (e.g. semaphore, monitor, MPI...) + provide replay entire program (not individual processes in isolation) + NO global synchronization of events, NO centralized bottleneck, NO synchronized clock or global-consistent logical time # 2. INSTANT REPLAY ------------------- Observation: - individual process (even of a parallel program) produces the same behavior provided the same input and assumption that execution is deterministic - hence, ensuring that each process sees *the same input value* at every step of execution, all processes will exhibit same execution behavior during both monitoring phase and replay - each process (given condition above) will produce the same output value in same order. Those output values may then serve as input values for some other process. ==> need NOT to store some input value, because it can be recompute during replay. Now, comes to High-level idea: - Synchronization/interactions between processes can be reduced/modeled as operation on shared objects. - Hence, just need to monitor: 1. Sequence of write operations on each shared object. The partial order of the accesses is specified by a sequence of *VERSION NUMBERS* for each object 2. Each process record the version number of each shared object it accesses - Note: we do not record the data, but just some accesses sequence specified by version number, hence same space and time, compare to traditional approach Example: assume we have 2 processes, P1, and P2, both share same object S P1's code: S = S * 2 P2's code: S = S - 2 During original execution, P1's code is executed before P2's code. Hence the information record: S: 1, 2 P1 (S,1) // P1 accesses S at version 1 P2 (S,2) // P2 accesses S at version 2 Given about recored information, we known the order of access to S, hence and replay the execution correctly. Now, the question: - How to detect shared data structure - How to implement monitoring? They claim that there is no global structure, or global synchronization and bottleneck... Let's find out # 2.A SIMULATING THE EXTERNAL ENVIRONMENT ??? - Real-time programs, in particular, may not be good candidates for Instant Replay because it is so difficult to simulate equivalent virtual machines # 2.B COMMUNICATION THROUGH SHARED OBJECT - process interaction is modeled by operations on shared objects + all communication and synchronization primitive can be reduced to + e.g. P, V in semaphore, entry procedure in monitor + e.g. message passing: access to shared port or memory segment - values exchanges between processes depends on + initial values + order in which processes are granted access to shared object + deterministic nature of process - 2 kinds of operations + read: do not change state of an object + write: change - *Important observation*: + to recreate the proper sequence of state transition, record the sequence of write access on each shared object + recording version number of each shared object read by a process helps recreating proper input values for that process during replay - requirement: this method assume some valid serialization on shared object + there should be some kind of locking/mutual exclusion for shared object + e.g.: this paper assumes CREW: concurrent read, exclusive write + NOTE: if there is a bug that lead to data race, Instant replay may be broken # 2.C DATA STRUCTURES FOR PROGRAM MONITORING - For each shared object: + version number + totalReaders + activeReaders - For each process, maintain a process history tape: + used to record the version number of each shared object accessed by a process + since the relevant info is read and recorded as part of the access to an object ==> the monitoring phase imitates whatever parallelism is exhibited by the applications + NOTE: all history tapes are link together to form a tree, correspond to the hierarchy of processes. + can be augmented to record additional info # 2.D ACCESS PROTOCOL FOR SHARED OBJECT - Primitives: 4 procedure + ReaderEntry: called by each process that reads a shared object + ReaderExit: done reading, exit + WriteEntry + WriterExit - Best description by looking at the code: ReaderEntry (object,process); IF mode = MONITOR THEN P(object.lock); AtomicAdd(object.activeReaders, 1); V(object.lock); WriteHistoryTape(process,object.version); ELSE /*Find out version read during monitoring phase*/ key:= ReadHistoryTape(process); WHILE object.version != key DO delay ENDIF; END ReaderEntry; ReaderExit(object); AtomicAdd(object.totalReaders, 1); /*Ignored in replay mode *I/ AtomicAdd(object.activeReaders, -1); END ReaderExit; WriterEntry (object, process); IF mode = MONITOR THEN P(object.lock); /*Wait for all current readers to finish*/ WHILE object.activeReaders != 0 DO delay; WriteHistoryTape(process,object.version); WriteHistoryTape(process,object.totalReaders); ELSE /*Read version modified during monitoring phase*/ key:= ReadHistoryTape(process); WHILE object.version != key DO delay; /*Read count of readers for previous version*/ key = ReadHistoryTape(process); WHILE object.totalReaders < key DO delay; ENDIF; endWriterEntry; WriterExit(object); object.totalReaders := 0; IF mode = MONITOR THEN object.version + = 1; V(object.lock); ELSE AtomicAdd(object.version, 1); ENDIF; endWriterExit; # 3. MULTIPROCESSOR PROTOTYPE OF INSTANT REPLAY - Good example, about the overhead of monitoring for message passing application. - Each send/receive message is treated as access to shared object (in this case, message queue or shared memory segment). - The implementation use polling to find incoming messages. - Each polling is treated as an access, hence very high overhead - SOLUTION: polling operation except the last one are necessary for replay hence, record only the operation where message is found. # 4. INSTANT REPLAY IN THE DEBUGGING CYCLE How to use Instant Replay - add output message, as long as the message do not affect the sequence of interaction - add breakpoints: this section is really worth readign though hard **Debugging parallel programs with Instant Replay** =================================================== # Take away - does it work with external event? No, like interrupt, IO failures # 1. Debugging parallel program - hard, why? because of *non-determinism* - then how? do something to make execution of parallel program reproducible hence, able to use iterative process to debug - what are general approach to this problem? # 2. Instant Replay (this paper): A bird-eye view - save relative order of *significant* events (not the data) + less time + less space - guarantee reproducible program behavior by enforcing + same input + same relative order of significant event - not depend on any particular form of interprocess communication - provide replay entire program - no global synchronization of events Ok, 2 question: - what are events? significant events? # 3. Instant Replay: more detail view - observation: + individual process (even of a parallel program) produce the same behavior provided the same input and assumption that execution is deterministic + hence, ensuring that each process sees *the same input value* at every step of execution, all processes will exhibit same execution behavior during both monitoring phase and replay - What is events? operation on shared object # 3.1 Simulating external environments ??? # 3.2 Communication through shared object - process interaction is modeled by operations on shared objects ==> all communication and synchronization primitive can be reduced to - values exchanges between processes depends on + initial values + order in which processes are granted access to shared object + deterministic nature of process - 2 kinds of operations + read: do not change state of an object + write: change - *Important observation*: + to recreate the proper sequence of state transition, record the sequence of write access on each shared object + recording version number of each shared object read by a process helps recreating proper input values for that process during replay - requirement: operation