**Time, Clocks, and the Ordering of Events in a Distributed System**
====================================================================

Note: this is borrowed mostly from Andrea's slides found at
"http://pages.cs.wisc.edu/~dusseau/Classes/CS739-S06/Writeups/time.pdf"

# Problem/Motivation

- If we want to develop a distributed algorithm (and have all participants come to the same conclusion), it helps if all of them see the inputs in the same order.

Question: how do we know when one event precedes another in a distributed system?
It is hard to tell, and sometimes impossible, because:
  + in a distributed system, nodes may not share the same clock
  + if event A happens on machine A and event B happens on machine B, but there is no communication between A and B, did event A or event B happen first?

Solution: logical clocks (see below)

# System Model

- The system consists of processes
- A process consists of a sequence of events:
  + instructions
  + sending messages
  + receiving messages
- *Within a sequence*, a occurs before b if a appears earlier in that sequence
  ==> i.e. events within a single process have a *total ordering*
- Sending and receiving messages are events
- Processes communicate with each other by sending/receiving messages

# Happens Before

Definition: a partial ordering of events in a distributed system.

a -> b: a happens before b if
  (1) a and b are in the same sequence of events of a process, and a comes before b
  (2) a is the event of sending message M, and b is the event of receiving M
  (3) a -> b and b -> c, then a -> c (transitivity)

If a -> b: it is *possible* for a to causally affect b.

Concurrent: if neither a -> b nor b -> a (by the rules above), we say that a is concurrent with b
  + if a and b are concurrent, we do not know the ordering between a and b
  + it is not possible for a to causally affect b (or vice versa)

# Logical clocks: definition

- Abstract view: a way of assigning a number to an event to express ordering
  Note: there is no relation between logical clocks and physical time
- Clock Ci for process Pi is a function which assigns a number Ci(a) to any event a in Pi
- *Clock condition*: for any events a, b: if a -> b then C(a) < C(b)
  + this provides a partial ordering
  + Note: the converse condition does not hold; requiring it would force any two concurrent events to have the same logical time, which we cannot guarantee
    (i.e. if C(a) < C(b), we cannot conclude a -> b; hence this gives only a partial ordering)

# Logical clocks: implementation

C1. if a and b are in Pi, and a comes before b, then Ci(a) < Ci(b)
  IR1. each process Pi increments Ci between any two successive events (in Pi)
C2. if a is the sending of a message by Pi and b is its receipt by Pj, then Ci(a) < Cj(b)
  IR2. (a) if event a is the sending of message m by process Pi, then m carries a timestamp Tm = Ci(a)
       (b) upon receiving m, process Pj sets Cj to a value greater than or equal to its present value and greater than Tm
  (both rules are sketched in code after the next section)

# Total Ordering

Question: how do we use logical clocks to define/obtain a total ordering across all processes and events?

a => b if and only if:
  1) Ci(a) < Cj(b), OR
  2) Ci(a) = Cj(b) and Pi < Pj (i.e. use process ids to break ties)

Note that the partial ordering is unique, but the total ordering is not. Why?
  + concurrent operations can go in any order (see the example in the paper)
  + it depends on the implementation of each Ci()
  + it depends on the tie-breaking rule
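A minimal sketch of IR1/IR2 plus the process-id tie-break, assuming a single-threaded process object with a process id and an integer counter (the names `LamportClock`, `stamp_send`, `on_receive`, and `total_order` are illustrative, not from the paper):

```python
class LamportClock:
    """Logical clock for one process, following IR1 and IR2 (illustrative sketch)."""

    def __init__(self, pid):
        self.pid = pid      # used only to break ties in the total order
        self.time = 0       # Ci, the logical clock value

    def tick(self):
        """IR1: increment the clock between successive local events."""
        self.time += 1
        return self.time

    def stamp_send(self):
        """IR2(a): a send is an event; attach the resulting timestamp Tm to the message."""
        return self.tick()

    def on_receive(self, tm):
        """IR2(b): advance the clock past both its current value and the message timestamp Tm."""
        self.time = max(self.time, tm) + 1
        return self.time


def total_order(event_a, event_b):
    """Total ordering =>: compare (timestamp, pid) pairs; pid breaks ties."""
    return event_a < event_b


# Usage: P1 sends to P2; the receive is ordered after the send.
p1, p2 = LamportClock(1), LamportClock(2)
tm = p1.stamp_send()            # Tm = 1
t_recv = p2.on_receive(tm)      # Cj becomes max(0, 1) + 1 = 2
assert total_order((tm, p1.pid), (t_recv, p2.pid))
```

The (timestamp, pid) pair comparison is exactly the => relation above: timestamps first, process ids only to break ties.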
# Distributed state machine

- Each process runs the same distributed algorithm
- Relies upon a total ordering of requests
  + agreed upon by all participants
  + can be used to ensure all processes see events (inputs) in the same order and therefore make the same decisions
- Idea:
  + all requests are time-stamped; responses (acks) are time-stamped too
  + send each time-stamped request to all processes
  + handle the next request in total order
    > to know the next request, a process must have received a greater timestamp from all possible participants
    > Problems?
      1) if one process crashes, this algorithm does not work
      2) it requires the participation of all processes
         Why? Because a process must know all the commands issued by the other processes
         (see Andrea's notes: there is a great example about this)

Example: mutual exclusion on a resource
---------------------------------------
- a fixed collection of processes shares a single resource
- only one process can use the resource at a time --> need synchronization
- requirements:
  1. A process which has been granted the resource must release it before the resource can be granted to another process
  2. Different requests for the resource must be granted in the order in which they are made
  3. If every process which is granted the resource eventually releases it, then every request is eventually granted
- Why a central scheduling process does not work:
  + say we have a scheduling server P0
  + P1 sends a request to P0, then sends a message to P2
  + upon receiving the message from P1, P2 sends P0 a request
  + it is possible for P2's request to reach P0 before P1's request
    ==> violates requirement (2)
- Some assumptions:
  + messages sent from Pi to Pj are received in the order they were sent
  + no failures (i.e. no message loss, ...)
  + every process can send messages directly to every other process
- The algorithm (sketched in code below):
  + the resource is initially granted to P0
  + each process has a *request queue* which initially contains "T0:P0 request"
    ~ T0 is less than the initial value of any clock
  + 5 rules:
    Rule 1: To request the resource, Pi sends "Tm:Pi request" to every other process and puts that request in its own queue. Tm is the timestamp of the message.
    Rule 2: When Pj receives "Tm:Pi request", it places the request on its queue and sends a (timestamped) ack to Pi.
    Rule 3: To release the resource, Pi removes any "Tm:Pi request" message from its queue and sends a timestamped "Pi releases" message to every other process.
    Rule 4: When Pj receives "Pi releases", it removes any "Tm:Pi request" message from its queue.
    Rule 5: Pi is granted the resource when both:
      (i) there is a "Tm:Pi request" in its request queue which is ordered before every other request in its queue by the => relation
          Why (i)? To make sure requests are granted in order
      (ii) Pi has received a message from every other process time-stamped later than Tm
          Why (ii)? To make sure Pi has learned about every request which preceded its current request
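A condensed sketch of the five rules, assuming each process keeps a Lamport clock, a request queue of (timestamp, pid) pairs, and the latest timestamp seen from each peer; message delivery (`broadcast`, `send`) is left to the caller, and all names (`MutexProcess`, `can_enter`, ...) are invented for illustration:

```python
import heapq

class MutexProcess:
    """One participant in the mutual-exclusion algorithm (illustrative sketch)."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers                        # ids of all other processes
        self.clock = 0                            # Lamport clock
        self.queue = []                           # heap of (timestamp, pid) requests
        self.last_seen = {p: 0 for p in peers}    # latest timestamp received from each peer

    def _tick(self, received=0):
        self.clock = max(self.clock, received) + 1
        return self.clock

    def request(self, broadcast):
        """Rule 1: timestamp the request, queue it locally, send it to everyone."""
        tm = self._tick()
        heapq.heappush(self.queue, (tm, self.pid))
        broadcast(("request", tm, self.pid))
        return tm

    def on_request(self, tm, sender, send):
        """Rule 2: queue the remote request and ack it."""
        self.last_seen[sender] = max(self.last_seen[sender], self._tick(tm) and tm)
        heapq.heappush(self.queue, (tm, sender))
        send(sender, ("ack", self._tick(), self.pid))

    def on_ack(self, tm, sender):
        self._tick(tm)
        self.last_seen[sender] = max(self.last_seen[sender], tm)

    def release(self, broadcast):
        """Rule 3: drop our own request and tell everyone."""
        self.queue = [(t, p) for (t, p) in self.queue if p != self.pid]
        heapq.heapify(self.queue)
        broadcast(("release", self._tick(), self.pid))

    def on_release(self, tm, sender):
        """Rule 4: drop the releasing process's request."""
        self._tick(tm)
        self.last_seen[sender] = max(self.last_seen[sender], tm)
        self.queue = [(t, p) for (t, p) in self.queue if p != sender]
        heapq.heapify(self.queue)

    def can_enter(self, my_tm):
        """Rule 5: our request is first in => order, and everyone has spoken since Tm."""
        first = bool(self.queue) and min(self.queue) == (my_tm, self.pid)
        heard_all = all(t > my_tm for t in self.last_seen.values())
        return first and heard_all
```

Because the queue holds (timestamp, pid) pairs, comparing against its minimum element is exactly the => total order from above, so rule 5(i) reduces to one tuple comparison.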
# Physical clock

- Motivation: we can observe *anomalous* behavior if other communication channels exist outside the system
  + Example: a person issues request A on computer A, then calls a friend who issues request B on a different computer B
    ==> it is possible for request B to receive a lower timestamp and be ordered before request A
  + Hence, it is useful to have physical clocks with meaning in the physical world
- Synchronize independent physical clocks, each running at a slightly different rate (skew)
- Implementation idea:
  + send a timestamp with each message
  + the receiver may update its clock to *timestamp + minimal network delay*
    > the clock must always increase

My notes about this:
--------------------
- Goal: to solve the anomalous behavior above
- Some definitions: Ci(t) is the reading of clock Ci at physical time t (at process i)
- Note:
  + dCi(t)/dt ≈ 1; ideally it would be exactly 1, but the rate of clock Ci is never exactly the same as *real* physical time. Hence we want |dCi(t)/dt - 1| < k, where for example k ≈ 10^-6
  + we also want |Ci(t) - Cj(t)| < ε, i.e. we want to synchronize the physical clocks of processes i and j so that Ci(t) ≈ Cj(t)
- So how? (see the sketch below)
  + if Pi sends a message m at physical time t, then m carries the timestamp Tm = Ci(t)
  + upon receiving message m at time t', Pj sets Cj(t') = max(Cj(t' - 0), Tm + μm), where μm is the minimal network delay and Cj(t' - 0) is the value of Cj just before t'
    ==> the clock must always increase
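A small sketch of the receive rule, assuming each process reads a free-running local hardware clock and keeps a correction offset; `mu_min` stands for the assumed minimal network delay, and all names are illustrative:

```python
import time

class PhysicalClock:
    """Physical clock with the receive-side adjustment rule (illustrative sketch)."""

    def __init__(self, mu_min=0.001):
        self.offset = 0.0        # correction added to the local hardware clock
        self.mu_min = mu_min     # assumed minimal network delay, in seconds

    def now(self):
        """Ci(t): local hardware reading plus accumulated correction."""
        return time.monotonic() + self.offset

    def stamp_send(self):
        """Tm = Ci(t) at the moment the message is sent."""
        return self.now()

    def on_receive(self, tm):
        """Cj(t') = max(Cj(t' - 0), Tm + mu_min): only ever move the clock forward."""
        current = self.now()
        target = tm + self.mu_min
        if current < target:
            self.offset += target - current
        return self.now()
```

The receiver never sets its clock backwards, so the clock keeps increasing within a process; the paper's analysis shows that, with bounded drift and regular message traffic, this rule keeps |Ci(t) - Cj(t)| within a bound that depends on the drift rate and the message delays.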
# Conclusion

- Distributed, replicated state machines are useful for tolerating faults
- We need to construct a total ordering of events to obtain the same results everywhere
- Logical clocks are very simple to implement
- We will see logical clocks used to update replicas