
Time, Clocks, and the Ordering of Events in a Distributed System

L. Lamport, Time, Clocks, and the Ordering of Events in a Distributed System, Communications of the ACM, July 1978, pages 558-564.

Reviews due Tuesday, 9/30.

Comments

Problem:
They want to define an ordering of events across different processes in a distributed system (to know which event happened before which). This is important, for example, when the goal is to allocate a resource to the process that requested it first.

Solution:
They first define the "happened before" relation. They then define a logical clock and propose an algorithm that keeps the logical clocks of different processes consistent with the order of events. For example, when a process receives a message whose timestamp is greater than or equal to its current clock value, it advances its clock past that timestamp, so that the receipt of a message is always timestamped later than its sending.
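The clock update rule described above can be sketched in Python. This is a minimal sketch, assuming one clock object per process; the class and method names (`LamportClock`, `tick`, `send`, `receive`) are hypothetical. It uses the common convention of setting the receiver's clock to max(local, timestamp) + 1, which satisfies the paper's requirement that a message's receipt be timestamped later than its sending.

```python
class LamportClock:
    """Minimal logical clock for one process (illustrative sketch)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock between successive local events."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the message carries the new clock value as its timestamp."""
        return self.tick()

    def receive(self, timestamp: int) -> int:
        """Move past both the local clock value and the message timestamp."""
        self.time = max(self.time, timestamp) + 1
        return self.time


# Usage: process P's clock is ahead of Q's.
p, q = LamportClock(), LamportClock()
for _ in range(5):
    p.tick()          # p.time is now 5
ts = p.send()         # timestamp 6
recv = q.receive(ts)  # Q jumps from 0 to 7, so the receive is ordered after the send
print(ts, recv)       # 6 7
```

If the receiver's clock is already ahead of the timestamp, it simply ticks forward; clocks never move backwards.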
They explain how to use this logical clock to allocate a resource when its use is mutually exclusive (only one process can use the resource at a time). Every process keeps a request queue. When a process needs the resource, it sends a timestamped request to all other processes and puts the request in its own queue; each receiver puts the request in its queue and returns an acknowledgment. This way all processes know about all requests and the times they were made. A process uses the resource only when its request is the earliest one in its queue and it has received a message with a later timestamp from every other process. When a process is finished with the resource, it sends a release message to every other process to let them know the resource is free.
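The request/acknowledge/release protocol described above can be sketched as a small single-threaded simulation in Python. This is a minimal sketch under stated assumptions: messages are delivered one at a time from a single FIFO queue (the paper assumes messages between two processes arrive in the order sent), a process "uses" the resource instantly and then releases it, and the names `Process` and `run_demo` are hypothetical. The entry test `can_enter` encodes the two conditions: the process's own request is first in the total order (timestamp, then process id), and it has heard something later-timestamped from every other process.

```python
from collections import deque


class Process:
    """One participant in a sketch of Lamport's mutual exclusion algorithm."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = 0
        self.queue = set()  # pending requests as (timestamp, pid); min() gives the total order
        self.last_seen = {j: 0 for j in range(n) if j != pid}  # latest timestamp heard from each peer
        self.my_request = None

    def _stamp(self):
        self.clock += 1
        return self.clock

    def request(self, net):
        self.my_request = (self._stamp(), self.pid)
        self.queue.add(self.my_request)
        for j in self.last_seen:
            net.append((j, "REQUEST", self.my_request[0], self.pid))

    def release(self, net):
        self.queue.discard(self.my_request)
        self.my_request = None
        ts = self._stamp()
        for j in self.last_seen:
            net.append((j, "RELEASE", ts, self.pid))

    def on_message(self, kind, ts, sender, net):
        self.clock = max(self.clock, ts) + 1  # logical clock update on receipt
        self.last_seen[sender] = max(self.last_seen[sender], ts)
        if kind == "REQUEST":
            self.queue.add((ts, sender))
            net.append((sender, "ACK", self._stamp(), self.pid))
        elif kind == "RELEASE":
            self.queue = {r for r in self.queue if r[1] != sender}

    def can_enter(self):
        # (i) own request is first in the total order, and
        # (ii) every peer has sent something timestamped later than that request.
        return (self.my_request is not None
                and min(self.queue) == self.my_request
                and all(t > self.my_request[0] for t in self.last_seen.values()))


def run_demo(n=3):
    net = deque()  # a single FIFO queue also keeps each channel in FIFO order
    procs = [Process(i, n) for i in range(n)]
    order = []
    procs[1].request(net)  # P1 and P2 both request with timestamp 1;
    procs[2].request(net)  # the tie is broken by process id, so P1 goes first
    while net or any(p.my_request for p in procs):
        ready = [p for p in procs if p.can_enter()]
        assert len(ready) <= 1, "mutual exclusion violated"
        for p in ready:
            order.append(p.pid)  # "use" the resource, then release it immediately
            p.release(net)
        if net:
            dest, kind, ts, sender = net.popleft()
            procs[dest].on_message(kind, ts, sender, net)
    return order


print(run_demo())  # [1, 2]
```

Acknowledgments matter only for condition (ii): a process whose peers have been silent cannot enter, even if its own request is already first in its queue.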

They explain how logical clocks can create anomalies: because logical time and real time are not the same, a user who made a request earlier in real time might receive the resource later.

They then adapt their synchronization method to physical clocks and prove a bound on how far out of synchrony the clocks can drift.

They also mention that they do not consider failures. The failure of one process will cause a deadlock, because a process that requests a resource waits for acknowledgments from all other processes.

Summary:
- The paper developed logical clocks (counters) that are consistent with the partial ordering of events and can be extended to a total ordering.
- The paper developed physical clocks and their requirements to avoid anomalous orderings that logical clocks alone cannot prevent.
- The ordering of events can be used to solve synchronization problems.

Problem:
- Distributed Systems have a transmission delay, which leads to synchronization problems.
- Difficult to determine which process should have access to resources without synchronization.
- Clocks are not perfectly accurate and aligning real clocks is difficult.

Contributions:
- The paper developed the clock condition which, if satisfied, guarantees that of two causally related events, the one that happened first has the smaller timestamp. Being able to sort related events is important for distributed systems that implement “last write wins”.
- The paper developed physical clock requirements for accurate clock synchronization. This technical contribution denotes the maximum transmission delay that the system could sustain for an accurate physical clock in a distributed system.
- The paper shows how total ordering can be used for resource requests. This contribution allows global resource sharing, rather than using a heavily loaded centralized system to distribute resources.
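As an illustration of the “last write wins” use mentioned above (the paper itself does not discuss it), here is a minimal Python sketch, assuming each write is tagged with a (Lamport counter, process id) pair; `LWWRegister` and the sample writes are hypothetical. Two replicas that receive the same writes in opposite orders end up with the same value.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Stamp = Tuple[int, int]  # (Lamport counter, process id); tuples compare lexicographically


@dataclass
class LWWRegister:
    """Hypothetical last-write-wins register ordered by Lamport timestamps."""
    value: Any = None
    stamp: Optional[Stamp] = None

    def apply(self, value: Any, stamp: Stamp) -> None:
        # Keep the write that is latest in the total order (counter first, then process id),
        # so replicas converge no matter in which order the writes arrive.
        if self.stamp is None or stamp > self.stamp:
            self.value, self.stamp = value, stamp


writes = [("x=1", (3, 2)), ("x=2", (5, 1)), ("x=3", (5, 2))]
a, b = LWWRegister(), LWWRegister()
for v, s in writes:
    a.apply(v, s)
for v, s in reversed(writes):
    b.apply(v, s)
print(a.value, b.value)  # x=3 x=3
```

“Last” here means last in the logical total order, not in real time: the writes stamped (5, 1) and (5, 2) may be concurrent, and the process id decides between them arbitrarily.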

Applicability:
- As the paper mentions, the mutual exclusion algorithm built on the total ordering is not fault tolerant: the system fails if a single process fails.
- Each process has to communicate with every other process to request resources, which is a huge waste of bandwidth. An alternative approach is to let processes communicate with the resources directly to determine whether they are available.
- Every process maintains its own resource queue, which duplicates information when multiple processes request the same resources and wastes memory. An alternative is to let each resource keep a queue of the processes that want to use it.

Summary
The paper tackles the issues of time, event ordering, and mutual exclusion in a distributed system that has no central time source and in which processes communicate only via messages.

Contributions
1. Partial Ordering - Uses a simple "happened before" (-->) relation to order events, both between events in the same process and between the sending and the receipt of a message. This gives only a partial ordering of events.

2. Logical Clocks - Each message carries its sender's timestamp Tm, and the receiver advances its own clock to a value greater than Tm. So ordering can be established with simple counters. However, this ordering only relates components of the system that interact with each other. To ensure mutual exclusion, a process first requests the resource from the other processes and uses it only once the request has been granted, somewhat similar to the "token ring" concept.
3. Physical Clocks - Devised a clock synchronization algorithm that limits drift among physical clocks, using only the current value of the process's physical clock and the timestamp of the received message.

Conclusion
This paper was way ahead of its time and successfully came up with a solution for ordering events in a distributed system. However, it does not address failures, and it notes that physical time is needed to correctly detect and recover from failures.

SUMMARY: The paper discusses and defines what it means for events to occur "before" other events in a distributed system and introduces algorithms for total ordering based on logical and physical clocks.

PROBLEM: As events happen in one process, other events may be happening in a
separate process and communication between the two takes time. As such, the
question of "what happened first" in two different processes may be ambiguous,
and if a clear and consistent ordering of events is not defined, it leads to
anomalous behavior.

CONTRIBUTIONS: First, the paper presents a formal definition of the partial order, in which some events can be "concurrent" with others, meaning that neither happened before the other and there is no causal dependency between them. By then introducing logical clocks and an algorithm for assigning a logical clock tick to each event by using timestamps on the messages, it is possible to achieve a total ordering of events. However, the total ordering depends on some method of "breaking ties" and as such total orders are not unique. Introducing physical clocks which can be synchronized within some guarantee helps align the total ordering with an actual person's experience of the flow of time.

APPLICABILITY: Being able to consistently determine the order of events in a distributed system remains important to this day. For example, when debugging two sides of a client/server connection, I sometimes need to rely on log files with timestamps, and knowing that they are in sync is vital to determining the actual sequence of events. Services such as ntpd exist and are widely deployed on the Internet for this very reason. The theoretical foundations put forth in this paper were necessary for solving synchronization problems and proving the solutions correct, and they are key to distributed systems research.

Summary:
In this paper, the author proposes logical clocks to coordinate events in a distributed system.

Problems:
Distributed systems consist of a collection of computers that share neither a common clock nor a common memory. The only way processes in a distributed system can exchange information is through messages over the network. Because message delay in any network is unpredictable, not all processes can have the same view of the global state.

To address this, the author develops the concepts of "time" and "state" in distributed systems.

Contributions:
The author assumes that the events in each process occur in a sequence, i.e., they are totally ordered within the process. To capture the order across processes, the author uses logical clocks.
The author gives a clear definition of the happened before relation in distributed systems.
The author abstracts the concept of real clocks into logical clocks. They are implemented as counters with no relation to physical time, and they are consistent with the partial ordering between events.
To address the shortcoming that two events can have the same counter value, the author extends the partial ordering to a total ordering by using the process identifier to break ties.

Limitations:
1) If the logical clock value of a is less than the logical clock value of b, one cannot conclude that a happened before b.
2) Because of their relativity, external failures (process failure, network failure) can lead to anomalous behaviour.
3) Logical clocks are not scalable.
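Limitation 1 can be checked on a tiny example. The Python sketch below assumes a hypothetical two-process trace (events `a1`, `a2`, `a3` on P0 and `b1`, `b2` on P1, with one message sent at `a1` and received at `b2`). It computes each event's Lamport clock value and the happened-before relation directly, then lists pairs whose clock values are ordered even though neither event happened before the other.

```python
from functools import lru_cache

# Hypothetical trace: P0 has events a1..a3, P1 has b1..b2; a1 sends a message received at b2.
PROCESSES = {"P0": ["a1", "a2", "a3"], "P1": ["b1", "b2"]}
MESSAGES = [("a1", "b2")]  # (send event, receive event)

# Immediate predecessors: the previous event in the same process, plus the send for a receive.
PREDS = {e: [] for seq in PROCESSES.values() for e in seq}
for seq in PROCESSES.values():
    for prev, cur in zip(seq, seq[1:]):
        PREDS[cur].append(prev)
for send, recv in MESSAGES:
    PREDS[recv].append(send)


@lru_cache(maxsize=None)
def clock(e: str) -> int:
    """Lamport clock value: one more than the largest value among immediate predecessors."""
    return 1 + max((clock(p) for p in PREDS[e]), default=0)


def happened_before(a: str, b: str) -> bool:
    """a -> b iff a reaches b through process-order and message edges."""
    return any(p == a or happened_before(a, p) for p in PREDS[b])


concurrent_but_ordered = [
    (a, b) for a in PREDS for b in PREDS
    if clock(a) < clock(b) and not happened_before(a, b) and not happened_before(b, a)
]
print(concurrent_but_ordered)  # [('b1', 'a2'), ('b1', 'a3'), ('b2', 'a3')]
```

The clock condition itself still holds (every a -> b pair has C(a) < C(b)); only the converse fails, which is the gap vector clocks were later designed to close.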

Application:
Logical clocks were the first mechanism to provide a consistent global ordering of events in a distributed system. They are the foundation for many later event-ordering mechanisms.

Summary
The paper discusses methods to provide ordering of events in processes for distributed systems.

Problems trying to solve
Provide an order between events in different processes whose dependencies are communicated via messages. Deal with the anomalous behavior caused by the use of logical clocks. Synchronize physical clocks.

Contributions
Defines a set of rules, using the happened before relation, that provide a partial ordering.
Implements total ordering with logical clocks.
Illustrates the use of total ordering with a mutual exclusion problem.
Identifies anomalous behavior and provides a solution to overcome it.
Provides a method for synchronizing physical clocks and validates it by proving that it satisfies the defined conditions.

Applicability
The issues of event ordering and physical clock synchronization are important in distributed systems. The paper tackles these problems and provides a base for future work, e.g., Amazon's use of vector clocks in Dynamo.
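Vector clocks are a later refinement of Lamport's counters, not part of this paper: each process keeps one counter per process, and V(a) < V(b) holds exactly when a happened before b, so concurrent updates become detectable. The Python sketch below is minimal and assumes dictionaries keyed by hypothetical node names (`A`, `B`, `C`); it is not Dynamo's actual implementation.

```python
from typing import Dict

VClock = Dict[str, int]  # node name -> number of events seen from that node


def increment(vc: VClock, node: str) -> VClock:
    """Local event (e.g. a write) at `node`: bump only that node's entry."""
    out = dict(vc)
    out[node] = out.get(node, 0) + 1
    return out


def merge(a: VClock, b: VClock) -> VClock:
    """On receipt, take the entrywise maximum of the local and received clocks."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def leq(a: VClock, b: VClock) -> bool:
    return all(v <= b.get(k, 0) for k, v in a.items())


def compare(a: VClock, b: VClock) -> str:
    if leq(a, b) and leq(b, a):
        return "equal"
    if leq(a, b):
        return "before"
    if leq(b, a):
        return "after"
    return "concurrent"


v1 = increment({}, "A")             # A writes
v2 = increment(merge({}, v1), "B")  # B sees A's write, then writes
v3 = increment(merge({}, v1), "C")  # C also sees A's write, then writes independently
print(compare(v1, v2), compare(v2, v3))  # before concurrent
```

A single Lamport counter would have given v2 and v3 comparable values; the per-node entries are what reveal that neither write saw the other.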

Summary
This paper introduces the “happened before” relation, which is used to define a partial ordering of the events in a distributed multiprocess system. The paper further extends it to a total ordering to solve the synchronization problem.

Problem Description
In a multiprocess system, synchronization is one of the most important design concerns, since some events need to be executed sequentially in order to provide mutual exclusion and so on. However, synchronizing physical clocks is hard and sometimes unnecessary if one just needs the correct order of events. Therefore, this work aims to design an algorithm that provides system-level logical clock synchronization.

Contribution
1. The biggest contribution is the “happened before” relation, which achieves system-level synchronization without using physical clocks. The paper later extends it to a total ordering of all events.
2. They propose an algorithm to set the logical clock properly to ensure the event ordering.
3. To avoid a central scheduling process, they use a handshake-like mechanism to achieve a distributed algorithm.
4. Combined with physical clocks, they solved anomalous behavior problem.

Applicability
This work was done in 1978, and it makes a lot of simplifying assumptions that do not seem very realistic now. There is a large overhead when processes exchange information. Also, all processes need to participate in the algorithm, so the system suffers from any single process failure. However, the contribution of this work is still fundamental, as it introduces a novel idea for solving the synchronization problem in distributed systems.

Summary: This paper argues that logical clocks (logical time), rather than physical clocks, are the right way to think about time in distributed systems. Based on that, the paper develops the partial order of events, an algorithm for defining clocks, and an algorithm for synchronization in distributed systems. It also provides bounds on how far out of synchrony the physical clocks of a distributed system can become.

Problem: Synchronization is an essential problem in distributed systems. Network links between nodes can have different delays, and physical clocks can drift. Therefore, time based on physical clocks alone will not solve synchronization. One needs another concept of time, independent of physical time, that reflects the order in which events happen within each node and between nodes.

Contributions: This is a very pioneering paper that reconsiders the nature of time in distributed systems.
1. The author points out that the utility of time is to define the logical order between events in the distributed systems. Physical time is not useful for it may contradict the logical order.

2. It defines the partial ordering of events (denoted as ->): first, if a and b are events in the same node and a comes before b, then a->b; second, if a is the sending of a message by one node and b is the receipt of that message by another node, then a->b; third, the relation is transitive. This is the causal relationship between events.

3. Based on the partial ordering, the author defines a clock as any function assigning numbers to events that does not violate the partial order between events. In other words, a clock, together with a tie-breaking order on processes, extends the partial order to a total order. The paper further provides an algorithm to implement such clocks.

4. Based on the clock algorithm, it introduces an application of the clock: synchronization between nodes. When a node wants to request the resource, it sends a timestamped request to all other nodes and uses the resource only once its request is the earliest in its queue and it has received acknowledgments (or later-timestamped messages) from all other nodes. To release the resource, it sends a release message to all other nodes.

5. When events are related through channels outside the system (so-called anomalous behavior), the synchronization algorithm can order them differently from how users observed them. The author fixes this issue with a stronger ordering based on synchronized physical clocks.

Application: Lamport proposes this pioneering and very simple synchronization algorithm based on a reconsideration of clocks in distributed systems. It is widely used in current synchronized distributed systems. This early algorithm also has limitations: 1. Communicating with all other nodes generates a lot of traffic. 2. It cannot handle node failures or network disconnections.

Summary:
The paper talks about ordering of the events in a distributed system. The author shows how a partial ordering can be derived from the flow of events between processes and how a total ordering can be obtained with logical clocks. This total ordering, however, can exhibit anomalous behavior, which can be rectified with physical clocks, together with results that determine the bounds and parameters for these clocks.

Problem:
Ordering of events is essential in a distributed system, especially in cases like distributing resources fairly. Existing methods (at the time) fail to achieve a total ordering of events. Total ordering is difficult because the nodes are distributed and have no common clock to keep their timing synchronized. Hence alternative techniques need to be developed to compensate for errors in timing synchronization and achieve a total ordering.

Contributions:
• Partial ordering can be attained by comparing the sequence of events in a set of processes in a system.
• Logical clocks are used to define the ordering between the events and they are basically simple incrementing counters.
• The clock condition requires that if A occurs before B, then the clock value of A is less than the clock value of B. Two rules are followed to achieve this: first, a process increments its counter between two consecutive events; second, when a message is passed between processes, the receiver sets its clock to a value greater than the sender's timestamp and greater than or equal to its own current value.
• A total ordering of the system's events can be achieved by ordering all events by their clock values (clocks that obey the clock condition), with ties broken by an arbitrary ordering of processes. The author illustrates this with a resource-sharing example in which all processes exchange timestamped messages to decide who gets the resource.
• The above method for total ordering can behave anomalously when events are related through channels outside the system: for example, a request issued after another request in real time, prompted by a phone call, may still receive a smaller timestamp. This can be countered by using physical clocks that provide almost accurate timestamps.
• Two conditions need to be satisfied by the physical clocks to fit into the total ordering system: first, each clock must run at almost the correct rate, that is, with a small bounded rate error; second, no two clocks may differ by more than a small constant.
• The theorem derived in the paper bounds how far the physical clocks can drift apart, given their rate error and the message delays; the paper's method uses the known minimum message delay to adjust each receiver's clock (only ever setting it forward) so that the clocks stay synchronized within that margin.
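A toy event-driven simulation in Python can show the effect of the paper's receive rule IR2'. It assumes a directed ring of four processes (diameter d = 3) with hypothetical rate errors and parameters (kappa = 0.01, tau = 1 s, mu = 0.01 s, xi = 0.05 s), all chosen for illustration rather than taken from the paper. Without synchronization the spread between clocks grows linearly with time; with IR2' it should stay bounded, on the order of the paper's estimate d(2κτ + ξ), about 0.21 s for these numbers.

```python
import heapq
import random

DRIFTS = (-0.9, -0.3, 0.3, 0.9)  # hypothetical rate errors, as fractions of kappa


def max_skew(sync, kappa=0.01, tau=1.0, mu=0.01, xi=0.05, horizon=100.0, seed=1):
    """Largest spread (in seconds) between any two clocks observed during the run."""
    rng = random.Random(seed)
    n = len(DRIFTS)
    rates = [1 + kappa * d for d in DRIFTS]  # PC1: |dC_i/dt - 1| < kappa
    offsets = [0.0] * n                      # C_i(t) = offsets[i] + rates[i] * t

    def clock(i, t):
        return offsets[i] + rates[i] * t

    # Event heap of (time, kind, process, timestamp); kind 0 = send, 1 = receive.
    events = []
    for k in range(1, int(horizon / tau) + 1):
        for i in range(n):
            heapq.heappush(events, (k * tau, 0, i, 0.0))

    worst = 0.0
    while events:
        t, kind, i, ts = heapq.heappop(events)
        if kind == 0:
            # Every tau seconds P_i sends T_m = C_i(t) to the next process in a directed ring;
            # the delay is the known minimum mu plus an unpredictable part of at most xi.
            delay = mu + rng.uniform(0, xi)
            heapq.heappush(events, (t + delay, 1, (i + 1) % n, clock(i, t)))
        elif sync:
            # IR2': the receiver P_i sets its clock to max(C_i, T_m + mu), never backwards.
            offsets[i] = max(clock(i, t), ts + mu) - rates[i] * t
        values = [clock(j, t) for j in range(n)]
        worst = max(worst, max(values) - min(values))
    return worst


print(f"unsynchronized: {max_skew(sync=False):.3f} s, synchronized: {max_skew(sync=True):.3f} s")
```

Because IR2' only ever moves a clock forward, the slow clocks are pulled up toward the fastest one rather than meeting in the middle.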

Applicability:
At the time of the paper, the author mainly developed these synchronization ideas for cases such as resource allocation. Today these ideas apply to event synchronization more broadly; in stock trading and other critical financial applications, for example, the ordering of events plays an important role, and Lamport's work remains a valuable contribution to today's systems.

Summary: In this paper, the author discusses how to
define a total order among events in distributed systems.
The solution is to implement clocks, both logical and
physical, in distributed systems. These two kinds of clocks
have different requirements, and therefore,
their implementations differ. Essentially, logical clocks
have weaker requirements, and physical clocks can avoid
anomalies that are possible with logical clocks.

Problem: Although the concept of a clock is natural,
how to implement it is not obvious. The problem
is how to synchronize different computation
nodes to achieve different levels of global
consistency. The paper addresses the consistency
requirements, their impact on anomalies, and
their corresponding implementations.

Contributions: First, the author defines the
"happens before" relation, which is a partial
order of events, and uses machine IDs to break ties
to form a total order. This can be achieved with a
logical clock, which relies on two requirements:
(1) on a single machine, it defines a valid
total order on events; (2) across machines, it
orders the sending of a message before its
receipt. Therefore, each machine
maintains a counter, each message carries the
sender's clock value, and the receiver advances
its counter past the sender's timestamp.

Second, the author shows that this logical
clock can produce anomalies. Therefore, a
physical clock is defined. The requirement is
that each machine has a clock that runs at nearly
the correct rate and stays within \epsilon of every
other clock. A synchronization algorithm is
then defined to keep the clocks synchronized.
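The two requirements and the receive rule can be written compactly as below. This is a summary in the paper's notation as we understand it: κ bounds each clock's rate error, ε bounds the skew between any two clocks, T_m is a message's timestamp, μ_m is the minimum delay of message m known to its receiver, and μ is a lower bound on the transmission time of any message, including messages sent outside the system.

```latex
\begin{aligned}
&\textbf{PC1 (rate):}      && \left|\frac{dC_i(t)}{dt} - 1\right| < \kappa \quad \text{for all } i\\
&\textbf{PC2 (skew):}      && \left|C_i(t) - C_j(t)\right| < \epsilon \quad \text{for all } i, j\\
&\textbf{IR2' (receive):}  && C_j(t') := \max\bigl(C_j(t' - 0),\; T_m + \mu_m\bigr)\\
&\textbf{No anomalies if:} && \frac{\epsilon}{1 - \kappa} \le \mu
\end{aligned}
```

The last line says the clocks must agree more closely than the fastest possible message can travel, so that an event caused by a message can never be timestamped earlier than the event that sent it.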

Applicability: The approach applies generally
to distributed systems in which all processes
can talk to each other. When processes can
only reach part of the system (a hierarchy of
message ranges), additional broadcasting
algorithms are needed.

Summary:
This paper describes an algorithm that orders the occurrence of events in a distributed system. The algorithm leverages a defined notion of logical time to place this ordering on events happening (at perhaps different times physically) within the distributed system whether this system be a multi-processed single machine or a set of spatially distributed machines.

Problems:
-When there is no central authority in a distributed system, there needs to be a way to order events within the system to avoid synchronization problems. Such an algorithm is necessary to handle problems related to concurrency within the system, physical clock skew, and request/response delays.

Contributions:
-The notion of partial ordering which places a concrete definition around the notion of “event a happened before event b” without regards to physical time.
-The idea of a logical clock, which abstracts away the notion of a physical clock. This allows a distributed system to tolerate clock skew between machines with separate physical clocks, which will inevitably drift apart.
-The clock condition, which states that for any events a, b: if a -> b then C(a) < C(b).
-The natural extension to the strong clock condition, which builds this system of logical clocks with the use of physical clocks.
-Proof of how much clock skew a given distributed system can tolerate.

Applications to Real Systems:
The need to provide a definition/formalization of “a happened before b” in a distributed system is extremely important. This paper presented a creative solution to the problem back in 1978. The importance of this is not lost as handling concurrency and clock skew in distributed systems is still an open area of research.

Summary:
This paper discusses the ordering problem in distributed systems and explains several key concepts about clocks and different levels of ordering. With the concept of one event happening before another, a partial ordering can be defined. With the concept of a logical clock and the algorithm to synchronize it, a total ordering can be achieved. In a system with side channels (where some ordering information travels outside the system), physical clocks are necessary. Since practical clocks cannot be exactly synchronized, it is crucial to know, in theory, the required synchronization accuracy.

Problem:
Unlike a system on one machine, which can have a single clock source for all components, a distributed system is composed of nodes with different clocks. Although it is now practical to achieve very accurate synchronization between these clocks, using high-precision clocks that run independently or dedicated equipment that aligns the clocks to a single source (e.g., CDMA base stations are synchronized very accurately by GPS clocks), ordinary distributed systems prefer economical solutions. For a general distributed system, a highly accurate clock is not the goal; making the system work correctly is. To make the system's logic correct, we need to get the ordering of events/operations right. Unfortunately, the clock sources in a distributed system cannot run at exactly the same rate, and it is not easy to synchronize them very accurately. And as the paper's analysis shows, very high accuracy is not required for some applications; acceptable synchronization is enough. So the key problems this paper tries to solve are what the requirements on clocks in a distributed system are, and what accuracy they need.

Contributions:
The contributions of this paper are the definition of several key concepts, and exhaustive discussion about the requirement on clocks and ordering in distributed system for different application scenarios. The analysis about synchronizing physical clocks and the bound of inaccuracy are crucial for the practical deployment of real distributed system with total event ordering requirements. The concepts related to simplified scenarios, like partial ordering and logical clocks are also discussed.

Discussion:
This paper only covers fully distributed algorithms for defining an ordering and synchronizing clocks. They can guarantee that the system works correctly, which I guess is the major goal of this paper. I think the paper could be better if it covered some centralized algorithms. It would also be interesting to discuss calendar clocks in distributed systems. The system discussed in this paper depends on message exchange to synchronize clocks, and the clock rate of each node is not adjusted in the process. I would suggest including a mechanism to adjust the clock rate of each node, which could reduce the message-exchange overhead. However, we would need to set up a reference clock server to make the system converge in that case.

Summary:

In this paper, Leslie Lamport presents a method for totally ordering events in a distributed system. We are presented with the concept of partial ordering of events in a distributed environment, and a distributed algorithm is provided for synchronizing a system of logical clocks that can be used to totally order events. The algorithm is extended to synchronize physical clocks, with a theoretical proof that establishes a bound on how far out of synchrony the clocks can become.

Problem:

In a distributed system, a collection of distinct processes are spatially separated and communicate with each other by exchanging messages whose transmission delay is non-negligible. Hence knowing the ordering of different events within the system is very important yet difficult, since different entities have different relative frames of reference for "time." Therefore the notion of physical time alone cannot be relied upon to order events in the entire distributed system.

Contributions:

  • I like the fact that the author formalized the notion of a distributed system as a system in which message transmission delay is not negligible.
  • Defining the "happens before" relation to partially order events, and then totally ordering the events using logical clock timestamps, with process identifiers to break ties.
  • The introduction of a relativistic concept of time, instead of a global absolute time, by using monotonically increasing, unsynchronized counters in different processes.
  • Presented an algorithm for resource allocation that respects the total ordering of events.
  • A mathematical proof to derive the bound on how far out of synchrony clocks can become.

Applicability:

Leslie Lamport was awarded the Turing Award for work that laid the foundation for modern distributed systems, including the concept of logical clocks, so this paper is very much relevant today. The basic concepts introduced in this paper may have scalability issues, since the algorithm broadcasts messages to all other nodes, yet logical clocks are used in modern systems like Dynamo (as vector clocks).

Summary: The paper delves into the issue of time and the ordering of events in a distributed system. It defines logical clocks that can be used to determine a total ordering of events. It gives an algorithm that uses this concept to ensure mutually exclusive access to a resource in a distributed manner. Finally, it introduces physical clocks to handle events that are related out of band, which logical clocks cannot deal with.

Problem: Timing is an issue in distributed systems. In a distributed system, you have separate processes running in coordination across multiple machines (or on multiple cores). The coordination is achieved by passing messages. The issue stems from the fact that the time that it takes to send a message is not negligible. Given this fact, the issue becomes how one can impose an ordering on the system where processes and events across multiple machines can have a strong guarantee that if event 'a' occurred before event 'b', that this ordering state is preserved and respected across disparate machines.

Contribution: The authors delve into the issue of concretely defining what a “happens before” relationship actually means in a distributed system. One interesting fact is that, in order to achieve a partial ordering of events, one need not have a physical clock. The authors introduce the notion of “logical time”, where time is basically defined as a number: earlier events have lower numbers than later events. With this, the authors define what it means for one event to be ordered with respect to another. It is a very intuitive definition, but it leads to the observation that timestamps need to be attached to messages in order to preserve the “happens before” relationship. A total ordering extends the partial ordering by breaking ties, giving arbitrary priority to one process over another. Using this total ordering, the authors develop a distributed algorithm for achieving synchronized exclusive access to a shared resource. They note that the intuitive solution of having a central scheduling process is not sufficient: a process P1 can send a request to the scheduler and then send a message to another process P2. When P2 gets P1's message, it sends its own request to the scheduler. In this scenario, P2 may get access to the resource first if its request arrives at the scheduler first. In a nutshell, their distributed algorithm has all processes communicate what they are doing with respect to the shared resource to all other processes.
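The scheduler scenario above can be replayed with Lamport clocks in a few lines of Python. This sketch assumes a hypothetical scheduler that simply grants requests in arrival order, and the function name `scheduler_scenario` is illustrative. It shows that arrival order disagrees with the logical-clock order. Sorting by timestamp alone is not a complete fix, since a real scheduler cannot know whether an earlier-stamped request is still in flight, which is why Lamport's algorithm waits to hear from every process.

```python
def scheduler_scenario():
    """Lamport clock values for the P1/P2 scenario, and two possible grant orders."""
    c1 = c2 = 0
    c1 += 1
    req1 = c1                  # P1 sends its request to the scheduler (timestamp 1)
    c1 += 1
    msg = c1                   # P1 then sends an ordinary message to P2 (timestamp 2)
    c2 = max(c2, msg) + 1      # P2 receives it (clock 3)
    c2 += 1
    req2 = c2                  # P2 sends its own request (timestamp 4)

    arrivals = [("P2", req2), ("P1", req1)]  # the network delivers P2's request first
    fifo = [p for p, _ in arrivals]
    # Order by timestamp, breaking ties by process name (a stand-in for the
    # paper's arbitrary ordering of processes).
    by_timestamp = [p for p, _ in sorted(arrivals, key=lambda r: (r[1], r[0]))]
    return fifo, by_timestamp


print(scheduler_scenario())  # (['P2', 'P1'], ['P1', 'P2'])
```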

Finally, they develop the idea of physical clocks in order to handle events that are related “out of band”, which logical clocks could order inconsistently with what users observe, and they prove how these clocks can be synchronized across machines.

Applicability: While timing in distributed systems is a “timeless” issue, the mechanisms described in this paper do not seem directly applicable today. At the time, I bet their algorithm worked well for the scales they were dealing with. Of course, in the modern era, with thousands of machines, all-to-all communication is not desirable. Furthermore, they make an assumption that one cannot make in a general distributed system, namely that every message is eventually received. As we have been learning with CAP, one can never make this assumption.

The aspects that are applicable are the definitions of ordering that the authors put forth and the in-depth definition of the “happens before” relationship. Their proof for the synchronization of physical clocks may also be applicable.

Time, Clocks, and the ordering of events in a distributed system

Summary:
Leslie Lamport introduces the concept of partial ordering of events in a distributed system, using the "happened before" relation to define the partial ordering. A distributed algorithm for a consistent total ordering of all events is also discussed, as are the anomalous behavior caused by messages external to the system and the way to handle it using physical clocks.

Problem:
The problem the author tries to solve is to provide an ordering of events within a distributed system. Since it is practically impossible to keep the physical clocks of all the computers in a distributed system in sync, it is very difficult to order events based on them. Also, the system does not have a master or coordinator that other machines could use to sync their time. Therefore the solution the author proposes is to use the concept of "logical time" instead of "physical time" to come up with an ordering of events within the system.

Contributions:
1. The introduction of the "happened before" relation is a major contribution since it was instrumental in coming up with the partial ordering of events.
2. The idea of logical clocks, where the time at which an event takes place is defined by a number assigned to the event, was a drastic change from the real clocks that had been used to order events until then.
3. The total ordering of events, which uses an arbitrary total ordering of processes to break ties when two events have the same clock value, and its use to solve a mutual exclusion problem, was important work.
4. The distributed algorithm that runs in each process/system separately and which is able to allocate the resource to the processes based on the total ordering of events was an amazing contribution.
5. Finding out that there are chances for anomalous behavior when some messages are passed external to the system was crucial.
6. The idea of using properly synchronized physical clocks for solving the problems of anomalous behavior and proving a theorem to show how closely the clocks can be synchronized.
7. Making use of ideas from the field of physics, especially relativity, to come up with the idea of logical clocks and partial ordering of events in a distributed system should be appreciated.

Relevance:
The idea of logical clocks and ordering of events is fundamental to developing a distributed system that is correct and reliable. Lamport timestamps are the basis of vector clocks, which are widely used in many recent distributed key-value stores and other distributed systems. The value of Lamport's contribution is also evident from his Turing Award, which recognized him in part for imposing well-defined order on the chaotic behavior of distributed systems.

Summary:
This paper discusses the partial ordering defined by the “happened before” relation and gives a distributed algorithm for extending it to a somewhat arbitrary total ordering, along with its application to solving synchronization problems.

Problem:
1. In a distributed system, it is sometimes hard to determine whether one event happened before another. Even when a partial ordering can be defined, the real problem is how to define a total ordering.
2. Using real clocks to define the ordering is straightforward, but real clocks are not perfectly accurate or precise.
3. Sometimes, a message is sent earlier but arrives later.

Contributions:
1. The biggest contribution is that the author identifies the partial ordering of events in the distributed system and says that this kind of partial ordering can be extended to a total ordering.
2. This paper implements the partial ordering with logical clocks instead of real clocks, since real clocks might be inaccurate. The logical clock values are assigned so that they agree with the “happened-before” relation: if one event happened before another, it gets a smaller clock value.
3. Extends this partial ordering to a somewhat arbitrary total ordering. With this total ordering, the author solves one version of the mutual exclusion problem: resource granting. A concrete strategy is also proposed, which is a distributed algorithm with no central synchronizing process or central storage.
4. To deal with the anomalous behavior in which an earlier request might be ordered later, synchronized local physical clocks are adopted.

Applicability:
The time and clock problem is really important to distributed systems, especially when designing a consistency model. Bypassing physical clocks is a good idea that avoids the inaccuracy problem of clocks. The vector clock used in Amazon’s Dynamo, I think, can be counted as a variant implementation of logical clocks.

Summary:
The paper presents a distributed way to totally order the events in a distributed system using logical clocks. It also discusses anomalous behavior perceived by the user (because external events can be reordered in a distributed system) and defines a way to solve it.

Description:
In a distributed system, synchronization is necessary for proper functioning. It is crucial that different processes agree on the same ordering of events. For example, to share a single resource without a centralized scheduler, the processes must agree on the same order of requests for the shared resource. The paper describes how the 'happened before' relation (which provides a partial ordering) can be extended to a total ordering of events. The relation is defined by two conditions, closed under transitivity.
First, if events a and b are in the same process and a occurs before b, then a->b. Second, if event a is the sending of a message by process P1 and event b is the receipt of that message by process P2, then a->b. These conditions can be implemented using logical clocks and timestamps. However, even with a total ordering that satisfies the ordinary clock condition, the user can perceive anomalous behavior, which can be fixed by using physical clocks.
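Together with transitivity, which the paper also includes in its definition, the two conditions amount to reachability in a graph whose edges are program order and messages. Here is a small Python sketch based on that reading. The function name and the event labels are hypothetical:

```python
from collections import defaultdict


def happened_before(process_events, messages):
    """Return hb(a, b), which is True iff a -> b.

    process_events: dict mapping a process name to its events in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    succ = defaultdict(set)
    # Condition 1: within one process, each event precedes the next one.
    for events in process_events.values():
        for earlier, later in zip(events, events[1:]):
            succ[earlier].add(later)
    # Condition 2: sending a message precedes receiving it.
    for send, recv in messages:
        succ[send].add(recv)

    def hb(a, b):
        # Transitivity: a -> b iff b is reachable from a along these edges.
        stack, seen = list(succ[a]), set()
        while stack:
            e = stack.pop()
            if e == b:
                return True
            if e not in seen:
                seen.add(e)
                stack.extend(succ[e])
        return False

    return hb


procs = {"P1": ["p1", "p2", "p3"], "P2": ["q1", "q2"]}
msgs = [("p1", "q2")]  # P1 sends at p1, P2 receives at q2
hb = happened_before(procs, msgs)
print(hb("p1", "q2"), hb("q1", "p3"), hb("p3", "q1"))  # True False False
```

Here q1 and p3 are concurrent, because neither happened before the other.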

Contributions:
1. The paper very nicely describes the problem of ordering events in a distributed system. The "happened before" relation defines a partial ordering of events. By implementing the clock conditions with logical clocks and timestamps, we can define a total ordering over all the events in a distributed system. To break ties (for concurrent events), a predefined rule is used (e.g., order a before b if a occurs in Pi, b occurs in Pj, and i < j).
2. It provides a nice example of the use of total ordering, in which a single resource is shared by many processes. However, one drawback of this algorithm is that many messages must be exchanged between the processes so that every process knows about the ordering of requests.
3. To solve anomalous behavior, the author describes the use of physical clocks. Physical clocks are continuous rather than discrete (like logical clocks). Different clocks in different processes can run at different rates and thus drift apart from one another. Assuming certain parameters (such as the minimum delay in transmitting a message), the paper derives a bound on clock skew.
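For reference, the physical-clock conditions behind point 3 can be written out. They are summarized here in the paper's notation, not derived. κ bounds each clock's drift rate, ε bounds the skew between any two clocks, and μ is less than the shortest transmission time between processes. d is the diameter of the process graph, τ is the interval at which every link carries a synchronization message, and ξ bounds the unpredictable part of a message's delay. The last line is approximate and assumes μ + ξ ≪ τ.

```latex
\begin{aligned}
&\text{PC1 (rate):} && \left|\frac{dC_i(t)}{dt} - 1\right| < \kappa, \qquad \kappa \ll 1\\
&\text{PC2 (skew):} && \left|C_i(t) - C_j(t)\right| < \epsilon \quad \text{for all } i, j\\
&\text{No anomalies if:} && \frac{\epsilon}{1-\kappa} \le \mu\\
&\text{Achievable skew:} && \epsilon \approx d\,(2\kappa\tau + \xi)
\end{aligned}
```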

Applicability:
The concepts described in the paper are very much applicable. It provides a good analysis of using physical clocks for total ordering of events. Being able to totally order events can be very helpful in implementing a distributed system. For example, Amazon Dynamo uses vector clocks, which resemble logical clocks, to capture causality between versions of a single object and to detect concurrent writes to it. However, the paper does not discuss failures, which can occur in a distributed system. It would be interesting to know how this scheme could be modified to work with node failures, network partitions, and networks where nodes are added and removed frequently.

The paper discusses how a total ordering of events in a distributed system can be achieved by synchronizing logical clocks. In addition, to avoid any difference between the ordering obtained by the algorithm and the one perceived by a user, they also introduce the concept of using physical clocks for synchronization.

Contributions :

1. The ordering of the events is defined by a “happened-before” relation, denoted by “->”, which is a partial ordering. To extend this to a total ordering, they introduced logical clocks.
2. Logical clocks use simple counters. They introduce clock conditions which are to be satisfied to obtain a logical ordering of events.
3. A logical clock achieves ordering by incrementing a counter before each event in the process. On sending a message to another process, the process includes its timestamp. On receiving a message, the receiving process sets its clock to a value greater than the maximum of the received timestamp and its own value.
4. They give us a good example of the use of logical clocks in synchronizing the requests of processes for the problem of mutual exclusion. The interesting thing here is that no master process synchronizes the multiple requests. This is a good point, since there is no master that could become a single point of failure for synchronization.
5. The algorithm is based on the idea that each process maintains its own request queue and follows a couple of rules for the transfer of messages for requesting and releasing a resource in order to achieve the total ordering of the events.
6. Anomalous behavior can occur because logical clocks cannot account for information exchanged by users outside the system. They introduce physical clocks and the strong clock condition to solve this.
7. For physical clocks to work properly, they place conditions that the clocks should run at the correct rate and should also be synchronized. They also use formal proofs to prove that these conditions could be satisfied.
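The rules in point 3 fit in a few lines. This is a minimal Python sketch of one process's logical clock. The `LamportClock` class and its method names are hypothetical, and the sketch assumes that a send counts as an event and that the receive rule sets the clock one past the larger of the two values:

```python
class LamportClock:
    """Logical clock for one process, following the rules described above."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Increment before each local event; the new value timestamps the event.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the message carries its timestamp T_m.
        return self.tick()

    def receive(self, t_m: int) -> int:
        # Move past both the local clock and the message timestamp,
        # so the receive event is later than the send event.
        self.time = max(self.time, t_m) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
p1.tick()               # local event on P1 -> 1
t_m = p1.send()         # P1 sends a message stamped 2
p2.tick()               # local event on P2 -> 1
print(p2.receive(t_m))  # 3
```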

Discussion :
1. The downside of these approaches is that they require all the processes to take part, and every process needs to send a message to every other process to synchronize a particular event.
2. It is also very difficult to handle process failure with logical clocks.

Relevance : The usage of logical clocks is very prevalent in distributed systems to capture the ordering of events. Vector clocks, which extend this concept, are also widely used, and we have seen their usage in Dynamo.

In a distributed system, it is necessary to synchronize clocks between different machines, since that helps in finding out which event occurred first and in serializing access to shared objects. In this paper, Lamport tries to solve this problem by introducing a partial ordering between events, which he later extends to provide a consistent total ordering. He also proposes how the anomalous behavior can be solved by using physical clocks.

Contributions:
1. Proposes a new relation called "happened before" (a->b), which captures that event 'a' happens before event 'b'.
2. The partial ordering of the events is implemented using "logical clocks", which are simply a counter for each process that is updated when an event happens in the process. This implies that if a->b, then C(a) < C(b).
3. The counter also gets updated when the process sends or receives a message: Ci(a) < Cj(b) (here a is process i sending a message and b is process j receiving it).
4. This captures the partial order of events when the processes communicate, but when there is no communication between them, it cannot tell for sure whether the events happened concurrently or in some order.
5. To illustrate total ordering, the paper uses a resource sharing problem in which every participating process acknowledges each resource request. This way, the requester can be sure it has heard from every process at a timestamp later than its own request, so no earlier request is still unseen.
6. But if one request is related to another only through communication external to the system, the system cannot determine which request happened first. To solve this anomaly, the paper suggests physical clocks and an upper bound, considering transmission time and clock skew, on the delay that can be tolerated.
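Point 4 is worth seeing concretely. The clock condition only goes one way, so C(a) < C(b) does not imply a->b. Here is a Python sketch with two processes that never communicate. The helper name is hypothetical:

```python
def run_local_events(n: int) -> list[int]:
    """Lamport timestamps a process assigns to n local events when it never communicates."""
    clock, stamps = 0, []
    for _ in range(n):
        clock += 1            # tick before each event
        stamps.append(clock)
    return stamps


# P and Q never exchange a message, so no event of Q happened before any event of P.
p_stamps = run_local_events(3)   # a1, a2, a3 -> [1, 2, 3]
q_stamps = run_local_events(1)   # b1         -> [1]
c_b1, c_a3 = q_stamps[0], p_stamps[2]
print(c_b1 < c_a3)               # True, although b1 and a3 are concurrent
```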

Drawbacks:
1. The distributed algorithm doesn't handle failures, and the paper doesn't say what happens if a message is lost or some process doesn't send an ack.

But overall, the concept proposed by Lamport makes a complex problem like clock synchronization look very simple.

Summary: This paper gives an algorithm to totally order all the events in a distributed system, which is required for some distributed applications.

Problem: In a distributed system, each node has its own clock, and thus the order inside a single node is well-defined. But certain applications may need a total ordering of all events in order to work correctly. This is not easily achievable because we need to take clock skew, communication delay, and concurrent events into account.

Contribution:
1. Established the concept of a total ordering of events in a distributed system. It is important to realize that a simple global physical clock approach is not sufficient and can cause many hidden problems.

2. Gave a formal and practical definition of the partial order of events. The definition requires that the order of events on the same machine be preserved, and that whenever there is a message from A to B, the event of A sending the message precede the event of B receiving it.

3. Gave an algorithm that can compute a total order consistent with the partial order. In this algorithm, each machine maintains a local counter. When a local event occurs, the event is associated with the current count as its timestamp and the counter is increased. When a machine is going to send a message, it encloses the current counter in the message as the sending timestamp. When the other server receives the message, it must increase its local counter to a value greater than the sending timestamp. The global total order of all events is then determined by the natural order of their timestamps, with ties broken by a fixed ordering of the machines.

4. Showed that the algorithm can be applied to practical problems. This paper introduced the problem of many machines competing for a single resource, where the resource should be given to the first machine that requested it. They used the previously defined algorithm to solve this problem. In contrast to a centralized algorithm, the proposed distributed algorithm works independently of physical clocks (and hence can tolerate clock skew) and of communication delays.

5. Addressed the anomalous behavior caused by messages exchanged outside the system. Actually, I don't quite understand this part.

Applicability: This algorithm works correctly. In the example application they introduce, however, a potential issue is that each node needs to broadcast its messages to all other servers, which could flood the network.
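The traffic concern can be quantified. This assumes N processes and assumes that every acknowledgment is actually sent (the paper lets a process skip one if it has already sent the requester a later-timestamped message). Under those assumptions, each access to the resource costs at most:

```latex
M(N) = \underbrace{(N-1)}_{\text{requests}} + \underbrace{(N-1)}_{\text{acknowledgments}} + \underbrace{(N-1)}_{\text{releases}} = 3(N-1),
\qquad M(10) = 27, \quad M(100) = 297
```

The cost therefore grows linearly with N for every single access, and every process has to take part in each one.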

Summary: The paper describes an algorithm for arbitrarily totally ordering events, which can solve synchronization problems for scheduling within distributed systems (or with multiple threads running on a single system). It also derives upper bounds on allowable clock drift when physical clocks are used. Instead of trying to determine exactly when something happened (or what might happen), events are ordered by the logical timestamps attached to them, so resource requests are granted in timestamp order (which adds a degree of 'fairness').

Problem: Events can happen in an unpredictable order, and in a distributed system it can at times be impossible to determine what happened first. Also, even when physical clocks are used they are often inaccurate and time is not always precise.

Contributions:
1-partial ordering: let's rethink what 'time' means. In a system we don't always need to know the exact physical time something happened in order to schedule or share resources. A system that instead just creates a relative partial ordering can provide the mechanism we require for scheduling, allowing for a system that isn't dependent on a traditional physical clock.

2-logical clocks: they don't have a traditional timing mechanism and aren't meant to be tied to physical time. Instead, the clock uses a counter that always increases ('ticks') with new events, and as each event occurs, a number is assigned to the event. This number gives us a relativistic sense of when the event occurred. Correctness is based on the clock condition, which tells us that if a happened before b, then a's number is less than b's. The clock condition also requires that if event a is the sending of a message by process P_i and event b is its receipt by process P_j, then b gets a higher number: C_i(a) is less than C_j(b).

3-ordering events totally: this can be done by breaking ties and ordering events arbitrarily.

4-resource sharing algorithm for total ordering: three conditions must be met to share the resource: 1-a process that has been granted the resource must release it before it can be granted to another process, 2-different requests must be granted in the order in which they are made, 3-if every process that is granted the resource eventually releases it, then every request is eventually granted. For simplicity, it is assumed that messages between any two processes are received in the order they are sent, that all messages are eventually received, and that any process can send a message directly to any other process. A request message is broadcast and placed in every process's request queue, and when the resource is released, it is granted to the earliest request in timestamp order. The strong clock condition helps resolve anomalous cases where an event that causally follows another (through communication outside the system) nevertheless receives a lower timestamp.

5- Clock synchronizing algorithm (for physical clocks): dC_i(t)/dt represents the continuous rate at which a clock runs. No two clocks run at exactly the same rate, so they slowly drift apart. Clocks may only be set forward, never back, to avoid anomalous behavior.
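Point 5's forward-only rule corresponds to the paper's receive rule for physical clocks (IR2'). On receiving a message with timestamp T_m, a process sets its clock to at least T_m + μ_m, where μ_m ≥ 0 is a minimum delay for that message that the receiver knows. The Python sketch below exaggerates the drift so the arithmetic is readable. The class and parameter names are hypothetical:

```python
class PhysicalClock:
    """Sketch of one process's physical clock in the paper's model (names are hypothetical).

    rate models drift: 1.0 is a perfect clock; PC1 requires |rate - 1| < kappa.
    """

    def __init__(self, start: float, rate: float) -> None:
        self.value = start
        self.rate = rate

    def advance(self, real_seconds: float) -> None:
        # The clock runs continuously, a little too fast or too slow.
        self.value += self.rate * real_seconds

    def on_receive(self, t_m: float, mu_m: float) -> None:
        # IR2': jump to at least T_m + mu_m (mu_m = assumed minimum delay
        # of the message); max() means the clock is never set back.
        self.value = max(self.value, t_m + mu_m)


a = PhysicalClock(start=0.0, rate=1.25)   # drift exaggerated so the numbers are easy to follow
b = PhysicalClock(start=0.0, rate=0.75)
a.advance(8.0)
b.advance(8.0)                            # a reads 10.0, b reads 6.0
t_from_b = b.value                        # b sends a message stamped 6.0
b.on_receive(t_m=a.value, mu_m=1.0)       # b gets a's message stamped 10.0 and jumps to 11.0
a.on_receive(t_m=t_from_b, mu_m=1.0)      # 6.0 + 1.0 < 10.0, so a stays at 10.0
print(a.value, b.value)                   # 10.0 11.0
```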


Applicability:
There was one drawback: we can't tell the difference between failed events and others that are just waiting too long to respond. However, the author states the full solution to the problem was beyond the scope of the paper.

Problem and Summary

This paper discusses how one can order events in a distributed system using the "happened-before" relation. This relation can only be a partial order, since one can only order events that either occur in a single process or are related by sending and receiving a message. The events can be totally ordered by arbitrarily breaking the ties between concurrent events. The paper also shows that one can build a simple synchronization solution using the total ordering produced by the logical clocks. It also shows how anomalous behavior can happen when there are messages external to the system and suggests using physical clocks to solve this problem. Finally, it derives results on physical clock drift and synchronization.

Contributions

1. Introduced the notion of "happened-before" ordering and brought the well-established ideas of mathematical relations and set theory to the modelling of events in a distributed system.
2. Presented an algorithm to implement logical clocks based on the two "Clock conditions". Briefly, every process must tick its clock to transition from one internal state to another. Also, when a process receives a message, it sets its clock to the maximum of its local logical time and one greater than the timestamp in the message.
3. Proposed a technique to totally order events in a system (arbitrary tie breaking for concurrent events).
4. Showed that one can build a simple synchronization solution using the total ordering produced.
5. Showed how anomalous behavior can still be a problem when there are external messages. To solve this, the paper introduces synchronization using physical clocks. It also derives bounds on how well the clocks can be synchronized.

Relevance

I believe logical clocks were one of the major contributions of Lamport among other things. Today, logical clocks are widely used in systems to order events. Vector clocks which are used in systems like Dynamo look like an extension to the base logical clock concept.

Summary:
The paper describes an algorithm for ordering events in a distributed system. The author discusses the partial ordering of events, extends it to a total ordering, and applies it to synchronization problems. The author uses synchronized local physical clocks to prevent anomalous behavior in the system.

Problem:
The main problem at hand is how to keep track of the order of events in a distributed system. Different machines in a distributed system may have different clock times. Hence, it is difficult to keep track of the order of events.
The paper introduces the concept of "happened before" relationship and argues that in most cases we don't need real clocks.

Contribution:
1. Partial Ordering by logical clocks:
a) In a process, a clock is incremented between consecutive events.
b) A process sends a message containing timestamp T_m, the receiving process sets its clock >= max (present_timestamp, T_m+1).
2. Total Ordering to break ties between events of different processes that appear concurrent.
3. Use of physical clocks to correct the anomalous behaviour due to external events affecting the order of events in the system.
4. Synchronizing physical clocks, and an upper bound on the delay that can be tolerated due to clock skew or transmission delay.

Discussion/Applicability:
- The paper does not take into consideration any process or connection failure. It would be interesting to know how failures could be handled.
- Synchronization is a well-known problem in many fields of computer science. I feel the concept of logical clocks must have led to the inception of many more ideas targeting the problem.
- Vector clocks are an extension of logical clocks and have been used extensively in systems like Dynamo.

Summary:
In this paper, the author develops an approach based on logical clocks and timestamps for achieving a total ordering of events in a distributed system. The author analyzes the ordering of events in the context of the "happens-before" relation without using physical clocks. He discusses the possible problems with partial ordering and provides an algorithm to extend it to a consistent total ordering.

Problem:
In a distributed system, it is often required that all the participant nodes see the same ordering of inputs. Each node has a physical clock, but it is not synchronized to any other clock in the system. Given this, the problem is to decide the ordering of events across the distributed system, i.e., which event precedes another.

Contributions:
-First, the paper conveys an important idea which is useful in understanding any distributed system, that is, in a distributed system, the order in which events occur is only a partial ordering.

-The paper shows how to use timestamps to provide a total ordering of events that is consistent with the causal order. The author first introduces the “happened before” relation, which defines a deterministic partial ordering of the events in a distributed system. Then, with the help of a collection of logical clocks that satisfy the Clock Condition, the timestamps extend the partial ordering to a somewhat arbitrary total ordering.

-Third, it explains (through an example on solving a mutual exclusion problem) why knowing the total order of events is useful for us in implementing a distributed system.

Flaw:
-One potential problem with these methods arises when the system becomes especially large. As more computers are connected, the number of closely occurring events increases, and so do the overhead of keeping them all sorted and the communication needed to keep timing consistent.
-Second, the algorithm assumes that every process can receive timestamped messages from all other processes. This assumption does not always hold, since machine failures and network disconnections may occur frequently in the real world.

Applicability:
The ideas in this paper could be used in any system that requires ordering of events without a globally available time. Though the approach in the paper is not directly implementable at the current scale of distributed systems, it has influenced a lot of later work on clock and synchronization problems, which has been used in the current generation of distributed systems.

Summary:
The paper explores happened-before relationships and uses them to define a partial ordering of events based on logical clocks. It then uses the partial ordering to develop a distributed algorithm for arriving at a total order of the events in the system. It then goes on to explore physical clocks with the same foundation and derives bounds on how far out of synchrony the physical clocks of distributed systems can get.
Problem:
Ordering of events occurring in a distributed system is the problem being studied. When you have events being generated by a single process, the ordering of these events is trivial. However, when you have multiple processes generating events at random intervals, arriving at a global ordering for the events generated is non-trivial. Moreover, the physical clocks in different systems will never run at exactly the same rate and so they tend to deviate from each other more and more over time, thus exacerbating the problem.
Contributions:
Lamport starts off by formalizing what it means for a clock to be correct, i.e., the Clock Condition, which states that for any events a, b: if a->b, then C(a) < C(b). Here, ‘C_i(a)’ is the value of the logical clock of process ‘P_i’ at event ‘a’. He then presents two auxiliary conditions (C1 and C2 in the paper) that together imply the clock condition. Logical clocks satisfy these conditions by i) having each process increment its clock between any two successive events and ii) having the receiver of a message set its local clock to a value greater than the message’s timestamp (and no less than its current value). The partial ordering that these mechanisms capture is then extended to a total ordering across the system (=>), using an arbitrary ordering of processes to break ties. Note that the total ordering is not unique.
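Here are the conditions from this paragraph, written out with a in P_i, b in P_j, and ≺ as any fixed total order on processes. C1 and C2 together imply the Clock Condition:

```latex
\begin{aligned}
&\text{Clock Condition:} && a \to b \implies C(a) < C(b)\\
&\text{C1:} && a \text{ comes before } b \text{ in } P_i \implies C_i(a) < C_i(b)\\
&\text{C2:} && a = \text{send}(m) \text{ by } P_i,\ b = \text{receive}(m) \text{ by } P_j \implies C_i(a) < C_j(b)\\
&\text{Total order:} && a \Rightarrow b \iff C_i(a) < C_j(b) \;\lor\; \bigl(C_i(a) = C_j(b) \land P_i \prec P_j\bigr)
\end{aligned}
```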
The paper presents an illustration of this distributed algorithm to solve the problem of mutual exclusion when multiple processes are trying to acquire a resource.
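The algorithm's five rules can be simulated in a few dozen lines. This is a minimal, single-threaded Python sketch under the following assumptions: delivery is reliable and in order (modelled as one global FIFO queue), nothing fails, and process ids serve as the arbitrary tie-breaker. `Process`, `deliver_all`, and the message tuple format are hypothetical:

```python
from collections import deque


class Process:
    """One participant in a single-threaded simulation of the request-queue algorithm."""

    def __init__(self, pid: int, n: int, net: deque) -> None:
        self.pid, self.net = pid, net
        self.others = [j for j in range(n) if j != pid]
        self.clock = 0
        self.queue = set()                            # pending requests as (timestamp, pid)
        self.last_seen = {j: 0 for j in self.others}  # latest timestamp received from each peer
        self.request = None                           # this process's own pending request

    def _send_all(self, kind: str) -> int:
        self.clock += 1                               # sending is an event: tick once
        for j in self.others:
            self.net.append((j, self.pid, kind, self.clock))
        return self.clock

    def request_resource(self) -> None:              # rule 1: broadcast and enqueue own request
        self.request = (self._send_all("request"), self.pid)
        self.queue.add(self.request)

    def release_resource(self) -> None:              # rule 3: dequeue own request, broadcast release
        self.queue.discard(self.request)
        self.request = None
        self._send_all("release")

    def on_message(self, src: int, kind: str, ts: int) -> None:
        self.clock = max(self.clock, ts) + 1         # receive rule: move past the sender's timestamp
        self.last_seen[src] = ts                     # FIFO channels, so ts only grows per sender
        if kind == "request":                        # rule 2: enqueue and acknowledge
            self.queue.add((ts, src))
            self.clock += 1
            self.net.append((src, self.pid, "ack", self.clock))
        elif kind == "release":                      # rule 4: drop the sender's request
            self.queue = {r for r in self.queue if r[1] != src}

    def may_enter(self) -> bool:                     # rule 5
        # Own request is first in (timestamp, pid) order, and every peer
        # has sent something timestamped later than that request.
        return (self.request is not None
                and min(self.queue) == self.request
                and all(t > self.request[0] for t in self.last_seen.values()))


def deliver_all(procs: list, net: deque) -> None:
    while net:                                       # one global FIFO queue keeps each channel FIFO
        dst, src, kind, ts = net.popleft()
        procs[dst].on_message(src, kind, ts)


net = deque()
procs = [Process(i, 3, net) for i in range(3)]
procs[2].request_resource()                          # P2 asks first in simulation order...
procs[0].request_resource()                          # ...P0 asks with the same timestamp, 1
deliver_all(procs, net)
first = [p.may_enter() for p in procs]
print(first)   # [True, False, False]
procs[0].release_resource()
deliver_all(procs, net)
second = [p.may_enter() for p in procs]
print(second)  # [False, False, True]
```

P0 wins even though P2's request was issued first in simulation order. Both requests carry timestamp 1, so the tie is broken by process id.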
The paper tackles anomalous behavior caused by external events by defining a stronger clock condition and using physical clocks. Since two physical clocks never run at exactly the same rate, the paper derives bounds on how far out of sync the clocks can get before causing issues for ordering.
Issues:
Not really an issue, because the paper does say that failures are outside its scope, but I was interested in knowing how the system could handle them. Types of failures include processes failing and then coming back to life, or network failures/partitions that may make the clocks more divergent.
Applicability:
The paper presents good insight into how to determine a global ordering of events in a distributed system. The concept of logical clocks, I think, is very useful and widely applied today. Dynamo from Amazon is an example that comes to mind immediately.

Summary: The paper describes algorithms to totally order events that occur in a distributed system, solving a synchronization problem, and provides an upper bound on clock drift.

Problem:
It is difficult to ensure that the physical clocks of a distributed system are perfectly synchronized, and this causes problems when one tries to infer the order of events.

Contributions:

- Formalizes happens-before relationships.
- Provably correct distributed algorithms as opposed to many heuristic approaches.
- Achieves synchronization without using physical clocks.
- Provided a mechanism to control clock drift in physical clocks.
- Applies developed mechanism to solve a distributed synchronization problem in a fair manner to all clients.
- Paper provides a nice analogy to relativity.

Limitations:

- Every process communicates with every other process in the solution to the synchronization problem. Will not scale well for larger systems.
- Synchronization algorithm is fragile and requires all processes to operate correctly. Failure of a single node can make the system stall.
- An unfair comparison, but it doesn't address issues that are addressed by vector clocks.

Applications:
This paper lays the theoretical foundation for modern mechanisms in distributed systems like vector clocks.
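Several reviews point to vector clocks, so here is a minimal Python sketch of the textbook vector clock for comparison. It is not defined in Lamport's paper and is not Dynamo's implementation. The class and function names are hypothetical. Unlike a scalar timestamp, comparing two vectors can show that two events are concurrent:

```python
class VectorClock:
    """Vector clock for process i of n (not part of Lamport's paper)."""

    def __init__(self, i: int, n: int) -> None:
        self.i = i
        self.v = [0] * n

    def tick(self) -> list[int]:
        self.v[self.i] += 1                 # local or send event
        return list(self.v)                 # snapshot used as the event's timestamp

    def receive(self, other: list[int]) -> list[int]:
        # Merge what the sender knew, then count the receive event itself.
        self.v = [max(x, y) for x, y in zip(self.v, other)]
        return self.tick()


def happened_before(u: list[int], w: list[int]) -> bool:
    # u -> w iff u <= w in every component and the vectors differ.
    return all(x <= y for x, y in zip(u, w)) and u != w


p, q = VectorClock(0, 2), VectorClock(1, 2)
a1, a2, a3 = p.tick(), p.tick(), p.tick()   # [1, 0], [2, 0], [3, 0]; a2 is a send
b1 = q.tick()                               # [0, 1]
b2 = q.receive(a2)                          # [2, 2]
print(happened_before(a2, b2))                             # True
print(happened_before(b1, a3), happened_before(a3, b1))    # False False: concurrent
```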

Summary:
This paper defines the partial ordering of events in a distributed system and proposes a distributed algorithm that uses logical clocks to extend the partial ordering to a total ordering of the events.

Problem:
Correctness of any program depends on the order in which its instructions are executed. Ordering of events becomes challenging in distributed systems because clocks are not synchronized with one another and the underlying network latency can cause messages to arrive in a different order. Hence, the end goal is to achieve the same ordering we would get by executing the operations on a single machine.

Contributions:
1. Identified that partial ordering is seen in any distributed system. This was a good find, since many people back in those days were not aware that two systems that are spatially apart could not have their clocks synchronized.
2. Introduced logical clocks into the system. Hence, we need not deal with any inaccuracy that would arise from physical clocks.
3. Came up with a definition of partial ordering for events that belong to the same process and events that belong to spatially separated processes.
4. Was able to use the definitions of partial ordering, to build logical clocks into the system based on counter mechanism.
5. Partial ordering and logical clocks were useful from the perspective of a single process, since for any message it sends, the response would be received at a later time, and all the events local to a process were controlled by the local counter or logical clock.
6. Partial ordering was extended to achieve total ordering of events in the system. Through this, the author also solves a mutual exclusion problem which involves granting a resource to a process from a set of processes requesting the resource.
7. Identified limitations in the total ordering algorithm when the events are not fully observable from within the system. [This might not satisfy the condition that the processes must be guaranteed the resource in the order in which they requested it since now, there is no way to know about the messages which are external to the system].
8. Proposed using physical clocks to solve the problem and derived a formulation that provides an upper bound on how far apart the clocks can be.

Limitations:
1. Breaking ties arbitrarily when Ci(a) = Cj(b) might cause correctness issues. A good example would be if we tried to do this across branches of a source control system like git. When commit timestamps are equal, breaking ties by preferring one branch over the other might cause some updates to be lost.

Applicability/Conclusion.:
Ordering of events is crucial in distributed systems (e.g., serializing updates to a database), and this 1978 paper is a great milestone in laying the groundwork for achieving it. Some systems, like Cassandra (originally developed at Facebook), rely on clocks always being synchronized.

Summary :
The paper formulates an approach to ordering events across multiple systems in a distributed environment. This is referred to as a total ordering of events. The paper discusses an anomaly in this approach and solves it using physical clocks.

Problem :
In a distributed system with decentralized setting, a lot of events might be going on across multiple systems. It is very important to order these by time to achieve correctness and conflict resolution. Using physical clocks may not help because clocks may be skewed across systems. This paper solves this problem by introducing logical clocks.

Contributions :
a. The author takes the common problem of resource allocation and tries to achieve total ordering of requests for a single resource across multiple systems.
b. A completely decentralized mechanism where any process can send out requests to all other processes.
c. The paper defines a partial ordering by using logical clocks maintained by each process. When an event happens within a process, the clock’s tick is moved forward (incremented), and when a message is received from another process, the clock is set to a value greater than the incoming timestamp, thereby reflecting causality.
d. In the setting of partial ordering, some events may appear to happen concurrently and don’t affect each other causally. This makes it difficult to order these events globally. To solve this, the paper introduces a total ordering of events.
e. Communication that happens outside the system is not visible to it. This could result in a wrong ordering of events, and total ordering does not solve this problem. To solve it, the author uses physical clocks.
f. When using physical clocks, the paper defines an upper bound on the delay that can be tolerated by the system which can be due to transmission delay or clock skew.

It would be interesting to know how failures can be handled in such a situation.

Applicability :
Lamport describes a very neat idea in this paper, signs of which we can see in every distributed system these days. In fact, the vector clocks we read about in Dynamo seem to be derived from this idea. Keeping in mind that this was developed in 1978, it was pretty revolutionary for its time. No wonder he was awarded the Turing Award for all of his contributions.

Summary: ordering of events is an important part of distributed computing. It can be problematic when relying solely on physical clocks, or as the number of processes increases.

Problem: many algorithms in distributed computing have a notion of ordering events, and knowing such a thing is paramount for correctly behaving systems. Physical clocks do not necessarily keep track of time accurately, which means that there may be discrepancies in knowledge of physical time between multiple systems (even on the same physical machine). This can cause problems with ordering of events: if neither process has the correct time, how do you know which event occurred first? One approach would be to have a centralized component that is responsible for global synchronization -- this, however, may not be considered a true distributed system and may cause problems of fault tolerance.

Contributions: although the author did not build and evaluate an artifact, he described some key concepts in the ordering and timing of distributed systems:

  • ordering: this paper formalizes ordering and what it means for an event to occur before another, and what the difference is between total and partial ordering.
  • timing is tricky: it is not easy to build a synchronized system. A system that relies on synchronized physical clocks can run into many problems in the face of clock drift. The paper explains that these are real problems that should be addressed.
  • ordering in a distributed context can be achieved by attaching timestamps from the sending machines to messages. Furthermore, the machines themselves can be given a fixed ordering, which breaks ties between equal timestamps.
  • in the scenarios described, all the systems are working together, and they are all trusted.
  • Applicability: one of the main contributions of this paper is the formalization of ordering and clocks. This is useful to system designers, so that they know what they're talking about. Timing is still a big problem, however. Although this paper proposes a scheme to try to order all the events in the system, there are some aspects of it that make it inapplicable for certain systems. Their scheme can tolerate a certain amount of clock drift, but after a certain point, it may cease to work. Also, they omit any issues of fault tolerance, and assume all the nodes are trusted and working together. This may not always be the case. What would happen in a system with a few nodes with radically different physical clocks or understanding of the total ordering?

    Summary:

    The paper describes a distributed algorithm for ordering messages/events within and between processes in a distributed system by using logical clocks for time synchronization. It also discusses anomalous behavior that can arise because of external events and suggests using physical clocks to solve this issue.

    Problem:

    • It is very difficult to predict the exact ordering of events/messages in a distributed system due to the presence of concurrency.
    • Each process is unaware of the commands being executed in other processes, yet synchronization must still be achieved.

    The paper presents an algorithm which considers "happened before" ordering and an arbitrary total ordering of the processes to solve these problems.

    Contributions:

    • Logical clocks can handle synchronization of a distributed environment to a large extent and can be implemented successfully.
    • A process can execute a command using a resource at some timestamp only once it has learned about all activities of other processes that occurred before that timestamp. This directly implies that a process can obtain a resource only if no other process requested the same resource before it did.
    • The above procedure does not work when external events occur that the system is unaware of, which leads to anomalous behavior. This problem is solved by using physical clocks to provide strong clock synchronization.
    • Two different physical clocks run at different rates, causing drift; the paper proposes an algorithm that keeps this drift below a particular threshold. This ensures that the difference between clocks stays smaller than the minimum time an external message takes to arrive, thereby solving the anomalous behavior problem.

    Limitation:

    The only concern is the huge latency involved in a process waiting until it has heard from all other processes about commands issued before a particular timestamp; in large-scale systems, this could lead to starvation.

    Applicability:

    Logical clocks are widely used in present-day distributed systems, and I believe the vector clock concept is a kind of extension of these basic logical clock concepts. Vector clocks have been implemented and deployed in many systems, such as Amazon Dynamo. The concept of tick lines reminded me of TCP timeline diagrams used to represent events.

    Summary
    The author has analysed the temporal ordering of events in a distributed system. Synchronising clocks in a distributed system whose nodes might be spatially distant and communicate only via network messages is a very difficult problem. He devised the concepts of logical clocks and the partial ordering of events. The temporal ordering of two events only matters when they communicate with each other. He visualized a logical clock as an event-driven, non-decreasing counter that always increases its value and is never set back by messages carrying older timestamps. Later he showed the necessity of physical clocks to handle anomalous events caused by external communication between the processes.

    Problem
    Ordering the events in a distributed system without a central authority is the main problem that Lamport tries to address.

    Contribution

    1. Brought the concept of relativistic time into managing clocks in a distributed system. A clock's value matters only when it interacts with another clock through network messages.
    2. The ordering of events is a partial order rather than a total order. To make it a total order for practical use, he proposed breaking ties arbitrarily.
    3. Logical clock: the concept of a "logical clock", an event-driven counter, was first proposed by Lamport. He later showed that a logical clock alone is not enough (otherwise it would have to take into account every possible event/communication in the universe, bringing back relativistic complications) and introduced the need for nearly synchronized, correct physical clocks to avoid the anomalous situations.
    4. A distributed queue for resource management without the intervention of any central authority.

    Drawback

  • In case of failure, there is no way for other processes to know the state of the failed process (without physical clocks).
  • Logical clocks do not take into account external events/communications.
  • C_i(a) > C_j(b) does not imply that b happened before a: it only rules out a -> b, and the two events may be concurrent. Vector clocks can detect this, at the cost of a larger payload.
  • The systems are assumed to be 'trustworthy' for the resource allocation algorithm.
  • Applicability
    Lamport received the Turing Award for his contributions to process synchronisation in distributed systems. Logical clocks are still used to synchronise events efficiently on top of physical clocks in distributed systems.
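The drawback about comparing C_i(a) and C_j(b) can be illustrated with vector timestamps, which compare componentwise instead of as single integers. This is a minimal sketch assuming a fixed set of processes indexed 0..n-1; the function names are hypothetical, and vector clocks are a later technique (used, for example, in Dynamo), not part of Lamport's paper.

```python
def vc_leq(u: list[int], v: list[int]) -> bool:
    """True if vector timestamp u is componentwise <= v."""
    return all(x <= y for x, y in zip(u, v))


def compare(u: list[int], v: list[int]) -> str:
    """Classify two vector timestamps of the same length."""
    if u == v:
        return "equal"
    if vc_leq(u, v):
        return "before"      # u happened before v
    if vc_leq(v, u):
        return "after"       # v happened before u
    return "concurrent"      # neither dominates: no causal relation


def on_receive(local: list[int], msg: list[int], me: int) -> list[int]:
    """Merge an incoming vector (elementwise max), then count the receive event."""
    merged = [max(x, y) for x, y in zip(local, msg)]
    merged[me] += 1
    return merged


a = [2, 0, 0]   # second event on process 0
b = [0, 1, 0]   # first event on process 1
print(compare(a, b))          # concurrent, although scalar clocks would give C(a)=2 > C(b)=1
c = on_receive(b, a, me=1)    # process 1 receives a message sent at a
print(c, compare(a, c))       # [2, 2, 0] before
```

The extra information costs one counter per process in every message, which is the larger payload mentioned above.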

    Summary
    This paper looks to identify a solution to the problem of temporal ordering of events in a distributed system. The author first identifies a means to achieving a partial ordering, by means of tracking ‘happens before’ relationships between events in the system, and goes on to extend this to a total ordering in logical time, using which he demonstrates how to solve a simple synchronization problem. The author also identifies certain problems with this simple causality based ordering, with events outside of this relationship affecting the causality without being tracked by the ‘happens before’ ordering. To solve this, Lamport proposes the use of synchronized physical time based ordering and demonstrates how this can be used to produce an ordering of events that matches the one perceived by system users.

    Description of Solution

    Partial Ordering - Each process has a logical clock that tracks logical time as a sequence of monotonically increasing ticks. Each event within a process pushes the clock forward by a tick. A message received from another process that carries a tick value greater than or equal to the current tick causes the clock to be set to a value higher than the received tick, thereby preserving the causality of a message being received only after it is sent.
    Total Ordering - Partial ordering leaves some events that are not directly causally related ambiguous in terms of their ordering making it hard to come up with a total ordering of events in the system, as they appear concurrent. A simple solution is proposed where an ordering is enforced on the processes in the system, and this ordering is used to break ties between events that appear concurrent, to come up with a total ordering.
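The partial and total ordering described above can be sketched in a few lines of Python. This is a minimal sketch, assuming integer ticks and integer process ids that serve as the fixed tie-breaking order; the class and function names are illustrative, not from the paper.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock for one process (illustrative)."""
    process_id: int
    time: int = 0

    def tick(self) -> int:
        # Local event: advance the clock by one tick.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the message carries the new timestamp.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp so the receipt is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


def total_order_key(timestamp: int, process_id: int) -> tuple[int, int]:
    # Ties between equal timestamps are broken by the fixed ordering of process ids.
    return (timestamp, process_id)


p1, p2 = LamportClock(1), LamportClock(2)
a = p1.send()        # p1 sends at time 1
p2.tick()            # unrelated event on p2, also at time 1
b = p2.receive(a)    # receipt on p2 is stamped 2, after the send
events = sorted([(b, 2), (a, 1), (1, 2)], key=lambda e: total_order_key(*e))
print(events)        # [(1, 1), (1, 2), (2, 2)]
```

The two events stamped 1 are concurrent, and the process-id tie-break is what places p1's send before p2's unrelated event; any other fixed ordering of processes would also yield a valid total order.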

    Physical Clocks - Total ordering through logical clocks is susceptible to external events not being captured in the causality relationship, yet influencing the ordering of events in the system. The solution to avoid this anomalous behavior is to make use of a stronger ordering based on synchronized physical clocks. Lamport goes on to show that by taking into account inter process transmission latency and relative clock skew rates, it is possible to incorporate physical clock synchronization on top of logical clocks to achieve a total ordering that matches real ordering.
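The receive rule for physical clocks can be sketched as follows, assuming clock readings are plain floats in seconds and `min_delay` stands for the paper's minimum transmission delay of a message; the function name is hypothetical. Between messages the paper also requires each clock to keep running forward at close to the correct rate, which this snippet does not model.

```python
def on_receive_physical(local_clock: float, msg_timestamp: float, min_delay: float) -> float:
    """Clock value a receiver adopts when a timestamped message arrives.

    The clock only ever moves forward: if the sender's timestamp plus the
    minimum transmission delay is ahead of the local reading, jump to it;
    otherwise keep the local reading (never set the clock back).
    """
    if min_delay < 0:
        raise ValueError("minimum transmission delay must be non-negative")
    return max(local_clock, msg_timestamp + min_delay)


# Receiver lags behind: it is pushed forward to 10.0 + 0.5.
print(on_receive_physical(9.0, 10.0, 0.5))   # 10.5
# Receiver is already ahead: its clock is left unchanged.
print(on_receive_physical(12.0, 10.0, 0.5))  # 12.0
```

This is the logical-clock receive rule with one change: instead of jumping just past the timestamp, the receiver jumps to the earliest time the message could physically have arrived.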

    Flaws,Applicability and Discussion
    As we read last week, Amazon’s Dynamo uses vector clocks, which build on this idea, to track the causal ordering of writes and detect concurrent ones, as do many other popular systems in use today. What I find interesting, and what today’s reading does not discuss, is that these systems tolerate failures of nodes and networks, including partitions, whereas today’s paper declares failure handling beyond its scope. For discussion, I’d be interested in understanding how such distributed clock algorithms deal with the network being split in half (or into more pieces), and how a node that goes down comes back up and resynchronizes its ordering with the rest of the system.

    Summary:
    Leslie Lamport in his short paper introduces some groundbreaking concepts for the study of distributed systems. As laid out in the title, he introduces the idea of ordering events in a distributed system so as to synchronize processes. Overall it is a very theoretical paper with few underlying assumptions to arrive at its solution.

    Problems:
    The main problem this paper attempts to solve is how to handle clock drift in a distributed system. Computers rely on crystal clocks, which have only limited accuracy and can gradually drift away from the clocks of other computers. Time is never synchronized exactly, so coordinating actions in a distributed way is hard, if not impossible, to do with physical clocks alone.

    Contributions:
    Several short, but major contributions were introduced in this paper.
    • Rethinking clocks as a means of ordering events, where events are linked together by causality.
    • Ordering these events using three simply defined conditions. He defines a relation between two events, “happened before”, which induces a partial ordering of events.
    • Creating logical clocks, which implement these conditions and can be used without the need for physical clocks.
    • An algorithm for synchronizing a distributed state machine. Unfortunately this algorithm isn’t fault tolerant in that if one process fails to respond, then the distributed state machine cannot progress.
    • Applying the ideas of logical clocks back to physical clocks, so that clock skew can be handled without issue by following the properties of logical clocks such as not rolling back a physical clock to update it.

    Applicability:
    The ideas discussed in this paper could be used directly in a distributed system to synchronize a distributed state machine in a decentralized way. Logical clocks gave rise to vector clocks, which are used in modern systems. We saw an example of these in the Dynamo paper for synchronizing writes to the key value store. They are used in other distributed systems where physical clock skew is an issue and partial ordering is acceptable.

    Summary:
    Lamport discusses what it means for events to be ordered in a distributed system, and mechanisms which can allow synchronization in such a system.

    Problem:
    People tend to think of the operation of a system as a stream of events in time. It can even be useful, for example to have a notion of "fairness" in a system of shared resources, to be able to order events. However, it is not trivial to ensure ordering of events between distributed components/processes.

    Solution:
    1) First proposes the idea of logical clocks at each process to provide a partial ordering, with three rules: (i) The logical clock must tick between events in the same process. (ii) A timestamp is sent with any message to other processes. (iii) The logical clock must advance past the timestamp of any received message, so that all future events occur "after" the receipt of said message.

    2) Order the processes themselves to break ties and impose a total ordering.

    3) Using well-synchronized physical clocks to reduce the chance of anomalous behavior.

    4) Presents an algorithm for keeping physical clocks within some error of each other, based on timestamps in messages and the delays expected when sending messages.

    Contributions:
    1) Showed that adherence to physical time is not necessarily required to order events. In fact, later methods like vector clocks also use logical clocks to understand the partial ordering of events (e.g. Dynamo does this with updates).

    2) Showed rigorously how well physical clocks could be synchronized, and in what circumstances they would need to be used (when there are external events leading to causal relationships between true system events that may not then be ordered correctly).

    Applicability:
    Again, logical clocks are used today in distributed systems that need synchronization or need to detect the ordering of events. At the time, this work probably motivated later work on using logical clocks in the presence of partial failures.

    Overview

    This paper discusses the general question of how to order events in a distributed system. It describes how a system can be developed to decide on causality.

    Problem

    In a distributed system, the delay from the network is non-negligible. Because of this, ordering events becomes difficult: when does one event occur "before" another? Even if the system is able to say that one event "happened before" another, this only gives a partial ordering. What is needed is a total ordering, which this relationship does not provide.

    Contributions

    This paper covers many important steps toward a total ordering of events in a system. First, Lamport describes the "happened before" relationship in depth. One event happens before another when both occur in the same process and the first comes earlier, or when the first is the sending of a message and the second is its receipt (the sending "happens before" the reception).
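The two cases above, plus transitivity (which the paper includes as a third rule), can be turned into a small Python sketch that computes the "happened before" relation for a finite trace. The input format (per-process event lists plus send/receive pairs) and the event names are assumptions for illustration.

```python
from itertools import product


def happened_before(processes: dict[str, list[str]],
                    messages: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every pair (a, b) with a -> b.

    processes maps a process name to its events in local order;
    messages lists (send_event, receive_event) pairs.
    """
    edges: set[tuple[str, str]] = set()
    for events in processes.values():
        # Same process: each event happens before the next one.
        edges.update(zip(events, events[1:]))
    # Messages: a send happens before the matching receive.
    edges.update(messages)

    # Transitivity: closure computed Floyd-Warshall style (k is the outer loop).
    all_events = [e for events in processes.values() for e in events]
    closure = set(edges)
    for k, i, j in product(all_events, repeat=3):
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure


procs = {"P": ["p1", "p2"], "Q": ["q1", "q2"]}
msgs = [("p1", "q1")]                  # p1 sends a message that Q receives as q1
hb = happened_before(procs, msgs)
print(("p1", "q2") in hb)              # True, by transitivity through q1
print(("p2", "q1") in hb or ("q1", "p2") in hb)  # False: p2 and q1 are concurrent
```

Pairs missing in both directions are exactly the concurrent events that the partial order leaves unordered.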

    He then introduces a logical clock into the system. This clock just assigns numbers that relate the ordering of different events, but does not necessarily relate directly to the "actual" time of the system (the clocks are essentially counters). He proposes an important requirement for his system: the clock can only ever move forward. This gives rise to two simple rules for processing, which guarantee that for any two events, if a "happens before" b, then Time(a) < Time(b).

    Given this partial ordering, he develops a total ordering based on the partial ordering. This is an important contribution, since with the total ordering, any algorithm can be developed which satisfies the true ordering of the processes. He describes one such algorithm, where multiple processes all require one central resource, and the resource can only be accessed by one at a time (mutual exclusion). He describes the system as a distributed state machine.
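The entry condition of that mutual exclusion algorithm can be sketched as follows: a process may use the resource once its own request is first in its queue under the total order, and it has received a message from every other process timestamped later than that request. The class, its field names and the bookkeeping are hypothetical; sending requests, acknowledgments and release messages is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MutexState:
    """One process's local view in the request-queue algorithm (illustrative)."""
    pid: int
    peers: set[int]
    # Pending requests as (timestamp, pid); pid breaks timestamp ties.
    queue: set[tuple[int, int]] = field(default_factory=set)
    # Latest timestamp received from each peer (any message type).
    last_seen: dict[int, int] = field(default_factory=dict)

    def may_enter(self) -> bool:
        mine = [r for r in self.queue if r[1] == self.pid]
        if not mine:
            return False
        my_req = min(mine)
        # (i) Our request precedes every other queued request in the total order.
        if my_req != min(self.queue):
            return False
        # (ii) Every peer has sent us something stamped later than our request.
        return all(self.last_seen.get(p, -1) > my_req[0] for p in self.peers)


s = MutexState(pid=1, peers={2, 3})
s.queue = {(4, 1), (5, 2)}
s.last_seen = {2: 5, 3: 3}
print(s.may_enter())  # False: nothing from process 3 after time 4 yet
s.last_seen[3] = 6    # e.g. process 3's acknowledgment arrives
print(s.may_enter())  # True
```

Condition (ii) is why every process must keep responding: a single silent process blocks all others, which is the failure problem several reviews point out.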

    Finally, he develops a method for synchronizing real (physical) clocks. This is driven by the need to fix anomalous behavior, where the state of the system as the user perceives it must be reflected in the system itself. He describes an inequality involving the clock rate, the maximum clock offset, and the minimum transmission time which guarantees that anomalous behavior cannot happen. He then proves a theorem that bounds the time it takes to synchronize the clocks. This is the most important contribution: a way to synchronize clocks, with a guarantee that anomalous behavior cannot occur, and a bound on the time to synchronize.
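The inequality can be checked numerically. In the paper's notation, κ bounds each clock's rate error, ε bounds the difference between any two clocks, and μ is the minimum transmission time of any message (including messages outside the system); anomalous behavior is prevented when ε / (1 − κ) ≤ μ. The function name and the sample values below are illustrative assumptions, not figures from the paper.

```python
def anomaly_free(epsilon: float, kappa: float, mu: float) -> bool:
    """Check the sufficient condition epsilon / (1 - kappa) <= mu.

    epsilon: bound on the difference between any two clocks (seconds)
    kappa:   bound on each clock's rate error (dimensionless, 0 <= kappa < 1)
    mu:      minimum transmission time of any message (seconds)
    """
    if not 0 <= kappa < 1:
        raise ValueError("kappa must lie in [0, 1)")
    return epsilon / (1 - kappa) <= mu


# Clocks agree within 1 ms and drift by at most one part per million.
print(anomaly_free(0.001, 1e-6, 0.01))    # True: messages take at least 10 ms
print(anomaly_free(0.001, 1e-6, 0.0005))  # False: a 0.5 ms message could beat the skew
```

Because μ sits on the right-hand side, longer minimum message times make the condition easier to satisfy, which matches the observation in the Discussion about systems with very large message times.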

    Discussion

    This is an interesting paper because it is entirely theoretical. It hardly feels like a regular paper read in a systems course. The results are undoubtedly important, however, because any distributed system must have some sort of ordering between events. It is clear that these algorithms could be applied to a real-world system. It is also interesting to me that, based on his inequality, it is easier to prevent anomalous behavior in a system with very large message times than in one with small message times. This raises the question of whether, as latencies keep getting shorter, this work is becoming more relevant again.

    Summary:
    The paper defines a partial ordering of events and provides a distributed algorithm to synchronize a system of logical clocks which can be used to totally order the events.

    Problem:
    In a distributed system, it's sometimes impossible to decide which of two events occurred first and the relation "happened before" is only a partial ordering of the events in the system.

    Contributions:
    (1) Gives the definition of partial ordering. The "happened before" relationship is defined without using physical clocks. The relation '->' on the set of events of a system satisfies: (1) If a and b are events in the same process and a comes before b, then a->b. (2) If a is the sending of a message by one process and b is the receipt of that message by another process, then a->b. (3) If a->b and b->c, then a->c.
    (2) Uses logical clocks to capture the "happened before" relationship. Logical clocks should obey the Clock Condition: for any events a, b, if a->b then C(a) < C(b). To guarantee this, each process increments its clock between any two successive events, and on receiving a message with timestamp Tm, a process sets its clock to a value greater than or equal to its present value and > Tm.
    (3) Uses a system of clocks satisfying the Clock Condition to place a total ordering on the set of all system events. In a multi-process system, each process independently simulates the execution of the State Machine, and synchronization is achieved by ordering commands according to their timestamps.
    (4) Extends the scheme by adding physical clocks and proves how closely the clocks can be synchronized. This helps to solve the anomalous behavior that arises when using logical clocks alone.
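Point (3) can be illustrated with a toy replay: if every process applies the same commands in the same (timestamp, process id) order, each copy of the state machine ends in the same state regardless of the order in which messages arrived. This sketch assumes each process already holds every command up to the timestamps it replays (deciding when that holds is what the paper's message rules are for); the command format and function names are hypothetical.

```python
from typing import Callable

Command = tuple[int, int, str]  # (timestamp, pid, operation); illustrative format


def replay(commands: list[Command],
           apply: Callable[[list[str], str], None]) -> list[str]:
    """Execute commands on a fresh state in the total order (timestamp, pid)."""
    state: list[str] = []
    for _ts, _pid, op in sorted(commands, key=lambda c: (c[0], c[1])):
        apply(state, op)
    return state


def log_op(state: list[str], op: str) -> None:
    # Toy state machine: the state is just the log of executed operations.
    state.append(op)


# The same three commands, received in different orders by two processes.
arrival_at_p1 = [(2, 1, "P1 release"), (1, 2, "P2 request"), (1, 1, "P1 request")]
arrival_at_p2 = [(1, 2, "P2 request"), (1, 1, "P1 request"), (2, 1, "P1 release")]
print(replay(arrival_at_p1, log_op))  # ['P1 request', 'P2 request', 'P1 release']
print(replay(arrival_at_p1, log_op) == replay(arrival_at_p2, log_op))  # True
```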

    Applicability:
    As the author states in the paper, the algorithm has several limitations. It requires the active participation of all the processes. The number of messages can be very large and lead to problems such as network congestion. Besides, failures are difficult to handle. The latency can also be very high, because a process must receive reply messages from all other processes before it can continue its work. But back in 1978 it was pioneering, and the paper has a good combination of theory and algorithm.
