CS 739 Reviews - Fall 2014: The Byzantine Generals Problem

Summary:
In yet another classic paper by Leslie Lamport, the authors discuss the problem of achieving consensus in a distributed system under the most extreme failure model for components: not only may components fail to communicate, they may also communicate incorrect or inconsistent information at will. The authors narrate the problem in terms of Byzantine generals, which gave rise to the term "Byzantine" faults for this failure model. Impossibility results and algorithms with correctness proofs together give rigorous bounds on what is and is not possible, under various assumptions about the capabilities of the faulty and trustworthy nodes.

Problem:
Under this extreme failure model of arbitrary and potentially malicious failures, achieving reliable consensus becomes very challenging. The authors argue that such assumptions are in fact necessary to model faults correctly in systems ranging from circuits to distributed systems. The formal statement of the problem defines the notions of consensus, reliability, Byzantine faults, etc. The authors further consider scenarios such as whether or not faulty components can impersonate each other (i.e. whether messages are signed) and various communication connectivity conditions.

Contributions:
The authors begin with the simplest version of the problem, with just 3 generals that need to reach consensus. The theoretical result here is that, with oral messages, no solution exists for 3 generals with 1 traitor; in general, strictly fewer than one-third of the generals may be traitors for reliable consensus to be guaranteed. This result is then generalized to an arbitrary number of generals. The bound is shown to be tight by an expensive but provably correct algorithm that solves the problem whenever strictly fewer than one-third of the generals are traitors. They then extend the problem to the case where the identity of the sender can be unambiguously and correctly established using signatures. In this model, consensus can be reached for any number of traitors. This result is again backed up with an expensive algorithm and a correctness proof.

Confusion:
Given the extremely high cost of the algorithms presented here, it is unclear when they could ever be practically applicable. Further, it is unclear how we can use the OM(m) algorithm without knowing the value of m, apart from simply assuming the largest value the bound allows, m = ⌊(n − 1)/3⌋.
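For oral messages the paper's bound is n ≥ 3m + 1, so for a fixed number of generals n the largest m one could plug into OM(m) is ⌊(n − 1)/3⌋. A minimal Python sketch of that arithmetic; the function names are hypothetical, chosen only for illustration:

```python
def max_traitors_oral(n: int) -> int:
    """Largest m with n >= 3m + 1: the most traitors oral messages can tolerate."""
    if n < 1:
        raise ValueError("need at least one general")
    return (n - 1) // 3


def min_generals_oral(m: int) -> int:
    """Smallest number of generals for which oral messages can tolerate m traitors."""
    return 3 * m + 1


for n in (3, 4, 7, 10):
    print(n, max_traitors_oral(n))
```

Note that this only picks the largest m the bound permits; it says nothing about how many nodes will actually fail, which is the reviewer's real question.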

Learning:
I learnt the importance of addressing this failure model, and also how difficult it is to do so. Naturally, the question is whether the algorithms can be improved or if there are reasonable relaxations of the failure models (i.e. traitors' capabilities) or of the consistency conditions (say letting a majority of the loyal generals take the correct action) that make the problem practically tractable.

Posted by: Navneet Potti | October 14, 2014 08:03 AM

summary:
- the paper formulates the situation in which some nodes are faulty and provides algorithms to reach consensus.

problem:
- we have a commanding general and n-1 lieutenants (loyal officers and traitors, corresponding to working nodes and faulty nodes) who communicate with each other. we want:
1) all the loyal lieutenants obey the same order
2) and if the general is loyal, all the loyal lieutenants obey his orders.

contributions:
- they first show that we can't solve the problem with 3 generals and 1 traitor.
- they also show that even if we look for approximate agreement, it is still impossible to solve.
- they prove that when the messages are oral (unsigned), we need at least 3m+1 generals in the presence of m traitors.
- then they provide an algorithm (OM) that solves the problem and prove its correctness.
- they also provide an algorithm for the case where messages are signed and unforgeable (and show that it is easier to reach consensus in this case)
- then they translate their formulation to a distributed system setting (for example, traitors translate to faulty nodes or faulty connections, and they show how the assumptions made for their solutions can be achieved)

learned:
the way that they formulated the problem and then proved their solution was very interesting. the way that they proved their solution with induction was new to me (probably because I am not really familiar with the literature in this field).

Posted by: Alireza Fotuhi | October 14, 2014 08:03 AM

Problem:
- The paper tackles a fault-tolerance problem in distributed systems, modeled as the Byzantine Generals problem. The goal is for the loyal generals to execute the same plan of action despite faulty nodes or traitorous generals.

Summary:
- Impossibility result: no solution exists if there are m traitors and fewer than 3m+1 generals.
- Oral messages work if fewer than 1/3 of the generals are traitors, using majority voting and induction.
- Signed messages work if there are at most m traitors, by detecting the same signature on different orders.

Contributions:
- The paper gives a good visual example of why there is no solution with >= 1/3 traitors. One of the generals can see that a traitor exists but is unable to determine who it is. This is a good model of how traitors work in the Generals Problem, and it is easy to understand because it resembles the many television shows featuring an evil clone.
- The paper shows that oral messages work with more than 2/3 loyal generals. This tells readers how many reliable nodes they need to sustain m faulty nodes. Furthermore, they can use the probability of failure and the size of the system to determine the reliability of the entire system.
- Signed messages work if there are at most m traitors. I think this could be boiled down to working whenever there are at least 2 loyal generals, since traitorous generals are exposed when the same signature appears on different orders.
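The reliability estimate mentioned above can be sketched as a binomial tail. Assuming (my assumption, not the paper's) that each of the n nodes fails independently with probability p, the system stays within what OM(m) tolerates exactly when at most m nodes are faulty:

```python
from math import comb


def prob_at_most_m_faulty(n: int, m: int, p: float) -> float:
    """P(at most m of n nodes are faulty), assuming independent failures with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))


n = 7
m = (n - 1) // 3  # oral messages tolerate m traitors only when n >= 3m + 1
print(f"n={n}, m={m}, P(reliable) = {prob_at_most_m_faulty(n, m, 0.01):.6f}")
```

Correlated failures (shared bugs, a common attacker) would break the independence assumption, so this is an optimistic estimate.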

Confusing:
- When the paper first mentioned 3 generals and 1 traitor, it was not certain whether the traitor was included as a general. It is now clear that the traitor is included for a total of 3 people.
- The Algorithm SM(m) and its proof. What does it mean for a lieutenant to obey the order choice(Vi) when executing one of {“attack”, “retreat”}?
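In SM(m), each lieutenant i collects the set Vi of distinct orders it has received with valid signature chains and finally obeys choice(Vi). The paper only requires that choice return v when Vi = {v} and RETREAT when Vi is empty; because all loyal lieutenants end up with the same Vi, any fixed deterministic rule they share keeps them in agreement. A minimal sketch of one such rule; treating a conflicting set as RETREAT is my assumption, not the paper's only option:

```python
ATTACK, RETREAT = "attack", "retreat"


def choice(orders: set[str]) -> str:
    """One possible choice function for SM(m).

    A single order is obeyed as-is; an empty set (no order received) or a set with
    conflicting orders (proof that the commander is a traitor) falls back to RETREAT.
    """
    if len(orders) == 1:
        return next(iter(orders))
    return RETREAT


print(choice({ATTACK}))           # attack
print(choice(set()))              # retreat
print(choice({ATTACK, RETREAT}))  # retreat
```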

Learned:
- Synchronized clocks can be implemented if there are lower and upper bounds on transmission delay. I was under the impression that synchronized clocks are very difficult to implement in decentralized algorithms, and that people gave up on attempting to do so in the 1980s.

Posted by: Kai Zhao | October 14, 2014 07:19 AM

Summary :
This paper models the problem of a system with malfunctioning components, which send conflicting information to the rest of the system, as a Byzantine Generals Problem, where the goal is for the loyal generals to agree on a common good strategy despite traitors trying to mislead them.

Problem :
The issue the author is trying to address is coping with failures in a system where failed nodes send conflicting information to the rest of the system, and how the healthy, functioning nodes reach agreement. Failures may have non-malicious causes, such as bugs, or malicious ones, such as the system being compromised by a foreign entity.

Contributions:
1. The author mathematically formulates the Byzantine Generals problem and proves that, with oral messages, tolerating m traitors requires at least 3m + 1 nodes in total.
2. Oral Messages - By taking the majority of the values received from every other node, this algorithm ensures that the healthy nodes reach consensus and make a common, correct decision.
3. Signed Messages - Lets the system tolerate up to m failures regardless of the number of generals, using signed messages (similar to cryptographic signatures) so that traitors cannot lie about the information they received.

Thoughts:
How can the signed messages solution be used if we do not know the maximum number of traitors in the system?

The paper shows the classic tradeoff between reliability and performance. It does a great job of mathematically formulating this distributed systems problem, but its solution is impractical due to the excessive traffic it generates and the time needed to reach consensus. Another important goal apart from "correct execution" would be to isolate faulty nodes after detecting them.

Posted by: Arkodeb Dasgupta | October 14, 2014 07:14 AM

Summary
The paper targets node failures in distributed systems caused by abnormal behavior. It provides algorithms that let nodes take the correct decision even in the presence of faulty machines.

Problems to solve
Devise an algorithm where all nodes or generals arrive at the same consensus, so that a traitor cannot interfere with the decision taken by loyal members.
The algorithm to be devised should be reliable in nature.

Contributions
Devised an algorithm using unsigned messages for the nodes to reach a correct consensus even in the presence of traitors or faulty nodes. It requires that fewer than 1/3 of the generals be traitors.
Also provides an algorithm that uses signed messages. Here the generals sign their messages, and the signatures cannot be forged by others. Even if the commanding general is itself a traitor, its subordinates can verify the authenticity of its signature.
To maintain the reliability of the signed message algorithm, the same message should not be signed twice, as the receiving process would have already seen the signature and could forge it.

Confusing part
Applicability in real world systems? Too much traffic due to broadcast of messages.

What I learnt
Need to counter byzantine faults and the difficulties faced when trying to solve the issue.
Trade-off between reliability and cost of algorithm.

Posted by: Shiva Prashant Chada | October 14, 2014 04:17 AM

Summary:
The goal of the paper is to build reliable computer systems capable of handling malfunctioning components that give conflicting information to different parts of the system. The authors model this as the Byzantine Generals problem. They prove that for unsigned messages, more than two-thirds of the generals need to be loyal for a solution to exist, and that for signed messages, any number of traitors can be tolerated.

Problems they are trying to solve:
Reliable computer systems need to tolerate not just the typical failures but also malicious components that actively try to bring the system down. Doing this is not trivial, as can be seen in the 3-generals problem, where no solution exists even though two-thirds of the nodes are loyal.

Contributions:
For unsigned messages, when more than two-thirds of the nodes are loyal, they provide an algorithm that reaches a solution. At a high level, the algorithm is recursive: the commander sends its order to every lieutenant, each lieutenant then acts as a commander in turn and relays the value it received to all the other lieutenants, and finally each node takes a majority vote to decide its action. For the signed messages scenario, the number of traitors does not matter. The problem becomes easier because traitors can no longer lie about what they received (thanks to unforgeable signatures whose authenticity anyone can verify).
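A minimal Python sketch of the recursive OM(m) structure described above. Assumptions (mine, not the paper's): generals are integer ids, messages are delivered synchronously and never lost, and every traitor follows one fixed hypothetical lying strategy (ATTACK to even ids, RETREAT to odd ids). Ties fall back to RETREAT, as in the paper's majority function.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"
DEFAULT = RETREAT  # used when no strict majority exists


def majority(values):
    """Value held by a strict majority of `values`, else DEFAULT."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else DEFAULT


def send(sender, receiver, value, traitors):
    """Deliver `value`; a traitor ignores it and lies (hypothetical strategy)."""
    if sender in traitors:
        return ATTACK if receiver % 2 == 0 else RETREAT
    return value


def om(m, commander, lieutenants, order, traitors):
    """Run OM(m) and return {lieutenant: value it decides on}."""
    # Step 1: the commander sends its order to every lieutenant.
    received = {i: send(commander, i, order, traitors) for i in lieutenants}
    if m == 0:
        return received  # OM(0): each lieutenant uses the value it received
    # Step 2: each lieutenant j relays its value by acting as commander in OM(m-1).
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    # Step 3: lieutenant i takes the majority of its own value and the relayed ones.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


if __name__ == "__main__":
    # n = 4 generals (commander 0), m = 1 traitor: n >= 3m + 1 holds.
    print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))  # loyal 1 and 2 decide attack
    print(om(1, 0, [1, 2, 3], ATTACK, traitors={0}))  # all lieutenants agree
    # n = 3, m = 1 violates the bound: loyal lieutenant 1 sees a 1-1 tie and falls
    # back to retreat although the loyal commander ordered attack (IC2 is broken).
    print(om(1, 0, [1, 2], ATTACK, traitors={2}))
```

The last call only shows that OM itself fails below the bound for this particular lying strategy; the paper's impossibility result is the stronger claim that no algorithm can succeed there.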

My Key Takeaway:
Theory plays a big role here. The ability to model a given problem in terms of existing problems, as Lamport et al. did in this paper with reliable systems and Byzantine generals, is very important and is my key takeaway from this paper. Modeling in terms of other known problems provides a very good base from which to start solving our own problem. At first glance the 3-generals problem seems trivial, but as we found out, it is far from it. Another key takeaway is that handling Byzantine failures is complex and costly, which explains why systems generally have fault models with a predefined scope.

What I found confusing:
Not really a confusion, but I do wonder how large-scale distributed systems handle Byzantine faults, given how expensive this seems. Do modern systems not consider them in their fault model?

Posted by: Chaithan Prakash | October 14, 2014 02:47 AM

Summary:

In this paper the authors discuss how to achieve consistent communication in a reliable computer system whose malfunctioning components may give conflicting information to different parts of the system. Parallels are drawn with the classic Byzantine Generals Problem, where geographically separated generals must agree on a common battle plan in the presence of potential traitors. Several theoretical solutions to this problem are provided under various hypotheses, such as the number of potential traitors and the use of oral or written messages, and their applicability to implementing reliable computer systems is also presented.

Problem:

Achieving reliability in the face of arbitrary malfunctions is a very difficult problem, since malfunctioning components can behave arbitrarily by sending inconsistent, incorrect, or no information to other parts of the system. This is known as the Byzantine Generals Problem. For a system to be reliable, the properly functioning components or nodes must reach consistent agreement even in the presence of faulty components or nodes. This paper primarily deals with actually faulty components as opposed to potentially malicious ones.

Contributions:

A theoretical proof is presented as a corroboration to the solution (under both situations when only Oral Messages or Signed Messages were used).
Adapting a classic real-world problem into the realm of theoretical distributed systems.
Different hypotheses were presented and theoretical solutions were provided.
Although the solutions are expensive both in time and in the number of messages required, the authors claim they are optimal.
A formal definition of the "Interactive Consistency Conditions" is provided.
Majority voting to obtain consensus is used by many later systems.

Learned:

What I found very interesting and informative was the definition of the problem while drawing parallels from a classic problem. Also the simple yet logical solution that was used to circumvent the problem was intriguing.

Confusion:

The authors claim that their solutions are optimal even though they are expensive in terms of the number of messages and the amount of time required; it is hard to see how applicable they are to the large-scale systems prevalent today.

Posted by: Saikat R. Gomes | October 14, 2014 02:31 AM

Summary:
Designing a reliable distributed system in the presence of malfunctioning server nodes, where the nodes must reach consensus on an agreed value, is a challenging problem. Lamport tackles it by expressing it abstractly as the Byzantine generals problem, in which a command sent by the general is passed on to the lieutenants. Here, a wrong command can be sent by the commander if it is faulty, or relayed wrongly by other faulty lieutenants. To be resilient to 'm' faults with oral messages, there must be at least 3m+1 nodes. The other algorithm in the paper uses signed messages: the commander signs the message, which cannot be forged, and this removes the 3m+1 requirement.

Contributions:

1. Oral message algorithm: to tolerate m faults, the algorithm needs at least 3m+1 nodes and m levels of recursion (m+1 rounds of messages) to compute the majority consensus.
2. Signed messages: the commander signs the message, which cannot be forged by other nodes, so faulty nodes cannot alter it. If the commander itself is faulty, a loyal node can still detect this by comparing the orders relayed by the other nodes.
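To make the cost concrete, here is a small Python sketch that counts the messages OM(m) sends by following its recursive structure: the commander sends n − 1 messages, then every lieutenant runs OM(m − 1) as commander of the remaining n − 2 lieutenants. It assumes every message is actually sent and counts each relayed value as one message; the function name is hypothetical.

```python
def om_messages(n: int, m: int) -> int:
    """Messages sent by OM(m) with n generals (1 commander + n - 1 lieutenants)."""
    if m == 0:
        return n - 1  # the commander just sends its order to every lieutenant
    # Commander's n - 1 messages, plus n - 1 recursive OM(m - 1) runs over n - 1 generals.
    return (n - 1) + (n - 1) * om_messages(n - 1, m - 1)


for m in range(1, 4):
    n = 3 * m + 1  # smallest n that tolerates m traitors
    print(f"m={m}, n={n}: {om_messages(n, m)} messages")
```

The count grows roughly like n^(m+1), which is why several reviewers question how practical the oral message algorithm is.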

Limitations:

1. They didn't talk about what happens when the message is not sent by the traitor node.

Learned:

This paper shows how a complex problem like this can be modeled abstractly so that it looks simpler and admits an elegant solution. The oral message algorithm looks less compute-intensive but generates a large amount of traffic.

Confusing:

One thing I found confusing about this paper is that in all cases, if a node cannot converge on a single majority value, it selects a default value. But in a real system, how can this default value be selected?

Posted by: Dinesh Rathinasamy Thangavel | October 14, 2014 02:24 AM

Summary
The paper presents a method for achieving reliability in computing systems whose components can malfunction and produce undesirable outputs. The authors first give a solution using the oral message algorithm, which works when fewer than one-third of the components fail, and then present the signed message algorithm, which removes this limitation.

Problem
When components malfunction, they may not only fail to produce output but may also produce unexpected or wrong outputs that conflict with the correct (nonfaulty) information, thereby corrupting it. Unless the nodes of the system agree upon an algorithm for reliability, the system cannot be expected to be reliable. With conflicting inputs, it is difficult to agree not only on the right decision but even on an approximately right one. Moreover, this issue is not solvable at the hardware level.

Contributions
• The authors use the analogy of the Byzantine army's encampment (the Byzantine Generals Problem) to illustrate the algorithms, which makes them easy to understand.
• The two interactive consistency conditions (IC1 and IC2) define what a solution must achieve.
• The Oral Message (OM) algorithm solves the problem as long as fewer than one-third of the generals are traitors. Its main limitation is that a traitor can lie about what it received, which is why it needs more than two-thirds loyal generals (nonfaulty nodes). This can be resolved by using signed messages (SM) instead of modifiable oral messages.
• In the SM algorithm, the loyal generals can identify the traitors by verifying their signatures and the decision sent. This helps in solving the problem with any number of traitors in the system.
• It’s also interesting how failure in the communication line can be regarded as a node failure and then the algorithms essentially still holds true.
• They discuss how SM signatures can preserve integrity (even though a signature is just a packet of data) against both random malfunctions and malicious intelligence.

Confusing/ Doubts
• How well do these solutions scale today? The message length is (m+d), but today that “d” can be very high. Won't that also add to the complexity?
• How do we decide the “m”? Is it the expected maximum failures or any other method?

Learnings
• Firstly, the paper gave a clear idea of how unexpected things can happen in distributed systems when failures occur. It's also interesting to learn how hardware can lead to faults and why hardware cannot be used to detect these faults, which makes such algorithms necessary.
• It gave me a perspective of the complexity of achieving high reliability in such cases. For highly reliable systems it gets more and more complicated. The inductive approach used in the paper is appealing.

Posted by: Chetan Patil | October 14, 2014 01:58 AM

Summary: Leslie Lamport does it again, co-authoring another foundational paper in distributed systems that discusses the Byzantine generals problem and how it applies to reliable systems. The problem asks how loyal generals can reach the same decision when some generals are traitors. The authors show that with oral messages this can only be done if there are at least 3m + 1 generals in total when m of them are traitors, and they give a majority-voting algorithm that achieves it. They also give an algorithm for signed messages.

Problem: One of the central issues surrounding distributed systems is how to make them reliable. The key idea behind reliability is fault/failure tolerance. A system isn't very reliable if a single failure can bring it to its knees. However, one type of failure seems inherently hard to deal with: a failure that causes components to send conflicting information to the rest of the system. This can happen either maliciously (i.e. a system compromised by an adversary) or innocently (i.e. a bug). This paper addresses what kind of assumptions and algorithms are needed to remain reliable in the face of this type of failure (named Byzantine failure by this paper).

Contribution: The main contribution of this work is its outline of the byzantine generals problem and its relevance to distributed systems.

First, they describe what it means to withstand a Byzantine failure: loyal generals reach consensus, and a small number of traitors cannot cause them to adopt a bad plan. They show that the problem is unsolvable with 1 traitor and 2 loyal generals using oral messages (i.e. messages analogous to those sent in a distributed system). They then extend this to show that the problem is unsolvable with fewer than 3m + 1 generals when m of them are traitors.

After showing the negative result, they give an algorithm that solves the problem with oral messages. Basically, everyone relays the order they received to everyone else, and each general computes the majority, which should be the same for all loyal generals. Correctness follows from there being more than twice as many loyal generals as traitors (n ≥ 3m + 1 leaves at least 2m + 1 loyal generals).
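The counting behind this step, paraphrasing the key inequalities of the paper's Lemma 1 (OM(m) satisfies IC2 whenever there are more than 2k + m generals and at most k traitors) and Theorem 1; this is a sketch of the argument, not the full proof:

```latex
% Theorem 1 setting: n generals, at most m traitors, n >= 3m + 1.
n \ge 3m + 1 \;\Longrightarrow\; n - m \ge 2m + 1
% (the loyal generals number at least 2m + 1, i.e. more than twice the traitors)

% Lemma 1 (IC2 for OM(m)): holds whenever n > 2k + m with at most k traitors.
% Induction step: the hypothesis carries over to each call OM(m-1) among n - 1 generals,
n > 2k + m \;\Longrightarrow\; n - 1 > 2k + (m - 1)

% With a loyal commander, a loyal lieutenant's n - 1 collected values contain at
% least n - 1 - k copies of the commander's order v, a strict majority because
% (for m >= 1, n > 2k + m gives n - 1 > 2k)
n - 1 > 2k \;\Longrightarrow\; n - 1 - k > \tfrac{n - 1}{2}

% Taking k = m, Lemma 1's hypothesis becomes n > 3m, i.e. n >= 3m + 1.
```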

They then contribute a solution using signed messages, which can deal with an arbitrary number of traitors. The algorithm is much the same, but now, when generals relay messages to each other, each message carries a signed list of the generals the order has passed through. Signatures cannot be forged.
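A minimal Python sketch of SM(m) as a message-passing simulation. Signatures are modeled as an unforgeable chain of signer ids, so a traitor can only forward or drop chains it actually received. Hypothetical traitor behaviour (my assumption): a traitorous commander signs ATTACK for even ids and RETREAT for odd ids, and traitorous lieutenants never relay. A smarter traitor could relay selectively, which is what the m relay levels guard against. The choice rule (obey a unique order, otherwise RETREAT) is also an assumption.

```python
from collections import deque

ATTACK, RETREAT = "attack", "retreat"


def choice(orders):
    """Assumed choice rule: obey a unique order, otherwise RETREAT."""
    return next(iter(orders)) if len(orders) == 1 else RETREAT


def sm(m, commander, lieutenants, order, traitors):
    """Simulate SM(m) and return {lieutenant: decision}.

    Queue entries are (receiver, value, chain), where chain lists the signers
    starting with the commander.
    """
    V = {i: set() for i in lieutenants}  # V_i: orders lieutenant i has accepted
    queue = deque()
    for i in lieutenants:
        value = (ATTACK if i % 2 == 0 else RETREAT) if commander in traitors else order
        queue.append((i, value, (commander,)))
    while queue:
        receiver, value, chain = queue.popleft()
        if value in V[receiver]:
            continue  # already seen this order; nothing new to relay
        V[receiver].add(value)
        if receiver in traitors:
            continue  # assumed traitor strategy: stay silent
        # k = number of lieutenant signatures so far; relay only while k < m.
        if len(chain) - 1 < m:
            signed = chain + (receiver,)
            for j in lieutenants:
                if j not in signed:
                    queue.append((j, value, signed))
    return {i: choice(V[i]) for i in lieutenants}


if __name__ == "__main__":
    # Three generals, one traitor: impossible with oral messages, fine with signatures.
    print(sm(1, 0, [1, 2], ATTACK, traitors={2}))  # loyal lieutenant 1 attacks
    print(sm(1, 0, [1, 2], ATTACK, traitors={0}))  # both see conflicting orders, both retreat
```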

Finally, they connect this problem to that of building reliable distributed systems by showing that the model of processors communicating with other processors fits the generals problem and the assumptions made in majority voting fit exactly with IC1 and IC2 that need to be solved with the generals problem.

Confusion: One thing I found confusing was the seemingly interchangeable use of generals, commanders, and lieutenants. Is the commander the head general and the lieutenants are generals that are subordinate to the head general? Or do all generals have lieutenants that are not generals, just lower ranking officers? Or do these distinctions not matter and they might as well all be called generals?

Learned: A more formalized way to think about Byzantine failures. So far in class, Byzantine failures had only received a hand-wavy explanation: “failures in a system that can have arbitrary behavior”. Here, I learned exactly how to think about Byzantine failures and which mechanisms can be put in place to ensure some robustness against them. Before I read this, these types of failures seemed almost unsolvable. They may still be, given that defending against them may be prohibitively expensive (in terms of messages sent).

Posted by: David Tran-Lam | October 14, 2014 01:01 AM

Summary:
The paper presents the problem of correctly working nodes reaching consensus in the presence of benign failures or malicious nodes. The authors frame it as Byzantine army generals reaching agreement on a plan of action, where some generals may be traitors.

Problem:
To ensure a system is fully reliable, it should be assumed that any trusted node can become faulty to the point of behaving arbitrarily, sending potentially any messages including a pattern of messages that could be directly in opposition to the correct functioning of the system. The goal of this paper is to propose an algorithm which when followed by all non-faulty nodes will ensure that at least those nodes will behave reasonably in concert.

Contributions:
1) Interactive consistency conditions, defining what "behaving reasonably" means, for those non-faulty nodes.
2) An algorithm is given that can withstand m Byzantine faulty nodes given at least 3m+1 nodes. It is shown that if messages can be forged, this is the best that can be done.
3) If messages cannot be forged and signatures can be verified, the authors give an algorithm with which any number of faulty nodes can be tolerated (the cases where all or all but one node are faulty are vacuously correct under any algorithm).
4) Both the given algorithms are inductively defined in terms of "m", the number of failures which can be tolerated. This is a nice property as any implementation of the algorithms can tune "m" to trade off between efficiency and worst case number of failures to guarantee robustness to.

Learned:
I was glad that they presented it, and surprised to learn that their technique is actually optimal in the number of hops and signatures a message must go through to guarantee correctness of any algorithm for this problem (for a given m). Handling worst-case faults seems to come at a high cost.

Confused:
I wasn't entirely convinced by their argument about approximate agreement, and felt that maybe the given problem there was contrived. It would have been nice to see a more general formalization of "approximate agreement" than "within 10 minutes". I thought the mapping from that problem to the original exact agreement problem was maybe too trivial.

Posted by: Brandon Davis | October 14, 2014 01:00 AM

Summary:
The authors introduce an agreement protocol to increase reliability in distributed systems whose malfunctioning components give conflicting information to other components of the system.

Problem:
In any distributed system with multiple components, individual components can arbitrarily become faulty and communicate buggy information to other components thereby affecting the reliability of the system. This could also happen because of malicious processes being introduced in the system. The authors in this paper design an agreement protocol which produces a correct/consensus result even in the presence of these faulty components.

Contributions:
1. The authors did a wonderful job by comparing the distributed systems scenario to an army generals communication analogy and extend the solutions.
2. A formalization showing that reaching consensus requires more than two-thirds of the nodes to be loyal. The examples provided for the 3-node, 1-faulty-node setting were very intuitive.
3. The authors then design two protocols to achieve consensus.

The first one is an oral message based protocol that recurses m levels deep (m+1 rounds of messages) if the system needs to handle m faulty nodes. This protocol cannot handle situations where the non-faulty nodes make up two-thirds of the system or less.

A signed message protocol, where each node appends its signature to the received message and passes it along. Here, the assumptions are that no node can change the contents of a message without being noticed, and that any node can verify the authenticity of any other node's signature. With this protocol, loyal nodes can detect a faulty commander whenever it sends them conflicting signed orders. As can be derived, this also works in scenarios where more than one-third of the nodes are faulty.

Confusing parts in the paper:
The paper says that in case of an unclear decision, the generals just choose “retreat”. Translating to the systems scenario, the nodes pick a default value, say 0. But, what I would like to know is that in a dynamically changing system with lots of input and output values, in the case of no clear consensus, how the nodes will end up choosing the default value.
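One way to read the paper's rule: the default is just a protocol constant that every loyal node knows in advance, and any fixed value works as long as all loyal nodes use the same one. With n ≥ 3m + 1 and a loyal commander, a loyal node's final vote always contains a strict majority for the commander's order, so the default only decides the outcome when the commander is a traitor. A minimal Python sketch of a majority with a configurable default, generalized to arbitrary hashable values (the function name and signature are my own):

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def majority_or_default(values: Sequence[T], default: T) -> T:
    """Return the value held by a strict majority of `values`, else `default`.

    `default` should be a constant agreed on by all nodes before the protocol runs,
    so every loyal node that finds no majority falls back to the same value.
    """
    if not values:
        return default
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else default


print(majority_or_default([1, 1, 2], default=0))  # 1
print(majority_or_default([1, 2, 3], default=0))  # 0: no strict majority
```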

What I learned:
The idea of comparing the distributed systems scenario to a real world example is very neat and cool. It made me understand the concept better. Also, I learned that achieving reliability in a system with many nodes is a very hard task. I also understood how important it is to lay down all the assumptions the system is going to make before designing the algorithm.

Posted by: Anusha Dasarakothapalli | October 14, 2014 12:41 AM

Summary: This paper studies the Byzantine Generals Problem and presents two algorithms to solve it. The first assumes that only oral messages are used to communicate and solves the problem when fewer than one third of the generals are traitors. The other assumes that authentic signatures can be employed and solves the problem with any number of traitors.

Problem: Assume there are n generals; one of them is the commander, and the others are lieutenants. Some generals, including the commander, may be traitors. The commander gives the lieutenants an order. The goal of all (loyal) lieutenants is to come to a consistent interpretation of that order. Moreover, if the commander is loyal, that interpretation must be the same as the commander's original order. A simple algorithm in which each lieutenant just does what the commander tells them does not work, because if the commander is a traitor, he can send different orders to different lieutenants, resulting in inconsistent interpretations among lieutenants.

Contribution:
1. Converted a fault tolerance problem in distributed systems into the Byzantine Generals Problem, and gave a simple yet interesting formalization of this problem. The original problem in distributed systems is how the other nodes can act consistently given that faulty nodes can do anything crazy (like sending inconsistent messages to other nodes). The authors converted this problem into the problem of sending a message and then collectively interpreting it.
2. Showed the theoretical limits of the Byzantine Generals problem. The authors showed that with oral messages (normal messages that can be forged by anyone), the Byzantine Generals Problem is not solvable when one third or more of the generals are traitors.
3. Proposed an algorithm that solves the Byzantine Generals Problem using oral messages, given that fewer than one third of the generals are traitors. This recursively constructed algorithm uses the echoes of the commander's message from the other lieutenants to help determine the final interpretation.
4. Proposed an algorithm that can solve the Byzantine Generals Problem using signed messages under any number of traitors.

What I found confusing: I understand the argument that given 2 loyal generals and 1 traitor, it is impossible to give an algorithm to solve the Byzantine Generals problem. But I don't understand how this result can be used to argue for the case when there are 3m or fewer generals coping with m traitors.
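The paper handles this with a reduction by contradiction: if some algorithm worked for 3m or fewer generals with m traitors, each of three generals could simulate a group of at most m generals of that larger algorithm (the simulated commander together with some lieutenants in one group), producing a solution for 3 generals with 1 traitor, which was just shown to be impossible. The only arithmetic needed is that n ≤ 3m generals always split into three non-empty groups of at most m each. A small Python sketch of that split (the function name is illustrative; general 0 plays the commander):

```python
def simulation_groups(n: int, m: int) -> list[list[int]]:
    """Split generals 0..n-1 (3 <= n <= 3m) into three groups of at most m each.

    General 0 (the commander) lands in group 0. Since every group has at most m
    members, whichever of the three simulating generals is the traitor controls
    at most m traitors of the larger problem, which that algorithm must tolerate.
    """
    if not 3 <= n <= 3 * m:
        raise ValueError("the reduction needs 3 <= n <= 3m")
    base, extra = divmod(n, 3)
    groups, start = [], 0
    for g in range(3):
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


print(simulation_groups(7, 3))  # [[0, 1, 2], [3, 4], [5, 6]]
print(simulation_groups(3, 1))  # [[0], [1], [2]]
```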

What I learned: I learned why the Byzantine Generals problem is hard, and how to solve it using the two algorithms the authors had proposed.

Posted by: Menghui Wang | October 14, 2014 12:37 AM

Summary:
This paper discusses the challenge to design reliable distributed system in the case that faulty nodes can do whatever they want. The nodes of the system exchange information by sending/receiving messages. If the message’s source can’t be authenticated, then the solution only exists if the number of faulty nodes is less than ⅓ of the total number of nodes. If the message’s source can be authenticated, then the solution always exists.

Problem:
In distributed system, some faulty nodes not only don’t work properly, but also can do some harmful things for the whole system’s functionality, e.g. to make other working nodes confused about current status by sending fake information. To build reliable distributed system, the strategy in every single working node needs to be prepared for such kind of scenarios, and works with other working nodes together to make sure the functionality of the whole system is robust and reliable.

Contributions:
The authors analyze the problem of faulty nodes in a distributed system that act harmfully to influence the behavior of the working nodes and, eventually, the functionality of the whole system. They also formalize the problem with explicit assumptions and simplifications, and give it an interesting name: the “Byzantine Generals Problem”.
The authors discuss two message-exchange modes used in distributed systems (oral messages and signed messages), and analyze what system configuration is required to build a reliable system that solves the Byzantine Generals Problem with oral messages.
The authors also give algorithms that solve the Byzantine Generals Problem with oral messages and with signed messages.
They prove, by induction, both the requirements on the system and the correctness of these algorithms.

Discussion:
In my understanding, with oral messages the job of every working node is to make sure the majority of the information it has collected is consistent. For that, the working nodes need to form a sufficiently large majority of all the nodes in the system. I tried to come up with an intuitive understanding of how large that majority must be, e.g. based on the information content of each message, but I haven’t managed to. The proofs given by the authors are rigorous but hard to follow. The algorithm OM(m) seems correct in theory; what confuses me is how we can know the value of “m” in practice.

With signed messages, every working node either gets the same order from the commander node, if the commander is working correctly, or ends up with the same set of orders, if the commander is faulty. In both scenarios, all working nodes obtain consistent information about the current state of the system. A distributed system built on signed messages together with the algorithm proposed in the paper is reliable under Byzantine failures, which is promising.
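The "same set of orders" claim can be checked with a small simulation of SM(m). This is a minimal sketch under loud assumptions: the function name `sm` is hypothetical, the commander "C" is the only traitor, and signatures are modelled abstractly as the tuple of signer ids on a message rather than real public-key signatures, so a relayed value can never be altered (the paper's assumption A4).

```python
def sm(m, commander_orders):
    """Final order sets V_i of the loyal lieutenants under SM(m).

    Assumptions for this sketch: the commander "C" is the traitor and every
    lieutenant is loyal. A message is (value, signers); signatures are
    modelled abstractly as the tuple of signer ids and are assumed
    unforgeable, so nobody can alter a relayed value.
    """
    n = len(commander_orders)
    V = [set() for _ in range(n)]
    inbox = [[(commander_orders[i], ("C",))] for i in range(n)]
    while any(inbox):
        next_inbox = [[] for _ in range(n)]
        for i in range(n):
            for value, signers in inbox[i]:
                if value in V[i]:
                    continue  # already known, nothing to relay
                V[i].add(value)
                k = len(signers) - 1  # lieutenant signatures on the message
                if k < m:
                    # Countersign and relay to lieutenants who have not signed.
                    for j in range(n):
                        if j != i and j not in signers:
                            next_inbox[j].append((value, signers + (i,)))
        inbox = next_inbox
    return V


# A traitorous commander sends ATTACK to lieutenant 0 and RETREAT to lieutenant 1.
print(sm(1, ["ATTACK", "RETREAT"]))
```

Both lieutenants end with the set {ATTACK, RETREAT}; each then applies the same deterministic choice function to that set, which is what makes their decisions agree.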

Posted by: Peng Liu | October 14, 2014 12:10 AM

Summary:
The paper discusses the problem of reaching consensus in an environment where some parties can send conflicting information or behave arbitrarily, for example by not sending any message at all.

Problem:
Agreement on things like system state or the value of a variable is essential for building reliable systems. For example, a distributed database system has to determine whether all replicas can apply the update of a transaction. The problem is that systems can fail. If the failures are fail-stop, the countermeasures needed to still reach agreement can be simple. Failures in which a component behaves in a completely unpredictable way are much harder to deal with. This paper presents algorithms that tolerate such failures and still reach agreement. The problem is framed as the Byzantine Generals Problem, in which a set of distributed army divisions has to agree on whether to "attack" or "retreat".

Contributions:
1. Formalization of the problem - The authors formalize the agreement problem by providing the interactive consistency conditions.
2. An illustration of why 3 generals cannot reach agreement if there is one traitor. The authors then extend this to the case of 'm' traitors, showing that tolerating m traitors requires at least 3m+1 generals in total.
3. The OM algorithm, which works if there are at least 3m+1 generals and at most m traitors, and the SM algorithm, which uses signed messages and works for any number of traitors m given at least m+2 generals in total.
4. Proofs of correctness for OM and SM algorithms.
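The two bounds in this list are easy to mix up, so here they are as a tiny Python sketch. The function names are hypothetical; the formulas are the ones stated above (n ≥ 3m+1 for oral messages, at least m+2 generals for signed messages, below which the signed case is trivial).

```python
def max_traitors_oral(n):
    """Most traitors OM(m) can tolerate with n generals: largest m with n >= 3m + 1."""
    return max((n - 1) // 3, 0)


def min_generals(m, signed):
    """Fewest generals the algorithms need for m traitors: 3m+1 oral, m+2 signed."""
    return m + 2 if signed else 3 * m + 1


for n in (3, 4, 7, 10):
    print(n, max_traitors_oral(n))  # 3 -> 0, 4 -> 1, 7 -> 2, 10 -> 3
```

Three generals tolerate no traitor with oral messages but one traitor with signed messages, which is exactly the gap between the OM and SM results.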

Learning:
It is important to make the assumptions and conditions clear before attempting to solve a problem. It is nice to see how the authors lay out their assumptions explicitly and then proceed to propose the solution.

Confusion:
It is not clear to me whether the OM algorithm works well with a large number of nodes. It appears to have a high communication overhead, so I am not sure how current systems solve this problem. Also, given that the SM algorithm is simple and works correctly, I am not sure whether it is too much to assume, in current systems, that messages sent by the commander cannot be forged. If it can be assumed, there may be no need for the OM algorithm.

Posted by: Ramnatthan Alagappan | October 13, 2014 11:46 PM

Summary:
In this paper the authors present the Byzantine Generals Problem, which describes a new category of problem: agreeing on the correctness of data when some components of the system may be malicious or in error. They present several methods (under different assumptions) that improve reliability by tolerating a larger number of faulty components while still determining the correct answer.

Problem:
Byzantine failure (in which components of a system fail in arbitrary ways, producing inconsistent output) is a possibility in any distributed system with multiple processes or nodes, where any individual process can become faulty and still participate in the system's distributed protocols. Such failures can happen unintentionally, due to buggy processes or hardware, or intentionally, through malicious processes introduced into the system. The aim of this paper is to design a method resilient to these failures, in other words, to achieve "correctness" even in the presence of such faulty nodes.

Contributions:
- Formalization of the Byzantine Generals Problem as the requirement that two conditions hold. IC-1: All loyal lieutenants obey the same order. IC-2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends. In real systems, loyal lieutenants are the non-faulty nodes.
- Use of a majority function to reach consensus among the loyal generals in the Oral Message (OM) algorithm.
- Proved a lower bound on how many generals are needed to solve the problem: with oral messages, 3m+1 generals for m traitors.
In the case of signed messages, only m + 2 generals are needed; with fewer, the problem is trivial.
- Showed the applicability to reliable systems, where faulty processes can send arbitrary messages and non-faulty processes still consistently arrive at the correct result.
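IC-1 and IC-2 are properties of a single run, so they can be stated as a check over the decisions of the loyal lieutenants. A minimal sketch, where `check_ic` and its argument layout are hypothetical choices for illustration:

```python
def check_ic(decisions, loyal, commander_loyal, order):
    """Return (IC1, IC2) for one run of a protocol.

    decisions: lieutenant id -> the order it obeys
    loyal: ids of the loyal lieutenants
    commander_loyal, order: whether the commander is loyal and what he sent
    """
    loyal_orders = {decisions[i] for i in loyal}
    ic1 = len(loyal_orders) <= 1  # all loyal lieutenants agree
    ic2 = not commander_loyal or loyal_orders <= {order}  # they obey a loyal commander
    return ic1, ic2


# Lieutenant 3 is a traitor, so its RETREAT does not count against IC-1.
print(check_ic({1: "ATTACK", 2: "ATTACK", 3: "RETREAT"}, {1, 2}, True, "ATTACK"))  # (True, True)
```

Note that IC-2 is vacuous when the commander is a traitor; only IC-1 constrains that case.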

Flaws:
One flaw or shortcoming of the paper is the inefficiency of the proposed solutions. Both the OM and SM algorithms require a number of messages that grows exponentially with m, which will certainly cause scalability problems.

Learned:
- Inductive approach for designing an algorithm.
- It is possible to design highly critical systems that reach the best possible outcome using the signed messages approach, in spite of the presence of malicious nodes.

Confusing part of the paper:
- How is the value of "m" (number of faulty nodes) chosen when designing a real system, and what are the trade-offs associated with it?
- How realistic is it for a system to implement a full Byzantine Generals solution, and at what point would the expense of tolerating one more "traitor" be better spent improving another part of the system?

Posted by: Bhaskar Pratap | October 13, 2014 11:41 PM

Summary :

The paper discusses two algorithms, the Oral Message and the Signed Message algorithms, for solving the Byzantine Generals Problem: how loyal generals can arrive at a common plan of action in the presence of faulty generals. The authors also discuss how the algorithms apply to computer systems, to reach agreement in the presence of faulty processors.

Contributions :

1. They state the interactive consistency conditions that both the OM and SM algorithms must satisfy in order to solve the Byzantine Generals Problem.
2. For oral messages alone, they give a formal proof that the problem has a solution only if there are at least 3m+1 generals in the presence of m traitors.
3. In the recursive OM algorithm, the loyal generals apply a majority function to the messages received from all the other generals to arrive at a common plan of action.
4. The signed message algorithm adds signatures to the messages and assumes that signatures cannot be forged, or that a forged signature can be detected. This simplifies the algorithm, and they prove that it works with any number of generals in the presence of m traitors.
5. The conditions corresponding to IC1 and IC2 that must hold in computer systems for majority voting to work are outlined quite clearly.
6. The paper states how the assumptions of the OM and SM algorithms apply to computer systems and how they could be achieved: timeouts to detect the absence of messages, and randomized signing functions to reduce the probability that a faulty processor could forge a signature.
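The timeout idea from point 6 (detecting a missing message and substituting the default order) can be sketched with the standard library's `queue.Queue`, which here stands in for a network channel. The function name `receive_or_default` is hypothetical, and the sketch assumes bounded message delay, which is what makes a timeout meaningful in the first place.

```python
import queue

RETREAT = "RETREAT"


def receive_or_default(inbox, timeout_s, default=RETREAT):
    """Wait up to timeout_s for an order; treat silence as the default.

    This only works if message delivery time is bounded and clocks are
    roughly synchronized, as the paper requires for timeouts.
    """
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return default


inbox = queue.Queue()
print(receive_or_default(inbox, 0.05))  # nothing sent -> RETREAT
inbox.put("ATTACK")
print(receive_or_default(inbox, 0.05))  # -> ATTACK
```

A traitor that stays silent is thereby treated exactly like a traitor that sends RETREAT, which is the behavior both OM and SM assume.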

What is unclear :

I am not sure to what extent it could be implemented in a practical distributed system because it is highly recursive and incurs a huge message overhead. It is not a technique that is scalable. In a practical system, how do we determine the maximum number of faulty nodes?

Learning :

I learnt how the concepts of majority voting and signed messages are applied to the complicated problem of how non-faulty nodes in a system can arrive at a common decision in spite of the presence of Byzantine faulty nodes. Mainly, I learnt the relevance of the assumptions needed to simplify the problem.

Posted by: Krishna Gayatri Kuchimanchi | October 13, 2014 11:30 PM

The Byzantine Generals Problem

Summary:
The paper addresses how the Byzantine generals can come to an agreement on their course of action for attacking an enemy city, in spite of the fact that they can communicate only via messengers and there could be traitors among the lieutenants and/or generals.
The analogy between the Byzantine Generals Problem and reaching agreement in a distributed system despite failures or malicious nodes is also presented.

Problem:
The main problem the authors try to solve is making sure that all loyal generals decide upon the same plan of action, and that a small number of traitors cannot cause the loyal generals to adopt a bad plan. To solve this problem they come up with 2 algorithms: one that uses only oral messages and another that uses public key cryptography to produce signed messages.

Contributions:
1. The most important contribution of the paper is the way the authors formalized the problem of handling failures in a distributed system, telling a story about Byzantine generals to make it more interesting.
2. The majority-voting solution for when only oral messages can be sent was good, and so was the constraint that more than two-thirds of the generals need to be loyal to reach a correct decision.
3. The way the authors used techniques from public key cryptography to solve the problem when signed messages are possible is amazing. They also showed that the 3-general problem with one traitor can be solved with this technique.
4. Although the solutions seem trivial, the important contribution was to point out that such a problem exists and that it must be addressed in distributed computer systems.

Limitations:
Sending numerous messages among many nodes in a large distributed system causes significant overhead. Signatures also add overhead, since every node needs to sign its messages.

Things I learnt:
I learnt from this paper that presenting a technical problem as a story makes it more interesting than stating it plainly. Dijkstra's dining philosophers problem is another example of this. I also learnt that proper consensus can be achieved in a distributed system even in the presence of failures.

Things I found confusing:
I wasn't able to understand what course of action the lieutenants take when the commander is a traitor and sends signed messages. The 2 lieutenants can find out that the commander is a traitor because his signature appears on 2 different orders, but the course of action is still unclear.

Posted by: Adalbert Gerald | October 13, 2014 11:29 PM

Summary:
The paper targets failures caused by malfunctioning nodes, such as nodes sending conflicting or inconsistent outputs. The problem is abstracted as the Byzantine Generals Problem, with non-faulty processors as loyal generals and faulty ones as traitors. The authors propose two solutions, Oral Messages and Signed Messages, that generals (a.k.a. processors) use to decide on and move to a consistent state. Later, it is shown that these solutions can be applied to reliable computing systems.

Problem:
The problem at hand: given a distributed system where multiple nodes are prone to failure, how can the nodes achieve consistency? In terms of the Byzantine Generals Problem, how can a set of generals decide upon a common plan of action in the presence of traitors?

The paper formalizes the problem into two interactive consistency conditions: (i) All loyal lieutenants obey the same order, (ii) If commanding general is loyal, then every loyal lieutenant obeys the order he sends.

Contributions:
1. The first solution uses oral messages. It is proven to work iff more than two-thirds of the generals are loyal, i.e. there must be at least 3m+1 generals for no more than m traitors.

2. With signed messages (more secure and unforgeable), a solution is possible for any number of traitors, provided signatures can't be forged and anyone can verify the authenticity of a general's signature.

Learning:
The paper draws a very good analogy between Byzantine generals and processors in a distributed system, making it very easy to understand. As the authors note, each node needs to send a large number of messages to every other node repeatedly and verify signatures (encryption/decryption cost), which makes the solutions very costly. For high reliability, that cost is a necessity; this is my main learning from the paper.

Confusing things:
I am still confused about the final choice the lieutenants make when they identify that the commander is a traitor. How is the choice made? If it's the median, how is it calculated? Or do they go with the default value?
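The paper only requires that choice() return v for a singleton set {v} and RETREAT for the empty set, and suggests the median of V under some ordering as one possibility. A minimal sketch of that option, assuming a hypothetical ordering RANK and taking the lower median when V has an even number of elements:

```python
RETREAT = "RETREAT"
RANK = {"ATTACK": 0, "RETREAT": 1}  # hypothetical fixed ordering of orders


def choice(V):
    """One admissible choice(): RETREAT for an empty set, the sole element of a
    singleton, otherwise the lower median of V under the fixed ordering RANK."""
    if not V:
        return RETREAT
    ranked = sorted(V, key=RANK.__getitem__)
    return ranked[(len(ranked) - 1) // 2]


print(choice(set()), choice({"ATTACK"}), choice({"ATTACK", "RETREAT"}))  # RETREAT ATTACK ATTACK
```

What matters is not which rule is picked but that it is a deterministic function of the set: loyal lieutenants that hold the same V necessarily pick the same order.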

Posted by: Harneet Singh | October 13, 2014 10:58 PM

Summary:
This paper is a model for how a distributed system can reach agreement in spite of failures. The Byzantine generals analogy represents a system problem in which malfunctioning components send conflicting messages. An algorithm is presented that allows generals to communicate with their troops and have their orders followed (a reliable system) in spite of traitors (faulty system components) among the ranks. The “commander” is the unit generating the input, “lieutenants” represent processors, “loyal” means system components that are neither faulty nor maliciously compromised, and “messages” are packets.

Problem:
A Byzantine army is camped outside an enemy city. After assessing the enemy, the loyal generals must all agree on a common course and get the right message to the lieutenant generals in spite of traitors among their ranks (who will send conflicting messages, or no messages at all). Two interactive consistency conditions must hold: 1-all loyal lieutenants obey the same order, and 2-if the commanding general is loyal, then every loyal lieutenant obeys the order he sends.

Contributions:
1-The Problem Model
The Byzantine Generals Problem (as stated above) is a good analogy because it greatly reduces complexity without losing the key properties that make the analogy useful. By simplifying the problem, Lamport et al. create a model with provable properties.

2-Oral Algorithm (and proof):
Through proof by contradiction, we see that with oral messages, no solution with fewer than 3m+1 generals can handle m traitors, where the commander sends messages to n-1 lieutenants. A few assumptions must hold for this oral algorithm to work: A1-every message that is sent is delivered correctly, A2-the receiver of a message knows who sent it, and A3-the absence of a message can be detected. Since each lieutenant must send messages to every other lieutenant, we can see this algorithm is extremely expensive.
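The three-general contradiction can be checked mechanically in a deliberately restricted setting. This is a sketch, assuming one relay round in which each lieutenant sees only (order from the commander, value relayed by the other lieutenant) and decides by a fixed deterministic rule; it tries all 16 × 16 pairs of rules against the three scenarios used in the argument and finds none that satisfies IC1 and IC2 in all of them. It is much weaker than the paper's proof, which covers arbitrary algorithms, and the helper names are hypothetical.

```python
from itertools import product

ATTACK, RETREAT = "ATTACK", "RETREAT"
VALUES = (ATTACK, RETREAT)
# A lieutenant's view: (order from the commander, relay from the other lieutenant).
VIEWS = list(product(VALUES, VALUES))


def all_rules():
    """Every deterministic map from a lieutenant's view to a decision."""
    for outputs in product(VALUES, repeat=len(VIEWS)):
        yield dict(zip(VIEWS, outputs))


def rule_pair_survives(rule1, rule2):
    # A: loyal commander orders ATTACK; traitor L2 tells L1 "RETREAT". IC2 for L1.
    ok_a = rule1[(ATTACK, RETREAT)] == ATTACK
    # A': loyal commander orders RETREAT; traitor L1 tells L2 "ATTACK". IC2 for L2.
    ok_a2 = rule2[(RETREAT, ATTACK)] == RETREAT
    # B: traitorous commander tells L1 ATTACK and L2 RETREAT; both relay honestly. IC1.
    ok_b = rule1[(ATTACK, RETREAT)] == rule2[(RETREAT, ATTACK)]
    return ok_a and ok_a2 and ok_b


survivors = [(r1, r2) for r1 in all_rules() for r2 in all_rules()
             if rule_pair_survives(r1, r2)]
print(len(survivors))  # 0
```

Scenario B looks to L1 exactly like scenario A and to L2 exactly like scenario A', so the decisions forced by IC2 in A and A' collide with IC1 in B.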

3-Signed Messages Algorithm (and proof):
With this algorithm, public key cryptography becomes an analogy for a general’s unforgeable signed message, and we can deal with any maximum number of traitors (as long as that number is known). The assumption is made that traitors can collude, and additional assumptions are required that: A4a: a loyal general’s signature cannot be forged, and A4b: anyone can verify the authenticity of the general’s signature. Once all lieutenants have received & verified signed orders (either all messages or timeout) the choice function is applied to decide whether to retreat (empty set) or follow some other order where choice(V) = v.

What I Found Confusing:
You will be shocked to hear that I didn’t completely understand the proofs; mostly I was unclear on how we can guarantee the signed messages will work with a very high number of traitors, just because we know how many traitors there are.

What I Learned:
There is a clear trade-off between reliability and performance. If we want high reliability, massive amounts of intercommunication between components are required, and some computation will be required to verify signatures and compute votes.

Posted by: Jason Feriante | October 13, 2014 10:28 PM

Summary:
This paper provides algorithms for handling scenarios in which malfunctioning components could give inconsistent values to other components in a distributed system. In order to be reliable a distributed system needs to be fault tolerant under such scenarios. The authors correlate this to a problem called "Byzantine Generals Problem" and specify how the solution to the problem can be applied into distributed systems, thus making them more reliable.

Problem:
Failures which relate to inconsistent behavior from a node are hard to detect. These failures could cause other nodes in a distributed system to not achieve a consensus or to achieve a wrong consensus. Detecting and continuing to function correctly during these types of failures is required to make the system more reliable.

Contributions:
1. The theoretical result that, to achieve correct consensus with oral messages in a system of n nodes, more than two-thirds of the nodes must be non-faulty.

2. Devised the Oral Messages algorithm, with which a system can handle scenarios in which fewer than one-third of the nodes are faulty. To tolerate m faulty nodes, the algorithm runs m+1 rounds of message exchange.

3. Developed an algorithm based on signed messages that can handle scenarios in which more than one-third of the nodes are faulty. It ensures that no faulty node can alter an original message it relays, and any node can verify the authenticity of the messages sent to it. Compared with the Oral Messages algorithm, it could detect faulty nodes faster.

Thing I learnt:
Achieving reliability in such scenarios is a challenging task. Also, some measures, like resorting to a default action when there is no majority, are not easily achievable in real-world scenarios.

Thing I found confusing:
With the signed messages algorithm, I don't quite understand how the choice() function differs from the majority selection in the Oral Messages algorithm.

Posted by: Manasa Subramanian Ganapathy Subramanian | October 13, 2014 10:22 PM

Summary:
This paper discusses a set of algorithms to cope with conflicting information sent by faulty or unreliable components in a system. The problem is expressed as the Byzantine Generals Problem, in which the loyal generals want to reach agreement even in the presence of traitors.

Discussion:
Problem: A commanding general sends an order to n-1 lieutenants, and we want that i) all loyal lieutenants follow the same order, and ii) if the commander is loyal, every loyal lieutenant follows the order he sends. This paper presents two types of algorithms that make certain assumptions about message communication. If oral messages are used, the problem can be solved only if more than two-thirds of the generals are loyal. With signed messages, however, it can be solved with only m+2 generals (at most m of them traitors).

Contribution:
1. One of the major contributions is expressing the problem as the Byzantine Generals Problem and providing a solution along with its proof. This problem is very relevant to distributed systems, where processes have to deal with faulty components and want to handle them consistently.
2. It provided solutions under various assumptions: i) oral messages, where a traitor can relay a false order, ii) signed messages, where a traitor cannot forge a loyal commander's message, iii) missing communication paths, where not all generals can send messages to each other.
3. In the last section, the authors discuss how this problem relates to computing systems and how the solution can be applied. Solving it also requires solving other challenges, such as replication, clock synchronization and cryptography.

Learned: I like how the authors expressed the problem using a Byzantine army. It's easier to discuss and analyze a problem if you express it in terms of a real-world problem.

Confusion: I am confused about how the different lieutenants reach agreement in the oral message algorithm. Because of the recursive step, a lot of messages need to be sent (around n^m).

Posted by: Avinaash Gupta | October 13, 2014 10:21 PM

Summary:

This paper describes the necessary conditions required to achieve consensus in the presence of malicious or faulty processes. It outlines two algorithms that can tolerate various levels of traitors in the system, and discusses how this can be applied to build reliable systems.

Problem:

In a distributed system, determine how you can achieve consensus among processes when a few of them can be malicious. Determine how many traitors you can tolerate under different communication protocols.

Contributions:

- Gives a theoretical bound on how large the system must be for a known number of traitors.
- Gives an algorithm to achieve consensus even when messages can be tampered with, as long as the number of traitors is small.
- Gives a simpler algorithm that achieves consensus for any number of traitors if the integrity of messages is guaranteed.

Limitations:

Too many messages are sent through the system.

One thing I learned:

- Do not assume that limits like n/3 traitors cannot be broken just because there's a proof in a paper. It is important to check the assumptions of the proof. With stronger guarantees (like signatures), the limit no longer holds, as we have seen with the signed message algorithm.

One aspect I found confusing:

- I did not understand why achieving approximate consensus is supposed to be as hard as exact consensus.

Posted by: Satyanarayana Shanmugam | October 13, 2014 09:51 PM

Summary
The paper models a distributed system in which nodes try to reach some form of consensus in the presence of faulty or malicious nodes that spread incorrect or conflicting messages. The authors formalize the definition of such consensus and identify the base case for which it can be achieved. The problem is modeled as the Byzantine Generals Problem, in which loyal generals try to reach consensus on a plan to attack or retreat in the presence of traitors who try to coerce the loyal generals into making a ‘bad’ decision. They go on to provide an algorithm that achieves consensus in a system with m faulty nodes if 3m+1 or more nodes are present. They also outline a solution that uses message signatures to reach consensus.

Problem
In a distributed system, meaningful outcomes can be achieved only when nodes that are a part of the system arrive at some form of agreement on some system state. This becomes especially hard in the presence of nodes that are faulty or malicious, as simple majority voting becomes inconclusive. Analogous to a distributed army trying to co-ordinate an attack, the part of the army that is loyal to the commander will have to converge on a strategy proposed by the commander, even if traitors are part of the army. If the commander is a traitor, the loyal groups will have to agree to refrain from making a bad decision.

Contributions

Formalizing this problem through the analogy of a distributed army trying to stage an attack, and establishing the base case that, with oral messages, consensus is impossible unless more than 2/3 of the troops are loyal.

Proposing an algorithm to reach consensus when the base case is satisfied. The commander sends a message to all (n-1) lieutenants. Each of the (n-1) lieutenants then assumes the role of commander and relays the value to the remaining (n-2) lieutenants, and this recursion continues for m levels, at the bottom of which each lieutenant simply sends its value to the other lieutenants. This can be thought of as a tree of decisions formed by the chains of messages seen by every node in the system. By taking a majority vote at each level of the tree, all the way up to the root, all loyal nodes arrive at the same value at the root as long as at most (n-1)/3 nodes are faulty (i.e. n ≥ 3m+1).
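The recursion described above can be written as a short simulation of OM(m). This is a sketch, not the paper's code: `om` and `majority` are hypothetical names, each lieutenant's view is computed by recursion instead of real message passing, and the traitors follow an arbitrary, hypothetical strategy (ATTACK to even-numbered receivers, RETREAT to odd-numbered ones).

```python
from collections import Counter

ATTACK, RETREAT = "ATTACK", "RETREAT"


def majority(values, default=RETREAT):
    """Strict-majority value, or the default (RETREAT) if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else default


def om(m, commander, lieutenants, value, traitors):
    """Simulate OM(m); return {lieutenant: value it decides on}.

    Hypothetical traitor model: a traitor sends ATTACK to even-numbered
    receivers and RETREAT to odd-numbered ones, whatever it was told.
    """
    def send(sender, receiver, v):
        if sender in traitors:
            return ATTACK if receiver % 2 == 0 else RETREAT
        return v

    # Step 1: the commander sends his value to every lieutenant.
    received = {i: send(commander, i, value) for i in lieutenants}
    if m == 0:
        return received  # OM(0): use the value received directly
    # Step 2: each lieutenant j relays its value as commander of OM(m-1).
    relayed = {
        j: om(m - 1, j, [i for i in lieutenants if i != j], received[j], traitors)
        for j in lieutenants
    }
    # Step 3: lieutenant i takes the majority of its own value and the relays.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


# n = 7 generals (commander 0), m = 2 traitorous lieutenants (5 and 6).
result = om(2, 0, list(range(1, 7)), ATTACK, traitors={5, 6})
print({i: result[i] for i in range(1, 5)})  # every loyal lieutenant decides ATTACK
```

With n = 7 and m = 2 the condition n ≥ 3m+1 holds, so the loyal lieutenants follow the loyal commander; making the commander a traitor instead still leaves all loyal lieutenants agreeing with one another.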

Introducing the signed messages approach to the same task of distributed consensus. Because messages and their ownership cannot be tampered with, thanks to unforgeable signatures, there is no longer any limit on the number of treacherous nodes in the system.

What I found confusing
Given that the network has to be a complete graph, and that network hop failures cannot be distinguished from node failures, I don’t understand whether this could be used in a real system with such a tight assumption. What is a realistic value for ‘m’ in such a world, given that the number of points of failure in the system increases sharply with the number of nodes?

What I learnt
I learnt that it’s possible to achieve consensus in the presence of misbehaving nodes if the number of faulty nodes is small enough relative to the total number of nodes. Distributed consensus has useful applications, as demonstrated by the 3-phase protocol in the second paper, where a global ordering of request-triggered events and a consensus response can be achieved even in the presence of faulty nodes.

Posted by: Vijay Kumar | October 13, 2014 09:27 PM

Summary: This paper defines and solves the Byzantine Generals Problem, in which one or more components of a computer system can fail in ways such as sending conflicting information to different parts of the system.

Problem:
A computer system is like a Byzantine army. The generals can communicate with each other only by messenger. Some of the generals may be traitors and send wrong messages, just as a failed component in a computer system may send conflicting information to different parts of the system. The generals have to decide upon a common plan of action.

Contributions:
(1) Abstract component failures in a computer system as the Byzantine Generals Problem, which helps in analyzing and solving the problem. A commanding general must send an order to his n-1 lieutenant generals such that a. all loyal lieutenants obey the same order, and b. if the commanding general is loyal, every loyal lieutenant obeys the order he sends (interactive consistency).
(2) Prove that the Byzantine Generals Problem is unsolvable with oral messages unless more than 2/3 of the generals are loyal. The proof is based on the fact that with oral messages there is no solution for three generals with a single traitor.
(3) Propose the inductive Oral Message algorithm OM(m). A majority function is used to choose the value.
(4) Propose the Signed Message (SM) algorithm. Lieutenants add their signatures to the order and send signed copies of the messages. A choice function is used to make the decision. SM(m) can cope with m traitors for any number of generals.
(5) Give an application of the proposed algorithms, where multiple processors compute the same result and a majority vote is performed on their outputs.

Learned & Confusion:
This paper gives a good example of abstracting and formalizing a problem (a reliable system with failed components -> Byzantine generals with traitors). Its inductive way of designing algorithms and analyzing the problem also makes sense. The paper mentions that OM(m) and SM(m) involve sending up to (n - 1)(n - 2) ... (n - m - 1) messages and that this number can be reduced by combining messages. Does "combining" mean batching the sends, or running a combine function on the output messages?
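To see how fast the (n - 1)(n - 2) ... (n - m - 1) figure grows, the sketch below counts the messages of OM(m) directly from its recursive structure and compares the total with that product. The function names are hypothetical; the count assumes every lieutenant relays to every other lieutenant in each sub-run, as OM(m) specifies. The product turns out to be the size of the final round, which dominates the total.

```python
from math import prod


def om_messages(n, m):
    """Messages sent in OM(m) among n generals (one commander, n-1 lieutenants)."""
    sent = n - 1  # the commander's order to every lieutenant
    if m == 0:
        return sent
    # Every lieutenant then runs OM(m-1) as commander of the other n-2.
    return sent + (n - 1) * om_messages(n - 1, m - 1)


def last_round(n, m):
    """The product (n-1)(n-2)...(n-m-1): messages in the final round."""
    return prod(n - i for i in range(1, m + 2))


for n, m in [(4, 1), (7, 2), (10, 3)]:
    print(n, m, om_messages(n, m), last_round(n, m))
# 4 1 9 6
# 7 2 156 120
# 10 3 3609 3024
```

Even at n = 10 and m = 3, a single agreement costs thousands of messages, which is why the reviews above keep returning to the scalability question.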

Posted by: Jing Fan | October 13, 2014 09:19 PM

Summary: In this paper, the authors study how multiple parties in a distributed system can achieve a consistent state via pairwise message passing even in the presence of traitors, that is, nodes that pass wrong information, whether intentionally or unintentionally.

Problem: The difficulty of this problem is the existence of traitors, and the fact that a single node can only obtain local information from the messages it receives. The problem is how to make the non-traitors consistent, both with each other and with the global ground truth, despite the existence of these traitors.

Contributions: The first contribution of this work is the study of two algorithms, one with oral messages and the other with written, unforgeable signatures. For the oral message algorithm, the intuition in my opinion is to pass messages redundantly and systematically, and to let each node apply the majority rule.

The second contribution of this work is the study of different variants of the problem, including communication graphs that are not fully connected, and of how the algorithms can be used to solve real problems.

Confusing & Learned: The algorithm seems natural to me, but the way the formal analysis is conducted is new to me, and I learned how to formalize similar problems. What confuses me is the optimality of the proposed algorithm. Although the paper briefly mentions a lower-bound result from Fischer and Lynch, I do not quite follow it.

Posted by: Ce Zhang | October 13, 2014 07:54 PM

Summary: This paper studies how to make the different nodes in a distributed system consistent. The authors use the analogy of Byzantine generals agreeing on a battle plan (loyal generals stand for nonfaulty processors and traitors for malfunctioning processors). The authors prove that, using only oral messages among generals, the problem is solvable if and only if more than two-thirds of the generals are loyal. When messages are unforgeable, the problem is solvable for any number of generals and any number of traitors. The application to reliable systems with malfunctioning processors or malicious intelligence is also shown.

Problem: Some malfunctioning processors in a distributed system may send wrong information. In this case, how can the system as a whole (the nonfaulty processors) get the correct information? The authors model each processor as a general, and the loyal generals (nonfaulty processors) need to agree on a battle plan. Generals can send messages to any other general, and only traitors (malfunctioning generals) send wrong information. How can we design a communication protocol among the loyal generals so that they all share the same battle plan?

Contribution:
1. The authors prove that when messages can be forged (oral messages), all loyal generals can be guaranteed to agree on the correct battle plan if and only if more than two-thirds of the generals are loyal. Furthermore, they give a recursive algorithm for reaching agreement among the generals.

2. When messages are unforgeable, the authors prove that agreement can be reached without any restriction on the ratio of loyal generals. A similar recursive algorithm (using signed messages) is developed for reaching agreement.

3. The authors show the application to reliable systems, in which a malfunctioning processor can send arbitrary messages to different processors while the nonfaulty processors still behave correctly and consistently.

Learned from the paper: The logic flow makes the problem simpler and clearer step by step. The authors first define the vague goal of avoiding a "bad plan", then make it precise as agreement on the same information, and then move to the simpler setting with one commanding general and 3m lieutenants. At this stage, the problem is easier to solve by induction.

Confusion about the paper: This paper may be too theoretical. I am surprised by the superexponential number of messages that each general needs to send to reach agreement. The paper could have shed more light on how to reduce this number, or shown that it cannot be reduced.

Posted by: Shike Mei | October 13, 2014 07:33 PM

Summary:

The paper presents algorithms for solving the Byzantine Generals Problem, where loyal generals must agree on a common attack strategy despite receiving misleading values, or no values at all, from traitors among them. The problem is formalized through a set of required conditions, and the authors also discuss how their solution applies to building reliable distributed systems.

Problem:

How to ensure that loyal generals arrive at a single proper decision despite traitors sending altered messages?

The paper addresses this problem by formalizing two interactive consistency conditions, which ensure that all loyal generals arrive at the same value and that, if the commander is loyal, this value is the one he originally sent.
Apart from consistency requirements, the problem also has majority requirements.
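The two interactive consistency conditions can be written as a small checker over the outcome of one run. This is our own sketch; the function name and the dictionary layout are hypothetical. IC1: all loyal lieutenants obey the same order. IC2: if the commander is loyal, every loyal lieutenant obeys the order he sends.

```python
def interactive_consistency(decisions, loyal, commander_loyal, order):
    """Return (IC1 holds, IC2 holds) for one run.

    decisions: lieutenant id -> order it obeys
    loyal: ids of the loyal lieutenants
    order: the order the commander sent (only meaningful if he is loyal)
    """
    loyal_orders = {decisions[lt] for lt in loyal}
    # IC1: all loyal lieutenants obey the same order.
    ic1 = len(loyal_orders) <= 1
    # IC2: if the commander is loyal, every loyal lieutenant obeys his order.
    ic2 = not commander_loyal or loyal_orders <= {order}
    return ic1, ic2

# Lieutenant 3 is a traitor, so its "retreat" is ignored.
print(interactive_consistency({1: "attack", 2: "attack", 3: "retreat"}, [1, 2], True, "attack"))
```

Note that IC1 alone could be met by a fixed rule such as "always retreat"; IC2 is what ties the loyal lieutenants' common decision to a loyal commander's actual order.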

As in other Lamport works, the analogy to generals and attack strategy makes the problem easily comprehensible.

Contributions:

The proof that the problem is unsolvable with fewer than 3m+1 generals when there are m traitors forms the basis for the proposed Oral Message (OM) algorithm.
The OM algorithm reaches consensus among the loyal generals by using a majority function to choose the action value.
For the more secure case, where messages are signed and forgery has a low probability, the Signed Message (SM) algorithm is proposed: consensus is reached by passing around signed copies of the original message, storing all distinct values obtained, and using a choice function to make the decision. Moving from "no forgery" to "a low probability of forgery" makes this solution practical and attractive. Understandably, this method can handle more traitors than the OM algorithm.
The authors justify the four assumptions while discussing the practicality of their algorithms for building a reliable distributed system.
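The SM description above (pass signed copies, store every distinct value, apply a choice function at the end) can be sketched as a message-queue simulation. Assumptions, all ours: a message is (value, chain of signer ids) and signatures are simply trusted; a traitorous commander signs different orders for different lieutenants (odd ids get "attack"); traitorous lieutenants stay silent, which is only one of the behaviors unforgeable signatures still allow; and `choice` returns the single value if there is exactly one, otherwise RETREAT.

```python
from collections import deque

ATTACK, RETREAT = "attack", "retreat"

def choice(values):
    """One simple rule: the single value if there is exactly one, else RETREAT."""
    return next(iter(values)) if len(values) == 1 else RETREAT

def sm(m, n, order, traitors):
    """Simulate SM(m): general 0 commands lieutenants 1..n-1; return loyal decisions."""
    lieutenants = range(1, n)
    V = {i: set() for i in lieutenants}
    queue = deque()  # entries: (recipient, value, chain of signers)
    # Step 1: the commander signs and sends; a traitor signs inconsistent orders.
    for i in lieutenants:
        v = (ATTACK if i % 2 else RETREAT) if 0 in traitors else order
        queue.append((i, v, (0,)))
    # Step 2: record each new value and relay it with one more signature.
    while queue:
        i, v, signers = queue.popleft()
        if i in traitors or v in V[i]:
            continue  # hypothetical: traitorous lieutenants never relay
        V[i].add(v)
        if len(signers) - 1 < m:  # fewer than m lieutenant signatures so far
            for j in lieutenants:
                if j != i and j not in signers:
                    queue.append((j, v, signers + (i,)))
    # Step 3: once no messages remain, each loyal lieutenant applies choice(V_i).
    return {i: choice(V[i]) for i in lieutenants if i not in traitors}

# Three generals, one traitor: impossible with oral messages, fine with signatures.
print(sm(1, 3, ATTACK, {0}))
```

Because every commander message is queued before any relay, each lieutenant sees the commander's order first, so one loop covers both of the paper's receive cases.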

Unclear concept:

In the SM(m) algorithm, how exactly should the value v be selected when the set Vi contains multiple values (as shown in figure 5)? If the median value is to be picked, does that mean all the lieutenants must sort their sets according to a predetermined ordering of the possible values of v?
Also, what guarantees that all possible values of v will be known? A traitor could insert some arbitrary value.
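One way to make the median option concrete is sketched below. The paper constrains choice only loosely: a set holding a single value v gives v, and the empty set gives RETREAT. Our assumption is that all lieutenants agree in advance on one total order over every possible value. Here that order is Python's built-in string ordering, so even a value a traitorous commander invents has a well-defined rank, and the result does not depend on the order in which values arrived. The function name is hypothetical.

```python
def median_choice(values):
    """Deterministic choice over the collected set V (hypothetical helper).

    Assumes a pre-agreed total order on all possible values; here it is plain
    string ordering, which also covers values a traitor might make up.
    """
    if not values:
        return "retreat"  # choice of the empty set must be RETREAT
    ranked = sorted(values)  # same ranking at every lieutenant
    return ranked[(len(ranked) - 1) // 2]  # lower median

print(median_choice({"retreat", "attack"}))
print(median_choice({"wait", "attack", "retreat"}))
```

This only helps if loyal lieutenants end up holding the same set V, which, as we read the correctness argument for SM(m), is exactly what the algorithm provides.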

Learning:

The assumption of perfect message delivery is justified by simply stating that a failed communication line is the same as a failed processor: a very simple and effective way of eliminating counter-arguments against a huge assumption.

Posted by: Meenakshi Syamkumar | October 13, 2014 05:15 PM
