Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance; Miguel Castro and Barbara Liskov, OSDI '99

Review this or Byzantine Generals Problem for Tuesday, October 14th.

Do not write about applicability of the system

Do write 1 (or more) thing you found confusing about the paper or did not understand, and 1 thing you learned from the paper.

Comments

Summary: This paper describes several algorithms which solve the Byzantine Generals problem under different conditions.

Problem: Coordinating a network of computers is a tricky problem unto itself. The Byzantine Generals problem takes it to the next step and asks: What if some of the machines cannot be trusted? Any messages they send cannot be taken at their word. This makes the task of coordinating the machines that much more difficult, since these nodes may send no messages at all or send incorrect ones.

Contributions: The key take-away from this paper is that for a sufficiently small number of “traitorous” machines, an algorithm can be developed which will lead to consensus among the nodes. They conclude that if fewer than a third of the machines are traitorous, then their algorithm will lead to consensus among loyal machines.

The other important takeaways concern topology and security. For the former, even in a graph in which the nodes are not completely connected, an algorithm can still bring about consensus. For the latter, if the machines can send unforgeable signed messages, then consensus can be found for any number of loyal and traitorous machines.

Confusing: In general, these algorithms are straightforward - upon receipt of a message, send it to other lieutenants. In that sense there was nothing confusing.

Learned: Aside from the obvious, I enjoyed the discussion in the final section on how these problems can arise in practice. They discussed how these problems can perhaps be mitigated, but never eliminated completely. For example, it was interesting to think that multiple processors can take the input from a wire and interpret it in different ways, even if the signal is exactly the same.

Summary
This paper presents a practical replication algorithm to deal with Byzantine faults. The authors implemented their algorithm in an NFS service and evaluated its performance using the Andrew benchmark and a micro-benchmark, which showed only a small overhead compared with standard unreplicated NFS.

Problem Description
With the rapid growth of malicious attacks and software errors, some faulty nodes may exhibit Byzantine behavior. Therefore, Byzantine-fault-tolerant algorithms are important. Some previous designs were proved feasible in theory, but they are too inefficient in practice. Other designs assume synchrony, which is hard to achieve. To overcome these problems, the authors developed a practical Byzantine-fault-tolerant algorithm that does not rely on synchrony.

Contribution
1. The first contribution is that they designed a state-machine replication protocol that provides Byzantine fault tolerance in asynchronous networks. The basic idea of the algorithm is not hard to understand, yet it achieves both safety and liveness.
2. To meet practical requirements, they optimize their implementation, for example by reducing the communication load and by using message authentication codes rather than digital signatures for most messages. These two aspects improve performance.
3. To demonstrate that their algorithm can work in reality, they implemented a replication library using the proposed algorithm and used that library in an NFS service.

Confusions
1. I am not sure whether the assumption of independent node failures is a good one. They provide many ways to satisfy this assumption, but from a practical perspective, homogeneous services may be easier to manage and less costly. If the system needs to follow their assumption, I do not think it can be scalable.
2. I also have doubts about the algorithm needing to know the maximum number of replicas that may be faulty. How can we know that at implementation time, before runtime?
3. Also, in their algorithm, the client needs to wait for f+1 replies from different replicas with the same result, which is necessary for correctness. However, this also means the client can learn the actual value of f for the system. What if a malicious client then attacks f+1 replicas? I think all the core system parameters should be hidden from the client in order to protect the system.
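
The f+1 rule in the last point can be sketched in a few lines. This is only an illustration, not the paper's client code: the function name `accepted_result` and the `(replica_id, result)` reply format are assumptions made for the example. The idea is that with at most f faulty replicas, f+1 matching replies from distinct replicas must include at least one correct replica.

```python
from collections import defaultdict


def accepted_result(replies, f):
    """Return the first result reported by f + 1 distinct replicas, else None.

    replies: iterable of (replica_id, result) pairs in arrival order
             (a hypothetical format chosen for this sketch).
    f:       assumed maximum number of faulty replicas.
    """
    voters = defaultdict(set)  # result -> ids of replicas that reported it
    for replica_id, result in replies:
        # A set ignores duplicate replies from the same replica.
        voters[result].add(replica_id)
        if len(voters[result]) >= f + 1:
            return result
    return None
```

With f = 1, a single faulty replica that answers first (or answers twice) cannot get its result accepted on its own; two different replicas have to agree.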

Knowledge Learned
Although many solutions are proved to be feasible theoretically, they still need to be verified in reality, and optimized to improve performance. In this paper, the author not only implemented the basic version, but also provide some optimizations technique such as reducing communication and using MAC to improve performance.

Summary
The authors of the paper "Practical Byzantine Fault Tolerance" develop a Byzantine fault tolerant consensus protocol that is efficient and applicable to realistic scenarios.

The basic idea of PBFT is to have the primary broadcast a request to all replicas, which then retransmit what they have heard to every other replica. If all replicas agree on the same operation, then the primary is currently correct, and the replicas broadcast a commit message to each other, much like a standard three-phase commit protocol. In fact, replicas will proceed once they have got 2f+1 identical replies in the first phase, which is a Byzantine quorum, any two of which will be guaranteed to contain at least one correct replica in common. The idea is to ensure that the primary can’t have two conflicting sequence numbers for a single request accepted.
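
The quorum claim above is a counting argument, and it can be checked with a little arithmetic. The sketch below assumes n = 3f+1 replicas and quorums of size 2f+1; the function names are made up for the example.

```python
def min_overlap(n, quorum):
    """Smallest possible number of replicas shared by two quorums out of n."""
    return max(0, 2 * quorum - n)


def quorums_share_correct_replica(f):
    """Check that any two quorums overlap in at least one correct replica."""
    n = 3 * f + 1        # assumed total number of replicas
    quorum = 2 * f + 1   # assumed quorum size
    # Two quorums share at least f + 1 replicas; at most f of those can be
    # faulty, so at least one shared replica is correct.
    return min_overlap(n, quorum) - f >= 1
```

For f = 1 this means 4 replicas and quorums of 3: any two quorums share at least 2 replicas, and only 1 of them can be faulty.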


Problems
Byzantine faults cause nodes in a distributed system to act erratically. Previous work on providing fault tolerance either proposed solutions that are not practical or assumed synchrony. The present paper tries to provide a solution that is practical, reasonably fast, and works with asynchronous networks.

Contributions
1) First to propose an algorithm that works efficiently in asynchronous systems and meets the 3f+1 lower bound on the number of replicas.
2) As consensus is not possible in fully asynchronous systems, they assume weak synchrony to guarantee liveness.
3) Provided optimizations to the algorithm to achieve reasonable performance.
4) Although determinism in the service is initially assumed, they provide a way to overcome it.
5) Use of MACs for message authentication.

Unclear
It is unclear how nodes rejoin the system. The view change assumes that the primary (p1) has failed permanently. What happens if it is just an intermittent failure? Considering that we are assuming an asynchronous network, this is not far-fetched. Suppose a primary (p1) suffers an intermittent network failure and resumes running after the new primary has multicast its message announcing the current view. The old primary has no way of learning the current view and rejoining. In addition, messages are not accepted unless the sender and the receiving replicas are in the same view, so there is no way for the faulty primary to move on to the second view.

If this keeps happening, the size of the system keeps decreasing, making it more prone to Byzantine faults.

Learned
Byzantine failures are hard to handle. Until now there has been no efficient solution to the problem. It is much harder in an asynchronous network because consensus cannot be guaranteed in an asynchronous network.

Summary:
The authors described an implementation of an efficient (and practical) replicated NFS system that is Byzantine fault tolerant. They proposed an algorithm that enables the execution of arbitrary deterministic operations in a distributed system in which fewer than one third of the replicas are faulty. They talked about a few optimization approaches that drastically improve their system and can be adopted by prior replicated systems. The authors tested the performance overhead of the proposed system and concluded that it has about 3% overhead on the Andrew benchmark.

Problems:
The main goal of this paper is to create a Byzantine fault tolerant distributed system. The prior work on this is mostly theoretical and falls apart due to practical inefficiencies. Also, much of it assumes synchrony, which makes those systems vulnerable to many malicious behaviors.

Contributions:

  • This is the first paper to propose a protocol for state-machine replication in a system prone to Byzantine faults and in a more realistic asynchronous network.

  • They achieve liveness without depending on synchrony for safety, and the system tolerates many faults that arise from a lack of synchronization.
  • They proposed a few optimization techniques that are really smart and easily adaptable to other systems, e.g. sending authenticated message digests instead of large messages.
  • Easy view change in case of failure is also nice.
  • Last but not least, they described the BFS system and showed the cost of the replicated system.

Learnt:
The algorithm allows access revocation in a distributed manner (if I understood it correctly). Every client observes access revocation consistently, and this provides a powerful mechanism to recover from attacks by faulty clients.

Confusing:
In the service properties they state that "faulty clients cannot break" the invariants that "the service operations are designed to preserve". I don't understand what they mean by this, or why a completely compromised server and a malicious client together cannot break these invariants. I also could not completely understand their view change scenarios and how the client is informed about them (especially when the primary changes).

Summary:
Building on Leslie Lamport et al.’s foundation in the formulation of the Byzantine Fault Tolerance problem, this paper gives a practical solution while ensuring safety and liveness. An algorithm is given as well as an implementation. The implementation is able to manage a replicated NFS cluster and achieve benchmark results comparable to NFS without this replication.

Problems:
Byzantine faults can occur due to malicious attacks or even simple software or hardware errors. This can be hard to protect against in an efficient and asynchronous manner. Denial of service attacks need to be accounted for in such a system where non-faulty nodes can be delayed until considered faulty.

Contributions:
• Like Paxos, Raft, and Viewstamped Replication, the authors created a replicated state machine that works in an asynchronous network such as the Internet. Unlike traditional Paxos and others, it can handle Byzantine faults using message signing and authentication.
• Several optimizations were introduced to their algorithm to efficiently pass and authenticate signed messages using the now insecure MD5 algorithm.
• Extending NFS without writing kernel code (because they couldn’t find the source code) to allow for replication using their algorithm.
• Using three phases to handle Byzantine faults, instead of the traditional two-phase commit methodology.

Unclear:
Was this paper’s algorithm actually better than the original Byzantine Fault Tolerance paper’s solution for signed messages in terms of the number of faults it could handle? This paper claims it can achieve liveness and safety with floor((n-1)/3) faulty nodes, however in Lamport’s paper it could tolerate m faulty nodes for any number of total nodes in the cluster.
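
To make the floor((n-1)/3) bound concrete, here is a small worked calculation. The function name `max_faulty` is made up for this sketch; it only evaluates the formula quoted above and does not model Lamport's signed-message algorithm.

```python
def max_faulty(n):
    """Largest f such that n >= 3f + 1, i.e. floor((n - 1) / 3)."""
    return (n - 1) // 3


# Replica counts 4, 5 and 6 all tolerate one fault; a second fault needs 7.
table = {n: max_faulty(n) for n in range(3, 11)}
```

Adding one or two replicas beyond 3f+1 does not raise the number of tolerated faults, which is why deployments are usually sized at exactly 3f+1.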

Learned:
I learned how Byzantine Fault Tolerance could be useful even outside of malicious adversaries or hardware failures. Software upgrades that cause bugs won’t bring down a Byzantine Fault Tolerant system when slowly deployed and quickly reverted. As mentioned in the paper, independent errors popped up in their system, but it was able to continue running regardless.

Summary:
This paper describes an algorithm that enables the proper execution of arbitrary operations in a distributed system that can face byzantine failures. The paper also discusses the implementation of this algorithm and a replicated NFS like file system built using the library implementation of the byzantine fault tolerant algorithm. The paper also demonstrates the practicality of the system through a few selected benchmarks.

Problems:
This paper addresses building a system that is tolerant to byzantine failures which can be caused by software errors or malicious intent and are therefore harder to code against (in comparison to normal failure modes like network partition, etc.). The algorithm needs to ensure certain safety and liveness properties which the authors demonstrate as well.

Contributions:
- A state-machine replication protocol that handles and survives byzantine faults in a distributed system. The proofs of correctness of the algorithm and demonstration of liveness and safety guarantees.
- The practical implementation of the algorithm as a replicated library bundle of code which allows distributed and replicated systems (such as a file system) to be built using it.
- Optimizations to the algorithm so that it achieves an acceptable level of performance in its practical implementation (as compared against standard benchmarks).
- The byzantine fault tolerant distributed file system (BFS) built using the replicated library code.
- The experimental results demonstrating the feasibility and costs associated with using the replicated technique.

Learned:
The use of view changes to gracefully move the system away from a malfunctioning primary in the replicated system. This seemed like a very simple, albeit clever, technique to handle byzantine failure cases.

Confusing:
I was a bit confused (or bewildered) by the normal case operation. Does the traffic that Figure 1 depicts occur on every request from the client (and what constitutes a request? a single read?)? It seems like an extraordinary amount of traffic overhead for one operation. I cannot imagine this would scale well, but perhaps this is because I am inherently misunderstanding the algorithm in some way.
My second point of confusion is in the view change (Section 4.4) part of the algorithm. It says a min-s and max-s are computed and all the pre-prepare messages are generated as a set O. Why does it precompute all the pre-prepare messages for the new view? How can the system know how many messages will occur before a new view starts?
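
One possible reading of Section 4.4 is that the set O does not predict future requests at all. It only re-issues pre-prepares for sequence numbers that were already in flight in earlier views, between the last stable checkpoint (min-s) and the highest sequence number that appears as prepared in the view-change messages (max-s). The sketch below follows that reading. The data layout (`prepared` as a plain dict from sequence number to request digest), the function name and the `NULL_DIGEST` marker are all assumptions made for illustration.

```python
NULL_DIGEST = "null"  # hypothetical marker for a no-op (null) request


def new_view_pre_prepares(new_view, min_s, prepared):
    """Build the pre-prepares a new primary would send, under this reading.

    new_view: number of the view being started.
    min_s:    sequence number of the latest stable checkpoint.
    prepared: dict mapping sequence number -> request digest, collected
              from the view-change messages (assumed format).
    """
    pending = [seq for seq in prepared if seq > min_s]
    max_s = max(pending, default=min_s)
    messages = []
    for seq in range(min_s + 1, max_s + 1):
        # Gaps are filled with a null request so sequence numbers stay dense.
        digest = prepared.get(seq, NULL_DIGEST)
        messages.append((new_view, seq, digest))
    return messages
```

For example, with a checkpoint at 10 and prepared requests at 11 and 13, the new primary would re-propose 11, a null request for 12, and 13. Requests that arrive after the view change get fresh sequence numbers through the normal case protocol.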

Summary:
This paper describes a new replication algorithm that is able to tolerate Byzantine faults and uses this algorithm to build a Byzantine-fault-tolerant NFS service.

Problems:
1. Malicious attacks and software errors can cause faulty nodes to exhibit Byzantine behavior, so Byzantine-fault-tolerant algorithms are increasingly important.
2. Most existing Byzantine-fault-tolerant algorithms are inefficient or make strong assumptions of synchrony, such as known bounds on message delays and process speeds.

Contributions:
1. The state machine replication algorithm. As the authors state, this is the first state-machine replication protocol that correctly survives Byzantine faults in asynchronous networks. Beyond the functionality, the authors also briefly argue the correctness and liveness of the algorithm.
2. Optimizations on the provided algorithm. Specifically, their optimizations focus on reducing communication and cryptographic overhead.
3. The Byzantine-fault-tolerant file system. The traditional NFS file system is integrated with the authors' proposed Byzantine-fault-tolerant protocol and several replicated servers.

Confusing:
It seems that the scenario the authors consider focuses on only one client. It guarantees that if fewer than one third of the replicas are faulty, the client gets the correct answer. But if it is extended to multiple clients, would this algorithm still behave correctly? Specifically, in the case of BFS, if multiple clients write to and read from the same file, will each individual client get the correct responses?
Furthermore, can this Byzantine-fault-tolerant protocol be integrated with some kind of consistency model? In some modern distributed systems papers, like GFS, there is no mention of how to deal with Byzantine failures. Does this mean Byzantine failures are no longer important in current distributed system design?

Things Learned:
An algorithm should be designed to be practical and usable in real-world applications, just as the algorithm in this paper can be used in an asynchronous network environment.

Summary: this paper applies the insights from the Byzantine Generals problem and builds an NFS filesystem with them.

Problem: when building systems, it is not sufficient to assume only fail-stop failures in the failure model; you also have to take Byzantine failures into account.
Contributions:


  • the artifact, BFS, was able to apply the insights from the Byzantine Generals problem to a real-world application

  • checkpointing can be an effective way of making views more efficient. Instead of treating each view as a checkpoint, you can come up with one view to use as a checkpoint and revert back to it if necessary

  • authenticators, which are MAC vectors, can be used to reduce the overhead of cryptography.

Confusing: the paper boasts only a 3% loss in speed compared to standard NFS. Is this meaningful? Intuitively, I would not think that this scheme would take significantly longer than standard NFS, because all the messages are sent in parallel. What about the other performance consequences of this scheme? What about network congestion? It may be comparably fast to NFS, but what about the other resources it's using?

Software errors: how prevalent are they? It seems like they are the main failure that systems like these are trying to address. How often do they occur, and at what cost?

Learned: using MACs instead of digital signatures. They are a lot faster to compute, but cannot be verified by a third party, a property that is not needed here.
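
As a rough illustration of the authenticator idea (a vector of MACs, one per replica), here is a minimal sketch using Python's standard `hmac` module. It is not the paper's implementation: the function names, the dict layout and the choice of HMAC-SHA256 are assumptions for this example, and each key is assumed to be a secret shared only between the sender and one replica.

```python
import hashlib
import hmac


def make_authenticator(message, session_keys):
    """Compute one MAC per replica over the same message.

    message:      bytes to authenticate.
    session_keys: dict replica_id -> secret key (bytes) shared pairwise
                  between the sender and that replica (assumed setup).
    """
    return {
        replica_id: hmac.new(key, message, hashlib.sha256).digest()
        for replica_id, key in session_keys.items()
    }


def verify_entry(message, authenticator, replica_id, key):
    """A replica checks only its own entry, using its own shared key."""
    tag = authenticator.get(replica_id)
    if tag is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

Each replica can check its own entry but cannot convince anyone else that the sender produced the message, since whoever holds the shared key could have computed the same tag. That is the "cannot be verified by a third party" trade-off mentioned above.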
