Understanding the Byzantine Fault Tolerant (BFT) Consensus Protocol

Understanding the Byzantine Fault Tolerant (BFT) Consensus Protocol

The Byzantine fault tolerance protocol has its roots in a very old problem encountered by computer scientists. To understand the protocol it is critical to understand the problem it is trying to tackle as the corresponding solution forms the basis of how blockchain works.

The two general problem

Let’s first understand what is known as ‘The two generals problem’. The problem was first published in 1975 and got its current name three years later. The problem is based on a military problem, where two generals intend to simultaneously attack their enemy. 

It can be visualised in the below image where the enemy lies in the valley between the two hills with the two attacking armies stationed on either side. 

Now each army on its own lacks the strength to defeat the enemy. If either army attacks on its own, it is sure to face defeat with the remaining army facing a certain defeat later. On the other hand, if both attack together, their combined strength is sufficient to defeat the enemy.

The key here is attacking at the same time. However, the two armies have no means of communication except for sending a messenger. The geography of the hills means that the messenger can be intercepted by the enemy and can either be killed or replaced by an impostor who would intentionally deliver a wrong message to the other army. 

The enemy’s intention would obviously be to create a situation where both armies attack at different times. Based on various mathematical simulations and research by the authors of the paper they came to the conclusion that this problem is unsolvable. This means there is no way of communication that can ensure that the two armies attack at the same time.

Byzantine general problem

The ‘Byzantine General Problem’ is a modification to ‘the two generals problem’, making it a bit more complicated but also yielding practical results. 

In this case, instead of two generals, there is one commander who then has several lieutenants at his disposal. The commander will communicate to his lieutenants when to attack. Now another modification is that one or more of these lieutenants can be traitors, or the commander himself can be the traitor. 

The problem remains the same – attacking the enemy at the same time. Another acceptable solution is to not attack at all rather than risking attacking at different times. This is based on the maxim “live today to fight tomorrow.”

On analyzing this problem, researchers came to the conclusion that this setup would work only if at most one-third of the participants are traitors. Let us see how that works out by using a setup where there are three lieutenants and one commander. 

As can be seen below, let us say that lieutenant three, in this case, is a traitor. We design a communication system such that in addition to the commander informing the lieutenants on the time of the attack they will also communicate amongst each other for confirmation. So each lieutenant will receive three messages, one from the commander and two from the other two lieutenants. 

For the traitor, the objective would be to send out an incorrect timing. As can be seen in the figure, let us say the original message was ‘v’, and a traitor changes it to ‘x’. So, now the loyal lieutenant would each receive 2 ‘v’ messages and 1 ‘x’ message. They can have a pre-decided protocol where if there is a clear majority of the messages they would choose to execute it. In this case, despite there being a traitor lieutenant, the loyal lieutenants can agree upon a time to stage their attack.

Now let us take the case where the commander himself is a traitor. In that case, he will relay three different messages to each lieutenant. As seen below, each would end up receiving different instructions and so there will be no majority messages. For such a situation, they can have another predefined protocol. So, when there is no majority, they simply choose to retreat rather than risk attacking at different times.

In either case, as long as the number of participants who are traitors remains less than one-third, there exists a protocol to reach a consensus. This is called the Byzantine Fault Tolerance (BFT). It implies that the system can tolerate faults till a certain number of participants generate a correct signal.

How does this apply to blockchain? Blockchain is a decentralized network. This means no single entity verifies the transaction, but rather, the entire network arrives at a consensus on whether a transaction is valid or not. 

But what if some participants in the network are malicious? In that case, they can take over the network by validating wrong transactions. However, as we saw above, by using the Byzantine Fault Tolerance, we can have a system that can fend off fraudsters or even incorrect signals as long as this forms less than one-third part of the system.

This is really critical to the workability of blockchain and hence cryptocurrencies. Unlike the conventional financial payment systems, like Visa, blockchain doesn’t have a central authority to validate transactions so, if the system fails once, it will fail permanently. 

In fact, in the very first white paper on bitcoin written by Satoshi Nakamoto, the BFT protocol was critical to creating the proof of work concept. BFT is also used in large and critical systems like airplane engines and even nuclear reactors. 

Summary

The Byzantine Fault Tolerance is essentially a protocol useful wherever the decisions depend on the inputs of a large set of sensors. Several algorithms have been designed solely with the purpose of achieving Byzantine Fault Tolerance. These algorithms are then extensively used by some cryptocurrencies with further underlying protocols.