PlatON’s Giskard consensus protocol consists of PPoS (PlatON proof of stake), a probabilistic proof-of-stake consensus, and Giskard BFT (Giskard Byzantine Fault Tolerance). The PPoS selects the validators to participate in consensuses by staking, delegation, and random selection, while the Giskard BFT uses a BFT-like algorithm to implement block production and verification.
In this feature, we will delve into two technical issues with two articles. The first article will briefly introduce the PPoS consensus and BFT theory, and analyze the characteristics of the PBFT algorithm. The second article will focus on analyzing how Giskard BFT evolved by drawing on other consensus protocols such as PBFT, Tendermint, and Hotstuff.

[PlatON Tech-Column] PlatON Consensus Protocol: From PBFT to Giskard (1)

In essence, blockchain technology is inseparable from traditional distributed systems. The distributed consistency algorithm is a major problem of traditional distributed systems. After a long period of research and practice, there emerged many mature and secure algorithms such as paxos, draft, zab, etc.

Compared to traditional distributed systems, there is no assumption of centralization in public blockchains, and any node can join and freely access all the data, so, inevitably, there will be malicious nodes existing in public chains. Therefore, the consensus mechanism in the blockchain system needs to support not only CFT (Crash fault tolerance) but also BFT (Byzantine Fault Tolerance). BFT is a well-studied theory, of which PBFT is the most famous algorithm implementation, and is now widely used in major blockchain systems.

PlatON’s Giskard consensus protocol consists of PPoS (PlatON proof of stake), a probabilistic proof-of-stake consensus, and Giskard BFT (Giskard Byzantine Fault Tolerance). The PPoS selects the validators to participate in consensuses by staking, delegation, and random selection, while the Giskard BFT uses a BFT-like algorithm to implement block production and verification.

PPOS — Selection of Validator

Before going into the PPoS, let’s talk about PoS firstly. Currently, the PoS consensus schemes can be divided into four categories:

PoS Consensus Overview

Chain-Based

This is an early generation of PoS. The validator is pseudo-randomly selected for block production based on the number of tokens held. There is also the PoS+PoW scheme, in which PoW is generally used for the generation of blocks, and the PoS is adopted to select validators for verification. The Casper1.0 of Ethereum, as an intermediate scheme for its conversion from PoW to PoS, is also a hybrid scheme of PoS and PoW.

DPoS (Delegated Proof of Stake)

Delegated Proof-of-Stake Consensus. Each token holder can delegate his/her rights to a representative who will participate in the production and verification of blocks.

VRF (Verifiable Random Function)

Verifiable Random Function is used for the random selection of validators. This solution is currently used by Dfinity, Cardano, and Algorand, among others.

BFT (Byzantine Fault Tolerance)

Byzantine Fault Tolerance. After the validators are selected, the BFT protocol is used to reach a consensus through multiple rounds of voting. Currently, Tendermint, Stellar, Ontology, Zilliqa, NEO, etc., are all using this type of consensus algorithm.

The consensus scheme PlatON adopts is PPoS, which means probabilistic proof-of-stake consensus, and it is essentially a PoS consensus scheme that plots a binomial cumulative distribution function curve based on the nodes’ staking and randomly selects validators using VRF.

The key of the PPoS solution is that the selection of validators is not only related to the amount of the node staking, it is also randomized, which means that the selected validators are not necessarily the ones with the highest staking, and those with lower staking also has a certain probability of being selected. A randomized algorithm ensures that the result of selection is unpredictable, non-manipulable, fair, and reliable. The PPoS is essentially a combination of the PoS and VRF scheme.

In short, PPoS provides a scheme to select several validators from a large number of participating nodes as fairly and randomly as possible.

For the randomness of the validator selection, we have also conducted some research in the article “A Randomized Scheme for Validator Election in the PoS Consensus” here:

[PlatON Tech-Column] A randomized scheme for Validator Election in the PoS Consensus (1)

[PlatON Tech-Column] A randomized scheme for Validator Election in the PoS Consensus (2)

BFT — Block Consensus

After the validators are elected, the consensus protocol is put into action for block production and verification. The whole process requires the nodes to collaborate to mutually confirm the blocks and reach a unanimous conclusion to reach a block consensus.

As mentioned above, the consensus algorithm in the blockchain needs to consider Crash nodes and Byzantine nodes. What is a Byzantine node? It begins with a story.

The Byzantine Generals Problem

The Byzantine Roman Empire was so vast that, for defense purposes, each patch of land was garrisoned by a division under the command of a general. The divisions were far away from each other, and the only way for generals to communicate was by messengers.

At the time of war, all generals from various Byzantine divisions had to reach an unanimous consensus to decide whether there was a chance of winning before attacking the enemy. However, there might be traitors and enemy spies lurking in the divisions, confusing the generals and impeding them from reaching an unanimous consensus. If it was known that some generals were traitors, how would the remaining loyal generals reach an unanimous agreement? That is the Byzantine Generals Problem.

What the Byzantine Generals Problem describes, is that good generals have no idea whether other generals are good or bad, but the purpose of all good generals is: to act in unison and advance or retreat together. So, they need to agree on a unified strategy.

By far, I believe you have preliminarily understood the Byzantine nodes. Simply put, there are two types of error in blockchain systems:

The first type of error includes node crash, network failure, packet loss, etc. This type of error is not malicious and is a Non-Byzantine fault.
The second type of error means that the node may be malicious and does not follow the protocol rules. For example, the validator can delay or reject messages in the network, the leader can propose invalid blocks, or the node can send different messages to different peers. In the worst-case scenario, malicious nodes might conspire. These are collectively called Byzantine fault.

To solve this problem, we have to first introduce a concept — distributed network model. According to the theory of distributed systems, the network models of distributed systems are divided into three categories:

Synchronous network model: message sent by a node will definitely reach the target node in a determined time
Asynchronous network model: message sent by a node is not guaranteed to reach the target node
Partial synchronous network model: message sent by the node, despite delays, will eventually reach the target node

A very important prerequisite for the solution of the Byzantine Generals Problem is that the communication channel must be reliable. If the channel is not guaranteed to be reliable, then the Byzantine problem has no solution. This is also known as the FLP Impossibility, i.e., a consensus algorithm can’t satisfy both safety and liveness under the assumptions of the asynchronous network model, that is to say, it is impractical, or very difficult, to attempt to reach an agreement by communicating over an unreliable communication link. As for what safety and liveness is, we will cover them later.

Actually, the Byzantine Generals Problem was first raised by Leslie Lamport in his 1982 paper The Byzantine Generals Problem, he proved that loyal generals can agree on orders when the total number of generals is greater than 3f, and the traitors are for less, i.e., 3f+1 <= n. In 1999, Miguel Castro and Barbara Liskov first proposed the PBFT algorithm in their paper “Practical Byzantine Fault Tolerance”, which also satisfies the fault tolerance number of 3f+1<=n.

BFT is a well-studied theory revealing the fact that, based on the assumptions of the partial synchronous network model, consistency can be achieved among non-Byzantine nodes in the case of no more than one-third of the total nodes are faulty and malicious. PBFT, short for Practical Byzantine Fault Tolerance, is the best-known algorithm implementation. Currently, the consensus algorithms of most blockchains are implementations based on BFT. Giskard BFT is also evolved from PBFT.

Application of the PBFT algorithm in blockchain consensus

The PBFT algorithm is widely used for reaching consensus in various blockchain systems, which not only solves the Byzantine node problem that may occur in the consensus process, but also enables the system to always maintain two properties: safety and liveness.

Safety: Even if the two types of error occur, i.e., Crash nodes and Byzantine nodes, the consensus system cannot produce incorrect results. In the blockchain, they refer to no double-spending or forking.
Liveness: the system can continuously generate commits, which, in the blockchain, means that consensus will continue and not get stuck. If the consensus of a blockchain system is not sustainable, then the system cannot respond to new transaction requests from clients, i.e., liveness is not satisfied.

We can take the application of the PBFT algorithm in the blockchain consensus as an example to summarize the core flow of the algorithm:

Consensus process of PBFT

As can be seen from the above figure, PBFT is a typical three-phase submission algorithm:

Pre-Prepare: the nodes receive and execute blocks, as well as generate block voting signatures, and start to broadcast signatures to all block producers
Prepare: the nodes collect signatures. A node will be ready for committing blocks once it has collected 2*f signatures, and then it will start broadcasting the Commit packet
Commit: the nodes start to collect Commit packets, and after a node collects all the 2*f+1 Commit packets, it directly commits the latest locally cached blocks to the database

When seeing this, you may have the following questions:

Why the number of signatures required varies from phase to phase

For the phases of Prepare and Commit, let’s consider the worst scenario: assume that we receive f signatures from normal nodes and also f from malicious nodes, then the 2 * f + 1 signature can only come from normal nodes (since we limit the maximum number of malicious nodes to f). It can be seen that the system will keep functioning with normal nodes taking up the “majority”. So the parameter 2*f+1 and the requirement n>=3f+1 are logically self-consistent. In the Prepare phase, any message sent by Node 0 can be considered as a confirmation message, so the Prepare phase only needs to collect 2*f signatures.

Why consistency cannot be realized with messages from just two phases

It is not possible to realize consistency with only two messages from the Pre-Prepare and Prepare phases. For example, suppose there is no Commit phase, Node 1 would be ready for entering the Prepared phase after collecting 2*f of signatures in the Prepare phase, however, the Prepared here is only a local perspective of Node 1 and is not globally consistent. Node 1 cannot guarantee that all the remaining nodes are ready for the Prepared phase. If less than f non-Byzantine nodes entered the Prepared state, and Node 1 confirms the message, then the system is inconsistent.

Why consistency can be realized with messages from three phases

For now, it should be easy to understand why consistency can be realized with messages from three phases. A node collected 2*f+1 Commit packets means that f+1 non-Byzantine nodes have reached the Prepared status, which means that “most” nodes have agreed on the message.

Next, let’s talk about the view of PBFT and the process of view change.

PBFT consensus algorithm uses the view to record the consensus state of each node, and the same view nodes maintain the same list of Leader and Replicas nodes. When the Leader fails, the view will change to ensure protocol liveness, and if the view change is successful (at least 2*f+1 nodes reach the same view), a new leader is elected based on the new view.

Let’s take a look at the flow chart of the view change:

The view-change flow of the PBFT can also be divided into three stages:

view-change: the replica nodes (Replica) will send a view-change message to other nodes when they think that the primary node (Primary) is problematic
view-change-ack: After all the nodes received 2*f+1 view-change messages, the node with the smallest node number will be elected as the new primary node and the view-change-ack messages will be sent to that node
new-view: When the new primary node receives view-change-ack messages from 2*f+1 other nodes, it will broadcast the new-view messages to the other nodes. Note: secondary nodes do not initiate new-view events

Through the analysis of the consensus process, we believe that the view change flow is easy to understand and needs no further elaboration. In the next article, we will focus on the problems of the PBFT algorithm and the improvements made by the Giskard BFT.

In this issue of Tech-Column, we briefly introduced the development of the PoS consensus scheme, analyzed the PPoS consensus and BFT theory, and delved into the consensus process and the view change flow of PBFT with the application of the PBFT algorithm in the blockchain consensus as an example. In the next issue, we will continue to elaborate on the optimization ideas of the Giskard BFT consensus based on these technical backgrounds. Stay tuned.

Publisher：PlatONWorld，Please indicate the source for forwarding：https://platonworld.org/?p=3562