Understanding Raft: Consensus Algorithm for Distributed Systems
In an increasingly connected world, distributed systems play a crucial role in ensuring the availability, reliability, and integrity of services. However, coordinating multiple nodes and keeping them consistent in a distributed environment is a complex task. This is where Raft comes in: a consensus algorithm designed to make the process of keeping nodes consistent simpler and easier to understand.
The vast majority of distributed services that form a "mini cluster" rely on some form of coordination. For a long time the most common choice was an external coordinator such as ZooKeeper; many projects are now replacing it with a built-in Raft implementation.
Tools like Vault, Kafka (in KRaft mode), Consul, etcd, Splunk, RabbitMQ (quorum queues), several databases, and many others already use Raft.
What is Raft?
Raft is a distributed consensus algorithm, designed to facilitate consistent data replication among nodes in a distributed system.
Raft simplifies the consensus process by dividing it into three main parts:
- Leader Election: One of the nodes in the system is elected as the leader and is responsible for coordinating and controlling write operations in the system.
- Log Replication: All changes to the system state are recorded in a log that is replicated across all nodes. The leader is responsible for replicating log entries and making sure the other nodes catch up.
- Election Safety: Raft guarantees that at most one leader is elected in any given election term. If the leader fails, a new election picks a replacement.
A much more in-depth reading on the subject can be found at https://raft.github.io/raft.pdf.
Election Process
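Initially, all nodes start as followers. When a follower stops receiving heartbeats from a leader for longer than its election timeout, it assumes there is no active leader, becomes a candidate, and initiates an election. The election process involves several stages: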
- Candidacy: The candidate node votes for itself and requests votes from the other nodes. A node can only vote for a candidate if it has not yet voted for another candidate in the same election.
- Voting: If a node has not yet voted in this election and considers the candidate legitimate (the candidate's log is at least as up to date as its own), it grants its vote to the candidate.
- Leader Election: A candidate wins the election if it receives votes from the majority of nodes. Once elected, it becomes the leader and begins log replication.
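The voting rules above can be sketched in a few lines of Python. This is a simplified illustration, not a full Raft implementation: the `Node` class, its fields, and the function names are hypothetical, and the check that the candidate's log is up to date is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical, minimal voter state; a real Raft node also tracks its log."""
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_vote_request(self, candidate_id: str, candidate_term: int) -> bool:
        # Reject candidates from an older election (term).
        if candidate_term < self.current_term:
            return False
        # A newer term starts a new election, so any earlier vote no longer applies.
        if candidate_term > self.current_term:
            self.current_term = candidate_term
            self.voted_for = None
        # Grant at most one vote per term (repeating the same vote is allowed).
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def wins_election(votes_received: int, cluster_size: int) -> bool:
    # A candidate needs votes from a strict majority of the whole cluster.
    return votes_received > cluster_size // 2
```

In a 5-node cluster, a candidate that collects its own vote plus two others satisfies `wins_election(3, 5)` and becomes the leader.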
Go to https://raft.github.io/ to simulate node stoppages and observe how an election occurs when the leader node stops. You can click on a server to stop it or to simulate a timeout.


Suggestions:
- Stop the leader node and watch a new election take place. Any remaining node may become a candidate, but the one whose election timeout expires first usually becomes the leader.
- This cluster has 5 nodes and is tolerant to two failures. Stop at least 2 nodes and see the result.
- Stop a third node (the leader) so that only 2 nodes remain, without a leader, and see what happens in the election: with no majority available, no candidate can win.
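Why one node tends to "promote itself first" can be illustrated with randomized election timeouts. The sketch below is only an illustration: the function names are hypothetical, and the 150 to 300 ms range is the example range given in the Raft paper, used here as an assumption rather than a required setting.

```python
import random


def random_election_timeouts(node_ids, rng, low_ms=150, high_ms=300):
    # Each node draws its own timeout, which makes simultaneous candidacies unlikely.
    return {node_id: rng.uniform(low_ms, high_ms) for node_id in node_ids}


def first_candidate(timeouts):
    # The node whose timeout expires first is the first to request votes.
    return min(timeouts, key=timeouts.get)


rng = random.Random(42)  # fixed seed so the run is repeatable
timeouts = random_election_timeouts(["S1", "S2", "S3", "S4"], rng)
print(first_candidate(timeouts), "times out first and becomes a candidate")
```

Because the other nodes have usually not timed out yet, they still hold their vote for the new term and grant it to this first candidate.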
Quorum
In Raft, a decision is only considered valid if the majority of nodes agree with it. This concept is known as Quorum. Quorum is essential to ensure write operations are safe and the system remains consistent, even in case of node failures.
For example, if a Raft cluster has 5 nodes, a write operation is only considered valid once at least 3 nodes (the majority) have accepted it. This means that even if up to two nodes fail or become unreachable, the system can continue operating, because the remaining three still form a majority.
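One way to picture the majority rule for writes is to ask which log position is already stored on a majority of nodes. The following is a rough sketch under simplifying assumptions: `commit_index` and `match_index` are illustrative names, the leader is included in the map, and the additional Raft rule that a leader only commits entries from its own term is omitted.

```python
def commit_index(match_index, cluster_size):
    """Highest log index stored on a majority of nodes.

    match_index maps each node (leader included) to the last log index it has stored.
    """
    quorum = cluster_size // 2 + 1
    # Sort descending: the value at position quorum - 1 is held by at least `quorum` nodes.
    stored = sorted(match_index.values(), reverse=True)
    # Nodes missing from the map count as having nothing stored.
    stored += [0] * (cluster_size - len(stored))
    return stored[quorum - 1]


acks = {"leader": 7, "f1": 7, "f2": 5, "f3": 3, "f4": 0}
print(commit_index(acks, cluster_size=5))  # -> 5
```

Here entries up to index 5 are on three of the five nodes, so they count as valid, while entries 6 and 7 are only on two nodes and still have to wait.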
The formula for quorum size is QuorumSize = floor(NumNodes / 2) + 1, where the division is rounded down.
To find how many failures the cluster can tolerate, use FailureTolerance = NumNodes - QuorumSize, or equivalently FailureTolerance = NumNodes - (floor(NumNodes / 2) + 1).
| Number of Nodes | Quorum Size | Failure Tolerance |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
| 6 | 4 | 2 |
| 7 | 4 | 3 |
We can observe that the minimum number of nodes needed to tolerate one failure is 3. Tolerating two failures requires 5 nodes, and tolerating three requires 7. Clusters with 2, 4, or 6 nodes are possible, but the extra node adds no failure tolerance compared with the odd size just below it.
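The table can be reproduced with a few lines of Python. The function names below are illustrative; `//` is integer division, which matches the rounding down in the quorum formula.

```python
def quorum_size(num_nodes: int) -> int:
    # Smallest group that is a strict majority of the cluster.
    return num_nodes // 2 + 1


def failure_tolerance(num_nodes: int) -> int:
    # Nodes that can be lost while a quorum is still reachable.
    return num_nodes - quorum_size(num_nodes)


for n in range(1, 8):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerance={failure_tolerance(n)}")
```

Running it shows the same pattern as the table: going from 3 to 4 nodes raises the quorum from 2 to 3 but leaves the tolerance at 1.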
The larger the number of nodes, the lower the write performance, since the leader has to replicate every change to more nodes and wait for a majority to acknowledge it. A simple system would use 3 nodes, a more complex system 5, and a critical system 7. Above that, performance starts to decrease considerably.
It's worth taking a look at how Raft works explained visually at https://thesecretlivesofdata.com/raft/