Consensus algorithms are at the core of distributed systems. How do you manage consistency across multiple servers or nodes?
The Raft Consensus Algorithm is a consensus protocol for distributed systems that's widely used (including by systems like Kubernetes, via etcd). It provides fault-tolerance and consistency guarantees equivalent to those of Paxos, which is often seen as the more complex approach.
Here's a simplified overview of the algorithm:
Overall design: elect a leader among the servers; the leader is responsible for managing log replication and ensuring that all followers end up with the same data. If the leader fails, the remaining servers elect a new one.
Some of the steps (simplified):
Initialization:
All nodes start as followers.
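To make the roles concrete, here's a minimal Go sketch of the state each node keeps. It's purely illustrative, not taken from any real implementation; names like node, role, and electionTimeout are mine:

```go
package raft

import (
	"math/rand"
	"time"
)

// role is the state a node can be in at any moment.
type role int

const (
	follower role = iota
	candidate
	leader
)

// logEntry is a single client command plus the term in which the leader received it.
type logEntry struct {
	Term    int
	Command []byte
}

// node holds the state every Raft server maintains.
type node struct {
	id          int
	role        role       // every node starts as a follower
	currentTerm int        // latest term this node has seen
	votedFor    int        // candidate voted for in the current term (-1 = none)
	log         []logEntry // the replicated log

	electionTimeout time.Duration // randomized so nodes don't all time out together
}

// newNode initializes a server in the follower role.
func newNode(id int) *node {
	return &node{
		id:       id,
		role:     follower,
		votedFor: -1,
		// Randomize the timeout (e.g. 150-300ms) to reduce split votes.
		electionTimeout: time.Duration(150+rand.Intn(150)) * time.Millisecond,
	}
}
```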
Elect a Leader:
If a follower does not receive a heartbeat message from the leader within its election timeout, it becomes a candidate for leadership.
The candidate increments its term, votes for itself, and asks all other nodes for their vote.
A candidate becomes the leader if it receives votes from a majority of the cluster.
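Continuing the sketch above, the election step might look roughly like this. requestVote is a stand-in for the real RequestVote RPC, and a real implementation does this asynchronously and checks terms and log freshness on both sides:

```go
// voteRequest is what a candidate sends to every other node.
type voteRequest struct {
	Term         int
	CandidateID  int
	LastLogIndex int
	LastLogTerm  int
}

// startElection is called when the election timeout fires without a heartbeat.
func (n *node) startElection(peers []int, requestVote func(peer int, req voteRequest) bool) {
	n.role = candidate
	n.currentTerm++   // new term for this election
	n.votedFor = n.id // vote for ourselves
	votes := 1

	req := voteRequest{Term: n.currentTerm, CandidateID: n.id}
	if len(n.log) > 0 {
		req.LastLogIndex = len(n.log) - 1
		req.LastLogTerm = n.log[len(n.log)-1].Term
	}

	for _, peer := range peers {
		if requestVote(peer, req) { // peer granted its vote
			votes++
		}
	}

	// A strict majority of the cluster (peers + this node) wins the election.
	if votes > (len(peers)+1)/2 {
		n.role = leader
	}
}
```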
Replicate Logs:
The leader accepts commands from clients and appends them to its log.
It sends the new entries to all followers.
Once a majority of nodes acknowledges an entry, the leader commits it: it applies the entry to its own state machine and responds to the client.
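And a similarly hedged sketch of the replication step, again building on the types above. appendEntries stands in for the AppendEntries RPC, which in a real implementation also carries the consistency-check information (previous log index and term) and is sent asynchronously:

```go
// appendRequest carries new entries from the leader to a follower.
type appendRequest struct {
	Term         int
	LeaderID     int
	Entries      []logEntry
	LeaderCommit int // leader's commit index (unused in this sketch)
}

// propose is called on the leader when a client submits a command.
// apply is whatever applies a committed command to the state machine.
func (n *node) propose(cmd []byte, peers []int,
	appendEntries func(peer int, req appendRequest) bool,
	apply func(cmd []byte)) bool {

	if n.role != leader {
		return false // clients must talk to the leader
	}

	// 1. Append the command to the leader's own log.
	n.log = append(n.log, logEntry{Term: n.currentTerm, Command: cmd})

	// 2. Send the new entry to all followers and count acknowledgements.
	acks := 1 // the leader counts itself
	req := appendRequest{
		Term:     n.currentTerm,
		LeaderID: n.id,
		Entries:  n.log[len(n.log)-1:],
	}
	for _, peer := range peers {
		if appendEntries(peer, req) {
			acks++
		}
	}

	// 3. Once a majority has the entry, it is committed: apply it and
	//    reply to the client.
	if acks > (len(peers)+1)/2 {
		apply(cmd)
		return true
	}
	return false
}
```

With 5 nodes, 3 acknowledgements (including the leader's own) are enough to commit, so the cluster keeps making progress with up to 2 failed nodes.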
Each step has many more nuances, but this is a very high-level description of the algorithm. The Raft paper ("In Search of an Understandable Consensus Algorithm", by Ongaro and Ousterhout) is the best place for more information. And the etcd raft implementation is a good starting point if you're more comfortable reading through code.
Some other systems that use Raft:
CockroachDB
ClickHouse
MongoDB
etcd
Hey, MongoDB created that AFAIK. I implemented it for replica sets several years before the RAFT paper. Too bad we didn't write it up, I guess :-P