ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/elector

The Elector

Advanced
Consensus | 5 tasks

Learn the fundamentals of distributed consensus through leader election. You will implement node states, heartbeats, randomized timeouts, and vote handling, building the foundation for the full Raft protocol.

Subtracks & Tasks

Interview Prep

Common interview questions for Distributed Systems / Infrastructure Engineer roles that map directly to what you build in this track.

How does leader election work in Raft, and why can't two leaders be elected in the same term?

Model Answer

When a follower does not hear from a leader within its election timeout (randomized, e.g. 150-300 ms), it increments its term, becomes a candidate, and requests votes. A node grants a vote if: (1) the candidate's term is at least its current term, (2) it has not already voted for a different candidate in this term, (3) the candidate's log is at least as up-to-date as its own. A candidate wins by collecting votes from a majority (>N/2). Two leaders in the same term are impossible because majorities overlap: any two majorities of an N-node cluster share at least one node, and that node votes at most once per term.
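
The three vote-granting rules above can be sketched as a single predicate. This is an illustrative sketch, not the track's reference code; the `node`/`req` dictionaries and the function name `grant_vote` are assumptions made for the example.

```python
# Sketch of the Raft vote-granting rules: term check, one vote per
# term, and the log up-to-dateness comparison.

def grant_vote(node, req):
    """node: dict with currentTerm, votedFor, lastLogTerm, lastLogIndex.
    req: vote request with term, candidateId, lastLogTerm, lastLogIndex."""
    if req["term"] < node["currentTerm"]:
        return False  # stale candidate
    if req["term"] > node["currentTerm"]:
        # A higher term starts a fresh term, so the vote resets too.
        node["currentTerm"] = req["term"]
        node["votedFor"] = None
    if node["votedFor"] not in (None, req["candidateId"]):
        return False  # already voted for someone else this term
    # Candidate's log must be at least as up-to-date as ours:
    # compare last log terms first, then last log indexes.
    up_to_date = (req["lastLogTerm"], req["lastLogIndex"]) >= (
        node["lastLogTerm"], node["lastLogIndex"])
    if up_to_date:
        node["votedFor"] = req["candidateId"]
    return up_to_date
```

Note that a granted vote mutates `votedFor`, which is exactly the state that must be persisted before the reply is sent (see Common Mistakes below).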

What happens when a network partition separates a Raft leader from the majority of the cluster?

Model Answer

The minority partition cannot make progress — it cannot form a quorum for any write. The majority partition elects a new leader if needed and continues serving writes. When the partition heals, the old leader (on the minority side) sees messages with the new, higher term and immediately steps down. Its uncommitted entries are overwritten by the new leader's log. Committed entries (replicated to majority) are safe — they will be in the new leader's log because the new leader must have gotten votes from a majority, which overlaps with the quorum that committed those entries.
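
The overlap argument in the last sentence can be verified exhaustively for a small cluster; the throwaway check below (not part of the track) confirms that any two majorities of a 5-node cluster share at least one node.

```python
# Exhaustively verify that any two majorities of a 5-node cluster
# intersect -- the property that makes committed entries survive
# a leader change across a partition.
from itertools import combinations

nodes = {"n1", "n2", "n3", "n4", "n5"}
majority = len(nodes) // 2 + 1  # 3 of 5

overlaps = all(
    set(a) & set(b)  # non-empty intersection
    for a in combinations(nodes, majority)
    for b in combinations(nodes, majority)
)
print(overlaps)  # True: every pair of majorities shares a node
```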

What happens to a Kubernetes cluster when etcd loses quorum?

Model Answer

Kubernetes becomes effectively read-only. Existing workloads keep running (the kubelet operates independently), but no new pods can be scheduled and no config changes can be made; the API server fails writes because etcd requires a majority of its members to accept them. With 3 etcd nodes, losing 2 means loss of quorum; with 5, you can lose 2, which is why production clusters often run 5 etcd nodes. If the failed members come back, quorum restores itself; if they are permanently lost, recovery requires restoring from a backup or rebuilding the etcd cluster, followed by reconciliation of API server state.
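
The fault-tolerance arithmetic behind "3 tolerates 1, 5 tolerates 2" generalizes: a cluster of n nodes needs floor(n/2)+1 members for quorum and therefore tolerates floor((n-1)/2) failures. A small illustration (not etcd code):

```python
# Quorum size and failure tolerance as a function of cluster size.
def quorum(n):
    return n // 2 + 1

def tolerated_failures(n):
    return n - quorum(n)  # == (n - 1) // 2

for n in (1, 3, 4, 5, 7):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
# Note that 4 nodes still tolerate only 1 failure -- an even-sized
# cluster adds cost without adding fault tolerance, which is why
# clusters use odd sizes.
```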

Why does Raft use randomized election timeouts?

Model Answer

Randomized timeouts break symmetry to prevent split votes. If all nodes had the same timeout, they would all start elections simultaneously, split votes, and no one would win — indefinite livelock. With randomized timeouts (e.g., 150-300ms), the first node to time out starts an election. If it sends vote requests quickly, it wins before others time out. Split votes are possible but increasingly unlikely per round because retries also use new random timeouts.

How does a Raft leader decide that a log entry is committed, and how do followers find out?

Model Answer

A log entry at index i is committed when the leader has replicated it to a majority of nodes (including itself). The leader sends AppendEntries RPCs to all followers. When a majority respond with success, the leader advances its commitIndex to i. On the next heartbeat (or the next AppendEntries), followers learn about the new commitIndex and apply the entry to their state machine. Note: a leader cannot commit entries from previous terms by counting replicas — it can only commit them indirectly by committing a new entry from the current term that comes after them (the term commitment rule).
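
The commit rule, including the restriction on prior-term entries, can be sketched as follows. This is a minimal sketch under assumed data structures (a log represented as a list of entry terms, and a `matchIndex`-style map of highest replicated index per node, leader included); function and variable names are illustrative.

```python
# Advance the leader's commitIndex: find the highest index replicated
# on a majority, but only commit it directly if its entry is from the
# current term (Raft's rule against counting replicas of old-term
# entries). Committing index i implicitly commits everything before it.

def advance_commit_index(log, match_index, commit_index, current_term):
    """log: list of entry terms, entry i stored at log[i - 1].
    match_index: {node_id: highest replicated index}, leader included."""
    n = len(match_index)  # cluster size
    for i in range(len(log), commit_index, -1):
        replicated = sum(1 for m in match_index.values() if m >= i)
        if replicated > n // 2 and log[i - 1] == current_term:
            return i
    return commit_index  # nothing new is safely committed
```

A follower learns the new `commitIndex` from the leader's next AppendEntries (or empty heartbeat) and applies entries up to it.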

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them.

Mistake 1: Using the same election timeout on every node

Why it happens

If all nodes use the same timeout, they all become candidates simultaneously, each voting for themselves, and nobody gets a majority.

The fix

Randomize the election timeout per node, per election (e.g. 150-300 ms). The first node to time out becomes a candidate before the others and usually wins.
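
A minimal sketch of that fix (constant and function names are illustrative): the key point is that the value is redrawn on every restart of the timer, not fixed once at boot.

```python
# Per-election randomized timeout: draw a fresh value each time the
# timer is (re)started, so retry rounds also desynchronize.
import random

ELECTION_TIMEOUT_MIN = 0.150  # seconds
ELECTION_TIMEOUT_MAX = 0.300

def new_election_timeout():
    """Return a fresh timeout; call again on every restart or retry."""
    return random.uniform(ELECTION_TIMEOUT_MIN, ELECTION_TIMEOUT_MAX)
```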

Mistake 2: Not resetting the election timer on heartbeats

Why it happens

The purpose of the heartbeat is to suppress follower elections. If the timer is not reset on each valid AppendEntries (even an empty one), the follower will time out.

The fix

Reset the election timer whenever you receive a valid RPC from the current leader — AppendEntries, heartbeats, or InstallSnapshot.
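
One way to structure this is a deadline against a monotonic clock, reset from the RPC handler. A sketch under assumed names (`ElectionTimer`, `on_append_entries` are not from the track):

```python
# Deadline-based election timer: any valid message from the current
# leader pushes the deadline forward; stale leaders do not.
import random
import time

class ElectionTimer:
    def __init__(self, low=0.150, high=0.300):
        self.low, self.high = low, high
        self.reset()

    def reset(self):
        # Fresh random timeout on every reset (see Mistake 1).
        self.deadline = time.monotonic() + random.uniform(self.low, self.high)

    def expired(self):
        return time.monotonic() >= self.deadline

def on_append_entries(timer, node_term, rpc_term):
    """Handler skeleton: reset only for a current (or newer) leader."""
    if rpc_term >= node_term:
        timer.reset()   # heartbeat suppresses the election
        return True
    return False        # stale leader: do not reset the timer
```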

Mistake 3: Granting votes without checking log up-to-dateness

Why it happens

Raft's safety guarantee depends on the leader having all committed entries. A follower must refuse to vote for a candidate whose log is behind.

The fix

Before granting a vote, check that `candidate.lastLogTerm > voter.lastLogTerm`, or `candidate.lastLogTerm == voter.lastLogTerm && candidate.lastLogIndex >= voter.lastLogIndex`.

Mistake 4: Keeping currentTerm and votedFor only in memory

Why it happens

Raft requires these two fields to survive crashes. An in-memory node that restarts forgets it already voted.

The fix

Write `currentTerm` and `votedFor` to durable storage (the Maelstrom `lin-kv` service works as a stand-in) before sending any response that depends on them.
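
As a stand-in for real durable storage, a local JSON file with fsync and an atomic rename is enough to show the shape of the fix. This sketch is an assumption for illustration, not the Maelstrom `lin-kv` API; call `save_hard_state` before sending any reply that depends on the new term or vote.

```python
# Persist currentTerm and votedFor before replying. Write-then-rename
# so a crash never leaves a half-written state file behind.
import json
import os
import tempfile

def save_hard_state(path, current_term, voted_for):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"currentTerm": current_term, "votedFor": voted_for}, f)
        f.flush()
        os.fsync(f.fileno())   # force the bytes to disk
    os.replace(tmp, path)      # atomic on POSIX

def load_hard_state(path):
    """On restart: recover the term and the vote already cast."""
    if not os.path.exists(path):
        return 0, None  # fresh node
    with open(path) as f:
        state = json.load(f)
    return state["currentTerm"], state["votedFor"]
```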

Mistake 5: Ignoring a higher term in an incoming RPC

Why it happens

A node that processes an RPC body before noticing its higher term can reply with stale state, vote in an old term, or keep acting as leader after it has been deposed. Raft requires that as soon as you see a term T > currentTerm, you update currentTerm to T and clear votedFor before doing anything else — including sending a reply.

The fix

Make "update term and step down" the very first thing you do when receiving any RPC with a higher term, before processing the rest of the message.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

| Dimension | Raft | Paxos | Bully |
| --- | --- | --- | --- |
| Core idea | Randomized timeouts + term-based voting | Two-phase prepare/accept with ballot numbers | Highest-ID node always wins; lower nodes yield |
| Understandability | Designed for understandability | Notoriously hard to implement correctly | Simple to understand |
| Split-vote handling | Randomized timeouts avoid splits | Higher ballot preempts lower | Deterministic — no split possible |
| Election speed | One round trip per candidate (usually fast) | Two round trips minimum | O(n²) messages in worst case |
| Stable leader | Yes — leader sends heartbeats | Multi-Paxos adds leader leases | No — new election on every failure |
| Used in | etcd, CockroachDB, TiKV | Google Chubby, ZooKeeper (ZAB) | MongoDB (older versions), academic examples |

Verdict: Raft for new implementations — it is designed for understandability and has reference code. Paxos only if you are building on existing Paxos infrastructure.

Concepts Covered

state machine, leader election, Raft roles, heartbeat, liveness, failure detection, randomization, timeout, split brain prevention, voting, term comparison, vote granting, term, split vote, election retry

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.