ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/elector

The Elector

Advanced
Consensus | 5 tasks

Learn the fundamentals of distributed consensus through leader election. You will implement node states, heartbeats, randomized timeouts, and vote handling, building the foundation for the full Raft protocol.

Subtracks & Tasks

Interview Prep

Common interview questions for Distributed Systems / Infrastructure Engineer roles that map directly to what you build in this track.

How does leader election work in Raft, and why can't two leaders be elected in the same term?

Model Answer

When a follower does not hear from a leader within its election timeout (randomized, e.g. 150-300 ms), it increments its term, becomes a candidate, and requests votes. A node grants a vote if: (1) the candidate's term is at least its current term, (2) it has not already voted for a different candidate in this term, (3) the candidate's log is at least as up-to-date as its own. A candidate wins by collecting votes from a majority (>N/2). Two leaders in the same term are impossible because majorities overlap: any two majorities of an N-node cluster share at least one node, and that node votes at most once per term.
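
The three vote-granting rules above can be sketched as a single predicate. This is an illustrative sketch, not the track's reference code; the `node`/`req` dictionaries and the function name `grant_vote` are assumptions made for the example.

```python
# Sketch of the Raft vote-granting rules: term check, one vote per
# term, and the log up-to-dateness comparison.

def grant_vote(node, req):
    """node: dict with currentTerm, votedFor, lastLogTerm, lastLogIndex.
    req: vote request with term, candidateId, lastLogTerm, lastLogIndex."""
    if req["term"] < node["currentTerm"]:
        return False  # stale candidate
    if req["term"] > node["currentTerm"]:
        # A higher term starts a fresh term, so the vote resets too.
        node["currentTerm"] = req["term"]
        node["votedFor"] = None
    if node["votedFor"] not in (None, req["candidateId"]):
        return False  # already voted for someone else this term
    # Candidate's log must be at least as up-to-date as ours:
    # compare last log terms first, then last log indexes.
    up_to_date = (req["lastLogTerm"], req["lastLogIndex"]) >= (
        node["lastLogTerm"], node["lastLogIndex"])
    if up_to_date:
        node["votedFor"] = req["candidateId"]
    return up_to_date
```

Note that a granted vote mutates `votedFor`, which is exactly the state that must be persisted before the reply is sent (see Common Mistakes below).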

What happens when a network partition separates a Raft leader from the majority of the cluster?

Model Answer

The minority partition cannot make progress — it cannot form a quorum for any write. The majority partition elects a new leader if needed and continues serving writes. When the partition heals, the old leader (on the minority side) sees messages with the new, higher term and immediately steps down. Its uncommitted entries are overwritten by the new leader's log. Committed entries (replicated to majority) are safe — they will be in the new leader's log because the new leader must have gotten votes from a majority, which overlaps with the quorum that committed those entries.
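
The overlap argument in the last sentence can be verified exhaustively for a small cluster; the throwaway check below (not part of the track) confirms that any two majorities of a 5-node cluster share at least one node.

```python
# Exhaustively verify that any two majorities of a 5-node cluster
# intersect -- the property that makes committed entries survive
# a leader change across a partition.
from itertools import combinations

nodes = {"n1", "n2", "n3", "n4", "n5"}
majority = len(nodes) // 2 + 1  # 3 of 5

overlaps = all(
    set(a) & set(b)  # non-empty intersection
    for a in combinations(nodes, majority)
    for b in combinations(nodes, majority)
)
print(overlaps)  # True: every pair of majorities shares a node
```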

What happens to a Kubernetes cluster when etcd loses quorum?

Model Answer

Kubernetes becomes effectively read-only. Existing workloads keep running (the kubelet operates independently), but no new pods can be scheduled and no config changes can be made; the API server fails writes because etcd requires a majority of its members to accept them. With 3 etcd nodes, losing 2 means loss of quorum; with 5, you can lose 2, which is why production clusters often run 5 etcd nodes. If the failed members come back, quorum restores itself; if they are permanently lost, recovery requires restoring from a backup or rebuilding the etcd cluster, followed by reconciliation of API server state.
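
The fault-tolerance arithmetic behind "3 tolerates 1, 5 tolerates 2" generalizes: a cluster of n nodes needs floor(n/2)+1 members for quorum and therefore tolerates floor((n-1)/2) failures. A small illustration (not etcd code):

```python
# Quorum size and failure tolerance as a function of cluster size.
def quorum(n):
    return n // 2 + 1

def tolerated_failures(n):
    return n - quorum(n)  # == (n - 1) // 2

for n in (1, 3, 4, 5, 7):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
# Note that 4 nodes still tolerate only 1 failure -- an even-sized
# cluster adds cost without adding fault tolerance, which is why
# clusters use odd sizes.
```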

Why does Raft use randomized election timeouts?

Model Answer

Randomized timeouts break symmetry to prevent split votes. If all nodes had the same timeout, they would all start elections simultaneously, split votes, and no one would win — indefinite livelock. With randomized timeouts (e.g., 150-300ms), the first node to time out starts an election. If it sends vote requests quickly, it wins before others time out. Split votes are possible but increasingly unlikely per round because retries also use new random timeouts.

How does a Raft leader decide that a log entry is committed, and how do followers find out?

Model Answer

A log entry at index i is committed when the leader has replicated it to a majority of nodes (including itself). The leader sends AppendEntries RPCs to all followers. When a majority respond with success, the leader advances its commitIndex to i. On the next heartbeat (or the next AppendEntries), followers learn about the new commitIndex and apply the entry to their state machine. Note: a leader cannot commit entries from previous terms by counting replicas — it can only commit them indirectly by committing a new entry from the current term that comes after them (the term commitment rule).
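
The commit rule, including the restriction on prior-term entries, can be sketched as follows. This is a minimal sketch under assumed data structures (a log represented as a list of entry terms, and a `matchIndex`-style map of highest replicated index per node, leader included); function and variable names are illustrative.

```python
# Advance the leader's commitIndex: find the highest index replicated
# on a majority, but only commit it directly if its entry is from the
# current term (Raft's rule against counting replicas of old-term
# entries). Committing index i implicitly commits everything before it.

def advance_commit_index(log, match_index, commit_index, current_term):
    """log: list of entry terms, entry i stored at log[i - 1].
    match_index: {node_id: highest replicated index}, leader included."""
    n = len(match_index)  # cluster size
    for i in range(len(log), commit_index, -1):
        replicated = sum(1 for m in match_index.values() if m >= i)
        if replicated > n // 2 and log[i - 1] == current_term:
            return i
    return commit_index  # nothing new is safely committed
```

A follower learns the new `commitIndex` from the leader's next AppendEntries (or empty heartbeat) and applies entries up to it.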

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them.

Mistake 1: Using the same election timeout on every node

Why it happens

If all nodes use the same timeout, they all become candidates simultaneously, each voting for themselves, and nobody gets a majority.

The fix

Randomize the election timeout per node, per election (e.g. 150-300 ms). The first node to time out becomes a candidate before the others and usually wins.
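
A minimal sketch of that fix (constant and function names are illustrative): the key point is that the value is redrawn on every restart of the timer, not fixed once at boot.

```python
# Per-election randomized timeout: draw a fresh value each time the
# timer is (re)started, so retry rounds also desynchronize.
import random

ELECTION_TIMEOUT_MIN = 0.150  # seconds
ELECTION_TIMEOUT_MAX = 0.300

def new_election_timeout():
    """Return a fresh timeout; call again on every restart or retry."""
    return random.uniform(ELECTION_TIMEOUT_MIN, ELECTION_TIMEOUT_MAX)
```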

Mistake 2: Not resetting the election timer on heartbeats

Why it happens

The purpose of the heartbeat is to suppress follower elections. If the timer is not reset on each valid AppendEntries (even an empty one), the follower will time out.

The fix

Reset the election timer whenever you receive a valid RPC from the current leader — AppendEntries, heartbeats, or InstallSnapshot.
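
One way to structure this is a deadline against a monotonic clock, reset from the RPC handler. A sketch under assumed names (`ElectionTimer`, `on_append_entries` are not from the track):

```python
# Deadline-based election timer: any valid message from the current
# leader pushes the deadline forward; stale leaders do not.
import random
import time

class ElectionTimer:
    def __init__(self, low=0.150, high=0.300):
        self.low, self.high = low, high
        self.reset()

    def reset(self):
        # Fresh random timeout on every reset (see Mistake 1).
        self.deadline = time.monotonic() + random.uniform(self.low, self.high)

    def expired(self):
        return time.monotonic() >= self.deadline

def on_append_entries(timer, node_term, rpc_term):
    """Handler skeleton: reset only for a current (or newer) leader."""
    if rpc_term >= node_term:
        timer.reset()   # heartbeat suppresses the election
        return True
    return False        # stale leader: do not reset the timer
```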

Mistake 3: Granting votes without checking log up-to-dateness

Why it happens

Raft's safety guarantee depends on the leader having all committed entries. A follower must refuse to vote for a candidate whose log is behind.

The fix

Before granting a vote, check that `candidate.lastLogTerm > voter.lastLogTerm`, or `candidate.lastLogTerm == voter.lastLogTerm && candidate.lastLogIndex >= voter.lastLogIndex`.

Mistake 4: Keeping currentTerm and votedFor only in memory

Why it happens

Raft requires these two fields to survive crashes. An in-memory node that restarts forgets it already voted.

The fix

Write `currentTerm` and `votedFor` to durable storage (the Maelstrom `lin-kv` service works as a stand-in) before sending any response that depends on them.
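
As a stand-in for real durable storage, a local JSON file with fsync and an atomic rename is enough to show the shape of the fix. This sketch is an assumption for illustration, not the Maelstrom `lin-kv` API; call `save_hard_state` before sending any reply that depends on the new term or vote.

```python
# Persist currentTerm and votedFor before replying. Write-then-rename
# so a crash never leaves a half-written state file behind.
import json
import os
import tempfile

def save_hard_state(path, current_term, voted_for):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"currentTerm": current_term, "votedFor": voted_for}, f)
        f.flush()
        os.fsync(f.fileno())   # force the bytes to disk
    os.replace(tmp, path)      # atomic on POSIX

def load_hard_state(path):
    """On restart: recover the term and the vote already cast."""
    if not os.path.exists(path):
        return 0, None  # fresh node
    with open(path) as f:
        state = json.load(f)
    return state["currentTerm"], state["votedFor"]
```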

Mistake 5: Ignoring a higher term in an incoming RPC

Why it happens

A node that processes an RPC body before noticing its higher term can reply with stale state, vote in an old term, or keep acting as leader after it has been deposed. Raft requires that as soon as you see a term T > currentTerm, you update currentTerm to T and clear votedFor before doing anything else — including sending a reply.

The fix

Make "update term and step down" the very first thing you do when receiving any RPC with a higher term, before processing the rest of the message.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

| Dimension | Raft | Paxos | Bully |
| --- | --- | --- | --- |
| Core idea | Randomized timeouts + term-based voting | Two-phase prepare/accept with ballot numbers | Highest-ID node always wins; lower nodes yield |
| Understandability | Designed for understandability | Notoriously hard to implement correctly | Simple to understand |
| Split-vote handling | Randomized timeouts avoid splits | Higher ballot preempts lower | Deterministic — no split possible |
| Election speed | One round trip per candidate (usually fast) | Two round trips minimum | O(n²) messages in worst case |
| Stable leader | Yes — leader sends heartbeats | Multi-Paxos adds leader leases | No — new election on every failure |
| Used in | etcd, CockroachDB, TiKV | Google Chubby, ZooKeeper (ZAB) | MongoDB (older versions), academic examples |

Verdict: Raft for new implementations — it is designed for understandability and has reference code. Paxos only if you are building on existing Paxos infrastructure.

Concepts Covered

state machine, leader election, Raft roles, heartbeat, liveness, failure detection, randomization, timeout, split brain prevention, voting, term comparison, vote granting, term, split vote, election retry

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.