ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/consensus

The Consensus

Advanced
Consensus | 20 tasks

Complete your Raft implementation with log replication. You will implement log matching, track commitment, and apply entries to state machines, producing a fully functional consensus layer.

Subtracks & Tasks

Interview Prep

Common interview questions for Distributed Systems Engineer roles that map directly to what you build in this track. Each question below is paired with a model answer.

How can real systems reach consensus if the FLP result proves it impossible?

Model Answer

FLP (Fischer, Lynch, Paterson 1985) proves that no deterministic algorithm can solve consensus in an asynchronous system if even one process can fail. "Asynchronous" means no upper bound on message delay. In practice, all real systems have timing assumptions (TCP timeouts, election timeouts), making them partially synchronous. Raft and Paxos work in the partially synchronous model — they may block during asynchronous periods but are always safe and make progress when synchrony resumes. FLP is a theoretical bound, not a practical barrier.
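In practice, the timing assumption usually surfaces as a randomised election timeout. A minimal Go sketch (the helper and constant are illustrative, not from any particular implementation):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout picks a random duration in [base, 2*base). The timeout
// itself is the partial-synchrony assumption: we bet that, eventually,
// heartbeats arrive faster than this. Randomisation makes split votes rare.
func electionTimeout(base time.Duration) time.Duration {
	return base + time.Duration(rand.Int63n(int64(base)))
}

func main() {
	// Typical Raft values: 150-300ms, well above the expected network RTT.
	fmt.Println(electionTimeout(150 * time.Millisecond))
}
```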

Why is Paxos considered hard to implement, and what does Raft do differently?

Model Answer

Paxos defines single-value consensus (single-decree Paxos). Multi-Paxos for a replicated log requires engineering decisions that are not specified in the original paper: how to pipeline proposals, how to handle leader changes, how to handle gaps in the log. Ongaro & Ousterhout found that real Paxos-style implementations (Chubby, ZooKeeper's ZAB) diverged significantly from the paper and from each other. Raft explicitly specifies leader election, log replication, and membership changes as a single coherent algorithm, making the design space tractable.

How would you build a distributed lock service, and what failure modes would you handle?

Model Answer

Use a Raft-replicated state machine where the lock state is a key-value entry. Acquire: propose "set lock = clientId" if lock is unset. Release: propose "set lock = nil" if lock == clientId. Add a TTL/lease to handle client crashes. Failure modes: (1) client crash while holding lock — use a lease that expires, (2) network partition — the leader might think it has the lock but is partitioned from the cluster. Use lease-based reads: the leader only serves reads while its lease is valid, which requires clock synchrony assumptions. HashiCorp Consul's session system implements exactly this.
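A minimal Go sketch of the lock as a replicated state machine command, with hypothetical names (`Cmd`, `LockState`) and the Raft layer itself elided:

```go
package main

import (
	"fmt"
	"time"
)

// Cmd is a proposed log entry; it reaches Apply only after Raft commits it.
type Cmd struct {
	Op       string // "acquire" or "release"
	ClientID string
	Lease    time.Duration
}

// LockState is the replicated state machine's view of the lock.
type LockState struct {
	Holder  string
	Expires time.Time
}

// Apply runs on every replica in log order, so the conditional checks yield
// the same result everywhere. `now` must be agreed via the log (e.g. stamped
// into the command by the leader), not read from each replica's local clock,
// or replicas would diverge.
func (s *LockState) Apply(c Cmd, now time.Time) bool {
	switch c.Op {
	case "acquire":
		if s.Holder == "" || now.After(s.Expires) { // free, or lease expired
			s.Holder, s.Expires = c.ClientID, now.Add(c.Lease)
			return true
		}
	case "release":
		if s.Holder == c.ClientID {
			s.Holder = ""
			return true
		}
	}
	return false // rejected; the client must retry or wait
}

func main() {
	s := &LockState{}
	now := time.Now()
	fmt.Println(s.Apply(Cmd{"acquire", "client-1", 10 * time.Second}, now)) // true
	fmt.Println(s.Apply(Cmd{"acquire", "client-2", 10 * time.Second}, now)) // false: held
	fmt.Println(s.Apply(Cmd{Op: "release", ClientID: "client-1"}, now))     // true
}
```

Note the determinism caveat in the sketch: the timestamp used for lease expiry must travel through the log rather than come from each replica's own clock.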

What happens when a Raft leader is partitioned away but still believes it is leader?

Model Answer

A stale leader (on the minority side of a partition) still believes it is leader because it has not heard about the new higher-term election. Both the stale leader and new leader exist simultaneously. This is safe: the stale leader cannot commit any writes because it cannot reach a majority. Any write it tries will be rejected when the partition heals. The key safety property is that a stale leader's uncommitted writes are overwritten — only committed writes (replicated to majority) survive.
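The fencing mechanism is the term check carried on every RPC. A sketch in Go, with hypothetical type and field names:

```go
package raft

// Hypothetical types; only the term check matters for this point.
type AppendEntriesArgs struct {
	Term     int
	LeaderID int
	// prevLogIndex, prevLogTerm, entries elided
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}

type Follower struct {
	currentTerm int
}

// HandleAppendEntries fences stale leaders: a leader stranded on the
// minority side keeps sending heartbeats, but its term is lower than the
// one established by the new election, so followers reject it. The higher
// term in the reply tells the stale leader to step down.
func (f *Follower) HandleAppendEntries(args AppendEntriesArgs) AppendEntriesReply {
	if args.Term < f.currentTerm {
		return AppendEntriesReply{Term: f.currentTerm, Success: false}
	}
	f.currentTerm = args.Term
	// ... log consistency check and append elided ...
	return AppendEntriesReply{Term: f.currentTerm, Success: true}
}
```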

What is the difference between linearizability and eventual consistency?

Model Answer

Linearizability: every operation appears to take effect instantaneously at a single point between its invocation and completion. A read always returns the result of the most recent write that completed before the read started. This matches single-machine memory semantics. Eventual consistency: if no new writes are made, all replicas will eventually converge to the same value. There is no guarantee about what a read returns during the convergence period. Tradeoff: linearizability requires coordination (quorum reads or leader reads), adding latency. Eventual consistency can serve reads locally, with lower latency but potentially stale data.

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track, with the root cause and the fix for each.

Mistake 1: Applying entries before they are committed

Why it happens

An entry is only safe to apply once it has been replicated to a majority. Applying earlier risks executing entries that the cluster will never agree on and that may later be overwritten.

The fix

Track `commitIndex` separately from `lastLogIndex`. Only advance the state machine to `commitIndex`, which the leader advances only after receiving acknowledgements from a majority.
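A rough Go sketch of the leader side, with hypothetical field names (`matchIndex`, `applyCh`):

```go
package raft

type Entry struct {
	Term    int
	Command any
}

type Leader struct {
	currentTerm int
	log         []Entry // log[i-1] holds the entry at (1-based) index i
	commitIndex int     // highest index known committed
	lastApplied int     // highest index fed to the state machine
	matchIndex  []int   // per follower: highest index known replicated there
	applyCh     chan Entry
}

// maybeAdvanceCommit implements the Raft commitment rule: find the largest
// N > commitIndex that a majority has replicated and whose entry is from
// the current term, then apply everything up to it, and nothing beyond.
func (l *Leader) maybeAdvanceCommit() {
	for n := len(l.log); n > l.commitIndex; n-- {
		if l.log[n-1].Term != l.currentTerm {
			continue // never commit-count entries from older terms directly
		}
		count := 1 // the leader's own copy
		for _, m := range l.matchIndex {
			if m >= n {
				count++
			}
		}
		if count > (len(l.matchIndex)+1)/2 { // strict majority of the cluster
			l.commitIndex = n
			break
		}
	}
	// Apply strictly up to commitIndex, never up to lastLogIndex.
	for l.lastApplied < l.commitIndex {
		l.lastApplied++
		l.applyCh <- l.log[l.lastApplied-1]
	}
}
```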

Mistake 2: Sending the wrong prevLogIndex or prevLogTerm

Why it happens

The AppendEntries consistency check requires the follower to already hold the entry immediately preceding the new batch. Sending the wrong index or term causes spurious rejections.

The fix

Maintain a `nextIndex[]` per follower. On rejection, decrement `nextIndex[follower]` and retry with the corrected `prevLogIndex` and `prevLogTerm`.
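A simplified Go sketch of that retry loop, with hypothetical names and the RPC transport elided:

```go
package raft

type Entry struct {
	Term    int
	Command any
}

type AppendEntriesArgs struct {
	Term         int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []Entry
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}

type Leader struct {
	currentTerm int
	log         []Entry
	nextIndex   []int // per follower: next log index to send
	matchIndex  []int // per follower: highest index known replicated
	// call performs the AppendEntries RPC; the transport is elided here.
	call func(peer int, args AppendEntriesArgs) AppendEntriesReply
}

// sendEntries retries until the follower's log matches. On each rejection
// the leader backs nextIndex up by one; production code uses the follower's
// conflict term/index hints to skip a whole term per round trip.
func (l *Leader) sendEntries(peer int) {
	for {
		prevIndex := l.nextIndex[peer] - 1
		prevTerm := 0
		if prevIndex > 0 {
			prevTerm = l.log[prevIndex-1].Term
		}
		args := AppendEntriesArgs{
			Term:         l.currentTerm,
			PrevLogIndex: prevIndex,
			PrevLogTerm:  prevTerm,
			Entries:      l.log[prevIndex:], // everything past the match point
		}
		reply := l.call(peer, args)
		if reply.Success {
			l.matchIndex[peer] = prevIndex + len(args.Entries)
			l.nextIndex[peer] = l.matchIndex[peer] + 1
			return
		}
		if l.nextIndex[peer] > 1 {
			l.nextIndex[peer]-- // consistency check failed: back up and retry
		}
	}
}
```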

Mistake 3: Letting the log grow without bound

Why it happens

Without snapshots and log truncation, every write is kept forever, so memory use and restart-replay time grow without bound.

The fix

Implement snapshotting: periodically write the full state machine state and the index/term of the last included entry, then truncate the log prefix up to that point.
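A minimal Go sketch of the compaction step, with hypothetical names (`Snapshot`, `takeSnapshot`); persistence and the InstallSnapshot RPC for lagging followers are elided:

```go
package raft

type Entry struct {
	Term    int
	Command any
}

// Snapshot captures the full state machine plus the identity of the last
// log entry it reflects, so the consistency check still works after the
// prefix is discarded.
type Snapshot struct {
	LastIncludedIndex int
	LastIncludedTerm  int
	State             []byte // serialised state machine
}

type Node struct {
	log               []Entry // entries after lastIncludedIndex only
	lastIncludedIndex int
	lastIncludedTerm  int
	lastApplied       int
}

// takeSnapshot compacts everything up to lastApplied. Log indices must be
// adjusted for the prefix already discarded by earlier snapshots.
func (n *Node) takeSnapshot(state []byte) Snapshot {
	cut := n.lastApplied - n.lastIncludedIndex // entries covered by the snapshot
	if cut <= 0 {
		return Snapshot{n.lastIncludedIndex, n.lastIncludedTerm, state}
	}
	snap := Snapshot{
		LastIncludedIndex: n.lastApplied,
		LastIncludedTerm:  n.log[cut-1].Term,
		State:             state,
	}
	n.log = append([]Entry(nil), n.log[cut:]...) // truncate prefix, free memory
	n.lastIncludedIndex = snap.LastIncludedIndex
	n.lastIncludedTerm = snap.LastIncludedTerm
	return snap
}
```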

Mistake 4: Holding the mutex across RPC calls

Why it happens

RPCs can take hundreds of milliseconds or time out entirely. Holding a lock for the duration of an RPC blocks all other message processing.

The fix

Release the mutex before sending RPCs. Collect the work to do while holding the lock, then release and do the network I/O.
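The pattern in Go, with hypothetical fields; the point is where the lock boundary sits, not the RPC details:

```go
package raft

import "sync"

type Node struct {
	mu          sync.Mutex
	currentTerm int
	peers       []string
}

// broadcastHeartbeat snapshots everything the RPC needs while holding the
// lock, then releases it before touching the network, so a slow peer cannot
// stall every other handler on this node.
func (n *Node) broadcastHeartbeat() {
	n.mu.Lock()
	term := n.currentTerm
	peers := append([]string(nil), n.peers...) // copy so we can unlock safely
	n.mu.Unlock()

	for _, p := range peers {
		go sendHeartbeat(p, term) // network I/O happens without the lock
	}
}

// sendHeartbeat stands in for the real RPC; it may take hundreds of
// milliseconds or time out, which is exactly why the lock is released first.
func sendHeartbeat(peer string, term int) {}
```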

Mistake 5: Waiting synchronously on slow or partitioned followers

Why it happens

A partitioned or overloaded follower may stop responding entirely. Blocking on it prevents the leader from advancing `commitIndex` for the healthy majority.

The fix

Use async / parallel AppendEntries. The leader should send to all followers concurrently and advance `commitIndex` as soon as it counts a majority of acknowledgements, without waiting for slow or unreachable followers.
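A minimal Go sketch of the fan-out, with the per-follower RPC abstracted behind a hypothetical `send` callback:

```go
package raft

// replicateToMajority fans an AppendEntries round out to every follower
// concurrently and returns as soon as a majority of the cluster (leader
// included) has acknowledged; stragglers and partitioned nodes are ignored.
// send stands in for one synchronous RPC to one follower.
func replicateToMajority(peers []string, send func(peer string) bool) bool {
	acks := make(chan bool, len(peers)) // buffered: late replies never block
	for _, p := range peers {
		p := p
		go func() { acks <- send(p) }() // one goroutine per follower
	}
	needed := (len(peers)+1)/2 + 1 // majority of cluster size len(peers)+1
	got := 1                       // the leader's own log counts
	if got >= needed {
		return true // single-node cluster commits immediately
	}
	for range peers {
		if <-acks {
			got++
		}
		if got >= needed {
			return true // commit point: stop waiting for the slow minority
		}
	}
	return false // no majority; the entry stays uncommitted for now
}
```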

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

| Dimension | Raft | Multi-Paxos | Viewstamped Replication |
|---|---|---|---|
| First published | 2014 | 1989 (Paxos), extended later | 1988 |
| Log replication | Leader-driven, term-based | Leader (distinguished proposer) + prepare once per leader epoch | Primary-driven, view-stamped ops |
| Leader election | Randomised timeouts, majority vote | Phase 1 Paxos (any node can propose) | View change protocol, majority vote |
| Reconfiguration | Joint consensus or single-server changes | Requires new Paxos round | View change protocol |
| Specification clarity | Formal TLA+ spec, extended Raft paper | Informal; many incompatible variants | Revisited 2012 paper is clear |
| Production use | etcd, CockroachDB, TiKV, Consul | Google Chubby, Spanner | Academic; influenced PBFT |

Verdict: Raft is the go-to for new systems. Multi-Paxos is entrenched in Google's stack. VR is worth studying for its clarity on view changes and recovery.
| Dimension | Linearisable | Sequential | Eventual |
|---|---|---|---|
| Ordering guarantee | Real-time order respected: if op A finishes before op B starts, A is visible before B | All nodes see the same order, but not necessarily real-time | No ordering guarantee; any order is valid temporarily |
| Read-your-writes | Yes | Yes (within a session) | Not guaranteed |
| Latency | High (requires quorum / leader round-trip) | Medium | Low (local reads/writes) |
| Availability under partition | No (CAP: CP) | Partial | Yes (CAP: AP) |
| Use cases | Locks, leader election, financial txns | Collaborative editing, gaming | DNS, social media likes, caches |
| Examples | etcd, ZooKeeper, Spanner | Some multi-leader DB modes | Cassandra, DynamoDB (default), S3 |

Verdict: Linearisability is the gold standard for correctness but costs availability. Use eventual consistency only when you can handle stale reads and resolve conflicts.

Concepts Covered

log replication, AppendEntries, log matching, consistency check, conflict resolution, commitment, majority, quorum, state machine, apply, determinism, election restriction, safety, up-to-date, majority replication, commitIndex, apply channel, committed entries, deterministic replay, no-op entry, leader change, uncommitted entries, snapshot, log compaction, InstallSnapshot RPC, state transfer, linearizability, network partition, Maelstrom, end-to-end correctness, Paxos, single-decree, Prepare, Promise, proposal number, Accept, Accepted, value selection, consensus, safety proof, invariant, chosen value, consensus immutability, Multi-Paxos, infinite log, Phase 1 skip, stable leader, Raft, message complexity, leader change cost, understandability, Byzantine fault, crash fault, malicious node, bit flip, CFT vs BFT, PBFT, pre-prepare, prepare, commit, three-phase protocol, equivocation, contradictory messages, evidence collection, Byzantine detection, fault threshold, 3f+1, impossibility result, Byzantine quorum, Tendermint, Cosmos, blockchain, voting rounds, lock mechanism

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.