Subtracks & Tasks
Linearizable Key-Value Store
Implement Key-Value Interface
Implement the key-value store interface on top of Raft: 1. GET(key) - Return current value or null 2. PUT(key, value) - Set key to value 3. CAS(key, ...
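A minimal sketch of the state-machine side of this interface, assuming a Raft layer that delivers committed entries to an `apply` method in log order (the entry shape and method names here are assumptions, not a fixed API):

```python
class KVStateMachine:
    """Applies committed Raft entries to an in-memory map.

    Assumes the Raft layer calls apply() once per committed entry, in log order.
    """

    def __init__(self):
        self.data = {}

    def apply(self, entry):
        op = entry["op"]
        if op == "PUT":
            self.data[entry["key"]] = entry["value"]
            return {"ok": True}
        if op == "CAS":
            current = self.data.get(entry["key"])
            if current == entry["expected"]:
                self.data[entry["key"]] = entry["value"]
                return {"ok": True, "swapped": True}
            return {"ok": True, "swapped": False, "current": current}
        if op == "GET":
            # GET only flows through the log if reads are logged;
            # see the read-consistency task below for cheaper options.
            return {"ok": True, "value": self.data.get(entry["key"])}
        raise ValueError(f"unknown op: {op}")
```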
Handle Client Request Routing
Route client requests to the leader: 1. Client sends request to any node 2. If node is leader, process request 3. If not leader, return redirect with...
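A sketch of the routing step, assuming the node object exposes a `role`, a `leader_hint` learned from AppendEntries, and a blocking `propose()` (all hypothetical names):

```python
def handle_client_request(node, request):
    """Process the request if this node is the leader, otherwise redirect."""
    if node.role != "leader":
        # Followers learn the current leader from AppendEntries RPCs;
        # return a hint so the client can retry against the leader directly.
        return {"ok": False, "error": "not_leader", "leader": node.leader_hint}
    # Leader path: append to the log and wait for commit before replying.
    result = node.propose(request)
    return {"ok": True, "result": result}
```

If the follower has no leader hint yet (e.g., during an election), returning a retryable error and letting the client back off is a reasonable fallback.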
Ensure Read Consistency
Implement linearizable reads: Option 1: Log reads (simple but slow) - Treat reads as log entries, wait for commit Option 2: ReadIndex (Raft optimiza...
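Option 1 is nearly a one-liner once writes work, which makes it a good first milestone; a sketch assuming a blocking `propose()`:

```python
def linearizable_read_via_log(node, key):
    """Option 1: log the read. Simple and correct, but every read pays a
    full replication round trip before it can be served."""
    return node.propose({"op": "GET", "key": key})
```

Option 2 (ReadIndex) is covered in the read-optimization subtrack below.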
Handle Client Retry and Deduplication
Handle client retries without duplicate execution: 1. Client assigns sequence number to each request 2. Server tracks (client_id -> latest_seq, respo...
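A sketch of the deduplication table, assuming each request carries a `client_id` and a monotonically increasing `seq` (the naming is an assumption):

```python
class SessionTable:
    """Tracks the latest sequence number and cached response per client."""

    def __init__(self):
        self.sessions = {}  # client_id -> (latest_seq, cached_response)

    def seen(self, client_id, seq):
        """Return (True, cached_response) if this request was already applied."""
        latest, response = self.sessions.get(client_id, (-1, None))
        if seq <= latest:
            return True, response   # duplicate: reply without re-applying
        return False, None

    def record(self, client_id, seq, response):
        self.sessions[client_id] = (seq, response)
```

The table has to live inside the replicated state machine (and be included in snapshots) so that deduplication still works after a leader change.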
Implement Log Compaction with Snapshots
Implement log compaction with snapshots: 1. Periodically snapshot state machine state 2. Record snapshot index and term 3. Discard log entries before...
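A sketch of the snapshot step, assuming the state machine's data is JSON-serializable, the dedup table lives on it as `sessions`, and the log exposes a `truncate_prefix` method (all hypothetical names):

```python
import json

def take_snapshot(state_machine, raft_log, last_applied_index, last_applied_term, path):
    """Persist the state machine plus the index/term it covers, then drop
    the covered log prefix."""
    snapshot = {
        "last_included_index": last_applied_index,
        "last_included_term": last_applied_term,
        "data": state_machine.data,
        "sessions": state_machine.sessions,  # client dedup table is state too
    }
    with open(path, "w") as f:
        json.dump(snapshot, f)
    # Only discard log entries once the snapshot is durably on disk.
    raft_log.truncate_prefix(last_applied_index)
```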
Read Optimization
Implement Read Index for Linearizable Reads
Implement the "read index" optimization: the leader records the current commit index, confirms it still leads via heartbeats, then serves the read. Li...
Implement Lease-Based Reads
Implement lease-based reads: the leader uses its active lease to serve reads without network round trips. Document the clock assumption required. ```...
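A sketch under the bounded-clock-drift assumption, treating `lease_expiry` as a field the leader refreshes whenever a heartbeat round reaches a majority (the names are assumptions):

```python
import time

def lease_read(node, key, clock_skew_bound=0.01):
    """Serve a read purely locally while the leader's lease is valid.

    Safety rests on a clock assumption: drift between nodes is bounded, so
    the lease expires on this node's clock before any other node could be
    elected leader and commit new writes.
    """
    if time.monotonic() + clock_skew_bound < node.lease_expiry:
        # Lease still valid: no network round trip at all.
        return {"ok": True, "value": node.state_machine.data.get(key)}
    # Lease expired (or too close to call): fall back to ReadIndex or reject.
    return {"ok": False, "error": "lease_expired"}
```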
Add Follower Reads with Bounded Staleness
Implement follower reads with bounded staleness. Clients can opt to read from any follower if they accept data up to T seconds stale. This scales read...
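A coarse sketch: the follower tracks when it last heard from the leader and refuses reads if that gap exceeds the client's staleness budget (a stricter design would bound staleness by the commit timestamp of the applied data instead):

```python
import time

def follower_read(node, key, max_staleness_secs):
    """Serve a read locally if this follower's view is at most T seconds stale."""
    staleness = time.monotonic() - node.last_leader_contact
    if staleness > max_staleness_secs:
        return {"ok": False, "error": "too_stale", "staleness": staleness}
    return {"ok": True, "value": node.state_machine.data.get(key), "staleness": staleness}
```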
Guarantee Read-Your-Writes with Follower Reads
Ensure read-your-writes consistency even when using follower reads. Clients send their `last_write_index` with each read. Followers only serve if they...
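A sketch of the follower-side check, assuming the write path returned the commit index of the client's last write:

```python
def follower_read_ryw(node, key, last_write_index):
    """Serve the read only if this follower has applied the client's last write."""
    if node.last_applied < last_write_index:
        # Not caught up yet: wait briefly for replication or redirect to the leader.
        return {"ok": False, "error": "not_caught_up", "applied": node.last_applied}
    return {"ok": True, "value": node.state_machine.data.get(key)}
```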
Benchmark Read Strategies Under Mixed Workload
Benchmark three read strategies under an 80% read / 20% write workload. Measure throughput and latency for each. ```json Request: {"type": "read_ben...
Transactions on Raft
Implement Multi-Key Transactions as Atomic Log Entries
Implement multi-key transactions. A transaction is a batch of operations committed as a single log entry for atomicity. ```json Request: {"type": "t...
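A sketch of the apply side: because the state machine processes one committed entry at a time, applying every operation in the batch inside a single entry is automatically atomic with respect to other entries (the operation names are assumptions):

```python
def apply_txn(state_machine, txn_entry):
    """Apply a batched transaction entry. Entries apply one at a time, so no
    other entry's operations interleave with this batch."""
    results = []
    for op in txn_entry["ops"]:
        if op["type"] == "put":
            state_machine.data[op["key"]] = op["value"]
            results.append({"ok": True})
        elif op["type"] == "delete":
            state_machine.data.pop(op["key"], None)
            results.append({"ok": True})
        elif op["type"] == "get":
            results.append({"ok": True, "value": state_machine.data.get(op["key"])})
        else:
            results.append({"ok": False, "error": "unknown_op"})
    return results
```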
Implement Optimistic Concurrency Control
Implement optimistic concurrency control (OCC). Read keys with version tracking, then commit only if no versions changed since the read. ```json Requ...
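A sketch of the validation step at apply time, assuming the state machine keeps a per-key version counter and the commit entry carries the versions the client read (`read_set`) and the values it wants to write (`write_set`):

```python
def apply_occ_commit(state_machine, txn):
    """Commit only if no key in the read set changed since the client read it;
    otherwise abort so the client re-reads and retries."""
    for key, read_version in txn["read_set"].items():
        if state_machine.versions.get(key, 0) != read_version:
            return {"ok": False, "error": "conflict", "key": key}
    for key, value in txn["write_set"].items():
        state_machine.data[key] = value
        state_machine.versions[key] = state_machine.versions.get(key, 0) + 1
    return {"ok": True}
```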
Implement Multi-Version Concurrency Control
Implement MVCC: keep multiple versions of each key. Readers get a consistent snapshot without blocking writers. ```json Request: {"type": "mvcc_put"...
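A minimal sketch of version-chain storage, assuming a single monotonically increasing commit version assigned at apply time:

```python
class MVCCStore:
    """Each key maps to an append-only list of (version, value) pairs.
    Readers pick the newest version at or below their snapshot, so they
    never block writers and always see a consistent point-in-time view."""

    def __init__(self):
        self.history = {}   # key -> [(version, value), ...] in ascending order
        self.latest = 0     # last committed version

    def put(self, key, value):
        self.latest += 1
        self.history.setdefault(key, []).append((self.latest, value))
        return self.latest

    def snapshot(self):
        return self.latest  # a read transaction captures this once

    def get(self, key, snapshot):
        result = None
        for version, value in self.history.get(key, []):
            if version > snapshot:
                break
            result = value
        return result
```

Old versions eventually need garbage collection once no snapshot can still reach them, which ties back to the compaction task above.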
Build a Mini TiKV with Raft + MVCC Regions
Build a mini version of TiKV: partition the key space into regions, each with its own Raft group and MVCC storage. ```json Request: {"type": "region...
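A sketch of the routing layer only, with region boundaries as sorted split keys; the per-region Raft group and MVCC store are whatever you built in the earlier tasks (the names and shapes here are assumptions, not TiKV's actual API):

```python
import bisect

class RegionRouter:
    """Maps a key to the Raft group that owns its region.
    split_keys = ["g", "p"] defines regions (-inf,"g"), ["g","p"), ["p",+inf)."""

    def __init__(self, split_keys, raft_groups):
        assert len(raft_groups) == len(split_keys) + 1
        self.split_keys = split_keys
        self.raft_groups = raft_groups

    def group_for(self, key):
        return self.raft_groups[bisect.bisect_right(self.split_keys, key)]
```

Cross-region transactions then need an extra coordination layer (TiKV's transaction model is based on Google's Percolator-style two-phase commit), which makes a good stretch goal.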
Benchmark Contended Key Under OCC vs MVCC
Benchmark a contended-key scenario: 100 clients all updating the same key simultaneously. Compare OCC vs MVCC in terms of abort rate and throughput. ...
Interview Prep
Common interview questions for Distributed Systems / Backend Engineer roles that map directly to what you build in this track.
Model Answer
Use a Raft-replicated state machine: all writes go through the leader, which replicates to a majority before acknowledging. Reads served by the leader are linearizable provided the leader first confirms it still holds leadership (the ReadIndex protocol does exactly this without a full log round trip). Partition tolerance: the majority partition continues serving both reads and writes; the minority partition becomes unavailable for writes (choosing CP in CAP terms). This is the etcd / CockroachDB model. For higher availability at the cost of consistency, use leaderless quorum reads/writes (Dynamo style) where W + R > N.
Model Answer
With N replicas, requiring W write acknowledgments and R read responses such that W + R > N ensures every read quorum intersects every write quorum in at least one node. Example: N=3, W=2, R=2 (2+2>3). A write must reach 2 of 3 nodes. A read queries 2 of 3 nodes. At least 1 node is in both sets — that node has the latest write, and a read that returns the highest version across R responses will return it. This does not guarantee linearizability (concurrent writes can still cause conflicts), but prevents reading a value older than the last quorum write.
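The read-side rule follows directly: collect R responses and return the value carried by the highest version. A tiny sketch, assuming each response is tagged with a version (a counter or timestamp):

```python
def resolve_quorum_read(responses, r):
    """Given at least R version-tagged responses, return the newest value.
    Because W + R > N, at least one response comes from a replica that
    saw the last successful quorum write."""
    if len(responses) < r:
        raise RuntimeError("quorum not reached")
    return max(responses, key=lambda resp: resp["version"])["value"]
```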
Model Answer
DynamoDB uses conditional writes and optimistic locking via version numbers. Without a condition, the last writer wins (based on wall clock, which is susceptible to clock skew). With conditional expressions (e.g., version = expected_version), write is rejected if the version does not match, forcing the client to re-read and retry. DynamoDB's original design used vector clocks (described in the 2007 Dynamo paper), but the production service moved to simpler conflict resolution because clients found vector clock reconciliation complex to implement correctly.
Model Answer
A key-value store persists data durably and provides strong consistency guarantees (e.g., etcd, DynamoDB). A distributed cache stores data in memory for fast access, typically with eviction and optional durability. Redis can function as both: with AOF (append-only file) and/or RDB snapshots enabled, Redis provides durability, but its replication is asynchronous and eventually consistent, so acknowledged writes can be lost on failover, which is a weaker consistency model than systems like etcd. For critical data, use a durable store with replication; use Redis as a cache in front of it.
Model Answer
Read repair happens when a quorum read returns different values from different replicas. The coordinator detects the inconsistency and sends a write to the stale replicas to bring them up to date. This is background repair triggered by read traffic. It solves replica divergence caused by temporary node failures: if a node was down during a write, it missed the update; the next quorum read that includes this node detects the divergence and repairs it. Cassandra also runs anti-entropy repair (Merkle tree-based) as a scheduled background process to catch divergence that is not detected by reads.
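A sketch of the read-path repair step, with `get`/`put` standing in for whatever replica interface you expose (hypothetical names):

```python
def quorum_read_with_repair(replicas, key, r):
    """Read from R replicas, return the newest version, and write it back
    to any replica that returned a stale or missing copy."""
    polled = [(rep, rep.get(key)) for rep in replicas[:r]]
    found = [resp for _, resp in polled if resp is not None]
    if not found:
        return None
    newest = max(found, key=lambda resp: resp["version"])
    for rep, resp in polled:
        if resp is None or resp["version"] < newest["version"]:
            rep.put(key, newest)   # repair the stale replica
    return newest["value"]
```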
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them.
Why it happens
A single-node read returns whatever that node has. If the latest write only reached a subset of nodes, a node that missed it returns the old value.
The fix
For linearizable reads, either route all reads through the leader (which already has the latest committed value; use ReadIndex to confirm it is still the leader) or perform a read quorum (R + W > N).
Why it happens
Without a total order on writes (e.g., via a log), two concurrent writes to the same key have an undefined winner.
The fix
Linearize writes through the Raft log, or use a last-write-wins rule with a deterministic tie-breaker (e.g., physical timestamp with node ID as tiebreak), or track causality with vector clocks and surface concurrent writes as conflicts.
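A sketch of the deterministic tie-break, assuming each write carries a timestamp and the writer's node ID:

```python
def lww_winner(a, b):
    """Pick a winner by (timestamp, node_id) so every replica resolves the
    same pair of concurrent writes to the same value."""
    return a if (a["ts"], a["node_id"]) >= (b["ts"], b["node_id"]) else b
```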
Why it happens
Acknowledging a write before it is persisted (to disk or a majority of nodes) violates durability.
The fix
Only reply `ok` after the entry is committed (replicated to a majority in Raft, or written to stable storage in simpler designs).
Why it happens
In eventual-consistency designs, deleting a key by simply removing it leaves no record of the delete, so a naive "copy the newest value" merge reinstates the key from any replica that still holds the old value.
The fix
Use tombstones: mark deleted keys with a special sentinel and a timestamp. Merge logic must recognize tombstones and never overwrite a tombstone with an older value.
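A sketch of a tombstone-aware merge, assuming entries are kept as dicts with a timestamp and a `deleted` flag (a delete writes a tombstone entry instead of removing the key):

```python
def merge_entry(local, remote):
    """Newest timestamp wins; because deletes are recorded as tombstone
    entries, an older live value can never overwrite a newer tombstone."""
    if local is None:
        return remote
    if remote is None:
        return local
    return local if local["ts"] >= remote["ts"] else remote

# Example entries:
#   live value:  {"value": "v2", "ts": 17, "deleted": False}
#   tombstone:   {"value": None, "ts": 23, "deleted": True}
```

Tombstones also need eventual garbage collection (e.g., drop them once every replica has seen them), otherwise deleted keys accumulate forever.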
Why it happens
A linear scan over all keys on every read is only acceptable for toy sizes.
The fix
Use a `HashMap` / `dict` keyed by the key string for O(1) average-case reads. If range queries are needed, add a sorted index (e.g., `BTreeMap`).
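A sketch combining both, assuming string keys: a dict for point lookups plus a sorted key list maintained with `bisect` for range scans (a stand-in for a BTreeMap-style index):

```python
import bisect

class IndexedKV:
    """Hash-style dict for O(1) average point reads, plus a sorted key
    list for range queries."""

    def __init__(self):
        self.data = {}
        self.sorted_keys = []

    def put(self, key, value):
        if key not in self.data:
            bisect.insort(self.sorted_keys, key)
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, end)
        return [(k, self.data[k]) for k in self.sorted_keys[lo:hi]]
```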
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
| Dimension | B-Tree | LSM-Tree |
|---|---|---|
| Write pattern | In-place updates (random I/O) | Append-only writes (sequential I/O) |
| Write amplification | Low (one disk write per page update) | High (data written multiple times during compaction) |
| Read amplification | Low (a handful of page reads per lookup, thanks to high fan-out) | High (must check memtable + multiple SSTable levels) |
| Space amplification | Low (updates are in-place) | High (stale versions exist until compaction) |
| Write throughput | Limited by random I/O | Very high (sequential writes) |
| Crash recovery | WAL (Write-Ahead Log) required | Memtable WAL required; SSTables are immutable |
| Used in | PostgreSQL, MySQL InnoDB, SQLite | LevelDB, RocksDB, Cassandra, HBase |
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.