Track 07: The Store

Advanced · Consensus · 15 tasks

Build a linearizable key-value store on top of your Raft implementation. Handle client requests, manage sessions, ensure read consistency, and implement client retry logic.

Subtracks & Tasks

Interview Prep

Common interview questions for Distributed Systems / Backend Engineer roles that map directly to what you build in this track.

How would you design a distributed key-value store that stays consistent when the network partitions?

Model Answer

Use a Raft-replicated state machine: all writes go through the leader, which replicates to a majority before acknowledging. Leader reads are linearizable provided the leader confirms it still holds leadership, either by committing a no-op through the log or via the ReadIndex protocol, which avoids the log round-trip. Partition tolerance: the majority partition continues serving both reads and writes; the minority partition becomes unavailable for writes (choosing CP in CAP terms). This is the etcd / CockroachDB model. For higher availability at the cost of consistency, use quorum reads and writes with W + R > N, in the style of the Dynamo paper.
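A minimal Python sketch of the leader's write path under these rules. `LeaderKVStore` and `replicate_to_followers` are hypothetical names, and the replication call is a synchronous stand-in for real AppendEntries RPCs:

```python
# Sketch: the leader's write path in a Raft-replicated KV store.
class LeaderKVStore:
    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.log = []            # replicated log of (key, value) entries
        self.state = {}          # the applied key-value state machine
        self.commit_index = -1

    def replicate_to_followers(self, entry) -> int:
        """Hypothetical stand-in for AppendEntries RPCs.
        Returns how many nodes (including the leader) stored the entry."""
        return self.cluster_size          # assume every follower acks in this sketch

    def put(self, key, value) -> str:
        entry = (key, value)
        self.log.append(entry)
        acks = self.replicate_to_followers(entry)
        if acks <= self.cluster_size // 2:   # no majority: cannot commit
            return "retry"                   # client retries, possibly elsewhere
        self.commit_index = len(self.log) - 1
        self.state[key] = value              # apply only after commit
        return "ok"                          # ack only after commit

store = LeaderKVStore(cluster_size=3)
print(store.put("x", 1))   # "ok" once a majority has the entry
```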

Why does W + R > N guarantee that a quorum read sees the latest write?

Model Answer

With N replicas, requiring W write acknowledgments and R read responses such that W + R > N ensures every read quorum intersects every write quorum by at least one node. Example: N=3, W=2, R=2 (2+2>3). A write must reach 2 of 3 nodes. A read queries 2 of 3 nodes. At least 1 node is in both sets — that node has the latest write, and a read that returns the highest version across R responses will return it. This does not guarantee linearizability (concurrent writes can still cause conflicts), but prevents reading a value older than the last quorum write.
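A toy Python demonstration of the overlap with N=3, W=2, R=2. The names `replicas`, `write`, and `read` are illustrative, and real systems track versions with vector clocks or hybrid timestamps rather than a bare integer:

```python
# W + R > N: every read quorum intersects every write quorum.
N, W, R = 3, 2, 2
replicas = [{} for _ in range(N)]   # each replica maps key -> (version, value)

def write(key, value, version):
    for node in replicas[:W]:        # reach any W of N replicas
        node[key] = (version, value)

def read(key):
    responses = [node.get(key) for node in replicas[N - R:]]   # any R of N
    return max(r for r in responses if r is not None)          # highest version wins

write("x", "old", version=1)
write("x", "new", version=2)
print(read("x"))   # (2, 'new'): the read quorum overlaps the write quorum
```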

How does DynamoDB handle concurrent writes to the same item?

Model Answer

DynamoDB uses conditional writes and optimistic locking via version numbers. Without a condition, the last writer wins (based on wall-clock time, which is susceptible to clock skew). With a conditional expression (e.g., version = expected_version), the write is rejected if the stored version does not match, forcing the client to re-read and retry. DynamoDB's original design used vector clocks (described in the 2007 Dynamo paper), but the production service moved to simpler conflict resolution because clients found vector clock reconciliation complex to implement correctly.
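A generic sketch of the conditional-write pattern in Python, deliberately not DynamoDB's actual API; `ConditionalStore` and `put_if_version` are hypothetical names for illustration:

```python
# Optimistic locking: a write succeeds only if the caller's expected
# version matches the stored version; otherwise re-read and retry.
class ConditionalStore:
    def __init__(self):
        self.items = {}   # key -> (version, value)

    def get(self, key):
        return self.items.get(key, (0, None))

    def put_if_version(self, key, value, expected_version) -> bool:
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False                             # condition failed: caller retries
        self.items[key] = (current_version + 1, value)
        return True

store = ConditionalStore()
version, _ = store.get("balance")
assert store.put_if_version("balance", 100, version)       # first writer wins
assert not store.put_if_version("balance", 200, version)   # stale version: rejected
```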

What is the difference between a key-value store and a distributed cache, and where does Redis fit?

Model Answer

A key-value store persists data durably and provides strong consistency guarantees (e.g., etcd, DynamoDB). A distributed cache stores data in memory for fast access, typically with eviction and optional durability. Redis can function as both: with AOF (append-only file) and/or RDB snapshots enabled, Redis provides durability, but its replication is asynchronous, so replicas can lag and its consistency model differs from systems like etcd. For critical data, use a durable store with replication; use Redis as a cache in front of it.
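A sketch of the resulting cache-aside pattern, with plain dicts standing in for Redis and the durable store; `TTL_SECONDS` and the invalidate-on-write policy are illustrative choices, not the only option:

```python
# Cache-aside: the durable store is the source of truth; the cache is a
# best-effort, expiring copy in front of it.
import time

durable_store = {"user:1": "alice"}   # stand-in for etcd / DynamoDB / etc.
cache = {}                            # stand-in for Redis: key -> (expires_at, value)
TTL_SECONDS = 60

def get(key):
    entry = cache.get(key)
    if entry and entry[0] > time.time():       # cache hit, not expired
        return entry[1]
    value = durable_store.get(key)             # miss: fall through to the store
    if value is not None:
        cache[key] = (time.time() + TTL_SECONDS, value)
    return value

def put(key, value):
    durable_store[key] = value    # write the source of truth first
    cache.pop(key, None)          # invalidate; the next read repopulates
```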

What is read repair and what problem does it solve?

Model Answer

Read repair happens when a quorum read returns different values from different replicas. The coordinator detects the inconsistency and sends a write to the stale replicas to bring them up to date; the repair piggybacks on normal read traffic. It solves replica divergence caused by temporary node failures: if a node was down during a write, it missed the update; the next quorum read that includes this node detects the divergence and repairs it. Cassandra also runs anti-entropy repair (Merkle tree-based) as a scheduled background process to catch divergence on keys that reads never touch.
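A minimal Python sketch of read repair, assuming replicas store `(version, value)` pairs and the highest version is taken as freshest (real systems use vector clocks or timestamps for this comparison):

```python
# Read repair: a quorum read detects divergent replicas and writes the
# freshest value back to the stale ones.
replicas = [
    {"x": (2, "new")},   # up to date
    {"x": (1, "old")},   # missed the last write (was down)
    {"x": (2, "new")},
]

def quorum_read_with_repair(key, r=2):
    contacted = replicas[:r]                            # any R replicas
    responses = [node.get(key) for node in contacted]
    freshest = max(resp for resp in responses if resp is not None)
    for node, resp in zip(contacted, responses):
        if resp != freshest:
            node[key] = freshest                        # repair the stale replica
    return freshest

print(quorum_read_with_repair("x"))   # (2, 'new'); replica 1 is now repaired
```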

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track, and exactly how to fix them.

Serving reads from a single node and returning stale data

Why it happens

A single-node read returns whatever that node has. If the latest write only reached a subset of nodes, a node that missed it returns the old value.

The fix

For linearizable reads, either route all reads through the leader (which already has the latest committed value) or perform a read quorum (W + R > N).

Letting concurrent writes to the same key race with no defined winner

Why it happens

Without a total order on writes (e.g., via a log), two concurrent writes to the same key have an undefined winner.

The fix

Linearize writes through the Raft log, or use a last-write-wins rule with a deterministic tie-breaker (e.g., physical timestamp with node ID as the tiebreak). If you need to detect conflicts rather than silently pick a winner, track causality with vector clocks instead.
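A sketch of the deterministic tie-breaker: every write carries `(timestamp, node_id)`, and every replica compares the same tuple, so all replicas converge on one winner even for identical timestamps. The names are illustrative:

```python
# Last-write-wins with a deterministic tie-breaker.
def lww_merge(a, b):
    """Each write is (timestamp, node_id, value); the larger tuple wins."""
    return a if (a[0], a[1]) >= (b[0], b[1]) else b

w1 = (1700000000.0, "node-a", "v1")
w2 = (1700000000.0, "node-b", "v2")   # same timestamp, different origin
print(lww_merge(w1, w2))               # node-b wins, on every replica
print(lww_merge(w2, w1))               # same winner regardless of argument order
```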

Acknowledging writes before they are durable

Why it happens

Acknowledging a write before it is persisted (to disk or a majority of nodes) violates durability.

The fix

Only reply `ok` after the entry is committed (replicated to a majority in Raft, or written to stable storage in simpler designs).

Letting deleted keys come back to life after a merge

Why it happens

In eventual-consistency designs, a simple "copy the newest value" merge reinstates deleted keys if the deleting node has not propagated its state everywhere.

The fix

Use tombstones: mark deleted keys with a special sentinel and a timestamp. Merge logic must recognize tombstones and never overwrite a tombstone with an older value.
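A minimal merge sketch with tombstones, assuming each replica stores `key -> (timestamp, value_or_TOMBSTONE)`; a real system would also eventually garbage-collect old tombstones, which this sketch omits:

```python
# A delete is recorded as (timestamp, TOMBSTONE) rather than removing the
# key, so an older live value can never win against a newer delete.
TOMBSTONE = object()   # sentinel marking a deleted key

def merge(local, remote):
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)   # newer entry wins, tombstone or not
    return merged

a = {"x": (5, TOMBSTONE)}   # deleted x at t=5
b = {"x": (3, "stale")}     # still holds the old value from t=3
print(merge(a, b)["x"][1] is TOMBSTONE)   # True: the delete survives the merge
```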

Scanning every key on every read

Why it happens

A linear scan over all keys on every read is only acceptable for toy sizes.

The fix

Use a `HashMap` / `dict` keyed by the key string for O(1) average-case reads. If range queries are needed, add a sorted index (e.g., `BTreeMap`).
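A sketch of the hybrid in Python: a `dict` for O(1) point reads plus a bisect-maintained sorted key list standing in for a `BTreeMap`-style index (insertion into the sorted list is O(n) here, which a real B-tree avoids):

```python
# dict for point lookups, sorted key list for range scans.
import bisect

class IndexedKV:
    def __init__(self):
        self.data = {}          # key -> value, O(1) average point lookups
        self.sorted_keys = []   # kept sorted for range scans

    def put(self, key, value):
        if key not in self.data:
            bisect.insort(self.sorted_keys, key)
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def range(self, lo, hi):
        """All (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self.sorted_keys, lo)
        end = bisect.bisect_left(self.sorted_keys, hi)
        return [(k, self.data[k]) for k in self.sorted_keys[start:end]]

kv = IndexedKV()
for k in ["banana", "apple", "cherry"]:
    kv.put(k, len(k))
print(kv.range("a", "c"))   # [('apple', 5), ('banana', 6)]
```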

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

| Dimension | B-Tree | LSM-Tree |
| --- | --- | --- |
| Write pattern | In-place updates (random I/O) | Append-only writes (sequential I/O) |
| Write amplification | Low (one disk write per page update) | High (data rewritten multiple times during compaction) |
| Read amplification | Low (O(log n) page reads per lookup) | High (must check memtable + multiple SSTable levels) |
| Space amplification | Low (updates are in-place) | High (stale versions exist until compaction) |
| Write throughput | Limited by random I/O | Very high (sequential writes) |
| Crash recovery | WAL (write-ahead log) required | Memtable WAL required; SSTables are immutable |
| Used in | PostgreSQL, MySQL InnoDB, SQLite | LevelDB, RocksDB, Cassandra, HBase |

Verdict: B-Tree for read-heavy workloads with random access. LSM-Tree for write-heavy workloads where sequential throughput matters.
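To make the read-amplification row concrete, here is a toy LSM write/read path in Python: a memtable that flushes into immutable sorted runs, with reads checking the memtable first and then each run newest-first. There is no WAL or compaction, and the constants and names are illustrative:

```python
# Toy LSM-tree: writes hit the memtable; flushes create immutable runs.
MEMTABLE_LIMIT = 2

memtable = {}
sstables = []   # immutable sorted runs, newest last

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(dict(sorted(memtable.items())))   # sequential flush
        memtable.clear()

def get(key):
    if key in memtable:              # 1. newest data lives in the memtable
        return memtable[key]
    for run in reversed(sstables):   # 2. then each run, newest first
        if key in run:
            return run[key]
    return None                      # read amplification: many places to check

put("a", 1); put("b", 2)   # flush happens here
put("a", 3)                # newer "a" shadows the flushed version
print(get("a"))            # 3: found in the memtable before the older SSTable
```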

Concepts Covered

key-value API, operations, leader routing, redirect, client sessions, linearizable reads, read index, lease, idempotency, deduplication, snapshot, log compaction, recovery, heartbeat confirmation, leader lease, lease reads, clock assumption, zero network round trips, stale read risk, follower reads, bounded staleness, read scalability, consistency tradeoff, read-your-writes, session consistency, commit index tracking, client token, throughput benchmark, read/write ratio, latency comparison, scalability, multi-key transaction, atomic batch, log entry, all-or-nothing, OCC, version check, conflict detection, abort and retry, MVCC, versioned storage, snapshot isolation, readers never block writers, TiKV, regions, range partitioning, Raft per region, multi-region, contention, abort rate, throughput, OCC vs MVCC, hot key

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.