Subtracks & Tasks
Advanced Paradigms
Implement MapReduce
Implement MapReduce: Map emits (key, value) pairs, shuffle groups by key, Reduce aggregates. Build word count as the worked example.
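A minimal single-process sketch of the three phases over toy in-memory data (all function names here are illustrative, not from any framework):

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit one (word, 1) pair per token.
    for word in text.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return (key, sum(values))

docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts["the"])  # 3
```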
Build Distributed Hash Table (Chord)
Build a Chord DHT: nodes sit on an identifier ring and route via finger tables, achieving O(log n) lookups in a P2P network.
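A toy sketch of finger-table routing, assuming a 5-bit identifier ring and fingers precomputed from a global view (real Chord builds them incrementally via join and stabilization):

```python
M = 5                      # identifier bits; toy ring of size 2**M = 32
RING = 2 ** M

def in_interval(x, a, b):
    # True if x lies in the half-open ring interval (a, b].
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps past zero

class Node:
    def __init__(self, node_id, ring_ids):
        self.id = node_id
        sorted_ids = sorted(ring_ids)
        def successor(k):
            k %= RING
            return next((n for n in sorted_ids if n >= k), sorted_ids[0])
        # finger[i] = successor(id + 2**i)
        self.fingers = [successor(self.id + 2 ** i) for i in range(M)]

    def find_successor(self, key, nodes):
        # O(log n) routing: forward to the closest preceding finger
        # until the key falls between us and our immediate successor.
        if in_interval(key, self.id, self.fingers[0]):
            return self.fingers[0]
        for f in reversed(self.fingers):
            if f != self.id and in_interval(f, self.id, key):
                return nodes[f].find_successor(key, nodes)
        return self.fingers[0]

ids = [1, 8, 14, 21, 28]
nodes = {i: Node(i, ids) for i in ids}
print(nodes[1].find_successor(26, nodes))  # key 26 lives on node 28
```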
Implement Byzantine Fault Tolerance
Implement PBFT: tolerates f Byzantine faults with 3f+1 nodes. The protocol runs three phases: pre-prepare, prepare, commit.
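A sketch of the quorum arithmetic behind those numbers, tracking one request through the phases (view changes, checkpoints, and message authentication are all elided; the class shape is hypothetical):

```python
# With n = 3f + 1 replicas, any two quorums of size 2f + 1 overlap in
# at least f + 1 replicas, so every pair of quorums shares at least one
# honest replica -- this is why 3f + 1 nodes tolerate f Byzantine faults.
f = 1
n = 3 * f + 1        # 4 replicas
quorum = 2 * f + 1   # 3 matching messages

class Replica:
    def __init__(self):
        self.prepare_votes = set()
        self.commit_votes = set()
        self.prepared = False
        self.committed = False

    def on_prepare(self, sender):
        # "Prepared" = pre-prepare from the primary plus 2f matching
        # prepares from distinct replicas.
        self.prepare_votes.add(sender)
        if len(self.prepare_votes) >= 2 * f:
            self.prepared = True

    def on_commit(self, sender):
        # "Committed-local" = prepared plus 2f + 1 matching commits.
        self.commit_votes.add(sender)
        if self.prepared and len(self.commit_votes) >= quorum:
            self.committed = True

r = Replica()
for sender in ("r1", "r2"):           # 2f prepares (after the pre-prepare)
    r.on_prepare(sender)
for sender in ("r0", "r1", "r2"):     # 2f + 1 commits
    r.on_commit(sender)
print(r.committed)  # True
```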
Build Stream Processing Pipeline
Build a stream processor with windowing: support tumbling and sliding windows with event-time processing.
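One way to sketch window assignment by event time (the function names and the integer-timestamp convention are assumptions, not taken from any particular engine):

```python
def tumbling_window(event_ts, size):
    # Each event falls in exactly one window: [start, start + size).
    start = event_ts - (event_ts % size)
    return [(start, start + size)]

def sliding_windows(event_ts, size, slide):
    # With slide < size, windows overlap and an event maps to
    # size/slide of them.
    windows = []
    start = event_ts - (event_ts % slide)
    while start > event_ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

print(tumbling_window(event_ts=7, size=5))            # [(5, 10)]
print(sliding_windows(event_ts=7, size=10, slide=5))  # [(0, 10), (5, 15)]
```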
Implement CRDTs
Build CRDTs for conflict-free replication: G-Counter (grow-only counter), G-Set (grow-only set), and OR-Set (observed-remove set).
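A minimal G-Counter sketch showing the merge property that makes CRDTs conflict-free:

```python
class GCounter:
    # Grow-only counter: one slot per replica; merge = element-wise max.
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self):
        self.counts[self.id] += 1     # only ever touch our own slot

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        # Commutative, associative, idempotent -> replicas converge
        # regardless of delivery order or duplicated messages.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment(); b.increment()
a.merge(b); b.merge(a)
print(a.value(), b.value())  # 3 3 -- both replicas converge
```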
Interview Prep
Common interview questions for Distributed Systems Engineer roles that map directly to what you build in this track.
Question
Can you rely on wall-clock timestamps to order events across machines?
Model Answer
Not reliably. Physical clocks drift and NTP corrections can cause non-monotonic updates. Clock skew of tens of milliseconds is normal. For events within the skew window, timestamps do not establish ordering. Use Lamport clocks (logical clocks) if you need causal ordering: if A causally precedes B, then L(A) < L(B). For tracking concurrent events, use vector clocks. For a global total order close to real time, use Hybrid Logical Clocks (max of wall clock and received HLC) as CockroachDB does, or TrueTime as Google Spanner does.
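To make the vector-clock point concrete, a minimal comparison function (a sketch assuming equal-length vectors indexed by node):

```python
def vc_compare(a, b):
    # Compare two vector clocks: 'before', 'after', 'equal', or 'concurrent'.
    less = any(x < y for x, y in zip(a, b))
    greater = any(x > y for x, y in zip(a, b))
    if less and greater:
        return "concurrent"   # neither dominates: no causal order exists
    if less:
        return "before"
    if greater:
        return "after"
    return "equal"

# Two writes that neither saw the other -- wall clocks cannot order these.
print(vc_compare([2, 0, 1], [1, 1, 1]))  # concurrent
print(vc_compare([1, 0, 0], [2, 1, 0]))  # before
```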
Question
What is a Lamport clock, and what does it guarantee?
Model Answer
A Lamport clock is a logical counter. Rule: on send, increment and attach. On receive, set to max(local, received) + 1. Guarantee: if A happens-before B, then L(A) < L(B). Does NOT guarantee the converse: if L(A) < L(B), A may or may not have happened before B (they might be concurrent). Lamport clocks establish a total order consistent with causality, but cannot distinguish concurrent events from causally ordered ones.
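A direct translation of those rules into code (a sketch, not a production clock):

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # On send: increment, then attach the timestamp to the message.
        return self.tick()

    def receive(self, msg_time):
        # On receive: jump past everything causally before this message.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()         # a: 1
print(b.receive(t))  # b: 2 -- guarantees L(send) < L(receive)
```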
Question
How would you establish the ordering of events across services without synchronized clocks?
Model Answer
Options: (1) Vector clocks: each service maintains a vector of counters; causality is fully tracked but vectors are O(N) in size. (2) Hybrid Logical Clocks (HLC): each service tracks max(wallClock, lastSeenHLC). Clock values are bounded to wall time within the system's max clock skew. Events are ordered by HLC; concurrent events have indeterminate order. (3) Distributed tracing with causal metadata (propagate trace IDs and parent span IDs through RPC calls — this is what OpenTelemetry does). Option 2 is the practical choice for most systems; option 1 for systems requiring full causality tracking; option 3 for debugging.
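A sketch of the HLC update rules, assuming millisecond wall clocks (the `wall` parameter exists only to make the example deterministic):

```python
import time

class HLC:
    # Hybrid Logical Clock: a (physical, logical) pair.
    def __init__(self):
        self.l = 0   # highest physical time seen so far
        self.c = 0   # logical counter breaking ties within one tick

    def now(self, wall=None):
        # Local or send event.
        pt = wall if wall is not None else int(time.time() * 1000)
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, msg, wall=None):
        # Receive event: fold in the sender's HLC.
        pt = wall if wall is not None else int(time.time() * 1000)
        ml, mc = msg
        if pt > self.l and pt > ml:
            self.l, self.c = pt, 0
        elif ml > self.l:
            self.l, self.c = ml, mc + 1
        elif self.l > ml:
            self.c += 1
        else:  # self.l == ml
            self.c = max(self.c, mc) + 1
        return (self.l, self.c)

clock = HLC()
print(clock.now(wall=1000))                # (1000, 0)
print(clock.update((1005, 3), wall=1000))  # (1005, 4) -- sender was ahead
```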
Question
What is TrueTime, and how does Spanner use it?
Model Answer
TrueTime is Google's API for the current time, implemented with GPS receivers and atomic clocks in every datacenter. It returns an interval [earliest, latest] where the true current time is guaranteed to be. The uncertainty is typically 1-7ms. Spanner uses it for commit wait: after the leader decides a commit timestamp T, it waits until TrueTime.now().earliest > T before releasing the transaction. This ensures that any subsequent transaction starting anywhere will have a TrueTime range that begins after T, providing external consistency. No other production system has this — CockroachDB approximates it with bounded clock synchronization and conservative uncertainty estimates.
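A toy model of commit-wait, assuming a fixed ε rather than TrueTime's measured uncertainty:

```python
import time

EPSILON = 0.007  # assumed fixed 7 ms uncertainty (real TrueTime measures it)

def tt_now():
    # Return an (earliest, latest) interval guaranteed to contain true time.
    now = time.time()
    return (now - EPSILON, now + EPSILON)

def commit_wait(commit_ts):
    # Spanner's rule: hold the transaction until the commit timestamp is
    # guaranteed to be in the past on every node.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)

# Pick a commit timestamp no earlier than any time that might be "now",
# then wait out the uncertainty before making the commit visible.
commit_ts = tt_now()[1]
commit_wait(commit_ts)
print("commit visible; every node's earliest-possible-now is past", commit_ts)
```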
Question
Logs aggregated from multiple services appear out of order. What is the root cause, and how do you fix it?
Model Answer
Root cause: physical timestamps from different machines are used to sort log entries, but the machines have clock skew. An event on a fast-clock machine at T may appear before an event on a slow-clock machine at T-5ms even though the slow-clock event causally preceded it. Fix: (1) Use logical timestamps (Lamport/vector clocks) instead of physical time for ordering events in logs. (2) Propagate a monotonic request ID through all service calls and sort by causal chain rather than time. (3) Use a centralized log aggregation system (e.g., Kafka) where producers attach their HLC timestamp, and the aggregator uses the highest received timestamp as a lower bound for ordering.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them.
The mistake
Using wall-clock timestamps to order events across nodes.
Why it happens
NTP-synchronised clocks still drift by tens of milliseconds. Two events that occur "at the same time" on different nodes have no reliable wall-clock ordering.
The fix
Use Lamport timestamps or vector clocks for causal ordering. Only use wall clocks for human-readable logs, not for ordering distributed events.
The mistake
Simulating TrueTime with a zero uncertainty interval.
Why it happens
Spanner's commit-wait depends on the TrueTime uncertainty interval (ε). If you simulate TrueTime with `time.now()` and zero uncertainty, you break the invariant that a commit timestamp is truly in the past by the time the transaction's effects become visible.
The fix
Model the uncertainty interval explicitly. A transaction's commit timestamp must be at least `TT.now().latest` at the moment it is chosen — commit, then wait until `TT.now().earliest > commit_ts` before releasing.
The mistake
Treating a Paxos Promise as valid beyond its ballot number.
Why it happens
A Prepare ACK is a promise only for the specific ballot number used in that Prepare. A higher-ballot Prepare from another proposer supersedes it.
The fix
Acceptors must track the highest ballot they have promised (max_prepared). A Promise is only valid for the exact ballot of the Prepare that triggered it. Any higher ballot invalidates it.
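A sketch of an acceptor that enforces this (the message shapes are hypothetical):

```python
class Acceptor:
    def __init__(self):
        self.max_prepared = -1     # highest ballot we have promised
        self.accepted = None       # (ballot, value) or None

    def on_prepare(self, ballot):
        if ballot > self.max_prepared:
            self.max_prepared = ballot
            # Promise: reject anything below `ballot` from now on, and
            # report what (if anything) we already accepted so the new
            # proposer is forced to re-propose that value.
            return ("promise", ballot, self.accepted)
        return ("reject", self.max_prepared, None)

a = Acceptor()
print(a.on_prepare(5))  # ('promise', 5, None)
print(a.on_prepare(9))  # ('promise', 9, None) -- supersedes ballot 5
print(a.on_prepare(7))  # ('reject', 9, None)  -- the promise to 5 is gone
```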
The mistake
Keeping acceptor state only in memory.
Why it happens
If an acceptor forgets what it accepted in a slot, a new leader may choose a different value for that slot, breaking the "chosen value never changes" invariant.
The fix
Acceptors must persist (ballot, accepted_value) per slot to stable storage before sending the Accepted message. Recovering nodes must replay from persistent state.
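A sketch of the accept path with durability before acknowledgement (the file layout is hypothetical; a real implementation would use a write-ahead log):

```python
import json, os

class DurableAcceptor:
    def __init__(self, path="acceptor_state.json"):  # hypothetical state file
        self.path = path
        self.state = {"max_prepared": -1, "accepted": None}
        if os.path.exists(path):                      # crash recovery: replay
            with open(path) as f:
                self.state = json.load(f)

    def _persist(self):
        # Write and fsync BEFORE replying -- the reply is a promise that
        # must survive a crash.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def on_accept(self, ballot, value):
        if ballot >= self.state["max_prepared"]:
            self.state["max_prepared"] = ballot
            self.state["accepted"] = [ballot, value]
            self._persist()                 # durable first...
            return ("accepted", ballot)     # ...then acknowledge
        return ("reject", self.state["max_prepared"])

acc = DurableAcceptor()
print(acc.on_accept(3, "x"))  # ('accepted', 3) -- state is now on disk
```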
The mistake
Committing prior-term entries directly after winning a Raft election.
Why it happens
Raft requires a leader to commit at least one entry from its own term before committing entries from prior terms. Without committing a no-op, the leader cannot safely advance `commitIndex` for old entries.
The fix
Immediately after election, append and replicate a no-op log entry in the new term. Once it commits, all previous entries are also safe to apply.
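A sketch of that rule, with replication and quorum counting elided (the class shape is illustrative):

```python
class Leader:
    # Log entries are (term, command) pairs.
    def __init__(self, current_term, log, commit_index):
        self.term = current_term
        self.log = log                  # [(term, command), ...]
        self.commit_index = commit_index

    def on_elected(self):
        # First act as leader: append a no-op in OUR term. commit_index
        # may only advance via an entry from the current term.
        self.log.append((self.term, "no-op"))

    def on_noop_replicated_to_majority(self):
        # Committing the no-op commits everything before it too (Log
        # Matching Property), including prior-term entries.
        self.commit_index = len(self.log) - 1

leader = Leader(current_term=5, log=[(3, "x=1"), (4, "x=2")], commit_index=0)
leader.on_elected()
leader.on_noop_replicated_to_majority()
print(leader.commit_index)  # 2 -- prior-term entries now safely committed
```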
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
Lamport Timestamps vs Vector Clocks vs TrueTime

| Dimension | Lamport Timestamps | Vector Clocks | TrueTime |
|---|---|---|---|
| What it captures | Happens-before partial order (scalar) | Happens-before partial order (full causality) | Real-time bounds with uncertainty interval |
| Size | O(1) — one integer | O(n) — one integer per node | O(1) — earliest/latest interval |
| Detects concurrency | No — concurrent events may appear ordered | Yes — concurrent if neither dominates | No — uses real-time intervals |
| Requires special hardware | No | No | Yes — GPS + atomic clocks (Google-specific) |
| Commit-wait overhead | None | None | Yes — must wait out uncertainty interval |
| Used in | Multi-master DBs, event logs | Dynamo, Riak, CRDTs | Google Spanner only |
Paxos vs Raft vs EPaxos

| Dimension | Paxos | Raft | EPaxos |
|---|---|---|---|
| Leader required | No (any node can propose) | Yes (strict leader) | No (any replica can commit) |
| Throughput bottleneck | Single distinguished proposer in Multi-Paxos | Leader is the bottleneck for all writes | Commutative commands bypass coordination |
| Latency for WAN | 2 round-trips | 2 round-trips to leader | 1 round-trip for non-conflicting commands |
| Complexity | High (many edge cases) | Medium (designed for understandability) | Very high (dependency tracking) |
| Conflict handling | N/A (total order) | N/A (total order via leader) | Explicit — conflicting commands ordered; commutative ones run in parallel |
| Used in | Google Chubby; ZooKeeper's ZAB protocol is closely related | etcd, CockroachDB, TiKV | Research prototypes; not yet widely productionised |
Concepts Covered
Prerequisites
We recommend completing the previous tracks before starting this one; concepts build progressively throughout the curriculum.