Subtracks & Tasks
Advanced Paradigms
Implement MapReduce
Implement MapReduce: Map emits (key, value) pairs, shuffle groups by key, Reduce aggregates. Build word count as the worked example.
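A minimal single-process sketch of the three phases over toy in-memory data (all function names here are illustrative, not from any framework):

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit one (word, 1) pair per token.
    for word in text.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return (key, sum(values))

docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts["the"])  # 3
```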
Build Distributed Hash Table (Chord)
Build a Chord DHT: nodes sit on an identifier ring and route via finger tables, achieving O(log n) lookups in a P2P network.
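A toy sketch of finger-table routing, assuming a 5-bit identifier ring and fingers precomputed from a global view (real Chord builds them incrementally via join and stabilization):

```python
M = 5                      # identifier bits; toy ring of size 2**M = 32
RING = 2 ** M

def in_interval(x, a, b):
    # True if x lies in the half-open ring interval (a, b].
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps past zero

class Node:
    def __init__(self, node_id, ring_ids):
        self.id = node_id
        sorted_ids = sorted(ring_ids)
        def successor(k):
            k %= RING
            return next((n for n in sorted_ids if n >= k), sorted_ids[0])
        # finger[i] = successor(id + 2**i)
        self.fingers = [successor(self.id + 2 ** i) for i in range(M)]

    def find_successor(self, key, nodes):
        # O(log n) routing: forward to the closest preceding finger
        # until the key falls between us and our immediate successor.
        if in_interval(key, self.id, self.fingers[0]):
            return self.fingers[0]
        for f in reversed(self.fingers):
            if f != self.id and in_interval(f, self.id, key):
                return nodes[f].find_successor(key, nodes)
        return self.fingers[0]

ids = [1, 8, 14, 21, 28]
nodes = {i: Node(i, ids) for i in ids}
print(nodes[1].find_successor(26, nodes))  # key 26 lives on node 28
```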
Implement Byzantine Fault Tolerance
Implement PBFT: tolerates f Byzantine faults with 3f+1 nodes. The protocol runs three phases: pre-prepare, prepare, commit.
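A sketch of the quorum arithmetic behind those numbers, tracking one request through the phases (view changes, checkpoints, and message authentication are all elided; the class shape is hypothetical):

```python
# With n = 3f + 1 replicas, any two quorums of size 2f + 1 overlap in
# at least f + 1 replicas, so every pair of quorums shares at least one
# honest replica -- this is why 3f + 1 nodes tolerate f Byzantine faults.
f = 1
n = 3 * f + 1        # 4 replicas
quorum = 2 * f + 1   # 3 matching messages

class Replica:
    def __init__(self):
        self.prepare_votes = set()
        self.commit_votes = set()
        self.prepared = False
        self.committed = False

    def on_prepare(self, sender):
        # "Prepared" = pre-prepare from the primary plus 2f matching
        # prepares from distinct replicas.
        self.prepare_votes.add(sender)
        if len(self.prepare_votes) >= 2 * f:
            self.prepared = True

    def on_commit(self, sender):
        # "Committed-local" = prepared plus 2f + 1 matching commits.
        self.commit_votes.add(sender)
        if self.prepared and len(self.commit_votes) >= quorum:
            self.committed = True

r = Replica()
for sender in ("r1", "r2"):           # 2f prepares (after the pre-prepare)
    r.on_prepare(sender)
for sender in ("r0", "r1", "r2"):     # 2f + 1 commits
    r.on_commit(sender)
print(r.committed)  # True
```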
Build Stream Processing Pipeline
Build a stream processor with windowing: support tumbling and sliding windows with event-time processing.
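One way to sketch window assignment by event time (the function names and the integer-timestamp convention are assumptions, not taken from any particular engine):

```python
def tumbling_window(event_ts, size):
    # Each event falls in exactly one window: [start, start + size).
    start = event_ts - (event_ts % size)
    return [(start, start + size)]

def sliding_windows(event_ts, size, slide):
    # With slide < size, windows overlap and an event maps to
    # size/slide of them.
    windows = []
    start = event_ts - (event_ts % slide)
    while start > event_ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

print(tumbling_window(event_ts=7, size=5))            # [(5, 10)]
print(sliding_windows(event_ts=7, size=10, slide=5))  # [(0, 10), (5, 15)]
```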
Implement CRDTs
Build CRDTs for conflict-free replication: G-Counter (grow-only counter), G-Set (grow-only set), and OR-Set (observed-remove set).
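A minimal G-Counter sketch showing the merge property that makes CRDTs conflict-free:

```python
class GCounter:
    # Grow-only counter: one slot per replica; merge = element-wise max.
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self):
        self.counts[self.id] += 1     # only ever touch our own slot

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        # Commutative, associative, idempotent -> replicas converge
        # regardless of delivery order or duplicated messages.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment(); b.increment()
a.merge(b); b.merge(a)
print(a.value(), b.value())  # 3 3 -- both replicas converge
```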
Interview Prep
Common interview questions for Distributed Systems Engineer roles that map directly to what you build in this track.
Question
Can you rely on wall-clock timestamps to order events across machines?
Model Answer
Not reliably. Physical clocks drift and NTP corrections can cause non-monotonic updates. Clock skew of tens of milliseconds is normal. For events within the skew window, timestamps do not establish ordering. Use Lamport clocks (logical clocks) if you need causal ordering: if A causally precedes B, then L(A) < L(B). For tracking concurrent events, use vector clocks. For a global total order close to real time, use Hybrid Logical Clocks (max of wall clock and received HLC) as CockroachDB does, or TrueTime as Google Spanner does.
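To make the vector-clock point concrete, a minimal comparison function (a sketch assuming equal-length vectors indexed by node):

```python
def vc_compare(a, b):
    # Compare two vector clocks: 'before', 'after', 'equal', or 'concurrent'.
    less = any(x < y for x, y in zip(a, b))
    greater = any(x > y for x, y in zip(a, b))
    if less and greater:
        return "concurrent"   # neither dominates: no causal order exists
    if less:
        return "before"
    if greater:
        return "after"
    return "equal"

# Two writes that neither saw the other -- wall clocks cannot order these.
print(vc_compare([2, 0, 1], [1, 1, 1]))  # concurrent
print(vc_compare([1, 0, 0], [2, 1, 0]))  # before
```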
Question
What is a Lamport clock, and what does it guarantee?
Model Answer
A Lamport clock is a logical counter. Rule: on send, increment and attach. On receive, set to max(local, received) + 1. Guarantee: if A happens-before B, then L(A) < L(B). Does NOT guarantee the converse: if L(A) < L(B), A may or may not have happened before B (they might be concurrent). Lamport clocks establish a total order consistent with causality, but cannot distinguish concurrent events from causally ordered ones.
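A direct translation of those rules into code (a sketch, not a production clock):

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # On send: increment, then attach the timestamp to the message.
        return self.tick()

    def receive(self, msg_time):
        # On receive: jump past everything causally before this message.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()         # a: 1
print(b.receive(t))  # b: 2 -- guarantees L(send) < L(receive)
```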
Question
How would you establish the ordering of events across services without synchronized clocks?
Model Answer
Options: (1) Vector clocks: each service maintains a vector of counters; causality is fully tracked but vectors are O(N) in size. (2) Hybrid Logical Clocks (HLC): each service tracks max(wallClock, lastSeenHLC). Clock values are bounded to wall time within the system's max clock skew. Events are ordered by HLC; concurrent events have indeterminate order. (3) Distributed tracing with causal metadata (propagate trace IDs and parent span IDs through RPC calls — this is what OpenTelemetry does). Option 2 is the practical choice for most systems; option 1 for systems requiring full causality tracking; option 3 for debugging.
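A sketch of the HLC update rules, assuming millisecond wall clocks (the `wall` parameter exists only to make the example deterministic):

```python
import time

class HLC:
    # Hybrid Logical Clock: a (physical, logical) pair.
    def __init__(self):
        self.l = 0   # highest physical time seen so far
        self.c = 0   # logical counter breaking ties within one tick

    def now(self, wall=None):
        # Local or send event.
        pt = wall if wall is not None else int(time.time() * 1000)
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, msg, wall=None):
        # Receive event: fold in the sender's HLC.
        pt = wall if wall is not None else int(time.time() * 1000)
        ml, mc = msg
        if pt > self.l and pt > ml:
            self.l, self.c = pt, 0
        elif ml > self.l:
            self.l, self.c = ml, mc + 1
        elif self.l > ml:
            self.c += 1
        else:  # self.l == ml
            self.c = max(self.c, mc) + 1
        return (self.l, self.c)

clock = HLC()
print(clock.now(wall=1000))                # (1000, 0)
print(clock.update((1005, 3), wall=1000))  # (1005, 4) -- sender was ahead
```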
Question
What is TrueTime, and how does Spanner use it?
Model Answer
TrueTime is Google's API for the current time, implemented with GPS receivers and atomic clocks in every datacenter. It returns an interval [earliest, latest] where the true current time is guaranteed to be. The uncertainty is typically 1-7ms. Spanner uses it for commit wait: after the leader decides a commit timestamp T, it waits until TrueTime.now().earliest > T before releasing the transaction. This ensures that any subsequent transaction starting anywhere will have a TrueTime range that begins after T, providing external consistency. No other production system has this — CockroachDB approximates it with bounded clock synchronization and conservative uncertainty estimates.
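A toy model of commit-wait, assuming a fixed ε rather than TrueTime's measured uncertainty:

```python
import time

EPSILON = 0.007  # assumed fixed 7 ms uncertainty (real TrueTime measures it)

def tt_now():
    # Return an (earliest, latest) interval guaranteed to contain true time.
    now = time.time()
    return (now - EPSILON, now + EPSILON)

def commit_wait(commit_ts):
    # Spanner's rule: hold the transaction until the commit timestamp is
    # guaranteed to be in the past on every node.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)

# Pick a commit timestamp no earlier than any time that might be "now",
# then wait out the uncertainty before making the commit visible.
commit_ts = tt_now()[1]
commit_wait(commit_ts)
print("commit visible; every node's earliest-possible-now is past", commit_ts)
```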
Question
Logs aggregated from multiple services appear out of order. What is the root cause, and how do you fix it?
Model Answer
Root cause: physical timestamps from different machines are used to sort log entries, but the machines have clock skew. An event on a fast-clock machine at T may appear before an event on a slow-clock machine at T-5ms even though the slow-clock event causally preceded it. Fix: (1) Use logical timestamps (Lamport/vector clocks) instead of physical time for ordering events in logs. (2) Propagate a monotonic request ID through all service calls and sort by causal chain rather than time. (3) Use a centralized log aggregation system (e.g., Kafka) where producers attach their HLC timestamp, and the aggregator uses the highest received timestamp as a lower bound for ordering.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them.
The mistake
Using wall-clock timestamps to order events across nodes.
Why it happens
NTP-synchronised clocks still drift by tens of milliseconds. Two events that occur "at the same time" on different nodes have no reliable wall-clock ordering.
The fix
Use Lamport timestamps or vector clocks for causal ordering. Only use wall clocks for human-readable logs, not for ordering distributed events.
The mistake
Simulating TrueTime with a zero uncertainty interval.
Why it happens
Spanner's commit-wait depends on the TrueTime uncertainty interval (ε). If you simulate TrueTime with `time.now()` and zero uncertainty, you break the invariant that a commit timestamp is truly in the past by the time the transaction's effects become visible.
The fix
Model the uncertainty interval explicitly. A transaction's commit timestamp must be at least `TT.now().latest` at the moment it is chosen — commit, then wait until `TT.now().earliest > commit_ts` before releasing.
The mistake
Treating a Paxos Promise as valid beyond its ballot number.
Why it happens
A Prepare ACK is a promise only for the specific ballot number used in that Prepare. A higher-ballot Prepare from another proposer supersedes it.
The fix
Acceptors must track the highest ballot they have promised (max_prepared). A Promise is only valid for the exact ballot of the Prepare that triggered it. Any higher ballot invalidates it.
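A sketch of an acceptor that enforces this (the message shapes are hypothetical):

```python
class Acceptor:
    def __init__(self):
        self.max_prepared = -1     # highest ballot we have promised
        self.accepted = None       # (ballot, value) or None

    def on_prepare(self, ballot):
        if ballot > self.max_prepared:
            self.max_prepared = ballot
            # Promise: reject anything below `ballot` from now on, and
            # report what (if anything) we already accepted so the new
            # proposer is forced to re-propose that value.
            return ("promise", ballot, self.accepted)
        return ("reject", self.max_prepared, None)

a = Acceptor()
print(a.on_prepare(5))  # ('promise', 5, None)
print(a.on_prepare(9))  # ('promise', 9, None) -- supersedes ballot 5
print(a.on_prepare(7))  # ('reject', 9, None)  -- the promise to 5 is gone
```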
The mistake
Keeping acceptor state only in memory.
Why it happens
If an acceptor forgets what it accepted in a slot, a new leader may choose a different value for that slot, breaking the "chosen value never changes" invariant.
The fix
Acceptors must persist (ballot, accepted_value) per slot to stable storage before sending the Accepted message. Recovering nodes must replay from persistent state.
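A sketch of the accept path with durability before acknowledgement (the file layout is hypothetical; a real implementation would use a write-ahead log):

```python
import json, os

class DurableAcceptor:
    def __init__(self, path="acceptor_state.json"):  # hypothetical state file
        self.path = path
        self.state = {"max_prepared": -1, "accepted": None}
        if os.path.exists(path):                      # crash recovery: replay
            with open(path) as f:
                self.state = json.load(f)

    def _persist(self):
        # Write and fsync BEFORE replying -- the reply is a promise that
        # must survive a crash.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def on_accept(self, ballot, value):
        if ballot >= self.state["max_prepared"]:
            self.state["max_prepared"] = ballot
            self.state["accepted"] = [ballot, value]
            self._persist()                 # durable first...
            return ("accepted", ballot)     # ...then acknowledge
        return ("reject", self.state["max_prepared"])

acc = DurableAcceptor()
print(acc.on_accept(3, "x"))  # ('accepted', 3) -- state is now on disk
```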
The mistake
Committing prior-term entries directly after winning a Raft election.
Why it happens
Raft requires a leader to commit at least one entry from its own term before committing entries from prior terms. Without committing a no-op, the leader cannot safely advance `commitIndex` for old entries.
The fix
Immediately after election, append and replicate a no-op log entry in the new term. Once it commits, all previous entries are also safe to apply.
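A sketch of that rule, with replication and quorum counting elided (the class shape is illustrative):

```python
class Leader:
    # Log entries are (term, command) pairs.
    def __init__(self, current_term, log, commit_index):
        self.term = current_term
        self.log = log                  # [(term, command), ...]
        self.commit_index = commit_index

    def on_elected(self):
        # First act as leader: append a no-op in OUR term. commit_index
        # may only advance via an entry from the current term.
        self.log.append((self.term, "no-op"))

    def on_noop_replicated_to_majority(self):
        # Committing the no-op commits everything before it too (Log
        # Matching Property), including prior-term entries.
        self.commit_index = len(self.log) - 1

leader = Leader(current_term=5, log=[(3, "x=1"), (4, "x=2")], commit_index=0)
leader.on_elected()
leader.on_noop_replicated_to_majority()
print(leader.commit_index)  # 2 -- prior-term entries now safely committed
```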
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
Lamport Timestamps vs Vector Clocks vs TrueTime

| Dimension | Lamport Timestamps | Vector Clocks | TrueTime |
|---|---|---|---|
| What it captures | Happens-before partial order (scalar) | Happens-before partial order (full causality) | Real-time bounds with uncertainty interval |
| Size | O(1) — one integer | O(n) — one integer per node | O(1) — earliest/latest interval |
| Detects concurrency | No — concurrent events may appear ordered | Yes — concurrent if neither dominates | No — uses real-time intervals |
| Requires special hardware | No | No | Yes — GPS + atomic clocks (Google-specific) |
| Commit-wait overhead | None | None | Yes — must wait out uncertainty interval |
| Used in | Multi-master DBs, event logs | Dynamo, Riak, CRDTs | Google Spanner only |
Paxos vs Raft vs EPaxos

| Dimension | Paxos | Raft | EPaxos |
|---|---|---|---|
| Leader required | No (any node can propose) | Yes (strict leader) | No (any replica can commit) |
| Throughput bottleneck | Single distinguished proposer in Multi-Paxos | Leader is the bottleneck for all writes | Commutative commands bypass coordination |
| Latency for WAN | 2 round-trips | 2 round-trips to leader | 1 round-trip for non-conflicting commands |
| Complexity | High (many edge cases) | Medium (designed for understandability) | Very high (dependency tracking) |
| Conflict handling | N/A (total order) | N/A (total order via leader) | Explicit — conflicting commands ordered; commutative ones run in parallel |
| Used in | Google Chubby; ZooKeeper's ZAB protocol is closely related | etcd, CockroachDB, TiKV | Research prototypes; not yet widely productionised |
Concepts Covered
Prerequisites
We recommend completing the previous tracks before starting this one; concepts build progressively throughout the curriculum.