Subtracks & Tasks
Two-Phase Commit
Implement Two-Phase Commit
Implement 2PC: Phase 1 sends PREPARE, collects votes. Phase 2 sends COMMIT if all YES, else ABORT....
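The PREPARE/COMMIT flow described above can be sketched in a few lines of Python. The `prepare`, `commit`, and `abort` participant methods are illustrative names for this sketch, not part of any starter code:

```python
# Minimal two-phase commit coordinator sketch.
# Each participant is assumed to expose prepare(txid) -> bool,
# commit(txid), and abort(txid); these names are illustrative.

def two_phase_commit(txid, participants):
    # Phase 1: send PREPARE to everyone and collect votes.
    votes = [p.prepare(txid) for p in participants]

    # Phase 2: COMMIT only if every participant voted YES.
    if all(votes):
        for p in participants:
            p.commit(txid)
        return "COMMIT"
    else:
        for p in participants:
            p.abort(txid)
        return "ABORT"
```

Note that a real coordinator must durably log its decision between the two phases; that is the subject of the next task.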
Handle Coordinator Failure
Handle coordinator failures: log PREPARE before sending, log COMMIT/ABORT decision, recover from log....
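A minimal sketch of the logging idea, using a plain list as a stand-in for durable storage (in a real coordinator the log must be fsync'd or replicated before any Phase 2 message is sent):

```python
# Coordinator recovery sketch: persist the decision BEFORE Phase 2.
# `log` stands in for durable storage; record shapes are illustrative.

def decide_and_log(log, txid, votes):
    decision = "COMMIT" if all(votes) else "ABORT"
    log.append((txid, decision))   # must be durable before Phase 2 messages
    return decision

def recover(log):
    # On restart, rebuild the outcome of every logged transaction.
    # Transactions with no decision record are presumed aborted.
    outcomes = {}
    for txid, decision in log:
        outcomes[txid] = decision
    return outcomes
```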
Implement Three-Phase Commit
3PC adds PRE-COMMIT phase. If coordinator fails after PRE-COMMIT, participants can commit safely....
Implement Saga Pattern
Sagas: sequence of local transactions with compensations. On failure, compensate in reverse order....
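The reverse-order compensation described above can be sketched as follows; representing each step as an `(action, compensation)` pair is an illustrative choice, not a prescribed interface:

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order. If any action
    fails, run the compensations of the completed steps in
    reverse order. Illustrative sketch only."""
    done = []  # compensations for steps that completed
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            # Undo completed steps, most recent first.
            for comp in reversed(done):
                comp()
            return "COMPENSATED"
    return "COMPLETED"
```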
Build Transactional Key-Value Store
Transactional KV: begin, read, write, commit/abort. Use 2PC for cross-shard transactions....
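A single-node sketch of the begin/read/write/commit API, assuming the cross-shard 2PC layer from the task sits on top of it; the structure is illustrative:

```python
class TxnKV:
    # Single-node transactional KV sketch: writes are staged per
    # transaction and only reach the store on commit.
    def __init__(self):
        self.store = {}      # committed state
        self.txns = {}       # txid -> staged writes
        self.next_txid = 0

    def begin(self):
        self.next_txid += 1
        self.txns[self.next_txid] = {}
        return self.next_txid

    def read(self, txid, key):
        # Read-your-own-writes: staged value first, then committed.
        staged = self.txns[txid]
        return staged.get(key, self.store.get(key))

    def write(self, txid, key, value):
        self.txns[txid][key] = value

    def commit(self, txid):
        self.store.update(self.txns.pop(txid))

    def abort(self, txid):
        self.txns.pop(txid)
```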
Three-Phase Commit (3PC)
Implement Three-Phase Commit Protocol
Three-Phase Commit (3PC) adds an extra phase to 2PC to reduce blocking scenarios. The third phase (`PreCommit`) ensures participants know a commit is ...
Show How 3PC Unblocks 2PC Scenarios
The key advantage of 3PC over 2PC is that it unblocks one of 2PC's blocking scenarios. When the coordinator crashes after sending `PreCommit`, partici...
Show 3PC Blocking Under Network Partition
While 3PC improves on 2PC, it still has blocking scenarios. The key limitation: if a network partition occurs before `PreCommit`, participants may not...
Compare 2PC vs 3PC Protocols
Understanding the trade-offs between 2PC and 3PC helps choose the right protocol for your use case. **Message complexity**: ``` 2PC (happy path): P...
Implement Paxos Commit Protocol
Paxos Commit replaces the single coordinator with a Paxos consensus group. Each participant's commit decision is reached through Paxos, eliminating th...
Saga Pattern
Implement Saga Pattern with Compensating Transactions
The Saga pattern manages long-running transactions by breaking them into a sequence of local transactions with compensating actions for rollback. Unli...
Implement Choreography-Based Saga
In a choreography-based saga, there is no central coordinator. Each service listens for events and triggers the next step by publishing its own events...
Implement Orchestration-Based Saga
In an orchestration-based saga, a central orchestrator manages the saga lifecycle. It sends commands to services, tracks completion, and initiates com...
Implement Idempotency in Sagas
Idempotency ensures that retrying saga steps doesn't cause duplicate operations like double-charging payments. **Idempotency key**: Each saga step i...
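A minimal sketch of idempotency-key deduplication, with a dict standing in for a durable dedupe table; `charge_fn` and the key format are assumptions made for illustration:

```python
# Retry-safe saga step: the first call with a given idempotency key
# executes the side effect; retries replay the stored result instead
# of charging again. `processed` stands in for a durable table.

def charge_once(processed, idem_key, amount, charge_fn):
    if idem_key in processed:
        return processed[idem_key]     # replay, do not re-execute
    result = charge_fn(amount)
    processed[idem_key] = result       # record before acknowledging
    return result
```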
Implement E-Commerce Checkout Saga
Implement a complete e-commerce checkout saga with inventory, payment, and shipping services. This demonstrates the saga pattern in a realistic scenar...
Interview Prep
Common interview questions for Distributed Systems Engineer roles that map directly to what you build in this track.
Model Answer
In 2PC, a coordinator sends PREPARE to all participants; each votes YES (durably logging the tentative commit) or NO. If all vote YES, the coordinator sends COMMIT; otherwise ABORT. The blocking problem: if the coordinator crashes after collecting YES votes but before sending COMMIT, participants are stuck holding locks indefinitely — they cannot unilaterally abort, since that might conflict with a commit the coordinator already decided. Production mitigations: (1) replicate the coordinator with Raft so another node can take over, (2) store coordinator decisions in a shared store that participants can read (the Percolator/Spanner approach), (3) use timeouts with careful retry logic and human intervention for stuck transactions.
Model Answer
A distributed transaction (2PC-based) provides atomicity: all participants commit or none do, and intermediate states are never visible. It requires coordination and can block on failures. A saga is a sequence of local transactions, each paired with a compensating transaction. If step k fails, the compensating transactions for steps 1..k-1 are executed in reverse order. Intermediate states are visible. There is no coordinator blocking — each step commits locally and independently. Sagas are preferred in microservices architectures because they avoid distributed locking and work across services with different storage systems. Tradeoff: sagas do not provide isolation — concurrent sagas can interfere with each other.
Model Answer
Options by consistency requirement: (1) 2PC with a coordinator (strong atomicity, blocking on coordinator failure), (2) Saga: debit A, then credit B; if credit fails, execute compensating transaction to re-credit A (eventual consistency, intermediate state visible), (3) Outbox pattern: write debit and a "pending credit" event to the same DB transaction on A's side; a separate process reads the outbox and applies the credit to B, retrying until success (at-least-once delivery). For financial systems, option 3 with idempotency keys is common because it avoids distributed coordination while ensuring eventual consistency.
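A toy sketch of option 3, the outbox pattern, with dicts and lists standing in for tables and queues; in a real system the debit and the outbox row must be written inside one database transaction, and the relay's dedupe set must be durable:

```python
# Outbox sketch: the debit and a "pending credit" event are written
# together on A's side; a separate relay applies the credit to B,
# retrying until success, deduplicating by transaction ID.

def debit_with_outbox(accounts, outbox, txid, src, amount):
    # In production this pair of writes is ONE local DB transaction.
    accounts[src] -= amount
    outbox.append({"txid": txid, "credit": amount})

def relay(accounts, outbox, dst, applied):
    # Relay process: at-least-once delivery, made effectively
    # exactly-once by the applied-txid set (idempotency).
    for event in list(outbox):
        if event["txid"] not in applied:
            accounts[dst] += event["credit"]
            applied.add(event["txid"])
        outbox.remove(event)
```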
Model Answer
XA is a standard interface for distributed transactions, supported by most relational databases (Postgres, MySQL, Oracle) and message brokers. An XA transaction coordinator manages 2PC across multiple XA-capable resources. It is still used in enterprise Java applications (JTA/JTS) and mainframe-style architectures. It is generally avoided in modern distributed systems because of high latency (multiple round-trips per transaction), the coordinator acting as a single point of failure, and poor performance at scale. Cloud-native systems prefer sagas, or application-level 2PC built on a Raft-replicated coordinator.
Model Answer
Spanner uses 2PC where each shard is a Paxos group (not a single node). The 2PC coordinator is a Paxos leader (highly available, no single point of failure for blocking). TrueTime provides bounded clock uncertainty — Spanner assigns commit timestamps that are guaranteed to be after all prior transactions by waiting out the TrueTime uncertainty interval (commit wait). This ensures any transaction that starts after commit T will see the committed write, providing external consistency without a global sequencer. CockroachDB replicates this using Raft per range and Hybrid Logical Clocks instead of TrueTime.
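A toy model of commit wait (not Spanner's actual API): assume a clock that reports an uncertainty interval `[earliest, latest]` with a fixed bound `EPSILON`, and block the commit acknowledgment until the chosen timestamp is certainly in the past:

```python
import time

# Toy commit-wait sketch. The clock returns an uncertainty interval
# [earliest, latest]; after picking a commit timestamp, we wait until
# the timestamp is guaranteed to have passed on every node's clock.
# EPSILON is an assumed uncertainty bound, not a real TrueTime value.

EPSILON = 0.007  # seconds

def now_interval():
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait(commit_ts):
    # Block until earliest > commit_ts, i.e. "TT.after(commit_ts)".
    while now_interval()[0] <= commit_ts:
        time.sleep(0.001)

def commit(write_fn):
    commit_ts = now_interval()[1]   # at or above the current 'latest'
    write_fn(commit_ts)
    commit_wait(commit_ts)          # wait out the uncertainty
    return commit_ts
```

Any transaction that starts after `commit` returns must observe a clock reading strictly greater than `commit_ts`, which is what gives external consistency in this toy model.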
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them.
Why it happens
If the coordinator crashes after sending some Prepare messages but before collecting all votes, no one knows the global outcome. Participants are blocked.
The fix
Log the transaction and its state (PREPARED, COMMITTED, ABORTED) to durable storage before sending the corresponding messages: record the transaction before Phase 1 and the decision before Phase 2. On recovery, replay from the log.
Why it happens
2PC requires locks to be held from Phase 1 until Phase 2 completes. Network latency turns lock hold time from microseconds to hundreds of milliseconds.
The fix
Minimise the Prepare phase — do all local validation quickly and send the Phase 2 abort/commit message as fast as possible. Use read-committed isolation when strict serializability is not required.
Why it happens
Without a timeout, the coordinator waits forever for a Prepare or Commit ACK that will never arrive.
The fix
Set a deadline for each RPC. On timeout, assume the participant failed and abort the transaction. Log the abort so a recovered participant can query the outcome.
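One way to sketch the per-RPC deadline in Python, treating a timed-out PREPARE as a NO vote; `rpc` is an illustrative stand-in for the real network call:

```python
import concurrent.futures

# Deadline sketch: bound how long the coordinator waits for a
# participant's PREPARE response. A timeout is treated as a NO
# vote, which leads the coordinator to abort the transaction.

def prepare_with_timeout(rpc, txid, timeout_s):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
        future = ex.submit(rpc, txid)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return False   # timed-out participant counts as NO
```

A real implementation would also log the resulting abort so a recovered participant can query the outcome, as described above.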
Why it happens
If a participant has no record of the transaction (the Prepare never reached it), it must abort — not commit.
The fix
Participants must only respond PREPARED if they successfully executed and persisted the Prepare phase locally. An unknown transaction ID always yields ABORT.
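The presumed-abort rule is small enough to sketch directly; `decision_log` is an illustrative stand-in for the participant's durable transaction records:

```python
def query_outcome(decision_log, txid):
    # Presumed abort: if there is no record of txid, the only
    # safe answer is ABORT. Answering COMMIT for an unknown ID
    # could contradict a coordinator that decided to abort.
    return decision_log.get(txid, "ABORT")
```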
Why it happens
Transaction IDs generated with a simple counter can collide if the coordinator restarts and reuses IDs from before the crash.
The fix
Generate globally unique transaction IDs (UUIDs or a coordinator-epoch + sequence). Include the coordinator epoch so post-crash IDs are always fresh.
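A sketch of the epoch-plus-sequence scheme; reading the epoch from durable storage and bumping it on every restart is assumed here rather than shown:

```python
import itertools

class TxidGenerator:
    # Epoch + sequence IDs. The epoch is assumed to be loaded from
    # durable storage and incremented on every coordinator restart,
    # so post-crash IDs can never collide with pre-crash ones.
    def __init__(self, epoch):
        self.epoch = epoch
        self.seq = itertools.count(1)

    def next_txid(self):
        return (self.epoch, next(self.seq))
```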
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
| Dimension | 2PC | Saga | TCC |
|---|---|---|---|
| Atomicity | Atomic — all commit or all abort | Not atomic — uses compensating transactions | Atomic at the application level |
| Blocking on failure | Yes — blocks if coordinator crashes in Phase 2 | No — each step is independent | Partial — Try phase reserves resources |
| Latency | High — two network round-trips minimum | Low — each step fires and forgets | Medium — three phases but async-friendly |
| Isolation | Full isolation (locks held across phases) | None — intermediate states are visible | Partial — reserved but uncommitted state visible |
| Rollback complexity | Automatic abort if any participant votes NO | Must write explicit compensating transactions | Cancel phase must be idempotent |
| Used in | Relational databases (XA), some NoSQL | Microservices (Kafka choreography, Conductor) | Payment systems, inventory reservation |
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.