Tracks/The Coordinator

The Coordinator

Advanced

Advanced|15 tasks

Implement distributed transactions with two-phase commit, three-phase commit, and saga patterns.

Subtracks & Tasks

Two-Phase Commit

0/5

TW-1

advanced

Implement Two-Phase Commit

Implement 2PC: Phase 1 sends PREPARE, collects votes. Phase 2 sends COMMIT if all YES, else ABORT....

2PCatomic commitprepare-commit

TW-2

advanced

Handle Coordinator Failure

Handle coordinator failures: log PREPARE before sending, log COMMIT/ABORT decision, recover from log....

failure recoveryblockingwrite-ahead log

TW-3

advanced

Implement Three-Phase Commit

3PC adds PRE-COMMIT phase. If coordinator fails after PRE-COMMIT, participants can commit safely....

3PCnon-blockingpre-commit

TW-4

advanced

Implement Saga Pattern

Sagas: sequence of local transactions with compensations. On failure, compensate in reverse order....

sagacompensationeventual consistency

TW-5

advanced

Build Transactional Key-Value Store

Transactional KV: begin, read, write, commit/abort. Use 2PC for cross-shard transactions....

transactionsACIDisolation

Three-Phase Commit (3PC)

0/5

TH-1

advanced

Implement Three-Phase Commit Protocol

Three-Phase Commit (3PC) adds an extra phase to 2PC to reduce blocking scenarios. The third phase (`PreCommit`) ensures participants know a commit is ...

three-phase commitCanCommitPreCommit+3 more

TH-2

advanced

Show How 3PC Unblocks 2PC Scenarios

The key advantage of 3PC over 2PC is that it unblocks one of 2PC's blocking scenarios. When the coordinator crashes after sending `PreCommit`, partici...

blocking vs non-blockingrecovery procedurestimeout handling+2 more

TH-3

advanced

Show 3PC Blocking Under Network Partition

While 3PC improves on 2PC, it still has blocking scenarios. The key limitation: if a network partition occurs before `PreCommit`, participants may not...

network partitionblocking scenariossplit brain+2 more

TH-4

intermediate

Compare 2PC vs 3PC Protocols

Understanding the trade-offs between 2PC and 3PC helps choose the right protocol for your use case. **Message complexity**: ``` 2PC (happy path): P...

protocol comparisonmessage complexityblocking scenarios+2 more

TH-5

advanced

Implement Paxos Commit Protocol

Paxos Commit replaces the single coordinator with a Paxos consensus group. Each participant's commit decision is reached through Paxos, eliminating th...

Paxos commitconsensus-based commitno single point of failure+3 more

Saga Pattern

0/5

SA-1

intermediate

Implement Saga Pattern with Compensating Transactions

The Saga pattern manages long-running transactions by breaking them into a sequence of local transactions with compensating actions for rollback. Unli...

saga patterncompensating transactionslocal transactions+2 more

SA-2

advanced

Implement Choreography-Based Saga

In a choreography-based saga, there is no central coordinator. Each service listens for events and triggers the next step by publishing its own events...

choreographyevent-driven architectureservice coordination+2 more

SA-3

intermediate

Implement Orchestration-Based Saga

In an orchestration-based saga, a central orchestrator manages the saga lifecycle. It sends commands to services, tracks completion, and initiates com...

orchestrationsaga orchestratorcentral coordinator+2 more

SA-4

intermediate

Implement Idempotency in Sagas

Idempotency ensures that retrying saga steps doesn't cause duplicate operations like double-charging payments. **Idempotency key**: Each saga step i...

idempotencydeduplicationexactly-once semantics+2 more

SA-5

advanced

Implement E-Commerce Checkout Saga

Implement a complete e-commerce checkout saga with inventory, payment, and shipping services. This demonstrates the saga pattern in a realistic scenar...

e-commerce sagainventory reservationpayment processing+3 more

Interview Prep

Common interview questions for Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.

Model Answer

In 2PC, a coordinator sends PREPARE to all participants; each votes YES (durably logs the tentative commit) or NO. If all vote YES, coordinator sends COMMIT; otherwise ABORT. Blocking problem: if the coordinator crashes after collecting YES votes but before sending COMMIT, participants are stuck holding locks indefinitely — they cannot unilaterally abort (might conflict with a commit the coordinator decided). Production solutions: (1) Replicate the coordinator using Raft so another node takes over, (2) Store coordinator decisions in a shared store participants can read (Percolator/Spanner approach), (3) Use timeouts with careful retry logic and human intervention for stuck transactions.

Model Answer

A distributed transaction (2PC-based) provides atomicity: all participants commit or none do. Intermediate states are not visible. Requires coordination and can block on failures. A saga is a sequence of local transactions, each with a compensating transaction. If step k fails, compensating transactions for steps 1..k-1 are executed. Intermediate states are visible. No coordinator blocking — each step acknowledges independently. Sagas are preferred in microservices architectures because they avoid distributed locking and work across services with different storage systems. Tradeoff: sagas do not provide isolation — concurrent sagas can interfere with each other.

Model Answer

Options by consistency requirement: (1) 2PC with a coordinator (strong atomicity, blocking on coordinator failure), (2) Saga: debit A, then credit B; if credit fails, execute compensating transaction to re-credit A (eventual consistency, intermediate state visible), (3) Outbox pattern: write debit and a "pending credit" event to the same DB transaction on A's side; a separate process reads the outbox and applies the credit to B, retrying until success (at-least-once delivery). For financial systems, option 3 with idempotency keys is common because it avoids distributed coordination while ensuring eventual consistency.

Model Answer

XA is a standard interface for distributed transactions, supported by most relational databases (Postgres, MySQL, Oracle) and message brokers. An XA transaction coordinator manages 2PC across multiple XA-capable resources. It is still used in enterprise Java applications (JTA/JTS) and mainframe-style architectures. It is generally avoided in modern distributed systems because: high latency (multiple round-trips per transaction), coordinator as SPOF, poor performance at scale. Cloud-native systems prefer sagas or application-level 2PC over a Raft-replicated coordinator.

Model Answer

Spanner uses 2PC where each shard is a Paxos group (not a single node). The 2PC coordinator is a Paxos leader (highly available, no single point of failure for blocking). TrueTime provides bounded clock uncertainty — Spanner assigns commit timestamps that are guaranteed to be after all prior transactions by waiting out the TrueTime uncertainty interval (commit wait). This ensures any transaction that starts after commit T will see the committed write, providing external consistency without a global sequencer. CockroachDB replicates this using Raft per range and Hybrid Logical Clocks instead of TrueTime.

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.

Why it happens

If the coordinator crashes after sending some Prepare messages but before collecting all votes, no one knows the global outcome. Participants are blocked.

The fix

Log the transaction and its state (PREPARED, COMMITTED, ABORTED) to durable storage before sending Phase 2 messages. On recovery, replay from the log.

Why it happens

2PC requires locks to be held from Phase 1 until Phase 2 completes. Network latency turns lock hold time from microseconds to hundreds of milliseconds.

The fix

Minimise the Prepare phase — do all local validation quickly and send the Phase 2 abort/commit message as fast as possible. Use read-committed isolation when strict serializability is not required.

Why it happens

Without a timeout, the coordinator waits forever for a Prepare or Commit ACK that will never arrive.

The fix

Set a deadline for each RPC. On timeout, assume the participant failed and abort the transaction. Log the abort so a recovered participant can query the outcome.

Why it happens

If a participant has no record of the transaction (it was never delivered the Prepare), it must abort — not commit.

The fix

Participants must only respond PREPARED if they successfully executed and persisted the Prepare phase locally. An unknown transaction ID always yields ABORT.

Why it happens

Transaction IDs generated with a simple counter can collide if the coordinator restarts and reuses IDs from before the crash.

The fix

Generate globally unique transaction IDs (UUIDs or a coordinator-epoch + sequence). Include the coordinator epoch so post-crash IDs are always fresh.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.

Dimension	2PC	Saga	TCC
Atomicity	Atomic — all commit or all abort	Not atomic — uses compensating transactions	Atomic at the application level
Blocking on failure	Yes — blocks if coordinator crashes in Phase 2	No — each step is independent	Partial — Try phase reserves resources
Latency	High — two network round-trips minimum	Low — each step fires and forgets	Medium — three phases but async-friendly
Isolation	Full isolation (locks held across phases)	None — intermediate states are visible	Partial — reserved but uncommitted state visible
Rollback complexity	Automatic abort if any participant votes NO	Must write explicit compensating transactions	Cancel phase must be idempotent
Used in	Relational databases (XA), some NoSQL	Microservices (Kafka choreography, Conductor)	Payment systems, inventory reservation

Verdict:2PC for strict atomicity within a trust boundary. Saga for long-running workflows across microservices. TCC when you need business-level atomicity with explicit reservation semantics.

Concepts Covered

2PCatomic commitprepare-commitfailure recoveryblockingwrite-ahead log3PCnon-blockingpre-commitsagacompensationeventual consistencytransactionsACIDisolationthree-phase commitCanCommitPreCommitDoCommitnon-blocking commitcoordinator recoveryblocking vs non-blockingrecovery procedurestimeout handlingcoordinator failureparticipant uncertaintynetwork partitionblocking scenariossplit brainsafety vs livenessCAP theoremprotocol comparisonmessage complexityreal-world usageperformance trade-offsPaxos commitconsensus-based commitno single point of failureacceptorsproposerslearnerssaga patterncompensating transactionslocal transactionsrollbacklong-running transactionschoreographyevent-driven architectureservice coordinationno central orchestratorevent publishingorchestrationsaga orchestratorcentral coordinatorstate machinecommand patternsidempotencydeduplicationexactly-once semanticsmessage retriessaga_id + step_ide-commerce sagainventory reservationpayment processingshipment creationreal-world saga

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.