ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/projects/mini-spanner/tasks/spanner-t1-s1-paxos-phase2

Paxos Phase 2: Accept and Commit

Mini-Spanner / Paxos Consensus
advanced

Concept

[Diagram: Paxos Phase 2 (ACCEPT / COMMIT). Proposer P1 sends ACCEPT(7, V) to acceptors A1, A2, and A3; the message to A3 (crashed) is lost. A1 and A2 reply ACCEPTED — a majority of 2 of 3 — so V is CHOSEN and learners are notified.]

Paxos Phase 2: Choosing a Value

After winning Phase 1 and determining the value to propose, the proposer enters Phase 2. This is where a value is actually chosen — the irreversible moment of consensus. The proposer broadcasts ACCEPT(n, value) to all acceptors and waits for a majority to respond with ACCEPTED.

The Accept Rule

Each acceptor applies a simple decision on receiving ACCEPT(n, value):

def on_accept(self, n, value):
    # Inclusive comparison: the acceptor promised ballot n in Phase 1,
    # so it must honor that promise by accepting the matching Phase 2 message.
    if n >= self.max_n_promised:
        self.accepted_n = n
        self.accepted_val = value
        self.persist_to_stable_storage()  # durable before replying
        return "ACCEPTED"
    return "REJECTED"

The condition is >=, not >. The acceptor already promised for round n during Phase 1, so it must honor that by accepting the corresponding Phase 2 message. If a competing proposer ran Phase 1 with a higher n' in the meantime, then max_n_promised was updated to n', and the original proposal is now stale — it gets rejected.
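Both paths — the honored promise and the stale proposal — can be exercised with a small runnable sketch. The `Acceptor` class and string return values below are illustrative, with stable storage stubbed out:

```python
class Acceptor:
    """Minimal Paxos acceptor implementing the Phase 2 accept rule."""

    def __init__(self):
        self.max_n_promised = 0   # highest ballot promised in Phase 1
        self.accepted_n = None
        self.accepted_val = None

    def on_accept(self, n, value):
        # Inclusive comparison: honor the Phase 1 promise for ballot n.
        if n >= self.max_n_promised:
            self.accepted_n = n
            self.accepted_val = value
            # A real acceptor would persist to stable storage here.
            return "ACCEPTED"
        return "REJECTED"

a = Acceptor()
a.max_n_promised = 7          # promised ballot 7 in Phase 1
print(a.on_accept(7, "V"))    # ACCEPTED: equal ballot honors the promise
a.max_n_promised = 9          # a competing proposer ran Phase 1 with n' = 9
print(a.on_accept(7, "W"))    # REJECTED: proposal 7 is now stale
```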

When Is a Value Chosen?

A value is chosen the moment a majority of acceptors have accepted it. This does not require all acceptors to accept — the diagram above shows A3 being unreachable, yet V is still chosen because A1 and A2 constitute a majority of 3. The crucial property of majority quorums is that any two majorities overlap in at least one node. This overlap node carries knowledge of the chosen value into any future Phase 1, ensuring the value is preserved.
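The overlap property can be checked exhaustively for a small cluster. This sketch (a 5-node group is an arbitrary choice) verifies that every pair of majority quorums shares at least one acceptor:

```python
from itertools import combinations

nodes = {"A1", "A2", "A3", "A4", "A5"}
majority = len(nodes) // 2 + 1   # 3 of 5

# Every pair of majority quorums must share at least one acceptor:
# two disjoint sets of size 3 would need 6 nodes, but there are only 5.
for q1 in combinations(nodes, majority):
    for q2 in combinations(nodes, majority):
        assert set(q1) & set(q2), "two majorities must overlap"
print("every pair of 3-of-5 quorums overlaps")
```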

The Safety Guarantee: Why Chosen Values Are Permanent

Suppose value V is chosen at round n by quorum Q = {A1, A2}. Now suppose a new proposer runs Phase 1 with n' > n. It must collect promises from a majority, say {A1, A3}. The overlap between Q and any future majority guarantees that at least one member — here A1 — was in Q. A1 will include (n, V) in its promise. The new proposer must therefore use V in Phase 2. The safety chain is preserved forever.
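The rule the new proposer follows can be sketched as a small function (`choose_phase2_value` is a hypothetical helper, not part of any standard API): among the promises it collects, it must adopt the value attached to the highest-numbered acceptance, and may propose its own value only if no acceptor reports anything.

```python
def choose_phase2_value(promises, my_value):
    """Pick the value a proposer must use in Phase 2.

    `promises` is a list of (accepted_n, accepted_val) pairs from a
    majority of acceptors; (None, None) means that acceptor has not
    accepted anything yet.
    """
    accepted = [(n, v) for n, v in promises if n is not None]
    if accepted:
        # Must adopt the value from the highest-numbered acceptance.
        return max(accepted)[1]
    return my_value  # free choice only if nothing was accepted

# A1 was in the old quorum and reports (7, "V"); A3 reports nothing.
# The new proposer wanted to propose "W" but is forced to use "V".
print(choose_phase2_value([(7, "V"), (None, None)], "W"))  # V
```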

Learners and Commit Notification

A learner is any node that needs to know the chosen value — typically state-machine replicas that apply committed log entries. Once the proposer reaches a majority of ACCEPTED responses, it notifies learners with CHOSEN V (or equivalently, writes the decision to a commit log that learners poll). In Spanner, learners are the Paxos replicas that apply writes to local storage once they observe a commit decision.
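The proposer's side of this step reduces to counting ACCEPTED replies and notifying learners once a strict majority is reached. A minimal sketch, with learner notification modeled as callbacks (an assumption for illustration):

```python
def run_phase2(acceptor_replies, total_acceptors, learners, value):
    """Count ACCEPTED replies; notify learners once a majority accepts.

    `acceptor_replies` maps acceptor id -> "ACCEPTED" / "REJECTED" / None,
    where None models a lost reply (like A3 in the diagram).
    """
    acks = sum(1 for r in acceptor_replies.values() if r == "ACCEPTED")
    if acks * 2 > total_acceptors:          # strict majority accepted
        for notify in learners:
            notify(value)                   # CHOSEN value
        return True
    return False

chosen = []
ok = run_phase2({"A1": "ACCEPTED", "A2": "ACCEPTED", "A3": None},
                total_acceptors=3,
                learners=[chosen.append],
                value="V")
print(ok, chosen)  # True ['V']
```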

Multi-Paxos: Eliminating Phase 1 Per Write

Classic Paxos runs both phases for every single log entry. Multi-Paxos, as used in Spanner, runs Phase 1 once to establish a long-lived leader for all future log slots. As long as the leader is stable, every subsequent write needs only Phase 2 — a single round-trip to a quorum. This is critical for Spanner's performance: each cross-region write already incurs ~20-100ms of inter-datacenter latency, so eliminating the extra Phase 1 round halves the per-write cost under normal conditions.
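The saving is easy to see with back-of-envelope arithmetic; the 50ms round-trip below is an illustrative value inside the 20-100ms range quoted above:

```python
rtt_ms = 50  # illustrative cross-region round-trip (20-100ms range)

classic = 2 * rtt_ms   # Phase 1 + Phase 2: two quorum round trips per write
multi = 1 * rtt_ms     # steady-state Multi-Paxos: Phase 2 only
print(f"classic Paxos: {classic} ms/write")
print(f"multi-Paxos:   {multi} ms/write ({classic // multi}x fewer round trips)")
```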

Correctness Invariants

  • Durability of accepted values: acceptors must write accepted_n and accepted_val to stable storage before responding. A crash-and-restart that forgets an accepted value can let a later proposer overwrite an already-chosen value at that log slot, violating safety.
  • No gaps in leader authority: before a new proposer can take over, it must complete Phase 1 with a higher ballot and learn any previously accepted values for every log slot it wants to use.
  • Accept condition is inclusive: using >= means a proposer can retry its own Phase 2 after a transient failure, provided no other proposer ran Phase 1 with a higher ballot in the meantime.

How Spanner Uses Phase 2

In Spanner, Phase 2 corresponds to the replication of a write batch across a Paxos group. The leader (Paxos proposer) sends the write to followers (acceptors) over a dedicated replication channel. Once a majority acknowledge, the write is committed and the leader applies it to the local key-value store. The commit timestamp — assigned via TrueTime — is embedded in the log entry itself, making the MVCC version available to snapshot readers as soon as the entry is applied.
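The interplay between the timestamped log entry and snapshot reads can be sketched with a toy MVCC store. Field names and the integer timestamps below are illustrative assumptions, not Spanner's actual formats:

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    """Illustrative committed Paxos log entry (not Spanner's wire format)."""
    slot: int
    commit_ts: int   # commit timestamp, assigned via TrueTime in Spanner
    writes: dict     # key -> new value

# Toy MVCC store: key -> list of (timestamp, value) versions, append order.
store = {}

def apply(entry):
    # Applying the entry makes its versions visible to snapshot reads
    # at any timestamp >= entry.commit_ts.
    for k, v in entry.writes.items():
        store.setdefault(k, []).append((entry.commit_ts, v))

def snapshot_read(key, ts):
    # Return the newest version at or before the read timestamp.
    versions = [v for t, v in store.get(key, []) if t <= ts]
    return versions[-1] if versions else None

apply(LogEntry(slot=1, commit_ts=100, writes={"x": "a"}))
apply(LogEntry(slot=2, commit_ts=200, writes={"x": "b"}))
print(snapshot_read("x", 150))  # a
print(snapshot_read("x", 250))  # b
```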
