Concept
Paxos Phase 1: Establishing a Ballot
Paxos was introduced by Leslie Lamport in 1989 and published in 1998. Despite its reputation for complexity, the core algorithm is a clean two-round message exchange. Phase 1 is the "leader election plus history scan" phase: a proposer broadcasts its intention to lead round n and learns whether any value was already chosen in a prior round.
Why Two Phases Are Necessary
Consider two concurrent proposers P1 and P2, each wanting to propose a different value. Without coordination, P1 might convince acceptors A1 and A2 to accept value X, while P2 convinces A2 and A3 to accept value Y. A2 would have accepted two different values — violating the fundamental consensus requirement that at most one value is ever chosen.
Paxos avoids this through a two-step discipline:
- Phase 1 (Prepare/Promise): before proposing a value, collect promises from a majority that they will not accept any older proposal. Also discover any value that was previously accepted — if one exists, you must propose that value, not your own.
- Phase 2 (Accept/Commit): only after establishing leadership do you broadcast the value and wait for majority acceptance.
The PREPARE Message and Proposal Numbers
Each proposal carries a unique, monotonically increasing proposal number n. In practice, n is often structured as (round, server_id) to ensure uniqueness across concurrent proposers. A proposer sends PREPARE(n) to all acceptors:
on PREPARE(n) received by acceptor:
if n > max_n_promised:
max_n_promised = n
return PROMISE(n, accepted_n, accepted_val)
else:
return DENY
An acceptor that responds with PROMISE commits to two things: it will not promise for any lower round, and it will not accept any proposal numbered less than n. This is the key safety invariant — promises make lower-numbered rounds permanently ineffective.
Handling Prior Accepted Values
A PROMISE carries the acceptor's current (accepted_n, accepted_val) — the proposal number and value it most recently accepted, if any. The proposer collects all promises and checks: did any acceptor report a previously accepted value? If so, the proposer must use the value from the promise with the highest accepted_n. This is not merely convention — it is a safety requirement.
The intuition: if acceptor A1 has already accepted (n=5, val=V), it means that value V was potentially chosen by a prior quorum. If a new proposer ignores V and proposes something else, it might contradict an already-chosen value. By forcing the new proposer to use V, Paxos ensures that any chosen value propagates forward through time.
Quorum and Phase 1 Outcome
A proposer wins Phase 1 when it has collected PROMISE responses from a strict majority of acceptors (more than half). With 3 acceptors, majority = 2; with 5, majority = 3. The proposer then determines its value for Phase 2:
- If at least one promise included an
accepted_val, use the value from the promise with the highestaccepted_n. - If no promise included an
accepted_val, the proposer is free to choose any value — outputFREE_CHOICE.
If fewer than a majority respond with PROMISE (either because they DENY or because they are unreachable), Phase 1 fails. The proposer should back off and retry with a higher proposal number.
Correctness Invariants
- Promise durability: acceptors must persist
max_n_promisedto stable storage before responding. A crashed-and-restarted acceptor that forgets its promise could cause safety violations. - Monotonicity: proposal numbers must strictly increase. If P1 uses
n=5and P2 usesn=5, they are in conflict and neither is safe to proceed. - Value preservation: if any promise carries an accepted value, the proposer loses its right to choose — it must propagate the discovered value.
How Spanner Uses This
In Spanner, each shard is a Paxos group of typically 5 replicas. Paxos is used in Multi-Paxos mode: a single stable leader runs Phase 1 once to establish authority over all future log slots, then runs only Phase 2 for each write. This reduces steady-state latency to one round-trip. Leader leases (backed by TrueTime) prevent two nodes from simultaneously believing they are leader.
Sign in to run and submit code.