Building Distributed Systems from Scratch

TrueTime: Embracing Clock Uncertainty

Most distributed systems paper over clock skew with NTP and hope the error stays small. Spanner takes the opposite approach: it treats clock uncertainty as a measurable, bounded quantity and exposes it explicitly through the TrueTime API. The result is a fundamentally simpler and safer reasoning model for distributed timestamps.

The TrueTime API

TrueTime exposes three operations, all defined in the original Spanner paper (Corbett et al., OSDI 2012):

TT.now() returns a TTinterval{earliest, latest} such that the true absolute time t_abs satisfies earliest ≤ t_abs ≤ latest. The half-width epsilon = (latest - earliest) / 2 is the instantaneous uncertainty.
TT.after(t) returns true if and only if t is definitely in the past: t < TT.now().earliest. Equivalently, writing c for the local clock reading at the center of the interval: t < c - epsilon.
TT.before(t) returns true if and only if t is definitely in the future: t > TT.now().latest. Equivalently: t > c + epsilon.

The key guarantee is that neither TT.after nor TT.before will ever return a false positive. TT.after(t) being true means you can be completely certain t is in the past — not merely likely, but provably so given the bounded uncertainty.
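
The three operations can be modeled in a few lines. The following is a minimal sketch, not Spanner's implementation: the class names, the fixed epsilon, and the injectable clock parameter are assumptions made for illustration, and a real deployment derives epsilon from its time masters rather than from a constant.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy model: a local clock plus a fixed, assumed error bound in seconds."""

    def __init__(self, epsilon=0.004, clock=time.time):
        self.epsilon = epsilon  # assumed constant; the real bound varies over time
        self.clock = clock      # injectable so the example can be tested

    def now(self):
        c = self.clock()
        return TTInterval(c - self.epsilon, c + self.epsilon)

    def after(self, t):
        # True only if t is below every time the true time could be.
        return t < self.now().earliest

    def before(self, t):
        # True only if t is above every time the true time could be.
        return t > self.now().latest
```

For any t inside the current interval, both after(t) and before(t) return False. That "don't know" answer is what commit wait is designed to outlast.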

Hardware Backing: GPS and Atomic Clocks

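Google backs TrueTime with two independent kinds of time reference. Each datacenter runs a set of time master machines: most have GPS receivers with dedicated antennas, and the rest (called Armageddon masters in the paper) have atomic clocks. GPS provides accurate time but is vulnerable to antenna and receiver failures, interference, and spoofing. Atomic clocks fail in ways that are uncorrelated with GPS, although they can drift significantly over long periods, so they serve as an independent cross-check and backup. The probability of both failing simultaneously is engineered to be vanishingly small. Masters regularly compare their time references against one another and evict themselves if they diverge substantially. A daemon on every client machine polls several masters, applies a variant of Marzullo's algorithm to detect and reject liars, and synchronizes the local clock to the remaining masters. Between synchronizations the daemon advertises a slowly growing epsilon to bound its own uncertainty.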
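
Marzullo's algorithm takes one [lo, hi] interval per time source and finds the region on which the largest number of sources agree. Sources whose intervals miss that region are the outliers. The sketch below is the classic textbook form, not the variant Spanner uses (the paper does not spell that variant out), and the function name is an assumption.

```python
def marzullo(intervals):
    """Return (count, lo, hi): the region covered by the most intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, 0))  # 0 = interval opens; sorts before a close at the same point
        edges.append((hi, 1))  # 1 = interval closes
    edges.sort()

    best = count = 0
    best_lo = best_hi = None
    for i, (point, kind) in enumerate(edges):
        if kind == 0:
            count += 1
            if count > best:
                # The best region runs from this opening edge to the next edge.
                best = count
                best_lo = point
                best_hi = edges[i + 1][0]
        else:
            count -= 1
    return best, best_lo, best_hi
```

With sources reporting (8, 12), (11, 13), and (10, 12), the result is (3, 11, 12): all three agree on [11, 12]. Adding a faulty source at (20, 22) leaves the result unchanged, because that source overlaps none of the others.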

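In practice, the Spanner paper reports that epsilon follows a sawtooth between about 1 and 7 milliseconds, averaging around 4 ms. The TT.now() interval is therefore normally no more than about 14 ms wide, and that is the window within which the true time could be anything. Commit wait lasts roughly 2 * epsilon, so it adds about 8 ms to a typical write transaction and about 14 ms at the top of the sawtooth. Neither figure is a hard ceiling: epsilon, and the wait with it, grows when time masters are unreachable. Part of the wait can overlap with replication.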
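
The 7 ms figure follows from three numbers given in the Spanner paper: clients resynchronize every 30 seconds, they assume a worst-case local drift of 200 microseconds per second between synchronizations, and about 1 ms is budgeted for communication delay to the time masters. A small sketch of that arithmetic, with a hypothetical function name and those figures as defaults:

```python
def client_epsilon(seconds_since_sync, base=0.001, drift_rate=200e-6):
    """Error bound in seconds: fixed communication term plus assumed worst-case drift."""
    return base + drift_rate * seconds_since_sync


# Values just after a sync, halfway through, and just before the next sync.
for t in (0, 15, 30):
    print(f"{t:>2} s after sync: epsilon = {client_epsilon(t) * 1000:.1f} ms")
```

This gives 1 ms just after a synchronization, 4 ms at the midpoint, and 7 ms just before the next one. If a poll is missed, the same formula keeps growing, which is why epsilon can rise past its usual range.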

Why Bounded Uncertainty Enables External Consistency

External consistency is the property that if a real-world observer sees transaction T1 commit, then starts T2, then T2's commit timestamp must be greater than T1's. This matches real-world causality — it is stronger than serializability because it respects wall-clock order across independent clients.

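Without TrueTime, achieving external consistency requires either a global lock server or timestamp oracle (a bottleneck), vector clocks (which require every client to carry and forward causality metadata), or Lamport clocks (which order only events linked by messages inside the system and say nothing about real-time order between independent clients). TrueTime sidesteps all of these by making the uncertainty window a measurable bound that the system can wait out.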

Correctness Invariants

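Interval guarantee: the true time always falls within [earliest, latest]. The rest of the design treats this as a hard guarantee, but it is an engineered one: it holds only while clock hardware stays within its assumed worst-case drift, which is why references are cross-checked and machines whose clocks misbehave are evicted.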
Monotonicity: successive calls to TT.now() must return non-decreasing earliest values. A clock that jumps backward would break the after/before predicates.
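Epsilon tracking: whenever a clock is running without a fresh reference (a time master that has lost contact with its source, or a client between polls), its epsilon must grow at a conservatively assumed worst-case drift rate, not the observed one, until contact is restored. The system must propagate this growing epsilon to all clients promptly.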

How Spanner Uses TrueTime

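Every write transaction in Spanner is assigned a commit timestamp s that is at least TT.now().latest, evaluated after the commit request arrives, and greater than any timestamp the leader has already assigned. Using the pessimistic upper bound guarantees that s is no earlier than the true time at that moment. The transaction is then held in the commit-wait state until TT.after(s) is true, and only then are its effects released to clients. Commit wait therefore guarantees that s is less than the absolute time of the release. Any transaction that starts after the release starts at an absolute time greater than s, and its own timestamp is at least its coordinator's TT.now().latest, which is never below the absolute time. Note that the later transaction's TT.now().earliest may still be below s. It is the latest bound that carries the argument. The later commit timestamp is thus strictly greater than s, and external consistency follows directly.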
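
The commit-wait rule can be sketched as follows. This is a single-process toy under stated assumptions: TrueTime here is a stand-in with a fixed epsilon, commit and replicate are hypothetical names, and the sleep-and-poll loop is only one way to wait. In Spanner the wait happens on the coordinator leader and can overlap with Paxos replication.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy stand-in: local clock plus a fixed, assumed error bound (seconds)."""

    def __init__(self, epsilon=0.004, clock=time.time):
        self.epsilon = epsilon
        self.clock = clock

    def now(self):
        c = self.clock()
        return TTInterval(c - self.epsilon, c + self.epsilon)

    def after(self, t):
        return t < self.now().earliest


def commit(tt, replicate, last_assigned=float("-inf")):
    """Pick a commit timestamp, replicate the write, then wait out the uncertainty."""
    s = tt.now().latest           # pessimistic bound: never below the true time
    if s <= last_assigned:        # keep timestamps strictly increasing on this leader
        s = last_assigned + 1e-6  # illustrative tie-break, an assumption
    replicate(s)                  # hypothetical callback that logs the write at s
    while not tt.after(s):        # commit wait: lasts about 2 * epsilon
        time.sleep(0.0005)
    return s                      # only now may the result be shown to clients
```

Calling commit(TrueTime(epsilon=0.004), log.append) blocks for a little over 8 ms before returning. Any caller that learns of the result and then starts a new transaction will pick a timestamp above s, even on a machine whose clock error sits at the opposite edge of its bound.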