ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/projects/mini-dynamo/tasks/dynamo-t3-s2-read-repair

Heal Stale Replicas with Read Repair


Concept

Read Repair

Even with quorum writes, some replicas may miss updates: a node that was briefly unreachable, for example, or one whose write was held as a hint on another node and has not been delivered yet. Read repair heals these replicas lazily by piggybacking on ordinary reads, without a dedicated background sweeper.

Protocol

  1. The coordinator reads from R replicas (a quorum read).
  2. It finds the value with the highest version, which becomes the winner.
  3. Any responding replica whose version is lower than the winner is stale.
  4. The coordinator sends the winner's value to each stale replica in the background.
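
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's reference solution: the `Versioned` record, the `read_with_repair` function, and the in-memory `replicas` dictionary are hypothetical names, versions are assumed to be plain integers, and a replica that has no entry for the key is assumed to count as stale. The repair is applied inline here to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int  # assumption: a single integer version per key


def read_with_repair(replicas, key, r):
    """Read `key` from the first `r` replicas and repair the stale ones.

    `replicas` maps a node name to that node's key/value dict (hypothetical
    in-memory stand-in for real storage nodes).
    Returns (winner, names_of_repaired_replicas).
    """
    # Step 1: quorum read from r replicas.
    contacted = list(replicas)[:r]
    responses = {name: replicas[name].get(key) for name in contacted}

    # Step 2: the response with the highest version is the winner.
    present = [v for v in responses.values() if v is not None]
    if not present:
        return None, []
    winner = max(present, key=lambda v: v.version)

    # Step 3: any contacted replica with a lower (or missing) version is stale.
    stale = [
        name
        for name, v in responses.items()
        if v is None or v.version < winner.version
    ]

    # Step 4: push the winner to each stale replica (inline in this sketch).
    for name in stale:
        replicas[name][key] = winner
    return winner, stale


replicas = {
    "a": {"k": Versioned("new", 2)},
    "b": {"k": Versioned("old", 1)},
    "c": {},
}
winner, repaired = read_with_repair(replicas, "k", r=3)
```

After the call, `winner` is the version-2 record and both `b` and `c` hold it.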

Why "In the Background"?

The repair is sent after the client receives its response. Making the client wait for repairs would add to tail latency, so the coordinator sends the repair messages asynchronously and does not wait for them to complete (fire and forget).
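
One way to get this fire-and-forget behaviour is to schedule each repair as a background task and return to the client immediately. The sketch below uses `asyncio` and assumes a hypothetical in-memory `store` of `(value, version)` tuples; `send_repair`, `handle_read`, and `demo` are illustrative names, and the short sleep stands in for network delay. The `background` set only keeps references to the tasks so they are not garbage-collected before they finish.

```python
import asyncio


async def send_repair(store, node, key, value, version):
    # Simulated network delay before the stale replica applies the repair.
    await asyncio.sleep(0.01)
    store[node][key] = (value, version)


async def handle_read(store, key, background):
    """Return the winning value without waiting for any repair to finish."""
    responses = {node: data.get(key) for node, data in store.items()}
    winner = max(
        (v for v in responses.values() if v is not None),
        key=lambda v: v[1],
        default=None,
    )
    if winner is None:
        return None
    for node, v in responses.items():
        if v is None or v[1] < winner[1]:
            # Schedule the repair; do not await it here.
            task = asyncio.create_task(send_repair(store, node, key, *winner))
            background.add(task)
            task.add_done_callback(background.discard)
    return winner[0]


async def demo():
    store = {"a": {"k": ("new", 2)}, "b": {"k": ("old", 1)}}
    background = set()
    value = await handle_read(store, "k", background)
    at_reply = store["b"]["k"]          # replica b is still stale here
    await asyncio.gather(*background)   # only the demo waits, to observe the result
    return value, at_reply, store["b"]["k"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In `demo`, the client's value is available while replica `b` still holds version 1; the repair lands only once the background task has run.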

Eventual Consistency

Over time, each read brings any stale replica it contacts up to date. This is read-driven convergence rather than gossip: every replica that a read touches ends up holding the latest value that read observed, even without a central synchronisation protocol.
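
A small simulation makes the convergence visible. It assumes three replicas, reads that each contact two of them, and `(value, version)` tuples with version 0 standing for "no entry"; `read_repair` and the replica names are hypothetical. Each read repairs whichever contacted replica is behind, so the number of up-to-date replicas grows read by read.

```python
def read_repair(replicas, contacted, key):
    """Read `key` from the contacted replicas and repair the stale ones."""
    # Version 0 is used as a sentinel for a replica with no entry (assumption).
    versions = {n: replicas[n].get(key, (None, 0)) for n in contacted}
    winner = max(versions.values(), key=lambda v: v[1])
    for n, v in versions.items():
        if v[1] < winner[1]:
            replicas[n][key] = winner
    return winner


replicas = {
    "a": {"k": ("new", 2)},
    "b": {"k": ("old", 1)},
    "c": {"k": ("old", 1)},
}

fresh_counts = []
# Two reads, each contacting a different pair of replicas.
for contacted in [("a", "b"), ("b", "c")]:
    read_repair(replicas, contacted, "k")
    fresh = sum(1 for data in replicas.values() if data["k"][1] == 2)
    fresh_counts.append(fresh)
```

The first read repairs `b` from `a`; the second read then repairs `c` from `b`, so `fresh_counts` ends up as `[2, 3]`.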

Limitations

Read repair only heals replicas that are read. A key that is written but never read again may remain stale indefinitely on some replicas. This is why Dynamo also runs anti-entropy (Merkle tree syncs) as a background sweep over all keys.
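
The gap can be illustrated with a key that is never read. The sketch below is a deliberately simplified stand-in for anti-entropy: instead of a Merkle tree, it compares a single SHA-256 digest over each replica's whole keyspace and, if the digests differ, copies over every newer entry. A real Merkle tree would narrow the comparison down to the differing key ranges. `digest`, `sweep`, and the two replica dicts are hypothetical names.

```python
import hashlib


def digest(data):
    """Hash a replica's full contents in a stable key order."""
    h = hashlib.sha256()
    for key in sorted(data):
        value, version = data[key]
        h.update(f"{key}={value}@{version};".encode())
    return h.hexdigest()


def sweep(source, target):
    """Copy every newer entry from source to target; return the repaired keys."""
    if digest(source) == digest(target):
        return []  # replicas already agree, nothing to transfer
    repaired = []
    for key, (value, version) in source.items():
        if key not in target or target[key][1] < version:
            target[key] = (value, version)
            repaired.append(key)
    return repaired


a = {"hot": ("h2", 2), "cold": ("c2", 2)}
b = {"hot": ("h1", 1), "cold": ("c1", 1)}

# A read of "hot" triggers read repair for that key only (simplified here).
b["hot"] = a["hot"]
cold_after_reads = b["cold"]   # "cold" is never read, so it stays stale

# The background sweep covers all keys and catches it.
repaired = sweep(a, b)
```

Only `cold` needs repairing by the sweep, because read repair already fixed `hot`.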
