Concept
Sloppy Quorums and Hinted Handoff
Strict quorums sacrifice availability: if fewer than W of a key's N preferred replicas are reachable, the write fails. For a system like Amazon's shopping cart, where telling a customer they cannot add an item is unacceptable, this is not good enough. Dynamo's solution is the sloppy quorum: when preferred nodes are unavailable, write to other reachable nodes in the ring and record a hint about the intended destination.
The Preference List
Every key has an ordered preference list of N nodes, derived by walking the consistent hash ring clockwise from the key's position. Under normal operation, Dynamo writes to all N nodes on this list; these are the preferred replicas. During a partition or node failure, some preferred nodes may be unreachable.
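The clockwise walk can be sketched in a few lines of Python. This is a simplified illustration, not Dynamo's implementation: it assumes one ring position per node (no virtual nodes), and the names `ring_position` and `preference_list`, along with the use of MD5 here, are choices made for this example.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or key to a position on the ring (hash choice is illustrative)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


def preference_list(key: str, nodes: list[str], n: int) -> list[str]:
    """Return the first n nodes met when walking clockwise from the key's position."""
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    # Index of the first node whose position is strictly greater than the key's.
    start = bisect.bisect_right(positions, ring_position(key))
    count = min(n, len(ring))
    # Wrap around the end of the sorted list to close the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node_A", "node_B", "node_C", "node_D", "node_E"]
print(preference_list("session:xyz", nodes, 3))
```

The same key always maps to the same ordered list as long as the set of nodes is unchanged, which is what lets every node agree on who the preferred replicas are.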
Accepting Hints
If a preferred node is down, Dynamo finds the next available node in the ring that is not on the preference list and writes there instead. That node stores the value along with a hint — metadata saying "this data actually belongs to node X":
{ key: "session:xyz", value: "...", hint: "node_B" }
The write to the hint-holder counts toward the W quorum. So a sloppy quorum of W=3 can be satisfied even if zero preferred nodes are available, as long as W other nodes in the cluster are reachable. The "sloppy" qualifier means the quorum may be formed from non-preferred nodes.
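A minimal sketch of this write path follows, assuming the ring order for the key is already known. The function `sloppy_write`, the `is_up` callable (standing in for failure detection), and the dictionary layouts for `stores` and `hints` are all hypothetical names introduced for illustration.

```python
def sloppy_write(key, value, ring_order, n, w, is_up, stores, hints):
    """Write to n nodes, substituting fallback nodes for unreachable preferred ones.

    ring_order: all nodes in clockwise order from the key's position (assumed given).
    is_up:      callable node -> bool (hypothetical failure detector).
    stores:     dict node -> dict holding that node's own data.
    hints:      dict holder -> dict intended_node -> list of (key, value).
    Returns True if at least w nodes accepted the write.
    """
    preferred = ring_order[:n]
    # Healthy nodes beyond the preference list, in ring order.
    fallbacks = (node for node in ring_order[n:] if is_up(node))
    acks = 0
    for target in preferred:
        if is_up(target):
            stores[target][key] = value
            acks += 1
            continue
        holder = next(fallbacks, None)
        if holder is None:
            continue  # no healthy non-preferred node left to take this write
        # Store the value on the fallback, tagged with its intended destination.
        hints.setdefault(holder, {}).setdefault(target, []).append((key, value))
        acks += 1
    # Writes already applied are not rolled back if the quorum is missed.
    return acks >= w


ring = ["node_A", "node_B", "node_C", "node_D", "node_E"]
stores = {node: {} for node in ring}
hints = {}
down = {"node_B"}
ok = sloppy_write("session:xyz", "v1", ring, 3, 3,
                  lambda node: node not in down, stores, hints)
print(ok, hints)
```

In this run `node_B` is down, so `node_D` (the first healthy node past the preference list) stores the value with a hint for `node_B`, and the write still collects W=3 acknowledgements.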
Delivering Hints on Recovery
Hint-holding nodes periodically scan their local hint store. When the intended target recovers, the hint-holder delivers the stored writes and removes the local copies:
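def on_recover(self, target_node):
    # Method of the hint-holding node; self.hints maps target -> [(key, value, intended)]
    for key, value, intended in self.hints.get(target_node, []):
        send(target_node, key, value)  # async delivery
    # pop() avoids a KeyError when no hints were stored for this target
    self.hints.pop(target_node, None)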
This ensures that even nodes that were completely offline during a partition will eventually receive all writes intended for them once they come back. The system heals itself without any external coordination.
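The periodic scan described above can be sketched as a small class. This is an illustrative model only: `HintHolder`, `scan_once`, and the `is_up` and `send` callables are hypothetical, and `send` is assumed to be synchronous and to return `True` once the target has acknowledged the write, so that a hint is dropped only after confirmed delivery.

```python
class HintHolder:
    def __init__(self, is_up, send):
        self.is_up = is_up  # hypothetical failure detector: node -> bool
        self.send = send    # hypothetical transport: (node, key, value) -> bool (acknowledged)
        self.hints = {}     # intended node -> list of (key, value)

    def add_hint(self, intended, key, value):
        self.hints.setdefault(intended, []).append((key, value))

    def scan_once(self):
        """Run one pass of the periodic scan; return the number of hints delivered."""
        delivered = 0
        for target in list(self.hints):  # copy the keys: the dict is mutated below
            if not self.is_up(target):
                continue  # target still unreachable, keep its hints
            remaining = []
            for key, value in self.hints[target]:
                if self.send(target, key, value):
                    delivered += 1
                else:
                    remaining.append((key, value))  # retry on the next scan
            if remaining:
                self.hints[target] = remaining
            else:
                del self.hints[target]
        return delivered


up = set()
holder = HintHolder(lambda node: node in up, lambda node, key, value: True)
holder.add_hint("node_B", "session:xyz", "v1")
print(holder.scan_once())  # node_B is still down
up.add("node_B")
print(holder.scan_once())  # node_B has recovered
```

A real node would call `scan_once` from a timer or background thread; keeping undelivered hints in place makes the scan safe to repeat.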
Why This Matters
Hinted handoff is the mechanism that allows Amazon to claim "always-on" writes for its shopping cart. During a network partition, customers can still add items to their cart. The items are written to hint-holding nodes in reachable parts of the network. When the partition heals, the hints are delivered and the cluster converges. An acknowledged write survives a temporary failure as long as the nodes that accepted it, including any hint-holders, survive until the hints are delivered.
Common Pitfalls
- Hints are not replicas: a hint-holder is not on the preference list and does not participate in normal reads. Data held only as a hint is invisible to quorum reads until it is delivered to the intended node, so with sloppy quorums a read quorum R with R + W > N no longer guarantees that a read returns the latest write.
- Hint-holders can fail too: if a hint-holder fails before delivering its hints, those hints are lost unless they were stored durably. This is why hints should be persisted to disk, not just held in memory.
- Not enough nodes for quorum: if fewer than W nodes are available even counting hint-holders, the write must fail. Sloppy quorums help but cannot make a fully partitioned cluster writable.
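To make the second pitfall concrete, here is one way a hint-holder might persist hints before acknowledging a write. It is a sketch under stated assumptions: the `DurableHintLog` class and its JSON-lines file format are invented for this example, and removing delivered hints from the file (compaction) is left out.

```python
import json
import os


class DurableHintLog:
    """Append-only JSON-lines hint log (a hypothetical format chosen for illustration)."""

    def __init__(self, path):
        self.path = path

    def append(self, intended, key, value):
        """Persist one hint; call this before acknowledging the write."""
        record = {"hint": intended, "key": key, "value": value}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the record to stable storage

    def load(self):
        """Rebuild the in-memory hint map (intended node -> [(key, value)]) after a restart."""
        hints = {}
        if not os.path.exists(self.path):
            return hints
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                hints.setdefault(record["hint"], []).append(
                    (record["key"], record["value"])
                )
        return hints
```

After a crash and restart, the hint-holder calls `load()` and resumes delivery from the recovered map. The `fsync` on every append trades write latency for durability; batching several hints per `fsync` is a common compromise.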