ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/sharder

The Sharder

Advanced | 15 tasks

Implement horizontal scaling through sharding. Build shard controllers, consistent hashing, configuration changes, and data migration to create a scalable distributed storage system.

Subtracks & Tasks

Interview Prep

Common interview questions for Distributed Systems / Backend Engineer roles that map directly to what you build in this track.

How does consistent hashing differ from modulo hashing, and why does it matter when nodes are added or removed?

Model Answer

Modulo hashing (key % N) remaps ~(N-1)/N of all keys when a node is added/removed — most keys move. Consistent hashing maps both keys and nodes to a ring; each key goes to the next clockwise node. When a node is added/removed, only ~1/N of keys move (just those between the new node and its predecessor). This minimizes disruption to caches and stateful backends during cluster changes. Virtual nodes (multiple ring positions per physical node) improve balance and allow heterogeneous capacity.
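A minimal sketch of such a ring in Go, assuming SHA-1 for ring positions and a fixed number of virtual nodes per physical node; the names (`Ring`, `vnodesPerNode`, `Owner`) are illustrative, not part of the track's API:

```go
package ring

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

// Ring maps virtual-node positions to physical nodes.
type Ring struct {
	points []uint64          // sorted vnode positions on the ring
	owner  map[uint64]string // vnode position -> physical node
}

// hashPos derives a 64-bit ring position from a string.
func hashPos(s string) uint64 {
	sum := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// New places vnodesPerNode virtual nodes on the ring for each physical node.
func New(nodes []string, vnodesPerNode int) *Ring {
	r := &Ring{owner: make(map[uint64]string)}
	for _, n := range nodes {
		for i := 0; i < vnodesPerNode; i++ {
			p := hashPos(fmt.Sprintf("%s#%d", n, i))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Owner returns the node whose vnode is first at or after the key's
// position, wrapping around the ring (the "next clockwise node").
func (r *Ring) Owner(key string) string {
	p := hashPos(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	if i == len(r.points) {
		i = 0 // wrap around
	}
	return r.owner[r.points[i]]
}
```

Adding or removing a physical node only touches the vnode positions that node contributes; keys hashing elsewhere on the ring keep their owner, which is the ~1/N property described above.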

How do you detect and mitigate a hot shard (hotspot) in a sharded system?

Model Answer

Detection: monitor per-shard write throughput and CPU. A hotspot shows significantly higher rates on one shard. Causes: celebrity problem (one key is extremely popular), time-based keys (all writes go to the "today" shard), poor shard key choice. Solutions: (1) Add random suffix to the hot key and scatter writes across N virtual shards, then aggregate on read, (2) Cache the hot key (read cache, write buffer), (3) Use a different shard key that distributes better, (4) Pre-shard by anticipating known hotspots (Twitter does this for celebrity accounts during major events).
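A sketch of approach (1), salting a write-heavy key across N sub-keys and fanning reads out over all of them; `shardCount` and the key format are arbitrary choices for illustration, not values from the track:

```go
package main

import (
	"fmt"
	"math/rand"
)

const shardCount = 8 // number of sub-keys the hot key is spread over

// SaltedKey picks one of shardCount sub-keys at random for each write,
// scattering the hot key's write load across several shards.
func SaltedKey(hotKey string) string {
	return fmt.Sprintf("%s#%d", hotKey, rand.Intn(shardCount))
}

// ReadAll lists every sub-key; the reader fetches each one and aggregates
// (e.g., sums counters), trading a write hotspot for a fan-out read.
func ReadAll(hotKey string) []string {
	keys := make([]string, shardCount)
	for i := range keys {
		keys[i] = fmt.Sprintf("%s#%d", hotKey, i)
	}
	return keys
}

func main() {
	fmt.Println("write goes to:", SaltedKey("celebrity:123:likes"))
	fmt.Println("read fans out to:", ReadAll("celebrity:123:likes"))
}
```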

How would you handle a transaction that spans two shards?

Model Answer

Cross-shard transactions require distributed coordination. Options: (1) Two-phase commit (2PC): a coordinator sends PREPARE to both shards, collects votes, then sends COMMIT or ABORT. Problem: coordinator failure blocks participants. (2) Saga pattern: sequence of local transactions with compensating transactions on failure. No atomicity guarantees during execution, but eventual consistency. (3) Google Spanner / CockroachDB: 2PC over Paxos/Raft groups per shard, using TrueTime/HLC for external consistency. Cross-shard transactions are inherently more expensive than single-shard ones — a common design goal is to colocate frequently transacted data on the same shard.
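A bare-bones 2PC coordinator over an abstract participant interface, just to make the prepare/commit flow concrete; `Participant` is a placeholder, and a real coordinator would also persist its decision to a durable log so it can recover after a crash:

```go
package twopc

import "fmt"

// Participant is one shard taking part in the distributed transaction.
type Participant interface {
	Prepare(txID string) error // vote yes by returning nil
	Commit(txID string) error
	Abort(txID string) error
}

// RunTransaction drives the two phases across all shards.
func RunTransaction(txID string, shards []Participant) error {
	// Phase 1: collect votes. Any error or "no" vote aborts everyone.
	for _, p := range shards {
		if err := p.Prepare(txID); err != nil {
			for _, q := range shards {
				q.Abort(txID)
			}
			return fmt.Errorf("transaction %s aborted: %w", txID, err)
		}
	}
	// Phase 2: all voted yes, so commit is now the only allowed outcome.
	// If the coordinator dies here, participants stay blocked until it
	// recovers: the classic 2PC weakness mentioned above.
	for _, p := range shards {
		p.Commit(txID)
	}
	return nil
}
```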

How does Cassandra distribute data across nodes, and what happens when a node fails?

Model Answer

Cassandra uses a consistent hash ring with a configurable partitioner (Murmur3 by default). The partition key is hashed to a position on the ring. With replication factor N, the row is stored on the N consecutive nodes clockwise from that position. When a node fails, reads/writes use hinted handoff (temporarily written to another node, replayed when the failed node recovers) or serve from the remaining replicas. With RF=3, Cassandra can tolerate 1 node failure for all operations and 2 node failures for reads (depending on consistency level).
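A simplified sketch of that placement rule: walk the ring clockwise from the key's position and collect the first rf distinct physical nodes. It ignores Cassandra's rack and datacenter awareness, and the type names are illustrative:

```go
package replicas

import "sort"

// VNode is one virtual-node entry on the ring.
type VNode struct {
	Pos  uint64 // position on the ring
	Node string // physical node owning this position
}

// ReplicasFor returns rf distinct nodes, starting at the first vnode at or
// after keyPos and wrapping around the ring. The ring slice must be sorted
// by Pos.
func ReplicasFor(ring []VNode, keyPos uint64, rf int) []string {
	start := sort.Search(len(ring), func(i int) bool { return ring[i].Pos >= keyPos })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(ring) && len(out) < rf; i++ {
		n := ring[(start+i)%len(ring)].Node
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	return out
}
```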

What is a shard controller, and what role does it play in a sharded storage system?

Model Answer

A shard controller (or metadata service) maintains the mapping of key ranges to physical shard locations. Clients query the controller to find which shard to contact. It handles shard splits (when a shard is too large), merges, and rebalancing when nodes are added/removed. In MongoDB, this is mongos + config servers. In Vitess, it is the vtgate + topology service. The shard controller is a critical single-source-of-truth and must itself be highly available (replicated via Raft/Paxos) and consistent.
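A toy version of the controller's core lookup, mapping a key to the shard that owns its range; field names are illustrative, and real controllers add versioning, client-side caching, and split/merge logic on top:

```go
package controller

import "sort"

// RangeShard assigns one contiguous key range to a shard.
type RangeShard struct {
	StartKey string // inclusive lower bound of the range
	Shard    string // shard owning [StartKey, next StartKey)
}

// ShardFor assumes ranges is sorted by StartKey and that the first StartKey
// is "" (so every key is covered). It returns the last range whose
// StartKey <= key.
func ShardFor(ranges []RangeShard, key string) string {
	i := sort.Search(len(ranges), func(i int) bool { return ranges[i].StartKey > key })
	return ranges[i-1].Shard
}
```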

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them.

Assigning keys to nodes with modulo hashing (key % n)

Why it happens

`key % n` changes for nearly all keys when n changes by 1. In a cluster of 10 nodes, removing one node remaps ~90% of keys.

The fix

Use consistent hashing with a virtual node ring. Adding/removing a physical node only remaps the keys on the adjacent arc of the ring, proportional to 1/n of the keyspace.
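A quick way to convince yourself of the numbers: hash 10,000 synthetic keys and count how many change slots when a 10-node cluster shrinks to 9 under modulo placement. FNV-1a is just an arbitrary stand-in hash for this demo:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// slot computes the modulo placement of a key across n nodes.
func slot(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

func main() {
	const total = 10000
	moved := 0
	for i := 0; i < total; i++ {
		k := fmt.Sprintf("key-%d", i)
		if slot(k, 10) != slot(k, 9) {
			moved++
		}
	}
	// Expect roughly 90% of keys to land on a different node.
	fmt.Printf("%d of %d keys remapped (~%.0f%%)\n", moved, total, 100*float64(moved)/total)
}
```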

Sharding without replication

Why it happens

Pure sharding without replication means each shard is owned by exactly one node. That node going down takes its entire keyspace offline.

The fix

Replicate each shard across R nodes (typically R=3). Reads and writes use a quorum (2 of 3). This is the approach used by Dynamo and Cassandra.
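A sketch of the quorum write path under that scheme: fan the write out to all replicas and succeed as soon as a majority acknowledge. The `Replica` interface is an abstract placeholder, not a real client API:

```go
package quorum

import "context"

// Replica is one copy of a shard that can accept writes.
type Replica interface {
	Put(ctx context.Context, key, value string) error
}

// QuorumWrite sends the write to every replica concurrently and returns
// true once a majority (e.g. 2 of 3) have acknowledged it.
func QuorumWrite(ctx context.Context, replicas []Replica, key, value string) bool {
	need := len(replicas)/2 + 1
	acks := make(chan bool, len(replicas)) // buffered so late repliers never block
	for _, r := range replicas {
		go func(r Replica) {
			acks <- r.Put(ctx, key, value) == nil
		}(r)
	}
	ok := 0
	for i := 0; i < len(replicas); i++ {
		if <-acks {
			ok++
			if ok >= need {
				return true
			}
		}
	}
	return false
}
```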

Assuming consistent hashing alone eliminates hotspots

Why it happens

If the key distribution is skewed (e.g., one key gets 90% of writes), consistent hashing still maps all that traffic to a single virtual node.

The fix

For write-heavy keys, shard at a sub-key level (e.g., by user + random suffix). For read-heavy keys, replicate aggressively and spread reads across replicas.

Not handling requests that land on the wrong node during a rebalance

Why it happens

During a rebalance, the node that receives a request may no longer own the key by the time it processes it.

The fix

When a node receives a request for a key it no longer owns, it should forward to the correct owner (or return a redirect hint) and the client library should retry once.
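A sketch of that handler logic, assuming each node keeps a current view of the shard map; the `Node` type, the `Owner` callback, and the redirect response shape are all illustrative:

```go
package router

import "fmt"

// Node is a storage node with a local view of key ownership.
type Node struct {
	ID    string
	Owner func(key string) string // current view of the shard map
	Store map[string]string
}

// Response either carries the value or tells the client where to retry.
type Response struct {
	Value      string
	RedirectTo string // set when this node no longer owns the key
}

func (n *Node) HandleGet(key string) (Response, error) {
	if owner := n.Owner(key); owner != n.ID {
		// Option A: proxy to the owner. Option B (shown): return a
		// redirect hint and let the client library retry once.
		return Response{RedirectTo: owner}, nil
	}
	v, ok := n.Store[key]
	if !ok {
		return Response{}, fmt.Errorf("key %q not found", key)
	}
	return Response{Value: v}, nil
}
```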

Keeping the shard map only in each node's memory

Why it happens

An in-memory shard map is lost on restart. If nodes independently re-derive the map, they may produce different results.

The fix

Store the shard map in a consistent, highly available metadata service (etcd, ZooKeeper, or the Maelstrom `lin-kv` service). All nodes read the same authoritative map.
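A sketch of what "read the same authoritative map" looks like, written against an abstract linearizable KV interface rather than a specific etcd/ZooKeeper/lin-kv client; the `KV` interface, the map's JSON layout, and the key name passed in are assumptions for illustration:

```go
package metadata

import (
	"context"
	"encoding/json"
)

// KV is whatever linearizable store holds the shard map.
type KV interface {
	Get(ctx context.Context, key string) (string, error)
	// CompareAndSwap lets the controller publish a new map atomically,
	// so concurrent updates cannot fork the mapping.
	CompareAndSwap(ctx context.Context, key, old, new string) error
}

// ShardMap is the single source of truth every node reads.
type ShardMap struct {
	Version int               `json:"version"`
	Owners  map[string]string `json:"owners"` // shard id -> node address
}

// LoadShardMap fetches and decodes the authoritative map; every node that
// calls this sees the same mapping instead of re-deriving its own.
func LoadShardMap(ctx context.Context, kv KV, key string) (ShardMap, error) {
	var m ShardMap
	raw, err := kv.Get(ctx, key)
	if err != nil {
		return m, err
	}
	err = json.Unmarshal([]byte(raw), &m)
	return m, err
}
```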

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

| Dimension | Range Sharding | Hash Sharding | Consistent Hashing |
| --- | --- | --- | --- |
| Key mapping | Key falls within a shard's range (e.g., A-M, N-Z) | hash(key) % n determines the shard | Key maps to a position on a ring; nearest clockwise shard owns it |
| Range query support | Excellent — contiguous keys are co-located | None — related keys scatter across shards | None by default |
| Rebalancing on add/remove | Split or merge affected ranges only | Remaps ~all keys (catastrophic) | Remaps only ~1/n of keys |
| Hotspot risk | High — sequential writes concentrate on one shard | Low — hash distributes uniformly | Low with virtual nodes |
| Metadata overhead | Shard map needed (small) | Just the formula | Ring with virtual nodes (larger) |
| Used in | HBase, Bigtable, Spanner | Naive setups, memcached (old) | DynamoDB, Cassandra, Riak |

Verdict: Use range sharding when range queries are critical. Consistent hashing for key-value stores that need smooth scaling. Avoid plain modulo hashing in production.

Concepts Covered

sharding, configuration, coordination, consistent hashing, key distribution, virtual nodes, atomic transition, migration, data transfer, consistency, sharded storage, routing, end-to-end, hash ring, key ownership, clockwise lookup, minimal disruption, vnodes, even distribution, load balancing, hash collision, node addition, key migration, predecessor takeover, node removal, graceful shutdown, crash recovery, key takeover, successor promotion, rendezvous hashing, highest random weight, HRW, consistent hashing alternative, weighted nodes, scatter-gather, query coordinator, partial results, timeout handling, fault tolerance, distributed aggregations, partial aggregates, COUNT, SUM, AVG, merge functions, algebraic properties, distributed joins, hash partitioning, co-located joins, shuffle joins, join reordering, network overhead, secondary indexes, global indexes, local indexes, index sharding, write amplification, distributed sorting, top-N query, merge sort, tie handling, pagination, consistent ordering

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.