Subtracks & Tasks
Why Unique IDs Are Hard
Generate Unique IDs Using Node ID and Timestamp
Distributed systems need unique identifiers for entities, events, and messages. Without a central authority, each node must generate IDs independently...
Add Random Salt to Prevent Collisions
Your basic ID generator might produce duplicates if called multiple times within the same millisecond. Add a random component or sequence counter to e...
Implement ID Generation During Network Partition
Distributed systems must handle **network partitions** - times when nodes cannot communicate with each other. Your ID generator must continue working ...
Validate Uniqueness Across Distributed Nodes
Run your ID generator through Maelstrom verification to prove uniqueness. Maelstrom collects all generated IDs across all nodes and checks for duplica...
Optimize for High-Throughput ID Generation
Optimize your ID generator for maximum throughput. Real systems like Twitter generate millions of IDs per second. Your implementation should handle hi...
Snowflake IDs (Twitter's Approach)
Implement Snowflake ID Bit Layout
Twitter's Snowflake generates unique, roughly-sorted 64-bit IDs without coordination. The layout is: ``` | 1 bit unused | 41 bits timestamp | 10 bits...
Implement Timestamp Component with Custom Epoch
The timestamp component of a Snowflake ID uses 41 bits to represent milliseconds since a **custom epoch**. Using Unix epoch (1970-01-01) wastes over 5...
Implement Sequence Counter with Overflow Handling
Within a single millisecond, each Snowflake node can generate up to 4096 unique IDs (12-bit sequence counter). When traffic bursts exceed this limit, ...
Multi-Node Snowflake ID Verification
Snowflake IDs derive their uniqueness from the machine_id component. With 10 bits, you can have 1024 unique machines generating IDs without any coordi...
Logical Clocks as IDs
Implement a Lamport Clock
Leslie Lamport showed that in a distributed system, you don't need physical clocks to order events. A **Lamport clock** is a simple counter that provi...
Demonstrate Lamport Clock Causality Limitation
Lamport clocks guarantee: if A happens-before B, then L(A) < L(B). But the **converse is not true**: L(A) < L(B) does NOT mean A happened before B. Tw...
Implement Vector Clocks
Vector clocks solve the limitation of Lamport clocks: they can detect **concurrent events**. Each node maintains a vector of N counters (one per node)...
Implement Happened-Before Detector with Vector Clocks
With vector clocks, you can determine the **exact causal relationship** between any two events: - **A -> B** (A happened before B): A[i] <= B[i] for ...
Vector Clock Conflict Detection in Key-Value Store
In databases like Riak, vector clocks detect **write conflicts**. When two clients write to the same key concurrently (neither saw the other's write),...
Hybrid Logical Clocks (HLC)
Understand and Implement HLC Format
Hybrid Logical Clocks (HLC), used in CockroachDB and Spanner, combine physical time with a logical counter. The format is `(physical_ms, logical_count...
Implement HLC Receive Rule
The HLC receive rule is more complex than Lamport's. On receiving a message with HLC `(msg.pt, msg.lc)`: 1. `new_pt = max(local.pt, msg.pt, now_ms())...
HLC Handles Backward Clock Gracefully
NTP can adjust the system clock backward. Physical timestamps break. Lamport clocks keep working but lose time correlation. HLC handles it gracefully ...
Compare HLC, UUID v4, and Snowflake IDs
Different ID schemes have different tradeoffs. Your task is to implement all three and return a comparison table. Implement a `compare_ids` handler t...
HLC-Based Unique ID Generation for Maelstrom
Integrate HLC-based IDs into the Maelstrom `generate` workload. Each generated ID must be globally unique across all nodes and should preserve causal ...
Interview Prep
Common interview questions for Backend / Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.
Model Answer
Snowflake-style: 41-bit millisecond timestamp + 10-bit machine ID (datacenter + worker) + 12-bit sequence. Gives 4096 IDs/ms/machine, IDs sort by time, no coordination. Alternatives: ULIDs (128-bit, base32 encoded, monotonic within millisecond), UUID v7 (timestamp-first UUID). Discuss clock drift and the monotonicity problem when NTP steps the clock backward.
Model Answer
Auto-increment requires a central sequence generator, which is a single point of failure and a scaling bottleneck. Solutions: distributed ID generators (Snowflake), UUIDs (random, no coordination but poor index locality), shard-aware IDs that encode shard ID. Instagram used Postgres sequences per shard with a large step size to avoid coordination.
Model Answer
If time moves backward, you might generate a duplicate ID (same timestamp + same sequence). Snowflake's solution: detect backward clock movement (currentTime < lastTimestamp), then wait until the clock catches up. Some implementations add a machine-restart counter to the ID. Hybrid Logical Clocks (HLCs) use max(wallClock, lastSeenHLC) to stay monotonic even across NTP adjustments.
Model Answer
The birthday problem: with n IDs drawn from a space of size N = 2^122, the collision probability is approximately 1 - e^(-n^2 / 2N). With n = 10^9, this is about 10^18 / (2 * 2^122) ≈ 10^-19. Effectively zero. In practice, UUID collisions are so rare that the RNG quality matters more than the math.
Model Answer
B-tree indexes store keys in sorted order. Random UUIDs insert into random positions, causing frequent page splits and fragmentation. Sequential IDs always append to the right of the B-tree, maximizing page fill and minimizing splits. Impact: 3-5x write throughput difference on high-insert workloads. Solution: use time-sortable IDs (Snowflake, ULID, UUID v7) which have the uniqueness of UUIDs with the sequential insert properties of auto-increment.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.
Why it happens
Clocks on different nodes are not perfectly synchronised. Even on the same node, multiple requests in the same millisecond share the same timestamp.
The fix
Combine timestamp with a node-unique component (node ID, MAC address, or a monotonic sequence counter). This is the Snowflake approach: `timestamp | node_id | sequence`.
Why it happens
UUID v4 is purely random — there is no ordering signal. Sorting UUIDs does not give insertion order, which hurts database index locality.
The fix
Use UUID v7 or ULIDs when sort order matters. They embed a millisecond timestamp prefix so lexicographic order approximates creation time.
Why it happens
A shared global sequence counter would require coordination. If each node starts its sequence from 0 and you omit the node ID component, IDs collide.
The fix
Reserve a fixed bit range for the node ID (e.g., 10 bits for up to 1024 nodes). The sequence counter only needs to be unique within a single node within a single millisecond.
Why it happens
NTP can step the clock backward. A naive implementation uses `time.now()` directly, so a backward step produces a smaller timestamp than the previous ID.
The fix
Track the last-issued timestamp. If the current time is less than or equal to the last timestamp, either wait (spin) until the clock advances or increment only the sequence. Snowflake-style implementations detect this explicitly.
Why it happens
Concatenating variable-length strings without a separator or fixed-width encoding creates ambiguous encodings.
The fix
Use fixed-width fields (zero-padded integers, bit packing) or a delimited format with a character that cannot appear in any component.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.
| Dimension | UUID v4 | Snowflake | ULID |
|---|---|---|---|
| Globally unique | Yes (probabilistic) | Yes (deterministic) | Yes (probabilistic) |
| Lexicographically sortable | No | Yes (time-prefix) | Yes (ms-precision prefix) |
| Coordination required | None | Node ID assignment | None |
| Size | 128-bit / 36-char string | 64-bit integer | 128-bit / 26-char string |
| Clock dependency | None | Yes — clock skew causes duplicates | Yes — monotone increment handles skew |
| DB index performance | Poor (random inserts fragment B-tree) | Excellent (sequential) | Excellent (sequential) |
| Standard | RFC 4122 | Twitter/internal | Community spec |
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.