Tracks/The Identifier

The Identifier

Beginner

Foundations|19 tasks

Solve the fundamental problem of generating globally unique identifiers in a distributed system. You will implement various ID generation strategies, handle network partitions, and ensure uniqueness across nodes without centralized coordination.

Subtracks & Tasks

Why Unique IDs Are Hard

0/5

WH-1

beginner

Generate Unique IDs Using Node ID and Timestamp

Distributed systems need unique identifiers for entities, events, and messages. Without a central authority, each node must generate IDs independently...

unique IDstimestampsnode identity

WH-2

beginner

Add Random Salt to Prevent Collisions

Your basic ID generator might produce duplicates if called multiple times within the same millisecond. Add a random component or sequence counter to e...

randomnesscollision preventionUUID

WH-3

intermediate

Implement ID Generation During Network Partition

Distributed systems must handle **network partitions** - times when nodes cannot communicate with each other. Your ID generator must continue working ...

network partitionsavailabilityCAP theorem

WH-4

intermediate

Validate Uniqueness Across Distributed Nodes

Run your ID generator through Maelstrom verification to prove uniqueness. Maelstrom collects all generated IDs across all nodes and checks for duplica...

testingverificationglobal uniqueness

WH-5

intermediate

Optimize for High-Throughput ID Generation

Optimize your ID generator for maximum throughput. Real systems like Twitter generate millions of IDs per second. Your implementation should handle hi...

performancethroughputoptimization

Snowflake IDs (Twitter's Approach)

0/4

SN-1

intermediate

Implement Snowflake ID Bit Layout

Twitter's Snowflake generates unique, roughly-sorted 64-bit IDs without coordination. The layout is: ``` | 1 bit unused | 41 bits timestamp | 10 bits...

bit manipulationSnowflake IDbit layout+1 more

SN-2

intermediate

Implement Timestamp Component with Custom Epoch

The timestamp component of a Snowflake ID uses 41 bits to represent milliseconds since a **custom epoch**. Using Unix epoch (1970-01-01) wastes over 5...

timestampepochtime representation+1 more

SN-3

intermediate

Implement Sequence Counter with Overflow Handling

Within a single millisecond, each Snowflake node can generate up to 4096 unique IDs (12-bit sequence counter). When traffic bursts exceed this limit, ...

sequence numberoverflow handlingspin wait+1 more

SN-4

advanced

Multi-Node Snowflake ID Verification

Snowflake IDs derive their uniqueness from the machine_id component. With 10 bits, you can have 1024 unique machines generating IDs without any coordi...

multi-node coordinationuniqueness verificationmonotonicity+1 more

Logical Clocks as IDs

0/5

LO-1

intermediate

Implement a Lamport Clock

Leslie Lamport showed that in a distributed system, you don't need physical clocks to order events. A **Lamport clock** is a simple counter that provi...

Lamport clocklogical timepartial order+1 more

LO-2

intermediate

Demonstrate Lamport Clock Causality Limitation

Lamport clocks guarantee: if A happens-before B, then L(A) < L(B). But the **converse is not true**: L(A) < L(B) does NOT mean A happened before B. Tw...

causalityconcurrent eventspartial order+1 more

LO-3

advanced

Implement Vector Clocks

Vector clocks solve the limitation of Lamport clocks: they can detect **concurrent events**. Each node maintains a vector of N counters (one per node)...

vector clockcausality trackingelement-wise max+1 more

LO-4

advanced

Implement Happened-Before Detector with Vector Clocks

With vector clocks, you can determine the **exact causal relationship** between any two events: - **A -> B** (A happened before B): A[i] <= B[i] for ...

happened-beforeconcurrent eventsvector comparison+1 more

LO-5

advanced

Vector Clock Conflict Detection in Key-Value Store

In databases like Riak, vector clocks detect **write conflicts**. When two clients write to the same key concurrently (neither saw the other's write),...

conflict detectionkey-value storesibling values+1 more

Hybrid Logical Clocks (HLC)

0/5

HY-1

advanced

Understand and Implement HLC Format

Hybrid Logical Clocks (HLC), used in CockroachDB and Spanner, combine physical time with a logical counter. The format is `(physical_ms, logical_count...

HLChybrid clockphysical time+1 more

HY-2

advanced

Implement HLC Receive Rule

The HLC receive rule is more complex than Lamport's. On receiving a message with HLC `(msg.pt, msg.lc)`: 1. `new_pt = max(local.pt, msg.pt, now_ms())...

HLC mergereceive ruleclock synchronization+1 more

HY-3

advanced

HLC Handles Backward Clock Gracefully

NTP can adjust the system clock backward. Physical timestamps break. Lamport clocks keep working but lose time correlation. HLC handles it gracefully ...

NTP adjustmentclock backwardmonotonic guarantee+1 more

HY-4

intermediate

Compare HLC, UUID v4, and Snowflake IDs

Different ID schemes have different tradeoffs. Your task is to implement all three and return a comparison table. Implement a `compare_ids` handler t...

UUIDSnowflakeHLC+2 more

HY-5

advanced

HLC-Based Unique ID Generation for Maelstrom

Integrate HLC-based IDs into the Maelstrom `generate` workload. Each generated ID must be globally unique across all nodes and should preserve causal ...

unique ID generationMaelstrom workloadlinearizability+1 more

Interview Prep

Common interview questions for Backend / Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.

Model Answer

Snowflake-style: 41-bit millisecond timestamp + 10-bit machine ID (datacenter + worker) + 12-bit sequence. Gives 4096 IDs/ms/machine, IDs sort by time, no coordination. Alternatives: ULIDs (128-bit, base32 encoded, monotonic within millisecond), UUID v7 (timestamp-first UUID). Discuss clock drift and the monotonicity problem when NTP steps the clock backward.

Model Answer

Auto-increment requires a central sequence generator, which is a single point of failure and a scaling bottleneck. Solutions: distributed ID generators (Snowflake), UUIDs (random, no coordination but poor index locality), shard-aware IDs that encode shard ID. Instagram used Postgres sequences per shard with a large step size to avoid coordination.

Model Answer

If time moves backward, you might generate a duplicate ID (same timestamp + same sequence). Snowflake's solution: detect backward clock movement (currentTime < lastTimestamp), then wait until the clock catches up. Some implementations add a machine-restart counter to the ID. Hybrid Logical Clocks (HLCs) use max(wallClock, lastSeenHLC) to stay monotonic even across NTP adjustments.

Model Answer

The birthday problem: with n IDs drawn from a space of size N = 2^122, the collision probability is approximately 1 - e^(-n^2 / 2N). With n = 10^9, this is about 10^18 / (2 * 2^122) ≈ 10^-19. Effectively zero. In practice, UUID collisions are so rare that the RNG quality matters more than the math.

Model Answer

B-tree indexes store keys in sorted order. Random UUIDs insert into random positions, causing frequent page splits and fragmentation. Sequential IDs always append to the right of the B-tree, maximizing page fill and minimizing splits. Impact: 3-5x write throughput difference on high-insert workloads. Solution: use time-sortable IDs (Snowflake, ULID, UUID v7) which have the uniqueness of UUIDs with the sequential insert properties of auto-increment.

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.

Why it happens

Clocks on different nodes are not perfectly synchronised. Even on the same node, multiple requests in the same millisecond share the same timestamp.

The fix

Combine timestamp with a node-unique component (node ID, MAC address, or a monotonic sequence counter). This is the Snowflake approach: `timestamp | node_id | sequence`.

Why it happens

UUID v4 is purely random — there is no ordering signal. Sorting UUIDs does not give insertion order, which hurts database index locality.

The fix

Use UUID v7 or ULIDs when sort order matters. They embed a millisecond timestamp prefix so lexicographic order approximates creation time.

Why it happens

A shared global sequence counter would require coordination. If each node starts its sequence from 0 and you omit the node ID component, IDs collide.

The fix

Reserve a fixed bit range for the node ID (e.g., 10 bits for up to 1024 nodes). The sequence counter only needs to be unique within a single node within a single millisecond.

Why it happens

NTP can step the clock backward. A naive implementation uses `time.now()` directly, so a backward step produces a smaller timestamp than the previous ID.

The fix

Track the last-issued timestamp. If the current time is less than or equal to the last timestamp, either wait (spin) until the clock advances or increment only the sequence. Snowflake-style implementations detect this explicitly.

Why it happens

Concatenating variable-length strings without a separator or fixed-width encoding creates ambiguous encodings.

The fix

Use fixed-width fields (zero-padded integers, bit packing) or a delimited format with a character that cannot appear in any component.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.

Dimension	UUID v4	Snowflake	ULID
Globally unique	Yes (probabilistic)	Yes (deterministic)	Yes (probabilistic)
Lexicographically sortable	No	Yes (time-prefix)	Yes (ms-precision prefix)
Coordination required	None	Node ID assignment	None
Size	128-bit / 36-char string	64-bit integer	128-bit / 26-char string
Clock dependency	None	Yes — clock skew causes duplicates	Yes — monotone increment handles skew
DB index performance	Poor (random inserts fragment B-tree)	Excellent (sequential)	Excellent (sequential)
Standard	RFC 4122	Twitter/internal	Community spec

Verdict:UUID v4 for simplicity. Snowflake or ULID when you need time-ordered IDs with good database index locality.

Concepts Covered

unique IDstimestampsnode identityrandomnesscollision preventionUUIDnetwork partitionsavailabilityCAP theoremtestingverificationglobal uniquenessperformancethroughputoptimizationbit manipulationSnowflake IDbit layoutscalabilitytimestampepochtime representationoverflow planningsequence numberoverflow handlingspin waitthroughput limitsmulti-node coordinationuniqueness verificationmonotonicityID distributionLamport clocklogical timepartial orderhappens-beforecausalityconcurrent eventshappens-before relationvector clockcausality trackingelement-wise maxconcurrent detectionhappened-beforevector comparisonconflict detectionkey-value storesibling valueslast-writer-winsHLChybrid clockphysical timelogical counterHLC mergereceive ruleclock synchronizationcausal consistencyNTP adjustmentclock backwardmonotonic guaranteeHLC resilienceSnowflakeID tradeoffsbenchmarkingunique ID generationMaelstrom workloadlinearizabilityHLC integration

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.