ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/identifier
Tracks/The Identifier
02

The Identifier

Beginner
Foundations|19 tasks

Solve the fundamental problem of generating globally unique identifiers in a distributed system. You will implement various ID generation strategies, handle network partitions, and ensure uniqueness across nodes without centralized coordination.

Subtracks & Tasks

Interview Prep

Common interview questions for Backend / Distributed Systems Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.

Model Answer

Snowflake-style: 41-bit millisecond timestamp + 10-bit machine ID (datacenter + worker) + 12-bit sequence. Gives 4096 IDs/ms/machine, IDs sort by time, no coordination. Alternatives: ULIDs (128-bit, base32 encoded, monotonic within millisecond), UUID v7 (timestamp-first UUID). Discuss clock drift and the monotonicity problem when NTP steps the clock backward.

Model Answer

Auto-increment requires a central sequence generator, which is a single point of failure and a scaling bottleneck. Solutions: distributed ID generators (Snowflake), UUIDs (random, no coordination but poor index locality), shard-aware IDs that encode shard ID. Instagram used Postgres sequences per shard with a large step size to avoid coordination.

Model Answer

If time moves backward, you might generate a duplicate ID (same timestamp + same sequence). Snowflake's solution: detect backward clock movement (currentTime < lastTimestamp), then wait until the clock catches up. Some implementations add a machine-restart counter to the ID. Hybrid Logical Clocks (HLCs) use max(wallClock, lastSeenHLC) to stay monotonic even across NTP adjustments.

Model Answer

The birthday problem: with n IDs drawn from a space of size N = 2^122, the collision probability is approximately 1 - e^(-n^2 / 2N). With n = 10^9, this is about 10^18 / (2 * 2^122) ≈ 10^-19. Effectively zero. In practice, UUID collisions are so rare that the RNG quality matters more than the math.

Model Answer

B-tree indexes store keys in sorted order. Random UUIDs insert into random positions, causing frequent page splits and fragmentation. Sequential IDs always append to the right of the B-tree, maximizing page fill and minimizing splits. Impact: 3-5x write throughput difference on high-insert workloads. Solution: use time-sortable IDs (Snowflake, ULID, UUID v7) which have the uniqueness of UUIDs with the sequential insert properties of auto-increment.

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.

Why it happens

Clocks on different nodes are not perfectly synchronised. Even on the same node, multiple requests in the same millisecond share the same timestamp.

The fix

Combine timestamp with a node-unique component (node ID, MAC address, or a monotonic sequence counter). This is the Snowflake approach: `timestamp | node_id | sequence`.

Why it happens

UUID v4 is purely random — there is no ordering signal. Sorting UUIDs does not give insertion order, which hurts database index locality.

The fix

Use UUID v7 or ULIDs when sort order matters. They embed a millisecond timestamp prefix so lexicographic order approximates creation time.

Why it happens

A shared global sequence counter would require coordination. If each node starts its sequence from 0 and you omit the node ID component, IDs collide.

The fix

Reserve a fixed bit range for the node ID (e.g., 10 bits for up to 1024 nodes). The sequence counter only needs to be unique within a single node within a single millisecond.

Why it happens

NTP can step the clock backward. A naive implementation uses `time.now()` directly, so a backward step produces a smaller timestamp than the previous ID.

The fix

Track the last-issued timestamp. If the current time is less than or equal to the last timestamp, either wait (spin) until the clock advances or increment only the sequence. Snowflake-style implementations detect this explicitly.

Why it happens

Concatenating variable-length strings without a separator or fixed-width encoding creates ambiguous encodings.

The fix

Use fixed-width fields (zero-padded integers, bit packing) or a delimited format with a character that cannot appear in any component.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.

DimensionUUID v4SnowflakeULID
Globally uniqueYes (probabilistic)Yes (deterministic)Yes (probabilistic)
Lexicographically sortableNoYes (time-prefix)Yes (ms-precision prefix)
Coordination requiredNoneNode ID assignmentNone
Size128-bit / 36-char string64-bit integer128-bit / 26-char string
Clock dependencyNoneYes — clock skew causes duplicatesYes — monotone increment handles skew
DB index performancePoor (random inserts fragment B-tree)Excellent (sequential)Excellent (sequential)
StandardRFC 4122Twitter/internalCommunity spec
Verdict:UUID v4 for simplicity. Snowflake or ULID when you need time-ordered IDs with good database index locality.

Concepts Covered

unique IDstimestampsnode identityrandomnesscollision preventionUUIDnetwork partitionsavailabilityCAP theoremtestingverificationglobal uniquenessperformancethroughputoptimizationbit manipulationSnowflake IDbit layoutscalabilitytimestampepochtime representationoverflow planningsequence numberoverflow handlingspin waitthroughput limitsmulti-node coordinationuniqueness verificationmonotonicityID distributionLamport clocklogical timepartial orderhappens-beforecausalityconcurrent eventshappens-before relationvector clockcausality trackingelement-wise maxconcurrent detectionhappened-beforevector comparisonconflict detectionkey-value storesibling valueslast-writer-winsHLChybrid clockphysical timelogical counterHLC mergereceive ruleclock synchronizationcausal consistencyNTP adjustmentclock backwardmonotonic guaranteeHLC resilienceSnowflakeID tradeoffsbenchmarkingunique ID generationMaelstrom workloadlinearizabilityHLC integration

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.