Idempotent Producer with Sequence Number Deduplication
Mini-Kafka / Producers and Consumer Groups
Exactly-Once Semantics with Idempotent Producers
In a distributed system, a producer retrying a failed write faces a fundamental dilemma: it cannot distinguish between "the broker crashed before writing the message" and "the broker wrote the message but the ACK was lost in transit." In both cases, the producer sees a timeout and retries. Without special handling, this retry delivers the message a second time — creating a duplicate record that will confuse every downstream consumer.
The Root Cause: At-Least-Once Is the Default
Kafka's default delivery guarantee is at-least-once: messages are never lost, but may be duplicated on retry. For many workloads — logging, analytics, metrics — occasional duplicates are acceptable. But for financial transactions, inventory updates, or any stateful computation, duplicates are a correctness bug that can compound across the pipeline.
The Idempotency Solution: Producer ID + Sequence Numbers
Kafka's idempotent producer turns at-least-once into exactly-once for a single partition, without requiring any distributed coordination. The mechanism is simple:
- When a producer initialises with `enable.idempotence=true`, the broker assigns it a Producer ID (PID): a cluster-wide unique identifier for this producer session.
- Every message carries both the PID and a sequence number, which starts at 0 and increments by 1 per successfully acknowledged message, per partition.
- The broker stores the last accepted sequence number per (PID, partition) pair. For each incoming message, it applies exactly one of three outcomes:
```python
expected = last_seq[(msg.pid, msg.partition)] + 1
if msg.seq == expected:
    # Accept: this is the next message in the sequence
    offset = write_to_log(msg)
    last_seq[(msg.pid, msg.partition)] = msg.seq
    return f"OK offset:{offset}"
elif msg.seq == last_seq[(msg.pid, msg.partition)]:
    # Duplicate retry: this sequence number was already accepted.
    # Do NOT write again; return the same result as before.
    return "DUPLICATE"
else:
    # Sequence gap: something went wrong (message lost, out-of-order delivery).
    # This is a hard error: the producer must restart its session.
    return "ERROR: sequence gap"
```
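The three-way check can be exercised end to end with a small in-memory broker. This is a minimal sketch, not Kafka's actual implementation; the `Broker` class and its `produce` signature are illustrative names:

```python
class Broker:
    """Toy broker that deduplicates by (PID, partition) sequence numbers."""

    def __init__(self):
        self.logs = {}      # partition -> list of accepted values
        self.last_seq = {}  # (pid, partition) -> last accepted sequence number

    def produce(self, pid, partition, seq, value):
        key = (pid, partition)
        expected = self.last_seq.get(key, -1) + 1  # first message must carry seq=0
        log = self.logs.setdefault(partition, [])
        if seq == expected:
            log.append(value)                       # accept: next in sequence
            self.last_seq[key] = seq
            return f"OK offset:{len(log) - 1}"
        elif seq == self.last_seq.get(key, -1):
            return "DUPLICATE"                      # retry of an already-accepted write
        else:
            return "ERROR: sequence gap"            # hard error: producer must restart


broker = Broker()
print(broker.produce(pid=7, partition=0, seq=0, value="a"))  # OK offset:0
print(broker.produce(pid=7, partition=0, seq=1, value="b"))  # OK offset:1
# The ACK for seq=1 was lost in transit, so the producer retries the same message:
print(broker.produce(pid=7, partition=0, seq=1, value="b"))  # DUPLICATE
print(broker.produce(pid=7, partition=0, seq=3, value="d"))  # ERROR: sequence gap
print(broker.logs[0])  # ['a', 'b'] -- the retry wrote nothing
```

Note that the retry is absorbed without touching the log: the broker answers it from two integer comparisons alone.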
Delivery Semantic Comparison
| Semantic | What it guarantees | How to achieve in Kafka |
|---|---|---|
| At-most-once | No duplicates; messages may be lost | acks=0, no retries |
| At-least-once | No loss; duplicates possible on retry | acks=all, retries enabled |
| Exactly-once | No loss, no duplicates | Idempotent producer + transactions |
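In producer-configuration terms, the three rows correspond roughly to the settings below. The keys (`acks`, `retries`, `enable.idempotence`, `transactional.id`) are standard Kafka producer configs, but the exact spelling and value types vary by client library, and the `transactional.id` value shown is an arbitrary example:

```python
# At-most-once: fire and forget -- no ACK wait, no retries.
at_most_once = {"acks": "0", "retries": 0}

# At-least-once: wait for full replication and retry on failure.
at_least_once = {"acks": "all", "retries": 2147483647}

# Exactly-once: idempotent producer, plus a transactional.id
# when cross-partition atomicity is needed.
exactly_once = {
    "acks": "all",
    "enable.idempotence": True,
    "transactional.id": "my-app-tx",  # illustrative value
}
```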
Why Sequence Numbers Work
The deduplication check is O(1) per message — the broker just compares two integers. There is no distributed lock, no log scan, no expensive coordination. The PID scopes the sequence counter to a single producer session, so multiple independent producers never interfere with each other's deduplication state.
Crucially, the sequence number does not need to be globally unique across all producers or all topics — it only needs to be unique per (PID, partition) pair. The broker maintains a small hash table mapping each (PID, partition) to its last accepted sequence number.
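The per-(PID, partition) scoping means two producers can legitimately use the same sequence numbers without colliding. A few lines with a plain dict (an illustrative sketch, not broker code) make the point:

```python
last_seq = {}  # (pid, partition) -> last accepted sequence number

def accept(pid, partition, seq):
    """O(1) dedup check: accept only the next expected sequence number."""
    key = (pid, partition)
    if seq != last_seq.get(key, -1) + 1:
        return False
    last_seq[key] = seq
    return True

# Two producers both send seq=0 to partition 0 -- both are accepted,
# because each PID has its own counter.
assert accept(pid=1, partition=0, seq=0)
assert accept(pid=2, partition=0, seq=0)
# A retry from PID 1 is rejected without disturbing PID 2's state.
assert not accept(pid=1, partition=0, seq=0)
assert accept(pid=2, partition=0, seq=1)
```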
Extending to Transactions
Idempotency per partition eliminates single-partition duplicates. But many real workloads require atomically writing to multiple partitions — for example, debiting one account and crediting another, or producing to an output topic and committing a consumer offset simultaneously. Kafka's transactions extend idempotency to cross-partition atomic writes using a two-phase commit protocol coordinated by a transaction coordinator broker. A transactional write is either fully visible to all consumers reading in read_committed isolation, or entirely invisible — there is no partial state.
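The all-or-nothing visibility rule can be illustrated with a toy model in which each record is tagged with its transaction ID and a `read_committed` consumer filters on the coordinator's transaction state. This is a sketch of the idea only; real Kafka uses commit markers written into the log, not a lookup table:

```python
class TxBroker:
    """Toy model of transactional visibility (not Kafka's marker protocol)."""

    def __init__(self):
        self.logs = {}       # partition -> list of (txn_id, value)
        self.txn_state = {}  # txn_id -> "ongoing" | "committed" | "aborted"

    def begin(self, txn_id):
        self.txn_state[txn_id] = "ongoing"

    def append(self, txn_id, partition, value):
        self.logs.setdefault(partition, []).append((txn_id, value))

    def commit(self, txn_id):
        # One atomic state flip makes the writes on EVERY partition visible.
        self.txn_state[txn_id] = "committed"

    def abort(self, txn_id):
        self.txn_state[txn_id] = "aborted"

    def read(self, partition, isolation="read_committed"):
        records = self.logs.get(partition, [])
        if isolation == "read_uncommitted":
            return [v for _, v in records]
        return [v for t, v in records if self.txn_state.get(t) == "committed"]


broker = TxBroker()
broker.begin("tx1")
broker.append("tx1", "accounts-0", "debit alice 10")
broker.append("tx1", "accounts-1", "credit bob 10")
# Mid-transaction: read_committed consumers see nothing on either partition.
assert broker.read("accounts-0") == [] and broker.read("accounts-1") == []
broker.commit("tx1")
# After commit: both writes become visible together -- no partial state.
assert broker.read("accounts-0") == ["debit alice 10"]
assert broker.read("accounts-1") == ["credit bob 10"]
```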
Edge Cases to Handle
- Sequence starting at 0: The first message from a newly initialised producer must have `seq=0`. A sequence that starts at 1 is immediately a gap error.
- Multiple producers, independent tracking: Each PID has its own `last_seq` entry. A duplicate from producer p1 does not affect producer p2's sequence state.
- Reinitialising after a gap error: If the producer receives a sequence gap error, it must reinitialise (get a new PID) before producing again. Continuing with the old PID will always produce gap errors for any sequence number other than the last accepted one.
- LOG_SIZE counts only accepted writes: Duplicates and rejected messages must not increment the log size or the offset counter.