Subtracks & Tasks
At-Most-Once and At-Least-Once Delivery
Implement Basic Message Queue
Build a basic in-memory message queue:
1. Producers enqueue messages
2. Consumers dequeue messages
3. Messages delivered in FIFO order
4. Thread-safe...
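The requirements above can be sketched in a few lines of Python. This is a minimal illustration, not the track's reference solution; the class and method names are chosen for the example. A lock plus a condition variable gives thread safety, and a `deque` gives FIFO order:

```python
import threading
from collections import deque

class MessageQueue:
    """Minimal thread-safe in-memory FIFO queue (illustrative sketch)."""

    def __init__(self):
        self._messages = deque()
        self._lock = threading.Lock()
        # Condition lets consumers block until a message arrives.
        self._not_empty = threading.Condition(self._lock)

    def enqueue(self, message):
        with self._not_empty:
            self._messages.append(message)
            self._not_empty.notify()          # wake one waiting consumer

    def dequeue(self, timeout=None):
        with self._not_empty:
            # Wait until the queue is non-empty (or the timeout expires).
            if not self._not_empty.wait_for(lambda: self._messages, timeout=timeout):
                return None
            return self._messages.popleft()   # FIFO: oldest message first

q = MessageQueue()
q.enqueue("a")
q.enqueue("b")
assert q.dequeue() == "a"   # delivered in FIFO order
assert q.dequeue() == "b"
```

Real brokers add persistence, acknowledgements, and bounded capacity on top of this core.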
Add Consumer Groups with Partitions
Implement Kafka-style consumer groups:
1. Topic has multiple partitions
2. Messages with same key go to same partition
3. Consumer group: each partit...
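The two core mechanisms above — stable key-to-partition routing and assigning each partition to exactly one consumer in the group — can be sketched as follows. The hash function and round-robin assignment strategy are illustrative choices, not what Kafka uses internally (Kafka's default partitioner uses murmur2):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash: the same key always routes to the same partition."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def assign_partitions(partitions, consumers):
    """Round-robin: each partition is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Same key -> same partition, so per-key ordering is preserved.
assert partition_for("device-42", 4) == partition_for("device-42", 4)

# 4 partitions, 3 consumers: one consumer owns two partitions.
assign = assign_partitions(list(range(4)), ["c1", "c2", "c3"])
assert assign == {"c1": [0, 3], "c2": [1], "c3": [2]}
```

Note that a fourth consumer added to this group would receive no partitions, which is why consumer count should not exceed partition count.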
Implement At-Least-Once Delivery
Guarantee at-least-once delivery:
1. Consumer receives message (not removed from queue)
2. Message marked as "in-flight" with timestamp
3. Consumer p...
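A minimal sketch of the in-flight mechanism described above, in the style of SQS visibility timeouts (names and the `now` parameter for deterministic testing are illustrative). A received message stays in an in-flight map; if it is not acknowledged before its timeout, it becomes deliverable again:

```python
import time
from collections import deque

class AtLeastOnceQueue:
    """Sketch of at-least-once delivery via in-flight tracking."""

    def __init__(self, visibility_timeout=30.0):
        self._ready = deque()
        self._in_flight = {}   # msg_id -> (message, redelivery deadline)
        self._timeout = visibility_timeout
        self._next_id = 0

    def enqueue(self, message):
        self._ready.append((self._next_id, message))
        self._next_id += 1

    def receive(self, now=None):
        now = time.monotonic() if now is None else now
        # Requeue any in-flight message whose visibility timeout expired.
        for msg_id, (msg, deadline) in list(self._in_flight.items()):
            if now >= deadline:
                del self._in_flight[msg_id]
                self._ready.appendleft((msg_id, msg))
        if not self._ready:
            return None
        msg_id, msg = self._ready.popleft()
        self._in_flight[msg_id] = (msg, now + self._timeout)
        return msg_id, msg

    def ack(self, msg_id):
        # Only an explicit ACK removes the message permanently.
        self._in_flight.pop(msg_id, None)

q = AtLeastOnceQueue(visibility_timeout=10.0)
q.enqueue("reading")
msg_id, _ = q.receive(now=0.0)                     # delivered, not yet acked
assert q.receive(now=10.0) == (msg_id, "reading")  # timeout -> redelivered
```

The redelivery path is exactly where duplicates come from, which is why at-least-once systems pair this with idempotent consumers.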
Implement Exactly-Once Semantics
Achieve exactly-once processing semantics:

Producer side:
1. Assign unique ID to each message
2. Queue deduplicates by ID

Consumer side:
1. Track pr...
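Both halves of the scheme above can be sketched with a seen-ID set on each side. This is a toy illustration (class names are invented for the example); a production system would persist the consumer's processed-ID set atomically with its side effects:

```python
class DedupQueue:
    """Producer side: broker deduplicates by producer-assigned message ID."""

    def __init__(self):
        self._seen_ids = set()
        self.messages = []

    def publish(self, msg_id, payload):
        if msg_id in self._seen_ids:
            return False               # duplicate from a producer retry: dropped
        self._seen_ids.add(msg_id)
        self.messages.append((msg_id, payload))
        return True

class ExactlyOnceConsumer:
    """Consumer side: skip IDs that were already processed."""

    def __init__(self):
        self._processed = set()
        self.results = []

    def handle(self, msg_id, payload):
        if msg_id in self._processed:
            return                     # redelivery: effect already applied
        self.results.append(payload)   # process...
        self._processed.add(msg_id)    # ...and record, atomically in real systems

broker = DedupQueue()
assert broker.publish(1, "order")
assert not broker.publish(1, "order")  # producer retry deduplicated
```

The comment about atomicity is the crux: if the processed-ID set and the side effect are not committed together, a crash between them reintroduces duplicates or losses.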
Add Dead Letter Queues
Implement dead letter queues for failed messages:
1. Track retry count for each message
2. On processing failure, increment retry count
3. After N fa...
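The retry-count logic above fits in a small loop. A sketch, assuming a `(retry_count, message)` tuple per queue entry and a hypothetical `handler` callback that raises on failure:

```python
from collections import deque

def process_with_dlq(queue, dlq, handler, max_retries=3):
    """Drain the queue; after max_retries failures a message moves to the DLQ."""
    while queue:
        retries, message = queue.popleft()
        try:
            handler(message)
        except Exception:
            retries += 1
            if retries >= max_retries:
                dlq.append(message)               # poison pill: park for inspection
            else:
                queue.append((retries, message))  # transient failure: retry later

queue = deque([(0, "good"), (0, "bad")])
dlq, processed = [], []

def handler(msg):
    if msg == "bad":
        raise ValueError("cannot parse payload")
    processed.append(msg)

process_with_dlq(queue, dlq, handler)
assert processed == ["good"]
assert dlq == ["bad"]   # the poison pill no longer blocks the queue
```

In practice the DLQ is itself a durable queue or topic so operators can inspect and replay its contents.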
Exactly-Once Delivery
Understand Exactly-Once Delivery Challenges
**The exactly-once challenge**:

```
Problem: Distributed systems make guarantees hard

Scenario 1: Producer retry
1. Producer sends message to queu...
```
Implement Idempotent Consumers
**Idempotent consumers**:

```
Problem: Messages may be delivered multiple times

Causes:
1. Consumer crashes before ACK
2. Network failures
3. Qu...
```
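The practical fix for duplicate delivery is to make the handler's effect idempotent. A small sketch contrasting the two (the account/message shapes are invented for the example): applying a raw delta twice double-counts, while recording the message ID alongside the update makes duplicates no-ops:

```python
def apply_delta(account, msg):
    """NOT idempotent: a redelivered message double-counts."""
    account["balance"] += msg["delta"]

def apply_once(account, msg):
    """Idempotent: the message ID is recorded with the update."""
    if msg["id"] in account["applied"]:
        return                          # duplicate delivery: no-op
    account["balance"] += msg["delta"]
    account["applied"].add(msg["id"])   # must persist atomically with the balance

msg = {"id": "m1", "delta": 50}

naive = {"balance": 0}
apply_delta(naive, msg)
apply_delta(naive, msg)        # duplicate -> wrong balance of 100

safe = {"balance": 0, "applied": set()}
apply_once(safe, msg)
apply_once(safe, msg)          # duplicate -> still 50
assert safe["balance"] == 50
```

Another route to idempotency is to make the operation naturally absorbing, e.g. "set status to SHIPPED" instead of "increment shipped count".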
Implement Transactional Message Processing
Transactional message processing ensures atomicity between message consumption and database updates, enabling exactly-once semantics through coordinat...
Implement Outbox Pattern
**The outbox pattern**:

```
Problem: Publishing messages reliably

Scenario:
1. Start database transaction
2. Update business data (create order)...
```
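The key move in the outbox pattern is writing the business row and the outgoing event in the *same* database transaction, then having a separate relay publish pending events. A sketch using SQLite (table and function names are invented for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT, "
           "published INTEGER DEFAULT 0)")

def create_order(item):
    # Business write and outbox write commit in ONE transaction:
    # either both are durable or neither is. No lost events.
    with db:
        cur = db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        db.execute("INSERT INTO outbox (event) VALUES (?)",
                   ("order_created:%d" % cur.lastrowid,))

def relay_outbox(publish):
    # Background relay: publish pending events, then mark them published.
    # A crash between publish and UPDATE re-publishes -> at-least-once,
    # so downstream consumers must still be idempotent.
    rows = db.execute("SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        publish(event)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

events = []
create_order("book")
relay_outbox(events.append)
assert events == ["order_created:1"]
```

In production the relay is a poller or a change-data-capture tool (e.g. Debezium) tailing the outbox table.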
Implement Two-Phase Commit for Queue and Database
Two-phase commit (2PC) coordinates atomic commits across multiple distributed systems, ensuring all participants commit or roll back together. **Probl...
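A toy sketch of the 2PC flow, with the queue and database modeled as participants (the class is invented for the example; real 2PC also needs durable coordinator logs and recovery to handle crashes mid-protocol):

```python
class Participant:
    """Toy 2PC participant (e.g. the queue or the database)."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "init"

    def prepare(self):
        if self.fail_prepare:
            self.state = "aborted"
            return False               # vote "no"
        self.state = "prepared"        # changes durably staged, locks held
        return True                    # vote "yes"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1 (voting): every participant must vote yes.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()                 # Phase 2: commit everywhere
        return True
    for p in participants:
        p.rollback()                   # any "no" vote aborts everywhere
    return False

queue, database = Participant("queue"), Participant("db")
assert two_phase_commit([queue, database])
assert queue.state == database.state == "committed"
```

The cost hinted at here is real: participants hold locks between prepare and commit, and a crashed coordinator can leave them blocked, which is why many systems prefer the outbox pattern over 2PC.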
Interview Prep
Common interview questions for Backend / Data Infrastructure Engineer roles that map directly to what you build in this track. Click any question to reveal the model answer.
Model Answer
At-most-once: messages may be lost, never duplicated. Producer fires and forgets; no retries. At-least-once: messages are never lost but may be duplicated. Producer retries on failure; consumer may process same message twice. Kafka default is at-least-once for producers with acks=all and retries enabled. Exactly-once: each message is processed exactly once. Kafka supports this since 0.11 via idempotent producers (dedup by sequence number) and Kafka Streams transactions. Requires careful configuration and is slower. Most production systems use at-least-once + idempotent consumers rather than exactly-once semantics in the broker.
Model Answer
10M/day is ~116 events/second — modest. Kafka handles millions/second. Architecture: IoT devices -> MQTT broker (lightweight protocol for constrained devices) -> Kafka (durable event log, partitioned by device ID for ordering) -> Stream processor (Flink/Kafka Streams) for real-time aggregations -> Time-series DB (InfluxDB/TimescaleDB) for storage -> dashboard. Key decisions: partition by device ID to ensure ordering per device; use compacted topics for latest-value semantics; set retention based on replay needs. Scale: Kafka clusters can handle this with a single broker; add replicas for durability.
Model Answer
Kafka triggers a consumer group rebalance. The group coordinator detects the consumer death (via missed heartbeats within session.timeout.ms; the default was 10s before Kafka 3.0 and is 45s since). A new partition assignment is computed and the dead consumer's partitions are redistributed among the remaining 2 consumers (e.g. one consumer now owns 2 partitions and the other 1). During the rebalance, consumption pauses. The new assignment starts from the last committed offset of the dead consumer — any messages processed but not committed since the last commit are reprocessed (at-least-once delivery). To minimize rebalance disruption, use incremental cooperative rebalancing (KIP-429), available since Kafka 2.4.
Model Answer
A dead letter queue (DLQ) receives messages that failed processing after a configured number of retries. Scenario: an order-processing service receives a malformed order payload that always raises a JSON parsing exception. Without a DLQ, the message is retried indefinitely, blocking the queue (if FIFO) or consuming retries for other messages. With a DLQ: after 3 failures, the message moves to the DLQ. Normal processing continues. An operator inspects the DLQ, identifies the malformed order, fixes the bug, and replays. DLQs are essential in any at-least-once delivery system for separating transient failures (retry) from poison pill messages (DLQ).
Model Answer
Backpressure signals the producer to slow down when the consumer is overwhelmed. Approaches: (1) Bounded queues: when the queue is full, the producer blocks or receives an error. Producer reduces rate or sheds load. (2) Pull-based consumption: consumers pull at their own rate (Kafka's model). Consumer lag grows visibly; auto-scaling can add consumers. (3) Rate limiting at ingestion: API gateway applies rate limits based on consumer group lag metrics. (4) Reactive Streams: back-pressure protocol built into the stream abstraction (RxJava, Project Reactor, Akka Streams). In practice: monitor consumer lag with alerts, auto-scale consumers, and set circuit breakers on the producer side to shed non-critical events under sustained lag.
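Approach (1) above, a bounded queue as the backpressure signal, is the simplest to demonstrate. A sketch using Python's standard-library `queue` module (the `try_produce` helper is invented for the example); `put_nowait` fails fast when the buffer is full, so the producer learns immediately that it must back off or shed load:

```python
import queue

def try_produce(buf, item):
    """Non-blocking produce: False means the consumer is behind (backpressure)."""
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False   # caller should retry later, slow down, or shed the item

buf = queue.Queue(maxsize=2)   # bounded capacity is the backpressure mechanism
assert try_produce(buf, 1)
assert try_produce(buf, 2)
assert not try_produce(buf, 3)  # full: producer is pushed back
buf.get_nowait()                # consumer drains one item
assert try_produce(buf, 3)      # capacity available again
```

Using blocking `put(item, timeout=...)` instead turns the same bounded queue into the "producer blocks" variant.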
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them. Click any mistake to see the root cause and the correct approach.
Why it happens
An early ACK tells the broker the message is done. If the consumer then crashes, the message is gone — the broker will not redeliver it.
The fix
ACK only after all processing and any downstream writes are complete. Use at-least-once delivery semantics with idempotent processing to handle redeliveries.
Why it happens
A message that causes a consumer to crash will be redelivered indefinitely (or until a configured maximum delivery count is exceeded).
The fix
Implement a dead-letter queue (DLQ). After N failed delivery attempts, route the message to the DLQ for manual inspection rather than retrying forever.
Why it happens
In Kafka-style partitioned queues, each partition is assigned to exactly one consumer in a group. Extra consumers beyond the number of partitions have nothing to consume.
The fix
Keep consumer count <= partition count. If you need more throughput, increase the number of partitions first, then scale consumers to match.
Why it happens
Messages that are no longer actionable (e.g., a location update from 2 hours ago) waste consumer capacity and may cause incorrect behaviour if processed late.
The fix
Set a per-message TTL or a queue-level TTL for time-sensitive messages. The broker discards messages that expire before being consumed.
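The TTL check itself is a one-liner on the broker's delivery path. A sketch, assuming each message records its enqueue time and TTL (field names are illustrative):

```python
import time

def expired(message, now=None):
    """True if the message's TTL elapsed before it could be delivered."""
    now = time.time() if now is None else now
    return now - message["enqueued_at"] > message["ttl"]

# A location update with a 60-second TTL:
msg = {"payload": "location", "enqueued_at": 1000.0, "ttl": 60.0}
assert not expired(msg, now=1030.0)   # 30s old: still deliver
assert expired(msg, now=8200.0)       # 2 hours old: broker discards it
```

The broker runs this check at delivery time (and often also during periodic sweeps) so stale messages never reach a consumer.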
Why it happens
Waiting for a publish ACK from the broker adds the broker's round-trip latency to every API response.
The fix
Publish asynchronously using a local outbox pattern: write the event to a local DB table (outbox) atomically with the business transaction, then have a background process relay it to the broker.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track. Expand any comparison to see a detailed breakdown.
| Dimension | Point-to-Point | Pub-Sub | Log-Based |
|---|---|---|---|
| Message routing | One producer → one consumer (competing consumers OK) | One producer → all subscribers of the topic | One producer → all consumer groups (each gets every message) |
| Message deleted after consume | Yes | Yes (after all subscribers ACK) | No — messages retained for configurable period |
| Replayability | No | No | Yes — seek to any offset and replay |
| Ordering guarantee | FIFO per queue | Per-partition in most systems | Total order within a partition |
| Consumer scaling | Add consumers to the same queue | Add subscriber queues | Add partitions; consumer groups scale independently |
| Examples | RabbitMQ queues, Amazon SQS | Google Pub/Sub, SNS, Redis Pub/Sub | Apache Kafka, Apache Pulsar, AWS Kinesis |
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.