Message Queues: The Right Way to Decouple Services
A message queue looks simple from the outside. Producer puts a message in. Consumer takes it out. Queue guarantees delivery. The end.
Under the load and failure scenarios of production, the design decisions that seem minor turn out to matter enormously. What exactly does "guarantees delivery" mean when the consumer crashes after reading but before processing? What happens when producers are faster than consumers? What does "in order" mean when you have multiple consumers?
Why Queues Exist
The core value of a message queue is temporal decoupling. The producer and consumer do not need to be available at the same time. The producer can publish at its own pace. The consumer can process at its own pace. The queue absorbs the difference.
Without a queue, direct service calls create tight coupling. If service A calls service B synchronously, and service B is down, service A fails too. If service B is slow, service A blocks. If service B needs to be scaled, you have to coordinate the scaling with A's traffic patterns.
With a queue, service A publishes to the queue and moves on. Service B consumes from the queue when it is ready. B's failures do not propagate to A. B's scaling is independent of A. The coupling is through message format (which you define explicitly) rather than through availability (which you cannot control).
This is why queues appear at every major architectural boundary in large systems. AWS uses SQS between its own internal services. Google uses Pub/Sub. Every microservices architecture eventually gets a queue somewhere.
Delivery Guarantees
The most important design axis for a message queue is its delivery guarantee. There are three:
At-most-once: a message is delivered zero or one times. The queue sends the message and forgets about it. If the consumer crashes before processing it, the message is gone. Acceptable for high-volume telemetry where losing a percentage of events is fine. Never acceptable for financial transactions.
At-least-once: a message is delivered one or more times. The queue retains the message until the consumer acknowledges it. If the consumer crashes before acknowledging, the queue redelivers the message. The consumer might process the same message twice.
Exactly-once: a message is delivered exactly once. Theoretically clean. Practically expensive. Achieving exactly-once delivery requires coordination between the queue and the consumer, usually via distributed transactions or idempotency keys. Kafka's exactly-once semantics (idempotent producers plus transactions) require careful configuration and add measurable throughput and latency overhead.
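To make at-least-once concrete, here is a minimal broker-side sketch in the style of SQS's visibility timeout. All names (`AtLeastOnceQueue`, the 30-second deadline) are illustrative, not any real system's API:

```python
import time
from collections import deque

class AtLeastOnceQueue:
    """Minimal at-least-once queue: unacknowledged messages become
    visible again after a timeout. Illustrative sketch only."""

    def __init__(self, visibility_timeout=30.0):
        self.ready = deque()   # messages waiting for delivery
        self.in_flight = {}    # msg_id -> (payload, ack deadline)
        self.visibility_timeout = visibility_timeout

    def publish(self, msg_id, payload):
        self.ready.append((msg_id, payload))

    def receive(self):
        self._redeliver_expired()
        if not self.ready:
            return None
        msg_id, payload = self.ready.popleft()
        # The message is hidden, not deleted: if the consumer never
        # acks, it becomes visible again after the deadline.
        self.in_flight[msg_id] = (payload, time.monotonic() + self.visibility_timeout)
        return msg_id, payload

    def ack(self, msg_id):
        # Only an explicit ack removes the message for good.
        self.in_flight.pop(msg_id, None)

    def _redeliver_expired(self):
        now = time.monotonic()
        for msg_id, (payload, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[msg_id]
                self.ready.append((msg_id, payload))  # duplicate delivery possible
```

The redelivery path is exactly where duplicates come from: a consumer that processed the message but crashed before acking will see it again.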
Most production systems use at-least-once delivery and build idempotency into the consumer. Idempotency means that processing the same message twice has the same effect as processing it once. For a payment confirmation, you would check whether you have already recorded this payment ID before processing. For an email send, you would check whether this message ID has already been sent. The idempotency check is cheaper than the coordination overhead of true exactly-once delivery.
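Here is what that check might look like in a consumer, sketched in Python. The `processed_ids` set stands in for a durable store (in practice, a database table with a unique constraint); the message shape and `record_payment` are hypothetical:

```python
# Stands in for a durable store; in production this would be a
# database table with a unique constraint on payment_id.
processed_ids = set()

def record_payment(payment_id, amount):
    print(f"recorded payment {payment_id} for {amount}")  # hypothetical side effect

def handle_payment_confirmation(message):
    """Idempotent handler: a redelivered message is detected by its ID
    and skipped, so at-least-once delivery behaves like exactly-once."""
    if message["payment_id"] in processed_ids:
        return  # duplicate delivery, already handled
    record_payment(message["payment_id"], message["amount"])
    processed_ids.add(message["payment_id"])
```

In a real system the check and the side effect must be atomic (for example, an insert guarded by that unique constraint); otherwise a crash between the two reintroduces the duplicate.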
Ordering
Message ordering is the other major design axis. Within a single queue, you generally want messages from a given producer to be consumed in the order they were produced. But what does "order" mean across multiple producers, or across a partitioned queue?
Kafka solves this by assigning each message to a partition based on a key. All messages with the same key go to the same partition and are consumed in order within that partition. Messages across partitions have no ordering guarantee.
This is the right model for most use cases. You want all messages for user ID 12345 to be processed in order (so you do not process "cancel subscription" before "create subscription"). You do not care about the relative ordering between user 12345 and user 67890. Partitioning by user ID gives you the ordering guarantee you need without the throughput penalty of global ordering.
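The mechanism is a deterministic hash of the key. Kafka's default partitioner uses murmur2; the hash-and-modulo sketch below is a simplification of the same idea:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition.
    Same key -> same partition -> per-key ordering."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p = partition_for("user-12345", 8)
# Every event for user 12345 maps to partition p, so
# "create subscription" and "cancel subscription" stay in order.
```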
Global ordering — where every message across all producers is totally ordered — requires a single partition (or a centralized sequencer), which is a scaling ceiling. RabbitMQ's traditional model is essentially a single-partition queue. This is why Kafka, which was designed for high throughput, chose partition-based ordering.
Backpressure
What happens when producers write faster than consumers can process?
The queue fills up. If you set a size limit and producers keep writing, the queue starts rejecting messages (or blocking producers). If you do not set a size limit, the queue grows until the host runs out of memory.
Neither outcome is great. The right answer is backpressure: the queue communicates to producers that they should slow down. Producers that respect backpressure reduce their write rate. This propagates the signal up the stack — if the processing step is backed up, the data ingestion step should slow down.
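In-process, Python's standard `queue.Queue` shows the idea in its simplest form: a bounded queue whose `put` blocks when full. The sizes and sleep here are arbitrary:

```python
import queue
import threading
import time

# A bounded queue: put() blocks when the queue is full,
# which slows the producer down to the consumer's pace.
q = queue.Queue(maxsize=100)

def producer():
    for i in range(1000):
        q.put(i)  # blocks when 100 items are unconsumed: backpressure

def consumer():
    while True:
        item = q.get()
        time.sleep(0.01)  # simulate slow processing
        q.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
q.join()
```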
Kafka handles this differently: consumption is pull-based, so consumers fetch at their own rate, and the broker applies no backpressure to producers. Producers can write as fast as they want, up to the disk capacity of the broker. Consumer lag (the offset difference between the latest message and the last consumed message) is the observable signal of backup.
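Lag is just arithmetic over offsets. A monitoring sketch, assuming you can fetch the latest offset per partition and the group's committed offset (both exposed by Kafka's admin tooling; the dicts here are stand-ins):

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Per-partition lag: how far behind the consumer group is.
    Both arguments are dicts of partition -> offset."""
    return {
        partition: latest_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in latest_offsets
    }

# Example: partition 0 is caught up, partition 1 is 5000 messages behind.
lag = consumer_lag({0: 1000, 1: 8000}, {0: 1000, 1: 3000})
assert lag == {0: 0, 1: 5000}
```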
TCP's congestion control is the most well-known example of backpressure in networking. When the network is congested, TCP reduces the send rate. Message queues face the same problem at a higher level of abstraction.
Dead Letter Queues
Some messages cannot be processed successfully, no matter how many times you retry. The payload is malformed. The business logic raises an exception that will always raise. The external API the consumer depends on is gone.
Without a dead letter queue (DLQ), these messages get retried forever, or the consumer crashes on them repeatedly, or they get lost.
A DLQ is a separate queue where messages go after a configurable number of failed delivery attempts. Processing resumes for other messages. Failed messages sit in the DLQ where an operator can inspect them, fix the underlying issue, and replay them.
DLQs are a first-class feature in AWS SQS, RabbitMQ, and most production queue systems. The configuration details matter: how many delivery attempts before a message moves to the DLQ, how long the visibility timeout is, and whether failures are tracked per message or per consumer.
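A minimal retry-then-DLQ policy looks like the sketch below, assuming an attempt counter travels with the message (real systems keep it in message metadata). `MAX_ATTEMPTS` is illustrative, and the queue objects here are plain lists:

```python
MAX_ATTEMPTS = 5  # illustrative; tune per workload

def deliver(message, main_queue, dead_letter_queue, handler):
    """Try the handler; on failure, retry up to MAX_ATTEMPTS,
    then park the message in the DLQ for operator inspection."""
    try:
        handler(message["payload"])
    except Exception:
        message["attempts"] = message.get("attempts", 0) + 1
        if message["attempts"] >= MAX_ATTEMPTS:
            dead_letter_queue.append(message)  # stop retrying, keep evidence
        else:
            main_queue.append(message)         # requeue for another attempt
```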
What Kafka Changed
Apache Kafka, released by LinkedIn in 2011, changed how the industry thinks about message queues. The key insight was to treat the queue as a durable, ordered log rather than a transient buffer.
Traditional queues (RabbitMQ, ActiveMQ) delete messages after they are consumed. Kafka retains messages for a configurable retention period, regardless of whether they have been consumed. This has two consequences.
First, consumers can replay. If your consumer has a bug and processes a week of messages incorrectly, you can fix the bug, reset the consumer offset to a week ago, and replay. The raw events are still there.
Second, multiple independent consumers can read the same stream without interfering with each other. Your analytics pipeline and your notification service can both read the same event stream. They maintain separate consumer group offsets. Neither knows the other exists.
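A few lines of Python capture the difference from a delete-on-consume queue: the log is append-only, and each consumer group is nothing but an integer offset into it. The names are illustrative, and committing on every poll is a simplification:

```python
class Log:
    """Append-only log with per-group offsets: many consumers,
    one shared, immutable event stream."""

    def __init__(self):
        self.events = []   # never deleted (until retention expires)
        self.offsets = {}  # group name -> next offset to read

    def append(self, event):
        self.events.append(event)

    def poll(self, group):
        offset = self.offsets.get(group, 0)
        batch = self.events[offset:]
        self.offsets[group] = len(self.events)
        return batch

    def seek(self, group, offset):
        self.offsets[group] = offset  # replay: rewind and reprocess

log = Log()
log.append({"type": "subscription_created", "user": 12345})
log.append({"type": "subscription_cancelled", "user": 12345})

analytics = log.poll("analytics")          # both groups see both events,
notifications = log.poll("notifications")  # tracked by separate offsets
log.seek("analytics", 0)                   # replay from the beginning
```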
These properties make Kafka more of an event log than a traditional queue. It is the foundation of the event sourcing architectural pattern. Your system state is derived from a sequence of immutable events stored in Kafka. Any component can replay the event log to rebuild state.
Martin Kleppmann's book "Designing Data-Intensive Applications" describes this as "the database inside out" — instead of a database with a change log tacked on, you have a log with a derived view tacked on.
Building a Queue
The core components of a message queue are: a storage layer (where messages live), an indexing structure (how you find the next message to deliver), a delivery protocol (how messages get from broker to consumer), and an acknowledgment protocol (how consumers signal successful processing).
In Kafka's model, the storage layer is a set of append-only log files on disk. The index is the offset — a monotonically increasing integer for each partition. Delivery is pull-based (consumers fetch, the broker does not push). Acknowledgment is a commit of the consumer group offset.
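Here is a compressed sketch of those four components in Python: an append-only file per partition as the storage layer, the offset as the index, a pull-based `fetch` as the delivery protocol, and an offset commit as the acknowledgment. The file format and names are invented for illustration; a real implementation adds batching, a sparse index file, and replication:

```python
import json

class Partition:
    """One partition: an append-only log file, indexed by offset."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # consumer group -> committed offset
        open(self.path, "a").close()  # ensure the log file exists

    def append(self, message):
        # Storage layer: one JSON record per line, append-only.
        with open(self.path, "a") as f:
            f.write(json.dumps(message) + "\n")

    def fetch(self, group, max_messages=100):
        # Delivery is pull-based: the consumer asks, starting from its
        # group's committed offset. Nothing is deleted on consumption.
        start = self.offsets.get(group, 0)
        with open(self.path) as f:
            lines = f.readlines()
        return [(start + i, json.loads(line))
                for i, line in enumerate(lines[start:start + max_messages])]

    def commit(self, group, offset):
        # Acknowledgment: committing the offset marks everything
        # before it as processed for this group.
        self.offsets[group] = offset
```

Everything a consumer acknowledges collapses to a single integer per group, which is why recovery and replay are cheap in this model.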
This design is extremely simple at each layer, which is why Kafka can handle millions of messages per second on commodity hardware. The complexity lives in the replication protocol that keeps the log durable across broker failures.
Ready to build it? The Queues track builds a durable message queue with configurable delivery guarantees. You will implement the storage layer, consumer offset tracking, and acknowledgment-based delivery. The same concepts power Kafka, SQS, RabbitMQ, and Pub/Sub.
Build it yourself
Reading about distributed systems is useful. Building them is how you actually learn.
Start the Queues track