Concept
Consumer Groups
Kafka's consumer group model is what enables horizontal scaling of message consumption. Multiple consumers in a group share the work of consuming a topic — each partition is assigned to exactly one consumer in the group at any time. This guarantees that messages within a partition are processed in order (a single consumer processes each partition serially), while allowing parallelism across partitions (different consumers handle different partitions concurrently).
The Group Coordinator
Every consumer group is managed by a group coordinator: one broker responsible for that group's membership (in Kafka, the broker that leads the group's partition of the __consumer_offsets topic). A single broker can coordinate many groups. Consumers send periodic heartbeats to the coordinator. The coordinator detects failures (missed heartbeats) and voluntary changes (explicit JOIN/LEAVE) and triggers a rebalance whenever group membership changes.
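The failure-detection behaviour described above can be sketched with a small in-memory model. This is only an illustration: `ToyCoordinator`, `session_timeout`, and the `generation` counter are hypothetical names chosen for the sketch, not Kafka's broker code, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Hypothetical in-memory stand-in for a group coordinator."""
    session_timeout: float                               # seconds of silence before eviction
    last_heartbeat: dict = field(default_factory=dict)   # member id -> time of last heartbeat
    generation: int = 0                                  # bumped on every membership change

    def join(self, member: str, now: float) -> None:
        self.last_heartbeat[member] = now
        self.generation += 1                             # membership changed: rebalance

    def leave(self, member: str) -> None:
        if self.last_heartbeat.pop(member, None) is not None:
            self.generation += 1                         # voluntary leave: rebalance

    def heartbeat(self, member: str, now: float) -> None:
        if member in self.last_heartbeat:
            self.last_heartbeat[member] = now

    def expire(self, now: float) -> list:
        """Evict members whose last heartbeat is older than the timeout."""
        dead = sorted(m for m, t in self.last_heartbeat.items()
                      if now - t > self.session_timeout)
        for m in dead:
            del self.last_heartbeat[m]
        if dead:
            self.generation += 1                         # failure detected: rebalance
        return dead


coord = ToyCoordinator(session_timeout=10.0)
coord.join("c1", now=0.0)
coord.join("c2", now=0.0)
coord.heartbeat("c1", now=8.0)       # c1 stays alive; c2 goes silent
evicted = coord.expire(now=12.0)     # c2 has been silent for 12 s
print(evicted, coord.generation)     # ['c2'] 3
```

Each change in `generation` stands for one rebalance: two joins and one eviction give three.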
The rebalance protocol has two phases:
- Join Phase: All active consumers send a JoinGroup request to the coordinator. One consumer is elected as the group leader (not the same as a partition leader). The coordinator tells the group leader the full list of current members.
- Sync Phase: The group leader computes the partition assignment and sends it to the coordinator via SyncGroup. The coordinator distributes each consumer's slice of the assignment back to that consumer. Consumption resumes.
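The two phases can be traced in a few lines of Python. This is a simplified sketch, not the wire protocol: the function name `rebalance` is hypothetical, the leader is assumed to be the first member to join, and the leader is assumed to use a round-robin assignment.

```python
def rebalance(member_ids, partitions):
    """Toy two-phase rebalance; returns (leader, per-member assignment)."""
    # Join phase: the coordinator collects the JoinGroup requests and picks a leader.
    # Assumption for this sketch: the first member to join becomes the leader.
    members = list(member_ids)
    leader = members[0]

    # Sync phase, step 1: the leader computes the full assignment (round-robin here)
    # from the member list the coordinator gave it.
    ordered = sorted(members)
    full_assignment = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        full_assignment[ordered[i % len(ordered)]].append(p)

    # Sync phase, step 2: the coordinator returns to each member only its own slice.
    sync_responses = {m: full_assignment[m] for m in members}
    return leader, sync_responses


leader, slices = rebalance(["c2", "c1"], ["p0", "p1", "p2"])
print(leader)          # c2
print(slices["c1"])    # ['p0', 'p2']
print(slices["c2"])    # ['p1']
```

Note that the coordinator never computes the assignment itself; it only relays what the leader produced.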
Round-Robin Partition Assignment
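The round-robin assignment strategy used here distributes partitions as evenly as possible across consumers by cycling through sorted partition and consumer lists. (Kafka ships a round-robin assignor, but the Java client's built-in default is the range assignor.)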
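members = ["c1", "c2", "c3"]               # example group membership
all_partitions = ["p0", "p1", "p2", "p3"]  # example topic partitions

consumers = sorted(members)                # [c1, c2, c3]
partitions = sorted(all_partitions)        # [p0, p1, p2, p3]
assignment = {c: [] for c in consumers}
for i, partition in enumerate(partitions):
    # Deal partitions out in turn, wrapping around the consumer list.
    assignment[consumers[i % len(consumers)]].append(partition)
# Result: c1=[p0,p3], c2=[p1], c3=[p2]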
When consumer c3 leaves, the coordinator triggers a rebalance. With only c1 and c2 remaining, the same algorithm produces: c1=[p0,p2], c2=[p1,p3]. Each consumer now handles 2 partitions.
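To check the before-and-after assignments stated above, the same loop can be wrapped in a function and run for both memberships. The function name `round_robin` and the `moved` helper list are introduced only for this sketch.

```python
def round_robin(members, partitions):
    """Round-robin assignment over sorted consumer and partition lists."""
    consumers = sorted(members)
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = ["p0", "p1", "p2", "p3"]
before = round_robin(["c1", "c2", "c3"], partitions)
after = round_robin(["c1", "c2"], partitions)      # c3 has left the group

# Partitions whose owner differs between the two assignments.
moved = sorted(p for c in after for p in after[c] if p not in before.get(c, []))

print(before)   # {'c1': ['p0', 'p3'], 'c2': ['p1'], 'c3': ['p2']}
print(after)    # {'c1': ['p0', 'p2'], 'c2': ['p1', 'p3']}
print(moved)    # ['p2', 'p3']
```

One detail worth noticing: p3 changes owner from c1 to c2 even though c1 never left. Plain round-robin recomputes from scratch and makes no attempt to keep partitions where they were.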
Offset Commits and Consumer Resume
The broker does not track how far each consumer has read. Instead, consumers periodically commit their offset, recording how far they have consumed, to Kafka's internal __consumer_offsets topic. This topic is itself a regular Kafka log: replicated, and compacted so that the latest commit for each group and partition is retained.
COMMIT c1 p0 42 # c1 has successfully processed up to offset 42 on p0
COMMIT c1 p1 17 # c1 has processed up to offset 17 on p1
When a consumer restarts (crash recovery, deployment, or scale-in/scale-out), it fetches the last committed offset for each assigned partition from the coordinator and resumes from exactly that position. Messages processed after the last commit but before the crash are processed again. This is the at-least-once behaviour that results when offsets are committed after processing. For exactly-once processing in consume-process-produce pipelines, offset commits can be made part of a Kafka transaction.
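The re-processing window can be made concrete with a toy single-partition log. Everything here is an assumption made for illustration: `log`, `committed`, `processed`, and `run` are hypothetical names, the commit store is a plain dictionary rather than the __consumer_offsets topic, and the stored value is taken to mean "next offset to read".

```python
log = [f"m{i}" for i in range(10)]   # one partition's messages at offsets 0..9
committed = {}                       # (group, partition) -> next offset to read
processed = []                       # every message handed to application code


def run(group, partition, commit_every, crash_after=None):
    """Consume from the last committed offset; optionally crash after N messages."""
    position = committed.get((group, partition), 0)
    handled = 0
    while position < len(log):
        if crash_after is not None and handled == crash_after:
            return                               # crash: uncommitted progress is lost
        processed.append(log[position])          # "process" the message
        position += 1
        handled += 1
        if handled % commit_every == 0:
            committed[(group, partition)] = position   # commit after processing
    committed[(group, partition)] = position     # final commit on a clean finish


run("g", "p0", commit_every=4, crash_after=6)    # handles m0..m5, last commit is 4
run("g", "p0", commit_every=4)                   # restarts from offset 4

print(processed.count("m4"), processed.count("m5"))   # 2 2
print(committed[("g", "p0")])                         # 10
```

Messages m4 and m5 were processed before the crash but after the last commit, so the restarted consumer sees them a second time. Nothing is skipped, which is the at-least-once trade-off.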
Why Rebalance Latency Matters
During a rebalance under the original (eager) protocol, all consumption pauses across the entire group, even for consumers whose partition assignments have not changed. This stop-the-world behaviour is the main scaling bottleneck for large consumer groups. In a group with 100 consumers and 1000 partitions, a single consumer failure triggers a rebalance that temporarily halts the 99 surviving consumers as well.
Kafka 2.4 introduced incremental cooperative rebalancing: instead of revoking all assignments and redistributing from scratch, only the partitions that need to move are revoked. Consumers retain their unchanged assignments and continue consuming during the rebalance. This dramatically reduces disruption for large groups.
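The difference between the two protocols comes down to which partitions each surviving consumer has to give up. The sketch below compares them for a given old and new assignment; the function name `revocations` is hypothetical, and the real cooperative protocol involves a follow-up rebalance round that is not modelled here.

```python
def revocations(old, new, cooperative):
    """Partitions each surviving consumer must release before the new assignment applies."""
    revoked = {}
    for consumer, owned in old.items():
        if consumer not in new:
            continue                                  # departed member: nothing to revoke
        if cooperative:
            # Cooperative: release only partitions that are moving to someone else.
            revoked[consumer] = [p for p in owned if p not in new[consumer]]
        else:
            # Eager: release everything, then re-acquire whatever is assigned.
            revoked[consumer] = list(owned)
    return revoked


old = {"c1": ["p0", "p3"], "c2": ["p1"], "c3": ["p2"]}
new = {"c1": ["p0", "p2"], "c2": ["p1", "p3"]}        # assignment after c3 leaves

eager = revocations(old, new, cooperative=False)
coop = revocations(old, new, cooperative=True)

print(eager)   # {'c1': ['p0', 'p3'], 'c2': ['p1']}
print(coop)    # {'c1': ['p3'], 'c2': []}
```

Under the eager protocol both survivors stop reading all of their partitions. Under the cooperative one, c1 pauses only p3 and c2 pauses nothing.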
Why Kafka Uses This Approach
The consumer group model deliberately places partition assignment logic at the consumer level rather than in the broker, and records progress as a single offset per partition rather than per message. This means the broker does not need to track per-message acknowledgements or maintain consumer-specific queues. The broker's job is just to maintain the log. Consumers are responsible for their own progress tracking. This separation of concerns keeps broker-side state small, which is a large part of why Kafka scales to very large consumer populations and message volumes.
Edge Cases and Invariants
- More consumers than partitions: Some consumers will receive an empty assignment. An empty assignment is valid — the consumer joins the group but reads nothing until a rebalance gives it a partition.
- Consumer rejoins after leaving: The group triggers a rebalance on JOIN, regardless of whether the consumer was previously in the group. Its previous offset commits are preserved in __consumer_offsets and will be used when it is reassigned a partition.
- Sorted order is deterministic: Round-robin assignment must use consistently sorted consumer and partition lists. Non-deterministic ordering would produce different assignments on different brokers or after restarts, breaking the invariant that each partition has exactly one consumer.
- POSITIONS reflects committed offsets, not current position: A consumer that has read 100 messages but only committed offset 50 will show offset 50 in POSITIONS. The uncommitted 50 will be re-processed on restart.
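The first and third edge cases can be checked directly against a round-robin implementation. This is a small sketch under the same assumptions as the assignment loop earlier; `round_robin` and `owners` are names chosen for the example.

```python
def round_robin(members, partitions):
    """Round-robin assignment over sorted consumer and partition lists."""
    consumers = sorted(members)
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def owners(assignment):
    """Map each partition to the list of consumers that were assigned it."""
    result = {}
    for consumer, parts in assignment.items():
        for p in parts:
            result.setdefault(p, []).append(consumer)
    return result


# More consumers than partitions: c3 joins the group but receives nothing.
a = round_robin(["c3", "c1", "c2"], ["p1", "p0"])
print(a)   # {'c1': ['p0'], 'c2': ['p1'], 'c3': []}

# Same inputs in a different arrival order give the same assignment.
b = round_robin(["c2", "c3", "c1"], ["p0", "p1"])
print(a == b)   # True

# Invariant: every partition has exactly one owner.
print(all(len(cs) == 1 for cs in owners(a).values()))   # True
```

Because both lists are sorted before the loop runs, the result depends only on the set of members and partitions, not on the order in which they were observed.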