TASK
Implementation
At its core, Apache Kafka is a distributed log. Each topic is split into partitions, and each partition is an independent, ordered, append-only log stored as a directory of segment files.
Partition structure on disk:
topic-orders/partition-0/
00000000.log # messages at offsets 0-999
00000000.index # sparse index: offset -> byte position
00001000.log # messages at offsets 1000-1999
00001000.index # sparse index for the second segment
Key operations:
- Append: producers write a message to the end of the log, receiving the assigned offset
- Read: consumers specify an offset and read messages sequentially from that point
- No mutation: once written, messages are never modified or deleted (until retention policy kicks in)
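The three operations above can be modeled with a minimal in-memory sketch (the class and method names here are illustrative, not part of Kafka's API):

```python
class PartitionLog:
    """Minimal in-memory model of one partition: an ordered, append-only list."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        # Append-only: the new message always goes at the end,
        # and its offset is simply its position in the log.
        offset = len(self.messages)
        self.messages.append(message)
        return offset

    def read(self, offset, max_count=10):
        # Sequential read starting at a given offset; nothing is ever mutated.
        return self.messages[offset:offset + max_count]

log = PartitionLog()
assert log.append("order-1234") == 0
assert log.append("order-5678") == 1
assert log.read(0) == ["order-1234", "order-5678"]
```

Because offsets are just positions in the log, append can hand them out without any coordination beyond the append itself.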
This append-only design is central to Kafka's high throughput: every write is sequential disk I/O, which avoids seek overhead and lets the OS page cache and the drive batch writes, so appends run close to the device's full sequential bandwidth.
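To make the segment layout concrete, here is a single-segment sketch that appends length-prefixed records sequentially and keeps a sparse in-memory index mapping every Nth offset to a byte position. The record framing and index interval are illustrative assumptions, not Kafka's actual on-disk format:

```python
import bisect
import io
import struct

class Segment:
    """Toy single segment: sequential length-prefixed appends plus a sparse index."""

    INDEX_INTERVAL = 2  # index every 2nd offset (illustrative; real Kafka indexes by bytes)

    def __init__(self, log_file):
        self.log = log_file   # append-only byte stream (a file or BytesIO)
        self.next_offset = 0
        self.index = []       # sorted list of (offset, byte_position)

    def append(self, message: bytes) -> int:
        self.log.seek(0, io.SEEK_END)        # appends always go to the end
        pos = self.log.tell()
        if self.next_offset % self.INDEX_INTERVAL == 0:
            self.index.append((self.next_offset, pos))
        # Sequential write: 4-byte big-endian length prefix, then the payload.
        self.log.write(struct.pack(">I", len(message)) + message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int) -> bytes:
        # Find the nearest indexed offset at or before the target...
        i = bisect.bisect_right(self.index, (offset, float("inf"))) - 1
        indexed_offset, pos = self.index[i]
        self.log.seek(pos)
        # ...then scan forward sequentially to the requested offset.
        for _ in range(offset - indexed_offset + 1):
            (length,) = struct.unpack(">I", self.log.read(4))
            payload = self.log.read(length)
        return payload

seg = Segment(io.BytesIO())
for i in range(5):
    seg.append(f"msg-{i}".encode())
assert seg.read(3) == b"msg-3"
```

The sparse index trades a short sequential scan on reads for a much smaller index, which is the same trade-off Kafka's `.index` files make.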
Request: {"type": "partition_append", "msg_id": 1, "topic": "orders", "partition": 0, "message": "order-1234"}
Response: {"type": "partition_append_ok", "in_reply_to": 1, "offset": 0, "segment": "00000000.log", "timestamp": 1700000000}
Request: {"type": "partition_read", "msg_id": 2, "topic": "orders", "partition": 0, "offset": 0}
Response: {"type": "partition_read_ok", "in_reply_to": 2, "message": "order-1234", "offset": 0, "next_offset": 1}
Request: {"type": "partition_info", "msg_id": 3, "topic": "orders", "partition": 0}
Response: {"type": "partition_info_ok", "in_reply_to": 3, "start_offset": 0, "end_offset": 42, "segments": 2, "total_bytes": 524288}
Sample Test Cases
Append returns monotonically increasing offset
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"partition_append","msg_id":2,"topic":"orders","partition":0,"message":"msg-1"}}
{"src":"c1","dest":"n1","body":{"type":"partition_append","msg_id":3,"topic":"orders","partition":0,"message":"msg-2"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "partition_append_ok", "in_reply_to": 2, "offset": 0, "msg_id": 1}}
{"src": "n1", "dest": "c1", "body": {"type": "partition_append_ok", "in_reply_to": 3, "offset": 1, "msg_id": 2}}
Read at offset returns correct message
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"partition_append","msg_id":2,"topic":"t","partition":0,"message":"hello"}}
{"src":"c1","dest":"n1","body":{"type":"partition_read","msg_id":3,"topic":"t","partition":0,"offset":0}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "partition_append_ok", "in_reply_to": 2, "offset": 0, "msg_id": 1}}
{"src": "n1", "dest": "c1", "body": {"type": "partition_read_ok", "in_reply_to": 3, "message": "hello", "offset": 0, "msg_id": 2}}
Hints
Hint 1: A Kafka partition is essentially a write-ahead log (WAL) stored as a directory of segment files.
Hint 2: Each segment has a .log file (raw messages) and a .index file (offset -> byte position).
Hint 3: Producers can only append to the end of the log (no random writes).
Hint 4: Consumers read from a given offset, and each consumer progresses independently at its own pace.
Hint 5: Kafka achieves millions of messages per second because appending is sequential I/O, which is extremely fast.
OVERVIEW
Key Concepts
Kafka partition, segment file, offset, append-only, sequential I/O
main.py
#!/usr/bin/env python3
import sys
import json

def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)

if __name__ == "__main__":
    main()
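The starter loop above can be grown into a minimal single-node handler for the message types in the task. This is one possible sketch using in-memory lists rather than on-disk segments, not a reference solution; field names follow the request/response examples shown earlier:

```python
import json

# In-memory state for a single node (a sketch; a real solution would persist segments).
logs = {}                               # (topic, partition) -> list of messages
state = {"node_id": None, "msg_id": 0}  # our node id and outgoing msg_id counter

def handle(req):
    """Process one request envelope and return the reply envelope."""
    body = req["body"]
    t = body["type"]
    if t == "init":
        state["node_id"] = body["node_id"]
        out = {"type": "init_ok"}
    elif t == "partition_append":
        log = logs.setdefault((body["topic"], body["partition"]), [])
        log.append(body["message"])
        out = {"type": "partition_append_ok", "offset": len(log) - 1}
    elif t == "partition_read":
        log = logs.get((body["topic"], body["partition"]), [])
        off = body["offset"]
        out = {"type": "partition_read_ok", "message": log[off],
               "offset": off, "next_offset": off + 1}
    else:
        out = {"type": "error", "text": "unsupported message type"}
    out["in_reply_to"] = body["msg_id"]
    out["msg_id"] = state["msg_id"]
    state["msg_id"] += 1
    return {"src": state["node_id"], "dest": req["src"], "body": out}

def main(stream):
    for line in stream:
        print(json.dumps(handle(json.loads(line))), flush=True)
```

Wire `main(sys.stdin)` up under `if __name__ == "__main__":` as in the starter. Keeping `handle` as a pure request-to-reply function makes it easy to unit-test against the sample cases without piping stdin.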