TASK
Implementation
A single WAL file has a problem: it grows without bound. Once it reaches gigabytes, seeks become slow and cleanup is impossible without rewriting the entire file.
The solution: segment files. When a WAL segment exceeds a size threshold, seal it (make it immutable) and open a new active segment. An offset index enables O(1) lookups by mapping each log offset to the correct segment file and byte position.
This is how Kafka, etcd, and most production systems organize their logs:
- Segments are named by their starting offset (e.g. 00000000.log, 00001000.log)
- Each segment has a companion .index file mapping offset -> byte position
- Old sealed segments can be deleted, compressed, or archived independently
- The active segment is the only one receiving new writes
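The rules above can be sketched as a small class. This is an illustrative outline only (class and method names are made up, and the real task speaks the JSON protocol below), but it shows the core mechanics: append to the active segment, and seal-and-roll when the size threshold would be exceeded.

```python
import os


class SegmentedWAL:
    """Minimal sketch of segment rolling; names and layout are illustrative."""

    def __init__(self, dir_path, max_segment_bytes=64 * 1024 * 1024):
        self.dir = dir_path
        self.max_segment_bytes = max_segment_bytes
        self.next_offset = 0
        self.sealed = []                 # list of (file_name, start_offset, end_offset)
        self.active_start = 0
        self.active_file = self._name(0)
        self.active_bytes = 0

    def _name(self, start_offset):
        # Segments are named by their starting offset, zero-padded.
        return f"{start_offset:08d}.log"

    def append(self, record: bytes) -> int:
        # Roll over before writing if this record would overflow the active segment.
        if self.active_bytes > 0 and self.active_bytes + len(record) > self.max_segment_bytes:
            self._seal_and_roll()
        offset = self.next_offset
        with open(os.path.join(self.dir, self.active_file), "ab") as f:
            f.write(record)
        self.active_bytes += len(record)
        self.next_offset += 1
        return offset

    def _seal_and_roll(self):
        # Seal the active segment (it becomes immutable) and open a new one.
        self.sealed.append((self.active_file, self.active_start, self.next_offset - 1))
        self.active_start = self.next_offset
        self.active_file = self._name(self.active_start)
        self.active_bytes = 0
```

Note the roll happens *before* the write, so a record never straddles two segments.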
Request: {"type": "wal_segment_config", "msg_id": 1, "max_segment_bytes": 67108864}
Response: {"type": "wal_segment_config_ok", "in_reply_to": 1, "max_segment_bytes": 67108864}
Request: {"type": "wal_segment_info", "msg_id": 2}
Response: {"type": "wal_segment_info_ok", "in_reply_to": 2, "segments": [
{"file": "00000000.log", "start_offset": 0, "end_offset": 999, "size_bytes": 67108000, "sealed": true},
{"file": "00001000.log", "start_offset": 1000, "end_offset": 1050, "size_bytes": 5120, "sealed": false}
], "active_segment": "00001000.log"}Sample Test Cases
Configure segment size threshold
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_segment_config","msg_id":2,"max_segment_bytes":1024}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "wal_segment_config_ok", "in_reply_to": 2, "max_segment_bytes": 1024, "msg_id": 1}}
Segment info shows active segment on empty log
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_segment_info","msg_id":2}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1
When the active segment exceeds a size threshold (e.g. 64MB), seal it and open a new one
Hint 2
Maintain an index mapping (log_offset -> segment_file + byte_offset) for O(1) seeks
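One common way to locate a segment: keep the segment start offsets sorted and binary-search them. This is a sketch under assumed names; a per-segment index (offset -> byte position) then finishes the seek in constant time within the file.

```python
import bisect


def locate(start_offsets, segment_names, offset):
    """Find which segment file holds `offset`.

    `start_offsets` is the sorted list of segment starting offsets and
    `segment_names` the parallel list of file names (illustrative names).
    """
    # bisect_right finds the first segment starting *after* offset;
    # the one before it is the segment that contains the offset.
    i = bisect.bisect_right(start_offsets, offset) - 1
    if i < 0:
        raise KeyError(offset)
    return segment_names[i]
```

With the example layout above, offset 999 resolves to 00000000.log and offset 1000 to 00001000.log.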
Hint 3
Sealed segments are immutable — they can be safely compressed, archived, or deleted
Hint 4
Name segments by their starting offset: 00000000.log, 00001000.log, etc.
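The naming convention in this hint is a one-liner, assuming eight-digit zero padding as in the examples:

```python
def segment_name(start_offset: int) -> str:
    # Zero-padded starting offset, matching 00000000.log / 00001000.log above.
    return f"{start_offset:08d}.log"
```

Zero padding keeps lexicographic order equal to numeric order, so a plain directory listing yields the segments in offset order.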
Hint 5
This is exactly how Kafka organizes its partition logs on disk
OVERVIEW
Key Concepts
segment files, log segmentation, offset index, fast seeks, immutable segments
main.py
python

#!/usr/bin/env python3
import sys
import json


def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)


if __name__ == "__main__":
    main()