TASK
Implementation
Without compaction, the WAL grows forever. Every database solves this with the same pattern: periodically take a snapshot of the state machine, then truncate all WAL entries before the snapshot.
The compaction algorithm:
- Serialize the current state machine to a snapshot file
- Record the WAL offset at which the snapshot was taken
- Delete all WAL entries (segments) before that offset
- On recovery: load the snapshot, then replay only entries after the snapshot offset
The critical requirement: atomicity. If the system crashes between taking the snapshot and truncating the WAL, there must be no data loss. The standard approach:
- Write snapshot to a temp file
- Atomically rename the temp file to the final snapshot path
- Only then delete old WAL segments
Request: {"type": "wal_snapshot", "msg_id": 1}
Response: {"type": "wal_snapshot_ok", "in_reply_to": 1, "snapshot_offset": 500, "snapshot_size_bytes": 2048}
Request: {"type": "wal_compact", "msg_id": 2, "snapshot_at_offset": 500}
Response: {"type": "wal_compact_ok", "in_reply_to": 2, "snapshot_offset": 500, "entries_before": 1000, "entries_after": 500, "bytes_freed": 32000}Sample Test Cases
Take a snapshot of current stateTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_append","msg_id":2,"payload":"set x=1"}}
{"src":"c1","dest":"n1","body":{"type":"wal_append","msg_id":3,"payload":"set y=2"}}
{"src":"c1","dest":"n1","body":{"type":"wal_snapshot","msg_id":4}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Compaction frees old entriesTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_compact","msg_id":2,"snapshot_at_offset":500}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Once a snapshot of the state machine is taken, old WAL entries become unnecessary
Hint 2▾
The process: take snapshot -> record snapshot offset -> delete WAL entries before that offset
Hint 3▾
The snapshot + truncation MUST be atomic: a crash between them means data loss
Hint 4▾
Write the snapshot to a temp file, then atomically rename it into place
Hint 5▾
On recovery: load snapshot first, then replay only WAL entries after the snapshot offset
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
log compactionsnapshottruncationatomic operationspace reclamation
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()