TASK
Implementation
Compaction is the background process that keeps the LSM tree healthy. Without it, reads degrade because more and more SSTables accumulate, each requiring a disk seek.
The compaction algorithm (leveled compaction):
- Trigger: when L0 exceeds 4 SSTables, or when a level exceeds its size limit
- Select: choose SSTables from the source level whose key ranges overlap with the target level
- Merge: read all selected SSTables, merge-sort by key, deduplicate (keep newest version of each key)
- Write: write new sorted, non-overlapping SSTables to the target level
- Cleanup: delete the old source SSTables
Key metric - write amplification: a single user write may be rewritten 10-30x across levels as data is compacted. This is the fundamental cost of LSM trees. RocksDB's leveled compaction has write amplification of approximately 10 * (number_of_levels - 1).
Request: {"type": "lsm_compact", "msg_id": 1, "source_level": 0, "target_level": 1}
Response: {"type": "lsm_compact_ok", "in_reply_to": 1, "sstables_merged": 4, "new_sstables": 2, "keys_deduplicated": 500, "bytes_read": 16000000, "bytes_written": 14000000, "write_amplification": 2.3}
Request: {"type": "lsm_compact_status", "msg_id": 2}
Response: {"type": "lsm_compact_status_ok", "in_reply_to": 2, "pending_compactions": 0, "total_compactions": 5, "total_bytes_compacted": 100000000}Sample Test Cases
Compact L0 into L1Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_compact","msg_id":2,"source_level":0,"target_level":1}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Compaction deduplicates keysTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_compact","msg_id":2,"source_level":0,"target_level":1}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
When L0 has >4 SSTables, merge them into L1 (compaction trigger)
Hint 2▾
Compaction = merge sort: read overlapping SSTables from both levels, merge by key, write new SSTables to the target level
Hint 3▾
Deduplication: if the same key appears in multiple SSTables, keep only the newest version
Hint 4▾
Write amplification = total bytes written to disk / total bytes of new data. LSM trees have high write amp (10-30x)
Hint 5▾
After compaction, delete the old SSTables that were merged
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
compactionmerge sortdeduplicationwrite amplificationspace reclamation
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()