TASK
Implementation
A single flat list of SSTables becomes unmanageable as data grows. The multi-level LSM tree organizes SSTables into levels (L0, L1, L2, ...) with carefully maintained invariants.
Level structure:
- L0 (special): receives direct flushes from MemTable. SSTables may have overlapping key ranges. Reads must check ALL L0 files.
- L1, L2, ... (sorted levels): SSTables within the same level have non-overlapping key ranges. A read only needs to check at most ONE SSTable per level.
- Size ratio: each level is ~10x larger than the previous. L0: 40MB, L1: 400MB, L2: 4GB, L3: 40GB.
Read path (multi-level):
- Check MemTable (newest data)
- Check L0 (all files, newest first)
- Check L1 (binary search by key range, at most 1 file)
- Check L2, L3, ... until found or exhausted
Request: {"type": "lsm_level_info", "msg_id": 1}
Response: {"type": "lsm_level_info_ok", "in_reply_to": 1, "levels": [
{"level": 0, "sstables": 4, "total_bytes": 16777216, "sorted": false, "max_bytes": 41943040},
{"level": 1, "sstables": 5, "total_bytes": 52428800, "sorted": true, "max_bytes": 419430400},
{"level": 2, "sstables": 10, "total_bytes": 524288000, "sorted": true, "max_bytes": 4194304000}
]}
Request: {"type": "lsm_read", "msg_id": 2, "key": "user:42"}
Response: {"type": "lsm_read_ok", "in_reply_to": 2, "value": "Alice", "found_at": "L1", "levels_checked": 2}Sample Test Cases
Level info shows multi-level structureTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_level_info","msg_id":2}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Read searches levels in orderTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_read","msg_id":2,"key":"user:42"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
L0 contains unsorted SSTables (recent flushes). Reads must check ALL L0 files.
Hint 2▾
L1+ levels are sorted by key range with NO overlapping SSTables within a level
Hint 3▾
Each level is approximately 10x larger than the previous (size ratio = 10)
Hint 4▾
Reads check: MemTable -> L0 (all files) -> L1 -> L2 -> ... until the key is found
Hint 5▾
The level selection policy chooses which level to compact based on size limits
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
multi-level LSML0L1sorted runslevel policysize ratio
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()