TASK
Implementation
Choosing between B-Tree and LSM Tree is one of the most important storage engine decisions. They optimize for opposite ends of the read/write spectrum. Understanding the three amplification factors is key:
Read Amplification (disk reads per logical read):
- B-Tree: O(log_B(N)) where B is branching factor. Typically 2-4 reads.
- LSM: must check MemTable + L0 (all files) + L1 + L2... Bloom filters help, but worst case checks every level.
Write Amplification (bytes written per byte of user data):
- B-Tree: ~2x (read page, modify, write back). With WAL: ~3x.
- LSM: 10-30x. Data is written to MemTable, flushed to L0, then compacted through L1, L2, L3...
Space Amplification (disk used / logical data):
- B-Tree: ~1.0 (in-place updates, no dead versions). Some fragmentation from splits.
- LSM: 1.1-1.5 (old versions exist until compacted away).
When to choose each:
- B-Tree: PostgreSQL, MySQL (OLTP, mixed read/write)
- LSM: RocksDB, Cassandra (write-heavy, time series, logging)
Request: {"type": "btree_vs_lsm", "msg_id": 1, "entries": 100000, "workload": {"read_pct": 50, "write_pct": 50}}
Response: {"type": "btree_vs_lsm_ok", "in_reply_to": 1, "btree": {"write_ops_sec": 50000, "read_ops_sec": 200000, "space_amp": 1.0, "write_amp": 3.0, "read_amp": 1.0}, "lsm": {"write_ops_sec": 200000, "read_ops_sec": 80000, "space_amp": 1.3, "write_amp": 10.0, "read_amp": 3.0}}Sample Test Cases
Compare both engines under mixed workloadTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"btree_vs_lsm","msg_id":2,"entries":1000,"workload":{"read_pct":50,"write_pct":50}}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Read-heavy workload favors B-TreeTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"btree_vs_lsm","msg_id":2,"entries":1000,"workload":{"read_pct":90,"write_pct":10}}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Read amplification: how many disk reads per logical read. B-Tree: O(log N). LSM: O(levels * fanout).
Hint 2▾
Write amplification: how many bytes written to disk per byte of user data. B-Tree: ~2x. LSM: 10-30x.
Hint 3▾
Space amplification: disk space used / logical data size. B-Tree: ~1.0. LSM: 1.1-1.5 (dead versions).
Hint 4▾
B-Tree excels for: read-heavy OLTP, point lookups, range scans with buffer pool
Hint 5▾
LSM excels for: write-heavy workloads, logging, time series, Kafka-style append-only
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
performance comparisonread amplificationwrite amplificationspace amplificationengine selection
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()