TASK
Implementation
The choice between LSM tree and B-Tree is one of the most important storage engine decisions. They represent fundamentally different tradeoffs:
LSM Tree (RocksDB, Cassandra, LevelDB):
- Writes are sequential (append to MemTable, flush to SSTable) -> very high write throughput
- Reads may need to check multiple levels -> higher read latency
- Space amplification > 1.0 (old versions exist until compaction)
- Write amplification is high (data rewritten during compaction)
B-Tree (PostgreSQL, MySQL InnoDB, SQLite):
- Writes are random I/O (in-place updates) -> lower write throughput
- Reads traverse a balanced tree (O(log N) disk reads) -> lower read latency
- Space amplification ~1.0 (in-place updates, no dead versions)
- Write amplification is lower
Benchmark both engines and measure: write ops/sec, read p50 latency, and space amplification.
Request: {"type": "lsm_benchmark", "msg_id": 1, "entries": 100000}
Response: {"type": "lsm_benchmark_ok", "in_reply_to": 1, "lsm": {"write_ops_sec": 200000, "read_p50_us": 50, "read_without_bloom_p50_us": 500, "space_amplification": 1.3}, "btree": {"write_ops_sec": 50000, "read_p50_us": 20, "space_amplification": 1.0}}Sample Test Cases
Full benchmark comparisonTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_benchmark","msg_id":2,"entries":1000}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Bloom filter impact on readsTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_benchmark","msg_id":2,"entries":5000}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Measure write throughput: LSM trees excel at sequential inserts (append-only writes)
Hint 2▾
Measure read latency with and without Bloom filter — the filter should dramatically reduce disk reads
Hint 3▾
Space amplification = total bytes on disk / total bytes of unique data (LSM > 1.0 due to duplicate versions)
Hint 4▾
Compare to a naive B-Tree: B-Trees have lower read latency but higher write latency (in-place updates)
Hint 5▾
LSM wins for write-heavy workloads; B-Tree wins for read-heavy workloads
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
write throughputread latencyspace amplificationBloom filter impactengine comparison
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()