Compare B-Tree vs LSM Tree with Amplification Metrics - The Logger

                      TASK
                    

Implementation

Choosing between B-Tree and LSM Tree is one of the most important storage engine decisions. They optimize for opposite ends of the read/write spectrum. Understanding the three amplification factors is key:

Read Amplification (disk reads per logical read):

B-Tree: O(log_B(N)) where B is branching factor. Typically 2-4 reads.
LSM: must check MemTable + L0 (all files) + L1 + L2... Bloom filters help, but worst case checks every level.

Write Amplification (bytes written per byte of user data):

B-Tree: ~2x (read page, modify, write back). With WAL: ~3x.
LSM: 10-30x. Data is written to MemTable, flushed to L0, then compacted through L1, L2, L3...

Space Amplification (disk used / logical data):

B-Tree: ~1.0 (in-place updates, no dead versions). Some fragmentation from splits.
LSM: 1.1-1.5 (old versions exist until compacted away).

When to choose each:

B-Tree: PostgreSQL, MySQL (OLTP, mixed read/write)
LSM: RocksDB, Cassandra (write-heavy, time series, logging)

Request:  {"type": "btree_vs_lsm", "msg_id": 1, "entries": 100000, "workload": {"read_pct": 50, "write_pct": 50}}
Response: {"type": "btree_vs_lsm_ok", "in_reply_to": 1, "btree": {"write_ops_sec": 50000, "read_ops_sec": 200000, "space_amp": 1.0, "write_amp": 3.0, "read_amp": 1.0}, "lsm": {"write_ops_sec": 200000, "read_ops_sec": 80000, "space_amp": 1.3, "write_amp": 10.0, "read_amp": 3.0}}

Sample Test Cases

Compare both engines under mixed workloadTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"btree_vs_lsm","msg_id":2,"entries":1000,"workload":{"read_pct":50,"write_pct":50}}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Read-heavy workload favors B-TreeTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"btree_vs_lsm","msg_id":2,"entries":1000,"workload":{"read_pct":90,"write_pct":10}}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Hints

Hint 1▾

Read amplification: how many disk reads per logical read. B-Tree: O(log N). LSM: O(levels * fanout).

Hint 2▾

Write amplification: how many bytes written to disk per byte of user data. B-Tree: ~2x. LSM: 10-30x.

Hint 3▾

Space amplification: disk space used / logical data size. B-Tree: ~1.0. LSM: 1.1-1.5 (dead versions).

Hint 4▾

B-Tree excels for: read-heavy OLTP, point lookups, range scans with buffer pool

Hint 5▾

LSM excels for: write-heavy workloads, logging, time series, Kafka-style append-only

Resources

B-Tree vs LSM Tree Analysis

Detailed academic analysis of write amplification, read amplification, and space amplification tradeoffs

                      OVERVIEW
                    

Theoretical Hub

Concept overview coming soon

Key Concepts

performance comparisonread amplificationwrite amplificationspace amplificationengine selection

main.py

python