ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/indexes/tasks/task-13-3-lsm-tree
TASK

Implementation

Build a Log-Structured Merge Tree (LSM Tree):

  1. Writes go to in-memory memtable (sorted structure)
  2. When memtable is full, flush to SSTable on disk
  3. SSTables are immutable and sorted
  4. Read checks memtable, then each SSTable (newest first)
  5. Background compaction merges SSTables

LSM Trees optimize for write throughput by using sequential I/O.

Sample Test Cases

LSM write and readTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_put","msg_id":2,"key":"x","value":100}}
{"src":"c2","dest":"n1","body":{"type":"lsm_get","msg_id":3,"key":"x"}}
Expected Output
{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}
{"src":"n1","dest":"c1","body":{"type":"lsm_put_ok","in_reply_to":2,"msg_id":1}}
{"src":"n1","dest":"c2","body":{"type":"lsm_get_ok","in_reply_to":3,"msg_id":2,"value":100}}

Hints

Hint 1
Write to in-memory memtable first
Hint 2
Flush to sorted SSTable when full
Hint 3
Compact SSTables in background
OVERVIEW

Theoretical Hub

LSM Tree Architecture

LSM Trees batch writes in memory (memtable), then flush sorted runs to disk (SSTables). This converts random writes to sequential writes, dramatically improving write throughput. Systems like Cassandra and RocksDB use LSM Trees.

Compaction

As SSTables accumulate, reads slow down (must check each file). Compaction merges SSTables, removing deleted keys and combining overlapping ranges. Size-tiered and leveled compaction are common strategies.

Key Concepts

LSM treememtableSSTablecompaction
main.py
python
Implement LSM Tree - Indexes | Build Distributed Systems