TASK
Implementation
Build a Log-Structured Merge Tree (LSM Tree):
- Writes go to in-memory memtable (sorted structure)
- When memtable is full, flush to SSTable on disk
- SSTables are immutable and sorted
- Read checks memtable, then each SSTable (newest first)
- Background compaction merges SSTables
LSM Trees optimize for write throughput by using sequential I/O.
Sample Test Cases
LSM write and readTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"lsm_put","msg_id":2,"key":"x","value":100}}
{"src":"c2","dest":"n1","body":{"type":"lsm_get","msg_id":3,"key":"x"}}
Expected Output
{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}
{"src":"n1","dest":"c1","body":{"type":"lsm_put_ok","in_reply_to":2,"msg_id":1}}
{"src":"n1","dest":"c2","body":{"type":"lsm_get_ok","in_reply_to":3,"msg_id":2,"value":100}}
Hints
Hint 1▾
Write to in-memory memtable first
Hint 2▾
Flush to sorted SSTable when full
Hint 3▾
Compact SSTables in background
OVERVIEW
Theoretical Hub
LSM Tree Architecture
LSM Trees batch writes in memory (memtable), then flush sorted runs to disk (SSTables). This converts random writes to sequential writes, dramatically improving write throughput. Systems like Cassandra and RocksDB use LSM Trees.
Compaction
As SSTables accumulate, reads slow down (must check each file). Compaction merges SSTables, removing deleted keys and combining overlapping ranges. Size-tiered and leveled compaction are common strategies.
Key Concepts
LSM treememtableSSTablecompaction
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/usr/bin/env python3
import sys
import json
import os
import bisect
class SSTable:
def __init__(self, path, data=None):
self.path = path
if data:
self._write(data)
def _write(self, data):
# TODO: Write sorted data to file
pass
def get(self, key):
# TODO: Binary search in sorted file
pass
def scan(self):
# TODO: Iterate all key-value pairs
pass
class LSMTree:
def __init__(self, memtable_size=1000):
self.memtable = {} # Current in-memory table
self.memtable_size = memtable_size
self.sstables = [] # Oldest to newest
self.sstable_counter = 0
def put(self, key, value):
# TODO: Add to memtable, flush if full
pass
def get(self, key):