TASK
Implementation
When the in-memory MemTable exceeds its size threshold, it must be persisted to disk as an SSTable (Sorted String Table). The SSTable is a fundamental building block of LSM trees, used in RocksDB, Cassandra, LevelDB, and HBase.
SSTable properties:
- Sorted: keys are stored in lexicographic order, enabling efficient range scans and merge operations
- Immutable: once written, an SSTable is never modified (append-only philosophy)
- Bloom filter: a probabilistic data structure attached to each SSTable that answers "is this key possibly in this file?" with no false negatives
- Sparse index: samples every Nth key to enable fast binary search within the file
The flush process:
- Freeze the current MemTable (stop writes to it, create a new MemTable for incoming writes)
- Sort the frozen MemTable entries by key
- Write them sequentially to a new SSTable file
- Build the Bloom filter and sparse index
- Write the footer (Bloom filter + index) and close the file
Request: {"type": "sstable_flush", "msg_id": 1, "memtable_size_bytes": 4194304}
Response: {"type": "sstable_flush_ok", "in_reply_to": 1, "sstable_file": "L0_001.sst", "entries": 50000, "bloom_filter_bits": 480000, "size_bytes": 4000000}
Request: {"type": "sstable_lookup", "msg_id": 2, "sstable": "L0_001.sst", "key": "user:42"}
Response: {"type": "sstable_lookup_ok", "in_reply_to": 2, "found": true, "value": "Alice", "bloom_checked": true, "disk_reads": 2}Sample Test Cases
Flush creates SSTable with metadataTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"sstable_flush","msg_id":2,"memtable_size_bytes":4194304}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Lookup existing key in SSTableTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"sstable_flush","msg_id":2,"memtable_size_bytes":1024}}
{"src":"c1","dest":"n1","body":{"type":"sstable_lookup","msg_id":3,"sstable":"L0_001.sst","key":"user:42"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
When the MemTable exceeds 4MB, sort its entries and flush to an immutable SSTable file on disk
Hint 2▾
SSTable format: sorted key-value pairs followed by an index block and a Bloom filter footer
Hint 3▾
The Bloom filter enables O(1) negative lookups: "this key is definitely NOT in this SSTable"
Hint 4▾
SSTables are immutable once written — they are never modified, only eventually merged in compaction
Hint 5▾
Each SSTable also has a sparse index: sample every Nth key for fast binary search within the file
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
SSTableflushBloom filterimmutable filesorted strings
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()