TASK
Implementation
The MemTable is the entry point for all writes in an LSM tree. It is an in-memory sorted data structure (typically a skip list or red-black tree) that acts as a write buffer.
How it works in the LSM write path:
- Write: insert the key-value pair into the MemTable (O(log N))
- Read: check the MemTable first. If the key is found, return it immediately (freshest data). If not, check SSTables on disk.
- Flush: when the MemTable exceeds a size threshold (e.g. 4MB), freeze it, create a new empty MemTable for new writes, and flush the frozen one to disk as an immutable SSTable.
The MemTable must support:
put(key, value)— insert or updateget(key)— point lookupscan(start, end)— range scan (sorted iteration)
Request: {"type": "memtable_put", "msg_id": 1, "key": "user:1", "value": "Alice"}
Response: {"type": "memtable_put_ok", "in_reply_to": 1, "size_bytes": 64}
Request: {"type": "memtable_get", "msg_id": 2, "key": "user:1"}
Response: {"type": "memtable_get_ok", "in_reply_to": 2, "value": "Alice", "source": "memtable"}
Request: {"type": "memtable_scan", "msg_id": 3, "start": "user:1", "end": "user:5"}
Response: {"type": "memtable_scan_ok", "in_reply_to": 3, "entries": [
{"key": "user:1", "value": "Alice"},
{"key": "user:3", "value": "Bob"}
]}Sample Test Cases
Put and get from memtableTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"memtable_put","msg_id":2,"key":"k1","value":"v1"}}
{"src":"c1","dest":"n1","body":{"type":"memtable_get","msg_id":3,"key":"k1"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "memtable_put_ok", "in_reply_to": 2, "msg_id": 1}}
{"src": "n1", "dest": "c1", "body": {"type": "memtable_get_ok", "in_reply_to": 3, "value": "v1", "source": "memtable", "msg_id": 2}}
Get for missing key returns nullTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"memtable_get","msg_id":2,"key":"nonexistent"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "memtable_get_ok", "in_reply_to": 2, "value": null, "source": "memtable", "msg_id": 1}}
Hints
Hint 1▾
All writes go to the MemTable first — it is the "write buffer" of the LSM tree
Hint 2▾
Use a sorted data structure (skip list or red-black tree) for O(log N) inserts and sorted iteration
Hint 3▾
Reads check the MemTable BEFORE checking any on-disk SSTables (freshest data wins)
Hint 4▾
When the MemTable exceeds a threshold (e.g. 4MB), it is frozen and flushed to disk as an SSTable
Hint 5▾
The MemTable supports range scans because it is sorted — unlike a hash map
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
MemTablesorted treeskip listwrite pathin-memory buffer
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()