TASK
Implementation
Disk I/O is 1000x slower than memory access. The buffer pool bridges this gap by caching frequently accessed B-Tree pages in RAM. This is how every production database (PostgreSQL, MySQL, Oracle) achieves acceptable performance.
Buffer pool design:
- Fixed-size memory region: holds N pages (e.g. 100 pages * 4KB = 400KB)
- Page table: maps (page_id -> frame_number) for O(1) lookup
- LRU eviction: when a new page is needed but the pool is full, evict the least recently used page
- Dirty page tracking: pages modified in memory are marked "dirty". Before evicting a dirty page, flush it to disk.
- Hit rate monitoring: track the ratio of cache hits to total reads. High hit rates (>95%) mean the working set fits in memory.
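The design points above can be combined into a small sketch. This is one possible shape, not a reference implementation: `collections.OrderedDict` doubles as both the page table and the LRU list (insertion order tracks recency), and the `_disk_read`/`_flush` helpers are stand-ins you would replace with real I/O.

```python
from collections import OrderedDict

class BufferPool:
    """Minimal LRU buffer pool: page table + dirty tracking + hit stats."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.pages = OrderedDict()   # page_id -> page data; order = recency
        self.dirty = set()           # page_ids modified in memory
        self.total_reads = 0
        self.cache_hits = 0
        self.evictions = 0

    def read(self, page_id):
        self.total_reads += 1
        if page_id in self.pages:
            self.cache_hits += 1
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id], True  # cache hit
        # Cache miss: evict the LRU page if the pool is full, then read from disk.
        if len(self.pages) >= self.max_pages:
            victim_id, victim = self.pages.popitem(last=False)  # least recently used
            if victim_id in self.dirty:
                self._flush(victim_id, victim)  # dirty pages hit disk before eviction
            self.evictions += 1
        data = self._disk_read(page_id)
        self.pages[page_id] = data
        return data, False  # cache miss

    def write(self, page_id, data):
        self.read(page_id)            # pull into cache (counts as a read here)
        self.pages[page_id] = data
        self.dirty.add(page_id)

    def hit_rate(self):
        return self.cache_hits / self.total_reads if self.total_reads else 0.0

    def _disk_read(self, page_id):
        return b"\x00" * 4096         # stand-in for a real disk read

    def _flush(self, page_id, data):
        self.dirty.discard(page_id)   # stand-in for a real disk write
```

With `max_pages=2`, reading pages 1, 1, 2, 3 produces one hit (the second read of page 1) and one eviction (page 1, the least recently used, when page 3 arrives).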
The eviction policy is critical:
- LRU works well for most workloads but is vulnerable to sequential scans (a full table scan can evict the entire cache)
- MySQL's InnoDB splits its LRU list into "young" and "old" sublists (midpoint insertion) to protect against this
- PostgreSQL uses a clock-sweep algorithm (an approximation of LRU)
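The clock-sweep idea can be sketched in a few lines. This is a generic textbook version of the algorithm, not PostgreSQL's actual implementation (which also tracks a usage count per buffer): each frame gets a reference bit, and a "hand" sweeps over the frames, clearing bits until it finds one that is already clear.

```python
class ClockSweep:
    """Clock-sweep eviction: approximate LRU with one reference bit per frame."""

    def __init__(self, n_frames):
        self.ref_bits = [0] * n_frames  # set to 1 on access, cleared by the sweep
        self.hand = 0                   # current position of the clock hand

    def touch(self, frame):
        self.ref_bits[frame] = 1        # recently used; spared on the next pass

    def pick_victim(self):
        # Sweep until a frame with a clear bit is found, clearing bits on the way.
        while True:
            if self.ref_bits[self.hand] == 0:
                victim = self.hand
                self.hand = (self.hand + 1) % len(self.ref_bits)
                return victim
            self.ref_bits[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.ref_bits)
```

Because a sequential scan only sets each bit once, scanned-once pages lose their bit on the next sweep and become victims quickly, while hot pages keep getting re-marked.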
Request: {"type": "buffer_pool_config", "msg_id": 1, "max_pages": 100, "page_size_bytes": 4096}
Response: {"type": "buffer_pool_config_ok", "in_reply_to": 1, "total_memory_bytes": 409600}
Request: {"type": "buffer_pool_read", "msg_id": 2, "page_id": 42}
Response: {"type": "buffer_pool_read_ok", "in_reply_to": 2, "cache_hit": false, "disk_read": true}
Request: {"type": "buffer_pool_stats", "msg_id": 3}
Response: {"type": "buffer_pool_stats_ok", "in_reply_to": 3, "total_reads": 1000, "cache_hits": 850, "hit_rate": 0.85, "dirty_pages": 12, "evictions": 50}
Sample Test Cases
Configure buffer pool size
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"buffer_pool_config","msg_id":2,"max_pages":100,"page_size_bytes":4096}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "buffer_pool_config_ok", "in_reply_to": 2, "total_memory_bytes": 409600, "msg_id": 1}}
First read is a cache miss
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"buffer_pool_config","msg_id":2,"max_pages":10,"page_size_bytes":4096}}
{"src":"c1","dest":"n1","body":{"type":"buffer_pool_read","msg_id":3,"page_id":1}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "buffer_pool_config_ok", "in_reply_to": 2, "total_memory_bytes": 40960, "msg_id": 1}}
{"src": "n1", "dest": "c1", "body": {"type": "buffer_pool_read_ok", "in_reply_to": 3, "cache_hit": false, "disk_read": true, "msg_id": 2}}
Hints
Hint 1: The buffer pool caches frequently accessed B-Tree pages in memory to avoid disk I/O.
Hint 2: Use LRU (Least Recently Used) to decide which pages to evict when the pool is full.
Hint 3: Dirty pages (modified but not yet flushed) must be written to disk BEFORE eviction.
Hint 4: Hit rate = cache hits / total reads. A good buffer pool achieves 95%+ hit rates.
Hint 5: This is exactly how PostgreSQL and MySQL manage their buffer pools.
OVERVIEW
Theoretical Hub
Key Concepts
buffer pool, LRU eviction, page cache, dirty pages, hit rate
main.py
python
#!/usr/bin/env python3
import sys
import json

def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)

if __name__ == "__main__":
    main()