Implement a Buffer Pool with LRU Eviction - The Logger

                      TASK
                    

Implementation

Disk I/O is 1000x slower than memory access. The buffer pool bridges this gap by caching frequently accessed B-Tree pages in RAM. This is how every production database (PostgreSQL, MySQL, Oracle) achieves acceptable performance.

Buffer pool design:

Fixed-size memory region: holds N pages (e.g. 100 pages * 4KB = 400KB)
Page table: maps (page_id -> frame_number) for O(1) lookup
LRU eviction: when a new page is needed but the pool is full, evict the least recently used page
Dirty page tracking: pages modified in memory are marked "dirty". Before evicting a dirty page, flush it to disk.
Hit rate monitoring: track the ratio of cache hits to total reads. High hit rates (>95%) mean the working set fits in memory.

The eviction policy is critical:

LRU works well for most workloads but is vulnerable to sequential scans (a full table scan can evict the entire cache)
MySQL uses a "young" and "old" sublist to protect against this
PostgreSQL uses a clock sweep algorithm (approximated LRU)

Request:  {"type": "buffer_pool_config", "msg_id": 1, "max_pages": 100, "page_size_bytes": 4096}
Response: {"type": "buffer_pool_config_ok", "in_reply_to": 1, "total_memory_bytes": 409600}

Request:  {"type": "buffer_pool_read", "msg_id": 2, "page_id": 42}
Response: {"type": "buffer_pool_read_ok", "in_reply_to": 2, "cache_hit": false, "disk_read": true}

Request:  {"type": "buffer_pool_stats", "msg_id": 3}
Response: {"type": "buffer_pool_stats_ok", "in_reply_to": 3, "total_reads": 1000, "cache_hits": 850, "hit_rate": 0.85, "dirty_pages": 12, "evictions": 50}

Sample Test Cases

Configure buffer pool sizeTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"buffer_pool_config","msg_id":2,"max_pages":100,"page_size_bytes":4096}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "buffer_pool_config_ok", "in_reply_to": 2, "total_memory_bytes": 409600, "msg_id": 1}}

First read is a cache missTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"buffer_pool_config","msg_id":2,"max_pages":10,"page_size_bytes":4096}}
{"src":"c1","dest":"n1","body":{"type":"buffer_pool_read","msg_id":3,"page_id":1}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "buffer_pool_config_ok", "in_reply_to": 2, "total_memory_bytes": 40960, "msg_id": 1}}
{"src": "n1", "dest": "c1", "body": {"type": "buffer_pool_read_ok", "in_reply_to": 3, "cache_hit": false, "disk_read": true, "msg_id": 2}}

Hints

Hint 1▾

The buffer pool caches frequently accessed B-Tree pages in memory to avoid disk I/O

Hint 2▾

Use LRU (Least Recently Used) to decide which pages to evict when the pool is full

Hint 3▾

Dirty pages (modified but not yet flushed) must be written to disk BEFORE eviction

Hint 4▾

Hit rate = cache hits / total reads. A good buffer pool achieves 95%+ hit rates

Hint 5▾

This is exactly how PostgreSQL and MySQL manage their shared buffer pool

Resources

CMU 15-445 Buffer Pool Management

Carnegie Mellon database course lecture on buffer pool design and eviction policies

                      OVERVIEW
                    

Theoretical Hub

Concept overview coming soon

Key Concepts

buffer poolLRU evictionpage cachedirty pageshit rate

main.py

python