Subtracks & Tasks
The Commit Log (WAL)
Implement a Write-Ahead Log
The Write-Ahead Log (WAL) is the most fundamental durability primitive in distributed systems. Every database (PostgreSQL, MySQL, SQLite), every conse...
Implement WAL Recovery on Startup
WAL recovery is the other half of the durability story. When a node restarts after a crash, it must scan the WAL and replay all valid entries to recon...
Add WAL Segment Files with Offset Index
A single WAL file has a problem: it grows without bound. Once it reaches gigabytes, seeks become slow and cleanup is impossible without rewriting the ...
Implement WAL Compaction with Atomic Snapshot
Without compaction, the WAL grows forever. Every database solves this with the same pattern: periodically take a **snapshot** of the state machine, th...
Benchmark WAL fsync Strategies
The `fsync` system call forces the OS to flush data from kernel buffers to the physical disk. Without it, data that appears "written" may only exist i...
LSM Tree (Log-Structured Merge Tree)
Implement an In-Memory MemTable
The MemTable is the entry point for all writes in an LSM tree. It is an in-memory sorted data structure (typically a skip list or red-black tree) that...
Implement SSTable Flush with Bloom Filter
When the in-memory MemTable exceeds its size threshold, it must be persisted to disk as an SSTable (Sorted String Table). The SSTable is a fundamental...
Implement a Multi-Level LSM Tree
A single flat list of SSTables becomes unmanageable as data grows. The multi-level LSM tree organizes SSTables into levels (L0, L1, L2, ...) with care...
Implement LSM Compaction with Merge Sort
Compaction is the background process that keeps the LSM tree healthy. Without it, reads degrade because more and more SSTables accumulate, each requir...
Benchmark LSM Tree vs B-Tree Performance
The choice between LSM tree and B-Tree is one of the most important storage engine decisions. They represent fundamentally different tradeoffs: **LSM...
B-Tree on Disk
Implement a B-Tree Node and Search
The B-Tree is the most widely used on-disk data structure. PostgreSQL, MySQL InnoDB, SQLite, and virtually every relational database uses B-Trees for ...
Implement B-Tree Insert with Node Splits
B-Tree insertion maintains the tree's balance invariant: all leaf nodes are always at the same depth. The key mechanism is the **node split** — when a...
Implement B-Tree Delete with Merge and Borrow
B-Tree deletion is the most complex operation because it must maintain two invariants: 1. **Minimum occupancy**: every non-root node must have at leas...
Implement a Buffer Pool with LRU Eviction
Disk I/O is 1000x slower than memory access. The buffer pool bridges this gap by caching frequently accessed B-Tree pages in RAM. This is how every pr...
Compare B-Tree vs LSM Tree with Amplification Metrics
Choosing between B-Tree and LSM Tree is one of the most important storage engine decisions. They optimize for opposite ends of the read/write spectrum...
Distributed Log (Kafka Architecture)
Model a Kafka Partition as a Write-Ahead Log
At its core, Apache Kafka is a distributed log. Each topic is split into partitions, and each partition is an independent, ordered, append-only log st...
Implement Consumer Group Offset Tracking
Consumer offset tracking enables consumers to read at their own pace and resume after restarts. Each consumer group independently tracks its position ...
Implement Partition Leader Election via Raft
In a distributed log like Kafka, each partition must have exactly one leader broker that handles all reads and writes. Followers replicate data from t...
Implement In-Sync Replicas (ISR) Management
The In-Sync Replica (ISR) set is Kafka's mechanism for balancing durability and availability. It tracks which replicas are "caught up" with the leader...
Implement Consumer Group Rebalancing
Consumer group rebalancing ensures that partitions are evenly distributed among consumers. When the group membership changes (a consumer joins, leaves...
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.