Subtracks & Tasks
Distributed File Storage
Design a GFS-Style Distributed File System Architecture
The Google File System (GFS) architecture has shaped many modern distributed storage systems. It separates metadata (managed by a master) from data (stor...
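The following minimal in-memory sketch shows the metadata/data split. The class names `Master`, `ChunkServer`, and `Client` are hypothetical simplifications, and `read` does not handle ranges that cross a chunk boundary. The point is that the client asks the master only for a chunk handle and replica locations, then reads the bytes directly from a chunk server.

```python
from dataclasses import dataclass, field

# GFS used 64 MB chunks; the demo below passes a tiny size instead.
CHUNK_SIZE = 64 * 1024 * 1024


@dataclass
class ChunkServer:
    name: str
    chunks: dict = field(default_factory=dict)  # chunk handle -> raw bytes


@dataclass
class Master:
    # Metadata only: the master never stores file contents.
    files: dict = field(default_factory=dict)      # path -> [chunk handle, ...]
    locations: dict = field(default_factory=dict)  # chunk handle -> [server name, ...]

    def lookup(self, path, chunk_index):
        handle = self.files[path][chunk_index]
        return handle, self.locations[handle]


class Client:
    def __init__(self, master, servers, chunk_size=CHUNK_SIZE):
        self.master = master
        self.servers = servers  # server name -> ChunkServer
        self.chunk_size = chunk_size

    def read(self, path, offset, length):
        # 1. Translate the byte offset into a chunk index (reads that cross
        #    a chunk boundary are not handled in this sketch).
        index = offset // self.chunk_size
        # 2. Ask the master for metadata only: handle and replica locations.
        handle, replicas = self.master.lookup(path, index)
        # 3. Fetch the bytes directly from a chunk server, bypassing the master.
        server = self.servers[replicas[0]]
        start = offset % self.chunk_size
        return server.chunks[handle][start:start + length]


servers = {"cs1": ChunkServer("cs1", {"h1": b"hello world"})}
master = Master(files={"/a": ["h1"]}, locations={"h1": ["cs1"]})
client = Client(master, servers, chunk_size=16)
print(client.read("/a", 6, 5))  # b'world'
```

Because the master only answers small metadata lookups, its load stays low even when clients move large amounts of data.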
Implement the Master Namespace Tree
The master's namespace is a hierarchical tree of directories and files. It maps every file to its chunks and their locations. This is stored entirely ...
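One simple way to represent the tree is nested dictionary nodes, sketched below. The `Node`/`Namespace` classes and their method names are hypothetical; a file node holds an ordered list of chunk handles. (The original GFS paper describes its namespace as a lookup table of full pathnames rather than per-directory structures, so treat this tree as one possible representation, not the only one.)

```python
class Node:
    """A directory (with children) or a file (with an ordered chunk list)."""

    def __init__(self, is_dir):
        self.is_dir = is_dir
        self.children = {} if is_dir else None  # name -> Node
        self.chunks = None if is_dir else []    # chunk handles, in file order


class Namespace:
    def __init__(self):
        self.root = Node(is_dir=True)

    @staticmethod
    def _parts(path):
        if not path.startswith("/"):
            raise ValueError("paths must be absolute")
        return [p for p in path.split("/") if p]

    def _walk(self, parts):
        node = self.root
        for name in parts:
            if not node.is_dir:
                raise NotADirectoryError(name)
            if name not in node.children:
                raise FileNotFoundError(name)
            node = node.children[name]
        return node

    def mkdir(self, path):
        # Creates missing intermediate directories, like `mkdir -p`.
        node = self.root
        for name in self._parts(path):
            child = node.children.get(name)
            if child is None:
                child = node.children[name] = Node(is_dir=True)
            elif not child.is_dir:
                raise NotADirectoryError(name)
            node = child

    def create(self, path):
        parts = self._parts(path)
        if not parts:
            raise ValueError("cannot create the root")
        parent = self._walk(parts[:-1])
        if not parent.is_dir:
            raise NotADirectoryError(path)
        if parts[-1] in parent.children:
            raise FileExistsError(path)
        parent.children[parts[-1]] = Node(is_dir=False)

    def _file(self, path):
        node = self._walk(self._parts(path))
        if node.is_dir:
            raise IsADirectoryError(path)
        return node

    def add_chunk(self, path, handle):
        self._file(path).chunks.append(handle)

    def chunks_of(self, path):
        return list(self._file(path).chunks)

    def listdir(self, path):
        node = self._walk(self._parts(path))
        if not node.is_dir:
            raise NotADirectoryError(path)
        return sorted(node.children)
```

For example, after `mkdir("/logs/2024")` and `create("/logs/2024/app.log")`, each `add_chunk` call extends that file's chunk list in order.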
Implement Chunk Creation and Allocation
When a client creates a file or appends a new chunk, the master must allocate chunk storage on appropriate chunk servers. Chunk creation flow: 1. Cli...
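Below is a sketch of how the master might pick replica locations for a new chunk. The placement policy is an assumption made for illustration: rank servers by free space, place at most one replica per rack on a first pass, and fill any remaining slots from any rack. `ServerInfo`, `Allocator`, and the counter-based handle generator are hypothetical names.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ServerInfo:
    name: str
    rack: str
    free_bytes: int


class Allocator:
    def __init__(self, servers, replication=3):
        self.servers = servers
        self.replication = replication
        self._next_handle = itertools.count(1)  # unique, master-assigned handles
        self.locations = {}                     # handle -> [server name, ...]

    def allocate(self):
        # Most free space first; ties broken by name so results are deterministic.
        ranked = sorted(self.servers, key=lambda s: (-s.free_bytes, s.name))
        chosen, racks = [], set()
        # First pass: spread replicas across distinct racks.
        for s in ranked:
            if len(chosen) == self.replication:
                break
            if s.rack not in racks:
                chosen.append(s)
                racks.add(s.rack)
        # Second pass: fill remaining slots from any rack.
        for s in ranked:
            if len(chosen) == self.replication:
                break
            if s not in chosen:
                chosen.append(s)
        if len(chosen) < self.replication:
            raise RuntimeError("not enough chunk servers for the replication target")
        handle = next(self._next_handle)
        self.locations[handle] = [s.name for s in chosen]
        return handle, self.locations[handle]
```

Spreading replicas across racks protects a chunk against the loss of an entire rack, at the cost of cross-rack write traffic.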
Implement Chunk Replication with Pipeline Writes
When a client writes data, the primary chunk server coordinates replication to all secondaries. GFS uses a **pipeline** design where data flows in a c...
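The sketch below separates the two flows: data is pushed piece by piece along a chain of replicas (each one staging a piece and forwarding it), and only afterwards does the primary assign a serial number and instruct the secondaries to apply the staged data in that order. The class and function names, the 4-byte piece size, and the append-only chunk are assumptions for illustration; a real server streams over TCP and forwards while still receiving, which is why the GFS paper estimates the ideal transfer time for B bytes to R replicas as B/T + RL (T = network throughput, L = per-hop latency).

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.staged = {}           # data_id -> bytearray pushed but not yet applied
        self.chunk = bytearray()   # applied chunk contents (append-only here)
        self.applied = []          # serial numbers applied, in order

    def receive(self, data_id, piece, downstream):
        # Stage the piece locally, then forward it to the next replica in the
        # chain. This simulates the forwarding order, not concurrent streaming.
        self.staged.setdefault(data_id, bytearray()).extend(piece)
        if downstream:
            downstream[0].receive(data_id, piece, downstream[1:])

    def apply(self, serial, data_id):
        # Apply staged data in the order dictated by the primary's serial number.
        self.chunk.extend(self.staged.pop(data_id))
        self.applied.append(serial)


def push_data(chain, data_id, data, piece_size=4):
    """Data flow: the client sends pieces only to the first replica in the chain."""
    for i in range(0, len(data), piece_size):
        chain[0].receive(data_id, data[i:i + piece_size], chain[1:])


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        """Control flow: assign a serial number, apply locally, then tell secondaries."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        for s in self.secondaries:
            s.apply(serial, data_id)
        return serial


s1, s2 = Replica("s1"), Replica("s2")
p = Primary("p", [s1, s2])
# Chain order follows network distance, not primary/secondary roles.
push_data([s1, p, s2], "d1", b"hello pipeline")
p.write("d1")
print(bytes(s2.chunk))  # b'hello pipeline'
```

Decoupling data flow from control flow lets the chain follow network topology while the primary alone decides mutation order.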
Implement Chunk Leases for Primary Assignment
A chunk lease grants one chunk server the exclusive right to define the mutation order for a chunk. This avoids per-operation consensus while maintain...
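A minimal lease manager might look like the sketch below. It reuses a lease while it is unexpired, grants a new one only after expiry, and bumps the chunk version on each new grant so replicas that missed mutations can be detected as stale. The 60-second default matches the initial lease timeout described in the GFS paper; the injected `clock`, the `LeaseManager` name, and the rule "first live replica becomes primary" are assumptions.

```python
from dataclasses import dataclass

LEASE_SECONDS = 60.0  # GFS used a 60-second initial lease timeout


@dataclass
class Lease:
    primary: str
    expires_at: float


class LeaseManager:
    def __init__(self, clock, duration=LEASE_SECONDS):
        self.clock = clock        # callable returning the current time in seconds
        self.duration = duration
        self.leases = {}          # chunk handle -> Lease
        self.versions = {}        # chunk handle -> chunk version number

    def get_primary(self, handle, live_replicas):
        """Return the current primary, granting a new lease if none is valid."""
        now = self.clock()
        lease = self.leases.get(handle)
        if lease is not None and lease.expires_at > now:
            return lease.primary  # never hand out a second lease while one is live
        primary = live_replicas[0]  # assumption: caller passes live replicas only
        # New lease -> new version, so replicas that missed it look stale.
        self.versions[handle] = self.versions.get(handle, 0) + 1
        self.leases[handle] = Lease(primary, now + self.duration)
        return primary

    def extend(self, handle, server):
        """Extend an unexpired lease held by `server` (e.g. piggybacked on heartbeats)."""
        now = self.clock()
        lease = self.leases.get(handle)
        if lease is None or lease.primary != server or lease.expires_at <= now:
            return False
        lease.expires_at = now + self.duration
        return True
```

Because the master refuses to grant a second lease until the first has expired, two servers can never both believe they are primary for the same chunk, even if the old primary is unreachable.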
Fault Tolerance and Rebalancing
Implement Chunk Server Heartbeats
Chunk server heartbeats are the master's only mechanism for tracking which servers are alive and which chunks they hold. Without heartbeats, the maste...
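The sketch below records, per heartbeat, when each server was last heard from and which chunks it reported. It declares a server dead after a silence longer than a timeout and rebuilds chunk locations only from live servers' reports. The 30-second timeout, the injected clock, and the `HeartbeatMonitor` name are hypothetical choices for illustration.

```python
class HeartbeatMonitor:
    def __init__(self, clock, timeout=30.0):
        self.clock = clock      # callable returning the current time in seconds
        self.timeout = timeout  # hypothetical: silence longer than this means dead
        self.last_seen = {}     # server -> time of last heartbeat
        self.reported = {}      # server -> set of chunk handles it holds

    def heartbeat(self, server, chunk_handles):
        # Each heartbeat refreshes liveness and replaces the chunk report.
        self.last_seen[server] = self.clock()
        self.reported[server] = set(chunk_handles)

    def dead_servers(self):
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)

    def chunk_locations(self):
        # Locations are derived from reports, never trusted from stale state.
        dead = set(self.dead_servers())
        locations = {}
        for server, handles in self.reported.items():
            if server in dead:
                continue
            for handle in handles:
                locations.setdefault(handle, set()).add(server)
        return locations
```

Treating chunk servers as the source of truth for what they hold means the master never needs to persist chunk locations; it can rebuild them from the next round of heartbeats.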
Implement Automatic Re-Replication
When a chunk server dies, its chunks become under-replicated. The master must automatically schedule re-replication to restore the target replication ...
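A re-replication planner can be sketched as a priority queue keyed by how many replicas each chunk is missing, which follows the GFS idea of repairing the chunks furthest from their goal first. The destination rule (live servers without a copy, most free space first) and the source rule (lowest-named holder) are assumptions; `plan_re_replication` is a hypothetical name, and the sketch does not update free space as it assigns copies.

```python
import heapq


def plan_re_replication(locations, live_servers, free_bytes, target=3):
    """Return (handle, source, destination) copy tasks, most urgent first.

    locations:  handle -> set of live servers holding a replica
    free_bytes: server -> free space, used to rank destinations
    """
    heap = []
    for handle, holders in locations.items():
        missing = target - len(holders)
        # Chunks with no live holder are lost and cannot be copied.
        if missing > 0 and holders:
            heapq.heappush(heap, (-missing, handle))
    tasks = []
    while heap:
        neg_missing, handle = heapq.heappop(heap)
        missing = -neg_missing
        holders = locations[handle]
        candidates = sorted(
            (s for s in live_servers if s not in holders),
            key=lambda s: (-free_bytes[s], s),
        )
        source = min(holders)
        for dest in candidates[:missing]:
            tasks.append((handle, source, dest))
    return tasks
```

Ordering by missing-replica count means a chunk down to one copy is repaired before a chunk that has merely lost one of three.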
Implement Chunk Server Load Balancing
Over time, chunk distribution becomes uneven: new servers start empty, old servers fill up, and some receive more writes. Load balancing moves chunks ...
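As a simplified illustration, the greedy planner below repeatedly moves one chunk from the fullest server to the emptiest until both are within a tolerance of the mean chunk count. Balancing on chunk count alone (instead of disk space and request load), the tolerance value, and the `plan_rebalance` name are assumptions. It never moves a chunk onto a server that already holds a replica of it.

```python
def plan_rebalance(chunks_per_server, tolerance=1.0, max_moves=100):
    """Greedy plan: move chunks from the fullest to the emptiest server.

    chunks_per_server: server -> set of chunk handles it holds.
    Returns a list of (handle, source, destination) moves.
    """
    state = {s: set(h) for s, h in chunks_per_server.items()}
    mean = sum(len(h) for h in state.values()) / len(state)
    moves = []
    while len(moves) < max_moves:
        src = max(state, key=lambda s: (len(state[s]), s))
        dst = min(state, key=lambda s: (len(state[s]), s))
        # Stop when both extremes are within tolerance of the mean, or when
        # no single move can narrow the gap any further.
        balanced = (len(state[src]) - mean <= tolerance
                    and mean - len(state[dst]) <= tolerance)
        if balanced or len(state[src]) - len(state[dst]) <= 1:
            break
        # Never place two replicas of one chunk on the same server. If every
        # chunk on src is already on dst, this simple sketch just stops.
        movable = sorted(state[src] - state[dst])
        if not movable:
            break
        handle = movable[0]
        state[src].remove(handle)
        state[dst].add(handle)
        moves.append((handle, src, dst))
    return moves
```

Capping moves per round (`max_moves`) mirrors the idea of filling new servers gradually, so that a freshly added server is not flooded with copies at once.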
Implement Master Failover with Shadow Master
The master is a single point of failure. A shadow master mitigates this by continuously replaying the primary's WAL, staying nearly synchronized. Fai...
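The sketch below models write-ahead logging on the primary and log replay on a shadow. The primary appends each mutation to the log before applying it, and the shadow applies log records in the same order from its last applied index, so after catching up its state matches the primary's. The record format, the `MasterState`/`PrimaryMaster`/`ShadowMaster` names, and sharing one in-process list as the "replicated log" are all simplifying assumptions; a real log would be flushed to disk and replicated remotely.

```python
import json


class MasterState:
    def __init__(self):
        self.files = {}  # path -> list of chunk handles

    def apply(self, record):
        op = record["op"]
        if op == "create":
            self.files[record["path"]] = []
        elif op == "add_chunk":
            self.files[record["path"]].append(record["handle"])
        elif op == "delete":
            del self.files[record["path"]]
        else:
            raise ValueError(f"unknown op {op!r}")


class PrimaryMaster:
    def __init__(self):
        self.state = MasterState()
        self.wal = []  # stand-in for a durable, replicated operation log

    def mutate(self, **record):
        self.wal.append(json.dumps(record))  # log first (write-ahead)...
        self.state.apply(record)             # ...then apply to in-memory state


class ShadowMaster:
    def __init__(self, wal):
        self.wal = wal      # here: the same list object the primary appends to
        self.state = MasterState()
        self.applied = 0    # index of the next log record to replay

    def catch_up(self):
        # Replay records in log order; deterministic ops give identical state.
        while self.applied < len(self.wal):
            self.state.apply(json.loads(self.wal[self.applied]))
            self.applied += 1
        return self.applied

    def lag(self):
        return len(self.wal) - self.applied
```

On failover, the shadow calls `catch_up()` to drain any remaining log records before it starts serving mutations; until then it can serve only possibly slightly stale reads.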
Implement Chunk Checksums for Data Integrity
Disks can silently corrupt data without any error signal. Chunk checksums detect this corruption before it is returned to users. Checksum design: 1. ...
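The following sketch follows the GFS approach of splitting a chunk into fixed-size blocks, each with its own 32-bit checksum, and verifying every block that overlaps a read range before returning any bytes. Using CRC-32 via Python's `zlib.crc32` is a choice made for this sketch; `ChecksummedChunk` and `ChecksumError` are hypothetical names, and writes are limited to the constructor.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # GFS checksummed 64 KB blocks with 32-bit checksums


class ChecksumError(Exception):
    pass


class ChecksummedChunk:
    def __init__(self, data=b"", block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.data = bytearray(data)
        # One checksum per block, computed when the data is written.
        self.sums = [self._crc(i) for i in range(self._nblocks())]

    def _nblocks(self):
        return (len(self.data) + self.block_size - 1) // self.block_size

    def _crc(self, i):
        block = self.data[i * self.block_size:(i + 1) * self.block_size]
        return zlib.crc32(block) & 0xFFFFFFFF

    def read(self, offset, length):
        # Verify every block overlapping [offset, offset + length) first, so
        # corrupted bytes are never returned to the caller.
        first = offset // self.block_size
        last = min((offset + length - 1) // self.block_size, self._nblocks() - 1)
        for i in range(first, last + 1):
            if self._crc(i) != self.sums[i]:
                raise ChecksumError(f"block {i} failed checksum verification")
        return bytes(self.data[offset:offset + length])
```

Per-block checksums keep verification cheap: a small read only re-checksums the blocks it touches, not the whole chunk. On a mismatch, a chunk server would report the error so the client can read another replica and the master can re-replicate a good copy.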
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.