Implement Chunk Replication with Pipeline Writes - The Filesystem

Implementation

When a client writes data, the primary chunk server coordinates replication to all secondaries. GFS uses a pipeline design where data flows in a chain to maximize network throughput.

Write replication flow:

1. The client sends the data to the closest chunk server (not necessarily the primary).
2. That server forwards the data to the next closest server in the chain.
3. Data flows as a pipeline: server A -> server B -> server C.
4. Once all servers have the data cached, the client sends a write request to the primary.
5. The primary assigns a serial number to the write (for ordering).
6. The primary applies the write locally, then forwards the write request with its serial number to the secondaries.
7. The secondaries apply the write in the same serial order.
8. All servers acknowledge, and the primary replies to the client.
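
The flow above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the expected solution: `ChunkServer`, `push`, `apply`, and `replicate_write` are hypothetical names, servers are plain Python objects rather than networked nodes, and failures are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkServer:
    """Hypothetical in-memory stand-in for one chunk server."""
    name: str
    cache: dict = field(default_factory=dict)    # data_id -> bytes pushed but not yet applied
    chunk: bytearray = field(default_factory=bytearray)
    applied: list = field(default_factory=list)  # serial numbers, in the order applied

    def push(self, data_id, data, rest_of_chain):
        """Data flow: cache the bytes, then forward them to the next server in the chain."""
        self.cache[data_id] = data
        if rest_of_chain:
            rest_of_chain[0].push(data_id, data, rest_of_chain[1:])

    def apply(self, serial, data_id, offset):
        """Control flow: write the cached bytes at `offset` and record the serial number."""
        data = self.cache.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            # Zero-pad the chunk so the slice assignment below fits.
            self.chunk.extend(b"\x00" * (end - len(self.chunk)))
        self.chunk[offset:end] = data
        self.applied.append(serial)
        return True  # acknowledgement


def replicate_write(chain, primary, secondaries, serial, data, offset):
    """Push data along `chain`, then let the primary order and commit the write."""
    data_id = f"data-{serial}"
    # Data flow: the client hands the bytes to the first (closest) server only.
    chain[0].push(data_id, data, chain[1:])
    assert all(data_id in server.cache for server in chain)
    # Control flow: the primary applies first, then every secondary applies
    # the same write under the same serial number.
    acks = int(primary.apply(serial, data_id, offset))
    acks += sum(s.apply(serial, data_id, offset) for s in secondaries)
    return acks


cs1, cs2, cs3 = ChunkServer("cs1"), ChunkServer("cs2"), ChunkServer("cs3")
# Assumption: the client is closest to cs2, so the data chain starts there
# even though cs1 is the primary.
acks = replicate_write([cs2, cs1, cs3], primary=cs1, secondaries=[cs2, cs3],
                       serial=1, data=b"hello world", offset=0)
print(acks, bytes(cs3.chunk))  # prints: 3 b'hello world'
```

Note that the order of the data chain (`cs2 -> cs1 -> cs3`) is independent of who is primary; only the commit order is decided by the primary.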

This separates data flow (pipeline for throughput) from control flow (primary for ordering).

Request:  {"type": "chunk_write", "msg_id": 1, "chunk_handle": "ch_001", "offset": 0, "data": "hello world", "primary": "cs1", "secondaries": ["cs2", "cs3"]}
Response: {"type": "chunk_write_ok", "in_reply_to": 1, "bytes_written": 11, "replicas_acked": 3, "serial_number": 1}
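
As a rough sketch, the response body can be derived from the request plus the outcome of replication. The helper name `make_chunk_write_ok` is hypothetical, and counting `bytes_written` as the UTF-8 length of `data` is an assumption; the sample only shows ASCII data, where bytes and characters coincide.

```python
def make_chunk_write_ok(request, serial_number, replicas_acked):
    """Build a chunk_write_ok body for a chunk_write request body (hypothetical helper)."""
    return {
        "type": "chunk_write_ok",
        "in_reply_to": request["msg_id"],
        # Assumption: bytes_written is the UTF-8 encoded length of the data.
        "bytes_written": len(request["data"].encode("utf-8")),
        "replicas_acked": replicas_acked,
        "serial_number": serial_number,
    }


request = {
    "type": "chunk_write", "msg_id": 1, "chunk_handle": "ch_001", "offset": 0,
    "data": "hello world", "primary": "cs1", "secondaries": ["cs2", "cs3"],
}
# The primary plus each secondary must acknowledge, so three replicas here.
response = make_chunk_write_ok(request, serial_number=1,
                               replicas_acked=1 + len(request["secondaries"]))
```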

Sample Test Cases

Write replicates to all servers (Timeout: 5000ms)

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2","n3"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_write","msg_id":2,"chunk_handle":"ch_001","offset":0,"data":"hello","primary":"n1","secondaries":["n2","n3"]}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
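
The messages in this test are line-delimited JSON envelopes with `src`, `dest`, and `body` fields. A minimal sketch of a node that answers `init` in that shape might look like the following. The `Node` class and its methods are hypothetical, starting `msg_id` at 0 is inferred from the sample `init_ok`, and handling of `chunk_write` is deliberately left out.

```python
import json
import sys


class Node:
    """Hypothetical node skeleton; only the init handshake is handled here."""

    def __init__(self):
        self.node_id = None
        self.node_ids = []
        self.next_msg_id = 0  # the sample init_ok carries msg_id 0

    def reply(self, request, body):
        """Wrap `body` in an envelope addressed back to the sender of `request`."""
        body = dict(body, in_reply_to=request["body"]["msg_id"], msg_id=self.next_msg_id)
        self.next_msg_id += 1
        return {"src": self.node_id, "dest": request["src"], "body": body}

    def handle(self, line):
        """Parse one input line and return a reply envelope, or None if unhandled."""
        request = json.loads(line)
        body = request["body"]
        if body["type"] == "init":
            self.node_id = body["node_id"]
            self.node_ids = body["node_ids"]
            return self.reply(request, {"type": "init_ok"})
        return None  # other message types are outside this sketch


def run(node, stream_in=sys.stdin, stream_out=sys.stdout):
    """Read one JSON message per line and write one JSON reply per line."""
    for line in stream_in:
        if line.strip():
            reply = node.handle(line)
            if reply is not None:
                stream_out.write(json.dumps(reply) + "\n")
                stream_out.flush()
```

Calling `run(Node())` from the entry point would connect the node to standard input and output.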

Sequential writes get increasing serial numbers (Timeout: 5000ms)

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2","n3"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_write","msg_id":2,"chunk_handle":"ch_001","offset":0,"data":"a","primary":"n1","secondaries":["n2","n3"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_write","msg_id":3,"chunk_handle":"ch_001","offset":1,"data":"b","primary":"n1","secondaries":["n2","n3"]}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
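
One way the primary could hand out increasing serial numbers is a counter per chunk handle, sketched below. The `SerialAssigner` name is hypothetical. Numbering per chunk (rather than one global counter) and starting at 1 (as in the sample response) are assumptions the task text does not spell out.

```python
from collections import defaultdict
from itertools import count


class SerialAssigner:
    """Hands out 1, 2, 3, ... independently for each chunk handle (hypothetical helper)."""

    def __init__(self):
        # Each chunk handle lazily gets its own counter starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def next_serial(self, chunk_handle):
        return next(self._counters[chunk_handle])


assigner = SerialAssigner()
first = assigner.next_serial("ch_001")   # write of "a" at offset 0
second = assigner.next_serial("ch_001")  # write of "b" at offset 1
```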

Hints

Hint 1

The primary receives the write and forwards it to the secondaries in a pipeline

Hint 2

Pipeline: client -> primary -> secondary1 -> secondary2 (data flows in a chain)

Hint 3

All three must acknowledge before the write is considered successful

Hint 4

If any replica fails, the write fails and the client retries

Hint 5

GFS separates data flow (pipeline) from control flow (primary commits order)
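
The acknowledgement and retry rules from Hints 3 and 4 could be sketched on the client side as follows. This is a hedged illustration: `write_with_retry` and the `send_write` callable are hypothetical, the retry limit is arbitrary, and treating a missing or under-acknowledged response as failure is an assumption about how failures surface.

```python
def write_with_retry(send_write, request, max_attempts=3):
    """Resend `request` until every replica acknowledges or attempts run out.

    `send_write` is a hypothetical callable that takes the request body and
    returns a response body (dict), or None if the write failed.
    """
    expected_acks = 1 + len(request["secondaries"])  # primary plus every secondary
    for attempt in range(1, max_attempts + 1):
        response = send_write(request)
        if (response is not None
                and response.get("type") == "chunk_write_ok"
                and response.get("replicas_acked") == expected_acks):
            return response, attempt
    raise RuntimeError(f"write failed after {max_attempts} attempts")
```

A caller would pass whatever function actually delivers the `chunk_write` to the primary; the sketch only decides whether the returned response counts as success.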

Resources

GFS Data Flow Pipeline

GFS paper section on pipeline writes and data/control flow separation

Key Concepts

chunk replication, pipeline writes, primary-secondary, write acknowledgement, data flow
