TASK
Implementation
When a chunk server dies, its chunks become under-replicated. The master must automatically schedule re-replication to restore the target replication factor.
Re-replication algorithm:
- Detect: master notices missing heartbeats from a server and marks it dead
- Scan: identify all chunks that were on the dead server — they now have fewer replicas
- Prioritize: chunks with RF=1 are critical (one more failure = data loss). Re-replicate them first.
- Schedule: for each under-replicated chunk, pick a healthy server that does NOT already hold the chunk
- Copy: instruct an existing replica to send the chunk data to the new server
- Update: add the new server to the chunk's location list in the master's metadata
Request: {"type": "check_replication", "msg_id": 1}
Response: {"type": "check_replication_ok", "in_reply_to": 1, "under_replicated": [
{"chunk": "ch_001", "current_rf": 2, "target_rf": 3, "missing_on": ["cs3"]},
{"chunk": "ch_005", "current_rf": 1, "target_rf": 3, "priority": "critical"}
]}
Request: {"type": "replicate_chunk", "msg_id": 2, "chunk": "ch_005", "source": "cs1", "target": "cs4"}
Response: {"type": "replicate_chunk_ok", "in_reply_to": 2, "chunk": "ch_005", "new_rf": 2, "bytes_copied": 67108864}Sample Test Cases
Check replication identifies under-replicated chunksTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"check_replication","msg_id":2}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Replicate chunk to new serverTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2","n3","n4"]}}
{"src":"c1","dest":"n1","body":{"type":"replicate_chunk","msg_id":2,"chunk":"ch_005","source":"n2","target":"n4"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
When a server dies, its chunks drop below replication factor 3
Hint 2▾
The master scans for under-replicated chunks and schedules re-replication
Hint 3▾
Pick a healthy server that does NOT already hold the chunk to be the new replica
Hint 4▾
Copy chunk data from an existing replica to the new server
Hint 5▾
Prioritize: chunks with replication factor 1 (one server failure from data loss)
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
re-replicationunder-replicated chunksreplication factorfailure recovery
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()