TASK
Implementation
Disks can silently corrupt data without raising any error. Per-block checksums let a chunkserver detect this corruption before the bad data is returned to users.
Checksum design:
- Each 64MB chunk is divided into 64KB blocks (1024 blocks per chunk)
- Each block has a CRC32 checksum (4 bytes). Total checksum overhead: 4KB per chunk (0.006%)
- On every read, recompute the block checksum and compare to the stored value
- If they match: return the data. If they mismatch: the block is corrupted.
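The layout and per-read check described above can be sketched in Python as follows. This is a minimal sketch, assuming the checksums are held as a list of integers alongside the chunk bytes. `compute_block_checksums` and `verify_block` are hypothetical helper names, and the standard library's `zlib.crc32` stands in for whichever CRC32 routine a real chunkserver would use.

```python
import zlib
from typing import List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024   # 64MB per chunk
BLOCK_SIZE = 64 * 1024          # 64KB per block
CHECKSUM_SIZE = 4               # a CRC32 value is 4 bytes

BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE                   # 1024 blocks
CHECKSUM_BYTES_PER_CHUNK = BLOCKS_PER_CHUNK * CHECKSUM_SIZE   # 4096 bytes (4KB)


def compute_block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one CRC32 per block of the chunk (hypothetical helper)."""
    return [
        zlib.crc32(chunk[offset:offset + block_size])
        for offset in range(0, len(chunk), block_size)
    ]


def verify_block(chunk: bytes, checksums: List[int], block: int,
                 block_size: int = BLOCK_SIZE) -> Tuple[bool, int, int]:
    """Recompute one block's CRC32 and compare it to the stored value.

    Returns (valid, stored, computed).
    """
    start = block * block_size
    computed = zlib.crc32(chunk[start:start + block_size])
    stored = checksums[block]
    return computed == stored, stored, computed
```

The `block_size` parameter is only there so the helpers can be exercised on small inputs; a single flipped byte inside a block is enough to make `verify_block` report a mismatch for that block while the other blocks still verify.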
Corruption handling:
- Report the corrupted chunk to the master
- Read from another replica
- Master schedules re-replication from a healthy replica
- The corrupted replica is discarded
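The handling steps above could be wired together roughly as follows. This is a sketch under stated assumptions: replicas are modelled as in-memory objects, the names `Replica`, `Master`, and `read_block_with_fallback` are hypothetical, and the master only records what it would schedule rather than copying any data.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    """Hypothetical in-memory replica: block data plus stored CRC32 values."""
    server: str
    blocks: Dict[int, bytes]
    checksums: Dict[int, int]

    def read_verified(self, block: int) -> Tuple[bool, bytes]:
        # Recompute the CRC32 on every read and compare with the stored value.
        data = self.blocks[block]
        return zlib.crc32(data) == self.checksums[block], data


@dataclass
class Master:
    """Hypothetical master tracking which servers hold each chunk."""
    locations: Dict[str, List[str]]
    rereplication_queue: List[Tuple[str, str]] = field(default_factory=list)

    def report_corruption(self, chunk_handle: str, bad_server: str) -> None:
        servers = self.locations[chunk_handle]
        # Discard the corrupted replica from the chunk's location list.
        if bad_server in servers:
            servers.remove(bad_server)
        # Schedule a copy from a remaining (assumed healthy) replica.
        if servers:
            self.rereplication_queue.append((chunk_handle, servers[0]))


def read_block_with_fallback(chunk_handle: str, block: int,
                             replicas: List[Replica], master: Master) -> Tuple[str, bytes]:
    """Try replicas in order; report each corrupted one and move to the next."""
    for replica in replicas:
        valid, data = replica.read_verified(block)
        if valid:
            return replica.server, data
        master.report_corruption(chunk_handle, replica.server)
    raise IOError(f"no healthy replica for {chunk_handle} block {block}")
```

Reporting before falling back mirrors the order of the list above; picking the first remaining server as the re-replication source is an arbitrary choice made for this sketch.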
Request: {"type": "chunk_read_verified", "msg_id": 1, "chunk_handle": "ch_001", "block": 42}
Response: {"type": "chunk_read_verified_ok", "in_reply_to": 1, "data": "...", "checksum_valid": true, "stored_checksum": "abc123", "computed_checksum": "abc123"}
Request: {"type": "chunk_read_verified", "msg_id": 2, "chunk_handle": "ch_002", "block": 10}
Response: {"type": "chunk_read_verified_ok", "in_reply_to": 2, "checksum_valid": false, "corruption_reported": true, "fallback_server": "cs3"}
Sample Test Cases
Verified read with valid checksum
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_read_verified","msg_id":2,"chunk_handle":"ch_001","block":0}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Corrupted block triggers fallback
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2","n3"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_read_verified","msg_id":2,"chunk_handle":"ch_002","block":10}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
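One way to produce replies in the envelope format used by the test cases is sketched below. Several details are assumptions not stated in the task: blocks live in an in-memory dictionary filled through a hypothetical `store_block` helper, checksums are rendered as 8-digit hex strings, the fallback server is simply the first other node from `node_ids`, and unknown message types are ignored. The class name `ChunkServer` is also hypothetical.

```python
import json
import zlib


class ChunkServer:
    """Hypothetical single-node handler for init and chunk_read_verified."""

    def __init__(self):
        self.node_id = None
        self.peers = []
        self.next_msg_id = 0
        # (chunk_handle, block) -> (data, stored CRC32); assumed storage layout
        self.blocks = {}

    def store_block(self, chunk_handle, block, data):
        # Record the data together with the checksum computed at write time.
        self.blocks[(chunk_handle, block)] = (data, zlib.crc32(data))

    def reply(self, request, body):
        # Add in_reply_to and a fresh msg_id, then wrap in the src/dest envelope.
        body = dict(body, in_reply_to=request["body"]["msg_id"],
                    msg_id=self.next_msg_id)
        self.next_msg_id += 1
        return {"src": self.node_id, "dest": request["src"], "body": body}

    def handle(self, msg):
        body = msg["body"]
        if body["type"] == "init":
            self.node_id = body["node_id"]
            self.peers = [n for n in body["node_ids"] if n != self.node_id]
            return self.reply(msg, {"type": "init_ok"})
        if body["type"] == "chunk_read_verified":
            data, stored = self.blocks[(body["chunk_handle"], body["block"])]
            computed = zlib.crc32(data)
            if computed == stored:
                return self.reply(msg, {
                    "type": "chunk_read_verified_ok",
                    "data": data.decode("latin-1"),
                    "checksum_valid": True,
                    "stored_checksum": format(stored, "08x"),
                    "computed_checksum": format(computed, "08x"),
                })
            # Mismatch: withhold the data and point the client at another node.
            return self.reply(msg, {
                "type": "chunk_read_verified_ok",
                "checksum_valid": False,
                "corruption_reported": True,
                "fallback_server": self.peers[0] if self.peers else None,
            })
        return None  # other message types are out of scope for this sketch
```

Serializing a reply with `json.dumps` and its default separators yields the same spacing as the expected `init_ok` lines above.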
Hints
Hint 1: Each chunk stores a CRC32 checksum per 64KB block
Hint 2: On every read, recompute the checksum and compare it to the stored value to detect silent corruption
Hint 3: If a checksum mismatch is found, read from another replica instead
Hint 4: Report corrupted chunks to the master so it can schedule re-replication
Hint 5: Disk corruption is rare but real: Google reports ~0.01% of reads hit corruption
OVERVIEW
Key Concepts
checksum, data integrity, corruption detection, per-block checksum, silent corruption
main.py
python
#!/usr/bin/env python3
import sys
import json


def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)


if __name__ == "__main__":
    main()