ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/filesystem/tasks/task-12-2-5-checksums
TASK

Implementation

Disks can silently corrupt data without raising any error. Per-block chunk checksums detect this corruption on read, before the damaged data is returned to users.

Checksum design:

  1. Each 64MB chunk is divided into 64KB blocks (1024 blocks per chunk)
  2. Each block has a CRC32 checksum (4 bytes). Total checksum overhead: 4KB per chunk (0.006%)
  3. On every read, recompute the block checksum and compare to the stored value
  4. If they match: return the data. If they mismatch: the block is corrupted.
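
The design above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference solution: it assumes the chunk is held in memory as `bytes`, uses `zlib.crc32` from the standard library as the CRC32 implementation, and the function names `block_checksums` and `read_block_verified` are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024          # 64KB blocks, as in the design above
CHUNK_SIZE = 64 * 1024 * 1024   # 64MB chunks
CHECKSUM_BYTES = 4              # a CRC32 value fits in 4 bytes


def block_checksums(chunk: bytes) -> list[int]:
    """Return one CRC32 per 64KB block (the last block may be shorter)."""
    return [
        zlib.crc32(chunk[offset:offset + BLOCK_SIZE])
        for offset in range(0, len(chunk), BLOCK_SIZE)
    ]


def read_block_verified(chunk: bytes, stored: list[int], block: int) -> tuple[bytes, bool]:
    """Recompute the CRC32 of one block and compare it to the stored value."""
    start = block * BLOCK_SIZE
    data = chunk[start:start + BLOCK_SIZE]
    return data, zlib.crc32(data) == stored[block]


# Overhead arithmetic from the list above.
blocks_per_chunk = CHUNK_SIZE // BLOCK_SIZE            # 1024
overhead_bytes = blocks_per_chunk * CHECKSUM_BYTES     # 4096 bytes = 4KB
overhead_percent = 100 * overhead_bytes / CHUNK_SIZE   # about 0.006
```

Verifying one 64KB block at a time means a read only pays for checksumming the blocks it touches, and a mismatch pinpoints which block is damaged.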

Corruption handling:

  1. Report the corrupted chunk to the master
  2. Read from another replica
  3. Master schedules re-replication from a healthy replica
  4. The corrupted replica is discarded
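
The four steps above might look like the following sketch. Everything here is an assumption made for illustration: the `Master` class is an in-memory stand-in for the real master, `stores` maps each chunkserver name to the block bytes and the CRC32 it has stored for them, and re-replication is only recorded in a queue, not performed.

```python
import zlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Master:
    """Hypothetical in-memory stand-in for the master's bookkeeping."""
    replicas: dict[str, set[str]]  # chunk_handle -> servers holding a replica
    rereplication_queue: list[tuple[str, str]] = field(default_factory=list)

    def report_corruption(self, chunk_handle: str, server: str) -> None:
        # The corrupted replica is discarded from the replica set.
        self.replicas[chunk_handle].discard(server)
        # Re-replication is scheduled from one of the remaining healthy replicas.
        healthy = sorted(self.replicas[chunk_handle])
        if healthy:
            self.rereplication_queue.append((chunk_handle, healthy[0]))


def read_block(
    master: Master,
    chunk_handle: str,
    stores: dict[str, tuple[bytes, int]],
    order: list[str],
) -> Optional[tuple[str, bytes]]:
    """Try replicas in order; return (server, data) from the first valid one."""
    for server in order:
        data, stored_crc = stores[server]
        if zlib.crc32(data) == stored_crc:
            return server, data
        # Mismatch: report the corrupted replica, then try the next one.
        master.report_corruption(chunk_handle, server)
    return None  # every replica failed verification
```

Reporting before falling back means the master learns about the bad replica even if the client later gives up on the read.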

Request:  {"type": "chunk_read_verified", "msg_id": 1, "chunk_handle": "ch_001", "block": 42}
Response: {"type": "chunk_read_verified_ok", "in_reply_to": 1, "data": "...", "checksum_valid": true, "stored_checksum": "abc123", "computed_checksum": "abc123"}

Request:  {"type": "chunk_read_verified", "msg_id": 2, "chunk_handle": "ch_002", "block": 10}
Response: {"type": "chunk_read_verified_ok", "in_reply_to": 2, "checksum_valid": false, "corruption_reported": true, "fallback_server": "cs3"}
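
A handler that produces the two response shapes above could be sketched as follows. The task text does not specify how `data` or the checksum strings are encoded, so this sketch assumes base64 for `data` and 8-digit lowercase hex for checksums. The `store` layout, the `report` callback and the `fallback_server` argument are also hypothetical.

```python
import base64
import zlib
from typing import Callable


def hex_crc(data: bytes) -> str:
    """CRC32 as 8 lowercase hex digits (an assumed wire encoding)."""
    return format(zlib.crc32(data), "08x")


def handle_chunk_read_verified(
    body: dict,
    store: dict[tuple[str, int], tuple[bytes, str]],
    fallback_server: str,
    report: Callable[[str, int], None],
) -> dict:
    """store maps (chunk_handle, block) -> (block bytes, stored checksum)."""
    handle, block = body["chunk_handle"], body["block"]
    data, stored = store[(handle, block)]
    computed = hex_crc(data)
    reply = {"type": "chunk_read_verified_ok", "in_reply_to": body["msg_id"]}
    if computed == stored:
        reply.update(
            data=base64.b64encode(data).decode("ascii"),
            checksum_valid=True,
            stored_checksum=stored,
            computed_checksum=computed,
        )
    else:
        # Never return the damaged bytes; report and point at another replica.
        report(handle, block)
        reply.update(
            checksum_valid=False,
            corruption_reported=True,
            fallback_server=fallback_server,
        )
    return reply
```

Note that the mismatch reply omits `data` entirely, matching the second example above.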

Sample Test Cases

Verified read with valid checksum
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_read_verified","msg_id":2,"chunk_handle":"ch_001","block":0}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Corrupted block triggers fallback
Timeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2","n3"]}}
{"src":"c1","dest":"n1","body":{"type":"chunk_read_verified","msg_id":2,"chunk_handle":"ch_002","block":10}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
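
Both test cases wrap each message in a `src`/`dest`/`body` envelope and expect an `init_ok` reply whose `msg_id` starts at 0. A minimal sketch of that envelope handling is shown below. The `Node` class and its method names are hypothetical, and only the `init` message is handled here, because the expected output above shows only that reply.

```python
import json
from typing import Optional


class Node:
    """Hypothetical node that answers the init message from the test cases."""

    def __init__(self) -> None:
        self.node_id: Optional[str] = None
        self.node_ids: list[str] = []
        self.next_msg_id = 0  # the expected output shows the first reply with msg_id 0

    def reply(self, request: dict, body: dict) -> dict:
        """Wrap a reply body in the src/dest/body envelope."""
        out = {
            "src": self.node_id,
            "dest": request["src"],
            "body": {
                **body,
                "in_reply_to": request["body"]["msg_id"],
                "msg_id": self.next_msg_id,
            },
        }
        self.next_msg_id += 1
        return out

    def handle(self, line: str) -> Optional[dict]:
        """Parse one JSON message and return the reply, if any."""
        msg = json.loads(line)
        body = msg["body"]
        if body["type"] == "init":
            self.node_id = body["node_id"]
            self.node_ids = body["node_ids"]
            return self.reply(msg, {"type": "init_ok"})
        return None  # other message types are left out of this sketch


node = Node()
first = node.handle(
    '{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}'
)
```

Here `first` has the same fields as the expected `init_ok` line in the first test case.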

Hints

Hint 1
Each chunk stores a CRC32 checksum per 64KB block
Hint 2
On every read, recompute the checksum and compare — detect silent corruption
Hint 3
If a checksum mismatch is found, read from another replica instead
Hint 4
Report corrupted chunks to the master so it can schedule re-replication
Hint 5
Disk corruption is rare but real: Google reportedly sees corruption on roughly 0.01% of reads
OVERVIEW

Key Concepts

checksum, data integrity, corruption detection, per-block checksum, silent corruption
Implement Chunk Checksums for Data Integrity - The Filesystem | Build Distributed Systems