TASK
Implementation
WAL recovery is the other half of the durability story. When a node restarts after a crash, it must scan the WAL and replay all valid entries to reconstruct its state.
The recovery algorithm:
- Open the WAL file and read entries sequentially
- For each entry, validate the checksum (recompute CRC32 and compare)
- If the checksum matches, replay the entry (apply the operation to the state machine)
- If the checksum is invalid, this is a torn write — the entry was partially written before the crash. Skip it and all subsequent entries.
- After replay, the state machine reflects the last consistent state
This is how every database recovers from crashes: PostgreSQL replays its WAL, SQLite replays its journal, and Raft replays its log.
Request: {"type": "wal_recover", "msg_id": 1, "wal_entries": [
{"offset": 0, "payload": "set x=1", "checksum": "valid"},
{"offset": 1, "payload": "set y=2", "checksum": "valid"},
{"offset": 2, "payload": "set z=", "checksum": "invalid"}
]}
Response: {"type": "wal_recover_ok", "in_reply_to": 1, "entries_replayed": 2, "entries_skipped": 1, "state": {"x": "1", "y": "2"}}Sample Test Cases
Recovery replays valid entries and skips corruptedTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_recover","msg_id":2,"wal_entries":[{"offset":0,"payload":"set x=1","checksum":"valid"},{"offset":1,"payload":"set y=2","checksum":"valid"},{"offset":2,"payload":"set z=","checksum":"invalid"}]}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "wal_recover_ok", "in_reply_to": 2, "entries_replayed": 2, "entries_skipped": 1, "msg_id": 1}}
Recovery with all valid entries replays everythingTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_recover","msg_id":2,"wal_entries":[{"offset":0,"payload":"set a=10","checksum":"valid"},{"offset":1,"payload":"set b=20","checksum":"valid"},{"offset":2,"payload":"set c=30","checksum":"valid"}]}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "wal_recover_ok", "in_reply_to": 2, "entries_replayed": 3, "entries_skipped": 0, "msg_id": 1}}
Hints
Hint 1▾
On startup, scan the WAL sequentially from the beginning
Hint 2▾
For each entry, recompute the CRC32 checksum and compare to the stored one
Hint 3▾
Valid entries are replayed to reconstruct state; invalid entries are skipped
Hint 4▾
A torn write (crash mid-write) produces a partial entry with an invalid checksum
Hint 5▾
After recovery, the state machine is identical to the last consistent pre-crash state
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
crash recoverylog replaychecksum validationtorn writesstate reconstruction
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()