TASK
Implementation
The master is a single point of failure. A shadow master mitigates this by continuously replaying the primary's WAL, staying nearly synchronized.
Failover process:
- Normal operation: primary master handles all requests; shadow replays WAL entries from a shared storage
- Failure detection: if the primary misses heartbeats for 10 seconds, the shadow initiates takeover
- WAL catchup: shadow replays any remaining WAL entries not yet processed
- Promote: shadow becomes the new primary, starts accepting requests
- Redirect: chunk servers and clients are notified to use the new master
Request: {"type": "shadow_status", "msg_id": 1}
Response: {"type": "shadow_status_ok", "in_reply_to": 1, "primary": "master1", "shadow": "master2", "wal_lag_entries": 5, "wal_lag_ms": 200}
Request: {"type": "trigger_failover", "msg_id": 2, "failed_master": "master1"}
Response: {"type": "trigger_failover_ok", "in_reply_to": 2, "new_primary": "master2", "wal_entries_replayed": 5, "failover_time_ms": 450}Sample Test Cases
Shadow status shows WAL lagTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2"]}}
{"src":"c1","dest":"n1","body":{"type":"shadow_status","msg_id":2}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Failover promotes shadowTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1","n2"]}}
{"src":"c1","dest":"n1","body":{"type":"trigger_failover","msg_id":2,"failed_master":"n1"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
A shadow master replays the primary WAL periodically to stay nearly up-to-date
Hint 2▾
On primary failure, the shadow takes over with minimal data loss (only recent WAL entries)
Hint 3▾
The shadow cannot serve queries while the primary is alive — it is a hot standby
Hint 4▾
Failover time = time to detect failure + time to replay remaining WAL entries
Hint 5▾
After failover, chunk servers redirect heartbeats to the new master
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
master failovershadow masterWAL replayhot standbyfailover time
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()