ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/scheduler/tasks/task-22-2-3-scheduler-fault-tolerance
TASK

Implementation

A scheduler that crashes loses all in-flight job assignments. A fault-tolerant scheduler writes every decision to a WAL before acting, so it can replay the log on restart and recover exactly where it left off without re-assigning jobs twice.

Implement a node that survives crashes and prevents duplicate assignments:

// After crash and restart, recover from WAL
{ "type": "scheduler_restart" }
-> { "type": "recovery_complete",
    "job_assignments": {"job1": "w1"},
    "pending_notifications": ["w1"] }

// Primary crashes -> elect new leader from replicas
{ "type": "leader_crash", "leader_id": "s1" }
-> { "type": "new_leader_elected",
    "old_leader": "s1", "new_leader": "s2", "term": 2 }

// Generation number prevents duplicate assignment
assign job1->w1 (gen=1)  // accepted
assign job1->w2 (gen=1)  // duplicate same generation -> rejected
-> { "type": "assignment_rejected",
    "reason": "Job already assigned in generation 1",
    "existing_assignment": {"job_id": "job1", "worker": "w1"} }

Sample Test Cases

Scheduler crash recoveryTimeout: 5000ms
Input
{"src":"client","dest":"scheduler","body":{"type":"init","msg_id":1,"replicas":["s1","s2","s3"]}}
{"type":"assign_job","job_id":"job1","worker_id":"w1"}
{"type":"scheduler_crash"}
{"type":"scheduler_restart"}
Expected Output
{"type": "recovery_complete", "job_assignments": {"job1": "w1"}, "pending_notifications": ["w1"]}
Leader election after crashTimeout: 5000ms
Input
{"src":"client","dest":"scheduler_cluster","body":{"type":"init","msg_id":1,"replicas":["s1","s2","s3"]}}
{"type":"leader_crash","leader_id":"s1"}
Expected Output
{"type": "new_leader_elected", "old_leader": "s1", "new_leader": "s2", "term": 2}

Hints

Hint 1
WAL: write every assignment to the log before applying it in memory
Hint 2
On restart, replay the WAL sequentially to rebuild the in-memory assignment map
Hint 3
Generation numbers (epochs): reject an assignment message if the same job is already assigned in that generation
Hint 4
After recovery, resend pending assignment notifications to all affected workers
Hint 5
Leader election: highest-ID surviving replica becomes the new leader
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

WALcrash recoveryleader electiongeneration numbersduplicate prevention
main.py
python
Implement Fault-Tolerant Scheduler - The Scheduler | Build Distributed Systems