ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/orchestrator/tasks/task-26-1-5-job-deadlines
TASK

Implementation

Without deadlines, a hung job can occupy resources indefinitely and starve everything else. Timeout enforcement cancels overdue jobs and reclaims their resources.

Implement a node that enforces three categories of time limit:

// 1. Execution timeout: job runs longer than allowed
{ "type": "execute_job", "msg_id": 1,
  "job_id": "job-123", "timeout_ms": 1000, "duration_ms": 2000 }
-> { "type": "job_timeout", "in_reply_to": 1,
    "job_id": "job-123", "reason": "Execution timeout exceeded" }

// 2. SLA check: warn before a deadline is missed
{ "type": "check_deadlines", "msg_id": 2 }
-> { "type": "deadlines_checked", "in_reply_to": 2,
    "jobs_near_deadline": [{"job_id": "job-456", "minutes_until": 3}],
    "missed": [] }

// 3. Queue timeout: cancel jobs stuck waiting too long
{ "type": "check_queue_timeouts", "msg_id": 3,
  "max_queue_time_ms": 300000 }
-> { "type": "queue_timeouts_enforced", "in_reply_to": 3,
    "cancelled": 1, "reason": "Queue timeout exceeded" }

Cancelling a job for any reason must immediately free its allocated resources.
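
As a minimal sketch of the first category, the handler below compares the two fields from the request and builds the reply shown above. It assumes the job's run time is simulated by duration_ms rather than measured. The function name handle_execute_job and the success reply type "job_completed" are hypothetical, since the task text only shows the timeout case.

```python
def handle_execute_job(body: dict) -> dict:
    """Build the reply body for an execute_job request.

    Assumption: the run is simulated, so duration_ms is compared
    directly against timeout_ms instead of timing real work.
    """
    base = {"in_reply_to": body["msg_id"], "job_id": body["job_id"]}
    if body["duration_ms"] > body["timeout_ms"]:
        # Overdue: report the timeout with the reason string from the task text.
        return {"type": "job_timeout", **base,
                "reason": "Execution timeout exceeded"}
    # Hypothetical success reply; the task statement does not specify one.
    return {"type": "job_completed", **base}


reply = handle_execute_job({
    "type": "execute_job", "msg_id": 1,
    "job_id": "job-123", "timeout_ms": 1000, "duration_ms": 2000,
})
print(reply)
```

A job whose duration_ms equals timeout_ms is treated as finishing in time here, because the comparison is strictly greater-than.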

Sample Test Cases

Execute job with timeout
Timeout: 3000ms
Input
{
  "src": "worker",
  "dest": "scheduler",
  "body": {
    "type": "execute_job",
    "msg_id": 1,
    "job_id": "job-123",
    "timeout_ms": 1000,
    "duration_ms": 2000
  }
}
Expected Output
{"type": "job_timeout", "in_reply_to": 1, "job_id": "job-123", "reason": "Execution timeout exceeded"}
Check SLA deadlines
Timeout: 5000ms
Input
{
  "src": "monitor",
  "dest": "scheduler",
  "body": {
    "type": "check_deadlines",
    "msg_id": 1
  }
}
Expected Output
{"type": "deadlines_checked", "in_reply_to": 1, "jobs_near_deadline": [{"job_id": "job-456", "minutes_until": 3}], "missed": []}
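
One way the SLA check could produce this reply is sketched below. Several details are assumptions, not part of the task statement: jobs are kept in a dict with a state and an absolute deadline_ms, the warning threshold is five minutes, minutes_until is rounded down, and entries in missed carry only the job_id. The current time is passed in as now_ms so the logic stays deterministic.

```python
WARNING_THRESHOLD_MS = 5 * 60_000  # assumed threshold: warn within 5 minutes


def handle_check_deadlines(body: dict, jobs: dict, now_ms: int) -> dict:
    """Scan running jobs and split them into near-deadline and missed."""
    near, missed = [], []
    for job_id, job in sorted(jobs.items()):
        if job["state"] != "running":
            continue  # only running jobs are scanned
        remaining_ms = job["deadline_ms"] - now_ms
        if remaining_ms <= 0:
            # Assumed entry shape; the task only shows an empty list.
            missed.append({"job_id": job_id})
        elif remaining_ms <= WARNING_THRESHOLD_MS:
            near.append({"job_id": job_id,
                         "minutes_until": remaining_ms // 60_000})
    return {"type": "deadlines_checked", "in_reply_to": body["msg_id"],
            "jobs_near_deadline": near, "missed": missed}


# Hypothetical job table: job-456 is 3 minutes from its deadline, job-789 an hour.
jobs = {
    "job-456": {"state": "running", "deadline_ms": 180_000},
    "job-789": {"state": "running", "deadline_ms": 3_600_000},
}
result = handle_check_deadlines({"type": "check_deadlines", "msg_id": 1},
                                jobs, now_ms=0)
print(result)
```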

Hints

Hint 1
execute_job with duration_ms > timeout_ms must return job_timeout
Hint 2
check_deadlines scans running jobs, reports those within a warning threshold of their SLA deadline in jobs_near_deadline, and reports those already past it in missed
Hint 3
cancel_job marks the job cancelled and sets resources_freed=true
Hint 4
check_queue_timeouts cancels jobs that have been waiting beyond max_queue_time_ms
Hint 5
Always free allocated resources when cancelling, because other jobs may be waiting for them
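
The last three hints can be sketched together as follows. The data model is an assumption made for illustration: each job is a dict with a state, an enqueued_ms timestamp, and a cpus count it has reserved, and free capacity lives in a pool dict under free_cpus. Only the message fields, the reason string, and resources_freed come from the task text.

```python
def cancel_job(job: dict, pool: dict, reason: str) -> None:
    """Mark a job cancelled and return its resources to the shared pool."""
    if job["state"] == "cancelled":
        return  # already cancelled; avoid freeing the same resources twice
    pool["free_cpus"] += job["cpus"]  # reclaim before anything else can fail
    job["cpus"] = 0
    job["state"] = "cancelled"
    job["reason"] = reason
    job["resources_freed"] = True


def handle_check_queue_timeouts(body: dict, jobs: dict, pool: dict,
                                now_ms: int) -> dict:
    """Cancel queued jobs that have waited longer than max_queue_time_ms."""
    reason = "Queue timeout exceeded"
    cancelled = 0
    for job in jobs.values():
        waited_ms = now_ms - job["enqueued_ms"]
        if job["state"] == "queued" and waited_ms > body["max_queue_time_ms"]:
            cancel_job(job, pool, reason)
            cancelled += 1
    return {"type": "queue_timeouts_enforced", "in_reply_to": body["msg_id"],
            "cancelled": cancelled, "reason": reason}


# Hypothetical state: one job has waited 400 s, the other only 100 s.
pool = {"free_cpus": 0}
jobs = {
    "job-old": {"state": "queued", "enqueued_ms": 0, "cpus": 2},
    "job-new": {"state": "queued", "enqueued_ms": 300_000, "cpus": 1},
}
reply = handle_check_queue_timeouts(
    {"type": "check_queue_timeouts", "msg_id": 3, "max_queue_time_ms": 300_000},
    jobs, pool, now_ms=400_000)
print(reply, pool)
```

Routing every cancellation through a single cancel_job function is one way to make sure no cancellation path forgets to release resources.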
OVERVIEW

Key Concepts

deadline, execution timeout, SLA enforcement, queue timeout, resource reclamation
Implement Job Deadlines and Timeouts - The Orchestrator | Build Distributed Systems