ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/orchestrator/tasks/task-26-1-3-resource-management
TASK

Implementation

Jobs compete for finite resources like CPU, memory, and GPU. Resource management tracks what is available, grants allocations when possible, and queues jobs that would exceed capacity.

Implement a node that manages a shared resource pool:

// Allocate resources (succeeds if enough available)
{ "type": "allocate", "msg_id": 1,
  "requirements": [{"type": "cpu", "amount": 2},
                   {"type": "memory", "amount": 4}] }
-> { "type": "allocated", "in_reply_to": 1,
    "allocation_id": "<uuid>",
    "resources": {"cpu": 2, "memory": 4} }

// Not enough GPU -> queue the job
{ "type": "allocate", "msg_id": 2,
  "requirements": [{"type": "gpu", "amount": 4}] }
-> { "type": "queued", "in_reply_to": 2,
    "reason": "Insufficient resources",
    "available": {"gpu": 2}, "required": 4 }

// Release resources when job finishes
{ "type": "release", "msg_id": 3,
  "allocation_id": "alloc-123", "job_status": "completed" }
-> { "type": "released", "in_reply_to": 3,
    "allocation_id": "alloc-123",
    "resources_freed": {"cpu": 2, "memory": 4} }

// Submit 3 jobs with a concurrency cap of 2
{ "type": "submit_job", "msg_id": 4,
  "category": "heavy", "concurrency_limit": 2, "jobs": 3 }
-> { "type": "jobs_submitted", "in_reply_to": 4,
    "queued": 1, "running": 2, "waiting": 1 }
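
The allocate and release exchanges above can be sketched as a small in-memory pool. This is a minimal sketch, not the reference solution: the `ResourcePool` class name and the capacity passed to it are assumptions (only the 2 available GPUs appear in the examples), and the task does not say what `required` should hold when several resource types are short, so the sketch reports the first short type.

```python
import uuid


class ResourcePool:
    """Tracks free capacity and live allocations (hypothetical helper)."""

    def __init__(self, capacity):
        self.available = dict(capacity)  # resource type -> free amount
        self.allocations = {}            # allocation_id -> granted resources

    def allocate(self, msg_id, requirements):
        # Sum the requested amount per resource type.
        wanted = {}
        for req in requirements:
            wanted[req["type"]] = wanted.get(req["type"], 0) + req["amount"]

        # All-or-nothing: nothing is subtracted unless every type fits.
        short = {t: a for t, a in wanted.items()
                 if self.available.get(t, 0) < a}
        if short:
            # Assumption: report the amount of the first short type.
            first_amount = next(iter(short.values()))
            return {"type": "queued", "in_reply_to": msg_id,
                    "reason": "Insufficient resources",
                    "available": {t: self.available.get(t, 0) for t in short},
                    "required": first_amount}

        for rtype, amount in wanted.items():
            self.available[rtype] -= amount
        allocation_id = str(uuid.uuid4())
        self.allocations[allocation_id] = wanted
        return {"type": "allocated", "in_reply_to": msg_id,
                "allocation_id": allocation_id, "resources": dict(wanted)}

    def release(self, msg_id, allocation_id):
        # Raises KeyError for an unknown id; the task defines no error reply.
        freed = self.allocations.pop(allocation_id)
        for rtype, amount in freed.items():
            self.available[rtype] += amount
        return {"type": "released", "in_reply_to": msg_id,
                "allocation_id": allocation_id, "resources_freed": freed}
```

Checking every requirement before subtracting anything keeps a partially satisfiable request from holding resources while it waits.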

Sample Test Cases

Allocate resources for job
Timeout: 5000ms
Input
{
  "src": "scheduler",
  "dest": "resources",
  "body": {
    "type": "allocate",
    "msg_id": 1,
    "requirements": [
      {
        "type": "cpu",
        "amount": 2
      },
      {
        "type": "memory",
        "amount": 4
      }
    ]
  }
}
Expected Output
{"type": "allocated", "in_reply_to": 1, "allocation_id": ".*", "resources": {"cpu": 2, "memory": 4}}
Queue job when resources unavailable
Timeout: 5000ms
Input
{
  "src": "scheduler",
  "dest": "resources",
  "body": {
    "type": "allocate",
    "msg_id": 1,
    "requirements": [
      {
        "type": "gpu",
        "amount": 4
      }
    ]
  }
}
Expected Output
{"type": "queued", "in_reply_to": 1, "reason": "Insufficient resources", "available": {"gpu": 2}, "required": 4}
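
The expected output for the first case uses ".*" for allocation_id, which suggests that string fields are compared as patterns rather than literally. For local testing, one way to mimic that is sketched below. How the real grader compares replies is not stated in the task, so the `matches` helper and its rules (regex full match for strings, plain equality for everything else) are assumptions.

```python
import re


def matches(expected, actual):
    """Compare a reply body against an expected body (hypothetical helper).

    Strings in `expected` are treated as regular expressions that must
    match the whole actual string; dicts are compared key by key; any
    other value must be equal.
    """
    if isinstance(expected, dict):
        return (isinstance(actual, dict)
                and expected.keys() == actual.keys()
                and all(matches(v, actual[k]) for k, v in expected.items()))
    if isinstance(expected, str):
        return (isinstance(actual, str)
                and re.fullmatch(expected, actual) is not None)
    return expected == actual
```

With this helper, a reply carrying any generated allocation_id passes the first case, while a reply with the wrong resource amounts does not.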

Hints

Hint 1
Track a pool of available resources: subtract on allocate, restore on release.
Hint 2
If the requested amount exceeds what is available, reply with queued and include the available and required amounts.
Hint 3
concurrency_limit caps how many jobs of a category run simultaneously; queue the rest.
Hint 4
Store each grant under its allocation_id so that release can look up and report which resources were freed.
Hint 5
After a release, check whether any queued jobs can now be allocated.
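
Hints 3 and 5 can be sketched as two small pieces: a per-category limiter and a queue drain that runs after each release. The names `CategoryLimiter` and `drain` are hypothetical, and strict first-in, first-out ordering is an assumption, since the task does not specify a queueing policy.

```python
from collections import deque


class CategoryLimiter:
    """Caps running jobs for one category; the rest wait in FIFO order."""

    def __init__(self, concurrency_limit):
        self.limit = concurrency_limit
        self.running = 0
        self.waiting = deque()

    def submit(self, job_ids):
        for job_id in job_ids:
            if self.running < self.limit:
                self.running += 1
            else:
                self.waiting.append(job_id)
        return {"running": self.running, "waiting": len(self.waiting)}

    def finish(self):
        """Mark one running job done and promote the oldest waiting job."""
        self.running -= 1
        if self.waiting:
            self.running += 1
            return self.waiting.popleft()
        return None


def drain(available, queue):
    """Grant queued requests oldest first, stopping at the first misfit.

    `available` maps resource type to free amount and is updated in place;
    `queue` is a deque of (job_id, {resource type: amount}) pairs.
    """
    granted = []
    while queue and all(available.get(t, 0) >= a
                        for t, a in queue[0][1].items()):
        job_id, wanted = queue.popleft()
        for rtype, amount in wanted.items():
            available[rtype] -= amount
        granted.append(job_id)
    return granted
```

Stopping at the first request that does not fit keeps large jobs from being starved by a stream of small ones, at the cost of leaving some capacity idle. Letting smaller jobs skip ahead would improve utilization but weakens that fairness guarantee.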
OVERVIEW

Key Concepts

resource allocation, resource pool, job queuing, concurrency limit, fairness
Implement Resource Management for Jobs - The Orchestrator | Build Distributed Systems