ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/orchestrator/tasks/task-26-2-4-circuit-breaking
TASK

Implementation

When a downstream service is failing, continuing to call it wastes resources and slows down your service. A circuit breaker detects this and fails fast: instead of waiting for a timeout, it immediately returns an error and periodically tests whether the service has recovered.

The circuit breaker has three states:

CLOSED  --(failure threshold exceeded)--> OPEN
OPEN    --(after timeout)--> HALF-OPEN
HALF-OPEN --(success)--> CLOSED
HALF-OPEN --(failure)--> OPEN
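
The four transitions above can be written down as a lookup table. This is only a minimal sketch: the event labels (`threshold_exceeded`, `timeout_elapsed`, `success`, `failure`) and the names `State`, `TRANSITIONS`, and `next_state` are hypothetical and are not part of the task's protocol.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


# (current state, event) -> next state.
# Event names are illustrative labels, not defined by the task.
TRANSITIONS = {
    (State.CLOSED, "threshold_exceeded"): State.OPEN,
    (State.OPEN, "timeout_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "success"): State.CLOSED,
    (State.HALF_OPEN, "failure"): State.OPEN,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events with no table entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it easy to check that no path leads from OPEN straight back to CLOSED: recovery always passes through HALF-OPEN.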

Implement a node that enforces this state machine per service:

// Enough failures -> breaker opens
{ "type": "call", "msg_id": 1,
  "force_failures": 5 }
-> { "type": "circuit_breaker_open", "in_reply_to": 1,
    "service": "service-b", "failures": 5 }

// Fail fast while circuit is open
{ "type": "call", "msg_id": 2, "state": "open" }
-> { "type": "error", "in_reply_to": 2,
    "error": "Circuit breaker OPEN", "service": "service-b" }

// Half-open probe succeeds -> close the breaker
{ "type": "call", "msg_id": 3,
  "state": "half_open", "force_success": true }
-> { "type": "circuit_breaker_closed", "in_reply_to": 3,
    "service": "service-b", "state": "closed" }

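One possible way to map the three request shapes above to their replies is sketched below. It rests on several assumptions that the task does not state: the message arrives in the `src`/`dest`/`body` envelope used by the sample test cases, the `service` field in the reply is taken from `dest`, and the failure threshold is 5. The `call_ok` reply and the reply for a failed half-open probe are hypothetical, since the task shows neither.

```python
def handle_call(message, threshold=5):
    """Build a reply body for a "call" message (sketch, not the reference solution)."""
    body = message["body"]
    service = message["dest"]  # assumption: the breaker is keyed by the callee
    reply = {"in_reply_to": body["msg_id"], "service": service}
    state = body.get("state", "closed")

    if state == "open":
        # Fail fast: no downstream call is attempted.
        return {**reply, "type": "error", "error": "Circuit breaker OPEN"}

    if state == "half_open":
        if body.get("force_success"):
            # Probe succeeded: close the breaker.
            return {**reply, "type": "circuit_breaker_closed", "state": "closed"}
        # Hypothetical reply shape: probe failed, so the breaker re-opens.
        return {**reply, "type": "circuit_breaker_open", "state": "open"}

    failures = body.get("force_failures", 0)
    if failures >= threshold:
        return {**reply, "type": "circuit_breaker_open", "failures": failures}

    # Hypothetical reply for a call that passes through a closed breaker.
    return {**reply, "type": "call_ok"}
```

This version is stateless: it trusts the `state` field in the request rather than remembering anything between messages, which is enough to reproduce the examples but not to enforce the timeout-driven OPEN to HALF-OPEN transition.
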
Sample Test Cases

Circuit breaker opens after failures (Timeout: 10000ms)
Input
{
  "src": "service-a",
  "dest": "service-b",
  "body": {
    "type": "call",
    "msg_id": 1,
    "force_failures": 5
  }
}
Expected Output
{"type": "circuit_breaker_open", "in_reply_to": 1, "service": "service-b", "failures": 5}

Circuit breaker recovers on success (Timeout: 5000ms)
Input
{
  "src": "service-a",
  "dest": "service-b",
  "body": {
    "type": "call",
    "msg_id": 1,
    "state": "half_open",
    "force_success": true
  }
}
Expected Output
{"type": "circuit_breaker_closed", "in_reply_to": 1, "service": "service-b", "state": "closed"}

Hints

Hint 1
Three states: closed (calls pass through), open (fail immediately), half-open (test one call)
Hint 2
Transition to open when consecutive failures >= threshold
Hint 3
In half-open, allow one call through; close on success, re-open on failure
Hint 4
Fail fast: when open, return error immediately without calling the downstream service
Hint 5
Track failures per service so each service has its own independent circuit breaker
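
Taken together, the hints suggest a small stateful object per service. The sketch below is one way to arrange it, not the reference solution: the class and method names, the default threshold of 5, the 10-second reset timeout, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time


class CircuitBreaker:
    def __init__(self, threshold=5, reset_timeout=10.0, clock=time.monotonic):
        self.threshold = threshold          # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to stay open before probing
        self.clock = clock                  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a call may be sent downstream right now."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"  # let exactly one probe through
                return True
            return False  # fail fast
        if self.state == "half_open":
            return False  # a probe is already in flight
        return True

    def record_success(self):
        # Any success resets the consecutive-failure count and closes the breaker.
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        # A failed probe re-opens immediately; otherwise open at the threshold.
        if self.state == "half_open" or self.failures >= self.threshold:
            self.state = "open"
            self.opened_at = self.clock()


# One independent breaker per downstream service.
breakers = {}


def breaker_for(service):
    if service not in breakers:
        breakers[service] = CircuitBreaker()
    return breakers[service]
```

A caller would check `breaker_for(service).allow()` before each downstream call and report the outcome with `record_success()` or `record_failure()`. Passing a fake `clock` lets a test advance time without sleeping.
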
OVERVIEW

Key Concepts

circuit breaker, fail fast, half-open state, cascading failures, service resilience
Implement Circuit Breaking in Service Mesh - The Orchestrator | Build Distributed Systems