ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/messenger/tasks/task-1-2-5-exponential-backoff
TASK

Implementation

Fixed-interval retries can overwhelm a recovering system. When many nodes retry at the same interval, they create a thundering herd that prevents recovery. Exponential backoff spreads retries over time.

Your task is to implement rpc_with_backoff that:

  1. On the first attempt, waits base_delay (default: 100ms) before retrying
  2. On each subsequent attempt, doubles the delay: delay = base_delay * 2^attempt
  3. Adds random jitter: delay += random(0, delay * 0.1) to decorrelate retries
  4. Caps the total delay at max_delay (default: 2 seconds)
  5. Gives up after max_retries attempts (default: 5)

Implement a backoff_relay message type:

Request:  {"type": "backoff_relay", "msg_id": 1, "target": "n2", "payload": {"type": "read"}}
Response: {"type": "backoff_relay_ok", "in_reply_to": 1, "attempts": 3, "total_delay_ms": 700}

The response includes the number of attempts and total delay in milliseconds. For this task, focus on the backoff calculation logic. Your node should output the computed delay schedule.

Write a comment in your code explaining why exponential backoff helps under high load (at least 100 words).

Sample Test Cases

Init and echo still workTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"echo","msg_id":2,"echo":"backoff"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "echo_ok", "echo": "backoff", "in_reply_to": 2, "msg_id": 1}}
Compute backoff returns scheduleTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"compute_backoff","msg_id":2,"max_retries":3,"base_delay":0.1}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Hints

Hint 1
Base delay doubles with each attempt: delay = base_delay * 2^attempt
Hint 2
Add random jitter to prevent thundering herd: delay += random(0, delay * 0.5)
Hint 3
Cap the maximum delay to prevent unreasonably long waits
Hint 4
Log each retry with the computed delay for observability
Hint 5
The total wait time is the sum of all delays, not just the last one
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

exponential backoffjittercongestion controlload management
main.py
python
Implement Exponential Backoff for Retries - The Messenger | Build Distributed Systems