TASK
Implementation
Fixed-interval retries can overwhelm a recovering system. When many nodes retry at the same interval, they create a thundering herd that prevents recovery. Exponential backoff spreads retries over time.
Your task is to implement rpc_with_backoff that:
- On the first attempt, waits
base_delay(default: 100ms) before retrying - On each subsequent attempt, doubles the delay:
delay = base_delay * 2^attempt - Adds random jitter:
delay += random(0, delay * 0.1)to decorrelate retries - Caps the total delay at
max_delay(default: 2 seconds) - Gives up after
max_retriesattempts (default: 5)
Implement a backoff_relay message type:
Request: {"type": "backoff_relay", "msg_id": 1, "target": "n2", "payload": {"type": "read"}}
Response: {"type": "backoff_relay_ok", "in_reply_to": 1, "attempts": 3, "total_delay_ms": 700}The response includes the number of attempts and total delay in milliseconds. For this task, focus on the backoff calculation logic. Your node should output the computed delay schedule.
Write a comment in your code explaining why exponential backoff helps under high load (at least 100 words).
Sample Test Cases
Init and echo still workTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"echo","msg_id":2,"echo":"backoff"}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "echo_ok", "echo": "backoff", "in_reply_to": 2, "msg_id": 1}}
Compute backoff returns scheduleTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"compute_backoff","msg_id":2,"max_retries":3,"base_delay":0.1}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Base delay doubles with each attempt: delay = base_delay * 2^attempt
Hint 2▾
Add random jitter to prevent thundering herd: delay += random(0, delay * 0.5)
Hint 3▾
Cap the maximum delay to prevent unreasonably long waits
Hint 4▾
Log each retry with the computed delay for observability
Hint 5▾
The total wait time is the sum of all delays, not just the last one
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
exponential backoffjittercongestion controlload management
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#!/usr/bin/env python3
import sys
import json
import time
import random
class Node:
def __init__(self):
self.node_id = None
self.node_ids = []
self.next_msg_id = 0
def send(self, dest, body):
body["msg_id"] = self.next_msg_id
msg_id = self.next_msg_id
self.next_msg_id += 1
message = {"src": self.node_id, "dest": dest, "body": body}
print(json.dumps(message), flush=True)
return msg_id
def reply(self, request, body):
body["in_reply_to"] = request["body"]["msg_id"]
self.send(request["src"], body)
def compute_backoff_delay(self, attempt, base_delay=0.1, max_delay=2.0):
# TODO: Calculate exponential backoff delay with jitter
# attempt starts at 0
# Return delay in seconds
pass
def get_delay_schedule(self, max_retries=5, base_delay=0.1, max_delay=2.
0):
# TODO: Return list of delays for each retry attempt
# Each delay should use compute_backoff_delay
pass
def main():
node = Node()
for line in sys.stdin: