ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/gossiper/tasks/task-3-3-2-tree-failure
TASK

Implementation

Tree broadcast is efficient but fragile: if a node crashes, all its descendants lose connectivity. Your task is to add failure detection with direct fallback.

When forwarding to a neighbor, expect a broadcast_ack within a timeout. If no ack arrives, fall back to direct delivery to all known nodes that might be in the failed subtree.

Implement:

  1. broadcast with ack tracking
  2. broadcast_ack handler for acknowledgments
  3. check_acks handler to simulate timeout checking and trigger fallback
  4. failure_stats to report failures
Request:  {"type": "check_acks", "msg_id": 1}
Response: {"type": "check_acks_ok", "in_reply_to": 1, "pending": 2, "timed_out": 1, "direct_sent": 3}
Request:  {"type": "failure_stats", "msg_id": 2}
Response: {"type": "failure_stats_ok", "in_reply_to": 2, "total_failures": 1, "direct_deliveries": 3}

Sample Test Cases

Broadcast with no neighborsTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"topology","msg_id":2,"topology":{"n1":[]}}}
{"src":"c1","dest":"n1","body":{"type":"broadcast","msg_id":3,"message":42}}
{"src":"c1","dest":"n1","body":{"type":"read","msg_id":4}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "topology_ok", "in_reply_to": 2, "msg_id": 1}}
{"src": "n1", "dest": "c1", "body": {"type": "broadcast_ok", "in_reply_to": 3, "msg_id": 2}}
{"src": "n1", "dest": "c1", "body": {"type": "read_ok", "messages": [42], "in_reply_to": 4, "msg_id": 3}}
Failure stats initially zeroTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"failure_stats","msg_id":2}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
{"src": "n1", "dest": "c1", "body": {"type": "failure_stats_ok", "total_failures": 0, "direct_deliveries": 0, "in_reply_to": 2, "msg_id": 1}}

Hints

Hint 1
When a child node crashes, the entire subtree below it goes dark
Hint 2
Track acknowledgments from children with timeouts
Hint 3
If no ack within 500ms, send directly to known downstream nodes
Hint 4
Maintain a list of all nodes for direct fallback
Hint 5
Log failed deliveries to stderr for debugging
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

fault tolerancetree failureack timeoutdirect delivery
main.py
python
Handle Tree Node Failure with Direct Fallback - The Gossiper | Build Distributed Systems