TASK
Implementation
Add health checking to your load balancer:
- Periodically send health check requests to servers
- Track consecutive failures per server
- Mark server unhealthy after N failures
- Exclude unhealthy servers from selection
- Re-add servers after successful health checks
Support both active (probing) and passive (observing failures) checks.
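The bookkeeping in the list above can be sketched as follows. This is a minimal illustration rather than the reference solution: the class name `FailureTracker` and its method names are hypothetical, and the threshold of 3 is only an example value. Both an active prober and a passive observer would report into the same two methods.

```python
import threading


class FailureTracker:
    """Hypothetical helper: counts consecutive failures per server."""

    def __init__(self, servers, threshold=3):
        self.threshold = threshold
        self.failures = {s: 0 for s in servers}
        self.healthy = {s: True for s in servers}
        self.lock = threading.Lock()

    def record_failure(self, server):
        # Called by a failed probe (active) or a failed real request (passive).
        with self.lock:
            self.failures[server] += 1
            if self.failures[server] >= self.threshold:
                self.healthy[server] = False

    def record_success(self, server):
        # One success resets the streak and puts the server back in rotation.
        with self.lock:
            self.failures[server] = 0
            self.healthy[server] = True

    def healthy_servers(self):
        with self.lock:
            return [s for s, ok in self.healthy.items() if ok]
```

Re-adding a server after a single success is a design choice made here for brevity; requiring several consecutive successes would be a reasonable alternative.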
Sample Test Cases
Exclude unhealthy server
Timeout: 5000ms
Input
{"src":"c0","dest":"lb","body":{"type":"init","msg_id":1,"node_id":"lb","node_ids":["lb","s1","s2","s3"]}}
{"src":"c1","dest":"lb","body":{"type":"health_status","msg_id":2,"server":"s1","consecutive_failures":5,"threshold":3}}
{"src":"c2","dest":"lb","body":{"type":"get_healthy_servers","msg_id":3}}
Expected Output
{"src":"lb","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}
{"src":"lb","dest":"c1","body":{"type":"health_status_ok","in_reply_to":2,"msg_id":1,"status":"unhealthy"}}
{"src":"lb","dest":"c2","body":{"type":"get_healthy_servers_ok","in_reply_to":3,"msg_id":2,"healthy":["s2","s3"]}}
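One way to read the sample above is as a small request/reply handler. The sketch below is an interpretation, not the official solution: the names `HealthNode` and `run` are hypothetical, it assumes every entry in `node_ids` other than the balancer itself is a backend, and it assumes a server below its threshold is reported as `"healthy"` (the sample only shows the `"unhealthy"` case).

```python
import json


class HealthNode:
    """Hypothetical handler for the three message types in the sample."""

    def __init__(self):
        self.node_id = None
        self.servers = []
        self.unhealthy = set()
        self.next_msg_id = 0

    def handle(self, msg):
        body = msg["body"]
        reply = {
            "type": body["type"] + "_ok",
            "in_reply_to": body["msg_id"],
            "msg_id": self.next_msg_id,
        }
        self.next_msg_id += 1

        if body["type"] == "init":
            self.node_id = body["node_id"]
            # Assumption: every node other than the balancer is a backend.
            self.servers = [n for n in body["node_ids"] if n != self.node_id]
        elif body["type"] == "health_status":
            failed = body["consecutive_failures"] >= body["threshold"]
            if failed:
                self.unhealthy.add(body["server"])
            else:
                self.unhealthy.discard(body["server"])
            reply["status"] = "unhealthy" if failed else "healthy"
        elif body["type"] == "get_healthy_servers":
            reply["healthy"] = [s for s in self.servers if s not in self.unhealthy]

        return {"src": self.node_id, "dest": msg["src"], "body": reply}


def run(lines):
    """Process newline-delimited JSON requests and return the reply lines."""
    node = HealthNode()
    return [json.dumps(node.handle(json.loads(line))) for line in lines if line.strip()]
```

If the judge delivers messages on standard input, which the starter file's imports suggest but the task does not state, printing each element of `run(sys.stdin)` would connect this to the harness.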
Hints
Hint 1: Periodically probe each server
Hint 2: Mark unhealthy after consecutive failures
Hint 3: Remove from rotation until healthy
OVERVIEW
Theoretical Hub
Health Checking
Health checks detect failed servers so the load balancer can stop routing requests to them. Active checks send periodic probes (for example, HTTP GET /health). Passive checks observe failures of real requests.
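An active HTTP probe of the kind described above might look like the following sketch. The function name `probe`, the 2-second default timeout, and the rule that only 2xx responses count as healthy are assumptions made for illustration. The probe bypasses any configured proxy so that it reaches the backend directly.

```python
import http.client
import urllib.error
import urllib.request

# Opener with an empty proxy map, so probes go straight to the backend.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return True if a GET to `url` answers with a 2xx status in time."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, 4xx/5xx replies, and malformed
        # URLs all count as a failed check.
        return False
```

A passive check needs no probe at all: the code path that forwards real requests would report each success or failure to the same failure counter.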
Graceful Degradation
When servers fail, the load balancer redistributes traffic to healthy servers. Slow re-introduction (ramping up traffic) prevents overwhelming recovering servers.
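Slow re-introduction can be approximated by giving a recovered server a reduced selection weight that grows over time. The sketch below assumes a linear ramp; the names `ramp_weight` and `pick`, the 30-second ramp, and the 10% starting share are hypothetical choices, not values from the exercise.

```python
import random


def ramp_weight(seconds_since_recovery, ramp_seconds=30.0, floor=0.1):
    """Fraction of a full traffic share for a recently recovered server."""
    if ramp_seconds <= 0 or seconds_since_recovery >= ramp_seconds:
        return 1.0
    progress = max(seconds_since_recovery, 0.0) / ramp_seconds
    return floor + (1.0 - floor) * progress


def pick(servers, recovered_at, now):
    """Weighted random choice among healthy servers.

    `recovered_at` maps a server to the time it became healthy again;
    servers missing from the map are treated as fully ramped.
    """
    weights = [
        ramp_weight(now - recovered_at[s]) if s in recovered_at else 1.0
        for s in servers
    ]
    return random.choices(servers, weights=weights, k=1)[0]
```

With these defaults, a server that recovered 15 seconds ago receives roughly 55% of the share of a fully healthy peer.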
Key Concepts
health check, failover, liveness
main.py
python
#!/usr/bin/env python3
import sys
import json
import threading
import time


class HealthCheckLB:
    def __init__(self, servers, check_interval=10, failure_threshold=3):
        self.servers = servers
        self.check_interval = check_interval
        self.failure_threshold = failure_threshold
        self.healthy = {s: True for s in servers}
        self.failures = {s: 0 for s in servers}
        self.lock = threading.Lock()

    def _health_check(self, server):
        # TODO: Check if server is healthy
        pass

    def _health_check_loop(self):
        # TODO: Periodically check all servers
        pass

    def mark_failure(self, server):
        # TODO: Increment failures, mark unhealthy if threshold
        pass

    def mark_success(self, server):
        # TODO: Reset failures, mark healthy
        pass

    def get_healthy_servers(self):
        # TODO: Return list of healthy servers
        pass


if __name__ == "__main__":
    pass
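The `_health_check_loop` stub in the skeleton could be driven by a loop like the one below. This is only a sketch under assumptions: `check_loop`, `probe`, `on_result`, and `stop` are hypothetical names, and the probe is passed in as a callable so the loop stays independent of how a server is actually contacted.

```python
import threading


def check_loop(servers, probe, on_result, interval, stop):
    """Probe every server once per `interval` seconds until `stop` is set.

    `probe(server)` returns True or False; `on_result(server, ok)` receives
    each outcome, for example to call mark_success or mark_failure.
    """
    while not stop.is_set():
        for server in servers:
            on_result(server, probe(server))
        # Event.wait doubles as a sleep that ends early when stop is set.
        stop.wait(interval)
```

Running it with `threading.Thread(target=check_loop, args=(...), daemon=True).start()` keeps probing in the background, and calling `stop.set()` ends the loop without waiting out the full interval.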