TASK
Implementation
An alerting rules engine evaluates metric conditions and fires notifications when thresholds are breached. It routes alerts to the right channel based on severity, groups duplicate alerts to prevent storms, and auto-resolves when conditions return to normal.
Implement a node that evaluates alert rules and manages notifications:
// Error rate above threshold for 5 minutes -> WARNING
{ "type": "evaluate", "msg_id": 1,
  "metric": "error_rate", "value": 0.08,
  "threshold": 0.05, "duration_sec": 300 }
-> { "type": "alert_triggered", "in_reply_to": 1,
     "rule": "High error rate", "severity": "WARNING", "value": 0.08 }

// Service down -> CRITICAL -> page PagerDuty
{ "type": "evaluate", "msg_id": 2,
  "metric": "up", "value": 0, "threshold": 0,
  "duration_sec": 60, "service": "api" }
{ "routing": { "channels": ["pagerduty"] } }
-> { "type": "alert_triggered", "in_reply_to": 2,
     "severity": "CRITICAL", "action": "page_sent", "service": "api" }

// Metric returns to normal -> auto-resolve
{ "type": "evaluate", "msg_id": 3,
  "metric": "error_rate", "value": 0.01,
  "threshold": 0.05, "alert_resolved": true }
-> { "type": "alert_resolved", "in_reply_to": 3,
     "rule": "High error rate", "resolution": "Value returned to normal" }
Sample Test Cases

Threshold alert triggered (timeout: 5000 ms)
Input

{
  "src": "metrics",
  "dest": "alerter",
  "body": {
    "type": "evaluate",
    "msg_id": 1,
    "metric": "error_rate",
    "value": 0.08,
    "threshold": 0.05,
    "duration_sec": 300
  }
}

Expected Output
{"type": "alert_triggered", "in_reply_to": 1, "rule": "High error rate", "severity": "WARNING", "value": 0.08}Alert routing to PagerDutyTimeout: 5000ms
Input

{
  "src": "metrics",
  "dest": "alerter",
  "body": {
    "type": "evaluate",
    "msg_id": 1,
    "metric": "up",
    "value": 0,
    "threshold": 0,
    "duration_sec": 60,
    "service": "api"
  },
  "routing": {
    "channels": ["pagerduty"]
  }
}

Expected Output
{"type": "alert_triggered", "in_reply_to": 1, "severity": "CRITICAL", "action": "page_sent", "service": "api"}Hints
Hint 1: Fire an alert when the metric stays above its threshold for at least duration_sec seconds; a sketch of one way to track this follows.
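Note that in the sample tests a single evaluate message fires immediately, so duration_sec may simply describe the rule rather than require elapsed-time tracking. If you do track it yourself, one approach (with illustrative names) is:

import time

first_breach = {}  # metric -> when it first went over threshold (illustrative state)

def breached_long_enough(metric, value, threshold, duration_sec):
    now = time.time()
    if value > threshold:
        start = first_breach.setdefault(metric, now)
        return now - start >= duration_sec
    first_breach.pop(metric, None)  # back under threshold: reset the clock
    return False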
Hint 2: Route CRITICAL severity to PagerDuty (pager); route WARNING to Slack or email. See the routing sketch below.
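A sketch of severity-based routing. The spec only pins CRITICAL to pagerduty; the WARNING channels and the per-message override are assumptions:

SEVERITY_CHANNELS = {"CRITICAL": ["pagerduty"], "WARNING": ["slack", "email"]}  # assumed

def channels_for(severity, routing=None):
    # An explicit routing block on the message (as in the second test) wins.
    if routing and routing.get("channels"):
        return routing["channels"]
    return SEVERITY_CHANNELS.get(severity, ["email"])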
Hint 3: Grouping: alerts with the same fingerprint are merged into a single notification, as sketched below.
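One way to fingerprint, assuming metric plus service identifies a rule instance (the key set here is a guess):

active = {}  # fingerprint -> alert body; duplicates fold into the existing entry

def fingerprint(body):
    return (body["metric"], body.get("service"))  # assumed identity fields

def should_notify(body):
    fp = fingerprint(body)
    if fp in active:
        return False  # already firing: group it, don't re-notify
    active[fp] = body
    return True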
Hint 4: Resolution: fire alert_resolved when the metric returns below its threshold; see the sketch below.
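A standalone sketch of resolution tracking; it assumes an alert only resolves if it previously fired:

firing = set()  # fingerprints of alerts that are currently firing

def check_resolution(body):
    fp = (body["metric"], body.get("service"))
    if body["value"] > body["threshold"]:
        firing.add(fp)
        return None
    if fp in firing:  # was firing, value is back under threshold
        firing.discard(fp)
        return {"type": "alert_resolved", "in_reply_to": body["msg_id"],
                "resolution": "Value returned to normal"}
    return None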
Hint 5: Severity is determined by which threshold band the value falls in; an illustrative banding is sketched below.
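Illustrative only: the spec names WARNING and CRITICAL, but the band boundary below (twice the threshold) is an assumption:

def severity_for(value, threshold):
    if value > threshold * 2:  # far past the threshold: page someone
        return "CRITICAL"
    if value > threshold:      # past the threshold: warn
        return "WARNING"
    return None                # inside the normal band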
OVERVIEW
Key Concepts
alert rules, threshold evaluation, alert routing, alert grouping, auto-resolution
main.py
#!/usr/bin/env python3
import sys
import json

def main():
    # Read newline-delimited JSON messages from stdin, evaluate each one,
    # and write replies to stdout. This starter just echoes every message
    # back; replace the loop body with your rule evaluation.
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)

if __name__ == "__main__":
    main()
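To smoke-test the skeleton by hand, pipe the first sample test's message through it (as shipped, it just echoes the message back):

echo '{"src": "metrics", "dest": "alerter", "body": {"type": "evaluate", "msg_id": 1, "metric": "error_rate", "value": 0.08, "threshold": 0.05, "duration_sec": 300}}' | python3 main.py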