TASK
Implementation
Alert integrations route notifications to the right people and tools. Critical incidents trigger PagerDuty to page the on-call engineer. Non-critical alerts are posted to Slack with action buttons, so the whole team can see and act on them. If no one responds, escalation policies keep the alert moving up the chain.
Implement a node that handles alert routing and on-call management:
// CRITICAL alert -> PagerDuty incident
{ "type": "create_incident", "msg_id": 1,
"title": "High error rate", "service": "api",
"severity": "critical",
"description": "Error rate is 15% (threshold: 5%)" }
-> { "type": "incident_created", "in_reply_to": 1,
"incident_id": "INC123", "status": "triggered",
"assigned_to": "on-call-engineer" }
// WARNING alert -> Slack channel with action buttons
{ "type": "send_alert", "msg_id": 2,
"channel": "#alerts", "severity": "warning",
"title": "WARNING: High latency", "service": "api" }
-> { "type": "alert_sent", "in_reply_to": 2,
"channel": "#alerts", "notification_id": "<uuid>",
"actions": ["Acknowledge", "View Details"] }
// No response after 15 min -> escalate to next level
{ "type": "escalate_incident", "msg_id": 3,
"incident_id": "INC123", "current_level": 1,
"timeout_minutes": 15, "no_response": true }
-> { "type": "incident_escalated", "in_reply_to": 3,
"incident_id": "INC123",
"from_level": 1, "to_level": 2,
"escalated_to": "team-lead@example.com" }Sample Test Cases
PagerDuty alert integration
Timeout: 5000ms
Input
{
"src": "alerter",
"dest": "pagerduty",
"body": {
"type": "create_incident",
"msg_id": 1,
"title": "High error rate",
"service": "api",
"severity": "critical",
"description": "Error rate is 15% (threshold: 5%)"
}
}
Expected Output
{"type": "incident_created", "in_reply_to": 1, "incident_id": "INC123", "status": "triggered", "assigned_to": "on-call-engineer"}
Slack alert notification
Timeout: 5000ms
Input
{
"src": "alerter",
"dest": "slack",
"body": {
"type": "send_alert",
"msg_id": 1,
"channel": "#alerts",
"title": "WARNING: High latency",
"severity": "warning",
"description": "P95 latency is 500ms (threshold: 200ms)",
"service": "api"
}
}
Expected Output
{"type": "alert_sent", "in_reply_to": 1, "channel": "#alerts", "notification_id": ".*", "actions": ["Acknowledge", "View Details"]}
Hints
Hint 1
create_incident returns incident_id, status=triggered, and the assigned on-call user
Hint 2
send_alert formats a Slack message with action buttons: Acknowledge and View Details
Hint 3
get_on_call looks up who is currently on-call for a team at a specific time
Hint 4
Escalation: after timeout with no response, escalate to the next level in the policy
Hint 5
notification_id in Slack response should be unique per message
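Hints 3 and 4 describe behaviour that the sample test cases do not exercise, so a small sketch may help. The one below is only an illustration under stated assumptions: the `ESCALATION_POLICY` table, the `ON_CALL_SCHEDULE` rotation, the names `alice` and `bob`, and the `error` reply are all hypothetical. Only the `escalate_incident` and `incident_escalated` fields and the level 2 contact come from the task examples.

```python
from datetime import datetime, timezone

# Hypothetical escalation policy (level -> contact). Only level 2,
# "team-lead@example.com", appears in the task; the rest is invented.
ESCALATION_POLICY = {
    1: "on-call-engineer",
    2: "team-lead@example.com",
    3: "engineering-manager@example.com",
}

# Hypothetical rotation: per team, a list of (start, end, user) shifts.
# Intervals are half-open, [start, end), and expressed in UTC.
ON_CALL_SCHEDULE = {
    "api": [
        (datetime(2024, 1, 1, tzinfo=timezone.utc),
         datetime(2024, 1, 8, tzinfo=timezone.utc), "alice"),
        (datetime(2024, 1, 8, tzinfo=timezone.utc),
         datetime(2024, 1, 15, tzinfo=timezone.utc), "bob"),
    ],
}


def get_on_call(team, at):
    """Return the user on call for `team` at time `at`, or None if no shift covers it."""
    for start, end, user in ON_CALL_SCHEDULE.get(team, []):
        if start <= at < end:
            return user
    return None


def handle_escalate_incident(body):
    """Move an unanswered incident one level up the escalation policy."""
    level = body["current_level"]
    next_level = level + 1
    if not body.get("no_response") or next_level not in ESCALATION_POLICY:
        # Assumption: the task does not define a reply for this case.
        return {"type": "error", "in_reply_to": body["msg_id"],
                "text": "cannot escalate"}
    return {
        "type": "incident_escalated",
        "in_reply_to": body["msg_id"],
        "incident_id": body["incident_id"],
        "from_level": level,
        "to_level": next_level,
        "escalated_to": ESCALATION_POLICY[next_level],
    }
```

Keeping the policy and the schedule as plain data means the handlers stay small, and a different rotation can be swapped in without touching the lookup logic.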
OVERVIEW
Key Concepts
PagerDuty, Slack, on-call rotation, escalation policy, incident lifecycle
main.py
python
#!/usr/bin/env python3
import sys
import json


def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)


if __name__ == "__main__":
    main()
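The starter file only echoes each message back. One possible way to produce the replies from the two sample test cases is sketched below. It is a minimal illustration, not a reference solution, and it makes several assumptions: incident ids are sequential and start at `INC123` so the first incident matches the sample output, `assigned_to` is the fixed placeholder from the sample, and the `error` reply for unknown types is hypothetical. The helper names (`handle`, `HANDLERS`, `handle_create_incident`, `handle_send_alert`) are also invented for this sketch.

```python
import itertools
import uuid

# Assumption: sequential ids starting at 123, chosen only so that the first
# incident matches "INC123" in the sample test. The task specifies no scheme.
_incident_numbers = itertools.count(123)


def handle_create_incident(body):
    """Build the incident_created reply for a create_incident request."""
    return {
        "type": "incident_created",
        "in_reply_to": body["msg_id"],
        "incident_id": f"INC{next(_incident_numbers)}",
        "status": "triggered",
        # Placeholder taken from the sample output, not a real lookup.
        "assigned_to": "on-call-engineer",
    }


def handle_send_alert(body):
    """Build the alert_sent reply, with a fresh notification_id per message."""
    return {
        "type": "alert_sent",
        "in_reply_to": body["msg_id"],
        "channel": body["channel"],
        "notification_id": str(uuid.uuid4()),
        "actions": ["Acknowledge", "View Details"],
    }


# Dispatch table keyed by the "type" field of the message body.
HANDLERS = {
    "create_incident": handle_create_incident,
    "send_alert": handle_send_alert,
}


def handle(msg):
    """Route one envelope ({"src", "dest", "body"}) to its handler and return the reply body."""
    body = msg["body"]
    handler = HANDLERS.get(body["type"])
    if handler is None:
        # Assumption: the task does not define a reply for unknown types.
        return {"type": "error", "in_reply_to": body.get("msg_id"),
                "text": f"unsupported type: {body['type']}"}
    return handler(body)
```

With this in place, the echo line in `main()` could become `print(json.dumps(handle(msg)), flush=True)`. Printing the reply body alone, rather than a full `src`/`dest` envelope, is an assumption based on the Expected Output shown in the sample tests.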