ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/tracer/tasks/task-23-2-5-alert-integrations
TASK

Implementation

Alert integrations route notifications to the right people and tools. Critical incidents trigger PagerDuty to page the on-call engineer. Non-critical alerts post to Slack with action buttons so the team can see and act on them. If no one responds, escalation policies keep the alert moving up the chain.

Implement a node that handles alert routing and on-call management:

// CRITICAL alert -> PagerDuty incident
{ "type": "create_incident", "msg_id": 1,
  "title": "High error rate", "service": "api",
  "severity": "critical",
  "description": "Error rate is 15% (threshold: 5%)" }
-> { "type": "incident_created", "in_reply_to": 1,
    "incident_id": "INC123", "status": "triggered",
    "assigned_to": "on-call-engineer" }

// WARNING alert -> Slack channel with action buttons
{ "type": "send_alert", "msg_id": 2,
  "channel": "#alerts", "severity": "warning",
  "title": "WARNING: High latency", "service": "api" }
-> { "type": "alert_sent", "in_reply_to": 2,
    "channel": "#alerts", "notification_id": "<uuid>",
    "actions": ["Acknowledge", "View Details"] }

// No response after 15 min -> escalate to next level
{ "type": "escalate_incident", "msg_id": 3,
  "incident_id": "INC123", "current_level": 1,
  "timeout_minutes": 15, "no_response": true }
-> { "type": "incident_escalated", "in_reply_to": 3,
    "incident_id": "INC123",
    "from_level": 1, "to_level": 2,
    "escalated_to": "team-lead@example.com" }

Sample Test Cases

PagerDuty alert integration (timeout: 5000ms)
Input
{
  "src": "alerter",
  "dest": "pagerduty",
  "body": {
    "type": "create_incident",
    "msg_id": 1,
    "title": "High error rate",
    "service": "api",
    "severity": "critical",
    "description": "Error rate is 15% (threshold: 5%)"
  }
}
Expected Output
{"type": "incident_created", "in_reply_to": 1, "incident_id": "INC123", "status": "triggered", "assigned_to": "on-call-engineer"}
Slack alert notification (timeout: 5000ms)
Input
{
  "src": "alerter",
  "dest": "slack",
  "body": {
    "type": "send_alert",
    "msg_id": 1,
    "channel": "#alerts",
    "title": "WARNING: High latency",
    "severity": "warning",
    "description": "P95 latency is 500ms (threshold: 200ms)",
    "service": "api"
  }
}
Expected Output
{"type": "alert_sent", "in_reply_to": 1, "channel": "#alerts", "notification_id": ".*", "actions": ["Acknowledge", "View Details"]}

Hints

Hint 1
create_incident returns incident_id, status=triggered, and the assigned on-call user
Hint 2
send_alert formats a Slack message with action buttons: Acknowledge and View Details
Hint 3
get_on_call looks up who is currently on-call for a team at a specific time
Hint 4
Escalation: after timeout with no response, escalate to the next level in the policy (both lookups are sketched after these hints)
Hint 5
notification_id in Slack response should be unique per message
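
Hints 3 and 4 both reduce to plain data plus a lookup. Here is one possible sketch, assuming a weekly rotation indexed by ISO week number and a fixed per-team escalation chain; the team name, schedule, and addresses are invented for illustration.

from datetime import datetime, timezone

# Hypothetical weekly rotation: index into the list by ISO week number.
ROTATIONS = {
    "platform": ["alice@example.com", "bob@example.com", "carol@example.com"],
}

# Hypothetical escalation chain: level 1 pages on-call, level 2 the lead, and so on.
ESCALATION_POLICY = {
    "platform": ["on-call-engineer", "team-lead@example.com", "eng-director@example.com"],
}

def get_on_call(team, at):
    # Return who is on-call for `team` during the week containing `at`.
    rotation = ROTATIONS[team]
    return rotation[at.isocalendar()[1] % len(rotation)]

def next_escalation(team, current_level):
    # After a timeout with no acknowledgement, return (next_level, contact).
    chain = ESCALATION_POLICY[team]
    to_level = min(current_level + 1, len(chain))
    return to_level, chain[to_level - 1]

print(get_on_call("platform", datetime.now(timezone.utc)))
print(next_escalation("platform", 1))  # -> (2, 'team-lead@example.com')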
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

PagerDuty, Slack, on-call rotation, escalation policy, incident lifecycle