ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/tracer/tasks/task-23-2-3-metrics-aggregation
TASK

Implementation

Individual data points are too granular for dashboards and alerting. Aggregation reduces them to meaningful summaries: totals across services, averages across instances, and time rollups that compress many data points into periodic buckets.

Implement a node that performs metric aggregations:

// Sum requests across services, grouped by service name
{ "type": "aggregate_metric", "msg_id": 1,
  "metric": "http_requests_total", "aggregator": "sum",
  "group_by": ["service"],
  "services": ["api","web","auth"], "time_range": "1h" }
-> { "type": "aggregation_result", "in_reply_to": 1,
    "results": [{"service":"api","value":50000},
                 {"service":"web","value":30000},
                 {"service":"auth","value":15000}],
    "total": 95000 }

// Average memory across three instances
{ "type": "aggregate_metric", "msg_id": 2,
  "metric": "memory_usage_bytes", "aggregator": "avg",
  "values": [1073741824, 2147483648, 1610612736] }
-> { "type": "aggregation_result", "in_reply_to": 2,
    "value": 1610612736, "unit": "bytes", "sample_count": 3 }
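
A minimal sketch of how a node might build these two replies. The task does not say where the per-service data comes from, so the in-memory `STORE`, its seeded numbers (copied from the example reply above), the function name `aggregate`, and the rule that a `_bytes` suffix implies `"unit": "bytes"` are all assumptions made for illustration.

```python
# Hypothetical in-memory store: metric -> service -> data points in the queried window.
# The numbers are copied from the example reply above so the sketch reproduces it.
STORE = {
    "http_requests_total": {
        "api": [50000],
        "web": [30000],
        "auth": [15000],
    }
}


def aggregate(body, store=STORE):
    """Build an aggregation_result body for an aggregate_metric request body."""
    reply = {"type": "aggregation_result", "in_reply_to": body["msg_id"]}
    aggregator = body["aggregator"]

    if aggregator == "sum" and "group_by" in body:
        per_service = store.get(body["metric"], {})
        # One result row per requested service, in request order.
        results = [
            {"service": name, "value": sum(per_service.get(name, []))}
            for name in body["services"]
        ]
        reply["results"] = results
        # The grand total is the sum over all groups.
        reply["total"] = sum(row["value"] for row in results)
    elif aggregator == "avg":
        values = body["values"]
        if not values:
            raise ValueError("avg needs at least one value")
        mean = sum(values) / len(values)
        # Report whole-number means as ints, matching the example reply.
        reply["value"] = int(mean) if mean.is_integer() else mean
        # Assumption: the unit is inferred from the metric name suffix.
        if body["metric"].endswith("_bytes"):
            reply["unit"] = "bytes"
        reply["sample_count"] = len(values)
    else:
        raise ValueError(f"unsupported aggregator: {aggregator}")
    return reply
```

Serializing the returned dict with `json.dumps` keeps the keys in the order they were inserted, which lines up with the key order shown in the example replies.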

Sample Test Cases

Sum aggregation across services
Timeout: 5000ms
Input
{
  "src": "query",
  "dest": "metrics",
  "body": {
    "type": "aggregate_metric",
    "msg_id": 1,
    "metric": "http_requests_total",
    "aggregator": "sum",
    "group_by": [
      "service"
    ],
    "services": [
      "api",
      "web",
      "auth"
    ],
    "time_range": "1h"
  }
}
Expected Output
{"type": "aggregation_result", "in_reply_to": 1, "results": [{"service": "api", "value": 50000}, {"service": "web", "value": 30000}, {"service": "auth", "value": 15000}], "total": 95000}
Average aggregation
Timeout: 5000ms
Input
{
  "src": "query",
  "dest": "metrics",
  "body": {
    "type": "aggregate_metric",
    "msg_id": 1,
    "metric": "memory_usage_bytes",
    "aggregator": "avg",
    "instances": [
      "api-1",
      "api-2",
      "api-3"
    ],
    "values": [
      1073741824,
      2147483648,
      1610612736
    ]
  }
}
Expected Output
{"type": "aggregation_result", "in_reply_to": 1, "value": 1610612736, "unit": "bytes", "sample_count": 3}

Hints

Hint 1
sum: add all values across services or instances
Hint 2
avg: total sum / count of values
Hint 3
rollup: group data points by time interval, then compute aggregations within each bucket
Hint 4
p95: sort all values in ascending order, return the one at zero-based index floor(0.95 * count)
Hint 5
total in a sum aggregation is the grand total across all groups
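
Hints 3 and 4 can be sketched as two small helpers. The data point shape (`(timestamp_seconds, value)` tuples), the function names, and the output field names `bucket_start`, `value`, and `sample_count` on each bucket are assumptions; the task does not define a rollup message format. The percentile follows the hint's index rule literally, which is only one of several common percentile definitions.

```python
from collections import defaultdict


def p95(values):
    """Return the value at zero-based index floor(0.95 * count) of the sorted values."""
    if not values:
        raise ValueError("p95 of an empty list is undefined")
    ordered = sorted(values)
    # Integer arithmetic computes floor(0.95 * count) without float rounding.
    index = (95 * len(ordered)) // 100
    return ordered[index]


def rollup(points, interval_seconds, aggregator=sum):
    """Group (timestamp, value) points into fixed-width buckets and aggregate each.

    Each bucket starts at a multiple of interval_seconds and covers
    [bucket_start, bucket_start + interval_seconds).
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)
    # Emit buckets in chronological order.
    return [
        {
            "bucket_start": start,
            "value": aggregator(bucket_values),
            "sample_count": len(bucket_values),
        }
        for start, bucket_values in sorted(buckets.items())
    ]
```

For example, `rollup([(0, 1), (30, 2), (60, 3)], 60)` places the first two points in the bucket starting at 0 and the third in the bucket starting at 60. Passing `aggregator=p95` (or `max`, `min`) computes a different summary per bucket with the same grouping.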

Key Concepts

aggregation, rollup, sum, average, percentile, time buckets
Implement Metrics Aggregation and Rollups - The Tracer | Build Distributed Systems