Implement Metrics Aggregation and Rollups - The Tracer

Implementation

Individual data points are too granular for dashboards and alerting. Aggregation reduces them to meaningful summaries: totals across services, averages across instances, and time rollups that compress many data points into periodic buckets.

Implement a node that performs metric aggregations:

// Sum requests across services, grouped by service name
{ "type": "aggregate_metric", "msg_id": 1,
  "metric": "http_requests_total", "aggregator": "sum",
  "group_by": ["service"],
  "services": ["api","web","auth"], "time_range": "1h" }
-> { "type": "aggregation_result", "in_reply_to": 1,
    "results": [{"service":"api","value":50000},
                 {"service":"web","value":30000},
                 {"service":"auth","value":15000}],
    "total": 95000 }

// Average memory across three instances
{ "type": "aggregate_metric", "msg_id": 2,
  "metric": "memory_usage_bytes", "aggregator": "avg",
  "values": [1073741824, 2147483648, 1610612736] }
-> { "type": "aggregation_result", "in_reply_to": 2,
    "value": 1610612736, "unit": "bytes", "sample_count": 3 }
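The average exchange above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference solution: `average_reply` and `unit_from_metric` are hypothetical names, and inferring the unit from the metric name's suffix is an assumption, since the challenge does not say where "unit" comes from.

```python
def unit_from_metric(metric):
    """Assumption: the unit is the metric name's last '_' segment, if recognized."""
    suffix = metric.rsplit("_", 1)[-1]
    return suffix if suffix in {"bytes", "seconds"} else None


def average_reply(body):
    """Build an aggregation_result body for an 'avg' request carrying inline values."""
    values = body["values"]
    if not values:
        raise ValueError("avg needs at least one value")
    mean = sum(values) / len(values)
    # Emit whole-number means as ints so they serialize as 1610612736, not 1610612736.0.
    value = int(mean) if mean.is_integer() else mean
    reply = {
        "type": "aggregation_result",
        "in_reply_to": body["msg_id"],
        "value": value,
    }
    unit = unit_from_metric(body["metric"])
    if unit is not None:
        reply["unit"] = unit
    reply["sample_count"] = len(values)
    return reply
```

Converting whole-number means back to integers matters only because the expected output is compared as JSON text; a float mean would otherwise be printed with a trailing `.0`.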

Sample Test Cases

Sum aggregation across services (Timeout: 5000ms)

Input

{
  "src": "query",
  "dest": "metrics",
  "body": {
    "type": "aggregate_metric",
    "msg_id": 1,
    "metric": "http_requests_total",
    "aggregator": "sum",
    "group_by": [
      "service"
    ],
    "services": [
      "api",
      "web",
      "auth"
    ],
    "time_range": "1h"
  }
}

Expected Output

{"type": "aggregation_result", "in_reply_to": 1, "results": [{"service": "api", "value": 50000}, {"service": "web", "value": 30000}, {"service": "auth", "value": 15000}], "total": 95000}

Average aggregation (Timeout: 5000ms)

Input

{
  "src": "query",
  "dest": "metrics",
  "body": {
    "type": "aggregate_metric",
    "msg_id": 1,
    "metric": "memory_usage_bytes",
    "aggregator": "avg",
    "instances": [
      "api-1",
      "api-2",
      "api-3"
    ],
    "values": [
      1073741824,
      2147483648,
      1610612736
    ]
  }
}

Expected Output

{"type": "aggregation_result", "in_reply_to": 1, "value": 1610612736, "unit": "bytes", "sample_count": 3}

Hints

Hint 1

sum: add all values across services or instances

Hint 2

avg: total sum / count of values

Hint 3

rollup: group data points by time interval, then compute aggregations within each bucket
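A minimal sketch of this bucketing, assuming each data point is a hypothetical `(timestamp_seconds, value)` pair and that buckets are aligned to multiples of the interval. The challenge does not show the rollup message shape, so only the core computation appears here, and the `bucket_start` field name is illustrative.

```python
from collections import defaultdict


def rollup(points, interval_seconds, aggregator="sum"):
    """Group (timestamp, value) points into fixed buckets and aggregate each one."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket that contains it.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)

    aggregate = {
        "sum": sum,
        "avg": lambda vs: sum(vs) / len(vs),
    }[aggregator]
    # Return one entry per non-empty bucket, in time order.
    return [
        {"bucket_start": start, "value": aggregate(values)}
        for start, values in sorted(buckets.items())
    ]
```

For example, with a 60-second interval, points at timestamps 60 and 119 share the bucket starting at 60, while a point at 120 starts a new bucket.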

Hint 4

p95: sort all values in ascending order, then return the one at zero-based index floor(0.95 * count)
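The p95 rule in this hint can be sketched as below. The function name `percentile` and the `fraction` parameter are hypothetical; the clamp on the index is an added safeguard for `fraction=1.0` and is not part of the hint.

```python
import math


def percentile(values, fraction=0.95):
    """Return the element at index floor(fraction * count) of the sorted values."""
    if not values:
        raise ValueError("percentile needs at least one value")
    ordered = sorted(values)
    index = math.floor(fraction * len(ordered))
    # Clamp so that fraction=1.0 cannot index past the end of the list.
    index = min(index, len(ordered) - 1)
    return ordered[index]
```

With ten samples the index is floor(9.5) = 9, so this rule returns the largest value; small sample counts make p95 behave like a maximum.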

Hint 5

total in a sum aggregation is the grand total across all groups
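The grouped sum and its grand total might be computed as in the sketch below. The in-memory `points` list and its field names are assumptions for illustration, since the challenge does not show how the node obtains its data, and `sum_by_service` is a hypothetical name. Groups are emitted in the order given by `services`, which matches the sample output.

```python
def sum_by_service(points, services, msg_id):
    """Sum point values per service and report the grand total across groups."""
    totals = {service: 0 for service in services}
    for point in points:
        service = point["service"]
        if service in totals:  # ignore services that were not requested
            totals[service] += point["value"]

    results = [{"service": s, "value": totals[s]} for s in services]
    return {
        "type": "aggregation_result",
        "in_reply_to": msg_id,
        "results": results,
        # Grand total across all requested groups, not a per-group value.
        "total": sum(totals.values()),
    }
```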
