ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/tracer/tasks/task-23-1-3-trace-collector
TASK

Implementation

A trace collector receives spans from many services, groups them by trace ID, and stores the assembled traces. It also applies sampling to reduce storage volume and gracefully handles spans that arrive after a trace was already closed.

Implement a node that manages span collection and trace assembly:

// Spans from two services assembled into one trace
{ "type": "span", "trace_id": "t1", "span_id": "s1", "service": "A" }
{ "type": "span", "trace_id": "t1", "span_id": "s2", "service": "B",
  "parent_span_id": "s1" }
-> { "type": "trace_complete", "trace_id": "t1", "span_count": 2 }
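
The grouping step in the example above can be sketched as follows. This is a minimal sketch, not the reference solution: the `TraceAssembler` class and its method names are hypothetical, and because the task does not say how the collector learns that a trace has received all its spans, completion is assumed to be triggered by the caller (for example at end of input or after a timeout).

```python
from collections import defaultdict


class TraceAssembler:
    """Groups spans by trace_id (hypothetical helper, not part of the task API)."""

    def __init__(self):
        # trace_id -> {span_id: span}; keying by span_id means a span
        # delivered twice is only counted once.
        self.spans = defaultdict(dict)

    def add_span(self, span):
        """Store one span and return the current span count for its trace."""
        trace = self.spans[span["trace_id"]]
        trace[span["span_id"]] = span
        return len(trace)

    def complete(self, trace_id):
        """Build the trace_complete message.

        Assumption: the caller decides when a trace is complete.
        """
        return {
            "type": "trace_complete",
            "trace_id": trace_id,
            "span_count": len(self.spans[trace_id]),
        }


assembler = TraceAssembler()
assembler.add_span({"type": "span", "trace_id": "t1", "span_id": "s1", "service": "A"})
assembler.add_span({"type": "span", "trace_id": "t1", "span_id": "s2", "service": "B",
                    "parent_span_id": "s1"})
result = assembler.complete("t1")
```

Here `result` has the same shape as the `trace_complete` message in the example, with `span_count` equal to the number of distinct span IDs seen for `t1`.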

// 1% sampling: reject most traces
{ "type": "span", "trace_id": "t2", "span_id": "s3", "service": "fast" }
(sampling_rate: 0.01)
-> { "type": "span_accepted",
    "sampled": false, "reason": "Trace not sampled (1% rate)" }
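
One way to make the sampling decision consistent per trace is to hash the trace ID into a bucket from 0 to 99 and compare it with the rate. The sketch below assumes SHA-256 as the hash, since Python's built-in `hash()` for strings is randomized per process by default and would not give the same answer across runs. The function names `is_sampled`, `not_sampled_reason`, and `rejection` are hypothetical.

```python
import hashlib


def is_sampled(trace_id: str, sampling_rate: float) -> bool:
    """Return the same keep/drop decision for every span of a trace."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < sampling_rate * 100


def not_sampled_reason(sampling_rate: float) -> str:
    """Format the reason string used in the example response."""
    return f"Trace not sampled ({sampling_rate * 100:g}% rate)"


def rejection(sampling_rate: float) -> dict:
    """Build the response for a span whose trace was not sampled."""
    return {
        "type": "span_accepted",
        "sampled": False,
        "reason": not_sampled_reason(sampling_rate),
    }
```

Because the bucket depends only on `trace_id`, every span of a given trace gets the same decision no matter which service sent it or in what order the spans arrive.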

// Query traces by service
{ "type": "query_traces", "msg_id": 1,
  "service": "service-a", "time_range": "1h" }
-> { "type": "query_results", "in_reply_to": 1,
    "traces": 125, "avg_duration_ms": 45 }
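
A query handler along these lines could produce the `query_results` reply. This is only a sketch under stated assumptions: the span messages shown in the task carry no timing fields, so the per-trace `duration_ms` value and the `services` set are a hypothetical storage layout, and filtering by `time_range` is left out for the same reason.

```python
def query_traces(traces, service, msg_id):
    """Count stored traces that touch `service` and average their duration.

    Assumed layout: traces maps trace_id -> {"services": set, "duration_ms": number}.
    """
    matching = [t for t in traces.values() if service in t["services"]]
    total = sum(t["duration_ms"] for t in matching)
    # Avoid dividing by zero when no trace matches.
    average = total / len(matching) if matching else 0
    return {
        "type": "query_results",
        "in_reply_to": msg_id,
        "traces": len(matching),
        "avg_duration_ms": average,
    }


stored = {
    "t1": {"services": {"service-a", "service-b"}, "duration_ms": 40},
    "t2": {"services": {"service-a"}, "duration_ms": 50},
    "t3": {"services": {"service-c"}, "duration_ms": 900},
}
reply = query_traces(stored, "service-a", 1)
```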

A late span for an already-completed trace is not dropped: the collector updates the stored trace and responds with "action": "update_trace".
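
Late-span handling can be sketched by remembering which traces have already been completed. The `LateSpanTracker` class is hypothetical, and the task only specifies the `"action": "update_trace"` field, so the other response fields and the `None` return for on-time spans are assumptions made for illustration.

```python
class LateSpanTracker:
    """Tracks span counts and flags spans that arrive after completion."""

    def __init__(self):
        self.span_counts = {}   # trace_id -> number of spans received
        self.completed = set()  # trace_ids already reported as complete

    def mark_complete(self, trace_id):
        self.completed.add(trace_id)

    def on_span(self, span):
        """Count the span; return an update message only if it is late."""
        trace_id = span["trace_id"]
        self.span_counts[trace_id] = self.span_counts.get(trace_id, 0) + 1
        if trace_id in self.completed:
            return {
                "action": "update_trace",
                "trace_id": trace_id,                        # assumed field
                "span_count": self.span_counts[trace_id],    # assumed field
            }
        return None  # on-time span: nothing special to report


tracker = LateSpanTracker()
tracker.on_span({"trace_id": "t1", "span_id": "s1"})
tracker.on_span({"trace_id": "t1", "span_id": "s2"})
tracker.mark_complete("t1")
late = tracker.on_span({"trace_id": "t1", "span_id": "s3"})
```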

Sample Test Cases

Aggregate spans into traces
Timeout: 5000ms
Input
{"src":"service_a","dest":"collector","body":{"type":"span","trace_id":"t1","span_id":"s1","service":"A"}}
{"src":"service_b","dest":"collector","body":{"type":"span","trace_id":"t1","span_id":"s2","service":"B","parent_span_id":"s1"}}
Expected Output
{"type": "trace_complete", "trace_id": "t1", "span_count": 2}
Trace sampling
Timeout: 5000ms
Input
{
  "src": "service",
  "dest": "collector",
  "body": {
    "type": "span",
    "trace_id": "t2",
    "span_id": "s3",
    "service": "fast"
  },
  "sampling_rate": 0.01
}
Expected Output
{"type": "span_accepted", "sampled": false, "reason": "Trace not sampled (1% rate)"}
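
Both test inputs wrap the span in an envelope with `src`, `dest`, and `body`, and the second one places `sampling_rate` on the envelope rather than inside `body`. A small parsing helper might look like the sketch below. The name `parse_message` and the default rate of 1.0 (keep everything when no rate is given) are assumptions, not something the task states.

```python
import json


def parse_message(text, default_rate=1.0):
    """Split an envelope into (src, body, sampling_rate).

    Assumption: sampling_rate may sit on the envelope or in the body;
    when absent, default_rate is used.
    """
    envelope = json.loads(text)
    body = envelope["body"]
    rate = envelope.get("sampling_rate", body.get("sampling_rate", default_rate))
    return envelope["src"], body, rate


line = ('{"src":"service","dest":"collector",'
        '"body":{"type":"span","trace_id":"t2","span_id":"s3","service":"fast"},'
        '"sampling_rate":0.01}')
src, body, rate = parse_message(line)
```

`json.loads` accepts the pretty-printed form of the second test input as well, as long as the whole object is passed in as one string.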

Hints

Hint 1
Group spans by trace_id; emit trace_complete when a trace has received all its spans
Hint 2
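Sampling: hash(trace_id) % 100 < (sampling_rate * 100) decides consistently per trace, provided the hash is stable across runs (Python's built-in hash() for strings is randomized per process by default, so use a deterministic hash such as one from hashlib)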
Hint 3
query_traces filters by service and returns the number of matching traces and their average duration
Hint 4
Late spans for an already-completed trace should update it rather than be dropped
Hint 5
span_count increments with each new span for the same trace_id
OVERVIEW

Key Concepts

trace collector, span aggregation, trace sampling, late spans, trace queries
Implement Distributed Trace Collector - The Tracer | Build Distributed Systems