Implement Service Mesh Observability - The Orchestrator

Implementation

Observability in a service mesh means collecting metrics, traces, and logs from every sidecar proxy automatically, without changing application code. The three pillars together let you understand what is happening and diagnose problems quickly.

Implement a node that handles all three observability signals:

// Record a request metric (latency + status code)
{ "type": "record", "msg_id": 1,
  "service": "api", "duration_ms": 150, "status": 200 }
-> { "type": "metrics_recorded", "in_reply_to": 1,
    "service": "api", "request_count": 1 }

// Create a distributed trace with a span
{ "type": "trace", "msg_id": 2,
  "operation": "GET /api/users" }
-> { "type": "trace_created", "in_reply_to": 2,
    "trace_id": "<uuid>", "span_id": "<uuid>" }

// Query access logs by service
{ "type": "query", "msg_id": 3,
  "filter": {"source_service": "api", "target_service": "database"} }
-> { "type": "access_logs", "in_reply_to": 3,
    "logs": [{"source_service": "api", "target_service": "database", "count": 50}] }

// Generate service dependency graph
{ "type": "generate", "msg_id": 4 }
-> { "type": "service_graph", "in_reply_to": 4,
    "nodes": ["api", "database", "cache"],
    "edges": [{"source": "api", "target": "database", "request_count": 500}] }

Sample Test Cases

Collect service metricsTimeout: 5000ms

Input

{
  "src": "sidecar",
  "dest": "metrics",
  "body": {
    "type": "record",
    "msg_id": 1,
    "service": "api",
    "duration_ms": 150,
    "status": 200
  }
}

Expected Output

{"type": "metrics_recorded", "in_reply_to": 1, "service": "api", "request_count": 1}

Create distributed traceTimeout: 5000ms

Input

{
  "src": "service-a",
  "dest": "tracer",
  "body": {
    "type": "trace",
    "msg_id": 1,
    "operation": "GET /api/users"
  }
}

Expected Output

{"type": "trace_created", "in_reply_to": 1, "trace_id": ".*", "span_id": ".*"}

Hints

Hint 1▾

record stores one request metric: service name, duration_ms, and HTTP status code

Hint 2▾

trace creates a trace with a unique trace_id and a span for the given operation

Hint 3▾

query returns access logs filtered by source_service and/or target_service

Hint 4▾

generate builds a service graph: nodes are service names, edges are (source, target, count) pairs

Hint 5▾

request_count increments by 1 for every record call for that service

Implementation

Sample Test Cases

Hints

Resources

Theoretical Hub

Key Concepts