TASK
Implementation
Distributed aggregations require computing partial aggregates on each shard, then merging partial results at the coordinator. Different aggregation functions have different merge strategies.
COUNT aggregation:
- Shard:
SELECT COUNT(*) FROM users WHERE age > 25→{"count": 150} - Coordinator: sum all shard counts → total_count = 150 + 200 + 175 = 525
SUM aggregation:
- Shard:
SELECT SUM(price) FROM orders→{"sum": 15000.50} - Coordinator: sum all shard sums → total_sum = 15000.50 + 20000.75 + 17500.25
AVG aggregation:
- Shard:
SELECT AVG(rating), COUNT(*) FROM reviews→{"sum": 450.5, "count": 100} - Coordinator: compute
global_avg = total_sum / total_count - Cannot simply average the averages! Must weight by count.
Example query:
Request: {"type": "agg_query", "msg_id": 1, "agg_type": "SUM", "field": "price", "table": "orders"}
Response: {"type": "agg_query_ok", "in_reply_to": 1, "result": 52501.50, "shards_responded": 3}AVG query example:
Request: {"type": "agg_query", "msg_id": 2, "agg_type": "AVG", "field": "rating", "table": "reviews"}
Response: {"type": "agg_query_ok", "in_reply_to": 2, "result": 4.35, "shards_responded": 3}Sample Test Cases
SUM aggregation across shardsTimeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}
{"src":"c1","dest":"coord","body":{"type":"agg_query","msg_id":2,"agg_type":"SUM","field":"price","table":"orders"}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
COUNT aggregation across shardsTimeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2"]}}
{"src":"c1","dest":"coord","body":{"type":"agg_query","msg_id":2,"agg_type":"COUNT","field":"*","table":"users"}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
COUNT: each shard returns a count, coordinator sums all counts
Hint 2▾
SUM: each shard returns a sum, coordinator sums all sums
Hint 3▾
AVG: each shard returns (sum, count), coordinator computes total_sum / total_count
Hint 4▾
MIN/MAX: each shard returns its min/max, coordinator takes the global min/max
Hint 5▾
Be careful with COUNT(DISTINCT): cannot simply sum counts
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
distributed aggregationspartial aggregatesCOUNTSUMAVGmerge functionsalgebraic properties
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()