TASK
Implementation
Scatter-gather is a fundamental distributed query execution pattern. The coordinator "scatters" a query to all shards, each shard processes its local data, and the coordinator "gathers" partial results into a final response.
Query execution flow:
- Client sends a query to the coordinator
- Coordinator forwards the query to all known shards
- Each shard executes the query on its local data
- Each shard returns partial results to the coordinator
- Coordinator merges all partial results into a complete response
- Coordinator returns the merged response to the client
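The flow above can be sketched in a few lines. This is a minimal illustration, not the expected solution: the in-memory `SHARD_DATA` table, the `run_on_shard` helper, and the use of a Python predicate in place of a SQL string are all assumptions made to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards: each shard id maps to its local rows.
SHARD_DATA = {
    "s1": [{"name": "ann", "age": 31}],
    "s2": [{"name": "bob", "age": 19}, {"name": "cy", "age": 42}],
    "s3": [{"name": "dee", "age": 27}],
}


def run_on_shard(shard_id, predicate):
    """Execute the 'query' (here, a row predicate) on one shard's local data."""
    return [row for row in SHARD_DATA[shard_id] if predicate(row)]


def scatter_gather(shard_ids, predicate):
    """Scatter the predicate to every shard in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        # pool.map returns results in the same order as shard_ids.
        partials = list(pool.map(lambda s: run_on_shard(s, predicate), shard_ids))
    merged = [row for partial in partials for row in partial]
    return {
        "results": merged,
        "shards_total": len(shard_ids),
        "shards_responded": len(partials),
    }


response = scatter_gather(["s1", "s2", "s3"], lambda row: row["age"] > 25)
print(response)
```

Merging here is plain concatenation; a query with ordering or aggregation would need a merge step that understands it.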
Handling partial failures:
- Set a timeout for each shard response (e.g., 1000ms)
- If a shard times out, exclude its results but continue with other shards
- Track which shards responded successfully
- Return a "shards_responded" count so the client knows if results are complete
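One way to apply these rules is a per-shard timeout with `asyncio.wait_for`. The sketch below assumes shards can be modeled as coroutines; `query_shard`, `gather_with_timeout`, and the simulated delays are hypothetical stand-ins for real network calls.

```python
import asyncio


async def query_shard(shard_id, delay_s, rows):
    """Stand-in for a network call: the shard 'responds' after delay_s seconds."""
    await asyncio.sleep(delay_s)
    return rows


async def gather_with_timeout(shard_calls, timeout_s):
    """Await every shard concurrently and drop any that exceed timeout_s."""

    async def guarded(shard_id, coro):
        try:
            return shard_id, await asyncio.wait_for(coro, timeout=timeout_s)
        except asyncio.TimeoutError:
            return shard_id, None  # timed out: exclude this shard's results

    outcomes = await asyncio.gather(
        *(guarded(shard_id, coro) for shard_id, coro in shard_calls.items())
    )
    responded = [shard_id for shard_id, rows in outcomes if rows is not None]
    results = [row for _, rows in outcomes if rows is not None for row in rows]
    response = {
        "results": results,
        "shards_total": len(shard_calls),
        "shards_responded": len(responded),
    }
    return response, responded


async def demo():
    calls = {
        "s1": query_shard("s1", 0.01, [{"id": 1}]),
        "s2": query_shard("s2", 1.0, [{"id": 2}]),  # slower than the timeout
        "s3": query_shard("s3", 0.01, [{"id": 3}]),
    }
    return await gather_with_timeout(calls, timeout_s=0.1)


response, responded = asyncio.run(demo())
print(response, responded)
```

Because each shard has its own timeout, one slow shard delays the response by at most `timeout_s` and never blocks the others.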
Example query:
Request: {"type": "scatter_query", "msg_id": 1, "query": "SELECT * FROM users WHERE age > 25"}
Response: {"type": "scatter_query_ok", "in_reply_to": 1, "results": [...], "shards_total": 3, "shards_responded": 3}
If shard 2 is down:
Response: {"type": "scatter_query_ok", "in_reply_to": 1, "results": [...], "shards_total": 3, "shards_responded": 2}
Sample Test Cases
All shards respond successfully
Timeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}
{"src":"c1","dest":"coord","body":{"type":"scatter_query","msg_id":2,"query":"SELECT * FROM users"}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
One shard times out
Timeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}
{"src":"c1","dest":"coord","body":{"type":"scatter_query","msg_id":2,"query":"SELECT * FROM users","timeout_ms":500}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
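The `init` exchange in the sample cases can be handled with a small reply helper. This is a sketch under assumptions: the `Coordinator` class and its method names are hypothetical, and the reply counter starting at 0 is inferred from the `"msg_id": 0` in the expected output rather than stated by the task.

```python
import json


class Coordinator:
    """Hypothetical coordinator state: known shards and a reply counter."""

    def __init__(self):
        self.shards = []
        self.next_msg_id = 0  # assumption: replies are numbered from 0

    def reply(self, request, body):
        """Wrap a reply body in the src/dest/body envelope used by the test cases."""
        out = {
            "src": request["dest"],
            "dest": request["src"],
            "body": {
                **body,
                "in_reply_to": request["body"]["msg_id"],
                "msg_id": self.next_msg_id,
            },
        }
        self.next_msg_id += 1
        return out

    def handle(self, request):
        """Handle 'init' by recording the shard list; other types are left out here."""
        if request["body"]["type"] == "init":
            self.shards = list(request["body"]["shards"])
            return self.reply(request, {"type": "init_ok"})
        return None


coord = Coordinator()
init_line = '{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}'
init_reply = coord.handle(json.loads(init_line))
print(json.dumps(init_reply))
```

Handling `scatter_query` is deliberately omitted, since the sample cases only show the `init_ok` output.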
Hints
Hint 1
The coordinator sends the query to all shards in parallel.
Hint 2
Each shard executes the query locally and returns partial results.
Hint 3
The coordinator merges partial results into a final response.
Hint 4
Use timeouts: if a shard doesn't respond within T ms, proceed without it.
Hint 5
Track which shards responded: include a "shards_responded" field in the response.
OVERVIEW
Key Concepts
scatter-gather, query coordinator, partial results, timeout handling, fault tolerance
main.py
#!/usr/bin/env python3
import sys
import json


def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)


if __name__ == "__main__":
    main()