TASK
Implementation
Implementing ORDER BY score DESC LIMIT 10 in a distributed system requires each shard to return its local top results, then the coordinator merges them to find the global top results.
Naive approach (wrong):
- Each shard returns its top 10 results
- Coordinator picks the top 10 from the combined 30
- Problem: if one shard has the top 100 highest scores, we miss results 11-100!
Correct approach:
- Each shard returns its top (LIMIT * safety_factor) results, e.g., top 30
- Coordinator merges all partial results using a priority queue
- Coordinator returns the global top 10
Example query:
Request: {"type": "top_n_query", "msg_id": 1, "table": "scores", "order_by": "score", "order": "DESC", "limit": 10}
Response: {"type": "top_n_query_ok", "in_reply_to": 1, "results": [...], "total_candidates": 90}Where total_candidates is the sum of candidates from all shards (e.g., 30 per shard × 3 shards = 90).
Handling ties:
When two users have the same score, we need consistent ordering. Use a composite sort key:
function compare(a, b) {
if (a.score !== b.score) return b.score - a.score; // DESC by score
return a.user_id.localeCompare(b.user_id); // ASC by user_id for ties
}Pagination:
For page 2 (LIMIT 10 OFFSET 10), each shard returns top 20 results, and the coordinator merges and returns results 11-20.
Implementation:
- Send
top_n_queryto all shards withlimit * safety_factor - Each shard sorts its local data and returns top N results
- Coordinator merges using a min-heap of size
limit - Coordinator returns the global top
limitresults
Sample Test Cases
Top 10 scores across 3 shardsTimeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}
{"src":"c1","dest":"coord","body":{"type":"top_n_query","msg_id":2,"table":"scores","order_by":"score","order":"DESC","limit":10}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Top 10 with ties (consistent ordering)Timeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2"]}}
{"src":"c1","dest":"coord","body":{"type":"top_n_query","msg_id":2,"table":"scores","order_by":"score","order":"DESC","limit":10}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Each shard returns its top K results (where K = LIMIT * safety_factor)
Hint 2▾
Coordinator merges all partial results using a priority queue (min-heap)
Hint 3▾
Coordinator returns the global top K results
Hint 4▾
Use a safety_factor (e.g., 2-3x) to account for uneven distribution
Hint 5▾
Handle ties consistently: use (score, user_id) as the sort key
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
distributed sortingtop-N querymerge sorttie handlingpaginationconsistent ordering
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()