TASK
Implementation
After the map phase, all values for the same key must reach the same reducer. The shuffle phase does exactly this: it partitions map outputs by key hash, groups values for each key, and optionally combines them locally to reduce network traffic.
Your node handles three shuffle operations:
// Partition pairs across N reducers using hash(key) % N
{ "type": "partition", "msg_id": 1,
"pairs": [["hello",1],["world",1],["hello",1]],
"num_reducers": 2 }
→ { "type": "partitioned", "in_reply_to": 1,
"partitions": [
{"reducer_id": 0, "pairs": [["world",1]]},
{"reducer_id": 1, "pairs": [["hello",2]]}
]}
// Group pairs by key (returns key → [values] map)
{ "type": "group", "msg_id": 2,
"pairs": [["hello",1],["world",1],["hello",1]] }
→ { "type": "grouped", "in_reply_to": 2,
"groups": {"hello": [1,1], "world": [1]} }
// Combine (local pre-reduce): sum values per key before sending over network
{ "type": "combine", "msg_id": 3,
"pairs": [["hello",1],["world",1],["hello",1]] }
→ { "type": "combined", "in_reply_to": 3,
"pairs": [["hello",2],["world",1]] }For partition, after assigning pairs to reducer buckets, also group and sort keys alphabetically within each partition before returning.
Sample Test Cases
Partition map outputsTimeout: 5000ms
Input
{
"src": "shuffler",
"dest": "partitioner",
"body": {
"type": "partition",
"msg_id": 1,
"pairs": [
[
"hello",
1
],
[
"world",
1
],
[
"hello",
1
]
],
"num_reducers": 2
}
}Expected Output
{"type": "partitioned", "in_reply_to": 1, "partitions": [{"reducer_id": 0, "pairs": [["world", 1]]}, {"reducer_id": 1, "pairs": [["hello", 2]]}]}Group by keyTimeout: 5000ms
Input
{
"src": "shuffler",
"dest": "grouper",
"body": {
"type": "group",
"msg_id": 1,
"pairs": [
[
"hello",
1
],
[
"world",
1
],
[
"hello",
1
]
]
}
}Expected Output
{"type": "grouped", "in_reply_to": 1, "groups": {"hello": [1, 1], "world": [1]}}Hints
Hint 1▾
Partition key: reducer_id = hash(key) % num_reducers
Hint 2▾
Use a simple string hash: sum of char codes, then abs() % num_reducers
Hint 3▾
group collects all values for the same key before reduce
Hint 4▾
combine is a local pre-reduce: sum values for the same key on the mapper side
Hint 5▾
sort keys alphabetically within each partition
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
shuffle phasehash partitioningkey groupingcombinerreduce assignment
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()