ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/mapreducer/tasks/task-28-1-3-shuffle-phase
TASK

Implementation

After the map phase, all values for the same key must reach the same reducer. The shuffle phase does exactly this: it partitions map outputs by key hash, groups values for each key, and optionally combines them locally to reduce network traffic.

Your node handles three shuffle operations:

// Partition pairs across N reducers using hash(key) % N
{ "type": "partition", "msg_id": 1,
  "pairs": [["hello",1],["world",1],["hello",1]],
  "num_reducers": 2 }{ "type": "partitioned", "in_reply_to": 1,
    "partitions": [
      {"reducer_id": 0, "pairs": [["world",1]]},
      {"reducer_id": 1, "pairs": [["hello",2]]}
    ]}

// Group pairs by key (returns key → [values] map)
{ "type": "group", "msg_id": 2,
  "pairs": [["hello",1],["world",1],["hello",1]] }{ "type": "grouped", "in_reply_to": 2,
    "groups": {"hello": [1,1], "world": [1]} }

// Combine (local pre-reduce): sum values per key before sending over network
{ "type": "combine", "msg_id": 3,
  "pairs": [["hello",1],["world",1],["hello",1]] }{ "type": "combined", "in_reply_to": 3,
    "pairs": [["hello",2],["world",1]] }

For partition, after assigning pairs to reducer buckets, also group and sort keys alphabetically within each partition before returning.

Sample Test Cases

Partition map outputsTimeout: 5000ms
Input
{
  "src": "shuffler",
  "dest": "partitioner",
  "body": {
    "type": "partition",
    "msg_id": 1,
    "pairs": [
      [
        "hello",
        1
      ],
      [
        "world",
        1
      ],
      [
        "hello",
        1
      ]
    ],
    "num_reducers": 2
  }
}
Expected Output
{"type": "partitioned", "in_reply_to": 1, "partitions": [{"reducer_id": 0, "pairs": [["world", 1]]}, {"reducer_id": 1, "pairs": [["hello", 2]]}]}
Group by keyTimeout: 5000ms
Input
{
  "src": "shuffler",
  "dest": "grouper",
  "body": {
    "type": "group",
    "msg_id": 1,
    "pairs": [
      [
        "hello",
        1
      ],
      [
        "world",
        1
      ],
      [
        "hello",
        1
      ]
    ]
  }
}
Expected Output
{"type": "grouped", "in_reply_to": 1, "groups": {"hello": [1, 1], "world": [1]}}

Hints

Hint 1
Partition key: reducer_id = hash(key) % num_reducers
Hint 2
Use a simple string hash: sum of char codes, then abs() % num_reducers
Hint 3
group collects all values for the same key before reduce
Hint 4
combine is a local pre-reduce: sum values for the same key on the mapper side
Hint 5
sort keys alphabetically within each partition
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

shuffle phasehash partitioningkey groupingcombinerreduce assignment
main.py
python
Implement Shuffle Phase with Hash Partitioning - The MapReducer | Build Distributed Systems