ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/mapreducer/tasks/task-28-1-2-distributed-mapreduce
TASK

Implementation

Single-machine MapReduce is limited by one CPU and one memory space. Distributed MapReduce sends different chunks of the data to different workers so that all workers map in parallel; the master then merges the results and runs the reduce phase.

Your node plays the role of the master. It receives a distribute job request and must coordinate the workers listed in the message:

// Master receives a distribute request
{ "type": "distribute", "msg_id": 1,
  "lines": ["hello world", "hello mapreduce", "world peace"],
  "workers": ["n2", "n3"] }

// Master splits lines evenly and sends to each worker
→ sends to n2: { "type": "map_chunk", "chunk": ["hello world", "hello mapreduce"] }
→ sends to n3: { "type": "map_chunk", "chunk": ["world peace"] }

// Workers reply with their map results
← n2: { "type": "chunk_result", "pairs": [["hello",1],["world",1],["hello",1],["mapreduce",1]] }
← n3: { "type": "chunk_result", "pairs": [["world",1],["peace",1]] }

// Master merges, reduces, and replies
{ "type": "distribute_result", "in_reply_to": 1,
  "results": {"hello":2,"world":2,"mapreduce":1,"peace":1} }

Split the input into len(workers) chunks, forward each chunk, collect all pair lists, then run the same reduce logic from task 1 over the merged pairs.
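The split/forward/collect/reduce flow above can be sketched as plain Python. This is a minimal sketch, not the reference solution: the helper names (`split_chunks`, `map_chunk`, `reduce_pairs`, `handle_distribute`) and the `send(dest, body)` callback are illustrative assumptions, not part of the task's API.

```python
import math
from collections import Counter

def split_chunks(lines, n):
    # One contiguous chunk per worker, each of at most ceil(total / n) lines.
    size = math.ceil(len(lines) / n)
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_chunk(chunk):
    # Word-count map phase: emit a (word, 1) pair for every word in every line.
    return [(word, 1) for line in chunk for word in line.split()]

def reduce_pairs(pairs):
    # Reduce phase: sum the counts per word over the merged pair lists.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def hypothetical_handle_distribute(msg, send):
    # `send(dest, body)` is an assumed transport callback, not defined by the task.
    lines, workers = msg["lines"], msg["workers"]
    for worker, chunk in zip(workers, split_chunks(lines, len(workers))):
        send(worker, {"type": "map_chunk", "chunk": chunk})
    # ...collect the chunk_result replies into merged_pairs, then reply with:
    # {"type": "distribute_result", "in_reply_to": msg["msg_id"],
    #  "results": reduce_pairs(merged_pairs)}
```

With the worked example above, `split_chunks` gives n2 two lines and n3 one, and `reduce_pairs` over the merged pairs yields the final counts.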

Sample Test Cases

Split input into chunks
Timeout: 5000ms
Input
{
  "src": "client",
  "dest": "master",
  "body": {
    "type": "split",
    "msg_id": 1,
    "lines": [
      "a",
      "b",
      "c",
      "d"
    ],
    "num_chunks": 2
  }
}
Expected Output
{"type": "split_result", "in_reply_to": 1, "chunks": [["a", "b"], ["c", "d"]]}
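A handler for this test case might look like the sketch below. The function name `handle_split` is an assumption; the message fields (`lines`, `num_chunks`, `msg_id`) are taken from the sample input.

```python
import math

def handle_split(body):
    # Split into num_chunks contiguous chunks of at most ceil(total / n) lines.
    lines, n = body["lines"], body["num_chunks"]
    size = math.ceil(len(lines) / n)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    return {"type": "split_result", "in_reply_to": body["msg_id"],
            "chunks": chunks}
```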
Merge worker results
Timeout: 5000ms
Input
{
  "src": "client",
  "dest": "master",
  "body": {
    "type": "merge",
    "msg_id": 1,
    "worker_results": [
      [
        [
          "hello",
          1
        ],
        [
          "world",
          1
        ]
      ],
      [
        [
          "hello",
          1
        ],
        [
          "peace",
          1
        ]
      ]
    ]
  }
}
Expected Output
{"type": "merge_result", "in_reply_to": 1, "results": {"hello": 2, "world": 1, "peace": 1}}
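Merging worker results is a single pass over all pair lists, summing counts per word. The handler name `handle_merge` below is an assumption; the field names come from the sample message.

```python
from collections import Counter

def handle_merge(body):
    # Fold every worker's (word, count) pairs into one running total.
    counts = Counter()
    for worker_pairs in body["worker_results"]:
        for word, n in worker_pairs:
            counts[word] += n
    return {"type": "merge_result", "in_reply_to": body["msg_id"],
            "results": dict(counts)}
```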

Hints

Hint 1
split divides the input array into equal-sized chunks, one per worker
Hint 2
assign sends each chunk to a worker and waits for its map result
Hint 3
merge combines all worker outputs before the reduce phase
Hint 4
The master coordinates workers; workers only map their assigned chunk
Hint 5
Use worker count to decide chunk size: ceil(total / workers)
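The ceil formula in Hint 5 matters when the input does not divide evenly: earlier chunks get the extra lines, and slicing in ceil-sized steps never produces an empty chunk for the last worker. A quick check with five lines and two workers:

```python
import math

lines = ["l1", "l2", "l3", "l4", "l5"]
workers = ["n2", "n3"]
size = math.ceil(len(lines) / len(workers))  # ceil(5 / 2) = 3
chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
# chunks -> [["l1", "l2", "l3"], ["l4", "l5"]]
```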
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

distributed MapReduce, worker nodes, job splitting, parallel processing, result merging