TASK
Implementation
Single-machine MapReduce is limited by one CPU and one memory space. Distributed MapReduce sends different data chunks to different workers so all workers map in parallel, then the master merges and reduces.
Your node plays the role of the master. It receives a distribute job request and must coordinate the workers listed in the message:
// Master receives a distribute request
{ "type": "distribute", "msg_id": 1,
"lines": ["hello world", "hello mapreduce", "world peace"],
"workers": ["n2", "n3"] }
// Master splits lines evenly and sends to each worker
→ sends to n2: { "type": "map_chunk", "chunk": ["hello world", "hello mapreduce"] }
→ sends to n3: { "type": "map_chunk", "chunk": ["world peace"] }
// Workers reply with their map results
← n2: { "type": "chunk_result", "pairs": [["hello",1],["world",1],["hello",1],["mapreduce",1]] }
← n3: { "type": "chunk_result", "pairs": [["world",1],["peace",1]] }
// Master merges, reduces, and replies
→ { "type": "distribute_result", "in_reply_to": 1,
"results": {"hello":2,"world":2,"mapreduce":1,"peace":1} }Split the input into len(workers) chunks, forward each chunk, collect all pair lists, then run the same reduce logic from task 1 over the merged pairs.
Sample Test Cases
Split input into chunks
Timeout: 5000ms
Input
{
  "src": "client",
  "dest": "master",
  "body": {
    "type": "split",
    "msg_id": 1,
    "lines": ["a", "b", "c", "d"],
    "num_chunks": 2
  }
}

Expected Output
{"type": "split_result", "in_reply_to": 1, "chunks": [["a", "b"], ["c", "d"]]}Merge worker resultsTimeout: 5000ms
Input
{
  "src": "client",
  "dest": "master",
  "body": {
    "type": "merge",
    "msg_id": 1,
    "worker_results": [
      [["hello", 1], ["world", 1]],
      [["hello", 1], ["peace", 1]]
    ]
  }
}

Expected Output
{"type": "merge_result", "in_reply_to": 1, "results": {"hello": 2, "world": 1, "peace": 1}}Hints
Hint 1
split divides the input array into equal-sized chunks, one per worker
Hint 2
assign sends each chunk to a worker and waits for its map result
Hint 3
merge combines all worker outputs before the reduce phase
Hint 4
The master coordinates workers; workers only map their assigned chunk
Hint 5
Use worker count to decide chunk size: ceil(total / workers)
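The merge phase from the hints can be sketched with `collections.Counter` (the helper name `merge_reduce` is my choice): flatten every worker's pair list and sum counts per word, which is the same reduce logic as task 1:

```python
from collections import Counter

def merge_reduce(worker_results):
    # Sum the count for each word across every worker's pair list.
    counts = Counter()
    for pairs in worker_results:
        for word, n in pairs:
            counts[word] += n
    return dict(counts)

print(merge_reduce([[["hello", 1], ["world", 1]],
                    [["hello", 1], ["peace", 1]]]))
# → {'hello': 2, 'world': 1, 'peace': 1}
```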
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
distributed MapReduce, worker nodes, job splitting, parallel processing, result merging
main.py
python
#!/usr/bin/env python3
import sys
import json


def main():
    # Your implementation here
    for line in sys.stdin:
        msg = json.loads(line)
        print(json.dumps(msg), flush=True)


if __name__ == "__main__":
    main()
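One way to wire the sample message types into the skeleton above; this is a sketch, `handle` and `reply` are my own names, and swapping src/dest on replies is an assumption inferred from the request envelopes in the samples:

```python
import math

def handle(body):
    """Build the reply body for the two sample message types."""
    if body["type"] == "split":
        # ceil(total / num_chunks) lines per chunk, per hint 5.
        size = math.ceil(len(body["lines"]) / body["num_chunks"])
        chunks = [body["lines"][i:i + size]
                  for i in range(0, len(body["lines"]), size)]
        return {"type": "split_result", "in_reply_to": body["msg_id"],
                "chunks": chunks}
    if body["type"] == "merge":
        # Sum counts per word across all worker pair lists.
        counts = {}
        for pairs in body["worker_results"]:
            for word, n in pairs:
                counts[word] = counts.get(word, 0) + n
        return {"type": "merge_result", "in_reply_to": body["msg_id"],
                "results": counts}
    return None

def reply(msg, body):
    # Envelope the reply, swapping src and dest (assumed harness convention).
    return {"src": msg["dest"], "dest": msg["src"], "body": body}
```

Inside the starter's stdin loop, call `handle(msg["body"])` and, when it returns a body, print `json.dumps(reply(msg, body))` with `flush=True`.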