ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/scheduler/tasks/task-22-2-2-work-partitioning
TASK

Implementation

Large datasets are split into partitions so multiple workers can process them simultaneously. The partitioning strategy controls how evenly work is distributed, which directly affects total job time.
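As a starting point, an even split can be sketched like this (a minimal helper, assuming partition sizes are measured in whole GB; earlier partitions absorb any remainder):

```python
def partition_sizes(total_size_gb: int, num_partitions: int) -> list[int]:
    """Split a dataset into near-equal partitions.

    When the size does not divide evenly, the first `extra`
    partitions each take one additional GB so no data is dropped.
    """
    base, extra = divmod(total_size_gb, num_partitions)
    return [base + (1 if i < extra else 0) for i in range(num_partitions)]

# 1 TB (1024 GB) across 4 workers -> four 256 GB partitions
print(partition_sizes(1024, 4))  # [256, 256, 256, 256]
```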

Implement a node that partitions work and handles stragglers:

// Split 1TB input evenly across 4 workers
{ "type": "submit_mapreduce", "msg_id": 1,
  "input": "s3://data/*.txt", "map": "wordcount", "num_partitions": 4 }
-> { "type": "job_submitted", "in_reply_to": 1,
    "job_id": "mr1",
    "partition_sizes": [256, 256, 256, 256], "workers": [1,2,3,4] }

// Hash beats range on skewed keys
{ "type": "benchmark_partitioning", "msg_id": 2,
  "strategies": ["range","hash"], "data": "keys_with_skew" }
-> results: range skew_factor=5.0, hash skew_factor=1.2

// Slow partition -> speculative execution on backup worker
{ "type": "submit_mapreduce", ...,
  "straggler_partition": 2, "straggler_slowdown": 5 }
-> { "type": "straggler_detected",
    "partition": 2, "action": "speculative_execution", "backup_worker": 5 }

Sample Test Cases

Partition input data (Timeout: 5000ms)
Input
{
  "src": "client",
  "dest": "mr_cluster",
  "body": {
    "type": "submit_mapreduce",
    "msg_id": 1,
    "input": "s3://data/*.txt",
    "map": "wordcount",
    "num_partitions": 4
  }
}
Expected Output
{"src": "mr_cluster", "dest": "client", "body": {"type": "job_submitted", "in_reply_to": 1, "job_id": "mr1", "partition_sizes": [256, 256, 256, 256], "workers": [1,2,3,4]}}
Hash partitioning distribution (Timeout: 10000ms)
Input
{
  "src": "client",
  "dest": "mr_cluster",
  "body": {
    "type": "benchmark_partitioning",
    "msg_id": 1,
    "strategies": [
      "range",
      "hash"
    ],
    "data": "keys_with_skew",
    "num_partitions": 4
  }
}
Expected Output
{"src": "mr_cluster", "dest": "client", "body": {"type": "benchmark_complete", "in_reply_to": 1, "results": {"range": {"skew_factor": 5.0}, "hash": {"skew_factor": 1.2}}}}
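The benchmark can be reproduced in miniature. The sketch below uses an illustrative skewed key set (not the grader's `keys_with_skew` data): most keys share a common prefix, so range partitioning by first letter piles them into one partition while hashing the whole key spreads them out. `hashlib.md5` is used instead of the builtin `hash` so results are stable across runs:

```python
import hashlib


def skew_factor(sizes: list[int]) -> float:
    """max partition size / average partition size; 1.0 is perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))


def partition_counts(keys, num_partitions, assign):
    sizes = [0] * num_partitions
    for k in keys:
        sizes[assign(k)] += 1
    return sizes


# Skewed workload: 70% of keys start with "a" (illustrative data, assumed here)
keys = ([f"a{i}" for i in range(700)]
        + [f"m{i}" for i in range(150)]
        + [f"z{i}" for i in range(150)])

# Range: assign by which quarter of the alphabet the first letter falls in
range_sizes = partition_counts(
    keys, 4, lambda k: min((ord(k[0]) - ord("a")) * 4 // 26, 3))

# Hash: stable hash of the whole key, modulo the partition count
hash_sizes = partition_counts(
    keys, 4, lambda k: int(hashlib.md5(k.encode()).hexdigest(), 16) % 4)

print(skew_factor(range_sizes))  # 2.8 -- all "a" keys land in one partition
print(skew_factor(hash_sizes))   # close to 1.0
```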

Hints

Hint 1
Hash partitioning: partition_id = hash(key) % num_partitions; this spreads keys evenly across partitions even when the key distribution is skewed
Hint 2
Skew factor = max_partition_size / avg_partition_size — closer to 1.0 means more balanced
Hint 3
Range partitioning produces unequal partitions when key distribution is skewed
Hint 4
A straggler is a partition whose runtime exceeds 2x the median of completed partitions
Hint 5
Speculative execution: launch a backup copy on another worker and take whichever finishes first
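Hints 4 and 5 combine into a small detection-plus-backup routine. This is a sketch under assumed conventions (worker IDs run 1..num_workers, and the reply shape mirrors the `straggler_detected` message above):

```python
import statistics


def find_stragglers(running: dict[int, float], completed: dict[int, float]) -> list[int]:
    """Hint 4's rule: a running partition is a straggler once its elapsed
    time exceeds 2x the median runtime of already-completed partitions."""
    if not completed:
        return []  # no baseline yet, nothing to compare against
    threshold = 2 * statistics.median(completed.values())
    return [p for p, elapsed in running.items() if elapsed > threshold]


def speculate(partition: int, busy_workers: set[int], num_workers: int) -> dict:
    """Launch a backup copy on the first idle worker (IDs assumed 1..num_workers)."""
    backup = next(w for w in range(1, num_workers + 1) if w not in busy_workers)
    return {"type": "straggler_detected", "partition": partition,
            "action": "speculative_execution", "backup_worker": backup}


# Partitions 0, 1, 3 finished in ~10s; partition 2 has been running for 52s
done = {0: 10.0, 1: 11.0, 3: 9.5}
running = {2: 52.0}
for p in find_stragglers(running, done):  # median 10s, threshold 20s -> partition 2
    print(speculate(p, busy_workers={1, 2, 3, 4}, num_workers=5))
```

Whichever copy finishes first wins; the other is cancelled and its result discarded, so speculation trades extra compute for lower tail latency.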
OVERVIEW

Theoretical Hub

Concept overview coming soon

Key Concepts

hash partitioning, range partitioning, data skew, straggler mitigation, speculative execution