ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/mapreducer

The MapReducer

Advanced | 10 tasks

Process petabytes of data with simple map and reduce functions. Build single-machine and distributed MapReduce, then add a shuffle phase, fault tolerance, streaming word counts, windowing, watermarks, and exactly-once processing.
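The map, shuffle, and reduce phases can be sketched as a word count on one machine. This is a minimal Python sketch, not the track's reference solution. `map_fn`, `reduce_fn`, `partition_for`, and `run_mapreduce` are hypothetical names. The shuffle hash-partitions keys with `zlib.crc32` rather than the built-in `hash()`, because Python salts string hashes per process, and separate workers would have to agree on which reducer owns a key.

```python
import zlib
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_fn(doc: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for each lowercased, whitespace-separated token."""
    for word in doc.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce phase: sum every partial count collected for one key."""
    return word, sum(counts)


def partition_for(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioning: crc32 gives the same answer in every
    # process, unlike Python's per-process salted hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapreduce(
    docs: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
    num_reducers: int = 2,
) -> dict[str, int]:
    # Shuffle: route each intermediate pair to its reducer's partition and
    # group values by key inside that partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for doc in docs:
        for key, value in mapper(doc):
            partitions[partition_for(key, num_reducers)][key].append(value)

    # Reduce: partitions are independent, so a distributed version could run
    # each one on a different worker; here they run one after another.
    results: dict[str, int] = {}
    for partition in partitions:
        for key in sorted(partition):
            out_key, out_value = reducer(key, partition[key])
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    counts = run_mapreduce(docs, map_fn, reduce_fn)
    print(sorted(counts.items()))
    # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

A combiner would shrink the intermediate data before the shuffle. It would run the same summing logic locally on each mapper's output. It is left out here so the three phases stay distinct.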


Concepts Covered

MapReduce, map phase, reduce phase, word count, key-value pairs, shuffle, distributed MapReduce, worker nodes, job splitting, parallel processing, result merging, shuffle phase, hash partitioning, key grouping, combiner, reduce assignment, fault tolerance, worker failure, task retry, heartbeat, speculative execution, idempotence, pipeline, job chaining, multi-stage processing, intermediate data, top-N, secondary sort, stream processing, stateful processing, running aggregates, incremental updates, tumbling windows, time-based windows, window aggregation, non-overlapping windows, event time, sliding windows, overlapping windows, window size, slide interval, moving average, watermarks, out-of-order events, allowed lateness, late event handling, exactly-once, idempotency, deduplication, checkpointing, transactional commits
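Several of the streaming concepts listed above can be sketched together in one small class. It counts per key over tumbling event-time windows and fires a window once the watermark passes its end. It re-emits an updated count for late events that still fall within the allowed lateness, and drops events that arrive later than that. This is a minimal Python sketch under stated assumptions, not the track's design. `TumblingWindowCounter`, `max_delay`, `allowed_lateness`, and the `(window_start, key, count)` emission format are hypothetical. The watermark follows a bounded-out-of-orderness heuristic: the maximum event time seen minus a fixed delay.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per key in fixed, non-overlapping event-time windows."""

    def __init__(self, window_size: int, max_delay: int, allowed_lateness: int = 0):
        self.window_size = window_size
        self.max_delay = max_delay                # assumed bound on out-of-orderness
        self.allowed_lateness = allowed_lateness  # how long fired windows keep state
        self.max_event_time = None
        self.counts = defaultdict(lambda: defaultdict(int))  # window_start -> key -> count
        self.fired = set()                        # window starts already emitted once
        self.dropped = []                         # events that arrived too late to count

    def watermark(self) -> float:
        # "No event older than this is expected." Monotone, because it is derived
        # from the maximum event time seen so far.
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_delay

    def process(self, event_time: int, key: str) -> list:
        """Ingest one event; return a list of (window_start, key, count) emissions."""
        start = event_time - event_time % self.window_size
        end = start + self.window_size
        if end + self.allowed_lateness <= self.watermark():
            # The window's state was already purged, so the event is dropped.
            self.dropped.append((event_time, key))
            return []
        self.counts[start][key] += 1
        emitted = []
        if start in self.fired:
            # Late but within allowed lateness: emit an updated result.
            emitted.append((start, key, self.counts[start][key]))
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        emitted.extend(self._advance())
        return emitted

    def _advance(self) -> list:
        wm = self.watermark()
        emitted = []
        for start in sorted(self.counts):  # sorted() copies keys, so deleting is safe
            end = start + self.window_size
            if end <= wm and start not in self.fired:
                # The watermark passed the window end: emit the on-time result.
                for key, count in sorted(self.counts[start].items()):
                    emitted.append((start, key, count))
                self.fired.add(start)
            if end + self.allowed_lateness <= wm:
                # Past the allowed lateness: purge this window's state.
                del self.counts[start]
                self.fired.discard(start)
        return emitted


if __name__ == "__main__":
    counter = TumblingWindowCounter(window_size=10, max_delay=5, allowed_lateness=5)
    events = [(1, "a"), (12, "a"), (8, "b"), (16, "a"), (9, "a"), (21, "b"), (3, "a")]
    for event_time, key in events:
        for emission in counter.process(event_time, key):
            print(emission)
    print("dropped:", counter.dropped)
    # (0, 'a', 1)
    # (0, 'b', 1)
    # (0, 'a', 2)
    # dropped: [(3, 'a')]
```

In the run above, window [0, 10) fires only when event time 16 pushes the watermark to 11. The late `(9, "a")` still updates the window. `(3, "a")` is dropped because the watermark has reached 16 by then, past the window end plus the lateness allowance (15). A larger `max_delay` tolerates more disorder at the cost of later results.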

Prerequisites

Completing the previous tracks first is recommended, since concepts build progressively throughout the curriculum.