Subtracks & Tasks
MapReduce Fundamentals
Implement Single-Machine MapReduce
MapReduce splits work into two simple phases: **map** transforms each input record into key-value pairs, and **reduce** aggregates all values for the ...
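The two phases can be sketched in a few lines of Python. This is a minimal single-process illustration, not a reference solution; the names `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a map phase, group values by key, then run a reduce phase."""
    groups = defaultdict(list)
    # Map: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: aggregate all values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    # Hypothetical reducer: total the counts for one word.
    return sum(values)


counts = map_reduce(["the quick fox", "the lazy dog"], word_mapper, sum_reducer)
print(counts["the"])  # 2
```

Keeping the grouping step separate from `mapper` and `reducer` means the same driver can run any job whose logic fits the two-phase shape.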
Implement Distributed MapReduce
Single-machine MapReduce is limited by one CPU and one memory space. Distributed MapReduce sends different data chunks to different workers so all wor...
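As a rough sketch of the chunk-per-worker idea, the example below simulates workers with a thread pool on one machine. That is an assumption made to keep the example runnable; a real deployment would place workers on separate processes or hosts. `split_chunks`, `map_chunk`, and `distributed_word_count` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_chunks(records, n_chunks):
    # Round-robin split so every simulated worker gets a similar share.
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    # Each worker counts words in its own chunk only.
    partial = Counter()
    for line in chunk:
        partial.update(line.lower().split())
    return partial


def distributed_word_count(records, n_workers=3):
    chunks = split_chunks(records, n_workers)
    # Threads stand in for remote workers in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Merge the partial counts produced by the workers.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


print(distributed_word_count(["a b", "b c", "c a", "a"], n_workers=2))
```

Because each worker touches only its own chunk, no locking is needed until the final merge.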
Implement Shuffle Phase with Hash Partitioning
After the map phase, all values for the same key must reach the same reducer. The shuffle phase does exactly this: it **partitions** map outputs by ke...
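Hash partitioning can be illustrated as follows. The sketch assumes string keys and uses `zlib.crc32` rather than Python's built-in `hash`, because the built-in string hash is randomized per process and would route the same key differently on different workers. `partition_for` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition_for(key, n_reducers):
    # A stable hash, so every worker agrees on where a key belongs.
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def shuffle(map_outputs, n_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in map_outputs:
        partitions[partition_for(key, n_reducers)][key].append(value)
    return partitions


pairs = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
for index, bucket in enumerate(shuffle(pairs, n_reducers=3)):
    print(index, dict(bucket))
```

Every occurrence of a key hashes to the same partition index, which is what guarantees a single reducer sees all of its values.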
Implement Fault Tolerance in MapReduce
Long-running MapReduce jobs will inevitably encounter worker failures. Fault tolerance means detecting failures quickly and retrying the affected task...
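One simple way to picture retry-on-failure is a wrapper that reruns a task a bounded number of times. The sketch below treats any raised exception as a worker failure, which is an assumption; real systems usually also rely on heartbeats or timeouts. `run_with_retries` and `FlakyWorker` are hypothetical.

```python
def run_with_retries(task, task_input, max_attempts=3):
    """Run task(task_input), retrying on failure up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Return the result together with the attempt that succeeded.
            return task(task_input), attempt
        except Exception as error:  # any exception counts as a failure here
            last_error = error
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


class FlakyWorker:
    """Simulated worker that crashes a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def __call__(self, chunk):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated worker crash")
        return sum(chunk)


result, attempts = run_with_retries(FlakyWorker(2), [1, 2, 3])
print(result, attempts)  # 6 3
```

Retrying is only safe when the task is deterministic and has no side effects outside its return value, which map and reduce functions are normally written to satisfy.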
Implement Chained MapReduce Pipeline
Complex data analysis often needs multiple MapReduce stages. A chained pipeline feeds the output of one job directly as input to the next, keeping eac...
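A chained pipeline might look like the sketch below, where each stage is a `(mapper, reducer)` pair and the list of `(key, value)` results from one job becomes the input records of the next. All function names are hypothetical; the two stages (count words, then group words by their count) are chosen only for illustration.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Emit (key, result) pairs so the output can feed another job.
    return [(key, reducer(key, values)) for key, values in groups.items()]


def run_pipeline(records, stages):
    for mapper, reducer in stages:
        records = run_job(records, mapper, reducer)
    return records


def count_mapper(line):
    for word in line.split():
        yield word, 1


def count_reducer(word, ones):
    return sum(ones)


def invert_mapper(pair):
    # Stage 2 input is a (word, count) pair produced by stage 1.
    word, count = pair
    yield count, word


def invert_reducer(count, words):
    return sorted(words)


stages = [(count_mapper, count_reducer), (invert_mapper, invert_reducer)]
print(dict(run_pipeline(["a b a", "b a c"], stages)))
```

Each stage stays small and testable on its own, since it only needs to agree with its neighbors on the record shape.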
Stream Processing
Implement Streaming Word Count
Batch MapReduce waits for all data before producing output. Stream processing handles an **infinite flow** of events: state is updated as each event a...
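The per-event update can be sketched with a generator that yields a snapshot of the running counts after every event. This is a minimal illustration, assuming each event is a line of text; `streaming_word_count` is a hypothetical name.

```python
from collections import Counter


def streaming_word_count(events):
    """Yield the running word counts after each incoming event."""
    counts = Counter()
    for line in events:
        for word in line.lower().split():
            counts[word] += 1
        # Emit a copy so later updates do not mutate earlier snapshots.
        yield dict(counts)


for snapshot in streaming_word_count(iter(["a b", "a"])):
    print(snapshot)
# {'a': 1, 'b': 1}
# {'a': 2, 'b': 1}
```

Because the function consumes an iterator lazily, it works the same way on a finite list and on a source that never ends.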
Implement Tumbling Windows
Tumbling windows divide an infinite stream into fixed-size, **non-overlapping** time buckets. Each event belongs to exactly one window. When the windo...
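Window assignment for tumbling windows reduces to rounding a timestamp down to a multiple of the window size. The sketch below assumes integer timestamps and events shaped as `(timestamp, payload)` tuples; both are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def tumbling_window_start(timestamp, size):
    # Round down to the nearest multiple of the window size.
    return timestamp - (timestamp % size)


def tumbling_counts(events, size):
    """Count events per window; windows are [start, start + size)."""
    counts = defaultdict(int)
    for timestamp, _payload in events:
        counts[tumbling_window_start(timestamp, size)] += 1
    return dict(counts)


events = [(0, "a"), (4, "b"), (5, "c"), (12, "d")]
print(tumbling_counts(events, size=5))  # {0: 2, 5: 1, 10: 1}
```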
Implement Sliding Windows
Tumbling windows are non-overlapping — an event belongs to exactly one window. Sliding windows **overlap**: each event belongs to multiple windows, en...
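To see how one event lands in several windows, the sketch below lists every window start whose range `[start, start + size)` contains a given timestamp. It assumes integer timestamps and a `slide` no larger than `size`; `sliding_window_starts` and `sliding_counts` are hypothetical names.

```python
from collections import defaultdict


def sliding_window_starts(timestamp, size, slide):
    """Return the starts of all windows [start, start + size) holding timestamp."""
    starts = []
    # Latest window that can contain the event starts at a multiple of slide.
    start = timestamp - (timestamp % slide)
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


def sliding_counts(events, size, slide):
    counts = defaultdict(int)
    for timestamp, _payload in events:
        for start in sliding_window_starts(timestamp, size, slide):
            counts[start] += 1
    return dict(counts)


print(sliding_window_starts(7, size=10, slide=5))  # [0, 5]
```

With `size=10` and `slide=5`, every event falls into `size / slide = 2` windows. Early timestamps can map to a window with a negative start, which a real implementation may choose to clip.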
Handle Out-of-Order Events with Watermarks
Events in a distributed stream do not always arrive in the order they occurred. A click at 10:00:00 may arrive after a click at 10:00:05 due to networ...
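A watermark can be sketched as "the largest event time seen so far, minus an allowed lateness". In the illustration below, a tumbling window is emitted once the watermark passes its end, and events that arrive for an already-closed window are set aside as late. The class name, its fields, and the fixed-lateness policy are all assumptions made for this example.

```python
from collections import defaultdict


class WatermarkedTumblingCounter:
    def __init__(self, size, allowed_lateness):
        self.size = size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.open_windows = defaultdict(int)
        self.late_events = []

    @property
    def watermark(self):
        # No event older than this is expected to arrive any more.
        return self.max_event_time - self.allowed_lateness

    def process(self, timestamp, payload):
        """Ingest one event; return a list of (window_start, count) now closed."""
        start = timestamp - (timestamp % self.size)
        if start + self.size <= self.watermark:
            # The window was already emitted, so the event is too late.
            self.late_events.append((timestamp, payload))
            return []
        self.open_windows[start] += 1
        self.max_event_time = max(self.max_event_time, timestamp)
        closed = sorted(s for s in self.open_windows
                        if s + self.size <= self.watermark)
        return [(s, self.open_windows.pop(s)) for s in closed]


counter = WatermarkedTumblingCounter(size=10, allowed_lateness=5)
for event in [(1, "a"), (12, "b"), (8, "c"), (16, "d"), (3, "e")]:
    print(event, counter.process(*event))
```

In this run the out-of-order event at time 8 is still counted in window 0, because the watermark had only reached 7 when it arrived; the event at time 3 arrives after window 0 has closed and is recorded in `late_events`.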
Implement Exactly-Once Processing
Exactly-once processing means each event affects the output exactly once, even when the system retries failed operations. It combines three mechanisms...
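One ingredient commonly used toward this goal is deduplication by event ID, so that a retried event is recognized and skipped. The sketch below shows only that idea and assumes every event carries a unique `event_id` (a hypothetical field); it keeps state in memory, whereas a real system would need to persist the seen IDs and the result together.

```python
class DedupCounter:
    """Apply each event's amount to a running total at most once."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def apply(self, event_id, amount):
        # A redelivered event has an ID we have already recorded.
        if event_id in self.seen_ids:
            return False
        self.seen_ids.add(event_id)
        self.total += amount
        return True


counter = DedupCounter()
counter.apply("e1", 5)
counter.apply("e2", 7)
counter.apply("e1", 5)  # retry of e1 is ignored
print(counter.total)  # 12
```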
Concepts Covered
Prerequisites
Complete the previous tracks before starting this one: concepts build progressively throughout the curriculum.