The Messenger
Build the foundation of distributed communication. You will implement a Maelstrom node that handles JSON messages, processes initialization, and responds to echo requests. This track teaches the fundamental protocol that underlies all subsequent challenges.
Subtracks & Tasks
Hello, Distributed World
Implement Basic JSON Message Parser
In distributed systems, nodes communicate by exchanging messages. The Maelstrom framework uses JSON messages over stdin/stdout for simplicity and lang...
Handle Init Message and Store Cluster Metadata
Before processing any workload, Maelstrom sends an init message to each node. This message tells your node its identity and the full cluster membershi...
Implement Echo Service with Proper Acknowledgment
The echo workload is the simplest Maelstrom workload. Clients send echo messages containing a value, and your node must echo that value back. Request...
Add Message Envelope Validation
Production systems must handle malformed input gracefully. Your node should validate that incoming messages have the required structure before process...
Create Async Event Loop for Concurrent Message Handling
Real distributed systems handle many messages concurrently. Your current synchronous implementation processes one message at a time, which limits thro...
RPC and the Request-Response Model
Implement Synchronous RPC with Timeout
In a distributed system, nodes often need to call remote procedures on other nodes and wait for the result. This is called **synchronous RPC** (Remote...
Implement Timeout and Retry Loop for RPC
In distributed systems, messages can be lost or delayed indefinitely. A single RPC call with a timeout is not enough — you need a **retry loop** to ha...
Implement Async RPC Using Callbacks
Synchronous RPC blocks the caller until a reply arrives, which prevents the node from handling other messages during that time. In high-throughput dis...
Implement Callback Reaper for Leaked RPCs
When a node sends an async RPC but the recipient crashes or the network drops the message, the callback stays in memory forever. This is a **resource ...
Implement Exponential Backoff for Retries
Fixed-interval retries can overwhelm a recovering system. When many nodes retry at the same interval, they create a **thundering herd** that prevents ...
The Protocol Beneath
Model Message Format with Typed Schema
Raw JSON is just strings. A typed schema wraps the raw message in classes with explicit fields, validation, and serialization methods — making it impo...
Add Message Envelope Logger with Timestamps
In production distributed systems, **message tracing** is critical for debugging. When something goes wrong, you need to answer: "What messages did th...
Implement Message Deduplication with LRU Cache
Networks can duplicate messages. If a sender retries because it did not receive an acknowledgment (but the original was actually delivered), the recei...
Benchmark Node Throughput and Latency
How fast is your node? In production systems, you need to measure **throughput** (messages per second) and **latency** (time to process each message)....
Add Chaos Mode with Random Message Dropping
Real networks drop messages. Netflix pioneered **chaos engineering** — deliberately injecting failures to test resilience. Your task is to add a "chao...
Interview Prep
Common interview questions for Distributed Systems / Backend Engineer roles that map directly to what you build in this track.
Model Answer
Message queues (Kafka, SQS) for async decoupling; gRPC/HTTP for synchronous RPCs. Each message must be self-contained. Discuss at-least-once vs at-most-once delivery, idempotency keys to handle retries, and correlation IDs for request tracing across services.
Model Answer
Correlation/request IDs: the sender attaches a unique ID to each outgoing request. The receiver echoes this ID in its response. The sender maps incoming response IDs to pending callbacks. This is how HTTP/2 stream IDs, DNS transaction IDs, and Maelstrom msg_ids all work.
Model Answer
Use timeouts with exponential backoff and jitter. Retry only idempotent operations (GET, PUT with full replacement) or operations with idempotency keys. Use circuit breakers to stop retrying against a consistently failing downstream. Distinguish between 503 (retry) and 400/404 (do not retry).
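One way to generate those delays is the "full jitter" strategy: each attempt waits a uniformly random time up to an exponentially growing cap. A short sketch (function name and defaults are illustrative):

```python
import random

def backoff_delays(base=0.1, cap=5.0, attempts=6):
    """Yield one delay per retry attempt: uniform in [0, min(cap, base * 2^n)]."""
    for n in range(attempts):
        yield random.uniform(0, min(cap, base * (2 ** n)))
```

The jitter spreads retries out in time, so many clients recovering from the same outage do not retry in lockstep.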
Model Answer
A message broker (Kafka, RabbitMQ) provides async, durable message delivery with decoupled producers and consumers. A service mesh (Istio, Linkerd) handles synchronous service-to-service traffic with features like mTLS, retries, circuit breaking, and observability. Use a broker when you need temporal decoupling or fan-out; use a mesh for sync RPC with cross-cutting network concerns.
Model Answer
Check for duplicate delivery: is the queue at-least-once? Is the consumer crashing after processing but before acknowledging? Add idempotency: track processed message IDs in a store and skip duplicates. Use exactly-once semantics in Kafka (requires transactions + idempotent producers) for critical flows. Log message IDs at each processing step to trace duplicates.
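A minimal sketch of the dedup-and-skip step (the in-memory set stands in for the durable store a production consumer would use):

```python
processed = set()  # in production: a durable store keyed by message ID

def handle(msg, apply):
    """Apply a message exactly once; return False if it was a duplicate."""
    mid = msg["id"]
    if mid in processed:
        return False  # already seen: skip the side effect
    apply(msg)
    processed.add(mid)
    return True
```

Note the ordering trade-off: recording the ID after `apply` means a crash between the two steps reprocesses the message (at-least-once), while recording it before risks dropping it (at-most-once).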
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them.
Why it happens
Maelstrom sends messages line by line over a long-lived process. Buffering all of stdin blocks the read loop until the process is killed.
The fix
Read stdin line by line in a loop using a buffered reader. Each newline-delimited JSON object is one complete message.
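A minimal sketch of that loop, written as a function over any line stream so it is easy to test (the `handle` dispatch function is assumed to be defined elsewhere):

```python
import json

def read_messages(stream, handle):
    """Dispatch each newline-delimited JSON object on `stream` as one message."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines; never read the stream in bulk
            handle(json.loads(line))
```

In the node's `main` you would call `read_messages(sys.stdin, handle)`; Python's file iteration is already line-buffered, so each iteration yields one complete message.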
Why it happens
Most languages buffer stdout by default. The written bytes sit in an OS buffer and are never flushed to the pipe that Maelstrom is reading.
The fix
Explicitly flush stdout after every `json.dump` / `fmt.Println` call. In Python use `sys.stdout.flush()`. In Go, `os.Stdout` is unbuffered by default, but wrapping it in a `bufio.Writer` requires a manual flush.
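A Python sketch of a `send` helper that always flushes (the function name and envelope shape are illustrative):

```python
import json
import sys

def send(msg):
    """Write one JSON message per line and flush so Maelstrom sees it now."""
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()  # without this, the bytes may sit in the buffer forever
```

Centralising output in one helper also guarantees you can never forget the flush on some code path.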
Why it happens
The reply must route back: `dest` = original `src`, `src` = this node's own ID. Copying the envelope fields straight from the incoming message skips that swap, so the reply is addressed as if it were still travelling from the client to this node.
The fix
Always set `reply.dest = incoming.src` and `reply.src = self.node_id`. Build a dedicated `reply()` helper so this is never done ad-hoc.
Why it happens
The protocol requires `body.in_reply_to` to equal the original `body.msg_id`. Without it Maelstrom cannot correlate responses to requests.
The fix
Copy `incoming.body.msg_id` into `reply.body.in_reply_to`. Most node frameworks do this automatically; if writing raw JSON, do it explicitly.
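If you are writing raw JSON, the copy can live in a tiny body-building helper so it is never forgotten (a sketch; the function name is illustrative):

```python
def reply_body(incoming_body, extra):
    """Build a reply body that echoes the request's msg_id as in_reply_to."""
    body = dict(extra)
    body["in_reply_to"] = incoming_body["msg_id"]  # required for correlation
    return body
```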
Why it happens
Maelstrom may run a workload that fires many simultaneous RPCs. Any shared map or counter accessed from multiple goroutines / threads without synchronization is a data race.
The fix
Protect every shared data structure with a mutex (Go `sync.Mutex`) or thread lock (Python `threading.Lock`). Alternatively, funnel all state changes through a single serialised channel or event loop so message handlers never touch shared state directly.
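A minimal Python sketch of lock-guarded shared state, where every access goes through the mutex:

```python
import threading

class Counter:
    """Shared counter; all reads and writes happen under the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:  # read-modify-write must be atomic
            self._value += 1

    def read(self):
        with self._lock:
            return self._value
```

The same pattern applies to the callback map from the RPC tasks: any dict shared between the read loop and timeout timers needs this treatment.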
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
| Dimension | JSON | Protobuf | MessagePack |
|---|---|---|---|
| Human readable | Yes | No (binary) | No (binary) |
| Payload size | Large (field names repeated) | Small (field tags, no names) | Medium (compact binary JSON) |
| Parse speed | Slow | Fast | Fast |
| Schema required | No | Yes (.proto file) | No |
| Schema evolution | Flexible but fragile | Excellent (field numbers) | Like JSON (map keys embedded) |
| Debugging ease | Easy (curl, browser) | Hard (need protoc) | Hard (binary) |
| Best for | APIs, prototyping | gRPC, high-throughput services | Cache serialisation, Redis |

| Dimension | TCP | UDP |
|---|---|---|
| Delivery guarantee | Reliable within a connection (retransmitted until ACKed) | Best effort — packets may be lost |
| Ordering | In-order delivery guaranteed | Out-of-order delivery possible |
| Latency | Higher (handshake, retransmit, ACKs) | Lower (fire and forget) |
| Head-of-line blocking | Yes — one lost packet blocks the stream | No — each datagram is independent |
| Connection setup | 3-way handshake required | Connectionless; data can be sent immediately |
| Typical use | HTTP, databases, file transfer | DNS, video streaming, gaming, QUIC |