Load Balancers: Routing Traffic Without Routing Yourself Into a Corner
Load balancing looks mechanical until you encounter the edge cases. Round-robin between three backend servers is trivial. Round-robin when one server is slow but not dead, when session state requires routing consistency, when backends have heterogeneous capacity — that is where the interesting decisions live.
The algorithm you choose for request routing determines how your system behaves when things go wrong. A poorly chosen algorithm tends to look fine in normal operation and fall apart under peak load, which is exactly the moment when failure is most expensive.
The Fundamental Problem
A load balancer solves a straightforward problem: you have more incoming requests than a single server can handle, and you want to spread the work across multiple servers.
The complications come immediately. Which server should a given request go to? What if one server is overloaded? What if one server crashes mid-stream? What if the request has session state stored on a specific server?
Each of these questions has multiple reasonable answers. The right answer depends on what you are building. A load balancer for a stateless REST API has different requirements than a load balancer for a WebSocket gateway. A load balancer at the edge (between clients and your infrastructure) has different requirements than a service mesh sidecar (between services inside your infrastructure).
Algorithms
Round-robin is the simplest: send the first request to server 1, the second to server 2, the third to server 3, cycle back. Each server gets an equal share. This works well when all requests have similar cost and all servers have similar capacity.
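A minimal sketch in Go (the type and field names are mine, not from any particular proxy): an atomic counter walks the backend list, which keeps the picker safe to call from many goroutines at once.

```go
package lb

import "sync/atomic"

// roundRobin cycles through a fixed backend list; the atomic
// counter makes Pick safe under concurrent requests.
type roundRobin struct {
	backends []string
	next     atomic.Uint64
}

func (r *roundRobin) Pick() string {
	// Add returns the incremented value, so successive calls
	// walk the list in order; the modulo wraps back to the start.
	n := r.next.Add(1) - 1
	return r.backends[n%uint64(len(r.backends))]
}
```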
Weighted round-robin extends this by allowing you to specify that server 1 should get twice the traffic of server 2. Useful when servers have different hardware capabilities. The weight ratio maps to the traffic ratio.
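The naive implementation repeats heavier servers in the list; a nicer variant is smooth weighted round-robin, the algorithm Nginx uses, which interleaves picks so a 2:1 weighting comes out as A, B, A rather than A, A, B. A sketch (not thread-safe; a real balancer would lock or shard this state):

```go
package lb

// weighted carries a configured weight and the running score
// used by smooth weighted round-robin.
type weighted struct {
	addr    string
	weight  int // configured share of traffic
	current int // running score, starts at zero
}

// pickWeighted: bump every score by its weight, pick the max,
// then subtract the total from the winner so the others catch
// up on later rounds.
func pickWeighted(pool []*weighted) *weighted {
	total := 0
	var best *weighted
	for _, b := range pool {
		b.current += b.weight
		total += b.weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	best.current -= total
	return best
}
```

With weights A=2, B=1 this produces A, B, A, A, B, A, ... rather than bursting both of A's picks back to back.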
Least connections routes each new request to the server with the fewest active connections. It beats round-robin when request durations vary significantly: a long-running WebSocket connection keeps server 1 busy, and least connections steers new requests toward its idler peers, while round-robin would keep sending server 1 its full equal share regardless.
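A sketch of the bookkeeping, assuming the proxy calls done() when a request or connection finishes:

```go
package lb

import "sync/atomic"

// connBackend tracks how many requests it is currently serving.
type connBackend struct {
	addr   string
	active atomic.Int64
}

// pickLeastConn scans for the backend with the fewest in-flight
// requests. The count is incremented here and must be released
// with done() when the request ends.
func pickLeastConn(pool []*connBackend) *connBackend {
	best := pool[0]
	for _, b := range pool[1:] {
		if b.active.Load() < best.active.Load() {
			best = b
		}
	}
	best.active.Add(1)
	return best
}

func (b *connBackend) done() { b.active.Add(-1) }
```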
Least response time routes to the server with the lowest average response time. More computationally expensive than least connections (you have to track and smooth the response time metric) but often produces better results for latency-sensitive applications.
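The smoothing is typically an exponentially weighted moving average, so one slow request does not dominate the routing decision. A sketch (the alpha value is illustrative):

```go
package lb

import "sync"

// ewma keeps an exponentially weighted moving average of
// response-time samples for one backend.
type ewma struct {
	mu    sync.Mutex
	avg   float64 // smoothed response time in milliseconds
	alpha float64 // e.g. 0.2: each new sample contributes 20%
	seen  bool
}

func (e *ewma) observe(ms float64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if !e.seen {
		e.avg, e.seen = ms, true // seed with the first sample
		return
	}
	e.avg = e.alpha*ms + (1-e.alpha)*e.avg
}

func (e *ewma) value() float64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.avg
}
```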
Random routes to a randomly selected backend. Counter-intuitively, random routing can outperform round-robin under certain load patterns because it avoids the synchronization that arises when many independent clients round-robin over the same backend pool at once. Netflix's Ribbon client-side load balancer offers weighted random among its routing rules.
IP hash routes each client to a backend determined by a hash of the client's IP address. The same client always goes to the same backend. This provides session affinity without explicit session tracking. The downside: if that backend fails, all sessions from that IP are disrupted.
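A sketch using FNV hashing; note the modulo at the end, which is exactly the fragility the consistent hashing section below addresses:

```go
package lb

import "hash/fnv"

// pickByIP maps a client IP to a backend with a stable hash, so
// the same client keeps hitting the same server while the pool
// is unchanged. Resizing the pool remaps most clients, because
// the modulo changes.
func pickByIP(clientIP string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return backends[h.Sum32()%uint32(len(backends))]
}
```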
Health Checking
A load balancer that routes traffic to a dead backend produces errors visible to users. Health checking avoids this by probing backends periodically and removing unhealthy ones from the rotation.
Probes can operate at multiple layers. A TCP probe just checks that the port is open: the process is running. An HTTP probe fetches a specific endpoint (typically /health or /ready) and checks the response code. An application-level probe might check whether the backend can talk to its database.
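A sketch of an HTTP checker, assuming each backend exposes /health and the routing hot path reads the healthy flag:

```go
package lb

import (
	"net/http"
	"sync/atomic"
	"time"
)

type probed struct {
	url     string      // e.g. "http://10.0.0.5:8080" (hypothetical)
	healthy atomic.Bool // read by the routing hot path
}

// checkLoop probes every backend's /health endpoint on a fixed
// interval and flips its flag. A production checker would also
// require N consecutive failures before evicting, so one lost
// probe does not cause flapping.
func checkLoop(pool []*probed, interval time.Duration) {
	client := &http.Client{Timeout: 2 * time.Second}
	for range time.Tick(interval) {
		for _, b := range pool {
			resp, err := client.Get(b.url + "/health")
			ok := err == nil && resp.StatusCode == http.StatusOK
			if resp != nil {
				resp.Body.Close()
			}
			b.healthy.Store(ok)
		}
	}
}
```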
The tricky cases are not binary up/down states. A backend might be unhealthy in specific ways that are not captured by your health check. It might pass the health check but fail on 20% of requests due to a bug triggered by specific input. It might be healthy but at 99% CPU, about to fall over.
This is where circuit breakers come in. A circuit breaker tracks the error rate on requests to each backend. When the error rate exceeds a threshold, the circuit opens: the backend is removed from rotation for a cool-down period. After the period, a few test requests are sent (the circuit is "half-open"). If those succeed, the circuit closes and the backend returns to full rotation. If they fail, the cool-down resets.
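A sketch of that state machine; for brevity it trips on consecutive failures rather than the windowed error rate a production breaker would track:

```go
package lb

import (
	"sync"
	"time"
)

type state int

const (
	closed   state = iota // normal: all traffic flows
	open                  // tripped: backend out of rotation
	halfOpen              // probing: a test request may pass
)

type breaker struct {
	mu        sync.Mutex
	st        state
	failures  int
	threshold int           // consecutive failures before opening
	cooldown  time.Duration // how long to stay open
	openedAt  time.Time
}

// allow reports whether a request may be sent to this backend,
// moving open -> half-open once the cool-down has elapsed.
func (b *breaker) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.st == open && time.Since(b.openedAt) >= b.cooldown {
		b.st = halfOpen
	}
	return b.st != open
}

// record feeds the request outcome back into the state machine.
func (b *breaker) record(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.st, b.failures = closed, 0 // success closes the circuit
		return
	}
	b.failures++
	if b.st == halfOpen || b.failures >= b.threshold {
		b.st, b.openedAt = open, time.Now() // trip, or reset the cool-down
	}
}
```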
The Netflix Hystrix library popularized the circuit breaker pattern in service-to-service communication. Envoy proxy implements it natively. The pattern prevents a degraded backend from receiving full traffic and potentially cascading failures to the load balancer's other backends.
Consistent Hashing for Stateful Backends
Standard hashing for backend selection (hash the request key, modulo the number of backends) breaks badly when you add or remove servers. With five servers and hash-mod-5 routing, adding a sixth server remaps 5/6 of all keys to different backends. For backends that merely cache data, that means a temporary spike in cache misses. For truly stateful backends (where the server holds the only copy of session data), it means requests arriving at servers that do not have their session.
Consistent hashing reduces this disruption. In a consistent hash ring, both keys and servers are hashed to positions on a circle. Each key is routed to the first server clockwise from its position on the ring. When you add a server, only the keys between the new server and its predecessor are remapped — approximately 1/n of the total keys.
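A minimal ring in Go. Production rings also place each server at many virtual points ("vnodes") so keys spread evenly even with a small pool; the sketch includes that.

```go
package lb

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring. Each server is hashed
// to vnodes positions on the circle.
type Ring struct {
	points []uint32          // sorted positions on the circle
	owner  map[uint32]string // point -> server
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(servers []string, vnodes int) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, s := range servers {
		for i := 0; i < vnodes; i++ {
			p := hash32(fmt.Sprintf("%s#%d", s, i))
			r.points = append(r.points, p)
			r.owner[p] = s
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the first server clockwise from the key's position.
func (r *Ring) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the circle
	}
	return r.owner[r.points[i]]
}
```

Removing a server is symmetric: only the keys owned by its points fall through to the next server clockwise, roughly 1/n of the total.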
This makes consistent hashing essential for any stateful sharding use case: session routing, distributed caches, database shard routing. Cassandra and Amazon DynamoDB use consistent hashing for data distribution, and Memcached clients commonly use it (the ketama scheme) for key placement.
The Layer 4 vs Layer 7 Decision
Load balancers operate at different network layers, and the choice matters for performance and capability.
Layer 4 (transport layer) load balancers route based on IP addresses and TCP ports. They do not inspect the contents of the TCP stream. They are extremely fast because the routing decision is simple. AWS Network Load Balancer is layer 4.
Layer 7 (application layer) load balancers inspect the full HTTP request: URL, headers, cookies, body. This enables sophisticated routing: send /api/* to the API cluster and /static/* to the CDN origin cluster. Implement sticky sessions based on a cookie. Terminate TLS and add correlation ID headers. The cost is higher computational overhead per request. AWS Application Load Balancer and Nginx are layer 7.
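A sketch of path-based layer 7 routing using Go's standard-library reverse proxy (the hostnames are hypothetical):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// A layer 7 proxy must parse the HTTP request before it can
// route; here the URL path selects the backend pool.
func main() {
	api, _ := url.Parse("http://api-cluster.internal:8080")
	static, _ := url.Parse("http://static-origin.internal:8080")

	mux := http.NewServeMux()
	mux.Handle("/api/", httputil.NewSingleHostReverseProxy(api))
	mux.Handle("/", httputil.NewSingleHostReverseProxy(static))

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```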
Most public-facing web applications use layer 7. Internal service meshes often use layer 7 sidecars (Envoy, Linkerd's proxy) for observability and traffic management. The edge between your network and the internet almost always includes a layer 4 tier, because its simple, cheap routing decisions are easier to defend against DDoS traffic.
Building a Load Balancer
The Load Balancers track asks you to implement a working layer 7 load balancer with health checking and pluggable routing algorithms. You will implement round-robin and least-connections as a minimum, test the behavior under simulated backend failures, and see firsthand how the algorithm choice affects distribution under different load patterns.
The implementation teaches you the mechanics that every reverse proxy (Nginx, HAProxy, Envoy, Traefik) and managed service like AWS ALB implements. Understanding those mechanics at the implementation level makes you far more effective at configuring and debugging these systems in production.
Build it yourself
Reading about distributed systems is useful. Building them is how you actually learn.
Start the Load Balancers track