ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/loadbalancers

Load Balancers

Intermediate | Scalability | 15 tasks

Implement load balancing strategies to distribute traffic across backend servers. Build Layer 4 and Layer 7 balancers with health checking and various algorithms.

Subtracks & Tasks

Interview Prep

Common interview questions for Infrastructure / Platform Engineer roles that map directly to what you build in this track. Each question is paired with a model answer.

Model Answer

Round-robin: equal distribution, works well when requests have similar cost and servers have similar capacity. Simple and predictable. Least-connections: better when request durations vary significantly (e.g., WebSocket connections alongside short HTTP requests) — avoids overloading servers with long-lived connections. Random: surprisingly effective when all servers are identical — avoids coordination overhead and has good behavior under simultaneous burst load. Netflix uses weighted random. Power of Two Choices (P2C) — pick two random servers, route to the one with fewer connections — achieves near-optimal balance with O(1) overhead.
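
To make P2C concrete, here is a minimal Go sketch of the selection step, assuming each backend exposes an in-flight connection counter. The addresses and field names are illustrative, not any particular load balancer's API.

```go
// Minimal sketch of Power of Two Choices (P2C) backend selection.
// Assumes the pool contains at least two backends.
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

type backend struct {
	addr        string
	activeConns atomic.Int64 // connections currently in flight
}

// pickP2C samples two distinct backends uniformly at random and
// returns the one with fewer active connections.
func pickP2C(pool []*backend) *backend {
	i := rand.Intn(len(pool))
	j := rand.Intn(len(pool) - 1)
	if j >= i {
		j++ // ensure the two samples are distinct
	}
	a, b := pool[i], pool[j]
	if a.activeConns.Load() <= b.activeConns.Load() {
		return a
	}
	return b
}

func main() {
	pool := []*backend{{addr: "10.0.0.1:8080"}, {addr: "10.0.0.2:8080"}, {addr: "10.0.0.3:8080"}}
	b := pickP2C(pool)
	b.activeConns.Add(1) // increment on dispatch, decrement when the request completes
	fmt.Println("routing to", b.addr)
}
```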

Model Answer

Options: (1) Consistent hashing on user ID or session ID — the same user always maps to the same backend without any state in the LB, (2) Signed cookie containing the backend server ID — the LB reads the cookie and routes accordingly, no server-side state, (3) JWT tokens that contain all session state — any backend can validate and use them, removing the need for affinity entirely. Option 3 is the most scalable: eliminate the statefulness rather than routing around it. IP hash is the least reliable (NAT, mobile IPs change).
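
A small Go sketch of option (1): a deterministic user-to-backend mapping with no state at the LB. It uses rendezvous (highest-random-weight) hashing rather than a classic hash ring, which gives the same affinity property; the backend names are placeholders.

```go
// Sketch of stateless affinity: hash (userID, backend) pairs and route each
// user to the backend with the highest score. Removing a backend only remaps
// the users that were on it.
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (userID, backend) pair to a 64-bit value.
func score(userID, backend string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(userID))
	h.Write([]byte{0}) // separator so "ab"+"c" != "a"+"bc"
	h.Write([]byte(backend))
	return h.Sum64()
}

// pickBackend returns the backend with the highest score for this user,
// so the same user always maps to the same backend.
func pickBackend(userID string, backends []string) string {
	best, bestScore := "", uint64(0)
	for _, b := range backends {
		if s := score(userID, b); s >= bestScore {
			best, bestScore = b, s
		}
	}
	return best
}

func main() {
	backends := []string{"app-1:8080", "app-2:8080", "app-3:8080"}
	fmt.Println(pickBackend("user-42", backends)) // always the same backend for user-42
}
```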

Model Answer

Possible causes: (1) Health check is too shallow — it checks if the port is open, not whether the backend is actually healthy (the DB connection pool could be exhausted). Improve health check to exercise actual dependencies. (2) Slow backends — the LB routes traffic to a server that passes health checks but is running slowly (GC pause, I/O saturation). Use latency-aware routing or least-response-time. (3) Partial failures — some routes fail, not all. The health check hits a healthy endpoint. (4) Race condition — backend is being drained/restarted, passes one health check but fails during the drain window. Use pre-stop hooks and connection draining.
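
A rough Go sketch of fix (1), a deeper readiness check that exercises a real dependency under a short timeout. The checkDB function is a stand-in for your actual dependency probe (for example a ping or cheap query against the connection pool).

```go
// Sketch of a "deep" readiness check that exercises a dependency instead of
// just answering on an open port.
package main

import (
	"context"
	"net/http"
	"time"
)

// readyHandler runs the dependency check with a short timeout so a hung
// dependency becomes a fast 503 rather than a stalled health check.
func readyHandler(checkDB func(context.Context) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
		defer cancel()
		if err := checkDB(ctx); err != nil {
			http.Error(w, "dependency unavailable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	// Stand-in dependency check; in a real service this would hit the database.
	checkDB := func(ctx context.Context) error { return nil }
	http.Handle("/readyz", readyHandler(checkDB))
	http.ListenAndServe(":8080", nil)
}
```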

Model Answer

L4 (transport layer): routes based on IP/TCP/UDP. Cannot see HTTP content. Extremely fast, minimal latency overhead, handles any TCP protocol. Use for: raw throughput workloads, non-HTTP protocols, DDoS mitigation (AWS NLB). L7 (application layer): inspects HTTP headers, URL, cookies. Enables content-based routing, header rewriting, JWT validation, A/B testing. Higher computational overhead per request. Use for: HTTP APIs, microservices routing, SSL termination with certificate management, WebSocket upgrades. Modern stacks often use both: L4 at the edge for DDoS/high-throughput, L7 inside the cluster for intelligent routing.
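
To make the contrast concrete, here is a minimal Go sketch of content-based routing that only an L7 proxy can do: different URL paths go to different backend pools. The upstream hostnames are placeholders.

```go
// Sketch of L7 path-based routing: an L4 balancer cannot see the URL, so it
// cannot make this decision.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo builds a reverse proxy for a single upstream address.
func proxyTo(raw string) *httputil.ReverseProxy {
	target, _ := url.Parse(raw)
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/api/", proxyTo("http://api-pool.internal:8080"))      // API traffic
	mux.Handle("/static/", proxyTo("http://cdn-origin.internal:8080")) // static assets
	mux.Handle("/", proxyTo("http://web-pool.internal:8080"))          // everything else
	http.ListenAndServe(":80", mux)
}
```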

Model Answer

Rolling deployment: bring up new version instances; add to LB pool; drain old instances (stop new connections, wait for in-flight to complete); remove old instances from pool; shut down. Key steps: (1) Liveness vs readiness probes — do not add an instance to the pool until it passes the readiness check, (2) Pre-stop hook — allow a grace period before termination to drain in-flight requests, (3) Connection draining — LB waits for existing connections to complete before marking the backend as removed, (4) Health check passes before traffic — new instance serves a few test requests before receiving full traffic (can combine with canary or blue-green). Kubernetes handles this via readinessProbe and terminationGracePeriodSeconds.
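
A hedged Go sketch of the backend-side behavior during such a rollout: readiness flips to 503 on SIGTERM so the LB stops sending new traffic, the process waits out a drain window, then finishes in-flight requests before exiting. The durations and endpoint names are illustrative.

```go
// Sketch of a backend that cooperates with rolling deploys: fail readiness on
// SIGTERM, let the LB drain us, then shut down gracefully.
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var draining atomic.Bool

	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}
	go srv.ListenAndServe()

	// Wait for SIGTERM (what Kubernetes sends before terminationGracePeriodSeconds expires).
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	draining.Store(true)         // readiness now fails; the LB removes this instance
	time.Sleep(10 * time.Second) // grace period so the LB observes the failed check

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	srv.Shutdown(ctx) // stop accepting new connections, wait for in-flight requests
}
```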

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them. Each entry covers the root cause and the correct approach.

Why it happens

Round-robin distributes requests equally by count, not by backend capacity. A server twice as powerful should receive twice as many requests.

The fix

Use weighted round-robin or least-connections. Assign weights proportional to backend capacity, or route new requests to the backend with the fewest active connections.
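
A minimal Go sketch of smooth weighted round-robin, the weighted variant Nginx uses for upstreams; the weights shown are illustrative capacity ratios.

```go
// Sketch of smooth weighted round-robin: picks spread out in proportion to
// weight instead of bursting all picks to the heaviest backend at once.
package main

import "fmt"

type wrrBackend struct {
	addr    string
	weight  int // static weight proportional to capacity
	current int // accumulator updated each round
}

// next adds each backend's weight to its accumulator, picks the largest
// accumulator, then subtracts the total weight from the winner.
func next(pool []*wrrBackend) *wrrBackend {
	total := 0
	var best *wrrBackend
	for _, b := range pool {
		b.current += b.weight
		total += b.weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	best.current -= total
	return best
}

func main() {
	pool := []*wrrBackend{
		{addr: "big:8080", weight: 2}, // twice the capacity of "small"
		{addr: "small:8080", weight: 1},
	}
	for i := 0; i < 6; i++ {
		fmt.Println(next(pool).addr) // big, small, big, big, small, big (a 2:1 split)
	}
}
```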

Why it happens

Sticky sessions route a user to the same backend for the lifetime of their session. If that backend dies, the session is gone because it was only stored in that server's memory.

The fix

Store session state in a shared, external store (Redis, a database) rather than in-process memory. Then any backend can serve any session, and sticky sessions are no longer necessary.
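
A sketch of what that looks like in code, assuming a hypothetical SessionStore interface: handlers read and write sessions through the store, a Redis- or database-backed implementation is used in production, and the in-memory version here is only a local stand-in.

```go
// Sketch of externalized session state: any backend behind the LB can serve
// any session, so sticky sessions are unnecessary.
package main

import (
	"context"
	"sync"
	"time"
)

// SessionStore is implemented by Redis or a database in production.
type SessionStore interface {
	Get(ctx context.Context, sessionID string) (map[string]string, error)
	Set(ctx context.Context, sessionID string, data map[string]string, ttl time.Duration) error
}

// memoryStore is an in-process stand-in for development and tests.
type memoryStore struct {
	mu   sync.RWMutex
	data map[string]map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: make(map[string]map[string]string)}
}

func (m *memoryStore) Get(ctx context.Context, id string) (map[string]string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.data[id], nil
}

func (m *memoryStore) Set(ctx context.Context, id string, d map[string]string, ttl time.Duration) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[id] = d
	return nil
}

func main() {
	var store SessionStore = newMemoryStore()
	_ = store.Set(context.Background(), "sess-123", map[string]string{"user": "42"}, time.Hour)
}
```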

Why it happens

If health checks run every 30 seconds and a backend fails immediately after a check, traffic flows to it for almost a full interval.

The fix

Use active health checks at 2-5 second intervals. Combine with passive health checking: remove a backend from rotation immediately when it returns 5xx errors above a threshold.
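
A compact Go sketch combining both sides: an active probe loop on a 2-second interval plus passive ejection after a streak of 5xx responses seen on the proxy path. The URL, interval, and threshold are illustrative.

```go
// Sketch of active + passive health checking for a single backend.
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

type checkedBackend struct {
	healthURL string
	healthy   atomic.Bool
	errStreak atomic.Int32 // consecutive 5xx responses seen by the proxy path
}

// activeCheck probes the health endpoint on a short interval and flips the flag.
func activeCheck(b *checkedBackend, interval time.Duration) {
	client := &http.Client{Timeout: 1 * time.Second}
	for {
		resp, err := client.Get(b.healthURL)
		ok := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		b.healthy.Store(ok)
		time.Sleep(interval)
	}
}

// observe is the passive side: called on every proxied response, it ejects
// the backend as soon as consecutive 5xx errors cross a small threshold.
func observe(b *checkedBackend, status int) {
	if status >= 500 {
		if b.errStreak.Add(1) >= 3 {
			b.healthy.Store(false)
		}
		return
	}
	b.errStreak.Store(0)
}

func main() {
	b := &checkedBackend{healthURL: "http://10.0.0.1:8080/healthz"}
	b.healthy.Store(true)
	go activeCheck(b, 2*time.Second)
	select {} // the routing loop would skip backends whose healthy flag is false
}
```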

Why it happens

Abruptly closing a backend drops all TCP connections it is serving, including in-flight requests.

The fix

Implement graceful drain: stop sending new requests to the backend, wait for in-flight requests to complete (with a timeout), then remove it.
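
A minimal Go sketch of this drain sequence on the LB side, assuming the balancer keeps an in-flight counter per backend; the timeout and field names are illustrative.

```go
// Sketch of LB-side connection draining: stop new traffic, wait for in-flight
// requests to finish (bounded by a timeout), then remove the backend.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

type drainBackend struct {
	addr     string
	draining atomic.Bool
	inFlight atomic.Int64 // incremented on dispatch, decremented on completion
}

// drain stops new requests and waits up to timeout for in-flight ones.
func drain(b *drainBackend, timeout time.Duration) bool {
	b.draining.Store(true) // selection logic must skip draining backends
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if b.inFlight.Load() == 0 {
			return true // safe to remove and shut down
		}
		time.Sleep(100 * time.Millisecond)
	}
	return false // timed out; remaining requests will be cut off
}

func main() {
	b := &drainBackend{addr: "10.0.0.2:8080"}
	fmt.Println("drained cleanly:", drain(b, 30*time.Second))
}
```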

Why it happens

A single load balancer instance is itself a single point of failure (SPOF), even though its whole purpose is to provide high availability for the backends behind it.

The fix

Run at least two load balancer instances in active-passive or active-active mode. Use anycast routing, DNS failover, or a floating IP (VRRP/keepalived) to route around a failed instance.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

Dimension | Round-Robin | Least Connections | Power of Two Choices
How it selects | Cycles through backends in order | Picks the backend with the fewest active connections | Randomly picks 2 backends; routes to the one with fewer connections
Request duration sensitivity | None — ignores connection duration | High — routes away from slow backends | High — probabilistically avoids slow backends
State required at LB | Counter only | Active connection count per backend | Active connection count per backend
Overhead | O(1) | O(n) scan or O(1) with heap | O(1) — sample only two backends
Handles heterogeneous backends | No (use weighted variant) | Yes — fast backends naturally get more | Yes — fast backends attract more connections
Used in | DNS round-robin, Nginx default | HAProxy, most hardware LBs | Nginx upstream, Envoy, Google Maglev

Verdict: Round-robin for stateless, equal-cost requests. Least connections for long-lived or variable-cost requests. Power of Two Choices for the same use case but at much lower overhead at scale.

Concepts Covered

round robin, load balancing, stateless, least connections, dynamic load, connection tracking, health check, failover, liveness, Layer 7, HTTP routing, content-based, consistent hashing, key affinity, cache locality, layer 7 load balancing, HTTP proxy, request routing, header inspection, backend selection, path-based routing, URL rewriting, routing tables, wildcard matching, backend pools, sticky sessions, session affinity, cookie-based routing, session persistence, stateful services, circuit breaker, failure threshold, half-open state, automatic recovery, cascade prevention, rate limiting, token bucket, per-IP limits, per-API-key limits, DDoS protection, least-connections, active connection tracking, load-based routing, atomic counters, variable request durations, weighted round-robin, capacity-based routing, backend weights, heterogeneous clusters, traffic proportionality, power-of-two-choices, randomized load balancing, least-connections approximation, constant-time selection, scalability, cache coherency, minimal disruption, backend additions/removals, thundering herd, cascading failures, exponential backoff, graceful degradation, circuit breaking

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.