ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/loadbalancers

Load Balancers

Intermediate | Scalability | 15 tasks

Implement load balancing strategies to distribute traffic across backend servers. Build Layer 4 and Layer 7 balancers with health checking and various algorithms.

Subtracks & Tasks

Interview Prep

Common interview questions for Infrastructure / Platform Engineer roles that map directly to what you build in this track. Each question is paired with a model answer.

Model Answer

Round-robin: equal distribution, works well when requests have similar cost and servers have similar capacity. Simple and predictable. Least-connections: better when request durations vary significantly (e.g., WebSocket connections alongside short HTTP requests) — avoids overloading servers with long-lived connections. Random: surprisingly effective when all servers are identical — avoids coordination overhead and has good behavior under simultaneous burst load. Netflix uses weighted random. Power of Two Choices (P2C) — pick two random servers, route to the one with fewer connections — achieves near-optimal balance with O(1) overhead.
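
To make P2C concrete, here is a minimal Go sketch of the selection step, assuming each backend exposes an in-flight connection counter. The addresses and field names are illustrative, not any particular load balancer's API.

```go
// Minimal sketch of Power of Two Choices (P2C) backend selection.
// Assumes the pool contains at least two backends.
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

type backend struct {
	addr        string
	activeConns atomic.Int64 // connections currently in flight
}

// pickP2C samples two distinct backends uniformly at random and
// returns the one with fewer active connections.
func pickP2C(pool []*backend) *backend {
	i := rand.Intn(len(pool))
	j := rand.Intn(len(pool) - 1)
	if j >= i {
		j++ // ensure the two samples are distinct
	}
	a, b := pool[i], pool[j]
	if a.activeConns.Load() <= b.activeConns.Load() {
		return a
	}
	return b
}

func main() {
	pool := []*backend{{addr: "10.0.0.1:8080"}, {addr: "10.0.0.2:8080"}, {addr: "10.0.0.3:8080"}}
	b := pickP2C(pool)
	b.activeConns.Add(1) // increment on dispatch, decrement when the request completes
	fmt.Println("routing to", b.addr)
}
```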

Model Answer

Options: (1) Consistent hashing on user ID or session ID — the same user always maps to the same backend without any state in the LB, (2) Signed cookie containing the backend server ID — the LB reads the cookie and routes accordingly, no server-side state, (3) JWT tokens that contain all session state — any backend can validate and use them, removing the need for affinity entirely. Option 3 is the most scalable: eliminate the statefulness rather than routing around it. IP hash is the least reliable (NAT, mobile IPs change).
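
A small Go sketch of option (1): a deterministic user-to-backend mapping with no state at the LB. It uses rendezvous (highest-random-weight) hashing rather than a classic hash ring, which gives the same affinity property; the backend names are placeholders.

```go
// Sketch of stateless affinity: hash (userID, backend) pairs and route each
// user to the backend with the highest score. Removing a backend only remaps
// the users that were on it.
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (userID, backend) pair to a 64-bit value.
func score(userID, backend string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(userID))
	h.Write([]byte{0}) // separator so "ab"+"c" != "a"+"bc"
	h.Write([]byte(backend))
	return h.Sum64()
}

// pickBackend returns the backend with the highest score for this user,
// so the same user always maps to the same backend.
func pickBackend(userID string, backends []string) string {
	best, bestScore := "", uint64(0)
	for _, b := range backends {
		if s := score(userID, b); s >= bestScore {
			best, bestScore = b, s
		}
	}
	return best
}

func main() {
	backends := []string{"app-1:8080", "app-2:8080", "app-3:8080"}
	fmt.Println(pickBackend("user-42", backends)) // always the same backend for user-42
}
```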

Model Answer

Possible causes: (1) Health check is too shallow — it checks if the port is open, not whether the backend is actually healthy (the DB connection pool could be exhausted). Improve health check to exercise actual dependencies. (2) Slow backends — the LB routes traffic to a server that passes health checks but is running slowly (GC pause, I/O saturation). Use latency-aware routing or least-response-time. (3) Partial failures — some routes fail, not all. The health check hits a healthy endpoint. (4) Race condition — backend is being drained/restarted, passes one health check but fails during the drain window. Use pre-stop hooks and connection draining.
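
A rough Go sketch of fix (1), a deeper readiness check that exercises a real dependency under a short timeout. The checkDB function is a stand-in for your actual dependency probe (for example a ping or cheap query against the connection pool).

```go
// Sketch of a "deep" readiness check that exercises a dependency instead of
// just answering on an open port.
package main

import (
	"context"
	"net/http"
	"time"
)

// readyHandler runs the dependency check with a short timeout so a hung
// dependency becomes a fast 503 rather than a stalled health check.
func readyHandler(checkDB func(context.Context) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
		defer cancel()
		if err := checkDB(ctx); err != nil {
			http.Error(w, "dependency unavailable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	// Stand-in dependency check; in a real service this would hit the database.
	checkDB := func(ctx context.Context) error { return nil }
	http.Handle("/readyz", readyHandler(checkDB))
	http.ListenAndServe(":8080", nil)
}
```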

Model Answer

L4 (transport layer): routes based on IP/TCP/UDP. Cannot see HTTP content. Extremely fast, minimal latency overhead, handles any TCP protocol. Use for: raw throughput workloads, non-HTTP protocols, DDoS mitigation (AWS NLB). L7 (application layer): inspects HTTP headers, URL, cookies. Enables content-based routing, header rewriting, JWT validation, A/B testing. Higher computational overhead per request. Use for: HTTP APIs, microservices routing, SSL termination with certificate management, WebSocket upgrades. Modern stacks often use both: L4 at the edge for DDoS/high-throughput, L7 inside the cluster for intelligent routing.
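
To make the contrast concrete, here is a minimal Go sketch of content-based routing that only an L7 proxy can do: different URL paths go to different backend pools. The upstream hostnames are placeholders.

```go
// Sketch of L7 path-based routing: an L4 balancer cannot see the URL, so it
// cannot make this decision.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo builds a reverse proxy for a single upstream address.
func proxyTo(raw string) *httputil.ReverseProxy {
	target, _ := url.Parse(raw)
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/api/", proxyTo("http://api-pool.internal:8080"))      // API traffic
	mux.Handle("/static/", proxyTo("http://cdn-origin.internal:8080")) // static assets
	mux.Handle("/", proxyTo("http://web-pool.internal:8080"))          // everything else
	http.ListenAndServe(":80", mux)
}
```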

Model Answer

Rolling deployment: bring up new version instances; add to LB pool; drain old instances (stop new connections, wait for in-flight to complete); remove old instances from pool; shut down. Key steps: (1) Liveness vs readiness probes — do not add an instance to the pool until it passes the readiness check, (2) Pre-stop hook — allow a grace period before termination to drain in-flight requests, (3) Connection draining — LB waits for existing connections to complete before marking the backend as removed, (4) Health check passes before traffic — new instance serves a few test requests before receiving full traffic (can combine with canary or blue-green). Kubernetes handles this via readinessProbe and terminationGracePeriodSeconds.
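
A hedged Go sketch of the backend-side behavior during such a rollout: readiness flips to 503 on SIGTERM so the LB stops sending new traffic, the process waits out a drain window, then finishes in-flight requests before exiting. The durations and endpoint names are illustrative.

```go
// Sketch of a backend that cooperates with rolling deploys: fail readiness on
// SIGTERM, let the LB drain us, then shut down gracefully.
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var draining atomic.Bool

	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}
	go srv.ListenAndServe()

	// Wait for SIGTERM (what Kubernetes sends before terminationGracePeriodSeconds expires).
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	draining.Store(true)         // readiness now fails; the LB removes this instance
	time.Sleep(10 * time.Second) // grace period so the LB observes the failed check

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	srv.Shutdown(ctx) // stop accepting new connections, wait for in-flight requests
}
```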

Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.

Common Mistakes

The top 5 mistakes builders make in this track — and exactly how to fix them. Each entry covers the root cause and the correct approach.

Why it happens

Round-robin distributes requests equally by count, not by backend capacity. A server twice as powerful should receive twice as many requests.

The fix

Use weighted round-robin or least-connections. Assign weights proportional to backend capacity, or route new requests to the backend with the fewest active connections.
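
A minimal Go sketch of smooth weighted round-robin, the weighted variant Nginx uses for upstreams; the weights shown are illustrative capacity ratios.

```go
// Sketch of smooth weighted round-robin: picks spread out in proportion to
// weight instead of bursting all picks to the heaviest backend at once.
package main

import "fmt"

type wrrBackend struct {
	addr    string
	weight  int // static weight proportional to capacity
	current int // accumulator updated each round
}

// next adds each backend's weight to its accumulator, picks the largest
// accumulator, then subtracts the total weight from the winner.
func next(pool []*wrrBackend) *wrrBackend {
	total := 0
	var best *wrrBackend
	for _, b := range pool {
		b.current += b.weight
		total += b.weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	best.current -= total
	return best
}

func main() {
	pool := []*wrrBackend{
		{addr: "big:8080", weight: 2}, // twice the capacity of "small"
		{addr: "small:8080", weight: 1},
	}
	for i := 0; i < 6; i++ {
		fmt.Println(next(pool).addr) // big, small, big, big, small, big (a 2:1 split)
	}
}
```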

Why it happens

Sticky sessions route a user to the same backend for the lifetime of their session. If that backend dies, the session is gone because it was only stored in that server's memory.

The fix

Store session state in a shared, external store (Redis, a database) rather than in-process memory. Then any backend can serve any session, and sticky sessions are no longer necessary.
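
A sketch of what that looks like in code, assuming a hypothetical SessionStore interface: handlers read and write sessions through the store, a Redis- or database-backed implementation is used in production, and the in-memory version here is only a local stand-in.

```go
// Sketch of externalized session state: any backend behind the LB can serve
// any session, so sticky sessions are unnecessary.
package main

import (
	"context"
	"sync"
	"time"
)

// SessionStore is implemented by Redis or a database in production.
type SessionStore interface {
	Get(ctx context.Context, sessionID string) (map[string]string, error)
	Set(ctx context.Context, sessionID string, data map[string]string, ttl time.Duration) error
}

// memoryStore is an in-process stand-in for development and tests.
type memoryStore struct {
	mu   sync.RWMutex
	data map[string]map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: make(map[string]map[string]string)}
}

func (m *memoryStore) Get(ctx context.Context, id string) (map[string]string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.data[id], nil
}

func (m *memoryStore) Set(ctx context.Context, id string, d map[string]string, ttl time.Duration) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[id] = d
	return nil
}

func main() {
	var store SessionStore = newMemoryStore()
	_ = store.Set(context.Background(), "sess-123", map[string]string{"user": "42"}, time.Hour)
}
```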

Why it happens

If health checks run every 30 seconds and a backend fails immediately after a check, traffic flows to it for almost a full interval.

The fix

Use active health checks at 2-5 second intervals. Combine with passive health checking: remove a backend from rotation immediately when it returns 5xx errors above a threshold.
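
A compact Go sketch combining both sides: an active probe loop on a 2-second interval plus passive ejection after a streak of 5xx responses seen on the proxy path. The URL, interval, and threshold are illustrative.

```go
// Sketch of active + passive health checking for a single backend.
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

type checkedBackend struct {
	healthURL string
	healthy   atomic.Bool
	errStreak atomic.Int32 // consecutive 5xx responses seen by the proxy path
}

// activeCheck probes the health endpoint on a short interval and flips the flag.
func activeCheck(b *checkedBackend, interval time.Duration) {
	client := &http.Client{Timeout: 1 * time.Second}
	for {
		resp, err := client.Get(b.healthURL)
		ok := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		b.healthy.Store(ok)
		time.Sleep(interval)
	}
}

// observe is the passive side: called on every proxied response, it ejects
// the backend as soon as consecutive 5xx errors cross a small threshold.
func observe(b *checkedBackend, status int) {
	if status >= 500 {
		if b.errStreak.Add(1) >= 3 {
			b.healthy.Store(false)
		}
		return
	}
	b.errStreak.Store(0)
}

func main() {
	b := &checkedBackend{healthURL: "http://10.0.0.1:8080/healthz"}
	b.healthy.Store(true)
	go activeCheck(b, 2*time.Second)
	select {} // the routing loop would skip backends whose healthy flag is false
}
```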

Why it happens

Abruptly closing a backend drops all TCP connections it is serving, including in-flight requests.

The fix

Implement graceful drain: stop sending new requests to the backend, wait for in-flight requests to complete (with a timeout), then remove it.
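
A minimal Go sketch of this drain sequence on the LB side, assuming the balancer keeps an in-flight counter per backend; the timeout and field names are illustrative.

```go
// Sketch of LB-side connection draining: stop new traffic, wait for in-flight
// requests to finish (bounded by a timeout), then remove the backend.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

type drainBackend struct {
	addr     string
	draining atomic.Bool
	inFlight atomic.Int64 // incremented on dispatch, decremented on completion
}

// drain stops new requests and waits up to timeout for in-flight ones.
func drain(b *drainBackend, timeout time.Duration) bool {
	b.draining.Store(true) // selection logic must skip draining backends
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if b.inFlight.Load() == 0 {
			return true // safe to remove and shut down
		}
		time.Sleep(100 * time.Millisecond)
	}
	return false // timed out; remaining requests will be cut off
}

func main() {
	b := &drainBackend{addr: "10.0.0.2:8080"}
	fmt.Println("drained cleanly:", drain(b, 30*time.Second))
}
```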

Why it happens

A single load balancer instance is itself a single point of failure (SPOF), even though its whole purpose is to provide high availability for the backends behind it.

The fix

Run at least two load balancer instances in active-passive or active-active mode. Use anycast routing, DNS failover, or a floating IP (VRRP/keepalived) to route around a failed instance.

Comparison Mode

Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.

Dimension | Round-Robin | Least Connections | Power of Two Choices
How it selects | Cycles through backends in order | Picks the backend with the fewest active connections | Randomly picks 2 backends; routes to the one with fewer connections
Request duration sensitivity | None — ignores connection duration | High — routes away from slow backends | High — probabilistically avoids slow backends
State required at LB | Counter only | Active connection count per backend | Active connection count per backend
Overhead | O(1) | O(n) scan or O(1) with heap | O(1) — sample only two backends
Handles heterogeneous backends | No (use weighted variant) | Yes — fast backends naturally get more | Yes — fast backends attract more connections
Used in | DNS round-robin, Nginx default | HAProxy, most hardware LBs | Nginx upstream, Envoy, Google Maglev

Verdict: Round-robin for stateless, equal-cost requests. Least connections for long-lived or variable-cost requests. Power of Two Choices for the same use case but at much lower overhead at scale.

Concepts Covered

round robin, load balancing, stateless, least connections, dynamic load, connection tracking, health check, failover, liveness, Layer 7, HTTP routing, content-based, consistent hashing, key affinity, cache locality, layer 7 load balancing, HTTP proxy, request routing, header inspection, backend selection, path-based routing, URL rewriting, routing tables, wildcard matching, backend pools, sticky sessions, session affinity, cookie-based routing, session persistence, stateful services, circuit breaker, failure threshold, half-open state, automatic recovery, cascade prevention, rate limiting, token bucket, per-IP limits, per-API-key limits, DDoS protection, least-connections, active connection tracking, load-based routing, atomic counters, variable request durations, weighted round-robin, capacity-based routing, backend weights, heterogeneous clusters, traffic proportionality, power-of-two-choices, randomized load balancing, least-connections approximation, constant-time selection, scalability, cache coherency, minimal disruption, backend additions/removals, thundering herd, cascading failures, exponential backoff, graceful degradation, circuit breaking

Prerequisites

It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.