Subtracks & Tasks
Layer 4 Load Balancing
Implement Round Robin Load Balancer
Implement round robin load balancing: 1. Maintain an ordered list of backend servers 2. Track current server index 3. For each request, send to serve...
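The steps above can be sketched in a few lines. This is a minimal illustration (class name and backend addresses are placeholders, not part of the track's starter code):

```python
class RoundRobinBalancer:
    """Cycle through an ordered list of backends; each request
    goes to the next one, wrapping around at the end."""

    def __init__(self, backends):
        self.backends = backends
        self.index = 0  # current server index

    def pick(self):
        backend = self.backends[self.index]
        self.index = (self.index + 1) % len(self.backends)
        return backend


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [lb.pick() for _ in range(4)]
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

The modulo wrap is the whole algorithm; everything else in a real balancer (connection handling, failed-backend skipping) layers on top of this selection step.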
Implement Least Connections Algorithm
Implement least-connections load balancing: 1. Track active connection count for each server 2. When a request starts, increment count for chosen ser...
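The increment/decrement bookkeeping described above can be sketched like this (a minimal version; names are illustrative):

```python
class LeastConnectionsBalancer:
    """Route each request to the backend with the fewest active
    connections; callers must release() when the request finishes."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # active connection counts

    def acquire(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1
```

Forgetting to call `release` (e.g. on a connection error path) is the classic bug here: the count drifts upward and the backend silently stops receiving traffic.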
Add Health Checks and Failover
Add health checking to your load balancer: 1. Periodically send health check requests to servers 2. Track consecutive failures per server 3. Mark ser...
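The consecutive-failure tracking from steps 2-3 can be sketched as a small state machine (the probe loop and thresholds here are illustrative, not prescribed by the task):

```python
class HealthChecker:
    """Mark a backend unhealthy after `fail_threshold` consecutive
    failed probes, and healthy again only after `rise_threshold`
    consecutive successes (hysteresis avoids flapping)."""

    def __init__(self, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.failures = 0
        self.successes = 0
        self.healthy = True

    def record(self, probe_ok):
        if probe_ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.rise_threshold:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Requiring several successes before re-adding a backend matters as much as the failure threshold: a server that passes one probe mid-recovery should not immediately receive full traffic.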
Build Layer 7 Load Balancer
Build an HTTP-aware (Layer 7) load balancer: 1. Parse HTTP request (method, path, headers) 2. Route based on Host header (virtual hosts) 3. Route bas...
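The routing decision (steps 2-3) can be sketched as a pure function over the parsed request — Host header first, then longest matching path prefix. The pool names and signature are placeholders:

```python
def route(path, headers, virtual_hosts, path_pools, default_pool):
    """Pick a backend pool for an HTTP request:
    1. exact Host-header match (virtual hosts),
    2. longest URL path prefix,
    3. fallback to a default pool."""
    host = headers.get("Host", "").split(":")[0]  # strip optional port
    if host in virtual_hosts:
        return virtual_hosts[host]
    # Longest prefix wins, so /api/v2/ beats /api/ beats /.
    for prefix in sorted(path_pools, key=len, reverse=True):
        if path.startswith(prefix):
            return path_pools[prefix]
    return default_pool


vhosts = {"api.example.com": "api-pool"}
paths = {"/api/": "api-pool", "/": "web-pool"}
route("/index.html", {"Host": "www.example.com"}, vhosts, paths, "default")
# → 'web-pool'
```

A production L7 proxy parses the request incrementally and also rewrites headers (e.g. `X-Forwarded-For`), but the selection logic reduces to a lookup like this.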
Implement Consistent Hashing for Load Balancing
Implement consistent hashing for stateful load balancing: 1. Assign servers to positions on hash ring 2. Hash each request key to ring position 3. Ro...
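A minimal hash ring with virtual nodes might look like this (class name, vnode count, and the choice of MD5 are all illustrative assumptions):

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing: servers occupy many positions ("virtual
    nodes") on a ring; a key routes to the first server position
    at or after the key's own hash, wrapping around."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (position, server)
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")
                self.ring.append((pos, server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[idx][1]
```

Virtual nodes are what make removal cheap: dropping one server deletes only its own ring positions, so roughly 1/N of keys move instead of nearly all of them (as with `hash(key) % N`).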
Layer 7 Load Balancing
Implement Layer 7 HTTP Proxy
Layer 7 load balancing operates at the HTTP application layer, inspecting headers and URL paths to make routing decisions. Unlike Layer 4 (TCP), L7 pr...
Implement Path-Based Routing
Path-based routing directs requests to different backend pools based on the URL path. This enables microservices architecture where /api/* routes to A...
Implement Sticky Sessions
Sticky sessions (session affinity) ensure that all requests from a client are routed to the same backend server, essential for stateful services. **W...
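One common mechanism is cookie-based affinity: the first response sets a cookie naming the chosen backend, and later requests are pinned by it. A minimal sketch (cookie name and API are assumptions for illustration):

```python
class StickyBalancer:
    """Cookie-based session affinity: a new client gets a backend
    via round-robin and a cookie pinning it; returning clients are
    routed by the cookie."""

    COOKIE = "lb_backend"  # hypothetical cookie name

    def __init__(self, backends):
        self.backends = backends
        self.counter = 0

    def pick(self, cookies):
        pinned = cookies.get(self.COOKIE)
        if pinned in self.backends:  # honor existing affinity
            return pinned, cookies
        backend = self.backends[self.counter % len(self.backends)]
        self.counter += 1
        # New affinity cookie to be set on the response.
        return backend, dict(cookies, **{self.COOKIE: backend})
```

Note the failure mode this creates: if the pinned backend dies, the cookie points at a dead server and the balancer must fall back to a fresh pick, losing any in-memory session state.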
Implement Circuit Breaking
**Circuit states**:

```
CLOSED (normal)
- Requests pass through to backend
- Track failures in a sliding window
- If failures > threshold → OPEN...
```
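The CLOSED → OPEN → HALF_OPEN cycle can be sketched as follows. This simplified version counts consecutive failures rather than a sliding window, and the thresholds and clock injection are illustrative:

```python
import time


class CircuitBreaker:
    """CLOSED → OPEN after `threshold` consecutive failures;
    OPEN → HALF_OPEN once `reset_timeout` seconds have passed;
    HALF_OPEN: one trial outcome decides CLOSED vs back to OPEN."""

    def __init__(self, threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable for testing
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let a trial request through
                return True
            return False  # fail fast, protect the backend
        return True

    def record(self, success):
        if success:
            self.state = "CLOSED"
            self.failures = 0
        else:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
```

Failing fast while OPEN is the point: callers get an immediate error instead of queuing up against a backend that is already down.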
Implement Rate Limiting
Rate limiting protects backend services from being overwhelmed by too many requests. It prevents abuse, ensures fair usage, and maintains service avai...
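A token bucket is the standard building block for this: it permits short bursts up to a capacity while enforcing a steady average rate. A minimal sketch (timestamps are passed in explicitly so the logic is testable):

```python
class TokenBucket:
    """Allow bursts up to `capacity` requests; refill at `rate`
    tokens per second. Each allowed request spends one token."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a real limiter you would call `bucket.allow(time.monotonic())` and typically keep one bucket per client key (IP, API token) in a dict or a shared store.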
Advanced Balancing Algorithms
Implement Least-Connections Load Balancing
Least-connections load balancing routes each request to the backend with the fewest active connections. This is superior to round-robin when request d...
Implement Weighted Round-Robin Load Balancing
**Why weighted round-robin?**

```
Problem: backends have different capacities
backend-1: 8 cores, 32GB RAM (high capacity)
backend-2: 8 cores, 32G...
```
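One clean way to implement this is the "smooth" weighted round-robin used by Nginx, which interleaves a heavy backend's turns instead of sending them back-to-back. A minimal sketch (class name is illustrative):

```python
class SmoothWeightedRR:
    """Smooth weighted round-robin: each pick, every backend's
    current score grows by its weight; the highest score wins
    and is penalized by the total weight. Over `total` picks each
    backend is chosen exactly `weight` times, evenly spread out."""

    def __init__(self, weights):  # e.g. {"backend-1": 3, "backend-2": 1}
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}

    def pick(self):
        total = sum(self.weights.values())
        for b in self.current:
            self.current[b] += self.weights[b]
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best
```

With weights `{a: 3, b: 1}` this yields a sequence like `a, a, b, a` rather than the bursty `a, a, a, b` a naive weighted scheme would produce.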
Implement Power-of-Two-Choices Load Balancing
Power-of-two-choices is a randomized load balancing algorithm that approximates least-connections with constant-time selection. Instead of checking al...
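The algorithm is short enough to show in full. This sketch reuses the acquire/release pattern from least-connections; the injectable `rng` is only there to make it testable:

```python
import random


class PowerOfTwoChoices:
    """Sample two random backends, route to the one with fewer
    active connections: O(1) selection, near-least-connections
    balance without scanning every backend."""

    def __init__(self, backends, rng=random):
        self.active = {b: 0 for b in backends}  # needs >= 2 backends
        self.rng = rng

    def acquire(self):
        a, b = self.rng.sample(list(self.active), 2)
        backend = a if self.active[a] <= self.active[b] else b
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1
```

The classic result is that two choices already collapse the worst-case imbalance from O(log n / log log n) (one random choice) to O(log log n), which is why a third sample buys almost nothing.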
Implement Consistent Hashing for Load Balancing
Consistent hashing for load balancing ensures that the same client always routes to the same backend. This is useful for caching layers where session ...
Simulate Thundering Herd with Circuit Breaking
The thundering herd problem occurs when a large number of clients simultaneously retry after a backend failure, overwhelming the remaining backends an...
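Beside circuit breaking, the standard client-side mitigation is randomized ("jittered") retry backoff, so recovering clients do not all retry at the same instant. A minimal full-jitter sketch (function name and defaults are illustrative):

```python
import random


def backoff_with_jitter(attempt, base=0.1, cap=10.0, rng=random):
    """Full-jitter exponential backoff: wait a random duration in
    [0, min(cap, base * 2**attempt)). The randomness desynchronizes
    retrying clients and smears the retry wave over time."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))
```

Pairing this with a circuit breaker covers both sides of the problem: the breaker sheds load at the balancer while jittered backoff prevents the clients from forming a synchronized herd in the first place.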
Interview Prep
Common interview questions for Infrastructure / Platform Engineer roles that map directly to what you build in this track.
Model Answer
Round-robin: equal distribution, works well when requests have similar cost and servers have similar capacity. Simple and predictable. Least-connections: better when request durations vary significantly (e.g., WebSocket connections alongside short HTTP requests) — avoids overloading servers with long-lived connections. Random: surprisingly effective when all servers are identical — avoids coordination overhead and has good behavior under simultaneous burst load. Netflix uses weighted random. Power of Two Choices (P2C) — pick two random servers, route to the one with fewer connections — achieves near-optimal balance with O(1) overhead.
Model Answer
Options: (1) Consistent hashing on user ID or session ID — the same user always maps to the same backend without any state in the LB, (2) Signed cookie containing the backend server ID — the LB reads the cookie and routes accordingly, no server-side state, (3) JWT tokens that contain all session state — any backend can validate and use them, removing the need for affinity entirely. Option 3 is the most scalable: eliminate the statefulness rather than routing around it. IP hash is the least reliable (NAT, mobile IPs change).
Model Answer
Possible causes: (1) Health check is too shallow — it checks if the port is open, not whether the backend is actually healthy (the DB connection pool could be exhausted). Improve health check to exercise actual dependencies. (2) Slow backends — the LB routes traffic to a server that passes health checks but is running slowly (GC pause, I/O saturation). Use latency-aware routing or least-response-time. (3) Partial failures — some routes fail, not all. The health check hits a healthy endpoint. (4) Race condition — backend is being drained/restarted, passes one health check but fails during the drain window. Use pre-stop hooks and connection draining.
Model Answer
L4 (transport layer): routes based on IP/TCP/UDP. Cannot see HTTP content. Extremely fast, minimal latency overhead, handles any TCP protocol. Use for: raw throughput workloads, non-HTTP protocols, DDoS mitigation (AWS NLB). L7 (application layer): inspects HTTP headers, URL, cookies. Enables content-based routing, header rewriting, JWT validation, A/B testing. Higher computational overhead per request. Use for: HTTP APIs, microservices routing, SSL termination with certificate management, WebSocket upgrades. Modern stacks often use both: L4 at the edge for DDoS/high-throughput, L7 inside the cluster for intelligent routing.
Model Answer
Rolling deployment: bring up new version instances; add to LB pool; drain old instances (stop new connections, wait for in-flight to complete); remove old instances from pool; shut down. Key steps: (1) Liveness vs readiness probes — do not add an instance to the pool until it passes the readiness check, (2) Pre-stop hook — allow a grace period before termination to drain in-flight requests, (3) Connection draining — LB waits for existing connections to complete before marking the backend as removed, (4) Health check passes before traffic — new instance serves a few test requests before receiving full traffic (can combine with canary or blue-green). Kubernetes handles this via readinessProbe and terminationGracePeriodSeconds.
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track — and exactly how to fix them.
Why it happens
Round-robin distributes requests equally by count, not by backend capacity. A server twice as powerful should receive twice as many requests.
The fix
Use weighted round-robin or least-connections. Assign weights proportional to backend capacity, or route new requests to the backend with the fewest active connections.
Why it happens
Sticky sessions route a user to the same backend for the lifetime of their session. If that backend dies, the session is gone because it was only stored in that server's memory.
The fix
Store session state in a shared, external store (Redis, a database) rather than in-process memory. Then any backend can serve any session, and sticky sessions are no longer necessary.
Why it happens
If health checks run every 30 seconds and a backend fails immediately after a check, traffic flows to it for almost a full interval.
The fix
Use active health checks at 2-5 second intervals. Combine with passive health checking: remove a backend from rotation immediately when it returns 5xx errors above a threshold.
Why it happens
Abruptly closing a backend drops all TCP connections it is serving, including in-flight requests.
The fix
Implement graceful drain: stop sending new requests to the backend, wait for in-flight requests to complete (with a timeout), then remove it.
Why it happens
A single load balancer instance is itself a SPOF even though it is supposed to provide high availability for backends.
The fix
Run at least two load balancer instances in active-passive or active-active mode. Use anycast routing, DNS failover, or a floating IP (VRRP/keepalived) to route around a failed instance.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
| Dimension | Round-Robin | Least Connections | Power of Two Choices |
|---|---|---|---|
| How it selects | Cycles through backends in order | Picks backend with fewest active connections | Randomly picks 2 backends; routes to the one with fewer connections |
| Request duration sensitivity | None — ignores connection duration | High — routes away from slow backends | High — probabilistically avoids slow backends |
| State required at LB | Counter only | Active connection count per backend | Active connection count per backend |
| Overhead | O(1) | O(n) scan, or O(log n) per update with a heap | O(1) — sample only two backends |
| Handles heterogeneous backends | No (use weighted variant) | Yes — fast backends naturally get more | Yes — fast backends attract more connections |
| Used in | DNS round-robin, Nginx default | HAProxy (`leastconn`), most hardware LBs | Envoy (least-request default), Nginx (`random two least_conn`) |
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.