Tasks
Implement Request Node Cache
Implement a local cache at each request-handling node. When a request comes in:
1. Check if the key exists in the local cache.
2. If cache hit and not...
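A minimal sketch of that read path, assuming a per-node in-process store and a hypothetical `fetch_from_origin` helper:

```python
import time

class LocalCache:
    """Per-node, in-process cache: key -> (value, expiry timestamp)."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                   # miss
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]           # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

def handle_request(cache, key, fetch_from_origin):
    # Cache-aside read path: try the local cache, fall back to the origin.
    value = cache.get(key)
    if value is None:
        value = fetch_from_origin(key)    # hypothetical origin/DB fetch
        cache.set(key, value)
    return value
```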
Build Global Cache
Implement a shared cache accessible by all nodes. Instead of each node maintaining its own cache, a dedicated cache server handles all cache operation...
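A sketch of the node-side client, assuming the shared server is Redis and using redis-py; the host name and `load_from_db` helper are illustrative:

```python
import json
import redis

# Hypothetical shared cache host; every request node points at the same server.
r = redis.Redis(host="cache.internal", port=6379)

def get_user(user_id, load_from_db):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)         # hit: all nodes see this same entry
    user = load_from_db(user_id)          # miss: fetch from the database
    r.set(key, json.dumps(user), ex=300)  # populate with a 5-minute TTL
    return user
```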
Implement Distributed Cache with Consistent Hashing
Distribute cache entries across multiple cache nodes using consistent hashing. This scales cache capacity beyond a single node while maintaining effic...
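A minimal consistent-hash ring sketch, assuming MD5 for placement and virtual nodes to smooth the key distribution; node names and the replica count are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Sorted list of (hash, node); vnodes virtual points per physical
        # node even out the key distribution.
        self.ring = []
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or past the key's
        # hash, wrapping around the ring at the end.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
ring.node_for("user:42")  # same node every time until membership changes
```

Removing a node only remaps the keys that hashed to its virtual points; the rest of the ring is untouched, which is what makes rebalancing cheap.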
Add Eviction Strategies (LRU, TTL)
Implement cache eviction policies to manage limited memory. When the cache is full and a new entry must be added, an eviction policy decides what to r...
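A compact sketch combining LRU capacity eviction with per-entry TTL on top of `OrderedDict`; the capacity and TTL values are illustrative:

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    def __init__(self, capacity=1024, ttl_seconds=300):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.store = OrderedDict()   # key -> (value, expires_at), in LRU order

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]      # lazy TTL expiry on read
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = (value, time.monotonic() + self.ttl)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
```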
Handle Cache Invalidation and Consistency
Implement cache invalidation strategies to maintain consistency between cache and database. When data changes, cached copies must be updated or remove...
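A sketch of the write-invalidate ordering (the same fix described under Common Mistakes below), assuming Redis for the cache and a hypothetical `db` handle; writing the database first is the point:

```python
import redis

r = redis.Redis()

def update_user_email(db, user_id, new_email):
    # 1. Commit to the source of truth first.
    db.execute("UPDATE users SET email = %s WHERE id = %s", (new_email, user_id))
    # 2. Invalidate, rather than update, the cached copy; the next read
    #    repopulates it from the now-consistent database.
    r.delete(f"user:{user_id}")
```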
Interview Prep
Common interview questions for Backend / Infrastructure Engineer roles that map directly to what you build in this track.
Explain the cache-aside pattern. What race condition can it leave in the cache, and how do you mitigate it?
Model Answer
Cache-aside: read hits cache; miss fetches from DB and populates cache; writes update DB and invalidate cache. Race condition: a read misses and fetches from the DB; a concurrent write updates the DB and deletes the cache key; the read then populates the cache with the now-stale value. Mitigations: (1) TTL-based expiry as a backstop, (2) lease tokens (Facebook's memcache approach): the cache issues a token on a miss, a write invalidates outstanding tokens, and a populate carrying an invalidated token is rejected, (3) shorter TTLs for frequently-written keys.
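A simplified single-process sketch of the lease idea; real systems such as memcached implement the token check server-side:

```python
import itertools

class LeasedCache:
    """A miss hands out a lease token; a write voids outstanding tokens
    so stale populations are rejected."""

    def __init__(self):
        self.data = {}
        self.leases = {}
        self._tokens = itertools.count(1)

    def get(self, key):
        if key in self.data:
            return self.data[key], None
        token = next(self._tokens)    # miss: lend the reader a token
        self.leases[key] = token
        return None, token

    def set_with_lease(self, key, value, token):
        if self.leases.get(key) != token:
            return False              # a write voided this lease; drop the value
        self.data[key] = value
        del self.leases[key]
        return True

    def invalidate(self, key):
        # Called on write: removes the value and voids any outstanding lease.
        self.data.pop(key, None)
        self.leases.pop(key, None)
```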
What is a cache stampede, and how do you prevent it?
Model Answer
Scenario: a popular cached item (trending article, flash sale product) expires. Thousands of requests simultaneously miss the cache and hit the database. The DB is overwhelmed; many requests time out; the system appears to go down even though the underlying data is fine. Solutions: (1) Probabilistic early expiration: as TTL approaches, a percentage of requests refresh the cache before expiry, spreading the load, (2) Lock-based: first miss acquires a lock and fetches from DB; others wait and read the freshly populated value, (3) Stale-while-revalidate: serve stale content while background refresh happens, (4) Longer TTLs with explicit invalidation instead of time-based expiry.
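A sketch of probabilistic early expiration in the style of the XFetch algorithm, where `delta` is the observed recompute cost and `beta` tunes eagerness; the names are illustrative:

```python
import math
import random
import time

def should_refresh(expires_at, delta, beta=1.0):
    """Decide whether this request refreshes the entry before it expires.

    delta: observed recompute/DB-fetch cost in seconds.
    beta:  > 1 refreshes more eagerly, < 1 less eagerly.
    """
    rand = random.random() or 1e-12   # guard against log(0)
    # -delta * beta * log(rand) is a positive skew that grows as expiry
    # nears, so a few requests refresh early while the rest keep serving
    # the cached value.
    return time.time() - delta * beta * math.log(rand) >= expires_at
```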
What happens to your system if Redis goes down, and how do you design for that failure?
Model Answer
Without resilience: all requests hit the DB, which may be sized for cache-hit rates (e.g., only handling 10% of traffic directly). DB becomes overloaded, requests time out, cascading failure. With resilience: circuit breaker opens after Redis failures exceed threshold, routing all traffic to DB with a reduced feature set. Cache should be a performance optimization, not a reliability dependency for correct operation. Design for cache failure: implement request coalescing, DB connection pool limits, and graceful degradation. Use Redis Sentinel or Cluster for high availability.
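A minimal circuit-breaker sketch around the cache call; the thresholds are illustrative, any cache error counts as a failure, and `fallback_fn` would be the rate-limited DB path described above:

```python
import time

class CacheCircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None         # None means the circuit is closed

    def call(self, cache_fn, fallback_fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback_fn()  # open: skip the cache entirely
            self.opened_at = None     # half-open: probe the cache again
            self.failures = 0
        try:
            result = cache_fn()
            self.failures = 0
            return result
        except Exception:             # any cache error counts as a failure
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback_fn()
```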
Compare write-through and write-back caching. When would you choose each?
Model Answer
Write-through: every write goes to both cache and DB synchronously. Cache is always consistent with DB. Higher write latency (two writes per operation). Safe — no data loss on cache failure. Use when read/write ratio is balanced and you cannot tolerate stale reads. Write-back (write-behind): writes go to cache immediately; DB write is deferred/async. Lower write latency. Risk: data in cache but not DB is lost if cache fails. Use when write throughput is critical and you can tolerate some data loss (metrics, analytics, IoT sensor data). Not suitable for financial data.
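A side-by-side sketch of the two write paths with a hypothetical `db` handle; a real write-back flush would run on a timer or queue rather than on demand:

```python
class WriteThroughCache:
    def __init__(self, db):
        self.db = db
        self.data = {}

    def write(self, key, value):
        self.db.put(key, value)  # synchronous DB write: slower, nothing to lose
        self.data[key] = value

class WriteBackCache:
    def __init__(self, db):
        self.db = db
        self.data = {}
        self.dirty = set()       # keys written to cache but not yet to the DB

    def write(self, key, value):
        self.data[key] = value   # cache-only write: fast
        self.dirty.add(key)      # lost if this process dies before flush()

    def flush(self):
        for key in list(self.dirty):
            self.db.put(key, self.data[key])
            self.dirty.discard(key)
```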
How would you implement rate limiting with Redis?
Model Answer
Fixed window: INCR key; if count > limit, reject; set TTL = window duration on the first INCR. Problem: boundary burst (up to 2x the limit across a window boundary). Sliding window with a sorted set: ZADD key timestamp member; ZREMRANGEBYSCORE key 0 (now - window); ZCARD key; if count < limit, add and proceed. Wrap the sequence in a Lua script or MULTI/EXEC for atomicity; pipelining alone does not make the check-then-add atomic. Token bucket with Redis: store (tokens, last_refill_time); on each request, compute tokens to add since the last refill, cap at max, subtract 1 if available. Implement with EVALSHA for atomicity. Cloudflare's production rate limiter approximates a sliding window by weighting the previous fixed window's count by its overlap with the current window, correcting the boundary problem cheaply.
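A sketch of the sorted-set sliding window using redis-py; as noted above, a Lua script is preferred in production because the check-then-add below is not atomic, but the pipeline version reads more clearly (key names are illustrative):

```python
import time
import uuid
import redis

r = redis.Redis()

def allow_request(user_id, limit=100, window=60):
    """Sliding window: allow at most `limit` requests per `window` seconds."""
    key = f"rate:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # drop entries outside the window
    pipe.zcard(key)                              # count what remains
    _, count = pipe.execute()
    if count >= limit:
        return False
    # Unique member so concurrent requests at the same timestamp both count;
    # the gap between ZCARD and ZADD is the race a Lua script would close.
    r.zadd(key, {str(uuid.uuid4()): now})
    r.expire(key, window)
    return True
```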
Questions are representative of real interview patterns. Model answers are starting points — adapt them with your own experience and the specific context of the interview.
Common Mistakes
The top 5 mistakes builders make in this track, and exactly how to fix them.
Cache stampede on hot-key expiry
Why it happens
Every request finds the key missing and decides to fetch from the origin; all of them fire at once before any one has repopulated the cache.
The fix
Use probabilistic early expiration (PER) or a lock-based approach: the first miss acquires a lock and fetches the value; other misses wait for the lock holder to populate the cache.
Sequential scans flushing the hot set out of an LRU cache
Why it happens
A pure LRU treats a one-time sequential scan the same as a repeated hot key. The scan promotes millions of cold entries to the top of the LRU list, flushing the actual hot set.
The fix
Use LRU-K (LRU based on the K-th most recent access) or a segmented LRU (SLRU) that requires an entry to be accessed twice before it enters the protected segment.
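A compact segmented-LRU sketch: new entries land in a probationary segment, and only a second access promotes them, so a one-pass scan never touches the protected segment (segment sizes are illustrative):

```python
from collections import OrderedDict

class SegmentedLRU:
    def __init__(self, probation_size=256, protected_size=768):
        self.probation = OrderedDict()   # seen once: scans live and die here
        self.protected = OrderedDict()   # seen twice or more: the real hot set
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)  # second access: promote
            self._insert_protected(key, value)
            return value
        return None

    def set(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
            return
        self.probation[key] = value
        self.probation.move_to_end(key)
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # a scan only evicts probation

    def _insert_protected(self, key, value):
        self.protected[key] = value
        if len(self.protected) > self.protected_size:
            # Demote the protected LRU victim back to probation.
            old_key, old_val = self.protected.popitem(last=False)
            self.set(old_key, old_val)
```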
Thundering herd after invalidating a widely shared key
Why it happens
Invalidating a cache entry shared by many clients causes all of them to miss simultaneously and race to reload it.
The fix
Use a "fetch-and-set" atomic primitive (e.g., `SET NX PX` in Redis) to ensure only one client fetches from the origin. Others should serve a slightly stale value while the reload is in flight.
Unbounded memory growth from missing TTLs
Why it happens
If TTLs are never set or set to very large values, keys accumulate without bound.
The fix
Every key must have a TTL or the cache must enforce a maximum capacity with an eviction policy. Prefer short TTLs for mutable data and longer TTLs for immutable data.
Cache and database diverging when the database write fails
Why it happens
In a write-through design, the write must succeed on both the cache and the database. If the database write fails after the cache write succeeds, the cache holds an uncommitted value.
The fix
Write to the database first, then invalidate (not update) the cache. This "cache-aside" or "write-invalidate" pattern ensures the cache never holds a value the database does not have.
Comparison Mode
Side-by-side comparisons of the approaches, algorithms, and trade-offs you encounter in this track.
Write-Through vs Write-Back vs Write-Around

| Dimension | Write-Through | Write-Back | Write-Around |
|---|---|---|---|
| How it works | Write to cache and backing store synchronously | Write to cache only; flush to backing store later | Write directly to backing store, bypass cache |
| Write latency | High (waits for backing store) | Low (write to cache only) | Medium (backing store latency, no cache overhead) |
| Data loss risk | None | High — dirty cache lost on crash | None |
| Read-after-write from cache | Yes | Yes | No — first read is a cache miss |
| Cache pollution from write-once data | Yes | Yes | No — large writes do not pollute the cache |
| Best for | Read-heavy, consistency-critical data | Write-heavy workloads (DB buffer pools) | Large batch writes, log storage |
LRU vs LFU vs ARC

| Dimension | LRU | LFU | ARC |
|---|---|---|---|
| Evicts | Least recently used entry | Least frequently used entry | Dynamically balances recency vs frequency |
| Implementation complexity | Simple (doubly-linked list + hashmap) | Complex (frequency buckets) | Complex (four LRU lists) |
| Handles scan resistance | No — sequential scan evicts hot entries | Yes — hot entries have high frequency | Yes — ghost lists absorb scan traffic |
| Cold start | Good — recency bootstraps quickly | Poor — all new entries have frequency = 1 | Good — recency arm covers cold start |
| Adapts to access pattern changes | Yes (recent entries favoured) | Slow — old high-frequency entries stick around | Yes — adapts T1/T2 split dynamically |
| Used in | Redis (allkeys-lru), most caches | Redis (allkeys-lfu, Redis 4.0+) | ZFS ARC, IBM storage systems |
Concepts Covered
Prerequisites
It is recommended to complete the previous tracks before starting this one. Concepts build progressively throughout the curriculum.