The Cache Stampede Most Engineers Don't Know They Have

The cache-aside pattern works great — until your TTL fires at peak load and hundreds of requests simultaneously miss and hammer your database. This is the cache stampede. Here's why it's hard to catch, why naive fixes don't work, and three architectural solutions with real tradeoffs.

The Mistake: Cache-Aside With Synchronous Rebuild

The cache-aside pattern looks like this:

import json

def get_user(user_id):
    # 1. Try the cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Miss: load from the database, then populate the cache for 300s
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.set(f"user:{user_id}", json.dumps(user), ex=300)
    return user

Correct. Standard. Works fine — until your TTL fires during a traffic spike and 400 concurrent requests all see a cache miss in the same 10ms window.

Every one of those 400 requests hits the database. Every one computes the same result. Every one writes back the same value. You just sent 400x the expected load to the database your cache was supposed to protect. This is a **cache stampede** — sometimes called the thundering herd — and it arrives at exactly the worst possible moment: maximum traffic.
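To see the effect without a real database, the race can be reproduced in a few lines. This is a minimal sketch, not a benchmark: `FakeDB`, `simulate_stampede`, the 100ms latency, and the plain dict standing in for the cache are all assumptions made for illustration.

```python
import threading
import time


class FakeDB:
    """Hypothetical in-memory stand-in for the database; counts queries."""

    def __init__(self):
        self.queries = 0
        self._lock = threading.Lock()

    def load_user(self, user_id):
        with self._lock:
            self.queries += 1
        time.sleep(0.1)  # simulated query latency
        return {"id": user_id, "name": "example"}


def simulate_stampede(concurrency=50):
    """Run `concurrency` cache-aside reads against a key that just expired."""
    db = FakeDB()
    cache = {}  # an empty dict models the expired key
    start = threading.Barrier(concurrency)

    def get_user(user_id):
        start.wait()  # release every thread at the same moment
        user = cache.get(user_id)
        if user is None:  # every thread that arrives before the first write misses
            user = db.load_user(user_id)
            cache[user_id] = user
        return user

    threads = [threading.Thread(target=get_user, args=(42,)) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.queries  # number of database hits for a single key
```

Calling `simulate_stampede(50)` will usually return a number close to 50 rather than 1, because nearly every thread checks the cache before the first rebuild has finished.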

Why the Naive Fixes Miss the Point

The instinct is to set a longer TTL. That defers the problem. At some traffic level, the stampede still hits when the key expires. The problem isn't the TTL value — it's that all requesters race to rebuild simultaneously.

What makes this hard to catch:

It rarely reproduces in staging. You need enough concurrent load to trigger it, and test environments almost never have it.
It feeds itself. DB latency spikes during the stampede, so rebuilds take longer, so more requests pile up in the miss window.
If you seeded your cache at startup with a uniform TTL, every key expires at the same time. Coordinated stampede.

Some teams add a circuit breaker on the database and wonder why traffic appears to explode and then fall off a cliff. That cliff is the breaker tripping and shedding the stampede, not the system recovering.

Three Fixes, Three Tradeoffs

Mutex Lock: One Rebuilder, Others Wait

When a cache miss occurs, one request acquires a distributed lock and rebuilds the value. Others spin-wait and retry.

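import json
import time

def get_user(user_id, max_wait=2.0):
    key = f"user:{user_id}"
    lock_key = f"lock:{key}"
    deadline = time.monotonic() + max_wait

    while True:
        cached = redis.get(key)
        if cached:
            return json.loads(cached)

        # SET NX EX: exactly one caller wins; the lock expires if the holder dies.
        if redis.set(lock_key, "1", nx=True, ex=10):
            try:
                user = db.query("SELECT * FROM users WHERE id = ?", user_id)
                redis.set(key, json.dumps(user), ex=300)
                return user
            finally:
                # Release even if the query raises. If a rebuild can outlive the
                # 10s lock TTL, store a unique token and delete only on a match.
                redis.delete(lock_key)

        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up waiting for {key} to be rebuilt")
        time.sleep(0.05)  # brief pause, then re-check the cache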

**What you get:** No stampede. One rebuild in flight per key.

**What you give up:** When your database is slow — exactly when this matters most — every waiting request queues up behind the lock holder. You traded a flood for a pile-up. The naive time.sleep retry also adds latency for every waiter. Reserve this pattern for expensive one-off computations where you genuinely want only one rebuild at a time: report generation, ML inference, aggregation pipelines. Always back it with a stale-serving fallback so a slow lock holder doesn't freeze your API.
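The stale-serving fallback mentioned above could look roughly like the following. This is a sketch under stated assumptions: `TinyStore` is a hypothetical single-threaded stand-in for the handful of Redis calls used, and `get_with_stale_fallback`, the `stale:` key prefix, and the timing parameters are illustrative names, not part of any library.

```python
import json
import time
import uuid


class TinyStore:
    """Hypothetical in-memory stand-in for get / set(nx, ex) / delete."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and expires_at <= time.monotonic():
            self._data.pop(key, None)
            return None
        return value

    def set(self, key, value, ex=None, nx=False):
        if nx and self.get(key) is not None:
            return None  # key exists: NX write refused
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_with_stale_fallback(store, key, rebuild, ttl=300, stale_ttl=3600,
                            lock_ttl=10, max_wait=0.5, poll=0.05):
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    token = uuid.uuid4().hex  # marks this caller as the lock owner
    if store.set(lock_key, token, nx=True, ex=lock_ttl):
        try:
            value = rebuild()
            encoded = json.dumps(value)
            store.set(key, encoded, ex=ttl)
            # Longer-lived copy that waiters can serve during later rebuilds.
            store.set(f"stale:{key}", encoded, ex=stale_ttl)
            return value
        finally:
            # Release only our own lock (check-then-delete is not atomic here).
            if store.get(lock_key) == token:
                store.delete(lock_key)

    # Someone else is rebuilding: prefer a stale copy over waiting.
    stale = store.get(f"stale:{key}")
    if stale is not None:
        return json.loads(stale)

    # Cold key with no stale copy: wait a bounded time for the rebuilder.
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        time.sleep(poll)
        cached = store.get(key)
        if cached is not None:
            return json.loads(cached)
    return rebuild()  # last resort: go to the source ourselves
```

The design choice is that waiters never block when any previous value exists. They only wait, and only for `max_wait` seconds, on a key that has never been cached.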

Probabilistic Early Expiration (XFetch)

Rather than rebuilding after expiry, rebuild probabilistically *before* expiry. The probability of triggering an early rebuild increases as the remaining TTL shrinks toward zero.

import json, math, random, time

def get_user_xfetch(user_id, beta=1.0, ttl=300):
    key = f"user:{user_id}"
    data = redis.get(key)
    if data:
        entry = json.loads(data)
        delta = entry.pop("delta", 0.0)  # rebuild cost recorded at the last write
        remaining = redis.pttl(key) / 1000.0  # seconds left, millisecond precision
        # 1.0 - random() lies in (0, 1], so log() never sees 0
        early = -delta * beta * math.log(1.0 - random.random())
        if early < remaining:
            return entry
        # otherwise fall through and rebuild before the key expires

    start = time.monotonic()
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    delta = time.monotonic() - start  # measured rebuild cost in seconds
    redis.set(key, json.dumps({**user, "delta": delta}), ex=ttl)
    return user

**What you get:** No locks, no waiting, no coordinated misses. Rebuilds spread stochastically across the end of the TTL window: different requests trigger them at different times, so concurrent rebuilds become unlikely, though not impossible. The algorithm comes from Vattani et al.'s "Optimal Probabilistic Cache Stampede Prevention" and is provably optimal under reasonable assumptions about rebuild cost.

**What you give up:** Some extra DB reads before they are strictly necessary. For cheap queries this is essentially free. For expensive ones, lower beta to be less aggressive. This is a good fit for CRUD endpoints where rebuild cost is moderate and freshness matters.
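It helps to put numbers on "the probability increases as the remaining TTL shrinks". The early-rebuild test is `-delta * beta * ln(U) >= remaining_ttl` with U uniform on (0, 1); since `-ln(U)` is exponentially distributed with rate 1, the chance that a single request triggers a rebuild is `exp(-remaining_ttl / (delta * beta))`. The helper below is an illustrative sketch; `early_refresh_probability` is a made-up name, and it assumes `delta` is the rebuild time in seconds.

```python
import math


def early_refresh_probability(remaining_ttl, delta, beta=1.0):
    """Chance that one request rebuilds early, given seconds left on the key.

    Closed form of P(-delta * beta * ln(U) >= remaining_ttl), U ~ Uniform(0, 1).
    """
    if remaining_ttl <= 0:
        return 1.0  # already expired: always rebuild
    return math.exp(-remaining_ttl / (delta * beta))
```

With a 0.5s rebuild and beta of 1.0, a request arriving 5s before expiry rebuilds with probability of about 0.00005, at 1s about 0.14, and at 0.5s about 0.37. Under heavy traffic that is enough for some request to refresh the key shortly before it expires, while most requests keep reading the cached value.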

Stale-While-Revalidate: Return Fast, Refresh in Background

Serve whatever is cached immediately. If it's past a logical expiry, kick off a background refresh and let the next request get fresh data.

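def get_user_stale(user_id):
    key = f"user:{user_id}"
    cached = redis.get(key)
    if cached:
        payload = json.loads(cached)
        expires_at = payload.pop("expires_at", 0)  # strip cache metadata
        if expires_at < time.time():
            # Logically expired: enqueue at most one refresh per 30s window,
            # otherwise every request on a stale key floods the queue
            if redis.set(f"refreshing:{key}", "1", nx=True, ex=30):
                queue.enqueue("refresh_user_cache", user_id)
        return payload  # serve immediately, fresh or stale

    # Cold miss: nothing to serve, so load synchronously
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    entry = {**user, "expires_at": time.time() + 300}
    redis.set(key, json.dumps(entry), ex=600)
    return user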

Two TTLs at play: the logical expires_at at 300s controls when to enqueue a refresh; the Redis key TTL at 600s keeps stale data available while the worker catches up. The gap between them is your staleness budget.

**What you get:** Best latency — users never wait on a rebuild. No lock contention, no spinning.

**What you give up:** Consistency. Users can see data that's minutes stale if the background queue is backed up or the worker is slow. Never use this for inventory counts, payment status, session data, or anything where the user just made a write and expects to see it reflected.
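The read path above enqueues a `refresh_user_cache` job but the worker side is not shown. One possible shape is sketched here, assuming the same entry layout (user fields plus `expires_at`). `DictCache` is a hypothetical stand-in for a Redis-like client, and `load_user`, `SOFT_TTL`, and `HARD_TTL` are illustrative names.

```python
import json
import time

SOFT_TTL = 300  # logical expiry: after this, a refresh is requested
HARD_TTL = 600  # key TTL: after this, the entry disappears entirely


class DictCache:
    """Hypothetical stand-in exposing get/set; it ignores the ex argument."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ex=None):
        self.data[key] = value


def refresh_user_cache(user_id, cache, load_user, now=None):
    """Worker for the refresh job. Returns True if it hit the database."""
    now = time.time() if now is None else now
    key = f"user:{user_id}"

    # Duplicate jobs are cheap: skip if an earlier job already refreshed it.
    raw = cache.get(key)
    if raw is not None and json.loads(raw).get("expires_at", 0) > now:
        return False

    user = load_user(user_id)
    entry = {**user, "expires_at": now + SOFT_TTL}
    cache.set(key, json.dumps(entry), ex=HARD_TTL)
    return True
```

Re-checking `expires_at` inside the worker means that even if many requests enqueue the same refresh, only the first job to run pays for a database query.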

What to Actually Monitor

Whichever pattern you pick, you need these signals:

**Cache hit rate per key prefix.** A sudden drop is either cold start or a bug. Hot keys should sit above 95%.
**Rebuild duration.** If your DB query takes 500ms and you're using a mutex, every waiting request pays 500ms plus sleep overhead. Know this number before choosing a pattern.
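**DB query rate vs. cache miss rate.** These should track together. A short, sharp burst of DB queries for the same key while the averaged miss rate barely moves is the signature of a stampede: the misses are concentrated on one key in one narrow window, so an aggregate miss rate hides them.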
**TTL distribution.** Are keys expiring in clusters? Add jitter:

import random

base_ttl = 300
redis.set(key, value, ex=base_ttl + random.randint(-30, 30))  # 300s ± 10%

Ten percent jitter costs nothing and breaks up coordinated expiry. Do this regardless of which stampede prevention pattern you choose — jitter is the cheapest fix in the stack.
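For the first signal in the list above, hit rate per key prefix, an in-process counter is often enough to get started. The sketch below is illustrative only: `CacheStats` and its methods are hypothetical names, and it assumes keys follow the `prefix:id` convention used in this post. In production these counts would normally be exported to a metrics system rather than kept in memory.

```python
from collections import defaultdict


class CacheStats:
    """Hypothetical per-prefix hit/miss counter for keys like 'user:42'."""

    def __init__(self):
        self.hits = defaultdict(int)
        self.misses = defaultdict(int)

    @staticmethod
    def prefix(key):
        return key.split(":", 1)[0]  # 'user:42' -> 'user'

    def record(self, key, hit):
        bucket = self.hits if hit else self.misses
        bucket[self.prefix(key)] += 1

    def hit_rate(self, prefix):
        total = self.hits[prefix] + self.misses[prefix]
        if total == 0:
            return None  # no lookups yet, so no meaningful rate
        return self.hits[prefix] / total
```

Call `stats.record(key, hit=cached is not None)` right after each cache read, then alert when `stats.hit_rate("user")` drops below the threshold you expect for that prefix.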

Pick One

For most apps: **stale-while-revalidate**. Lowest latency, simplest to reason about, good enough for profiles, content, and config data that changes infrequently.

For data that needs to stay fresh: **XFetch**. No waiting, proven semantics, works under load without coordination overhead.

For expensive one-off computation: **mutex with a stale fallback**. You genuinely want a single rebuild at a time — just don't block forever if the lock holder crashes.

The wrong answer is a longer TTL and wishful thinking. At some traffic level, your cache expiry will coincide with a traffic spike. The stampede doesn't care that you didn't plan for it.