From backend-engineer
Deliberate caching: strategy, invalidation, stampede defense.
Slash command: `/backend-engineer:caching-strategy`
A cache trades freshness for speed. Add one only when you can answer two questions up front: how does this data get read and written, and how does a stale entry get corrected. Get those wrong and a cache turns a slow system into a fast wrong system.
The read/write shape dictates the pattern. Pick one per data set.
| Pattern | How It Works | Fits | Cost |
|---|---|---|---|
| Cache-aside (lazy) | App reads cache; on miss, reads DB and populates. App writes go to DB and invalidate the key. | Read-heavy, miss-tolerant data. The default. | First read after a miss is slow. |
| Read-through | Cache itself loads from the DB on a miss, behind one interface. | Same as cache-aside when a library/proxy owns loading. | Couples you to the cache provider's loader. |
| Write-through | Writes hit the cache and the DB together, synchronously. | Consistency-critical data — cache never lags the DB. | Higher write latency; every write pays cache + DB. |
| Write-behind | Writes hit the cache, flush to the DB asynchronously. | Write-heavy, loss-tolerant data (counters, metrics). | A crash before flush loses data. Never for financial / transactional data. |
Default to cache-aside. Reach for write-through only when readers must never see a value older than the last write. Reach for write-behind only when losing the last few writes is acceptable.
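The cache-aside row above can be sketched in a few lines. This is a minimal illustration, assuming plain dictionaries as stand-ins for the cache and the database; the `CacheAside` class and its `db_reads` counter are hypothetical names introduced only for this example.

```python
from typing import Optional


class CacheAside:
    """Cache-aside over two dict-like stores (hypothetical stand-ins for a cache and a DB)."""

    def __init__(self, cache: dict, db: dict) -> None:
        self.cache = cache
        self.db = db
        self.db_reads = 0  # counts trips to the backing store, for illustration only

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:          # hit: serve from the cache
            return self.cache[key]
        self.db_reads += 1             # miss: fall through to the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value    # populate so the next read is a hit
        return value

    def write(self, key: str, value: str) -> None:
        self.db[key] = value           # the DB stays the source of truth
        self.cache.pop(key, None)      # invalidate the key instead of updating it


store = CacheAside(cache={}, db={"user:42": "Ada"})
store.read("user:42")                  # miss, loads from the DB
store.read("user:42")                  # hit, no DB read
store.write("user:42", "Grace")        # DB updated, cached copy dropped
print(store.read("user:42"), store.db_reads)  # prints: Grace 2
```

Invalidating on write, rather than writing the new value into the cache, keeps the write path simple and lets the next read repopulate from the DB.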
Invalidation is the hard part — design it before you cache a single key. Three mechanisms:
- Versioned keys: embed a version in the key (e.g. `user:42:v7`); bump the version to make all old keys unreachable. Good for bulk invalidation without scanning.

Tune the TTL to the data's tolerance for staleness. Too long and readers see stale values long after a change; too short and the hit rate collapses: the cache fills, expires, and refetches before it ever earns its keep, so it becomes pure overhead. When in doubt, start with a TTL and layer event-driven invalidation only on the keys that demand accuracy.
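As a rough illustration of TTL expiry and version bumping, the sketch below uses an in-memory cache with an injectable clock so the behaviour is deterministic. `TTLCache` and `user_key` are hypothetical names for this example, not part of any library; the key format follows the `user:42:v7` example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache where each entry expires a fixed time after it is set."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]     # expired: treat as a miss
            return None
        return value


def user_key(user_id: int, version: int) -> str:
    # Versioned key in the style of user:42:v7 (hypothetical helper).
    return f"user:{user_id}:v{version}"


now = [0.0]                            # fake clock, advanced by hand
cache = TTLCache(clock=lambda: now[0])

cache.set(user_key(42, 7), "Ada", ttl=300)
now[0] = 299.0
fresh = cache.get(user_key(42, 7))     # still inside the TTL
bumped = cache.get(user_key(42, 8))    # version bumped: the old entry is unreachable
now[0] = 300.0
expired = cache.get(user_key(42, 7))   # TTL elapsed: the entry is gone
print(fresh, bumped, expired)          # prints: Ada None None
```

Entries under an old version are unreachable but still occupy memory until their TTL or the eviction policy removes them, so versioned keys usually go together with a TTL.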
When a hot key expires, every concurrent reader misses at once and stampedes the DB with identical queries — a cache stampede (thundering herd) that can take the database down precisely when load is highest. Two defenses:
Apply protection to hot keys specifically; cold keys rarely stampede and the bookkeeping is not free.
When the cache hits its memory limit it evicts to make room. The policy must match the workload or it will throw out exactly the data you need under load.
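To make eviction concrete, here is a minimal sketch of least-recently-used (LRU) eviction, one common policy, built on the standard library's `OrderedDict`. The `LRUCache` class and its capacity of two entries are assumptions chosen for illustration; production caches such as Redis implement their own (often approximate) policies.

```python
from collections import OrderedDict
from typing import Any, List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry

    def keys(self) -> List[str]:
        return list(self._data)              # oldest first


lru = LRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")          # "a" is now more recent than "b"
lru.set("c", 3)       # over capacity: "b" is evicted
print(lru.keys())     # prints: ['a', 'c']
```

LRU suits workloads where recently read keys are likely to be read again; a workload dominated by large one-off scans can push hot keys out under this policy.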
Caches stack, each with different latency and consistency. Use the layer closest to the consumer that can still serve correct data.
| Layer | Holds | Notes |
|---|---|---|
| Browser | Per-user responses, assets | Cache-Control + ETag for revalidation. |
| Edge / CDN | Static and semi-dynamic responses | Shared across users — lowest latency, weakest freshness control. |
| Application | Sessions, computed results | Redis or in-process; where most deliberate caching lives. |
| Database query cache | Repeated query results | Closest to source; smallest staleness window. |
Never cache user-specific or sensitive data at a shared layer (edge/CDN). A shared cache can serve one user's response to another. Mark authenticated responses `Cache-Control: private` so only the user's own browser stores them.
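One way to apply this rule is to choose the header in a single place. The helper below is a hypothetical sketch: `cache_headers` and its 60-second default are assumptions, and a real service would set these headers through its web framework.

```python
from typing import Dict


def cache_headers(authenticated: bool, max_age: int = 60) -> Dict[str, str]:
    """Pick a Cache-Control value based on whether the response is user-specific."""
    if authenticated:
        # private: only the end user's browser may store it; shared caches must not.
        return {"Cache-Control": f"private, max-age={max_age}"}
    # public: shared caches such as a CDN may store and reuse the response.
    return {"Cache-Control": f"public, max-age={max_age}"}


print(cache_headers(authenticated=True))
print(cache_headers(authenticated=False, max_age=3600))
```

Routing every response through one function like this makes it harder for an authenticated endpoint to ship without the `private` directive.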
```
function getUser(id):
    key = "user:" + id
    value = cache.get(key)
    if value is not null:
        return value

    lockKey = "lock:" + key
    if cache.acquireLock(lockKey, ttl=5s):
        try:
            value = db.query("SELECT * FROM users WHERE id = ?", id)
            cache.set(key, value, ttl=300s)
            return value
        finally:
            cache.releaseLock(lockKey)
    else:
        wait(50ms)
        return getUser(id)
```
The first caller on a miss holds the lock and refreshes; concurrent callers wait briefly and re-read instead of all hitting the DB. One query per expiry, not thousands.
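The "one query per expiry" claim can be checked with a small in-process experiment. This sketch is not a translation of the pseudocode: it assumes a single process and uses a blocking per-key `threading.Lock` with a second cache check, rather than a try-lock with a retry loop. All names (`get_user`, `slow_db_query`, `db_queries`) are hypothetical; a multi-process deployment would need a lock held in the shared cache instead.

```python
import threading
import time
from typing import Dict

cache: Dict[str, str] = {}
locks: Dict[str, threading.Lock] = {}
locks_guard = threading.Lock()     # protects the locks dict itself
db_queries = 0


def slow_db_query(user_id: int) -> str:
    global db_queries
    db_queries += 1                # only ever called while holding the key's lock
    time.sleep(0.05)               # simulate a slow query
    return f"user-{user_id}"


def get_user(user_id: int) -> str:
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value               # fast path: cache hit
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())   # one lock per key
    with lock:
        value = cache.get(key)     # re-check: another thread may have refilled it
        if value is None:
            value = slow_db_query(user_id)
            cache[key] = value
        return value


threads = [threading.Thread(target=get_user, args=(42,)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)                  # prints: 1
```

Twenty concurrent readers miss on the same key, but only the first one to take the lock queries the database; the rest find the value on the re-check.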
For the deeper judgment of measuring the bottleneck before scaling — whether a cache is even the right fix — defer to scalable-architecture.