Skill

caching-and-performance

Implements HTTP caching strategies including Cache-Control directives, ETags, conditional requests, CDN caching for APIs, response compression (gzip/Brotli), HTTP/2, and circuit breakers.

REST API

performance

api-development

npx claudepluginhub oborchers/fractional-cto --plugin api-design-principles

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/api-design-principles:caching-and-performance

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Caching is the highest-leverage performance optimization available to any API. A correctly cached response eliminates network latency entirely, reduces server load, and improves perceived performance by orders of magnitude. Every API endpoint must ship with an explicit caching strategy -- omitting `Cache-Control` headers forces browsers into heuristic caching, which leads to unpredictable behav...

SKILL.md

124 lines · ~3k tokens

Similar Skills

HTTP Caching

Configures Cache-Control directives and ETags for HTTP endpoints to reduce redundant network round-trips and origin load.

1 file

harness-claude

managing-api-cache

2.2k

Implements API response caching with Redis, Memcached, cache-aside patterns, TTL, tag-based invalidation, HTTP headers, and stale-while-revalidate for performance optimization.

3 files6 tools

api-cache-manager

caching-strategies

Provides caching patterns like cache-aside, write-through, stampede prevention, CDN headers, multi-level L1/L2/L3 caches, and invalidation strategies for high-traffic systems and CDN design.

5 files3 tools

billy-milligan

Stats

LanguagePython

Parent stars12

Parent forks3

MaintenanceGood

Last CommitMar 14, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

The Fastest Request Is the One You Never Make

Caching is the highest-leverage performance optimization available to any API. A correctly cached response eliminates network latency entirely, reduces server load, and improves perceived performance by orders of magnitude. Every API endpoint must ship with an explicit caching strategy -- omitting Cache-Control headers forces browsers into heuristic caching, which leads to unpredictable behavior. Start with caching, then layer compression, protocol upgrades, and resilience on top.

Cache-Control Directives

Always set Cache-Control explicitly. Never rely on heuristic caching or the deprecated Expires header.

Directive	Scope	When to Use
`public`	Shared + private caches	Response is identical for all users (product catalogs, reference data, schemas)
`private`	Browser only	Response is user-specific (account details, dashboards). Prevents CDN/proxy caching
`no-cache`	All caches	Cache may store but must revalidate with origin before every use. Use when freshness matters but bandwidth savings from 304s are valuable
`no-store`	All caches	Never cache. Use for financial data, PII, balances, transactions
`max-age=N`	All caches	Response is fresh for N seconds. Primary TTL mechanism for all cacheable responses
`s-maxage=N`	Shared caches only	Overrides `max-age` for CDNs and proxies. Use to give CDNs longer TTLs than browsers
`must-revalidate`	All caches	Once stale, the cache must not serve the response without revalidating. Prevents serving stale content after TTL expires
`stale-while-revalidate=N`	All caches	Serve stale for N seconds while revalidating in the background. Best single directive for perceived performance on read-heavy endpoints
`immutable`	All caches	Response body will never change for this URL. Skip conditional revalidation entirely. Use for versioned URLs (schemas, configs with version in path)

Never combine contradictory directives. public, no-store is nonsensical. no-cache and no-store are different -- no-cache still stores and validates; no-store prohibits storage entirely.

Caching Strategy by Data Type

Data Type	Cache-Control	TTL	Rationale
Static reference data (country codes, currencies)	`public, max-age=300, s-maxage=86400, stale-while-revalidate=86400`	CDN: 24h, browser: 5m	Changes rarely, identical for all users
Product catalogs, pricing tiers	`public, max-age=300, s-maxage=3600, stale-while-revalidate=86400`	CDN: 1h, browser: 5m	Changes infrequently, high read volume
User-specific data (accounts, dashboards)	`private, max-age=60, stale-while-revalidate=300`	Browser: 1m	Must never be CDN-cached; use short browser TTLs
List endpoints, search results	`public, max-age=60, s-maxage=300`	CDN: 5m, browser: 1m	Can tolerate short staleness
Financial data, PII, balances	`no-store`	None	Staleness is unacceptable; never cache
Versioned schemas, immutable configs	`public, max-age=31536000, immutable`	1 year	URL contains version; content never changes at that URL
Webhook deliveries, mutations	No caching headers	None	POST/PUT/DELETE must always reach origin

ETags and Conditional Requests

ETags enable conditional requests that save bandwidth without sacrificing freshness. The server generates a unique tag for each response version. On subsequent requests, the client sends If-None-Match with the stored ETag. If unchanged, the server returns 304 Not Modified with no body.

Use strong ETags by default. Generate them from a content hash (SHA-256 of the canonical JSON) or a database version column. Weak ETags (W/"...") are acceptable when minor representation differences (field order, whitespace) should not invalidate the cache.

Pair ETags with no-cache for maximum freshness with bandwidth savings. The combination Cache-Control: no-cache + ETag forces revalidation on every request but returns an empty 304 when nothing changed. GitHub uses this pattern for its entire API -- conditional requests that return 304 do not count against the rate limit.

Support If-Modified-Since as a fallback. Send both ETag and Last-Modified headers. Clients that support ETags use If-None-Match; older clients fall back to If-Modified-Since. When both are present, If-None-Match takes precedence per RFC 9110. Prefer ETags for APIs because Last-Modified has only 1-second granularity and is susceptible to clock skew.

CDN Strategies for APIs

Use CDN edge caching for public, high-traffic, read-heavy endpoints. Never CDN-cache user-specific, financial, or real-time data.

Set the Vary header correctly. Without it, a CDN might serve a gzip-compressed response to a client that cannot decompress it. Common API configurations: Vary: Accept-Encoding for compression negotiation, Vary: Accept, Accept-Encoding for content negotiation, Vary: Accept-Encoding, X-API-Version for versioned APIs.

Never use Vary: Authorization with public caching. This creates a separate cache entry per auth token, driving hit rates to near zero. Use Cache-Control: private for user-specific responses instead.

Build deliberate cache keys. The default CDN cache key is the URL, which is often insufficient for APIs. Include relevant headers (Accept-Encoding, Accept-Language, X-API-Version) in the cache key. Never include the full Authorization header -- hash it or use a tier identifier if the response varies by subscription level.

Use surrogate keys (cache tags) for targeted invalidation. Tag cached responses with resource identifiers (e.g., product-abc123, category-electronics). When a product changes, purge by tag to invalidate both the detail endpoint and every collection that includes it. Fastly, Cloudflare (Cache Tags), and Varnish (xkey) support this pattern.

Prefer stale-while-revalidate over aggressive purging. For most read-heavy APIs, serving slightly stale data for a few seconds while revalidating in the background is preferable to complex real-time purge pipelines.

Compression

Compress every API response larger than 1 KB. Do not compress responses smaller than 1 KB (overhead exceeds savings) or already-compressed content (images, archives).

Support gzip as baseline, Brotli for modern clients. Brotli produces 15-25% smaller payloads than gzip for JSON. Zstandard (zstd) is emerging with even better speed-to-ratio tradeoffs but browser support is limited (Chrome 123+, Firefox 126+).

Negotiate via Accept-Encoding. The client sends Accept-Encoding: zstd, br, gzip and the server responds with the best mutually-supported encoding. Priority order: zstd > br > gzip > identity. Always include Vary: Accept-Encoding so caches store separate copies per encoding.

Compress at the gateway or CDN, not the application. Application servers should send uncompressed responses to the reverse proxy or CDN, which handles compression centrally. This simplifies application code, avoids duplicate configuration across services, and offloads CPU from application servers.

HTTP/2 and HTTP/3

Enable HTTP/2 on every API. HTTP/2 multiplexes multiple requests over a single TCP connection, eliminating the 6-connection-per-origin browser limit from HTTP/1.1. A single TLS handshake serves all concurrent requests. HPACK header compression reduces overhead from repetitive headers like Authorization and Content-Type.

Skip server push for APIs. Server push was designed for preloading related web resources. For APIs, clients know what they need. Server push adds complexity without meaningful benefit and is being deprecated in most implementations.

Adopt HTTP/3 (QUIC) for latency-sensitive APIs. HTTP/3 replaces TCP with QUIC (UDP-based), reducing connection setup to one round trip (zero for resumed connections). It eliminates TCP head-of-line blocking -- one slow stream no longer blocks others. Connection migration allows mobile clients to change networks without reconnecting. All major CDNs support HTTP/3. Advertise support via the Alt-Svc header: Alt-Svc: h3=":443"; ma=86400.

Circuit Breakers

When your API depends on external services, wrap those dependencies in circuit breakers to prevent cascading failures.

Three states:

State	Behavior	Transition
Closed	Normal operation. Requests pass through. Failures are counted within a time window.	Opens when failure count exceeds threshold (e.g., 5 failures in 60 seconds)
Open	All requests fail immediately with a predefined fallback. No calls reach the dependency.	Transitions to Half-Open after a recovery timeout (e.g., 30 seconds)
Half-Open	A limited number of test requests are allowed through.	Closes if test requests succeed. Reopens if they fail.

Set thresholds based on the dependency. Payment services need low thresholds (3-5 failures) and short recovery timeouts. Notification services can tolerate higher thresholds. Always provide a fallback -- queue for retry, return a cached response, or return a degraded result with a clear status.

Combine with retries and exponential backoff. Use full jitter (recommended by AWS): delay = random(0, min(max_delay, base_delay * 2^attempt)). Jitter prevents thundering herd problems when multiple clients retry simultaneously against a recovering service. Honor Retry-After headers when present.

Connection Pooling and Keep-Alive

Reuse TCP connections. Creating a new connection for every request wastes time on TCP handshakes and TLS negotiation.

Use connection pooling in API clients. Every HTTP client library supports connection pools -- requests.Session in Python, https.Agent with keepAlive: true in Node.js, http.Transport with MaxIdleConnsPerHost in Go. A single pooled session should be shared across all requests to the same API.

Configure keep-alive on servers. Set keepalive_timeout to 60-90 seconds and keepalive_requests to 1000+ to allow connection reuse. Maintain a pool of keep-alive connections to upstream backends (e.g., keepalive 32 in Nginx upstream blocks).

Always drain and close response bodies. In languages like Go, failing to read and close the response body prevents the connection from returning to the pool, causing connection leaks that eventually exhaust the pool.

Review Checklist

When reviewing an API for caching and performance:

caching-and-performance

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

caching-and-performance

Popularity

Invocation

Context Preview

SKILL.md

The Fastest Request Is the One You Never Make

Cache-Control Directives

Caching Strategy by Data Type

ETags and Conditional Requests

CDN Strategies for APIs

Compression

HTTP/2 and HTTP/3

Circuit Breakers

Connection Pooling and Keep-Alive

Review Checklist

Similar Skills

Help us improve

The Fastest Request Is the One You Never Make

Cache-Control Directives

Caching Strategy by Data Type

ETags and Conditional Requests

CDN Strategies for APIs

Compression

HTTP/2 and HTTP/3

Circuit Breakers

Connection Pooling and Keep-Alive

Review Checklist