This skill should be used when the user is implementing HTTP caching, configuring Cache-Control headers, using ETags and conditional requests, setting up CDN caching for APIs, implementing response compression, choosing between gzip and Brotli, configuring HTTP/2, or implementing circuit breakers. Covers Cache-Control directives, ETag validation, CDN strategies, compression, and resilience patterns.
npx claudepluginhub oborchers/fractional-cto --plugin api-design-principles
This skill uses the workspace's default tool permissions.
Caching is the highest-leverage performance optimization available to most APIs. A response served from the client's own cache avoids the network round trip entirely, and a CDN hit spares the origin the request. Every API endpoint must ship with an explicit caching strategy -- omitting Cache-Control headers leaves browsers and intermediaries to apply heuristic freshness, which leads to unpredictable behavior. Start with caching, then layer compression, protocol upgrades, and resilience on top.
Always set Cache-Control explicitly. Never rely on heuristic caching or on the legacy Expires header, which caches ignore whenever max-age is present.
| Directive | Scope | When to Use |
|---|---|---|
| public | Shared + private caches | Response is identical for all users (product catalogs, reference data, schemas) |
| private | Browser only | Response is user-specific (account details, dashboards). Prevents CDN/proxy caching |
| no-cache | All caches | Cache may store but must revalidate with origin before every use. Use when freshness matters but bandwidth savings from 304s are valuable |
| no-store | All caches | Never cache. Use for financial data, PII, balances, transactions |
| max-age=N | All caches | Response is fresh for N seconds. Primary TTL mechanism for all cacheable responses |
| s-maxage=N | Shared caches only | Overrides max-age for CDNs and proxies. Use to give CDNs longer TTLs than browsers |
| must-revalidate | All caches | Once stale, the cache must not serve the response without revalidating. Prevents serving stale content after TTL expires |
| stale-while-revalidate=N | All caches | Serve stale for N seconds while revalidating in the background. Best single directive for perceived performance on read-heavy endpoints |
| immutable | All caches | Response body will never change for this URL. Skip conditional revalidation while the response is fresh. Use for versioned URLs (schemas, configs with version in path) |
Never combine contradictory directives. public, no-store is nonsensical. no-cache and no-store are different -- no-cache still stores and validates; no-store prohibits storage entirely.
| Data Type | Cache-Control | TTL | Rationale |
|---|---|---|---|
| Static reference data (country codes, currencies) | public, max-age=300, s-maxage=86400, stale-while-revalidate=86400 | CDN: 24h, browser: 5m | Changes rarely, identical for all users |
| Product catalogs, pricing tiers | public, max-age=300, s-maxage=3600, stale-while-revalidate=86400 | CDN: 1h, browser: 5m | Changes infrequently, high read volume |
| User-specific data (accounts, dashboards) | private, max-age=60, stale-while-revalidate=300 | Browser: 1m | Must never be CDN-cached; use short browser TTLs |
| List endpoints, search results | public, max-age=60, s-maxage=300 | CDN: 5m, browser: 1m | Can tolerate short staleness |
| Financial data, PII, balances | no-store | None | Staleness is unacceptable; never cache |
| Versioned schemas, immutable configs | public, max-age=31536000, immutable | 1 year | URL contains version; content never changes at that URL |
| Webhook deliveries, mutations | No caching headers | None | POST/PUT/DELETE must always reach origin |
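One way to keep the table above enforceable is to centralize it in code so that every endpoint has to name a policy. The following is a minimal sketch, not a prescribed implementation: the policy names (`static_reference`, `catalog`, and so on) and the `cache_headers` helper are hypothetical, while the header values are copied from the table.

```python
# Hypothetical policy names; the header values are copied from the table above.
CACHE_POLICIES = {
    "static_reference": "public, max-age=300, s-maxage=86400, stale-while-revalidate=86400",
    "catalog": "public, max-age=300, s-maxage=3600, stale-while-revalidate=86400",
    "user_specific": "private, max-age=60, stale-while-revalidate=300",
    "list_or_search": "public, max-age=60, s-maxage=300",
    "sensitive": "no-store",
    "versioned_immutable": "public, max-age=31536000, immutable",
}

SAFE_METHODS = {"GET", "HEAD"}


def cache_headers(method: str, policy: str) -> dict:
    """Return the caching headers for a response, or {} for mutations."""
    if method.upper() not in SAFE_METHODS:
        # Table row "Webhook deliveries, mutations": no caching headers.
        return {}
    # An unknown policy raises KeyError, so an endpoint cannot silently
    # ship without an explicit caching strategy.
    return {"Cache-Control": CACHE_POLICIES[policy]}
```

A handler would call `cache_headers(request_method, "catalog")` and merge the result into its response headers; failing loudly on an unknown policy is a deliberate choice here, since a missing header is exactly the heuristic-caching case the section warns about.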
ETags enable conditional requests that save bandwidth without sacrificing freshness. The server generates a unique tag for each response version. On subsequent requests, the client sends If-None-Match with the stored ETag. If unchanged, the server returns 304 Not Modified with no body.
Use strong ETags by default. Generate them from a content hash (SHA-256 of the canonical JSON) or a database version column. Weak ETags (W/"...") are acceptable when minor representation differences (field order, whitespace) should not invalidate the cache.
Pair ETags with no-cache for maximum freshness with bandwidth savings. The combination Cache-Control: no-cache + ETag forces revalidation on every request but returns an empty 304 when nothing changed. GitHub's REST API supports ETag-based conditional requests in this way -- authenticated conditional requests that return 304 do not count against the primary rate limit.
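The ETag flow described above can be sketched in a framework-neutral way. This is an illustration under stated assumptions: `make_etag`, `if_none_match_hits`, and `respond` are hypothetical helper names, the response is modeled as a `(status, headers, body)` tuple, and "canonical JSON" is approximated by sorting keys and removing insignificant whitespace.

```python
import hashlib
import json
from typing import Optional


def make_etag(payload: dict) -> str:
    """Strong ETag: SHA-256 over a canonical JSON encoding of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest}"'  # ETag values are quoted strings


def _opaque(tag: str) -> str:
    """Drop a weak-validator prefix so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header_value: Optional[str], etag: str) -> bool:
    """True if the If-None-Match header matches the current ETag."""
    if not header_value:
        return False
    # Naive split: assumes the tags themselves contain no commas,
    # which holds for the hex digests produced by make_etag.
    candidates = [v.strip() for v in header_value.split(",")]
    if "*" in candidates:
        return True
    return _opaque(etag) in {_opaque(v) for v in candidates}


def respond(payload: dict, request_headers: dict) -> tuple:
    """Return (status, headers, body), answering 304 when the ETag matches."""
    etag = make_etag(payload)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match_hits(request_headers.get("If-None-Match"), etag):
        return 304, headers, b""  # no body on 304 Not Modified
    return 200, headers, json.dumps(payload).encode("utf-8")
```

Because the hash is taken over the sorted-key encoding, two payloads that differ only in field order produce the same tag, so a strong ETag stays stable across serializer changes.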
Support If-Modified-Since as a fallback. Send both ETag and Last-Modified headers. Clients that support ETags use If-None-Match; older clients fall back to If-Modified-Since. When both are present, If-None-Match takes precedence per RFC 9110. Prefer ETags for APIs because Last-Modified has only 1-second granularity and is susceptible to clock skew.
Use CDN edge caching for public, high-traffic, read-heavy endpoints. Never CDN-cache user-specific, financial, or real-time data.
Set the Vary header correctly. Without it, a CDN might serve a gzip-compressed response to a client that cannot decompress it. Common API configurations: Vary: Accept-Encoding for compression negotiation, Vary: Accept, Accept-Encoding for content negotiation, Vary: Accept-Encoding, X-API-Version for versioned APIs.
Never use Vary: Authorization with public caching. This creates a separate cache entry per auth token, driving hit rates to near zero. Use Cache-Control: private for user-specific responses instead.
Build deliberate cache keys. The default CDN cache key is the URL, which is often insufficient for APIs. Include relevant headers (Accept-Encoding, Accept-Language, X-API-Version) in the cache key. Never include the full Authorization header -- hash it or use a tier identifier if the response varies by subscription level.
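As a rough illustration of a deliberate cache key, the sketch below combines the URL with the headers named above and a coarse subscription tier. Real CDNs configure this in their own rule languages, so treat the `cache_key` function, the `KEY_HEADERS` tuple, and the `tier` parameter as hypothetical names for the idea rather than any vendor's API.

```python
import hashlib

# Assumption: these are the only request headers the responses vary on.
KEY_HEADERS = ("accept-encoding", "accept-language", "x-api-version")


def cache_key(url: str, headers: dict, tier: str = "") -> str:
    """Build a cache key from the URL, selected headers, and a tier label."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    parts = [url]
    for name in KEY_HEADERS:
        parts.append(f"{name}={lowered.get(name, '')}")
    # The raw Authorization header is deliberately never part of the key;
    # only a coarse subscription tier is.
    parts.append(f"tier={tier}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Two requests from different users on the same tier map to one cache entry, while a change in X-API-Version produces a separate one. In practice Accept-Encoding would usually be normalized first (for example to a single chosen encoding) so that equivalent header spellings do not fragment the cache.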
Use surrogate keys (cache tags) for targeted invalidation. Tag cached responses with resource identifiers (e.g., product-abc123, category-electronics). When a product changes, purge by tag to invalidate both the detail endpoint and every collection that includes it. Fastly, Cloudflare (Cache Tags), and Varnish (xkey) support this pattern.
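The mechanics of tag-based purging can be shown with a toy in-memory cache. This is only a sketch of the idea: `TaggedCache` and its methods are hypothetical, and real CDNs expose the same concept through response headers and purge APIs rather than through a class like this.

```python
from collections import defaultdict
from typing import Optional


class TaggedCache:
    """Toy in-memory cache with tag-based purging (illustration only)."""

    def __init__(self) -> None:
        self._entries = {}                    # cache key -> body
        self._keys_by_tag = defaultdict(set)  # tag -> cache keys

    def put(self, key: str, body: bytes, tags: list) -> None:
        self._entries[key] = body
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[bytes]:
        return self._entries.get(key)

    def purge_tag(self, tag: str) -> int:
        """Remove every entry carrying the tag; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if self._entries.pop(key, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.put("/products/abc123", b"detail", ["product-abc123"])
cache.put("/categories/electronics", b"list",
          ["category-electronics", "product-abc123"])
cache.put("/products/xyz789", b"other", ["product-xyz789"])

# Product abc123 changed: one purge clears its detail page and the
# collection that includes it, and leaves unrelated entries alone.
purged = cache.purge_tag("product-abc123")
```

The collection response is tagged with every product it contains, which is what lets a single purge reach it without the origin having to enumerate affected URLs.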
Prefer stale-while-revalidate over aggressive purging. For most read-heavy APIs, serving slightly stale data for a few seconds while revalidating in the background is preferable to complex real-time purge pipelines.
Compress every API response larger than 1 KB. Do not compress responses smaller than 1 KB (overhead exceeds savings) or already-compressed content (images, archives).
Support gzip as baseline, Brotli for modern clients. Brotli typically produces 15-25% smaller payloads than gzip for JSON. Zstandard (zstd) is emerging with even better speed-to-ratio tradeoffs, but browser support is still limited (Chrome 123+, Firefox 126+).
Negotiate via Accept-Encoding. The client sends Accept-Encoding: zstd, br, gzip and the server responds with the best mutually-supported encoding. Priority order: zstd > br > gzip > identity. Always include Vary: Accept-Encoding so caches store separate copies per encoding.
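The negotiation step can be sketched as a small function that a gateway or proxy might run. It is a simplified reading of Accept-Encoding: `choose_encoding` and `SERVER_PREFERENCE` are hypothetical names, q-values are honored, ties go to the server's preferred order given above, and the rarely used `identity;q=0` case is not handled.

```python
# Server-side preference order from the text: zstd > br > gzip > identity.
SERVER_PREFERENCE = ("zstd", "br", "gzip")


def parse_accept_encoding(header: str) -> dict:
    """Parse 'br;q=0.8, gzip' into {'br': 0.8, 'gzip': 1.0}."""
    weights = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        params = params.strip()
        q = 1.0  # a coding listed without a q-value has weight 1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        weights[name.strip().lower()] = q
    return weights


def choose_encoding(header: str) -> str:
    """Pick the best mutually supported encoding, or 'identity'."""
    weights = parse_accept_encoding(header)
    wildcard = weights.get("*", 0.0)
    best, best_q = "identity", 0.0
    for name in SERVER_PREFERENCE:
        q = weights.get(name, wildcard)
        # Strict '>' keeps the earlier (more preferred) encoding on ties.
        if q > best_q:
            best, best_q = name, q
    return best
```

Whatever encoding is chosen, the response still needs Vary: Accept-Encoding so that shared caches keep the variants apart.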
Compress at the gateway or CDN, not the application. Application servers should send uncompressed responses to the reverse proxy or CDN, which handles compression centrally. This simplifies application code, avoids duplicate configuration across services, and offloads CPU from application servers.
Enable HTTP/2 on every API. HTTP/2 multiplexes multiple requests over a single TCP connection, removing the need for the roughly six parallel connections per origin that browsers open under HTTP/1.1. A single TLS handshake serves all concurrent requests. HPACK header compression reduces overhead from repetitive headers like Authorization and Content-Type.
Skip server push for APIs. Server push was designed for preloading related web resources. For APIs, clients know what they need. Server push adds complexity without meaningful benefit, and client support is disappearing: Chrome disabled it by default in version 106.
Adopt HTTP/3 (QUIC) for latency-sensitive APIs. HTTP/3 replaces TCP with QUIC (UDP-based), reducing connection setup to one round trip (zero for resumed connections). It eliminates TCP head-of-line blocking -- one slow stream no longer blocks others. Connection migration allows mobile clients to change networks without reconnecting. All major CDNs support HTTP/3. Advertise support via the Alt-Svc header: Alt-Svc: h3=":443"; ma=86400.
When your API depends on external services, wrap those dependencies in circuit breakers to prevent cascading failures.
Three states:
| State | Behavior | Transition |
|---|---|---|
| Closed | Normal operation. Requests pass through. Failures are counted within a time window. | Opens when failure count exceeds threshold (e.g., 5 failures in 60 seconds) |
| Open | All requests fail immediately with a predefined fallback. No calls reach the dependency. | Transitions to Half-Open after a recovery timeout (e.g., 30 seconds) |
| Half-Open | A limited number of test requests are allowed through. | Closes if test requests succeed. Reopens if they fail. |
Set thresholds based on the dependency. Payment services need low thresholds (3-5 failures) and short recovery timeouts. Notification services can tolerate higher thresholds. Always provide a fallback -- queue for retry, return a cached response, or return a degraded result with a clear status.
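The three states and their transitions can be captured in a compact class. The sketch below is a single-threaded illustration under stated assumptions: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the circuit opens once the failure count reaches the threshold within the window, and the half-open state treats the next call as the test request rather than limiting concurrency.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window_seconds=60.0,
                 recovery_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._window = window_seconds
        self._recovery_timeout = recovery_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = deque()     # timestamps of recent failures
        self._opened_at = None
        self._state = "closed"

    @property
    def state(self) -> str:
        # Open -> Half-Open once the recovery timeout has elapsed.
        if (self._state == "open"
                and self._clock() - self._opened_at >= self._recovery_timeout):
            self._state = "half_open"
        return self._state

    def call(self, func, *args, **kwargs):
        state = self.state
        if state == "open":
            raise CircuitOpenError("dependency unavailable; use the fallback")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure(state)
            raise
        if state == "half_open":
            self._close()            # test request succeeded
        return result

    def _record_failure(self, state: str) -> None:
        now = self._clock()
        if state == "half_open":
            self._open(now)          # test request failed: reopen
            return
        self._failures.append(now)
        # Drop failures that fell out of the counting window.
        while self._failures and now - self._failures[0] > self._window:
            self._failures.popleft()
        if len(self._failures) >= self._threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self._state, self._opened_at = "open", now
        self._failures.clear()

    def _close(self) -> None:
        self._state = "closed"
        self._failures.clear()
```

A caller would wrap the dependency as `breaker.call(charge_card, order)` and catch `CircuitOpenError` to run its fallback, such as queuing for retry or returning a cached result (`charge_card` and `order` are placeholders). A production version would also need locking and a cap on concurrent half-open probes.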
Combine with retries and exponential backoff. Use full jitter (recommended by AWS): delay = random(0, min(max_delay, base_delay * 2^attempt)). Jitter prevents thundering herd problems when multiple clients retry simultaneously against a recovering service. Honor Retry-After headers when present.
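The full-jitter formula above translates almost directly into code. In this sketch the function names and default values (`base_delay=0.5`, `max_delay=30.0`) are illustrative assumptions, and Retry-After is handled only in its delay-in-seconds form, not the HTTP-date form.

```python
import random


def full_jitter_delay(attempt: int, base_delay: float = 0.5,
                      max_delay: float = 30.0, rng=random) -> float:
    """delay = random(0, min(max_delay, base_delay * 2^attempt))."""
    cap = min(max_delay, base_delay * (2 ** attempt))
    return rng.uniform(0.0, cap)


def next_delay(attempt: int, retry_after=None, base_delay: float = 0.5,
               max_delay: float = 30.0) -> float:
    """Prefer the server's Retry-After (in seconds) over computed backoff."""
    if retry_after is not None:
        return max(0.0, float(retry_after))
    return full_jitter_delay(attempt, base_delay, max_delay)
```

With these defaults the upper bound doubles from 0.5 s at attempt 0 until it hits the 30 s cap at attempt 6, while the actual sleep is drawn uniformly below that bound, which is what spreads simultaneous retries apart.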
Reuse TCP connections. Creating a new connection for every request wastes time on TCP handshakes and TLS negotiation.
Use connection pooling in API clients. Most mainstream HTTP client libraries support connection pools -- requests.Session in Python, https.Agent with keepAlive: true in Node.js, http.Transport with MaxIdleConnsPerHost in Go. A single pooled session should be shared across all requests to the same API.
Configure keep-alive on servers. In Nginx, set keepalive_timeout to 60-90 seconds and keepalive_requests to 1000 or more to allow connection reuse. Maintain a pool of keep-alive connections to upstream backends (e.g., keepalive 32 in an upstream block).
Always drain and close response bodies. In languages like Go, failing to read and close the response body prevents the connection from returning to the pool, causing connection leaks that eventually exhaust the pool.
When reviewing an API for caching and performance:
- Every endpoint sets an explicit Cache-Control header -- no heuristic caching
- Public endpoints use s-maxage and stale-while-revalidate for CDN caching
- User-specific endpoints use Cache-Control: private; sensitive data uses no-store
- ETags are returned and If-None-Match is handled with 304
- Vary header is set correctly for endpoints with content negotiation or compression
- Alt-Svc header advertises HTTP/3 if supported
- Retries honor Retry-After headers