From harness-claude
Guides API retry strategies: classifies transient vs permanent errors, emits Retry-After headers, implements exponential backoff with jitter, ensures idempotency. Use for rate-limiting, 429/503 responses, client SDKs.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
This skill uses the workspace's default tool permissions.
> RETRY GUIDANCE SIGNALS CLIENTS WHEN AND HOW TO RETRY FAILED REQUESTS — CLASSIFYING ERRORS AS TRANSIENT OR PERMANENT, EMITTING Retry-After HEADERS, AND REQUIRING IDEMPOTENCY FOR SAFE RETRIES PREVENTS BOTH THUNDERING-HERD AMPLIFICATION AND UNNECESSARY REQUEST ABANDONMENT UNDER TEMPORARY LOAD.
- 503 Service Unavailable response that lacks a Retry-After header
- 429 Too Many Requests or 503 Service Unavailable for capacity-related refusals

Transient vs. permanent errors: A transient error is a temporary condition that may resolve without client-side changes, such as a network timeout, service overload, or brief database unavailability. A permanent error will not resolve on retry without action: invalid credentials, a missing resource, a malformed request. Retrying a permanent error wastes resources and delays diagnosis. Classifying errors correctly in the response allows clients to decide immediately whether to retry. As a rule: 4xx errors (except 429) are permanent, because they require the client to change the request. 5xx errors (except 501) are potentially transient, because the server, not the request, is the problem.
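The rule of thumb above can be sketched as a small classifier. This is a minimal illustration, not part of the skill itself: the names `RetryClass` and `classify_status` are hypothetical, and a real API may need per-endpoint exceptions.

```python
from enum import Enum


class RetryClass(Enum):
    TRANSIENT = "transient"        # worth retrying after a delay
    PERMANENT = "permanent"        # retrying the same request will not help
    NOT_AN_ERROR = "not_an_error"  # 1xx, 2xx, 3xx


def classify_status(status_code: int) -> RetryClass:
    """Apply the rule of thumb: 429 and 5xx (except 501) are transient."""
    if status_code == 429:
        return RetryClass.TRANSIENT
    if 400 <= status_code < 500:
        return RetryClass.PERMANENT  # the client must change the request
    if status_code == 501:
        return RetryClass.PERMANENT  # Not Implemented will not resolve on retry
    if 500 <= status_code < 600:
        return RetryClass.TRANSIENT  # the server, not the request, is the problem
    return RetryClass.NOT_AN_ERROR
```

A client would typically call `classify_status(response.status_code)` once and branch on the result, rather than scattering status checks through the retry loop.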
Retry-After header: The HTTP standard header that tells clients when they may safely retry a request. It accepts two formats:
- Retry-After: 60 (retry 60 seconds from now)
- Retry-After: Sat, 11 Apr 2026 09:00:00 GMT (retry at or after this absolute timestamp)
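Because the header has two formats, a client needs to handle both before using the value as a delay. The sketch below is one way to do that with the Python standard library; the function name `parse_retry_after` and the choice to treat an unparseable value as "no guidance" (0 seconds) are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> float:
    """Return the delay in seconds requested by a Retry-After value, or 0.0."""
    if not value:
        return 0.0
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "60"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return 0.0  # unparseable: treat as "no guidance" (assumption)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A timestamp in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

The optional `now` parameter exists only so the function can be tested with a fixed clock.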
Treat Retry-After as mandatory for 429 Too Many Requests and strongly recommended for 503 Service Unavailable. The HTTP specifications make the header optional on both, but servers that omit it on 429 responses leave clients to implement their own backoff, producing unpredictable retry storms.

Exponential backoff: A retry strategy where the wait time doubles after each failed attempt: 1s, 2s, 4s, 8s, 16s, up to a configured maximum. Exponential backoff reduces the probability that all retrying clients hit the server at the same moment. The base interval and multiplier should be configurable. Backoff should respect the Retry-After header as a floor: never retry before the server-specified delay, regardless of the computed backoff value.
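As a worked example of the schedule described above, the following sketch computes the un-jittered delay for each attempt with a configurable base, multiplier, and cap, and applies Retry-After as a floor. The function name `backoff_delay` and its defaults are illustrative assumptions.

```python
def backoff_delay(attempt: int, base: float = 1.0, multiplier: float = 2.0,
                  max_delay: float = 60.0, retry_after: float = 0.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based), without jitter."""
    computed = min(base * (multiplier ** attempt), max_delay)
    # Never retry earlier than the server asked, whatever the computed value is.
    return max(computed, retry_after)
```

With the defaults, attempts 0 through 4 wait 1, 2, 4, 8, and 16 seconds, and later attempts are capped at 60 seconds.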
Jitter: Randomization added to the backoff interval to desynchronize retry storms. Without jitter, all clients that received the same 429 at the same moment compute the same retry interval and submit simultaneously. With full jitter (sleep(random_between(0, computed_backoff))), retries are spread across the backoff window, dramatically reducing peak retry load. AWS's published analysis of backoff strategies also describes "decorrelated jitter", where each sleep is randomized relative to the previous one, as an alternative for high-concurrency workloads.
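The two jitter strategies named above can be sketched side by side. The full-jitter formula follows the expression given in the text; the decorrelated formula (a uniform draw between the base delay and three times the previous sleep, then capped) is the commonly cited form and is stated here as an assumption. Function names and the `rng` parameter are hypothetical.

```python
import random


def full_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Uniform random sleep between 0 and the capped exponential backoff."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


def decorrelated_jitter(previous_sleep, base=1.0, cap=60.0, rng=random):
    """Uniform random sleep between `base` and 3x the previous sleep, capped."""
    return min(cap, rng.uniform(base, previous_sleep * 3))
```

Full jitter depends only on the attempt number, while decorrelated jitter carries state: the caller starts with `previous_sleep = base` and feeds each result back in on the next failure.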
429 Too Many Requests vs 503 Service Unavailable: These are commonly confused.
- 429: The server is healthy, but this client has exceeded its rate limit. The server can still serve other clients. The issue is client-specific. Retry after Retry-After elapses; the same client will succeed if the rate is respected.
- 503: The server is temporarily unavailable to all clients. The issue is server-wide: maintenance, overload, deployment in progress. Retry after Retry-After elapses; success is not guaranteed even after waiting (the outage may continue). Use 503 for circuit-breaker open states; use 429 for rate limiting.

AWS API Gateway rate-limiting and retry patterns in production:
Rate limit exceeded (429 with Retry-After):
GET /v1/metrics?start=2026-01-01&end=2026-04-01
Authorization: Bearer tok_...

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744271400

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 100 requests per minute. Retry after 30 seconds.",
  "instance": "/errors/correlation/f1a2-b3c4",
  "limit": 100,
  "remaining": 0,
  "reset": 1744271400
}
The client reads Retry-After: 30 and waits at least 30 seconds. X-RateLimit-Reset provides the absolute timestamp for clients that prefer UTC synchronization. The body echoes the policy for logging and debugging.
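On the server side, the key point is that Retry-After, X-RateLimit-Reset, and the body all derive from the same reset time so they cannot disagree. The sketch below shows one way to build such a response; the function name `build_rate_limit_response` is hypothetical, it is framework-agnostic by assumption, and it omits the correlation "instance" field.

```python
import json
import math
import time


def build_rate_limit_response(limit, reset_epoch, now=None):
    """Return (status, headers, body) for a 429 tied to a single reset time."""
    now = time.time() if now is None else now
    # Round up to whole seconds and never advertise a zero-second wait.
    retry_after = max(1, math.ceil(reset_epoch - now))
    headers = {
        "Content-Type": "application/problem+json",
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    body = json.dumps({
        "type": "https://api.example.com/errors/rate-limit-exceeded",
        "title": "Rate Limit Exceeded",
        "status": 429,
        "detail": f"Rate limit of {limit} requests exceeded. "
                  f"Retry after {retry_after} seconds.",
        "limit": limit,
        "remaining": 0,
        "reset": int(reset_epoch),
    })
    return 429, headers, body
```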
Service temporarily unavailable (503 with Retry-After):
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 120

{
  "type": "https://api.example.com/errors/service-unavailable",
  "title": "Service Temporarily Unavailable",
  "status": 503,
  "detail": "The service is undergoing scheduled maintenance. Expected recovery in 2 minutes.",
  "instance": "/errors/correlation/d5e6-f7a8"
}
Exponential backoff with jitter (pseudo-code):
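import random
import time

class MaxRetriesExceeded(Exception):
    """Raised when every attempt returned a retryable status."""

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_attempts):
        response = fn()
        if response.status_code not in (429, 503):
            return response  # success or permanent error: stop retrying
        if attempt == max_attempts - 1:
            break  # final attempt failed: do not sleep before giving up
        # Respect server-specified Retry-After as a floor. Only the seconds
        # form is parsed here; an HTTP-date value falls back to 0.
        try:
            retry_after = float(response.headers.get("Retry-After", 0))
        except ValueError:
            retry_after = 0.0
        # Compute exponential backoff with full jitter
        computed = min(base_delay * (2 ** attempt), max_delay)
        jittered = random.uniform(0, computed)
        wait = max(retry_after, jittered)
        time.sleep(wait)
    raise MaxRetriesExceeded(f"Failed after {max_attempts} attempts")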
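The critical line is wait = max(retry_after, jittered): the server's Retry-After is always respected as a minimum. Note that jitter only spreads retries when the jittered backoff exceeds Retry-After. When Retry-After is the larger value, every client waits exactly that long, so adding the jitter on top (retry_after + jittered) is the safer choice when many clients receive the same Retry-After.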
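One variation on that line is sketched below. With max(retry_after, jittered), a Retry-After larger than the computed backoff is returned unchanged, so clients that received the same header retry at the same instant. Adding the jitter to the floor keeps them apart. The function name `compute_wait` and the `rng` parameter are hypothetical, and the trade-off is a slightly longer average wait.

```python
import random


def compute_wait(attempt, retry_after=0.0, base_delay=1.0, max_delay=60.0, rng=random):
    """Wait at least `retry_after`, plus a full-jitter backoff on top of it."""
    computed = min(base_delay * (2 ** attempt), max_delay)
    return retry_after + rng.uniform(0, computed)
```

Swapping this in for the max(...) line preserves the floor guarantee, since the jitter term is never negative.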
Omitting Retry-After from 429 responses. Without Retry-After, clients have no authoritative signal for when to retry. Common outcomes: aggressive retrying every few hundred milliseconds (worsening the rate-limit breach), or backing off so conservatively that legitimate requests are delayed for minutes. Fix: always include Retry-After on 429 responses, set to the actual reset window in seconds.
Retrying 4xx errors other than 429. A 400, 401, 403, 404, or 422 response will not succeed on retry without a change to the request. Retrying them wastes quota, delays error propagation, and produces confusing logs. Fix: only retry on 429 (rate limit) and 5xx (server fault) responses. All other 4xx codes should surface as immediate errors to the caller.
Exponential backoff without jitter. Synchronized backoff (all clients waiting exactly 2, 4, 8 seconds) produces retry bursts at each interval boundary. If 1000 clients all received 503 at the same moment, they will all retry at T+2s, then T+6s, then T+14s, and each wave triggers a new overload spike. Fix: add randomized jitter to desynchronize the retry distribution across the backoff window.
Using 503 for per-client rate limiting. Returning 503 Service Unavailable when a specific client exceeds its rate limit misleads other clients (and monitoring systems) into thinking the server is globally down. It also prevents clients from distinguishing "my rate limit" from "server outage" — different actions are required. Fix: use 429 for client-specific rate limits and 503 for server-wide unavailability.
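The effect of missing jitter can be seen in a toy simulation. The sketch below assumes every client failed at T=0 and is on the same attempt number, then counts how many retries land in the busiest one-second bucket. The function name `peak_retries_per_second` and the bucket model are illustrative assumptions, not a load model of any real service.

```python
import random
from collections import Counter


def peak_retries_per_second(num_clients, attempt, jitter, seed=0):
    """Largest number of clients retrying within the same one-second bucket."""
    rng = random.Random(seed)
    backoff = 2 ** attempt  # seconds; identical for every client
    buckets = Counter()
    for _ in range(num_clients):
        # Full jitter draws from [0, backoff]; no jitter waits exactly backoff.
        delay = rng.uniform(0, backoff) if jitter else backoff
        buckets[int(delay)] += 1
    return max(buckets.values())
```

Without jitter, all clients fall into a single bucket; with full jitter over an 8-second window, the same clients are spread across roughly eight buckets.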
Retrying a request that is not idempotent may produce duplicate side effects: a charge processed twice, a message sent twice, a record created twice. Before implementing retry logic, classify the endpoint:
- GET, HEAD, OPTIONS: retry freely.
- PUT, DELETE: retry is safe if the server implements idempotency correctly.
- POST: requires explicit idempotency keys (see api-idempotency-keys) to retry safely.

Clients must never retry a POST without an idempotency key unless the server documents that the endpoint is safe to re-invoke (e.g., a pure query wrapped in POST).
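The classification above can be expressed as a guard that a client checks before scheduling any retry. This is a minimal sketch: the function name `may_retry` and its parameters are hypothetical, and methods not listed above (such as PATCH) are conservatively treated as not retryable.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}
IDEMPOTENT_METHODS = {"PUT", "DELETE"}


def may_retry(method, has_idempotency_key=False, documented_safe=False):
    """Decide whether a failed request with this method may be retried."""
    method = method.upper()
    if method in SAFE_METHODS or method in IDEMPOTENT_METHODS:
        return True
    if method == "POST":
        # Only with an idempotency key, or when the server documents the
        # endpoint as safe to re-invoke (e.g. a query wrapped in POST).
        return has_idempotency_key or documented_safe
    return False  # unlisted methods: be conservative (assumption)
```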
Server-side Retry-After signals can drive client-side circuit breakers. When consecutive 503 responses arrive with Retry-After headers, the circuit breaker should open for at least the server-specified duration. When the Retry-After window expires, a single probe request determines whether to close the circuit. This prevents the thundering-herd problem where all clients simultaneously probe the recovering service.
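A minimal sketch of that behavior follows, assuming a single-process client and a consecutive-failure threshold. The class name `RetryAfterCircuitBreaker`, its parameters, and the injectable `clock` are all hypothetical, and the sketch is not thread-safe.

```python
import time


class RetryAfterCircuitBreaker:
    """Opens after consecutive 503s and stays open for at least Retry-After."""

    def __init__(self, failure_threshold=3, default_open_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.default_open_seconds = default_open_seconds
        self.clock = clock
        self.consecutive_failures = 0
        self.open_until = 0.0
        self.probe_in_flight = False

    def allow_request(self):
        if self.consecutive_failures < self.failure_threshold:
            return True   # closed: traffic flows normally
        if self.clock() < self.open_until:
            return False  # open: still inside the server-specified window
        if self.probe_in_flight:
            return False  # half-open: a probe has already been sent
        self.probe_in_flight = True
        return True       # half-open: allow exactly one probe

    def record(self, status_code, retry_after=0.0):
        self.probe_in_flight = False
        if status_code == 503:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Stay open for at least the server-specified duration.
                wait = max(retry_after, self.default_open_seconds)
                self.open_until = self.clock() + wait
        else:
            self.consecutive_failures = 0  # any other status closes the circuit
```

Callers check `allow_request()` before each call and pass the resulting status and parsed Retry-After to `record()`. A failed probe re-opens the circuit for a fresh window.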
Stripe's client libraries implement a retry strategy that the Stripe engineering team has published: up to 2 automatic retries on 429 and 5xx responses, with exponential backoff starting at 0.5 seconds and capped at 2 seconds, with Retry-After honored as a floor. Stripe found in production analysis that the combination of (a) emitting accurate Retry-After values, (b) client jitter, and (c) idempotency keys on all mutating requests reduced duplicate charge incidents by over 95% compared to client implementations that retried blindly on any error. The key insight: the server's Retry-After signal is the coordination mechanism that transforms a retry storm into a smooth recovery curve.
- Classify each error response as transient (429, 5xx) or permanent (4xx except 429) and document the classification in the API reference.
- Add Retry-After headers to all 429 responses, set to the actual rate-limit reset window in seconds.
- Add Retry-After headers to 503 responses when the expected recovery time is known; omit when the outage duration is unknown.
- Verify that POST endpoints that may be retried support idempotency keys (see api-idempotency-keys).
- Run harness validate to confirm skill files are well-formed and cross-references are correct.

- All 429 responses include a Retry-After header set to the rate-limit reset window in seconds.
- 429 is used for client-specific rate limits; 503 is used for server-wide unavailability. The two are never interchanged.
- Clients retry only on 429 and 5xx responses; 4xx errors (except 429) are surfaced immediately.
- POST endpoints require idempotency keys before retries are safe.