From harness-claude
Designs reliable webhook systems for APIs covering registration, payload schema/versioning, at-least-once delivery, retries, idempotency, and fan-out. Use for new platforms, audits, or developer docs.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
This skill uses the workspace's default tool permissions.
> WEBHOOKS ARE A PUSH-BASED CONTRACT — REGISTRATION, PAYLOAD SCHEMA, DELIVERY GUARANTEES, AND RETRY POLICIES ARE ALL CONSUMER-FACING COMMITMENTS THAT DETERMINE WHETHER INTEGRATIONS REMAIN RELIABLE UNDER PARTIAL FAILURES, AND DESIGNING THESE PROPERTIES EXPLICITLY UPFRONT PREVENTS AN AD-HOC SYSTEM THAT FAILS SILENTLY AT THE WORST POSSIBLE MOMENT.
Webhook registration and lifecycle — Consumers register a publicly reachable HTTPS endpoint (the "webhook URL") with the provider, along with an event type filter (e.g., payment.succeeded, order.created). The registration should store the endpoint URL, the subscribed event types, a shared signing secret, and audit metadata (created by, created at). Provide a management API or dashboard UI to create, list, test, pause, and delete registrations. GitHub's webhook management API (POST /repos/{owner}/{repo}/hooks) is a widely used reference: consumers can subscribe per repository, per organization, or at the GitHub App level.
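A minimal sketch of the registration record described above, in Python. The class name WebhookRegistration, its field names, and the choice of a 32-byte random secret from the standard-library secrets module are assumptions for illustration, not a prescribed schema.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass
class WebhookRegistration:
    """Hypothetical registration record: endpoint, subscribed events, secret, audit data."""

    url: str
    event_types: FrozenSet[str]
    created_by: str
    # Shared signing secret, generated server-side (32 random bytes, hex-encoded).
    signing_secret: str = field(default_factory=lambda: secrets.token_hex(32))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Paused registrations are kept for auditing but receive no deliveries.
    active: bool = True

    def wants(self, event_type: str) -> bool:
        """True if this endpoint is active and subscribed to event_type."""
        return self.active and event_type in self.event_types


reg = WebhookRegistration(
    url="https://ci.example.com/hooks/orders",
    event_types=frozenset({"order.created", "payment.succeeded"}),
    created_by="alice",
)
print(reg.wants("order.created"))  # True
reg.active = False  # "pause" from the management API or dashboard
print(reg.wants("order.created"))  # False
```

Keeping paused registrations as records (rather than deleting them) preserves the audit trail and lets the owner re-enable the endpoint without re-registering or rotating the secret.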
Payload design and versioning — Every webhook payload must include: an id (unique per event and identical on every retry of that event, so consumers can deduplicate on it), an event field (event type string), a created timestamp (ISO 8601), an api_version field (the schema version that generated the payload), and a data object (the event subject). Never omit the api_version: consumers need it to route the payload to the correct handler when you introduce breaking schema changes. Stripe's payload envelope is a widely copied example; note that it names the event field type and encodes created as a Unix timestamp: { "id": "evt_xxx", "object": "event", "api_version": "2024-04-10", "created": 1712800000, "type": "payment_intent.succeeded", "data": { "object": { ... } } }.
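A minimal sketch of a producer-side helper that builds the envelope above. The function build_envelope, the evt_ prefix on ids, and the pinned API_VERSION value are assumptions for illustration.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict

API_VERSION = "2024-04-10"  # assumption: the schema version this producer emits


def build_envelope(event_type: str, data: Dict[str, Any],
                   api_version: str = API_VERSION) -> Dict[str, Any]:
    """Wrap an event subject in the stable envelope: id, event, created, api_version, data.

    Build this once per event and store it: every retry must resend the same
    id (and ideally the same bytes) so consumers can deduplicate.
    """
    created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "id": f"evt_{uuid.uuid4().hex}",
        "event": event_type,
        "created": created.replace("+00:00", "Z"),  # ISO 8601, UTC
        "api_version": api_version,
        "data": data,
    }


envelope = build_envelope("order.created", {"id": "ord_123", "total_cents": 4200})
body = json.dumps(envelope)  # persist alongside the delivery record
```

Serializing once and storing the bytes means retries never regenerate the id or the timestamp by accident.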
At-least-once delivery and consumer idempotency — Webhook systems provide at-least-once delivery by design: a delivery counts as successful only when the consumer returns a 2xx response within the timeout window (typically 5–30 seconds), and any non-2xx response or timeout triggers a retry. Consequently, consumers will occasionally receive the same event more than once, for example when they processed an event but their 2xx response was lost or arrived after the timeout. Document this guarantee explicitly, and instruct consumers to use the payload id field, which stays the same on every retry, as an idempotency key to deduplicate retried deliveries. Never promise exactly-once delivery unless you have a distributed transaction mechanism backing it.
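A consumer-side sketch of deduplication on the payload id. The in-memory set and the IdempotentConsumer name are assumptions to keep the example self-contained; a real consumer would persist processed ids in durable storage (for instance behind a unique constraint) so duplicates are caught across restarts and multiple instances.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Hypothetical consumer wrapper that deduplicates deliveries on the payload id."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # in-memory for the sketch; use durable storage in practice

    def receive(self, payload: dict) -> int:
        """Process one delivery and return the HTTP status to send back."""
        event_id = payload["id"]
        if event_id in self._seen:
            return 200  # duplicate: acknowledge so the provider stops retrying
        # If the handler raises, a web framework would typically answer 5xx,
        # and the provider retries later.
        self._handler(payload)
        self._seen.add(event_id)  # record only after the handler succeeded
        return 200


processed = []
consumer = IdempotentConsumer(lambda p: processed.append(p["id"]))
first = {"id": "evt_1", "event": "order.created", "data": {}}
consumer.receive(first)
consumer.receive(first)  # simulated retry of the same event
print(processed)  # ['evt_1']
```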
Retry policy with exponential backoff — Failed deliveries (non-2xx or timeout) must be retried with exponential backoff and random jitter, for example: retry 1 at T+1 min, retry 2 at T+5 min, retry 3 at T+30 min, retry 4 at T+2 h, retry 5 at T+5 h. After the maximum retry count (commonly 5–10 attempts over 24–72 hours), mark the delivery as failed and optionally alert the webhook owner. Provider behavior varies: GitHub does not retry failed deliveries automatically, but lets owners redeliver deliveries from the past 3 days manually or through its REST API, while Stripe retries for up to 3 days across up to 17 attempts with escalating intervals.
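A sketch of a retry scheduler. Instead of the fixed table above, it multiplies the nominal delay by 3 after each failure starting at 1 minute, caps any single wait at 12 hours, stops after 10 total attempts, and applies ±20% random jitter. All four numbers are assumptions, chosen so the nominal retries span roughly 42 hours, inside the 24 to 72 hour range mentioned above.

```python
import random
from datetime import timedelta
from typing import List, Optional

BASE_DELAY = timedelta(minutes=1)  # assumption: first retry after about 1 minute
MULTIPLIER = 3                      # assumption: each wait is 3x the previous
MAX_DELAY = timedelta(hours=12)     # assumption: cap on any single wait
MAX_ATTEMPTS = 10                   # assumption: total attempts, including the first


def next_retry_delay(failed_attempts: int,
                     rng: Optional[random.Random] = None) -> Optional[timedelta]:
    """Wait before the next attempt, or None when the delivery should be marked failed.

    failed_attempts is how many attempts have failed so far (1 after the first failure).
    """
    if failed_attempts >= MAX_ATTEMPTS:
        return None
    rng = rng if rng is not None else random.Random()
    nominal = min(MAX_DELAY, BASE_DELAY * MULTIPLIER ** (failed_attempts - 1))
    # Jitter spreads out retries from endpoints that all failed at the same moment.
    return nominal * rng.uniform(0.8, 1.2)


rng = random.Random(7)  # seeded only so the example is repeatable
schedule: List[timedelta] = []
failed = 1
while (delay := next_retry_delay(failed, rng)) is not None:
    schedule.append(delay)
    failed += 1
```

A worker would add each returned delay to the time of the failed attempt to decide when to try next, and mark the delivery failed (and optionally alert the owner) once the function returns None.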
Ordering guarantees and fan-out — Webhooks are not guaranteed to arrive in the order events occurred. Network and retry interactions can invert order. Tell consumers to use the created timestamp to establish event ordering rather than arrival order. For fan-out (one internal event dispatched to N registered endpoints), use an async queue per subscriber so a slow or failing subscriber cannot block delivery to other subscribers. Each subscriber's queue is independent; backpressure on one does not affect others.
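Because arrival order is unreliable, a consumer that mirrors object state can compare the created timestamp of each event with the newest one it has already applied and skip anything older. This is a sketch under assumptions: the LatestStateProjector name and the data.id / data.status fields are hypothetical. Events with identical created values cannot be ordered this way; a consumer that needs certainty in that case can re-fetch the object from the provider's API.

```python
from datetime import datetime
from typing import Dict


class LatestStateProjector:
    """Keep the latest status per object, ignoring events older than what was applied."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self._applied_at: Dict[str, datetime] = {}

    def apply(self, event: dict) -> bool:
        """Apply an event; return False if it is stale (arrived after a newer one)."""
        obj_id = event["data"]["id"]
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalise it.
        created = datetime.fromisoformat(event["created"].replace("Z", "+00:00"))
        last = self._applied_at.get(obj_id)
        if last is not None and created < last:
            return False
        self.status[obj_id] = event["data"]["status"]
        self._applied_at[obj_id] = created
        return True


projector = LatestStateProjector()
projector.apply({"created": "2024-04-10T12:00:05Z", "data": {"id": "ord_1", "status": "shipped"}})
late = projector.apply({"created": "2024-04-10T12:00:01Z", "data": {"id": "ord_1", "status": "paid"}})
print(late, projector.status["ord_1"])  # False shipped
```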
Endpoint health and circuit breaking — Track consecutive delivery failures per endpoint. After a configurable threshold (e.g., 5 consecutive failures or >50% failure rate over 1 hour), automatically pause the endpoint and notify the webhook owner. A paused endpoint stops receiving new deliveries; the owner must investigate and re-enable it. Sending to a consistently failing endpoint wastes resources and can cause queue buildup. GitHub auto-disables webhooks that fail for 7 consecutive days.
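A sketch of the per-endpoint circuit breaker, using the example thresholds above (5 consecutive failures, or more than 50% failures over a one-hour window). The EndpointHealth name and the min_samples guard, which stops a single early failure from counting as a 100% failure rate, are assumptions.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class EndpointHealth:
    """Pause an endpoint after N consecutive failures or a high failure rate in a window."""

    def __init__(self, max_consecutive: int = 5, window_s: float = 3600.0,
                 max_failure_rate: float = 0.5, min_samples: int = 10) -> None:
        self.max_consecutive = max_consecutive
        self.window_s = window_s
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples
        self.consecutive_failures = 0
        self.paused = False
        self._results: Deque[Tuple[float, bool]] = deque()  # (timestamp, success)

    def record(self, success: bool, now: Optional[float] = None) -> bool:
        """Record one delivery outcome; return True if this call paused the endpoint."""
        now = time.monotonic() if now is None else now
        self._results.append((now, success))
        while self._results and self._results[0][0] < now - self.window_s:
            self._results.popleft()  # drop outcomes older than the window
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1
        failures = sum(1 for _, ok in self._results if not ok)
        rate_tripped = (len(self._results) >= self.min_samples
                        and failures / len(self._results) > self.max_failure_rate)
        if not self.paused and (self.consecutive_failures >= self.max_consecutive
                                or rate_tripped):
            self.paused = True
            return True  # caller should stop deliveries and notify the webhook owner
        return False

    def resume(self) -> None:
        """Called when the owner re-enables the endpoint after investigating."""
        self.paused = False
        self.consecutive_failures = 0
        self._results.clear()


health = EndpointHealth()
tripped = [health.record(False, now=float(t)) for t in range(5)]
print(tripped)  # [False, False, False, False, True]
```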
GitHub Webhooks — repository push event
Register a webhook on a repository:
```http
POST /repos/acme/payments/hooks
Authorization: Bearer ghp_xxx
Content-Type: application/json

{
  "name": "web",
  "active": true,
  "events": ["push", "pull_request"],
  "config": {
    "url": "https://ci.acme.com/hooks/github",
    "content_type": "json",
    "secret": "s3cr3t_abc123",
    "insecure_ssl": "0"
  }
}

→ HTTP/1.1 201 Created
{
  "id": 12345678,
  "type": "Repository",
  "active": true,
  "events": ["push", "pull_request"],
  "config": { "url": "https://ci.acme.com/hooks/github", "content_type": "json" },
  "created_at": "2024-04-10T12:00:00Z"
}
```
Incoming delivery payload:
```json
{
  "ref": "refs/heads/main",
  "before": "abc123",
  "after": "def456",
  "repository": { "id": 99887766, "name": "payments", "full_name": "acme/payments" },
  "pusher": { "name": "alice", "email": "alice@acme.com" },
  "commits": [{ "id": "def456", "message": "fix: handle nil pointer in checkout" }]
}
```
Delivery headers:
```http
X-GitHub-Event: push
X-GitHub-Delivery: 72d3162e-cc78-11e3-81ab-4c9367dc0958
X-Hub-Signature-256: sha256=abc123...
```
The consumer responds with 200 OK and an empty body, and GitHub marks the delivery as successful. The X-GitHub-Delivery GUID is the idempotency key consumers use to deduplicate redelivered events.
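The X-Hub-Signature-256 header carries sha256= followed by the hex HMAC-SHA256 of the raw request body, keyed with the secret registered in config.secret. A minimal consumer-side check using only Python's standard library might look like this; the function name is illustrative. Verify against the raw bytes before parsing the JSON, because re-serializing a parsed payload can change whitespace or key order and break the match.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of raw_body."""
    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison, so response timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


secret = "s3cr3t_abc123"
body = b'{"ref": "refs/heads/main"}'
good = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_github_signature(secret, body, good))         # True
print(verify_github_signature(secret, body + b" ", good))  # False
```

Reject the request (for example with 401) when the check fails, and only then look up X-GitHub-Delivery for deduplication.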
Fan-out architecture:
```
             Internal Event Bus
                      │
                      ▼
             Dispatcher Service
                      │
          ┌───────────┴───────────┐
          ▼                       ▼
  Queue: endpoint-A       Queue: endpoint-B
          │                       │
          ▼                       ▼
Worker → POST /hook/A   Worker → POST /hook/B
```
Each subscriber queue is independent. A 30-second timeout on endpoint-B does not delay endpoint-A's delivery.
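The fan-out diagram can be sketched with asyncio, giving each endpoint its own queue and worker so the dispatcher only enqueues and never waits on a delivery. The fan_out function, the injected deliver coroutine, and swallowing delivery errors (where a real worker would hand off to its retry scheduler) are assumptions made to keep the example self-contained.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Deliver = Callable[[str, dict], Awaitable[None]]


async def fan_out(events: List[dict], endpoints: List[str], deliver: Deliver) -> None:
    """Send every event to every endpoint via one independent queue + worker per endpoint."""
    queues: Dict[str, asyncio.Queue] = {url: asyncio.Queue() for url in endpoints}

    async def worker(url: str) -> None:
        queue = queues[url]
        while True:
            event = await queue.get()
            try:
                await deliver(url, event)
            except Exception:
                pass  # placeholder: a real worker would schedule a retry here
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker(url)) for url in endpoints]
    for event in events:  # dispatcher: enqueue only, never wait on delivery
        for queue in queues.values():
            queue.put_nowait(event)
    await asyncio.gather(*(queue.join() for queue in queues.values()))
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)


async def demo() -> List[str]:
    log: List[str] = []

    async def deliver(url: str, event: dict) -> None:
        await asyncio.sleep(0.2 if url.endswith("/B") else 0)  # endpoint B is slow
        log.append(f"{url.rsplit('/', 1)[1]}:{event['id']}")

    await fan_out([{"id": "evt_1"}, {"id": "evt_2"}],
                  ["https://a.example/hook/A", "https://b.example/hook/B"], deliver)
    return log


print(asyncio.run(demo()))  # ['A:evt_1', 'A:evt_2', 'B:evt_1', 'B:evt_2']
```

Endpoint A finishes both deliveries while endpoint B is still sleeping through its first one, which is the isolation the diagram describes.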
Synchronous delivery in the request path. Firing the webhook HTTP call synchronously during the transaction that generated the event couples event delivery latency to the consumer's endpoint response time. A slow consumer endpoint (or network partition) can cause your transaction to time out or roll back. Always dispatch webhook deliveries asynchronously via a queue after the originating transaction commits.
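One common way to dispatch only after the originating transaction commits is the transactional outbox pattern: write the business row and a pending webhook row in the same database transaction, and let a separate relay move committed rows onto the delivery queue. The sketch below uses an in-memory SQLite database; the table names, create_order, and relay_outbox are hypothetical. If the relay crashes between enqueueing and marking a row dispatched, the event is enqueued again on the next run, which the at-least-once contract already allows for.

```python
import json
import sqlite3
import uuid
from typing import Callable

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
CREATE TABLE webhook_outbox (
    event_id   TEXT PRIMARY KEY,
    payload    TEXT NOT NULL,
    dispatched INTEGER NOT NULL DEFAULT 0
);
""")


def create_order(order_id: str, total_cents: int) -> None:
    """Request path: one transaction, no HTTP call to any consumer."""
    event = {"id": f"evt_{uuid.uuid4().hex}", "event": "order.created",
             "data": {"id": order_id, "total_cents": total_cents}}
    with db:  # commits both rows together, or rolls both back on error
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        db.execute("INSERT INTO webhook_outbox (event_id, payload) VALUES (?, ?)",
                   (event["id"], json.dumps(event)))


def relay_outbox(enqueue: Callable[[dict], None]) -> int:
    """Background job: hand committed events to the delivery queue."""
    rows = db.execute(
        "SELECT event_id, payload FROM webhook_outbox WHERE dispatched = 0").fetchall()
    for event_id, payload in rows:
        enqueue(json.loads(payload))
        with db:
            db.execute("UPDATE webhook_outbox SET dispatched = 1 WHERE event_id = ?",
                       (event_id,))
    return len(rows)


delivery_queue: list = []
create_order("ord_1", 4200)
print(relay_outbox(delivery_queue.append))  # 1
print(relay_outbox(delivery_queue.append))  # 0
```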
No stable event id in the payload. Without a unique identifier that stays the same across retries, consumers cannot implement idempotency: every retry looks identical to a new event, so the consumer has no reliable way to detect or handle duplicates. Include a unique event ID (for example a UUID) in every payload, reuse it unchanged on every retry, and do so regardless of whether the consumer currently uses it.
Fixed retry interval (no backoff). Retrying at a fixed 1-minute interval after a consumer endpoint goes down hammers the endpoint with traffic exactly when it is least able to respond — during an outage or deployment. Exponential backoff with jitter reduces thundering-herd behavior and gives the consumer time to recover before the next attempt.
Delivering to HTTP endpoints. Allowing webhook registrations to plain HTTP (non-TLS) URLs exposes the payload — which may contain sensitive business data — to network interception. Require HTTPS for all webhook URLs and reject registrations that specify HTTP. Surface a clear validation error: "config.url must use HTTPS".
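A minimal registration-time check using Python's urllib.parse; the function name validate_webhook_url is hypothetical, and the error text reuses the message suggested above.

```python
from urllib.parse import urlsplit


def validate_webhook_url(url: str) -> None:
    """Raise ValueError unless url is an absolute https:// URL with a host."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # urlsplit lower-cases the scheme
        raise ValueError("config.url must use HTTPS")
    if not parts.hostname:
        raise ValueError("config.url must include a host")


validate_webhook_url("https://ci.acme.com/hooks/github")  # accepted
try:
    validate_webhook_url("http://ci.acme.com/hooks/github")
except ValueError as exc:
    print(exc)  # config.url must use HTTPS
```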
No test delivery mechanism. Consumers integrating a webhook system need a way to trigger a test delivery without generating a real event. Without it, they must create real objects in production to test their handler. Provide a POST /hooks/{id}/test endpoint that sends a sample payload and returns the delivery result.
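A sketch of the handler behind POST /hooks/{id}/test. It builds a sample envelope for the requested event type, sends it through an injected post function, and returns a summary the API can show to the caller. The SAMPLES table, the extra test flag in the payload (so consumers can skip side effects), and the post signature are assumptions; a real implementation would also sign the test payload exactly like a live delivery so the consumer's verification code is exercised.

```python
import json
import time
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict

# Hypothetical canned data objects, one per event type.
SAMPLES: Dict[str, dict] = {"order.created": {"id": "ord_test_1", "total_cents": 4200}}
Post = Callable[[str, bytes], int]  # (url, body) -> HTTP status code


def send_test_delivery(url: str, event_type: str, post: Post) -> dict:
    """Deliver a sample event to url and report what happened."""
    created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    payload = {
        "id": f"evt_test_{uuid.uuid4().hex}",
        "event": event_type,
        "created": created.replace("+00:00", "Z"),
        "api_version": "2024-04-10",  # assumption: the endpoint's pinned version
        "data": SAMPLES[event_type],
        "test": True,  # lets the consumer skip real side effects
    }
    started = time.monotonic()
    try:
        status = post(url, json.dumps(payload).encode())
    except OSError as exc:  # connection refused, DNS failure, socket timeout
        return {"delivered": False, "error": str(exc)}
    elapsed_ms = round((time.monotonic() - started) * 1000)
    return {"delivered": 200 <= status < 300, "status": status, "duration_ms": elapsed_ms}


result = send_test_delivery("https://ci.example.com/hooks", "order.created",
                            lambda url, body: 204)
print(result["delivered"])  # True
```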
A consistent payload envelope across all event types reduces the cognitive load on consumers who handle multiple event types. The envelope fields (id, event, created, api_version, data) are stable; only data varies by event type. This pattern enables consumers to write generic middleware that extracts the envelope before routing to event-specific handlers.
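Generic envelope middleware of the kind described above can be small: check the envelope fields, then look up a handler by event type. The on decorator, the dispatch function, and the choice to return False for unhandled types (so the consumer can still answer 2xx and avoid pointless retries) are illustrative assumptions.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]
ENVELOPE_FIELDS = ("id", "event", "created", "api_version", "data")
_handlers: Dict[str, Handler] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one event type."""
    def register(fn: Handler) -> Handler:
        _handlers[event_type] = fn
        return fn
    return register


def dispatch(envelope: dict) -> bool:
    """Validate the stable envelope, then route its data to the event-specific handler."""
    missing = [f for f in ENVELOPE_FIELDS if f not in envelope]
    if missing:
        raise ValueError(f"malformed envelope, missing: {', '.join(missing)}")
    handler = _handlers.get(envelope["event"])
    if handler is None:
        return False  # unknown type: acknowledge and ignore
    handler(envelope["data"])
    return True


shipped: List[str] = []


@on("order.shipped")
def handle_order_shipped(data: dict) -> None:
    shipped.append(data["id"])


dispatch({"id": "evt_1", "event": "order.shipped", "created": "2024-04-10T12:00:00Z",
          "api_version": "2024-04-10", "data": {"id": "ord_1"}})
print(shipped)  # ['ord_1']
```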
Consider publishing a JSON Schema or OpenAPI schema for each event type's data payload. Stripe publishes event object schemas in their OpenAPI spec; consumers can generate typed client models from them. When you evolve a payload (rename a field, add a required property), bump api_version and publish a changelog.
Stripe processes billions of webhook deliveries per month. Their published reliability figures show that >99.9% of events are delivered within 30 seconds on the first attempt under normal conditions. The remaining <0.1% are retried across up to 17 attempts over 72 hours. Stripe's key design decisions that achieve this:
api_version field in every payload is the API version at the time the event was created, not the current API version, so consumers always receive the schema they integrated against.
Outcome: Stripe's webhook design is widely treated as a reference by SaaS platforms that added webhook support later.
api_version) before implementing delivery infrastructure — consumers build against the schema.
POST /hooks/{id}/test endpoint and a delivery history dashboard so consumers can verify integration without generating real events.
harness validate to confirm skill files are well-formed and related skills are correctly cross-referenced.
id, an event type string, a created timestamp, and an api_version field.
POST /hooks/{id}/test) is available for consumers to verify handler integration without generating real events.