From build-like-amazon
Designs API contracts with backward compatibility guarantees, versioning strategy, error semantics, idempotency, and pagination.
npx claudepluginhub robisson/build-like-amazon-agent-skills
How this skill is triggered (by the user, by Claude, or both): slash command
/build-like-amazon:api-contract-first
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Contract-first means you design the API before you write implementation code. The API contract is a binding agreement with your clients: once published, it cannot be broken. This approach forces you to think from the client's perspective, establish clear semantics upfront, and create a stable interface that enables independent evolution of client and server.
At scale, APIs are used by clients you don't know, in ways you didn't anticipate, depending on behaviors you never documented. Hyrum's Law states: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody." This means every response header, every error format, and every ordering behavior becomes part of your contract whether you intended it or not.
An API is any contract through which a client consumes functionality. The protocol does not change the rule — but the standard you use to express the contract does change with the protocol. Picking the right standard matters. OpenAPI is not the universal answer; it is the answer for one of the protocols below.
| Protocol / surface | Recommended contract standard | Artifact filename example |
|---|---|---|
| REST / HTTP | OpenAPI 3.x (canonical for REST). Smithy if you're in the AWS service style and want SDK code-gen. | openapi.yaml, service.smithy |
| GraphQL | GraphQL SDL — the schema is the contract; .graphql schema file with queries, mutations, subscriptions, types. Optionally publish via Apollo Federation / Schema Registry. | schema.graphql |
| gRPC | Protocol Buffers (.proto). Use proto3. The .proto is the contract; everything else (clients, servers, docs) is generated from it. | service.proto |
| Async / event-driven (SQS, Kinesis, MSK / Kafka, EventBridge, SNS, RabbitMQ) | AsyncAPI 3.x for the channel-and-operation level (the equivalent of OpenAPI for async). For the payload schema itself, pair with JSON Schema, Avro, or Protobuf depending on your serializer. EventBridge: also publish the event schema in the EventBridge Schema Registry. | asyncapi.yaml + schemas/payment-requested.avsc |
| Data contracts (warehouse tables, lakehouse, topics, file formats consumed by downstream pipelines) | Open Data Contract Standard (ODCS) when available — the emerging standard for producer ↔ consumer agreements on data products. Otherwise use the schema-language native to the store: dbt contracts for warehouse tables, Iceberg / Delta schema for lakehouse, Avro / Protobuf / JSON Schema for topics. | data-contract.yaml (ODCS), model.yml (dbt) |
| MCP tool surface | MCP server manifest — the tool definitions exposed by the server (name, parameters, return shape, side-effect declaration, JSON Schema for arguments). | mcp-tools.json |
| AI agent tool definitions | Agent tool spec — provider-native (Anthropic tool use, OpenAI tools, Bedrock agent action groups). Underlying input shape is JSON Schema. | tools.json |
| Webhooks (outbound events you publish to subscribers) | AsyncAPI (treats publisher-to-subscriber as first-class) or OpenAPI if you describe the receiver-side endpoint contract. JSON Schema for the payload. | webhooks.asyncapi.yaml |
| CLI public surface | Commands, flags, exit codes, output schemas (JSON output: JSON Schema). No single industry standard — document explicitly. | cli-contract.md + output-schemas/ |
| SDK public surface | Language-native interface definitions (Java interfaces, TypeScript .d.ts, Python .pyi). Generate from the underlying API standard when possible (OpenAPI / Smithy / Proto). | generated from upstream contract |
When picking the standard, the agent applies these rules in order:
openapi.yaml and an asyncapi.yaml: they describe different surfaces. Do not try to merge them.
.proto (proto3)
A client is anything that consumes the API: a web UI, a mobile app, a CLI, an SDK, an MCP server, an AI agent, another internal service, a partner integration, a batch ETL job. All clients are equivalent from the contract's point of view. The word "frontend" is too narrow; use "client" or "API consumer".
This is the direct application of Amazon's 2002 API Mandate: every team exposes its functionality only through APIs, and there is no other way to consume what another team built.
Amazon services are decoupled and communicate through well-defined APIs. A team owns its API surface completely and is responsible for never breaking clients. AWS public APIs are forever—once released, they cannot be removed or have their behavior changed. This discipline applies internally as well: internal APIs have clients you've never met, in organizations you've never heard of. Breaking them causes cascading failures.
The principle "APIs are forever" drives extreme care in API design. It's far better to ship a minimal API and extend it than to ship a broad API and discover parts of it are wrong. You can always add; you can never remove.
Before any implementation:
The implementation serves the contract. Never let implementation convenience dictate API shape.
Even these "safe" changes can break clients in practice:
Mitigation: Use integration tests with real client traffic patterns. Monitor error rates after every deployment. Use canary deployments for API changes.
Choose one strategy and apply it consistently:
URL path versioning:
/v1/resources
/v2/resources
Media-type versioning (Accept header):
Accept: application/vnd.company.resource.v1+json
Custom header versioning (date-based):
POST /resources
X-Api-Version: 2024-01-15
Key principle: Never require clients to migrate. If they're on v1 and it works, it should keep working indefinitely.
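
The three request shapes above (version in the URL path, in the Accept media type, or in an X-Api-Version header) can each be read with a few lines of code. The following is a minimal sketch, not a prescribed implementation: the function names and the regular expressions are assumptions made for illustration, and a real service would pick exactly one of these extractors rather than support all three.

```python
import re
from typing import Mapping, Optional


def version_from_path(path: str) -> Optional[str]:
    """URL path strategy: '/v2/resources' -> 'v2'."""
    match = re.match(r"^/(v\d+)(?:/|$)", path)
    return match.group(1) if match else None


def version_from_accept(accept: str) -> Optional[str]:
    """Media-type strategy: 'application/vnd.company.resource.v1+json' -> 'v1'."""
    match = re.search(r"\.(v\d+)\+json\b", accept)
    return match.group(1) if match else None


def version_from_header(headers: Mapping[str, str]) -> Optional[str]:
    """Custom header strategy: date-based 'X-Api-Version: 2024-01-15'."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-api-version")
    if value and re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return value
    return None
```

Each extractor returns None when no version is present, which leaves the choice of default to the caller. Given the key principle above, a sensible default is the oldest supported version, so that existing clients keep working.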
{
"type": "https://api.example.com/errors/resource-not-found",
"error_code": "RESOURCE_NOT_FOUND",
"message": "Order with ID 'ord-123' was not found",
"request_id": "req-abc-def-123",
"timestamp": "2024-01-15T10:30:00Z",
"details": [
{
"field": "order_id",
"issue": "No order exists with this identifier",
"suggestion": "Verify the order ID and try again"
}
],
"retry_after_ms": null
}
Clients should branch on error_code, not on message text.
Include retry guidance where it applies (retry_after_ms).

| Status | Error Code Pattern | Client Action |
|---|---|---|
| 400 | INVALID_* | Fix request, do not retry |
| 401 | AUTHENTICATION_* | Re-authenticate, then retry |
| 403 | AUTHORIZATION_* | Do not retry; insufficient permissions |
| 404 | *_NOT_FOUND | Resource doesn't exist; may retry if race condition |
| 409 | CONFLICT_* | Resolve conflict (e.g., version mismatch), then retry |
| 429 | RATE_LIMITED | Retry after retry_after_ms with exponential backoff |
| 500 | INTERNAL_ERROR | Retry with exponential backoff |
| 503 | SERVICE_UNAVAILABLE | Retry with exponential backoff |
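
On the client side, the table above reduces to two decisions: whether a status is worth retrying at all, and how long to wait. The sketch below is one possible way to encode that, under stated assumptions: the helper names, the 100 ms base delay, the 10 s cap, and the use of full jitter are illustrative choices, not part of the contract described here. Only the statuses the table marks as "retry with exponential backoff" are covered; 401 and 409 need a client-side fix before any retry.

```python
import json
import random
from typing import Optional

# Statuses the table marks as "retry with exponential backoff".
BACKOFF_STATUSES = {429, 500, 503}


def parse_error(body: str) -> dict:
    """Extract the machine-readable fields; the message text is ignored on purpose."""
    payload = json.loads(body)
    return {
        "error_code": payload["error_code"],
        "retry_after_ms": payload.get("retry_after_ms"),
    }


def should_backoff_and_retry(status: int) -> bool:
    """True only for statuses that are safe to retry without changing the request."""
    return status in BACKOFF_STATUSES


def backoff_delay_ms(attempt: int, retry_after_ms: Optional[int] = None,
                     base_ms: int = 100, cap_ms: int = 10_000) -> int:
    """Exponential backoff with full jitter, never shorter than the server's hint."""
    ceiling = min(cap_ms, base_ms * (2 ** attempt))
    delay = random.randint(0, ceiling)
    if retry_after_ms is not None:
        delay = max(delay, retry_after_ms)
    return delay
```

Taking the maximum of the jittered delay and retry_after_ms means the client never retries sooner than the server asked, while still spreading retries out when no hint is given.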
Networks are unreliable. Clients will retry. If a request succeeds but the response is lost, the client retries what was already processed. Without idempotency, this causes duplicate actions (double charges, double sends, duplicate records).
POST /v1/payments
Idempotency-Key: client-generated-uuid
{
"amount": 100,
"currency": "USD",
"recipient": "user-456"
}
Rules:
| Operation Type | Naturally Idempotent? | Needs Idempotency Key? |
|---|---|---|
| GET | Yes (reads are safe) | No |
| PUT (full replace) | Yes (same input = same state) | No |
| DELETE | Yes (deleting deleted = no-op) | No |
| POST (create) | No | Yes |
| PATCH (partial update) | Depends | Yes (for non-commutative updates) |
Table: idempotency_keys
├── key: string (PK) — The idempotency key
├── client_id: string — Which client submitted it
├── status: enum — PROCESSING | COMPLETED | FAILED
├── request_hash: string — Hash of request body (detect misuse)
├── response: blob — Stored response to return on replay
├── created_at: timestamp
└── expires_at: timestamp (TTL)
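
The storage layout above implies a small state machine on the server: look up the key, reject a reused key whose body differs, replay a completed response, and otherwise run the handler once. The following is a minimal in-memory sketch of that flow. The class and exception names are hypothetical, the dictionary stands in for a real table, and TTL handling (created_at, expires_at) and concurrency control are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class IdempotencyRecord:
    client_id: str
    status: str                     # PROCESSING | COMPLETED | FAILED
    request_hash: str               # detects the same key sent with a different body
    response: Optional[Any] = None  # stored response, returned on replay


class IdempotencyKeyReused(Exception):
    """The key was already used with a different request body."""


class RequestInProgress(Exception):
    """The original request with this key has not finished yet."""


def hash_request(body: Dict[str, Any]) -> str:
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryIdempotencyStore:
    def __init__(self) -> None:
        self._records: Dict[str, IdempotencyRecord] = {}

    def execute(self, client_id: str, key: str, body: Dict[str, Any],
                handler: Callable[[Dict[str, Any]], Any]) -> Any:
        request_hash = hash_request(body)
        existing = self._records.get(key)
        if existing is not None:
            if existing.request_hash != request_hash:
                raise IdempotencyKeyReused(key)
            if existing.status == "COMPLETED":
                return existing.response  # replay without running the handler again
            if existing.status == "PROCESSING":
                raise RequestInProgress(key)
            # FAILED: fall through and let the client try again.
        record = IdempotencyRecord(client_id, "PROCESSING", request_hash)
        self._records[key] = record
        try:
            response = handler(body)
        except Exception:
            record.status = "FAILED"
            raise
        record.status = "COMPLETED"
        record.response = response
        return response
```

A production version would make the lookup-and-insert step atomic (for example with a conditional write), since two concurrent requests with the same key could otherwise both reach the handler.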
GET /v1/orders?limit=20
→ Response includes: "next_cursor": "eyJpZCI6IjEyMyJ9"
GET /v1/orders?limit=20&cursor=eyJpZCI6IjEyMyJ9
→ Next page
Why cursor-based over offset-based:
Cursor design:
{
"items": [...],
"pagination": {
"next_cursor": "eyJpZCI6IjEyMyJ9",
"has_more": true,
"total_count": null
}
}
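
The cursor in the examples above, eyJpZCI6IjEyMyJ9, is base64 for the compact JSON {"id":"123"}. A minimal sketch of how such a cursor might be produced and consumed follows. The function names are hypothetical, the in-memory list stands in for a real query, and the sketch assumes orders are sorted by a string id that compares correctly (for example, zero-padded).

```python
import base64
import json
from typing import Any, Dict, List, Optional


def encode_cursor(position: Dict[str, Any]) -> str:
    """Serialize the position as compact JSON, then base64 so clients treat it as opaque."""
    raw = json.dumps(position, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Dict[str, Any]:
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))


def list_orders(orders: List[Dict[str, Any]], limit: int,
                cursor: Optional[str] = None) -> Dict[str, Any]:
    """Return one page of orders that come after the cursor position."""
    after_id = decode_cursor(cursor)["id"] if cursor else None
    remaining = [o for o in orders if after_id is None or o["id"] > after_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor({"id": page[-1]["id"]}) if has_more else None
    return {
        "items": page,
        "pagination": {
            "next_cursor": next_cursor,
            "has_more": has_more,
            "total_count": None,
        },
    }
```

Because the cursor records the last item seen rather than a row offset, inserts or deletes earlier in the list do not cause the next page to skip or repeat items.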
Rules:
has_more boolean instead.
has_more: false = end of results.
429 Too Many Requests with Retry-After header
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1640000000
X-Request-Deadline: <timestamp>)
Sunset header to deprecated endpoint responses

| Intention | Mechanism |
|---|---|
| "I'll keep the API backward compatible" | Contract tests run in CI that compare new schema against published baseline; any breaking change fails the build |
| "I'll document the API properly" | API spec is generated from code annotations; if the code doesn't have annotations, the endpoint doesn't exist |
| "I'll add idempotency later" | Framework requires idempotency key for all POST/PATCH operations; implementation fails without it |
| "I'll handle pagination when we have more data" | Framework enforces maximum response size; unbounded queries are rejected at compile time |
| "I'll communicate deprecations to clients" | Automated deprecation scanner finds usage and files tickets with consuming teams |
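
The first mechanism in the table, a CI contract test that compares the new schema against the published baseline, can be illustrated with a very small check. This is a sketch under simplifying assumptions: the schemas are flat field-to-type mappings rather than full OpenAPI documents, the function name is hypothetical, and only removed fields and changed types are treated as breaking.

```python
from typing import Dict, List


def find_breaking_changes(baseline: Dict[str, str],
                          candidate: Dict[str, str]) -> List[str]:
    """Compare response schemas given as {field_name: type_name}.

    Fields present in the baseline must still exist with the same type.
    Fields that only appear in the candidate are treated as additive.
    """
    problems = []
    for field, old_type in sorted(baseline.items()):
        if field not in candidate:
            problems.append(f"removed field: {field}")
        elif candidate[field] != old_type:
            problems.append(
                f"type changed: {field} ({old_type} -> {candidate[field]})"
            )
    return problems
```

In CI, a non-empty result would fail the build. Additive changes pass this check, but they can still surprise clients in practice, so a check like this complements monitoring and canary deployments rather than replacing them.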
| What They Say | Why It's Wrong | What To Do Instead |
|---|---|---|
| "No one uses that field, we can remove it" | You don't know every client. Hyrum's Law: if it's observable, someone depends on it. | Deprecate it: stop documenting, stop using in new clients, but keep returning it. Remove only after zero-traffic verification. |
| "We'll version when we need to" | By the time you need to version, you've already broken clients. Versioning strategy must exist from day one. | Design your versioning strategy in the contract. Make the first version explicit (v1, not versionless). |
| "The error message explains what happened" | Clients parse error codes, not messages. If your error handling depends on message text, you're one rewording away from breaking clients. | Use machine-readable error codes. Messages are for humans debugging, codes are for machines branching. |
| "We'll add pagination later when the data grows" | By the time data grows, clients are depending on getting all results in one call. Adding pagination is a breaking change. | Paginate from day one, even if there are only 3 items. The contract includes pagination from the start. |
| "Our internal API doesn't need this rigor" | Internal APIs become external APIs. Internal clients are still clients. Internal breakages still cause outages. | Apply the same rigor. The only difference is you might iterate faster on internal APIs, but you still don't break them. |