Help us improve
Share bugs, ideas, or general feedback.
From claude-mods
Build applications on the Anthropic API and Claude Agent SDK: tool use, prompt caching, structured outputs, batches, extended thinking, model selection, and agentic loops.
npx claudepluginhub 0xdarkmatter/claude-mods --plugin claude-modsHow this skill is triggered — by the user, by Claude, or both
Slash command
/claude-mods:claude-api-opsWhen to use
Use when building applications on the Anthropic API or Claude Agent SDK — e.g. 'add tool use to my Claude app', 'set up prompt caching', 'which Claude model should I use', 'handle stop_reason / streaming'.
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Building applications and agents on Anthropic's API: the Messages API, tool use,
Builds, debugs, and optimizes Claude API / Anthropic SDK apps with prompt caching, model migration, and feature tuning.
Reference for Claude API / Anthropic SDK: model IDs, pricing, params, streaming, tool use, MCP, agents, caching, token counting, and model migration. Activates when Claude/Anthropic is mentioned or LLM tasks arise.
Provides instructions for building LLM-powered apps with the Claude API or Anthropic SDK, including language detection and code examples for Python, TypeScript, Java, Go, Ruby, and more.
Share bugs, ideas, or general feedback.
Building applications and agents on Anthropic's API: the Messages API, tool use, prompt caching, structured outputs, batches, thinking/effort, and the Claude Agent SDK. For developers writing apps against the API — not for using Claude Code itself.
API surfaces move fast. Model IDs, parameters, and betas in this skill were
verified against platform.claude.com (2026-06). When in doubt — especially for
"latest model" or pricing questions — verify with WebFetch against
https://platform.claude.com/docs/en/about-claude/models/overview.md or query
the Models API (client.models.list()).
| Model | ID (exact, no date suffix) | Context | Max Output | Input $/MTok | Output $/MTok |
|---|---|---|---|---|---|
| Claude Fable 5 | claude-fable-5 | 1M | 128K | $10.00 | $50.00 |
| Claude Opus 4.8 | claude-opus-4-8 | 1M | 128K | $5.00 | $25.00 |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M | 64K | $3.00 | $15.00 |
| Claude Haiku 4.5 | claude-haiku-4-5 | 200K | 64K | $1.00 | $5.00 |
Use these alias IDs verbatim. Never append date suffixes (claude-sonnet-4-6-20251114
is wrong → 404). Older actives: claude-opus-4-7, claude-opus-4-6, claude-opus-4-5,
claude-sonnet-4-5. Live capability lookup: client.models.retrieve("claude-opus-4-8")
→ .max_input_tokens, .max_tokens, .capabilities dict.
What is the workload?
│
├─ Hardest problems, long-horizon agents, deep research, ceiling intelligence
│ └─ claude-fable-5 (premium) or claude-opus-4-8 (default flagship)
│
├─ Agentic coding, tool-heavy workflows, production assistants
│ └─ claude-opus-4-8 (quality) or claude-sonnet-4-6 (speed/cost balance)
│
├─ High-volume production: summarization, RAG answers, extraction
│ └─ claude-sonnet-4-6
│
├─ Classification, routing, simple Q&A, latency-critical
│ └─ claude-haiku-4-5
│
└─ Subagents inside a larger system
└─ One tier below the orchestrator (Opus loop → Sonnet/Haiku workers)
Tiering rule: route by task difficulty, not by uniform default. An Opus orchestrator dispatching Haiku classifiers is routinely 5-10x cheaper than Opus-everywhere with no quality loss on the simple legs.
| Need | Use | Why |
|---|---|---|
| One request → one response (classify, summarize, extract, Q&A) | Messages API | Simplest; full control |
| Multi-step pipeline, your code controls the logic | Messages API + tool use | You own the loop |
| Custom agent with your own tools, your infra | Messages API + tool use (manual loop or SDK tool runner) | Max flexibility |
| Agent that reads/edits files, runs commands, searches — without building tools | Claude Agent SDK | Claude Code's tools + agent loop as a library |
| CI/CD automation, coding agents, production agent apps | Claude Agent SDK | Built-in tools, hooks, sessions, MCP |
| Large non-urgent workloads (eval runs, backfills, bulk extraction) | Batches API | 50% discount, ≤24h turnaround |
| Hosted agent, Anthropic runs loop + sandbox | Managed Agents (beta) | No infra; see official docs |
Rule of thumb: start at the simplest tier. Reach for an agent only when the task is genuinely open-ended (multi-step, hard to fully specify, errors recoverable, value justifies cost).
Everything goes through POST /v1/messages. Headers: x-api-key,
anthropic-version: 2023-06-01, content-type: application/json.
# pip install anthropic
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY
response = client.messages.create(
model="claude-opus-4-8",
max_tokens=16000,
system="You are a concise technical assistant.",
messages=[{"role": "user", "content": "Explain CRDTs in one paragraph."}],
)
for block in response.content: # content is a list of typed blocks
if block.type == "text": # always check .type before .text
print(block.text)
print(response.stop_reason, response.usage.input_tokens, response.usage.output_tokens)
// npm install @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-opus-4-8",
max_tokens: 16000,
messages: [{ role: "user", content: "Explain CRDTs in one paragraph." }],
});
for (const block of response.content) {
if (block.type === "text") console.log(block.text); // narrow the union first
}
Streaming (default to it for long outputs — non-streaming above ~16K
max_tokens risks SDK HTTP timeouts):
with client.messages.stream(model="claude-opus-4-8", max_tokens=64000,
messages=[{"role": "user", "content": "Write a long report"}]) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
final = stream.get_final_message() # full Message after streaming
Full params, response shape, stop reasons, errors, retries, rate limits: references/messages-api.md
thinking: {"type": "adaptive"}. The old fixed budget
{"type": "enabled", "budget_tokens": N} is removed on Fable 5 / Opus 4.8 / 4.7
(400 error) and deprecated on Opus 4.6 / Sonnet 4.6.output_config: {"effort": "low" | "medium" | "high" | "xhigh" | "max"}
— nested in output_config, not top-level. Default high. xhigh (Opus 4.7+)
is best for coding/agentic work; max is Opus-tier + Sonnet 4.6 only.temperature,
top_p, top_k all return 400. Steer with prompting + effort.tool_choice: {"type": "auto"} (default) or "none" is allowed —
{"type": "any"} or {"type": "tool", ...} returns a 400.thinking: {"type": "adaptive", "display": "summarized"} if you surface
reasoning to users.Details and gotchas: references/structured-outputs.md (thinking interplay) and references/messages-api.md.
tools = [{
"name": "get_weather",
"description": "Get current weather. Call when the user asks about weather conditions.",
"input_schema": {
"type": "object",
"properties": {"location": {"type": "string", "description": "City, e.g. Paris"}},
"required": ["location"],
},
}]
response = client.messages.create(model="claude-opus-4-8", max_tokens=16000,
tools=tools, messages=messages)
if response.stop_reason == "tool_use":
... # execute, send tool_result back, loop
tool_choice: {"type": "auto"} (default) | {"type": "any"} | {"type": "tool", "name": "..."} | {"type": "none"}. Add
"disable_parallel_tool_use": true to force at most one call per response.
The agentic loop, parallel tool results, pause_turn, is_error, server-side
tools, and SDK tool runners: references/tool-use.md
Work top-down; each item is independent:
cache_control: {"type": "ephemeral"}. Reads cost ~0.1x; up to 90% savings.
Verify with usage.cache_read_input_tokens > 0 — zero means a silent
invalidator (timestamp in system prompt, unsorted JSON, varying tools).max_tokens to what you need (256 for classification);
stream + generous cap for long generation.medium is often the sweet
spot; low for subagents and simple tasks.client.messages.count_tokens(...) (never
tiktoken — it's OpenAI's tokenizer and undercounts Claude by 15-20%).tools → system → messages,
volatile content last; don't swap tool sets or models mid-conversation.Mechanics, breakpoints, TTLs, batch lifecycle, tiering math: references/caching-and-cost.md
# pip install claude-agent-sdk (Python >= 3.10)
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions
async def main():
async for message in query(
prompt="Find and fix the bug in auth.py",
options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
):
if hasattr(message, "result"):
print(message.result)
asyncio.run(main())
// npm install @anthropic-ai/claude-agent-sdk
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find and fix the bug in auth.ts",
options: { allowedTools: ["Read", "Edit", "Bash"] },
})) {
if ("result" in message) console.log(message.result);
}
Built-in tools (Read/Write/Edit/Bash/Glob/Grep/WebSearch/WebFetch/...), hooks
(PreToolUse, PostToolUse, ...), subagents, MCP servers, sessions
(resume/fork), permission modes, and the SDK-vs-raw-API decision:
references/agent-sdk.md
| Pitfall | Symptom | Fix |
|---|---|---|
| Date-suffixed or guessed model ID | 404 not_found_error | Use exact alias IDs from the table above |
budget_tokens on Fable 5 / Opus 4.8 / 4.7 | 400 | thinking: {"type": "adaptive"} |
| Assuming thinking is opt-in on Fable 5 | Unexpected thinking tokens billed | Fable 5 thinking is always-on and cannot be disabled ({"type": "disabled"} rejected); budget for it |
temperature/top_p/top_k on Fable 5 / Opus 4.8 / 4.7 | 400 | Remove; steer via prompt + effort |
Thinking + tool_choice: any/tool | 400 | Only auto/none with thinking on |
| Assistant-turn prefill on 4.6+ models | 400 | output_config.format or system-prompt instruction |
| Cache marker on <minimum prefix | Silent no-cache (cache_creation_input_tokens: 0) | Min 512-4096 tokens depending on model (see caching ref) |
Not handling stop_reason: "tool_use" | Agent "stops" after first tool call | Loop: execute tools, append tool_result, re-request |
Missing tool_result for a tool_use id | 400 on follow-up | One tool_result per tool_use block, ids matching |
Non-streaming with max_tokens > ~16K | SDK timeout / ValueError | Stream + get_final_message() / finalMessage() |
output_format top-level param | Deprecated | output_config: {"format": {...}} |
| tiktoken for Claude token counts | 15-20%+ undercount | messages.count_tokens endpoint |
| String-matching error messages | Fragile retries | Typed exceptions: anthropic.RateLimitError etc. |
Raw string-matching tool input | Breaks on escaping changes | Always json.loads() / use parsed block.input |
This skill ships a staleness verifier and two copy-and-adapt starter assets. The model table and pricing above are the facts most likely to drift — run the verifier when you suspect they're stale.
scripts/check-model-table.py — guards the Current Models table (this file)
and the per-model prompt-cache minimum table
(references/caching-and-cost.md) against drift.
Two modes per the resource protocol §7:
# Structural (default, no network): every row well-formed, ids carry no date
# suffix, prices numeric, the two files agree on the model lineup. Exit 4 on a
# malformed/contradictory row.
python skills/claude-api-ops/scripts/check-model-table.py --offline
python skills/claude-api-ops/scripts/check-model-table.py --offline --json | python -m json.tool
# Live (advisory, needs ANTHROPIC_API_KEY): curls the Models API and compares
# its id set against the documented ids. Exit 10 if a documented id is gone or a
# newer alias id is missing from the table; exit 7 (not a failure) if the key is
# unset or the API is unreachable. Live mode checks model-ID coverage ONLY — the
# API returns no pricing, so pricing/context drift stays an --offline + docs concern.
ANTHROPIC_API_KEY=sk-... python skills/claude-api-ops/scripts/check-model-table.py --live
assets/agentic-loop.py — a minimal, runnable tool-use loop (define a tool,
call messages.create, loop while stop_reason == "tool_use", append
tool_result, re-request until end_turn). Copy it as the starting point when
building a manual agent loop; the >>> ADAPT marks show what to change.
assets/output-schema.json — a known-good structured-outputs request body in
the canonical output_config.format shape (with additionalProperties: false
and a required array). Copy and reshape schema.properties when adding JSON
outputs; see references/structured-outputs.md
for the rules. (Not supported on Fable 5 — that model uses system-prompt
instructions instead.)
| File | Covers |
|---|---|
| references/messages-api.md | Params, response shape, streaming events, stop reasons, error handling, retries, rate limits |
| references/tool-use.md | Tool definitions, tool_choice, parallel tools, agentic loop, tool results, server tools, tool runners |
| references/caching-and-cost.md | Prompt caching mechanics, Batches API, token counting, model tiering economics |
| references/structured-outputs.md | output_config.format, schema rules/limits, strict tools, parse() helpers, thinking interplay |
| references/agent-sdk.md | Python + TS Agent SDK, ClaudeAgentOptions, hooks, MCP, sessions, SDK vs raw API |
When cached facts may be stale, WebFetch (append .md for clean markdown):
https://platform.claude.com/docs/en/about-claude/models/overview.mdhttps://platform.claude.com/docs/en/api/messageshttps://platform.claude.com/docs/en/agents-and-tools/tool-use/overview.mdhttps://platform.claude.com/docs/en/build-with-claude/prompt-caching.mdhttps://platform.claude.com/docs/en/build-with-claude/structured-outputs.mdhttps://platform.claude.com/docs/en/build-with-claude/batch-processing.mdhttps://code.claude.com/docs/en/agent-sdk/overview