Use Claude's extended thinking (reasoning) mode effectively — budget tokens, interleaved thinking with tool use, when it helps, when it wastes tokens, and how to inspect the thinking trace. Use this skill when building reasoning-heavy features (math, code generation, multi-step planning), debugging why a model is shallow on hard problems, or deciding whether to enable thinking. Activate when: extended thinking, thinking tokens, budget_tokens, reasoning mode, interleaved thinking, thinking blocks.
```shell
npx claudepluginhub latestaiagents/agent-skills --plugin skills-authoring
```
**Extended thinking gives the model a scratchpad before the final answer. Pay for reasoning tokens, get deeper answers. Use it surgically, not everywhere.**
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: {
    type: "enabled",
    budget_tokens: 10_000,
  },
  messages: [{ role: "user", content: "Prove that every prime > 3 is of the form 6k±1." }],
});
```
`budget_tokens` caps the thinking tokens; the model may use fewer. `max_tokens` must be greater than `budget_tokens`, because thinking tokens count toward the total.
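A cheap guard catches this misconfiguration before the API rejects it. `validateThinkingConfig` is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical validation helper — not part of the Anthropic SDK.
// Throws if the thinking budget would leave no room for the answer.
function validateThinkingConfig(maxTokens: number, budgetTokens: number): void {
  if (budgetTokens >= maxTokens) {
    throw new Error(
      `budget_tokens (${budgetTokens}) must be < max_tokens (${maxTokens}); ` +
        `thinking counts toward max_tokens`,
    );
  }
}

validateThinkingConfig(16_000, 10_000); // ok — leaves 6,000 tokens for the answer
```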
| Task | Typical budget |
|---|---|
| Short multi-step reasoning | 2,000-5,000 |
| Code generation with planning | 5,000-10,000 |
| Complex math/proofs | 10,000-32,000 |
| Deep agent planning | 10,000-20,000 |
| Research synthesis | 16,000-32,000 |
Start at 5,000 and measure. Bigger budget ≠ better answers past a point.
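The table above can be encoded as a starting-point helper. The numbers are the low end of each range (start small and measure); the `TaskKind` names and `pickBudget` are illustrative, not an API:

```typescript
// Hypothetical helper encoding the starting budgets from the table above.
type TaskKind =
  | "short-reasoning"
  | "code-generation"
  | "math-proof"
  | "agent-planning"
  | "research-synthesis";

// Lower bound of each range — raise only after measuring answer quality.
const STARTING_BUDGETS: Record<TaskKind, number> = {
  "short-reasoning": 2_000,
  "code-generation": 5_000,
  "math-proof": 10_000,
  "agent-planning": 10_000,
  "research-synthesis": 16_000,
};

function pickBudget(kind: TaskKind): number {
  return STARTING_BUDGETS[kind];
}
```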
```typescript
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("REASONING:", block.thinking);
  } else if (block.type === "text") {
    console.log("ANSWER:", block.text);
  }
}
```
The thinking block reveals the model's reasoning. Useful for debugging why an answer came out shallow, auditing multi-step logic, and deciding whether your budget is too small or too large.
Do not feed thinking blocks back to the user as-is in production — they're not polished prose. And do not modify them before passing back in multi-turn (signature validation will fail).
With the interleaved-thinking-2025-05-14 beta, the model thinks between tool calls — reasoning about each tool result before picking the next:
```typescript
const response = await client.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 16_000,
    thinking: { type: "enabled", budget_tokens: 10_000 },
    tools: [searchTool, fetchTool, summarizeTool],
    messages: [{ role: "user", content: "Research X and write a brief." }],
  },
  { headers: { "anthropic-beta": "interleaved-thinking-2025-05-14" } },
);
```
Without interleaved thinking, the model only thinks once at the start. With it, the model can reassess after every tool result — critical for agents that operate under uncertainty.
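In the agent loop around such a response, each `tool_use` block must be executed and answered with a `tool_result` before the model continues. The block-type names are real Messages API shapes, but `extractToolCalls` and the minimal `ContentBlock` type below are an illustrative sketch:

```typescript
// Minimal shape for the content blocks we care about here.
// Real API blocks carry more fields (e.g. thinking text, signatures).
interface ContentBlock {
  type: "thinking" | "text" | "tool_use";
  id?: string;
  name?: string;
  input?: unknown;
}

// Illustrative helper: pull the tool calls out of a response's content
// so each can be executed and answered with a tool_result block.
function extractToolCalls(content: ContentBlock[]) {
  return content
    .filter((b) => b.type === "tool_use")
    .map((b) => ({ id: b.id!, name: b.name!, input: b.input }));
}

// With interleaved thinking, thinking blocks sit between tool_use blocks;
// keep all of them intact when pushing the assistant message back.
const calls = extractToolCalls([
  { type: "thinking" },
  { type: "tool_use", id: "t1", name: "search", input: { q: "X" } },
]);
```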
When continuing a conversation that included thinking, pass the assistant's full message back unchanged (including thinking blocks):
```typescript
messages.push({ role: "assistant", content: response.content });
messages.push({ role: "user", content: "Great. Now prove the converse." });

const next = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 10_000 },
  messages,
});
```
Thinking blocks carry signatures that the API validates. Reordering or editing them breaks the request.
Thinking tokens are billed as output tokens. A call with 10K thinking + 2K answer costs 12K output tokens.
Rough rule: thinking doubles or triples the cost of a reasoning-heavy call. Confirm it's worth it by A/B testing against a no-thinking baseline.
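The billing arithmetic is easy to make explicit. `billedOutputTokens` and `costMultiplier` are hypothetical helpers for your own A/B dashboards, not SDK functions:

```typescript
// Thinking tokens bill as output tokens, so the billed output of a
// call is thinking tokens plus answer tokens.
function billedOutputTokens(thinkingTokens: number, answerTokens: number): number {
  return thinkingTokens + answerTokens;
}

// How many times more output tokens the thinking call costs versus a
// no-thinking baseline that produced the same-length answer.
function costMultiplier(thinkingTokens: number, answerTokens: number): number {
  return billedOutputTokens(thinkingTokens, answerTokens) / answerTokens;
}

billedOutputTokens(10_000, 2_000); // 12_000 — the example from the text
```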
```typescript
const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 10_000 },
  messages: [...],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "thinking_delta") {
    // show a "thinking..." spinner or subtle text
  } else if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
```
In UX, show a distinct "thinking" indicator, then switch to streaming the answer.