Use Claude's 1M-token context window effectively — when to use it, how to structure inputs for recall, how to price it, and how to combine with prompt caching to keep it affordable. Use this skill when building apps that feed large codebases, long documents, or entire conversation histories to Claude, or when weighing 1M context vs RAG. Activate when: 1M context, long context, big context window, context vs RAG, Claude 1 million tokens, context-beta header.
**Claude Opus 4.6 and Sonnet 4.6 support 1M token context with the `context-1m-2025-08-07` beta header. Use it well or burn money for nothing.**
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    messages: [{ role: "user", content: giantDocument + "\n\nSummarize." }],
  },
  { headers: { "anthropic-beta": "context-1m-2025-08-07" } },
);
```
Without the beta header, requests over 200K tokens will error.
Long context is priced differently above 200K input tokens. Check your provider's current rates; as a rule of thumb, a request whose input exceeds 200K is billed at roughly 2× the base input rate. Output pricing is unchanged.
Rule: if you're only going to use 200K, don't enable 1M. Only pay for long-context pricing when you actually need > 200K.
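To make the rule concrete, here is a minimal cost estimator. The rates are illustrative placeholders, not real prices, and it assumes the whole request is billed at the long-context rate once input crosses 200K; verify both against your provider's current pricing.

```typescript
// Illustrative rates in USD per million input tokens (placeholders,
// not real prices; check your provider's rate card).
const BASE_RATE = 3;
const LONG_RATE = 6; // ~2x, per the rule of thumb above
const THRESHOLD = 200_000;

// Assumption: the entire request is billed at the long-context rate
// once input exceeds the threshold.
function estimateInputCost(inputTokens: number): number {
  const rate = inputTokens > THRESHOLD ? LONG_RATE : BASE_RATE;
  return (inputTokens * rate) / 1_000_000;
}
```

Note the cliff: at these assumed rates, a 201K-token request costs roughly twice a 200K-token one, which is exactly why you shouldn't enable 1M pricing for requests that fit in 200K.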
| When 1M context wins | When RAG wins |
|---|---|
| Cross-document synthesis | Fresh data that updates hourly |
| Full-codebase refactoring | Unbounded corpus (> 1M tokens) |
| Holistic code review | Per-user personal data (privacy isolation) |
| Single-shot analysis | Many cheap lookups on small queries |
| Exploration where you don't know what's relevant | Known query patterns |
Hybrid: RAG retrieves the top 500K tokens; stuff those into 1M context. Best of both.
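A sketch of the packing step in that hybrid, assuming the retriever returns chunks best-first with precomputed token counts; the `Chunk` shape and the 500K budget are illustrative, not a fixed API.

```typescript
// Hypothetical chunk shape produced by a retriever (an assumption).
interface Chunk {
  text: string;
  tokens: number;
  score: number;
}

// Greedily fill the token budget with the highest-ranked chunks that fit,
// then send the packed string as one long-context request.
function packContext(ranked: Chunk[], budget = 500_000): string {
  const picked: string[] = [];
  let used = 0;
  for (const chunk of ranked) {
    if (used + chunk.tokens > budget) continue; // skip what doesn't fit
    picked.push(chunk.text);
    used += chunk.tokens;
  }
  return picked.join("\n\n");
}
```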
Claude's long-context recall is strong but not uniform. Tips:
Wrap documents in structured tags such as `<document index="1" title="...">...</document>` — the model indexes on these:

```typescript
const prompt = `
You will analyze the codebase below, then answer questions.
<codebase>
<file path="src/auth.ts">...</file>
<file path="src/db.ts">...</file>
...
</codebase>
Given the codebase above, answer: <question>How does auth flow work?</question>
`;
```
1M context is expensive per call. If you're asking multiple questions against the same corpus, cache it:
```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  system: [
    { type: "text", text: "You are a code reviewer." },
    {
      type: "text",
      text: giantCodebase,
      cache_control: { type: "ephemeral", ttl: "1h" },
    },
  ],
  messages: [{ role: "user", content: "What's the auth flow?" }],
});
```
First call: full cost. Subsequent calls in the TTL window: ~10% of input cost for the cached portion. See the prompt-caching-ttl skill.
1M-token inputs take longer to process — TTFT (time to first token) can be 10-30s for a full context. Mitigate by streaming the response so users see output immediately, and by caching the corpus so repeat calls skip reprocessing.
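For the streaming side, this is a minimal sketch that accumulates text deltas as they arrive. The pared-down `StreamingClient` interface is an assumption standing in for the Anthropic SDK client, and it assumes the raw event stream's `content_block_delta` / `text_delta` shape.

```typescript
// Minimal client shape standing in for the SDK client (an assumption,
// not the SDK's real type definitions).
type StreamEvent = { type: string; delta?: { type: string; text?: string } };
interface StreamingClient {
  messages: { stream(req: object): AsyncIterable<StreamEvent> };
}

// Accumulate text deltas as they arrive; a UI would render each delta
// immediately instead of waiting out the full TTFT + generation time.
async function streamSummary(
  client: StreamingClient,
  prompt: string,
): Promise<string> {
  let out = "";
  for await (const event of client.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  })) {
    if (
      event.type === "content_block_delta" &&
      event.delta?.type === "text_delta"
    ) {
      out += event.delta.text ?? "";
    }
  }
  return out;
}
```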
Use the token counter before sending:
```typescript
const { input_tokens } = await client.messages.countTokens({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: text }],
});

if (input_tokens > 1_000_000) throw new Error("Over context limit");
```
Budget with a 5% safety margin — actual tokenization varies slightly.
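A small helper applying that margin. The 4096-token output reservation is an illustrative default, on the assumption that the context window must cover input and output together.

```typescript
const CONTEXT_LIMIT = 1_000_000;
const SAFETY_MARGIN = 0.05; // counted tokens can drift from billed tokens

// Budget = limit minus the safety margin minus room reserved for the
// model's response (4096 is an illustrative default).
function fitsBudget(countedTokens: number, reservedOutput = 4096): boolean {
  const budget =
    Math.floor(CONTEXT_LIMIT * (1 - SAFETY_MARGIN)) - reservedOutput;
  return countedTokens <= budget;
}
```

Run the counted total through this before sending, and trim or re-chunk the corpus when it returns false rather than letting the request error.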