From lich-skills
Compresses subagent prompts to ≤200-word briefs before Task/Agent tool calls, preventing high token costs from re-tokenizing full contexts in multi-agent workflows.
```shell
npx claudepluginhub lichamnesia/lich-skills --plugin lich-skills
```

This skill uses the workspace's default tool permissions.
Pre-flight checklist for spawning subagents in Claude Code. The single most expensive mistake in multi-agent workflows is treating subagents as free threads. They are not. Each subagent is an independent LLM call that re-tokenizes the entire prompt you hand it.
There is no prefix sharing across subagents in Anthropic's serving stack today. The multi-agent topology is a known optimization opportunity (see arXiv 2604.25899, "Pythia: Predictability-Driven Agent-Native LLM Serving", 2026-04) but has not yet shipped in production. Until it does, the burden of compression falls on the orchestrating agent.
The math: 20 subagents × 50K shared context = ~1M tokens per fan-out. One careless Task call exhausts a Max-plan daily quota.
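To make that arithmetic concrete, a minimal sketch (the `fanout_cost` helper and the 500-token per-task overhead are illustrative assumptions, not a real API):

```python
def fanout_cost(n_subagents: int, shared_context_tokens: int,
                per_task_tokens: int = 500) -> int:
    """Estimate total input tokens for a parallel fan-out.

    Each subagent re-tokenizes the full shared context plus its own
    task framing -- there is no prefix sharing across subagent calls.
    """
    return n_subagents * (shared_context_tokens + per_task_tokens)

# The scenario from the text: 20 subagents, 50K tokens of shared context.
naive = fanout_cost(20, 50_000)       # full context pasted into every brief
compressed = fanout_cost(20, 200)     # ~200-word brief instead

print(naive)        # 1_010_000 -> roughly 1M tokens per fan-out
print(compressed)   # 14_000   -> ~70x cheaper
```

The compression saving scales linearly with fan-out width, which is why the brief matters most exactly when parallelism looks most attractive.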
For every subagent you are about to spawn, apply these five rules in order.
**1. Don't hand over your own context.** The subagent gets its own system prompt, its own tool list, and its own task framing automatically. Do not paste your conversation history, your TODO list, or your reasoning trail into the subagent prompt unless the subagent literally needs them to do its job.
**2. Pass paths, not file contents.**
BAD: Here is the full content of foo.py: <800 lines>
GOOD: Read /abs/path/to/foo.py and report the public API.
The subagent has Read access. One Read call is cheaper than re-tokenizing 800 lines on every spawn.
**3. Pass summaries, not dumps.**
BAD: <entire 50K context window dump> — now look for X
GOOD: Project context: <3-line summary>. Look for X in <specific path>.
If you genuinely synthesized something useful from your context, summarize it in 1-3 sentences and pass that summary. Never paste the raw window.
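A sketch of what "pass the summary, not the window" looks like in practice; `make_brief` is a hypothetical helper, and the 60-word cap is an illustrative stand-in for "1-3 sentences":

```python
def make_brief(summary: str, target_path: str, question: str) -> str:
    """Build a compressed subagent prompt: a short synthesized summary
    plus a pointer the subagent can follow itself -- never raw context."""
    if len(summary.split()) > 60:
        raise ValueError("summary should be 1-3 sentences, not a dump")
    return (
        f"Project context: {summary}\n"
        f"Look for {question} in {target_path}."
    )

prompt = make_brief(
    summary="A small Flask service with one blueprint and no tests.",
    target_path="/abs/path/to/foo.py",
    question="unused route handlers",
)
print(prompt)
```

The hard cap on the summary is the point: it makes dumping the raw window a loud failure instead of a silent token leak.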
**4. Keep the brief tweet-length.** The brief should fit in a tweet-length window. If it does not, you are either dumping context the subagent can fetch itself, or handing over a task too large for one subagent — split it.
**5. Check the fan-out multiplier.** Before spawning N parallel subagents, ask: do they all share the same long context? If yes, you are about to multiply cost by N. Either compress that shared context into one short summary that every brief reuses, or restructure the work so each subagent needs only its own slice.
Every brief should follow this template:

```
ROLE: <one line — what this subagent is>
GOAL: <one line — what it should produce>
INPUTS:
- <path or 1-line summary, never raw content>
- <path or 1-line summary>
CONSTRAINTS:
- <output format, what NOT to do>
RETURN: <one line — what to report back>
```
Total ≤ 200 words. If it does not fit, the task is too large for one subagent. Split it.
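A brief can be linted mechanically before the Task call. A minimal sketch, assuming the template above; `validate_brief` is a hypothetical helper, not part of any Claude Code API:

```python
REQUIRED = ["ROLE:", "GOAL:", "INPUTS:", "CONSTRAINTS:", "RETURN:"]

def validate_brief(brief: str, max_words: int = 200) -> list[str]:
    """Return a list of problems with a subagent brief; empty means OK."""
    problems = [f"missing {field}" for field in REQUIRED if field not in brief]
    n_words = len(brief.split())
    if n_words > max_words:
        problems.append(f"{n_words} words > {max_words}: split the task")
    return problems

brief = """ROLE: API auditor
GOAL: report the public API of foo.py
INPUTS:
- /abs/path/to/foo.py
CONSTRAINTS:
- bullet list only; do not paste file contents back
RETURN: one bullet per public function"""

print(validate_brief(brief))  # [] -> brief passes
```

The word count doubles as the "split it" trigger: a brief that cannot pass the check is evidence the task, not the validator, is wrong.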
| Excuse | Reality |
|---|---|
| "Faster to just paste the whole file." | Faster for you. The subagent re-tokenizes 800 lines on every spawn. |
| "Parallelism will be 5× faster." | 5× faster wall-clock, 5× more tokens — and Max plan caps are daily, not hourly. The hourly speedup costs you the rest of the day. |
| "The subagent needs context." | It needs information. A summary is information. The raw window is dump. |
| "I'll just pass the whole repo, it's small." | "Small" repos compound across 10 subagents into a Max-plan kill. |
| "I don't have time to summarize." | Summarizing forces you to clarify what the subagent should do — which is 80% of why subagents fail anyway. |
| "The user wanted parallel." | The user wanted fast. Token-budget parallel ≠ wall-clock parallel. Default to sequential unless tasks are genuinely independent. |
Spawn the subagent. Then evaluate the result against the RETURN line of your brief: did it come back in the format and scope you asked for? Most "subagent failures" are brief failures, not subagent failures.
If you catch yourself writing any of these in a subagent prompt, stop and re-compress:

- "Here is the full content of <file>: ..."
- "<entire context window dump> — now look for X"
- Your own conversation history, TODO list, or reasoning trail
Multi-agent serving is a 2026 research topic, not a shipped feature. arXiv 2604.25899 (Pythia, 2026-04) demonstrates that exposing the multi-agent topology to the serving layer can recover the 3-5× of tokens wasted on redundant prefixes — but no production LLM provider has shipped this yet. Anthropic, OpenAI, Codex: every subagent today is a cold start that re-tokenizes its full prefix.
Until that changes, the brief is the only knob you have. Use it.