Help us improve
Share bugs, ideas, or general feedback.
From claude-agent-dev
Design, write, test agents and subagents. Trigger on 'build agent', 'create subagent', 'agent prompt', 'multi-agent', 'managed agent', 'agent not working', 'agent keeps triggering wrong thing'. Also: agent role selection, permission scoping, tool surface design, choosing between subagent/team/workflow/managed agent.
npx claudepluginhub j0hanz/claude-agent-dev-pluginHow this skill is triggered — by the user, by Claude, or both
Slash command
/claude-agent-dev:create-agent [what the agent should do][what the agent should do]This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
An agent is a worker with its **own context window, system prompt, and tool surface**. It trades coordination overhead and cost for isolation: work that would flood the main thread, or that benefits from a narrow role and a clean slate, gets delegated. Engineer every agent by working through seven decisions, then test it before shipping.
Provides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Guides systematic root-cause debugging when tests fail, builds break, or unexpected errors occur. Provides a structured triage checklist to preserve evidence, localize, and fix issues instead of guessing.
Share bugs, ideas, or general feedback.
An agent is a worker with its own context window, system prompt, and tool surface. It trades coordination overhead and cost for isolation: work that would flood the main thread, or that benefits from a narrow role and a clean slate, gets delegated. Engineer every agent by working through seven decisions, then test it before shipping.
Core mindset: an agent is the expensive primitive. A fresh agent re-derives context cold, burns its own token budget, and adds a coordination boundary you now have to design across. Reach for one only when isolation, parallelism, or a reusable narrow role earns its keep. If the model just needs instructions it can follow inline, that is a skill. If a fixed action must happen at a lifecycle point, that is a hook. If you need it once, just do it.
| Need | Use |
|---|---|
| A side task would flood the main context (large search, log triage, broad refactor) | Subagent |
| A reusable role invoked across many sessions (reviewer, debugger, researcher) | Subagent (saved definition) |
| Several independent tasks should run in parallel | Agent team or agent view |
| A large, repeatable orchestration — dozens to hundreds of units, cross-checked | Workflow |
| An agent must be callable from outside a session (your app, a cron, a webhook) | Managed Agent (API) |
| The model needs instructions/knowledge it applies inline | Skill — not an agent |
| A fixed action must happen at a lifecycle point (format, block, log) | Hook → use the create-hook skill |
| A one-off task right now | Just do it — no agent |
If an agent is still the answer, walk the seven decisions below.
Write it as This agent <verb + object> so that <main-thread benefit>. Examples:
A good job statement is narrow. "An agent that helps with code" is not a job; "an agent that audits TypeScript for unsafe any usage" is. Narrow jobs produce sharp system prompts, tight tool lists, and testable behavior.
Split check — run before proceeding: If the job statement contains AND connecting two unrelated verbs (e.g., "review AND fix", "analyze AND deploy", "scan AND commit"), stop and split. Write two job statements. Two jobs = two agents. Do not combine them — propose the split to the user and produce separate definitions unless they explicitly confirm a combined design is intentional. A reviewer with write access is a misconfigured agent, not a combined agent.
The when and who-invokes determine the primitive. Read the deep comparison — cost, lifetime, observability, and migration paths — in references/primitives.md before committing.
| Primitive | Invoked by | Context | Best for |
|---|---|---|---|
| Claude Code subagent | Claude, mid-session, via the Agent tool | Fresh, isolated | Delegated side tasks; reusable narrow roles |
| Agent team | A parent agent (TeamCreate, /background) | Per-teammate, parallel | A few independent tasks running concurrently |
| Agent view | You, from a dashboard (claude agents) | Per-session, persistent | Many independent background sessions you steer |
| Dynamic workflow | A JS script the runtime executes | Per-agent, script-held | Large migrations/audits; cross-checked research at scale |
| Managed Agent | External code via the Agent API | One run per invocation | API-callable services outside any session |
Built-in subagents (don't rebuild these): Explore (Haiku, read-only search), Plan (read-only research), general-purpose (all tools). Prefer a typed/named subagent over general-purpose — general-purpose re-derives context cold and costs more.
For subagents, the full frontmatter field catalog and invocation modes are in references/subagent-spec.md. For the API shape (agents.create/update/invoke, the wholesale-replace trap, permission policy), see references/managed-agents.md and the /claude-api reference.
The system prompt is the agent — for a subagent it replaces the default Claude Code prompt entirely. This is the craft. Read references/system-prompt-craft.md for the full treatment; the skeleton:
Principles:
description. The description field must not contain < or > — use plain words instead of <example> tags or <field-name> placeholders. validate_agent.py warns on DESC003; avoid triggering it.Default to least privilege. Start read-only; add write/execute only when the job provably needs it.
| Job shape | Tool surface |
|---|---|
| Search / read / analyze | Read, Grep, Glob (this is the Explore profile) |
| Review / report | Read, Grep, Glob + the analysis it needs — no Edit/Write |
| Implement / fix | Read, Edit, Write, Bash, Glob, Grep |
| Orchestrate sub-work | add Agent(worker, researcher) — allowlist which agents it may spawn |
Rules:
tools is an allowlist; omit it to inherit all tools. disallowedTools is a denylist applied first. Prefer naming an allowlist over inheriting everything.Bash and write tools are escalations. They can modify the host and exfiltrate data. Justify each one against the job. A reviewer does not need Bash.Agent(name1, name2) to allowlist them; omit Agent entirely to forbid spawning.default prompts; plan is read-only; acceptEdits/bypassPermissions remove guardrails. Background/headless agents auto-deny prompts — a blocking prompt there silently stalls the agent. See the permission-mode table in references/subagent-spec.md.For Managed Agents, default agent_toolset to always_ask; reserve always_allow for a single, pinned, fully-trusted MCP server.
Mandatory safety challenges — trigger these before producing the definition:
| Request contains | Required action before proceeding |
|---|---|
bypassPermissions | (1) State what it unlocks: writes to .git/, .claude/, .vscode/ with no prompts. (2) Ask: "What specific capability requires bypass that acceptEdits doesn't cover?" (3) Present acceptEdits as the default recommendation. Only produce bypassPermissions after the user provides a concrete, scoped justification. |
Auto-commit (git commit, Bash staging, push) | (1) Warn: "Autonomous commit means no human review before git history changes." (2) Ask whether --no-commit (report findings instead) would suffice. (3) If proceeding, the system prompt must include an explicit boundary: "Do not force-commit. Stop and report if a pre-commit hook fails." |
Two unrelated jobs (AND in the job from Step 1) | Already caught in Step 1. If you reach here with a dual-job definition, go back and split it. |
Match the model to the hardest thing the agent must do — not the ceiling, not the floor.
| Model | model: value | Use when |
|---|---|---|
| Haiku | haiku / claude-haiku-4-5-20251001 | Classification, search/discovery, narrow tool dispatch (the Explore profile) |
| Sonnet | sonnet / claude-sonnet-4-6 | Default. Multi-step reasoning, implementation, tool orchestration |
| Opus | opus / claude-opus-4-8 | Root-cause analysis, security boundaries, adversarial or deep reasoning |
| Inherit | inherit | Match the parent session's model (default when omitted) |
effort (low…max) overrides the session effort for this agent — raise it for deep reasoning, lower it for mechanical work.CLAUDE_CODE_SUBAGENT_MODEL env → per-invocation override → frontmatter model → parent session model.An agent starts with a fresh context. Decide what goes into it and what it composes with.
isolation: worktree for an agent that writes files but must not disturb the parent checkout (auto-cleaned if it makes no changes). Set background: true for always-async work — but remember background agents auto-deny permission prompts.skills: [name, ...] injects those skills' full content at startup (not just the description). Use this to give an agent a capability without relying on it to discover the skill. Cannot preload skills marked disable-model-invocation: true.hooks, or shape it project-wide with SubagentStart/SubagentStop. For anything beyond a trivial guard, design it with the create-hook skill.mcpServers attaches tools inline (connect on start, disconnect on finish) or by reference — keeps MCP tool defs out of the main conversation's context.memory: user|project|local gives the agent a persistent MEMORY.md scope across runs. Use for agents that should accumulate learnings.What loads at startup (non-fork subagent): its own system prompt yes; full Claude Code prompt no; CLAUDE.md + memory yes (except Explore/Plan); parent conversation history no; skills: content yes.
Never ship an agent you haven't invoked on a real input. The failure modes are behavioral, not syntactic.
Validate the definition (catches the silent misconfigurations — bad name, dangerous tools, broad permissions, invalid model):
python scripts/validate_agent.py path/to/your-agent.md
Then exercise behavior. Run the agent on 2–3 representative inputs, including one edge case and one it should refuse or scope out of. Check:
The full eval-first method — deterministic checks vs. LLM-as-judge, baseline-vs-agent comparison, flakiness — is in references/testing-agents.md. For rigorous skill-style benchmarking, hand off to the skill-builder skill.
Before producing the definition, state the job in the This agent <verb + object> so that <benefit> form — one sentence, visible to the user. This makes the single-job check auditable and catches misaligned scope before tool choices are locked in.
When you produce an agent for the user, always give:
.claude/agents/<name>.md for project scope, ~/.claude/agents/ for all projects).@agent-<name> mention, or the --agent/API call.Mandatory pre-delivery: Run validate_agent.py on the definition. Do not present it until you have run it and confirmed zero ERRORs. Report the exact [OK] / [FAIL] result inline. If warnings exist, note each one and whether it was accepted and why.
/claude-api.claude-code-subagent.md and managed-agent.json.| Skill | For |
|---|---|
create-hook | Lifecycle hooks — the deterministic guarantees you attach to agents |
agents-maintainer | AGENTS.md/CLAUDE.md instruction files (not agent configs) |
skill-builder | Building a skill, and rigorous benchmarked evals |