Build applications where agents are first-class citizens. Use this skill when designing autonomous agents, creating MCP tools, implementing self-modifying systems, or building apps where features are outcomes achieved by agents operating in a loop.
```
/plugin marketplace add EveryInc/every-marketplace
/plugin install compound-engineering@every-marketplace
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
- references/action-parity-discipline.md
- references/agent-native-testing.md
- references/architecture-patterns.md
- references/dynamic-context-injection.md
- references/mcp-tool-design.md
- references/mobile-patterns.md
- references/refactoring-to-prompt-native.md
- references/self-modification.md
- references/shared-workspace-architecture.md
- references/system-prompt-design.md

<why_now>
Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: a really good coding agent is actually a really good general-purpose agent. The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding. </why_now>
<core_principles>
Whatever the user can do through the UI, the agent should be able to achieve through tools.
This is the foundational principle. Without it, nothing else matters.
Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."
If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.
The fix: Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.
This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can achieve the same outcomes. Sometimes that's a single tool (create_note). Sometimes it's composing primitives (write_file to a notes directory with proper formatting).
The discipline: When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.
A capability map helps:
| User Action | How Agent Achieves It |
|---|---|
| Create a note | write_file to notes directory, or create_note tool |
| Tag a note as urgent | update_file metadata, or tag_note tool |
| Search notes | search_files or search_notes tool |
| Delete a note | delete_file or delete_note tool |
The test: Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
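A sketch of what that capability map can look like as actual tool definitions, using the same tool() helper shown later in the quick start; notesStore and the exact schemas are illustrative assumptions for the notes-app example, not a prescribed API:

```typescript
import { z } from "zod";

// Each UI outcome has at least one tool-level path to the same result.
const notesTools = [
  tool("create_note", "Create a note with optional tags",
    { title: z.string(), body: z.string(), tags: z.array(z.string()).optional() },
    async (a) => notesStore.create(a)),          // mirrors the "new note" button
  tool("tag_note", "Add or remove a tag on a note",
    { id: z.string(), tag: z.string(), remove: z.boolean().optional() },
    async (a) => notesStore.tag(a)),             // mirrors tagging in the UI
  tool("search_notes", "Full-text search across notes",
    { query: z.string() },
    async (a) => notesStore.search(a.query)),    // mirrors the search box
  tool("delete_note", "Delete a note",
    { id: z.string() },
    async (a) => notesStore.delete(a.id)),       // mirrors delete
];
```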
Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.
A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.
A feature is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.
Less granular (limits the agent):
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
More granular (empowers the agent):
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
determine appropriate locations based on content and recency,
and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
The key shift: The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.
The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.
The test: To change how a feature behaves, do you edit prose or refactor code?
With atomic tools and parity, you can create new features just by writing new prompts.
This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.
Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
The agent uses list_files, read_file, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.
This works for developers and users. You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.
The constraint: This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.
The test: Can you add a new feature by writing a new prompt section, without adding new code?
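One way this can look mechanically, sketched under the assumption that feature prompts live as markdown files in a prompts/ directory and that the host exposes an agent.run-style entry point like the quick start's:

```typescript
import { readFile } from "node:fs/promises";

// A "feature" is a named prompt file; shipping one is a file drop, not a code change.
// Hypothetical: prompts/weekly-review.md contains the prose from the example above.
async function runFeature(name: string, userInput: string) {
  const featurePrompt = await readFile(`./prompts/${name}.md`, "utf8");
  return agent.run({
    prompt: `${featurePrompt}\n\nUser request: ${userInput}`,
    tools,        // the same atomic tools used everywhere else
    systemPrompt, // unchanged base behavior
  });
}

// runFeature("weekly-review", "What should I focus on next week?");
```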
The agent can accomplish things you didn't explicitly design for.
When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.
"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."
You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.
This reveals latent demand. Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.
The flywheel:
1. Ship a capable foundation: atomic tools, parity, composable prompts
2. Observe what users ask the agent to do
3. Notice the patterns that emerge
4. Optimize them with domain-specific tools or dedicated prompts
5. Repeat
This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.
The test: Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.
Agent-native applications get better through accumulated context and prompt refinement.
Unlike traditional software, agent-native applications can improve without shipping code:
Accumulated context: The agent can maintain state across sessions—what exists, what the user has done, what worked, what didn't. A context.md file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.
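A minimal sketch of that layer-one approach, assuming a Node host app and a local workspace directory; the file name context.md comes from above, everything else (paths, helper names) is illustrative:

```typescript
import { readFile, appendFile } from "node:fs/promises";
import { join } from "node:path";

const WORKSPACE = "./workspace";                 // hypothetical app layout
const CONTEXT_PATH = join(WORKSPACE, "context.md");

// Read accumulated context at session start and fold it into the system prompt.
async function buildSystemPrompt(basePrompt: string): Promise<string> {
  let context = "(no accumulated context yet)";
  try {
    context = await readFile(CONTEXT_PATH, "utf8");
  } catch {
    // first run: file doesn't exist yet
  }
  return `${basePrompt}\n\n## Accumulated Context\n${context}`;
}

// The agent (or the host app) appends observations as it works,
// so the next session starts with what was learned in this one.
async function recordObservation(note: string): Promise<void> {
  await appendFile(CONTEXT_PATH, `- ${new Date().toISOString()}: ${note}\n`);
}
```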
Prompt refinement at multiple levels:
- Developer level: you refine the system prompt and feature prompts as you learn what works, shipping behavior changes without shipping code.
- User level: users adjust existing prompts or add their own to customize how the application behaves for them.
Self-modification (advanced): Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.
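A minimal sketch of two of those safety rails (an approval gate and a checkpoint before the edit), assuming prompts live as files in a git-tracked workspace; the function name and paths are illustrative:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { writeFile } from "node:fs/promises";

const run = promisify(execFile);

// Snapshot the prompt in git before letting the agent rewrite it,
// so a regression can be rolled back with a plain git revert.
async function applyPromptEdit(promptPath: string, newPrompt: string, approved: boolean) {
  if (!approved) throw new Error("Prompt edit requires explicit approval"); // approval gate

  await run("git", ["add", promptPath]);
  await run("git", ["commit", "--allow-empty", "-m", `checkpoint before prompt edit: ${promptPath}`]);

  await writeFile(promptPath, newPrompt, "utf8");

  await run("git", ["add", promptPath]);
  await run("git", ["commit", "-m", `agent prompt edit: ${promptPath}`]);
}
```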
The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.
The test: Does the application work better after a month of use than on day one, even without code changes? </core_principles>
<intake>
## What aspect of agent-native architecture do you need help with?

1. Architecture and overall design
2. Files and shared workspaces
3. MCP tool design (primitives, CRUD)
4. When to add domain-specific tools
5. Agent execution, loops, and completion
6. System prompt design
7. Dynamic context injection
8. Action parity with the UI
9. Self-modification
10. Product implications (progressive disclosure, approvals, latent demand)
11. Mobile patterns (iOS/Android, background, checkpoints)
12. Testing agent-native apps
13. Reviewing or refactoring an existing app

Wait for response before proceeding.
</intake>
<routing>

| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read [architecture-patterns.md](./references/architecture-patterns.md), then apply Architecture Checklist below |
| 2, "files", "workspace", "filesystem" | Read [files-universal-interface.md](./references/files-universal-interface.md) and [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) |
| 3, "tool", "mcp", "primitive", "crud" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) |
| 4, "domain tool", "when to add" | Read [from-primitives-to-domain-tools.md](./references/from-primitives-to-domain-tools.md) |
| 5, "execution", "completion", "loop" | Read [agent-execution-patterns.md](./references/agent-execution-patterns.md) |
| 6, "prompt", "system prompt", "behavior" | Read [system-prompt-design.md](./references/system-prompt-design.md) |
| 7, "context", "inject", "runtime", "dynamic" | Read [dynamic-context-injection.md](./references/dynamic-context-injection.md) |
| 8, "parity", "ui action", "capability map" | Read [action-parity-discipline.md](./references/action-parity-discipline.md) |
| 9, "self-modify", "evolve", "git" | Read [self-modification.md](./references/self-modification.md) |
| 10, "product", "progressive", "approval", "latent demand" | Read [product-implications.md](./references/product-implications.md) |
| 11, "mobile", "ios", "android", "background", "checkpoint" | Read [mobile-patterns.md](./references/mobile-patterns.md) |
| 12, "test", "testing", "verify", "validate" | Read [agent-native-testing.md](./references/agent-native-testing.md) |
| 13, "review", "refactor", "existing" | Read [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) |

After reading the reference, apply those patterns to the user's specific context.
</routing>
<architecture_checklist>
When designing an agent-native system, verify these before implementation:
- [ ] Tool inputs use z.string() when the API validates, not z.enum()
- [ ] Completion is signaled through an explicit complete_task tool (not heuristic detection)
- [ ] The agent can refresh its context at runtime (e.g., a refresh_context tool)

When designing architecture, explicitly address each checkbox in your plan.
</architecture_checklist>
<quick_start>
Step 1: Define atomic tools
```typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
  tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
```
Step 2: Write behavior in the system prompt
```markdown
## Your Responsibilities

When asked to organize content, you should:

1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create/move files using your tools
4. Use your judgment about layout and formatting
5. Call complete_task when you're done

You decide the structure. Make it good.
```
Step 3: Let the agent work in a loop
```typescript
const result = await agent.run({
  prompt: userMessage,
  tools: tools,
  systemPrompt: systemPrompt,
  // Agent loops until it calls complete_task
});
```
</quick_start>
<reference_index>
All references in references/:
Core Patterns:
Agent-Native Disciplines:
Platform-Specific:
</reference_index>
<anti_patterns>
These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.
Agent as router — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.
Build the app, then add agent — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.
Request/response thinking — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.
Defensive tool design — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.
Happy path in code, agent just executes — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.
THE CARDINAL SIN: Agent executes your code instead of figuring things out
```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Your code decides
  const priority = calculatePriority(message); // Your code decides
  await store(message, category, priority);    // Your code orchestrates
  if (priority > 3) await notify();            // Your code decides
});

// RIGHT - Agent figures out how to process feedback
tools: store_item, send_message  // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```
Workflow-shaped tools — analyze_and_organize bundles judgment into the tool. Break it into primitives and let the agent compose them.
Context starvation — Agent doesn't know what resources exist in the app.
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
Fix: Inject available resources, capabilities, and vocabulary into system prompt.
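A hedged sketch of that injection step, assuming a Node host and a workspace layout with feeds/ and notes/ directories (both illustrative); the point is that the prompt names real, enumerated resources rather than hoping the agent guesses:

```typescript
import { readdir } from "node:fs/promises";

// Enumerate what actually exists in the app and describe it in the
// system prompt, so "my feed" refers to something the agent can see.
async function injectAppContext(basePrompt: string): Promise<string> {
  const feeds = await readdir("./workspace/feeds");  // assumed layout
  const notes = await readdir("./workspace/notes");
  return [
    basePrompt,
    "## Available Resources",
    `Feeds (write posts here as markdown files): ${feeds.join(", ") || "none yet"}`,
    `Notes: ${notes.length} files in ./workspace/notes`,
    "Vocabulary: a 'feed' is a markdown file under ./workspace/feeds",
  ].join("\n");
}
```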
Orphan UI actions — User can do something through the UI that the agent can't achieve. Fix: maintain parity.
Silent actions — Agent changes state but UI doesn't update. Fix: Use shared data stores with reactive binding, or file system observation.
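For the file-system-observation option, a minimal sketch using Node's fs.watch; the workspace path and the ui.refresh callback are assumptions standing in for whatever rendering layer you use:

```typescript
import { watch } from "node:fs";

// The UI watches the shared workspace, so any file the agent writes
// shows up without an explicit "refresh" step.
function observeWorkspace(dir: string, onChange: (file: string) => void) {
  return watch(dir, { recursive: true }, (_event, filename) => {
    if (filename) onChange(filename); // re-render the affected view
  });
}

// observeWorkspace("./workspace", (f) => ui.refresh(f));
```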
Heuristic completion detection — Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a complete_task tool.
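A sketch of explicit completion, assuming the tool() helper and loop shape from the quick start; agent.step() is a hypothetical single-iteration call, not a real SDK method:

```typescript
let finished = false;
let finalSummary = "";

const completeTask = tool(
  "complete_task",
  "Signal that the requested outcome has been achieved",
  { summary: z.string() },
  async ({ summary }: { summary: string }) => {
    finished = true;
    finalSummary = summary;
    return "acknowledged";
  }
);

// The host keeps stepping the agent until the tool fires --
// never "no tool calls for N iterations" or "the output file exists".
while (!finished) {
  await agent.step();
}
console.log(finalSummary);
```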
Static tool mapping for dynamic APIs — Building 50 tools for 50 API endpoints when a discover + access pattern would give more flexibility.
```typescript
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required

// RIGHT - Dynamic capability discovery
tool("list_available_types", ...)                        // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...)  // Access any type
```
Incomplete CRUD — Agent can create but not update or delete.
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...) // Missing: update, delete
Fix: Every entity needs full CRUD.
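What full CRUD looks like at the tool level, sketched with the same tool() helper; db is a placeholder for whatever store backs your entities:

```typescript
import { z } from "zod";

// A complete CRUD surface for one entity (journal entries).
const journalTools = [
  tool("create_journal_entry", "Create an entry", { title: z.string(), body: z.string() }, async (a) => db.create(a)),
  tool("read_journal_entry", "Read an entry", { id: z.string() }, async (a) => db.read(a.id)),
  tool("update_journal_entry", "Update an entry", { id: z.string(), body: z.string() }, async (a) => db.update(a)),
  tool("delete_journal_entry", "Delete an entry", { id: z.string() }, async (a) => db.delete(a.id)),
];
```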
Sandbox isolation — Agent works in separate data space from user.
```
Documents/
├── user_files/    ← User's space
└── agent_output/  ← Agent's space (isolated)
```
Fix: Use shared workspace where both operate on same files.
Gates without reason — A domain tool becomes the only way to accomplish something, even though you never intended to restrict access. The default is open: keep primitives available unless there's a specific reason to gate.
Artificial capability limits — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do. </anti_patterns>
<success_criteria>
You've built an agent-native application when it passes this test:
Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.
Can it figure out how to accomplish it, operating in a loop until it succeeds?
If yes, you've built something agent-native.
If it says "I don't have a feature for that"—your architecture is still too constrained. </success_criteria>
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.