Help us improve
Share bugs, ideas, or general feedback.
From beagle-analysis
Audits agent codebases against the 12-Factor Agents methodology, analyzing per-factor compliance with file-level evidence. Use when reviewing LLM-powered system architecture or planning agent improvements.
npx claudepluginhub existential-birds/beagle --plugin beagle-analysisHow this skill is triggered — by the user, by Claude, or both
Slash command
/beagle-analysis:agent-architecture-analysisThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> Reference: [12-Factor Agents](https://github.com/humanlayer/12-factor-agents)
Provides Agent design consulting and review based on the 12-Factor AgentOps framework. Helps design new agents and review existing agents, skills, and workflows for issues.
Audits Claude Code agents for violations, gaps, and improvements across 7 dimensions like description quality and frontmatter, outputting structured repair plans.
Creates Claude Code agents from scratch or by adapting templates. Guides requirements gathering, template selection, and file generation following Anthropic best practices (v2.1.63+).
Share bugs, ideas, or general feedback.
Reference: 12-Factor Agents
| Parameter | Description | Required |
|---|---|---|
docs_path | Path to documentation directory (for existing analyses) | Optional |
codebase_path | Root path of the codebase to analyze | Required |
Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.
Search Patterns:
# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"
# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"
# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"
File Patterns: **/agents/*.py, **/schemas/*.py, **/models/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |
Anti-patterns:
json.loads(llm_response) without schema validationoutput.split() or regex parsing of LLM responsesdict[str, Any] return types from agentsPrinciple: Treat prompts as first-class code you control, version, and iterate on.
Search Patterns:
# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"
# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"
# Look for prompt directories
find . -type d -name "prompts"
File Patterns: **/prompts/**, **/templates/**, **/agents/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |
Anti-patterns:
f"You are a {role}..." inline in agent methodsPrinciple: Control how history, state, and tool results are formatted for the LLM.
Search Patterns:
# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"
# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"
# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"
File Patterns: **/context/*.py, **/state/*.py, **/core/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |
Anti-patterns:
Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.
Search Patterns:
# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"
# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"
# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"
File Patterns: **/tools/*.py, **/handlers/*.py, **/agents/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |
Anti-patterns:
eval() or exec() on LLM-generated codePrinciple: Merge execution state (step, retries) with business state (messages, results).
Search Patterns:
# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"
# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep -r "sqlite\|database\|repository" --include="*.py"
# Look for state reconstruction
grep -r "load_state\|restore\|reconstruct" --include="*.py"
File Patterns: **/state/*.py, **/models/*.py, **/database/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Single serializable state object with all execution metadata |
| Partial | State exists but split across systems (memory + DB) |
| Weak | Execution state scattered, requires multiple queries to reconstruct |
Anti-patterns:
Principle: Agents support simple APIs for launching, pausing at any point, and resuming.
Search Patterns:
# Look for REST endpoints
grep -r "@router.post\|@app.post" --include="*.py"
grep -r "start_workflow\|pause\|resume" --include="*.py"
# Look for interrupt mechanisms
grep -r "interrupt_before\|interrupt_after" --include="*.py"
# Look for webhook handlers
grep -r "webhook\|callback" --include="*.py"
File Patterns: **/routes/*.py, **/api/*.py, **/orchestrator/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | REST API + webhook resume, pause at any point including mid-tool |
| Partial | Launch/pause/resume exists but only at coarse-grained points |
| Weak | CLI-only launch, no pause/resume capability |
Anti-patterns:
input() or confirm() callsPrinciple: Human contact is a tool call with question, options, and urgency.
Search Patterns:
# Look for human input mechanisms
grep -r "typer.confirm\|input(\|prompt(" --include="*.py"
grep -r "request_human_input\|human_contact" --include="*.py"
# Look for approval patterns
grep -r "approval\|approve\|reject" --include="*.py"
# Look for structured question formats
grep -r "question.*options\|HumanInputRequest" --include="*.py"
File Patterns: **/agents/*.py, **/tools/*.py, **/orchestrator/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | request_human_input tool with question/options/urgency/format |
| Partial | Approval gates exist but hardcoded in graph structure |
| Weak | Blocking CLI prompts, no tool-based human contact |
Anti-patterns:
typer.confirm() in agent codePrinciple: Custom control flow, not framework defaults. Full control over routing, retries, compaction.
Search Patterns:
# Look for routing logic
grep -r "add_conditional_edges\|route_\|should_continue" --include="*.py"
# Look for custom loops
grep -r "while True\|for.*in.*range" --include="*.py" | grep -v test
# Look for execution mode control
grep -r "execution_mode\|agentic\|structured" --include="*.py"
File Patterns: **/orchestrator/*.py, **/graph/*.py, **/core/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Custom routing functions, conditional edges, execution mode control |
| Partial | Framework control flow with some customization |
| Weak | Default framework loop with no custom routing |
Anti-patterns:
Principle: Errors in context enable self-healing. Track consecutive errors, escalate after threshold.
Search Patterns:
# Look for error handling
grep -r "except.*Exception\|error_history\|consecutive_errors" --include="*.py"
# Look for retry logic
grep -r "retry\|backoff\|max_attempts" --include="*.py"
# Look for escalation
grep -r "escalate\|human_escalation" --include="*.py"
File Patterns: **/agents/*.py, **/orchestrator/*.py, **/core/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Errors in context, retry with threshold, automatic escalation |
| Partial | Errors logged and returned, no automatic retry loop |
| Weak | Errors logged only, not fed back to LLM, task fails immediately |
Anti-patterns:
logger.error() without adding to contextPrinciple: Each agent has narrow responsibility, 3-10 steps max.
Search Patterns:
# Look for agent classes
grep -r "class.*Agent\|class.*Architect\|class.*Developer" --include="*.py"
# Look for step definitions
grep -r "steps\|tasks" --include="*.py" | head -20
# Count methods per agent
grep -r "async def\|def " agents/*.py 2>/dev/null | wc -l
File Patterns: **/agents/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | 3+ specialized agents, each with single responsibility, step limits |
| Partial | Multiple agents but some have broad scope |
| Weak | Single "god" agent that handles everything |
Anti-patterns:
Principle: Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.
Search Patterns:
# Look for entry points
grep -r "@cli.command\|@router.post\|@app.post" --include="*.py"
# Look for WebSocket support
grep -r "WebSocket\|websocket" --include="*.py"
# Look for external integrations
grep -r "slack\|discord\|webhook" --include="*.py" -i
File Patterns: **/routes/*.py, **/cli/*.py, **/main.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | CLI + REST + WebSocket + webhooks + chat integrations |
| Partial | CLI + REST API available |
| Weak | CLI only, no programmatic access |
Anti-patterns:
if __name__ == "__main__" entry pointPrinciple: Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.
Search Patterns:
# Look for state mutation patterns
grep -r "\.status = \|\.field = " --include="*.py"
# Look for immutable updates
grep -r "model_copy\|\.copy(\|with_" --include="*.py"
# Look for side effects in agents
grep -r "write_file\|subprocess\|requests\." agents/*.py 2>/dev/null
File Patterns: **/agents/*.py, **/nodes/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Immutable state updates, side effects isolated to tools/handlers |
| Partial | Mostly immutable, some in-place mutations |
| Weak | State mutated in place, side effects mixed with agent logic |
Anti-patterns:
state.field = new_value (mutation)Principle: Fetch likely-needed data upfront rather than mid-workflow.
Search Patterns:
# Look for context pre-fetching
grep -r "pre_fetch\|prefetch\|fetch_context" --include="*.py"
# Look for RAG/embedding systems
grep -r "embedding\|vector\|semantic_search" --include="*.py"
# Look for related file discovery
grep -r "related_tests\|similar_\|find_relevant" --include="*.py"
File Patterns: **/context/*.py, **/retrieval/*.py, **/rag/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Automatic pre-fetch of related tests, files, docs before planning |
| Partial | Manual context passing, design doc support |
| Weak | No pre-fetching, LLM must request all context via tools |
Anti-patterns:
Gate order: Do not assign Strong / Partial / Weak or treat recommendations as observed facts until Hard gates (after Analysis Workflow) are satisfied for the factors in scope.
| Factor | Status | Notes |
|--------|--------|-------|
| 1. Natural Language -> Tool Calls | **Strong/Partial/Weak** | [Key finding] |
| 2. Own Your Prompts | **Strong/Partial/Weak** | [Key finding] |
| ... | ... | ... |
| 13. Pre-fetch Context | **Strong/Partial/Weak** | [Key finding] |
**Overall**: X Strong, Y Partial, Z Weak
For each factor, provide:
Current Implementation
Compliance Level
Gaps
Recommendations
Initial Scan
Deep Dive (per factor)
Gap Analysis
Recommendations
Summary
Run these in order. Do not skip ahead: each Pass is an objective condition you can check (paths on disk, citations present), not internal certainty.
codebase_path, or an explicit no evidence located statement after targeted reads. If evidence is missing after search, default that factor to Weak unless the criterion is clearly N/A (say why).| Score | Meaning | Action |
|---|---|---|
| Strong | Fully implements principle | Maintain, minor optimizations |
| Partial | Some implementation, significant gaps | Planned improvements |
| Weak | Minimal or no implementation | High priority for roadmap |