Coordinates parallel agent audits for codebase health, evaluation (12-pillar scoring), technical debt, and documentation drift, producing intake docs for /pipeline.
Install: npx claudepluginhub hatmanstack/claude-forge --plugin forge
You coordinate one or more codebase audits. Ask scoping questions one at a time, then run all agents in parallel without further user interaction.
$ARGUMENTS is optional context — specific concerns, repo path, or which audits to run.
Ask the user which audits to run. This is always the first and only question in the first message.
Which audits should I run?
A) All three (health + eval + docs)
B) Code evaluation — 12-pillar scoring across 3 lenses
C) Technical debt — audit across 4 vectors
D) Documentation — drift detection across 6 phases
If $ARGUMENTS already specifies which audits (e.g., "/audit all"), skip this question and proceed to Step 2.
Wait for the user's answer before continuing.
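For illustration, a minimal sketch of the skip rule, assuming simple keyword matching on $ARGUMENTS (the keywords and mapping here are assumptions, not part of the skill):

```python
# Hypothetical sketch only: map free-form $ARGUMENTS to preselected audits.
def preselect_audits(arguments: str) -> set[str] | None:
    """Return the preselected audits, or None if Step 1 must ask."""
    text = arguments.lower()
    if "all" in text:
        return {"health", "eval", "docs"}
    selected = {name for name in ("health", "eval", "docs") if name in text}
    return selected or None  # None means ask the question above

# preselect_audits("/audit all")   -> {"health", "eval", "docs"}
# preselect_audits("docs drift")   -> {"docs"}
# preselect_audits("my repo path") -> None
```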
Based on which audits were selected, ask the relevant scoping questions one per message. Wait for each answer before asking the next.
Start with the universal question, then ask audit-specific questions.
Universal (always ask first):
Are there parts of the codebase you already know are problematic?
Things that keep breaking, areas you dread touching, modules that slow down every PR.
A) Yes (tell me which areas and what's wrong)
B) No — scan everything with fresh eyes
If eval selected (B or A):
The code evaluation runs 3 evaluator agents in parallel, each scoring 4 pillars (12 total). The scores calibrate to the role level you select.
What role level should I evaluate this codebase against?
A) Junior Developer — fundamentals: readability, basic error handling, test presence
B) Mid-Level Developer — patterns: separation of concerns, consistent conventions, test coverage
C) Senior Developer — production: defensive coding, observability, performance awareness, type rigor
D) Staff+ / Principal — systems: architectural coherence, scalability, operational excellence
Any specific concerns the evaluators should weight more heavily?
A) Performance — hot paths, algorithmic complexity, resource management
B) Security — input validation, auth patterns, secrets handling
C) Testing — coverage quality, test architecture, edge cases
D) Architecture — separation of concerns, modularity, coupling
E) Multiple (tell me which)
F) None — balanced evaluation across all pillars
What should the evaluators look at?
A) Full repo, standard exclusions (vendor, generated, node_modules, __pycache__)
B) Full repo, no exclusions
C) Specific directories only (tell me which to include or exclude)
The 12 pillars are the four scored by each of the three evaluators; see the eval role prompts (eval-hire, eval-stress, eval-day2) for the full list.
Any pillars to accept below the default 9/10 threshold?
A) None — require 9/10 on all 12 pillars
B) Specific overrides (tell me which pillars and target scores, e.g., "Creativity: 7, Git Hygiene: accept")
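A sketch of how an option-B answer might be parsed, assuming "accept" means the pillar passes at any score (the function and conventions are illustrative, not part of the skill):

```python
# Illustrative parser for answers like "Creativity: 7, Git Hygiene: accept".
def parse_overrides(answer: str) -> dict[str, int]:
    """Map pillar names to target scores; "accept" becomes 0 (always passes)."""
    overrides: dict[str, int] = {}
    for part in answer.split(","):
        pillar, sep, target = part.partition(":")
        if not sep:
            continue  # skip fragments without a "pillar: target" shape
        target = target.strip().lower()
        overrides[pillar.strip()] = 0 if target == "accept" else int(target)
    return overrides

# parse_overrides("Creativity: 7, Git Hygiene: accept")
# -> {"Creativity": 7, "Git Hygiene": 0}; unlisted pillars stay at 9/10.
```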
If health selected (C or A):
The health audit scans for technical debt across 4 vectors: architectural, structural, operational, and code hygiene. Findings are prioritized by severity (CRITICAL > HIGH > MEDIUM > LOW). The pipeline remediates until all CRITICAL and HIGH findings are resolved.
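As a sketch, the priority order amounts to a simple severity ranking (the finding shape below is an assumption for the example):

```python
# Illustrative: sort findings so CRITICAL comes first, LOW last.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

findings = [
    {"title": "No request timeouts", "severity": "HIGH"},
    {"title": "Dead code in utils/", "severity": "LOW"},
    {"title": "Secrets committed to git", "severity": "CRITICAL"},
]
findings.sort(key=lambda f: SEVERITY_RANK[f["severity"]])
# The pipeline works top-down until no CRITICAL or HIGH items remain.
```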
What's the primary goal for this audit?
A) General health check — scan all 4 vectors equally
B) Production hardening — emphasize operational debt (error handling, timeouts, resource leaks, observability)
C) Onboarding prep — emphasize structural and hygiene debt (naming, dead code, documentation, test coverage)
D) Pre-release cleanup — focus on CRITICAL/HIGH items only, skip MEDIUM/LOW
What's the deployment target?
A) Serverless (Lambda, Cloud Functions) — cold starts, execution limits, stateless constraints
B) Containers (ECS, Kubernetes, Docker) — resource management, health checks, graceful shutdown
C) Static hosting / SPA — build pipeline, CDN, client-side concerns
D) Monolith / traditional server — process management, connection pooling, memory leaks
E) Multiple (tell me which)
F) Not deployed yet / unsure
What should the health auditor cover, and is anything off-limits?
A) Full repo, no constraints
B) Full repo, but skip specific areas (tell me which — e.g., "don't touch the legacy auth module")
C) Specific directories only (tell me which)
What development tooling is already in place?
A) Full setup — linters, CI pipeline, pre-commit hooks, type checking
B) Partial (tell me what you have — e.g., "ESLint but no CI")
C) None — no linting, CI, or hooks configured
If docs selected (D or A):
The doc audit runs 6 detection phases: discovery, comparison (drift/gaps/stale), code examples, link integrity, config/environment, and structure. It compares documentation claims against actual code behavior.
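To make one phase concrete, here is a minimal link-integrity sketch, assuming relative markdown links should resolve to files on disk (the real checks live in doc-auditor.md; external URLs are left to a tool like lychee):

```python
# Illustrative link-integrity check: flag relative markdown links whose
# targets are missing on disk.
import re
from pathlib import Path

LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")

def broken_local_links(doc: Path) -> list[str]:
    broken = []
    for target in LINK_TARGET.findall(doc.read_text(encoding="utf-8")):
        if target.startswith(("http://", "https://", "mailto:")):
            continue  # external links need a network checker
        if not (doc.parent / target).exists():
            broken.append(target)
    return broken

# for md in Path(".").rglob("*.md"):
#     if hits := broken_local_links(md):
#         print(md, hits)
```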
What documentation should I audit, and is anything off-limits?
A) All docs, no constraints
B) All docs, but skip specific files (tell me which)
C) Specific directories only (tell me which)
D) README and API docs only
What's the primary language stack?
A) JS/TS — typedoc, swagger-jsdoc available
B) Python — sphinx, mkdocstrings available
C) Both
What drift prevention tooling should I add after fixing the docs?
A) Markdown linting (markdownlint) + link checking (lychee) — catches formatting issues and broken links on every PR
B) Auto-generated API docs (typedoc/sphinx) — single source of truth lives in code, not prose
C) Both A and B
D) None — just fix the existing docs, no new tooling
After all questions are answered, generate the directory name: YYYY-MM-DD-audit-slug, where the slug identifies the audit target (e.g., audit-ragstack, audit-my-app). Create the directory docs/plans/YYYY-MM-DD-audit-slug/.
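A minimal sketch of this step, assuming the slug derives from the repo directory name:

```python
# Illustrative: build and create docs/plans/YYYY-MM-DD-audit-slug/.
# Deriving the slug from the repo directory name is an assumption here.
import re
from datetime import date
from pathlib import Path

repo = Path.cwd().name
slug = "audit-" + re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
plan_dir = Path("docs/plans") / f"{date.today():%Y-%m-%d}-{slug}"
plan_dir.mkdir(parents=True, exist_ok=True)
# e.g. docs/plans/2025-06-01-audit-ragstack/
```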
Before spawning agents, read all required role prompt files. Only read prompts for selected audits.
- Health: skills/pipeline/health-auditor.md
- Eval: skills/pipeline/eval-hire.md, skills/pipeline/eval-stress.md, skills/pipeline/eval-day2.md
- Docs: skills/pipeline/doc-auditor.md

All auditor/evaluator agents are read-only — they explore the codebase but don't modify it. Spawn all selected agents in a single parallel batch (up to 5 agents for "all"):
+-------------------------------------------------------------------+
|                       PARALLEL AGENT SPAWN                        |
+-------------------------------------------------------------------+
|                                                                   |
|   health auditor ─┐                                               |
|   eval hire ──────┤                                               |
|   eval stress ────┤   all agents run simultaneously               |
|   eval day2 ──────┤                                               |
|   doc auditor ────┘                                               |
|                   ↓                                               |
|   orchestrator collects all responses, writes intake docs         |
|                                                                   |
+-------------------------------------------------------------------+
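The same fan-out/fan-in shape as a hedged sketch. In practice the orchestrator spawns subagents with its own tooling, so `run_agent` below is a hypothetical stand-in:

```python
# Illustrative fan-out/fan-in only; run_agent stands in for spawning a
# read-only subagent with a role prompt and a task block.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_agent(role_prompt: str, task: str) -> str:
    """Hypothetical stand-in: returns the agent's full text output."""
    return f"(stub output for: {task})"

agents = {  # name -> (role prompt file, task); selected agents only
    "health":      ("skills/pipeline/health-auditor.md", "Audit the codebase."),
    "eval_hire":   ("skills/pipeline/eval-hire.md",      "Evaluate the codebase."),
    "eval_stress": ("skills/pipeline/eval-stress.md",    "Evaluate the codebase."),
    "eval_day2":   ("skills/pipeline/eval-day2.md",      "Evaluate the codebase."),
    "docs":        ("skills/pipeline/doc-auditor.md",    "Audit documentation."),
}
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = {name: pool.submit(run_agent, Path(prompt).read_text(), task)
               for name, (prompt, task) in agents.items()}
    outputs = {name: fut.result() for name, fut in futures.items()}
```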
Agent 1: Health Auditor (if health selected)
<role_prompt>
[Contents of health-auditor.md]
</role_prompt>
<task>
Audit the codebase in the current working directory.
Goal: [from Step 2]
Scope: [from Step 2]
Existing tooling: [from Step 2]
Constraints: [from Step 2]
</task>
Agent 2: Eval — The Pragmatist (if eval selected)
<role_prompt>
[Contents of eval-hire.md]
</role_prompt>
<task>
Evaluate the codebase in the current working directory.
Role level: [from Step 2]
Focus areas: [from Step 2]
Exclusions: [from Step 2]
</task>
Agent 3: Eval — The Oncall Engineer (if eval selected)
<role_prompt>
[Contents of eval-stress.md]
</role_prompt>
<task>
Evaluate the codebase in the current working directory.
Role level: [from Step 2]
Focus areas: [from Step 2]
Exclusions: [from Step 2]
</task>
Agent 4: Eval — The Team Lead (if eval selected)
<role_prompt>
[Contents of eval-day2.md]
</role_prompt>
<task>
Evaluate the codebase in the current working directory.
Role level: [from Step 2]
Focus areas: [from Step 2]
Exclusions: [from Step 2]
</task>
Agent 5: Doc Auditor (if docs selected)
<role_prompt>
[Contents of doc-auditor.md]
</role_prompt>
<task>
Audit documentation in the current working directory against codebase reality.
Doc scope: [from Step 2]
Constraints: [from Step 2]
</task>
After all agents complete, verify each agent's output contains its completion signal:
- Health auditor: AUDIT_COMPLETE
- Eval hire: EVAL_HIRE_COMPLETE
- Eval stress: EVAL_STRESS_COMPLETE
- Eval day2: EVAL_DAY2_COMPLETE
- Doc auditor: DOC_AUDIT_COMPLETE

If any signal is missing, the agent may have been truncated. Report the incomplete agent to the user and do NOT write that intake doc with partial data. Other intake docs with valid signals can still be written.
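A minimal sketch of the check, assuming agent outputs are collected as strings keyed by agent name (matching the sketch above):

```python
# Illustrative completion-signal verification over collected outputs.
SIGNALS = {
    "health":      "AUDIT_COMPLETE",
    "eval_hire":   "EVAL_HIRE_COMPLETE",
    "eval_stress": "EVAL_STRESS_COMPLETE",
    "eval_day2":   "EVAL_DAY2_COMPLETE",
    "docs":        "DOC_AUDIT_COMPLETE",
}

def split_by_signal(outputs: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (complete, possibly-truncated) agent names."""
    complete = {name for name, out in outputs.items() if SIGNALS[name] in out}
    return complete, set(outputs) - complete
# Write intake docs only for the complete set; report the rest to the user.
```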
For agents with valid signals, write the intake docs:
- docs/plans/YYYY-MM-DD-audit-slug/health-audit.md with type: repo-health in frontmatter
- docs/plans/YYYY-MM-DD-audit-slug/eval.md with type: repo-eval and pillar_overrides in frontmatter
- docs/plans/YYYY-MM-DD-audit-slug/doc-audit.md with type: doc-health in frontmatter

See the individual intake skill SKILL.md files (repo-health, repo-eval, doc-health) for the exact output templates.
Append an entry to .claude/skill-runs.json in the repo root. If the file does not exist, create it with an empty array first. Each entry records when a skill was run so that skill usage can be tracked across repos and OS wipes.
{
"skill": "audit",
"date": "YYYY-MM-DD",
"plan": "YYYY-MM-DD-audit-slug",
"audits": ["health", "eval", "docs"]
}
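A sketch of the append step, assuming the file holds a flat JSON array (read or initialize, append, rewrite):

```python
# Illustrative append to .claude/skill-runs.json, creating it if absent.
import json
from pathlib import Path

def record_run(entry: dict) -> None:
    path = Path(".claude/skill-runs.json")
    path.parent.mkdir(parents=True, exist_ok=True)
    runs = json.loads(path.read_text()) if path.exists() else []
    runs.append(entry)
    path.write_text(json.dumps(runs, indent=2) + "\n")

# record_run({"skill": "audit", "date": "2025-06-01",
#             "plan": "2025-06-01-audit-ragstack", "audits": ["health"]})
```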
audits: list which audits were selected (subset of health, eval, docs)

Finally, report to the user:

Audit complete: docs/plans/YYYY-MM-DD-audit-slug/
Intake docs produced:
- [health-audit.md — X critical, Y high, Z medium, W low]
- [eval.md — N/12 pillars at target]
- [doc-audit.md — X drift, Y gaps, Z stale, W broken links]
To remediate, run:
/pipeline YYYY-MM-DD-audit-slug
The pipeline will create one unified plan across all audit types.
/pipeline after all remediation is complete.