Unified dual-engine audit: runs Gemini large-context analysis followed by Claude critique, synthesizing both into a scored report with per-finding acceptance criteria. Use when the user says "/audit", "audit the codebase", "audit story-NNN", or "audit <path>". Replaces both the old /audit and /gemini-audit commands. Supports --claude-only and --gemini-only for single-engine backward compatibility. Supports scoping to files, directories, story diffs, or time ranges. Section filters, scoring, and all existing flags are preserved.
`npx claudepluginhub kelsi-andrewss/capstone-toolkit --plugin capstone`

This skill uses the workspace's default tool permissions.
User has requested: `/audit $ARGUMENTS`
Continuous execution. Steps 0 through 8 execute as one uninterrupted flow. Do not pause, narrate, summarize, or ask for confirmation between steps. The only legitimate stops are:

- A fatal validation or scope error (for example, both engine flags set, or a story with no branch)
- The findings-cap question (more than 50 confirmed findings)
- The High/Critical story-creation question in Step 7

Everything else flows. Each step's output feeds the next -- no commentary in between.
Parse $ARGUMENTS into the following categories:

- **paths** -- any token that is not a flag, not `story-NNN`, and not a section keyword -- collect as target paths
- **story_id** -- `story-\d+` -- at most one
- **section_filter** -- `security`, `bugs`, `completeness`, `quality` -- collect all that match. Also set by the flag variants `--security`, `--bugs`, `--completeness`, `--quality` (equivalent to the bare keyword). If none specified, default = all four sections.
- **flag_claude_only** -- `--claude-only` appears
- **flag_gemini_only** -- `--gemini-only` appears
- **flag_opus** -- `--opus` appears -- forces the Opus model for the Claude agent
- **flag_since** -- value of `--since` if present (commit ref or date like `2d`, `1w`, `2026-03-01`)
- **flag_summary** -- `--summary` appears -- print one-paragraph summary to stdout, skip file write
- **flag_json** -- `--json` appears -- write structured JSON instead of markdown
- **flag_output** -- value of `--output` if present
- **flag_append** -- `--append` appears
- **flag_ignore** -- value of each `--ignore` occurrence (collect list)
- **flag_no_completeness** -- `--no-completeness` appears
- **flag_requirements** -- value of `--requirements` if present

Validation: if both `--claude-only` and `--gemini-only` are set, stop with error: "Cannot use both --claude-only and --gemini-only."

Derive engine_mode:

- `--claude-only` --> engine_mode = `claude-only`
- `--gemini-only` --> engine_mode = `gemini-only`
- otherwise --> engine_mode = `dual`

Identify the project root as the directory containing the nearest `.git` folder, walking up from the current working directory. Store as `<project-root>`.
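The project-root lookup can be sketched in shell. `git rev-parse --show-toplevel` is the usual shortcut; the manual walk-up is a fallback. This is an illustrative sketch, not a required implementation:

```shell
# Resolve <project-root> as the top level of the enclosing git repository.
project_root="$(git rev-parse --show-toplevel 2>/dev/null)"
if [ -z "$project_root" ]; then
  # Fallback: walk up from the current directory looking for a .git folder.
  dir="$PWD"
  while [ "$dir" != "/" ]; do
    if [ -d "$dir/.git" ]; then project_root="$dir"; break; fi
    dir="$(dirname "$dir")"
  done
fi
echo "$project_root"
```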
Exactly one of the following applies (in priority order):
A. story_id is set:

- Call `pm_get_story("<story_id>")` to get the detail file, extract `branch`.
- If `branch` is null, stop: "story_id not found or has no branch."
- Target files: `git -C <project-root> diff dev...<branch> --name-only`

B. flag_since is set:

- Target files: `git -C <project-root> log --since="<flag_since>" --name-only --pretty=format:"" | sort -u`

C. paths are set:

- Use the given files and directories as the target list, expanding directories to the files they contain.

D. No scope given (full_project_mode):

- Audit the full project rooted at `<project-root>`.
Apply ignore filters: for each glob in flag_ignore, exclude matching paths from the target list.
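The ignore filter can be sketched with a POSIX `case` pattern match, where each `--ignore` glob is tested against each candidate path (the file names and the `vendor/*` glob below are hypothetical examples):

```shell
# Drop paths matching any of the globs given as arguments.
# Paths arrive one per line on stdin; survivors are printed.
filter_ignored() {
  while IFS= read -r path; do
    skip=0
    for glob in "$@"; do
      # In a case pattern, '*' also matches '/', so vendor/* covers subpaths.
      case "$path" in $glob) skip=1; break;; esac
    done
    [ "$skip" -eq 0 ] && printf '%s\n' "$path"
  done
}
printf 'src/app.ts\nvendor/lib.js\n' | filter_ignored 'vendor/*'  # -> src/app.ts
```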
In order:

1. `flag_requirements` path -- use if provided.
2. `<project-root>/REQUIREMENTS.md` -- use if it exists.
3. `<project-root>/requirements.pdf` -- use if it exists.
4. Otherwise, requirements_path = null.

If flag_no_completeness is set, override requirements_path to null regardless.
Skip this entire step if engine_mode is `claude-only`.
Call `ToolSearch` with query `select:mcp__gemini__audit` to load the deferred tool. This step is required -- the tool is not available until loaded.
Invoke mcp__gemini__audit with:
- paths = resolved target file list (or null for full project)
- sections = section_filter list (or null for all)
- summary_only = false
- ignore_patterns = flag_ignore list (or null)

Parse the returned findings into a structured list. Each finding must have: severity, file, line, section (category), description, evidence. Tag all findings as source: "gemini".
Store as gemini_findings[]. If the tool returns zero findings, set gemini_findings = [] and proceed.
Never run the Gemini pass on files matching .gitignore or .claudeignore patterns.
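The .gitignore constraint can be enforced with `git check-ignore`, which exits 0 for an ignored path. This is a sketch; `.claudeignore` would need its own matcher, and its semantics are an assumption not covered here:

```shell
# Keep only files that .gitignore rules do NOT exclude.
# Must be run inside the repository; paths arrive one per line on stdin.
keep_unignored() {
  while IFS= read -r f; do
    git check-ignore -q "$f" || printf '%s\n' "$f"
  done
}
```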
Skip this entire step if engine_mode is `gemini-only`.
Start with the base text from ~/.claude/AUDIT-PROMPT.md (read that file verbatim).
Append the Scope section (always):
## Audit Scope
<one of the following>
- Full project: <project-root>
- Target files/directories: <comma-separated list>
- Story diff (<branch> vs dev): <file list>
- Files changed since <flag_since>: <file list>
Ignored patterns: <flag_ignore list, or "none">
Append Requirements section if requirements_path is not null:
## Requirements Document
Path: <requirements_path>
Read this file as the source of truth for the Completeness section.
Append Section filter if section_filter is set:
## Section Filter
Only produce the following section(s) of the report: <section_filter list>.
Omit all other sections from the output.
If engine_mode is dual, append a Gemini Findings for Review section:
## Gemini Findings for Review
The following findings were produced by Gemini's large-context analysis.
For EACH finding, you must:
1. **Confirm** -- the finding is valid, keep severity as-is
2. **Downgrade** -- the finding is valid but severity is too high, state new severity with one-line reasoning
3. **Reject** -- the finding is not a real issue, state one-line rejection reasoning
Additionally, identify any issues Gemini MISSED that you find in the scoped files.
<list each gemini finding with its ID, severity, file:line, category, description, evidence>
If engine_mode is claude-only, append standard audit instructions -- the Claude agent performs a full independent audit with no Gemini context.
Append output format instructions: instruct the agent to return findings as a structured list with: id, severity, section, file, line, description, evidence, and for each Gemini finding: verdict (confirm/downgrade/reject) and reasoning.
Model: `sonnet` by default; use `opus` if flag_opus is set.

Parse the agent's response into:

- claude_confirmations[] -- Gemini findings confirmed (with possible severity changes)
- claude_rejections[] -- Gemini findings rejected (with rejection reasoning)
- claude_new_findings[] -- issues Claude found independently, tagged source: "claude"

Merge findings from both passes into a single list.
Dual mode:

- Confirmed Gemini findings go to confirmed_findings[], tag source: "both".
- Downgraded findings go to confirmed_findings[] with the new severity, tag source: "both", note "downgraded from X to Y".
- Rejected findings go to rejected_findings[] with their rejection reasoning.
- claude_new_findings[] go to confirmed_findings[], tag source: "claude".

Gemini-only mode:

- gemini_findings[] go directly to confirmed_findings[], tagged source: "gemini". No rejections.

Claude-only mode:

- claude_new_findings[] go to confirmed_findings[], tagged source: "claude". No Gemini findings exist.

If both engines independently flag the same file:line with the same category (section), merge into one finding tagged source: "both". Keep the higher severity. Use the more detailed description.
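The same-file:line-and-section merge rule can be sketched with awk; the tab-separated input format and the sample findings below are hypothetical:

```shell
# Collapse duplicate findings on the same file:line + section, keeping the
# higher severity and tagging the merged finding "both".
# Input lines: source<TAB>severity<TAB>file:line<TAB>section
merge_findings() {
  awk -F'\t' '
    BEGIN { rank["low"]=1; rank["medium"]=2; rank["high"]=3; rank["critical"]=4 }
    {
      key = $3 FS $4                       # dedup key: file:line + section
      if (key in sev) {
        src[key] = "both"
        if (rank[$2] > rank[sev[key]]) sev[key] = $2   # keep higher severity
      } else { src[key] = $1; sev[key] = $2 }
    }
    END { for (k in sev) print src[k] FS sev[k] FS k }
  '
}
printf 'gemini\thigh\tsrc/a.ts:10\tsecurity\nclaude\tcritical\tsrc/a.ts:10\tsecurity\n' | merge_findings
# -> both	critical	src/a.ts:10	security
```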
Number all confirmed findings sequentially: F-001, F-002, etc.
For every confirmed finding, write a testable Given/When/Then statement. The acceptance criterion describes the correct behavior, NOT the fix implementation.
Example: "Given a user submits a form with XSS payload in the name field, when the server processes the input, then the payload is sanitized before storage."
If confirmed_findings[] exceeds 50 entries, ask the user:
"Found N findings. Include all in the report, or truncate to top 50 by severity?"
Wait for response before proceeding. This is the only legitimate pause before the report is written.
| Severity | Value |
|---|---|
| critical | 4 |
| high | 3 |
| medium | 2 |
| low | 1 |
| Section | Weight |
|---|---|
| security | 4x |
| bugs | 3x |
| completeness | 2x |
| quality | 1x |
For each confirmed finding: deduction = severity_value * section_weight.
score = 100 - sum(all deductions), floored at 0. Perfect score (no findings) = 100.
| Section | Finding count | Weight | Weighted deduction | Raw deduction |
|---|---|---|---|---|
| Security | N | 4x | sum of (severity * 4) for security findings | sum of severity for security findings |
| Bugs | N | 3x | sum of (severity * 3) for bugs findings | sum of severity for bugs findings |
| Completeness | N | 2x | sum of (severity * 2) for completeness findings | sum of severity for completeness findings |
| Quality | N | 1x | sum of (severity * 1) for quality findings | sum of severity for quality findings |
| Total | N | -- | total weighted | total raw |
Final score line: Score: <score>/100
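The scoring formula above can be sketched as a small awk filter. The sample input is hypothetical: one critical security finding (4 × 4 = 16) and one medium quality finding (2 × 1 = 2) deduct 18, giving 82:

```shell
# Compute the audit score from confirmed findings.
# Input lines: severity<TAB>section
score() {
  awk -F'\t' '
    BEGIN {
      sev["critical"]=4; sev["high"]=3; sev["medium"]=2; sev["low"]=1
      w["security"]=4;   w["bugs"]=3;   w["completeness"]=2; w["quality"]=1
    }
    { total += sev[$1] * w[$2] }                   # deduction per finding
    END { s = 100 - total; if (s < 0) s = 0; print s }   # floored at 0
  '
}
printf 'critical\tsecurity\nmedium\tquality\n' | score  # -> 82
```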
If flag_summary is set:
Print a one-paragraph executive summary to stdout containing: what was audited, engine_mode, finding count by severity, score. No file write.
Print completion summary (Step 8 format) and stop.
If flag_json is set:
Write a JSON file to flag_output path (or <project-root>/AUDIT.json). Schema:
{
"metadata": {
"scope": "...",
"engine_mode": "dual|claude-only|gemini-only",
"sections_run": ["..."],
"timestamp": "ISO8601"
},
"score": {
"total": 0,
"breakdown": [
{
"section": "...",
"finding_count": 0,
"weight": 0,
"weighted_deduction": 0,
"raw_deduction": 0
}
]
},
"findings": [
{
"id": "F-001",
"severity": "...",
"section": "...",
"file": "...",
"line": 0,
"description": "...",
"evidence": "...",
"source": "gemini|claude|both",
"acceptance_criterion": "Given... When... Then..."
}
],
"rejected_findings": [
{
"id": "...",
"severity": "...",
"section": "...",
"file": "...",
"line": 0,
"description": "...",
"source": "gemini",
"rejection_reason": "..."
}
],
"executive_summary": "..."
}
Proceed to Step 7.
Determine output_path: flag_output value, or <project-root>/AUDIT.md.
If flag_append: read existing file first, append new findings without duplicating existing ones.
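One way to sketch the append-mode bookkeeping is to scan the existing report for the highest `F-NNN` id, so new findings continue the sequence instead of duplicating ids. The `AUDIT.md` filename matches the default above; the rest is an illustrative sketch:

```shell
# Find the highest existing finding id (F-000 if the report is absent)
# and print the next id in sequence.
last_id="$(grep -oE 'F-[0-9]{3}' AUDIT.md 2>/dev/null | sort | tail -n1)"
last_id="${last_id:-F-000}"
# 10# forces base-10 so ids like F-008 are not parsed as octal.
printf 'F-%03d\n' "$(( 10#${last_id#F-} + 1 ))"
```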
Write (or append) the report with this structure:
Executive summary -- 1 paragraph: what was audited, engine_mode used, finding count by severity, overall score.
Score breakdown -- the table from Step 5.
Findings by section -- one heading each for Security, Bugs, Completeness, Quality (only sections that were run). Within each section, list findings ordered by severity (critical first). Each finding includes:
- Finding ID: F-NNN
- Severity: **CRITICAL**, **HIGH**, **MEDIUM**, or **LOW**
- Source tag: [gemini], [claude], or [both]
- file:line, description, and evidence
- Acceptance criterion as a blockquote: > Given... When... Then...

Appendix: Rejected findings -- list each rejected Gemini finding with its severity, file:line, description, and Claude's rejection reasoning. If no rejected findings, omit this section entirely.
Security constraint: never expose secrets or credentials found during audit in the report. Redact and note existence only.
Never modify any project files during an audit. The audit reports findings -- it does not apply fixes.
Proceed to Step 7.
Call pm_list_stories(), filter to stories where state is not done, shipped, or archived. Build:
open_story_map = { story_id: { title, writeFiles[] } }
For each confirmed finding referencing a file path, check if any open story's writeFiles overlaps. If so, annotate the finding: Related open story: story-NNN -- <title>.
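The overlap check can be sketched as a whole-line match of the finding's file against a story's writeFiles. The story id, title, and file lists below are hypothetical:

```shell
# Print an annotation when the finding file ($1) appears in the
# writeFiles list supplied on stdin (one path per line).
annotate() { # $1 = finding file, $2 = story id, $3 = story title
  if grep -Fxq "$1"; then
    printf 'Related open story: %s -- %s\n' "$2" "$3"
  fi
}
printf 'src/auth.ts\nsrc/db.ts\n' | annotate src/auth.ts story-042 "Harden auth"
# -> Related open story: story-042 -- Harden auth
```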
If annotations were added to the markdown report, write the updated report back to output_path.
Scan confirmed findings with severity critical or high that have no related open story.
If any exist, list them and ask:
"The following High/Critical priority findings are not covered by any open story. Create a /todo story for each? (yes / no / list the ones you want)"
If user approves, invoke /todo for each selected finding.
Emit telemetry event:
bash scripts/emit-event.sh audit '{"engine_mode":"<engine_mode>","finding_count":<N>,"score":<score>}'
Audit complete.
Engine: <dual|claude-only|gemini-only>
Report: <output_path>
Score: <score>/100
Findings: <N> Critical, <N> High, <N> Medium, <N> Low
Rejected: <N> (see Appendix)
<if stories created>: Stories created: story-NNN, ...