Run comprehensive triage-first verification pipeline with specialized agents. Detects scope, discovers toolchain, triages files to relevant agents, runs static analysis, launches review agents in parallel, exercises the app, and produces a unified report. Supports interactive, report-only, and auto-fix modes.
Run comprehensive verification before considering changes complete. This skill detects what changed, triages files to relevant agents, runs static analysis, launches review agents in parallel, exercises the app end-to-end, and produces a unified report with actionable findings.
Parse $ARGUMENTS for:
Mode (--mode=):
- interactive (default): Full pipeline → report → interactive triage → plan → fix
- report-only: Full pipeline → report → STOP
- auto-fix: Full pipeline → report → auto-accept severity >= threshold → plan → fix → STOP

Scope Control:
- --scope=staged: Verify only staged changes
- --scope=unstaged: Verify only unstaged modified files
- --scope=branch: Verify all changes in current branch vs base
- --scope=all: Verify entire codebase (comprehensive audit — skips triage, runs all agents on everything)
- --files="file1,file2": Verify specific files only
- --module=path: Verify specific module/directory

Other Options:
- --skip-ux: Skip UX review for pure backend changes
- --auto-fix-threshold=N: Minimum severity for auto-fix mode (default: 3)

Determine what files/changes to verify.
1. Parse User-Specified Scope (if provided):
Check $ARGUMENTS for --scope=, --files=, or --module= flags.

2. Auto-Detect Scope (default behavior):
Priority order:
Git Commands for Scope Detection:
```bash
# Check for staged changes
git diff --cached --name-only

# Check for unstaged changes
git diff --name-only

# Check for branch changes
BASE=$(git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null)
git diff --name-only $BASE HEAD

# Get line ranges for changed files
git diff --cached -U0 -- <file>      # staged
git diff -U0 -- <file>               # unstaged
git diff -U0 $BASE HEAD -- <file>    # branch
```
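The auto-detection above can be sketched as a small helper. The priority order (staged, then unstaged, then branch) is an assumption based on the order of the checks, and `detect_scope` is a hypothetical helper name:

```python
import subprocess

def git_lines(*args):
    # Run a git command and return its non-empty output lines.
    out = subprocess.run(["git", *args], capture_output=True, text=True)
    return [l for l in out.stdout.splitlines() if l.strip()]

def detect_scope(staged, unstaged, branch_changed):
    # Pure decision logic: prefer staged, then unstaged, then branch diff.
    if staged:
        return ("staged", staged)
    if unstaged:
        return ("unstaged", unstaged)
    if branch_changed:
        return ("branch", branch_changed)
    return ("all", [])
```

Feed it the file lists from the three `git diff --name-only` variants; an empty result for all three falls through to a whole-codebase scope.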
3. Build Scope Context:
Create a list of files in scope with status:
- file.ts (modified, lines 45-67, 89-102)
- new-file.ts (added, entire file)
- old-file.ts (deleted)

Store as SCOPE_CONTEXT for passing to agents.
Also build a machine-usable SCOPE_METADATA block for agents that need exact diff reconstruction:
- scope_mode: staged, unstaged, branch, files, module, all, or the resolved auto-detected mode
- base_ref: exact baseline ref/commit used for the scoped diff
- compare_ref: exact comparison target (HEAD, INDEX, WORKTREE, or explicit ref)
- path_filter: exact scoped paths, or ALL_SCOPED_FILES
- diff_command: exact git diff command used to define the scope
- merge_base: exact merge-base hash for branch scope, otherwise empty

SCOPE_METADATA is the source of truth for any agent that needs to reconstruct the selected diff.
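One way to materialize SCOPE_METADATA, sketched for the staged/unstaged/branch modes (the field names follow the list above; the builder itself and its ref mapping are illustrative assumptions):

```python
def build_scope_metadata(scope_mode, paths, merge_base=""):
    # Map each scope mode to its baseline/comparison pair and exact diff command.
    refs = {
        "staged":   ("HEAD", "INDEX", ["git", "diff", "--cached"]),
        "unstaged": ("INDEX", "WORKTREE", ["git", "diff"]),
        "branch":   (merge_base, "HEAD", ["git", "diff", merge_base, "HEAD"]),
    }
    base_ref, compare_ref, cmd = refs[scope_mode]
    return {
        "scope_mode": scope_mode,
        "base_ref": base_ref,
        "compare_ref": compare_ref,
        "path_filter": ",".join(paths) or "ALL_SCOPED_FILES",
        "diff_command": " ".join(cmd + ["--"] + paths) if paths else " ".join(cmd),
        "merge_base": merge_base if scope_mode == "branch" else "",
    }
```

For the unstaged example later in this document, this produces exactly the `git diff -- <paths>` command shown there.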
4. Format Scope for Agents:
VERIFICATION SCOPE:
Files in scope:
- src/auth/login.ts (modified, lines 45-67, 89-102)
- src/auth/middleware.ts (modified, lines 12-34)
- tests/auth/login.test.ts (added, entire file)
CRITICAL SCOPE CONSTRAINTS:
- ONLY flag issues in code that was ADDED or MODIFIED in these files/lines
- DO NOT flag issues in surrounding context or old code unless it blocks the new changes
- DO NOT flag issues in other files not listed above
- Focus exclusively on the quality of the NEW or CHANGED code
Exception: You MAY flag issues in old code IF:
1. The new changes directly interact with or depend on that old code
2. The old code issue is causing the new code to be incorrect
3. The old code issue creates a blocker for the new functionality
Git commands to see your scoped changes:
git diff HEAD -- <scoped-files>
git diff --cached -- <scoped-files>
Example machine-readable metadata:
SCOPE_METADATA:
- scope_mode: unstaged
- base_ref: INDEX
- compare_ref: WORKTREE
- path_filter: src/auth/login.ts,src/auth/middleware.ts
- diff_command: git diff -- src/auth/login.ts src/auth/middleware.ts
- merge_base:
Check if the project has a pre-configured engineer skill:
ls .claude/skills/*-engineer/SKILL.md 2>/dev/null
If found:
- Read it and store as ENGINEER_CONTEXT — this is pre-verified knowledge from /setup-engineer
- Check for VERIFICATION.md in the same skill directory — if it exists, read it and store as CUSTOM_GATES
If not found:
- ENGINEER_CONTEXT is empty
- CUSTOM_GATES is empty

Launch ONE Explore agent (fast, read-only) with two jobs in a single prompt:
If engineer skill exists: Validate that the provided commands still work and discover anything missing:
The engineer skill provides these commands:
- Test: {test_command}
- Build: {build_command}
- Lint: {lint_commands}
Verify each command exists (which/type check). For any that fail, discover alternatives.
Discover any additional linters/type-checkers not covered by the engineer skill.
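The which/type check can be sketched with `shutil.which`; the helper name and result shape are illustrative:

```python
import shutil

def validate_commands(commands):
    # For each command string, check that its executable exists on PATH
    # (or at the given path) and partition into working vs. failed.
    ok, failed = [], []
    for cmd in commands:
        exe = cmd.split()[0]
        (ok if shutil.which(exe) else failed).append(cmd)
    return ok, failed
```

Commands landing in `failed` are the ones the Explore agent should discover alternatives for.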
If no engineer skill: Full discovery:
Discover the project's toolchain:
- Test command (npm test, pytest, cargo test, go test, etc.)
- Build command (npm run build, cargo build, go build, etc.)
- Linter commands (eslint, pylint, clippy, golangci-lint, etc.)
- Type-checker commands (tsc --noEmit, mypy, etc.)
Check package.json scripts, Makefile targets, CI config, pyproject.toml, Cargo.toml, go.mod.
Output concrete commands that can be executed.
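For a Node project, discovery can be sketched by reading package.json scripts (the script-name conventions checked here are assumptions; real projects vary and the other config files listed above need their own probes):

```python
import json

def toolchain_from_package_json(text):
    # Pull test/build/lint commands out of the package.json "scripts" block.
    scripts = json.loads(text).get("scripts", {})
    tool = {}
    if "test" in scripts:
        tool["test"] = "npm test"
    if "build" in scripts:
        tool["build"] = "npm run build"
    tool["lint"] = ["npm run " + name for name in scripts if "lint" in name]
    return tool
```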
Read each changed file (not just the extension — look at actual content).
For each file, assign it to the agents whose review is most relevant:
Categories:
- REVIEWER: Business logic, application code, architecture, patterns, security, robustness → cata-reviewer
- GENERAL_SECOND_OPINION: Entire scoped diff → cata-codex-reviewer
- QA: Test files or files that need test coverage assessment → cata-qa
- UX: UI components, CLI output, user-facing strings → cata-ux-reviewer
- TEST_EXECUTION: Any code change that could affect tests → cata-tester
A file can have multiple categories.
IMPORTANT: ALL agents always run — triage assigns files to focus each agent,
but never skips agents. Agents with no specifically assigned files review
the full scope (they may catch cross-cutting concerns).
Always assign the full scoped file list to `cata-codex-reviewer`. It is a general second-opinion pass, not a specialist router target.
Output a JSON-like mapping:
```json
{
  "agent_assignments": {
    "cata-reviewer": ["all scoped files"],
    "cata-codex-reviewer": ["all scoped files"],
    "cata-qa": ["test files and files needing test coverage"],
    "cata-ux-reviewer": ["UI/CLI/user-facing files"],
    "cata-tester": ["all"]
  },
  "toolchain": {
    "test": "npm test",
    "build": "npm run build",
    "lint": ["npx eslint", "npx tsc --noEmit"]
  }
}
```
If --skip-ux: The UX reviewer agent is skipped (explicit user override only).
Output: Store the agent assignments as TRIAGE_RESULT and toolchain commands as TOOLCHAIN.
Launch cata-static agent (haiku model) with:
SCOPED FILES:
{scope file list}
COMMANDS TO RUN:
{each linter/type-checker command from TOOLCHAIN}
Run each command on the scoped files. Parse output into structured findings.
Report only findings in scoped files.
Wait for results. Store as STATIC_SUMMARY.
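Running each discovered linter over the scoped files might look like the following sketch (the result shape is an assumption; real output still needs per-tool parsing into findings):

```python
import subprocess, shlex

def run_static_checks(commands, files):
    # Execute each linter/type-checker against the scoped files and
    # capture exit status plus raw output for later parsing.
    results = []
    for cmd in commands:
        proc = subprocess.run(shlex.split(cmd) + files,
                              capture_output=True, text=True)
        results.append({
            "command": cmd,
            "exit_code": proc.returncode,
            "output": (proc.stdout + proc.stderr).strip(),
        })
    return results
```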
Assemble a compact context bundle (~50-100 lines) for review agents:
CONTEXT_BUNDLE:
VERIFICATION SCOPE:
{SCOPE_CONTEXT from Phase 1}
SCOPE METADATA:
{SCOPE_METADATA from Phase 1}
ENGINEER SKILL SUMMARY:
{Brief summary from ENGINEER_CONTEXT, or "No engineer skill found — toolchain discovered via exploration"}
{If engineer skill exists: "Reference files available at .claude/skills/{name}/ — read TESTING.md, ARCHITECTURE.md etc. for your domain"}
STATIC ANALYSIS SUMMARY:
{STATIC_SUMMARY from Phase 4 — just the findings table, not raw output}
TOOLCHAIN:
- Test: {command}
- Build: {command}
- Lint: {commands}
CUSTOM REVIEW GATES:
{Review Gates from VERIFICATION.md, or "None defined"}
These are repo-maintainer-defined requirements. If any rule falls in your review domain,
report PASS/FAIL for it. Failed gates should be reported as severity 9 findings.
DIFF STAT:
{output of: git diff --stat [scope args]}
Keep this compact. Agents read files themselves — the bundle just tells them where to look and what's already known.
Launch ALL review agents in parallel using the Agent tool. Every agent runs every time — no agents are skipped based on triage (only explicit --skip-ux flag can exclude an agent).
Model routing is handled by agent frontmatter:
Each agent prompt includes:
- CONTEXT_BUNDLE from Phase 5
- TRIAGE_RESULT.agent_assignments

For cata-codex-reviewer, SCOPE_METADATA is authoritative. It must not infer scope mode from filenames or prose when exact metadata is available.
For each agent, the prompt follows this structure:
{CONTEXT_BUNDLE}
YOUR ASSIGNED FILES:
{files from TRIAGE_RESULT.agent_assignments for this agent}
RELEVANT STATIC FINDINGS:
{filtered findings from STATIC_SUMMARY relevant to this agent's domain}
{If engineer skill exists:}
ENGINEER SKILL REFERENCE:
Reference files are available at .claude/skills/{engineer-skill-name}/
Read files relevant to your domain (e.g., TESTING.md for cata-tester, architecture docs for cata-reviewer).
{Agent-specific instructions...}
OUTPUT FORMAT: For each issue found, provide:
- Title (short description)
- Severity (1-10, where 1=trivial, 10=critical)
- Location (file:line)
- Description (what the issue is and why it matters)
Agent-specific instruction blocks:
cata-reviewer:
Comprehensive review across all five dimensions:
1. Design & Code Quality: design adherence, over-engineering, AI slop, test integrity, structural completeness
2. Architecture: module boundaries, dependency direction, god objects, abstraction opportunities, coupling
3. Coherence: reinvented wheels, pattern violations, convention mismatches, documentation drift, dead code
4. Hardening: invalid inputs, error paths, inconsistent validation, orphaned references, state transitions
5. Security: injection, auth/authz, multi-tenant isolation, data exposure, crypto
Research the project's structure, patterns, and security approach BEFORE evaluating changes.
Focus on: 'Is this change well-designed, structurally sound, pattern-consistent, robust, and secure?'
cata-codex-reviewer:
Run the local Codex CLI as an independent second-opinion reviewer.
Use `codex review` via Bash, not Claude's native analysis alone.
Use `SCOPE_METADATA` as the source of truth for scope reconstruction.
Adapt the verify scope into a temporary diff-only workspace under /tmp so Codex reviews only the intended changes.
Do NOT infer staged vs unstaged vs branch vs path-filtered scope from assigned files or prose if `SCOPE_METADATA` says otherwise.
If exact reconstruction from `SCOPE_METADATA` is not possible, report PATCH_CONSTRUCTION_FAILED instead of reviewing an approximate diff.
If Codex is unavailable (missing CLI, auth missing, network blocked, sandbox blocked), report BLOCKED status with a short factual reason.
If scope is `--scope=all`, report SKIPPED_UNSUPPORTED_SCOPE rather than attempting a whole-codebase audit.
Normalize Codex output into: title, severity, location, description.
cata-tester:
Run the full test suite using: {test command from TOOLCHAIN}
Report exact pass/fail counts.
If tests cannot run, report what prevented execution.
For EACH failure: title, severity, location, error message, and whether it's IN-SCOPE or OUT-OF-SCOPE.
cata-ux-reviewer:
ONLY test user-facing changes in the scoped files.
Do not audit the entire UI/CLI for issues.
Focus on the UX of what changed in this scope.
Test any UI, CLI output, error messages, or API responses that were modified.
cata-qa:
Evaluate whether the scoped changes are adequately tested.
Assess test quality, mock usage, and test type appropriateness.
Adapt expectations to codebase testing maturity.
Focus on: 'Are these changes well-tested with good tests?'
Collect structured findings from each agent. Extract ONLY the structured fields: title, severity, location, description. Discard investigation narratives. Keep the orchestrator context lean.
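Deduplication and severity ordering at report time can be sketched as follows (keying on location plus lowercased title is an assumption; real matching may need to be fuzzier):

```python
def dedupe_findings(findings):
    # Merge findings that share a location and title; union their sources
    # and keep the highest severity, then sort highest severity first.
    merged = {}
    for f in findings:
        key = (f["location"], f["title"].lower())
        if key in merged:
            merged[key]["sources"] |= set(f["sources"])
            merged[key]["severity"] = max(merged[key]["severity"], f["severity"])
        else:
            merged[key] = {**f, "sources": set(f["sources"])}
    return sorted(merged.values(), key=lambda f: -f["severity"])
```

Issues flagged by multiple agents end up with multiple sources, which the report treats as higher confidence.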
For cata-codex-reviewer, also collect agent status if no findings were produced:
- COMPLETED
- BLOCKED
- SKIPPED_UNSUPPORTED_SCOPE

Codex BLOCKED handling: If cata-codex-reviewer reports BLOCKED, this is a significant event — the independent second-model review did not run. Flag it prominently in the report:
- report-only mode: Include BLOCKED status with high visibility in the Agent Results Summary and add a prominent warning after the summary table.
- interactive mode: Use AskUserQuestion to ask: "Codex review was BLOCKED ({reason}). Continue without Codex review, or stop to resolve?"

SKIPPED_UNSUPPORTED_SCOPE is expected for --scope=all and is not flagged as a warning.

If cata-tester OR cata-ux-reviewer OR cata-exerciser reported failures (severity 7+):
Launch cata-debugger with:
VERIFICATION SCOPE CONTEXT:
{SCOPE_CONTEXT}
FAILURES TO INVESTIGATE:
{list of failures from tester/ux/exerciser}
Analyze the root cause of these failures.
Focus on failures caused by the scoped changes.
If failures are unrelated to scope, note that explicitly.
Launch cata-exerciser (sonnet) with:
{CONTEXT_BUNDLE}
Exercise the changes end-to-end:
1. Read the engineer skill (.claude/skills/*-engineer/) if it exists — follow its instructions for starting the environment, authenticating, and interacting with services
2. Start the full local environment (app + all backing services)
3. Determine exercise strategy based on change type:
- Frontend/UI changes → use Playwright to navigate and interact
- API/backend changes → make actual API calls via curl, verify responses and data state
- Data/search/indexing changes → trigger operations, query services via CLI tools to verify data was written/indexed
- Job/worker changes → trigger jobs, verify side effects via database/service queries
- Mixed → exercise through all affected interfaces
4. Verify data flows end-to-end — don't stop at "endpoint returns 200", follow data through the system
5. Report whether the specific changes actually work with real data
If you hit a barrier (can't start, need credentials, unclear what to test, no engineer skill for complex backend):
- Return BLOCKED status with specific reason
- If you cannot determine HOW to exercise the change, that is severity 9-10
ISSUES FOUND BY REVIEW AGENTS:
{List of all issues found in Phase 6 with VI-IDs, severity, title, location}
While exercising, attempt to trigger each reported issue and report verification status
(CONFIRMED / NOT REPRODUCED / NOT APPLICABLE / BLOCKED).
{If CUSTOM_GATES has exerciser gates:}
CUSTOM EXERCISER GATES:
{List of exerciser gates from VERIFICATION.md}
These are mandatory repo-maintainer-defined checks. After exercising the feature, you MUST
check each gate and report PASS/FAIL with evidence. Any failing gate means your overall
status cannot be PASSED — use FAILED instead.
Handle exerciser barriers:
LOGIN_REQUIRED, UNCLEAR_FEATURE, NO_EXERCISE_STRATEGY, NO_ENGINEER_SKILL, or SERVICE_UNAVAILABLE (interactive/auto-fix modes only):
- Use AskUserQuestion to get help from the user
- report-only mode: Record BLOCKED status in report without asking user

# Verification Report
## Scope
**Mode:** [staged / unstaged / branch / all / files / module]
**Files Verified:**
- src/auth/login.ts (modified, lines 45-67, 89-102)
- src/auth/middleware.ts (modified, lines 12-34)
- tests/auth/login.test.ts (added, entire file)
**Files Excluded:** All other files in codebase (not in scope for this verification)
---
## Triage Summary
**Agents run:** cata-reviewer, cata-codex-reviewer, cata-tester, cata-qa, cata-ux-reviewer, cata-exerciser
**Agents skipped:** [none, or list if --skip-ux was used]
**Static analysis:** ESLint (3 findings), tsc (1 finding)
---
## Agent Results Summary
| Agent | Status | Notes |
|-------|--------|-------|
| cata-static | Completed | 4 findings (3 warnings, 1 error) |
| cata-tester | X passed, Y failed | [brief note] |
| cata-reviewer | Completed | Found N items (design, arch, coherence, hardening, security) |
| cata-codex-reviewer | Completed / **BLOCKED** / Skipped | Found N items / [reason] |
| cata-qa | Completed | Found N items |
| cata-ux-reviewer | Completed / Skipped | Found N items / [reason] |
| cata-exerciser | PASSED / FAILED / BLOCKED | [reason if blocked] |
| cata-debugger | Ran / N/A | [if applicable] |
---
## Issues Found
[Deduplicated issues from all agents, sorted by severity descending]
| ID | Sev | Title | Sources | Location | Description |
|----|-----|-------|---------|----------|-------------|
| VI-1 | 9 | [Short title] | tester, reviewer | file:line | [Combined description] |
| VI-2 | 7 | [Short title] | security | file:line | [Description] |
*Severity: 9-10 Critical | 7-8 High | 5-6 Moderate | 3-4 Low | 1-2 Trivial*
*Sources column shows which agents flagged the issue. Multiple sources = higher confidence.*
**Total: N issues from M agent findings (deduplicated)**
---
## Exerciser Verification
| Issue ID | Title | Exerciser Status | Notes |
|----------|-------|-----------------|-------|
| VI-1 | [title] | CONFIRMED | [observation] |
| VI-2 | [title] | NOT REPRODUCED | [what was tried] |
| VI-3 | [title] | NOT APPLICABLE | [reason] |
---
## Custom Verification Gates
{If no VERIFICATION.md exists or no custom gates defined: omit this section entirely}
### Exerciser Gates
| # | Rule | Status | Evidence |
|---|------|--------|----------|
| 1 | [rule from VERIFICATION.md] | PASS / FAIL / BLOCKED | [from exerciser report] |
### Review Gates
| # | Rule | Status | Checked By | Evidence |
|---|------|--------|------------|----------|
| 1 | [rule from VERIFICATION.md] | PASS / FAIL / NOT CHECKED | [agent name] | [from agent findings] |
**Custom Gates: X/Y passed, Z blocked**
report-only: Output the report and return control to the caller. Do not triage, plan, or fix anything. Do not invoke AskUserQuestion.

interactive (default): After presenting the report, run interactive triage:
If zero issues found: Output the report and return. No triage needed.
Interactive Triage Process:
Present every issue to the user via AskUserQuestion.

AskUserQuestion Format (batch of up to 4):
AskUserQuestion:
questions:
- header: "VI-1"
question: "{Title} — {Description}. Found at {file:line} by: {sources} (severity {N})"
multiSelect: false
options:
- label: "{Fix option 1}"
description: "{Specific action with file:line reference}"
- label: "{Fix option 2}"
description: "{Alternative action with file:line reference}"
- label: "Explain"
description: "Get the full picture before deciding"
- label: "Skip"
description: "Accept this issue — will not fix in this change set"
Handling "Explain": Read surrounding code, re-present with richer context (alone, not batched). Keep all fix options. If Explain again, dig deeper.
CRITICAL: Present EVERY issue. Never skip issues. Only stop early if the user explicitly says "stop", "done", or "skip the rest".
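Batching issues into AskUserQuestion calls of at most four, without ever dropping one, can be sketched as:

```python
def batch_issues(issues, size=4):
    # Yield successive groups of at most `size` issues, preserving order
    # so every issue is eventually presented to the user.
    return [issues[i:i + size] for i in range(0, len(issues), size)]
```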
After triage, use EnterPlanMode to draft the fix plan and ExitPlanMode for approval.

auto-fix: After presenting the report:
- Auto-accept every issue with severity >= --auto-fix-threshold (default: 3)
- Do not invoke AskUserQuestion

The report must be brutally honest:
| Range | Impact | Examples |
|---|---|---|
| 9-10 | Critical | Data loss, security vulnerability, cannot function |
| 7-8 | High | Major functionality broken, significant problems |
| 5-6 | Moderate | Clear issues, workarounds exist |
| 3-4 | Low | Minor issues, slight inconvenience |
| 1-2 | Trivial | Polish, cosmetic, optional improvements |
Severity reflects "how big is this issue?" — NOT "must you fix it?" The human decides what to act on.
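The auto-fix acceptance rule can be sketched as a simple severity filter (default threshold 3, per --auto-fix-threshold; the helper name is illustrative):

```python
def auto_accept(issues, threshold=3):
    # Partition issues: at or above the threshold they are auto-accepted
    # for fixing; below it they are reported but left to the human.
    accepted = [i for i in issues if i["severity"] >= threshold]
    skipped = [i for i in issues if i["severity"] < threshold]
    return accepted, skipped
```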
cata-ux-reviewer runs by default like all other agents. It can only be skipped via the explicit --skip-ux flag. Use --skip-ux only when the change is purely backend, with no modified UI, CLI output, error messages, or API responses.
When in doubt, don't use --skip-ux — let it run.
Keep the orchestrator context lean; this is critical since verify runs in the main context window.

Verification works best when a pre-configured engineer skill (.claude/skills/*-engineer/) exists.