Formal bug fix pipeline — root cause analysis, scope lock, behavior contract, differential testing, regression proof. Usage: /fix <description|#issue> [--severity critical|high|medium|low] [--hotfix]
A rigorous bug fix workflow grounded in formal methods. Every step is verifiable. Every guardrail is enforceable. No guessing.
The correctness pipeline:
| Step | Formal Method | What It Proves |
|---|---|---|
| 1. Root Cause Swarm | Competing Hypotheses + Adversarial Debate | Multiple strategies converge on the actual cause |
| 2. Scope Lock | Graph Reachability | Fix only touches what it needs to |
| 3. Behavior Contract | Hoare Logic | We know what should and shouldn't change |
| 4. Implement Fix | Scoped Agent | Changes stay within the Fix Zone |
| 4b. Review Swarm | Parallel Adversarial Review | Independent reviewer + QA attacker agree the fix is correct |
| 5. Differential Test | Bisimulation + Mutation | Fix changes ONLY bug behavior, nothing else |
| Input | Action |
|---|---|
| /fix POST /auth/refresh returns 500 | Free-text bug description |
| /fix #123 | Load bug from GitHub issue |
| /fix #123 --severity critical | With explicit severity |
| /fix #123 --hotfix | Critical: expedited path (skip non-essential steps) |
| /fix --severity high token refresh fails silently | Severity + description |
If no severity provided, assess automatically in Step 1.
If #N provided — load the GitHub issue:
gh issue view N --json title,body,labels,assignees,comments
Extract: description, reproduction steps, affected area, any labels (bug, critical, etc.)
If free-text — use as the bug description directly.
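The accepted argument forms can be classified with a small dispatcher. A minimal sketch in shell — the function and variable names are illustrative, not part of the command spec:

```shell
# Sketch: classify /fix arguments into issue number, severity, hotfix flag,
# and free-text description. Names here are illustrative only.
parse_fix_args() {
  ISSUE=""; SEVERITY=""; HOTFIX=0; DESCRIPTION=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --severity) SEVERITY="$2"; shift 2 ;;          # explicit severity
      --hotfix)   HOTFIX=1; shift ;;                 # expedited path
      \#*)        ISSUE="${1#\#}"; shift ;;          # "#123" → issue mode
      *)          DESCRIPTION="${DESCRIPTION:+$DESCRIPTION }$1"; shift ;;
    esac
  done
}

parse_fix_args "#123" --severity critical --hotfix
echo "issue=$ISSUE severity=$SEVERITY hotfix=$HOTFIX"   # issue=123 severity=critical hotfix=1
```

This mirrors the input table: a leading `#` selects issue mode, flags set severity and hotfix, and anything else accumulates into the free-text description.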
Read project context (Tier 0→2: read small files fully, grep directories before reading):
- CLAUDE.md for tech stack, test commands, architecture (Tier 0)
- .claude/project-profile.md if it exists (Tier 0)
- .claude/knowledge/shared/domain.md if it exists (Tier 1)
- .claude/qa-knowledge/incidents/ for affected keywords (Tier 2 — selective)
- .claude/qa-knowledge/bug-patterns.md for affected keywords (Tier 2 — selective)
- .claude/wikis/*/wiki/index.md for relevant domain context (Tier 1→2)

Check for similar past bugs:
# Search incidents by affected area (Tier 2 — grep-then-read)
grep -rl "{affected_keywords}" .claude/qa-knowledge/incidents/ 2>/dev/null
# Search bug patterns
grep -i "{affected_keywords}" .claude/qa-knowledge/bug-patterns.md 2>/dev/null
# Search topic wikis for domain context on affected area
for idx in .claude/wikis/*/wiki/index.md; do
grep -li "{affected_keywords}" "$idx" 2>/dev/null
done
If similar past bugs found, show them:
⚠ Similar past incidents found:
• 2026-03-15-token-expiry.md (severity: high, status: covered)
Root cause: token expiry check used wrong timezone
• 2026-02-28-auth-500.md (severity: critical, status: covered)
Root cause: missing null check on session object
Goal: Find the actual cause, not a symptom. This is NOT optional — no fix without understanding.
1.1: Reproduce the bug
From the description/issue, determine how to reproduce:
# If it's an API bug — try to hit the endpoint
curl -s -X POST http://localhost:8000/auth/refresh -H "Content-Type: application/json" \
-d '{"refresh_token": "test"}' | head -20
# If it's a test failure — run the specific test
pytest tests/test_auth.py::test_refresh -v
# If it's a UI bug — check the relevant component
If reproduction fails, ask the user for reproduction steps.
1.2: Locate the symptom
Find where the error occurs:
# Search for the error message, status code, or exception
grep -rn "500\|Internal Server Error\|{error_message}" --include="*.py" --include="*.ts"
# Find the route/handler for the affected endpoint
grep -rn "{endpoint_pattern}" --include="*.py" --include="*.ts"
1.2b: Use explore-light for fast codebase search (cost: 1x Haiku)
Before manually grepping, spawn explore-light to quickly locate relevant code:
subagent_type: "explore-light"
prompt: "Find all code related to {symptom description}. I need:
1. The route/handler for {endpoint or feature}
2. All functions that call or are called by the handler
3. Related test files
4. Any error handling or validation in this code path
Return file paths and line numbers."
This is 60x cheaper than doing the search yourself (Haiku vs Opus). Use the results to focus your manual reading in Step 1.3 — don't read every file, read only what explore-light identified.
1.3: Investigation Swarm — depth selection
Before spawning investigators, auto-detect the appropriate investigation depth:
| Signal | Level | Rationale |
|---|---|---|
| --hotfix flag | Skip (single backward trace) | Production fire |
| Traceback points to single line | Lite (1 round, 2 agents) | Obvious bug |
| Multiple possible causes / unclear reproduction | Full (3 rounds, 3 agents) | Need competing hypotheses |
| User says "this is simple" | Lite or skip | Trust user |
| Touches auth/payments/data | Full (always) | High-risk area |
If auto-detection is uncertain, ask:
Bug complexity estimate: {simple | moderate | complex}
Investigation mode:
[1] Quick — single analyst, fastest (simple bugs)
[2] Verified — analyst + challenger (moderate bugs)
[3] Swarm — 3 competing investigators with debate (complex/unclear bugs)
[auto] Let me decide based on evidence
Your pick? [auto]
Team: fix-investigate-{slug}
Spawn these 3 agents in parallel:
| Agent | subagent_type | model | Strategy |
|---|---|---|---|
| backward-tracer | code-reviewer | sonnet | Traces backward from symptom through call chain |
| forward-tracer | code-reviewer | sonnet | Checks git log for recent changes in affected area, traces forward |
| pattern-matcher | explore-light | haiku | Searches knowledge base (incidents/, bug-patterns.md) for similar patterns |
Each agent receives: bug description, symptom location (file:line), reproduction steps, and access to CLAUDE.md / project-profile.md. They do NOT share conclusions with each other until Round 2.
Early termination (check before proceeding to Round 2): if all 3 agents already agree on the same root-cause file:line after Round 1, converge immediately and skip the remaining rounds.
Round 1 — HYPOTHESIZE
Each agent submits independently to the orchestrator:
**Hypothesis**: {description of root cause}
**Location**: {file}:{line}
**Evidence**: {list of file:line citations supporting this hypothesis}
**Confidence**: high / medium / low
**Reasoning**: {how you traced from symptom to this cause}
Round 2 — CROSS-CHALLENGE (only if no early termination)
Orchestrator broadcasts all 3 hypotheses to all 3 agents. Each agent attempts to disprove the other two:
## Challenge to {agent}'s hypothesis
**Counter-evidence**: [file:line] — {why it contradicts}
**Verdict**: DISPROVED | WEAKENED | CONSISTENT
Round 3 — FINAL POSITION (only if no convergence after Round 2)
Each agent submits a final hypothesis, incorporating any valid challenges received.
Convergence scoring rubric:
| Signal | Score |
|---|---|
| Evidence citation: verified file:line | +2 per citation |
| Evidence citation: refuted by challenge | -1 per refuted |
| Corroborated by another agent | +3 |
| Contradicted with counter-evidence | -2 |
| Pattern match from KB (incidents/, bug-patterns.md) | +2 |
| Survived cross-challenge intact | +3 |
| Hypothesis disproved by challenge | -3 |
Convergence conditions:
| Outcome | When | Action |
|---|---|---|
| CONVERGE immediately | All 3 agree on same file:line (Round 1) | Proceed to Step 1.4 |
| CONVERGE | 2 agree, 3rd disproved (Round 2) | Adopt majority hypothesis |
| MERGE | 2 agree, 3rd compatible (Round 2) | Merge hypotheses — combined view is more complete |
| CONVERGE | Highest score leads by 3+ points (Round 3) | Adopt leading hypothesis |
| ESCALATE | Scores within 2 points (Round 3) | Present all 3 to user — do not guess |
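The Round 3 decision rule — converge on a 3+ point lead, escalate when the top two scores are within 2 — can be sketched as a small helper (illustrative, not part of the spec):

```shell
# Sketch: apply the Round 3 convergence rule to the three agents' final scores.
# Converge when the leader is ahead by 3+ points; otherwise escalate to the user.
decide_convergence() {
  sorted=$(printf '%s\n' "$@" | sort -rn)        # highest score first
  top=$(printf '%s\n' "$sorted" | sed -n 1p)
  second=$(printf '%s\n' "$sorted" | sed -n 2p)
  if [ $((top - second)) -ge 3 ]; then
    echo CONVERGE
  else
    echo ESCALATE
  fi
}

decide_convergence 9 5 4   # → CONVERGE (lead of 4)
decide_convergence 7 6 2   # → ESCALATE (top two within 2)
```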
Escalation prompt (if no convergence):
⚠ Investigation swarm could not converge — 3 competing hypotheses remain after debate:
backward-tracer (score: {N}):
Root cause: {hypothesis}
Location: {file}:{line}
Key evidence: {citations}
forward-tracer (score: {N}):
Root cause: {hypothesis}
Location: {file}:{line}
Key evidence: {citations}
pattern-matcher (score: {N}):
Root cause: {hypothesis}
Location: {file}:{line}
Key evidence: {citations}
Which should we pursue?
[1] backward-tracer's diagnosis
[2] forward-tracer's diagnosis
[3] pattern-matcher's diagnosis
[4] Investigate all — there may be multiple bugs
1.4: Identify root cause
The root cause is the earliest point in the dependency chain where behavior diverges from the specification. It's where "this code assumes X but X is not always true."
Document:
**Root Cause**: {precise description}
**Location**: {file}:{line}
**Why it breaks**: {what assumption is violated}
**When it breaks**: {specific conditions that trigger the bug}
**When it works**: {conditions under which the bug doesn't manifest}
**Swarm verdict**: {CONVERGE | MERGE | user-resolved} — {which agents agreed}
1.5: Assess severity (if not provided)
| Signal | Severity |
|---|---|
| Production users affected NOW | critical |
| Data corruption or security | critical |
| Feature broken but workaround exists | high |
| Edge case, rare trigger | medium |
| Cosmetic, no functional impact | low |
Goal: Determine exactly which files the fix may touch and which are OFF LIMITS.
2.1: Build dependency graph from root cause
Starting from the root cause location, trace outward:
Hop 0 (ROOT CAUSE): The file/function with the bug
→ Read the file, identify the broken function
Hop 1 (DIRECT DEPENDENCIES): Files the root cause imports from or is called by
→ grep for imports in the root cause file
→ grep for who imports/calls the root cause function
Hop 2 (INDIRECT): Files that depend on hop-1 files
→ Same grep pattern, one level out
Hop 3+: Everything else
Execute with actual code analysis:
# Hop 0: root cause file
ROOT_FILE="backend/app/services/auth.py"
# Hop 1: what does root cause import?
grep -E "^from|^import" "$ROOT_FILE" | grep -v "^from typing\|^import os\|^import json"
# Hop 1: who imports root cause?
grep -rl "from.*auth import\|import.*auth" backend/ --include="*.py"
# Hop 2: extend one more level from hop 1 results
# (repeat the above for each hop 1 file)
2.2: Classify zones
Write the scope lock document — this is the guardrail for the entire fix:
## Scope Lock — {bug description}
### Fix Zone (ALLOWED — may modify)
{Files at hop 0-1 that need changes to fix the bug}
- {file}:{lines} — {why this needs to change}
- {file}:{lines} — {why this needs to change}
- {test_file} — regression test goes here
### Watch Zone (CAUTION — may be affected)
{Files at hop 2 that might be affected by the fix}
- {file} — {how it relates, what to watch for}
### Frozen Zone (BLOCKED — must NOT modify)
{Everything else — especially:}
- All files outside the dependency chain
- Migration files (unless root cause is in schema)
- Config files (unless root cause is in config)
- Unrelated services/modules
- Frontend (if backend bug) or vice versa
2.3: Set up enforcement
Save the scope lock to a file that the implementation step reads:
# Write scope lock for the implementation agents
cat > .claude/.fix-scope-lock.json << 'EOF'
{
"bug": "{description}",
"severity": "{severity}",
"root_cause": "{file}:{line}",
"fix_zone": ["{file1}", "{file2}", "{test_file}"],
"watch_zone": ["{file3}", "{file4}"],
"frozen_zone_patterns": ["migrations/*", "frontend/*", "*.lock"],
"created_at": "{timestamp}"
}
EOF
The implementation agent reads this and is BLOCKED from editing files outside fix_zone.
The pre-edit-guard hook can enforce this if .claude/.fix-scope-lock.json exists.
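A minimal sketch of the check such a hook could perform. It is grep-based for portability (a real hook would more likely parse the JSON with jq), and the file paths are illustrative:

```shell
# Sketch: allow an edit only if the target path appears in the fix_zone array
# of the scope lock. If no lock exists, no fix is in flight — allow everything.
LOCK=".claude/.fix-scope-lock.json"

guard_edit() {  # $1 = path the agent wants to edit; prints ALLOW or BLOCK
  [ -f "$LOCK" ] || { echo ALLOW; return; }
  # Extract just the fix_zone array, then look for the exact quoted path.
  if grep -o '"fix_zone": \[[^]]*\]' "$LOCK" | grep -q "\"$1\""; then
    echo ALLOW
  else
    echo BLOCK
  fi
}
```

Watch Zone files are deliberately not matched here — under this sketch they would be blocked too, forcing the agent to surface a justification before the zone is expanded.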
Goal: Define exactly what the fix should change and what must stay the same.
3.1: Extract existing contracts
For each function in the Fix Zone, extract its behavioral contract:
From type signatures:
# Reading: def refresh_token(token: str) -> TokenPair
# Contract: str → TokenPair (not str → Exception)
From existing passing tests:
# Run tests that touch Fix Zone files BEFORE the fix
# Each passing test = a behavioral contract that must be preserved
pytest tests/test_auth.py -v --tb=no 2>&1
# Output:
# test_login_valid_creds PASSED → contract: valid creds → token
# test_login_invalid_creds PASSED → contract: invalid creds → 401
# test_refresh_valid_token FAILED → THIS IS THE BUG
# test_refresh_expired_token PASSED → contract: expired → 401
From API schemas/validators:
# Check route definitions for response models
grep -A10 "@router.post.*refresh" backend/app/api/auth.py
# Check Pydantic models, Zod schemas, etc.
From database constraints:
# Check model definitions for invariants
grep -A20 "class.*Token\|class.*Session" backend/app/models/*.py
3.2: Build the behavior contract table
## Behavior Contract — {bug description}
### Behaviors That MUST CHANGE (fix verification)
| Input | Current (broken) | Expected (after fix) | Test |
|-------|-----------------|---------------------|------|
| refresh("valid_token") | 500 Internal Server Error | 200 + new TokenPair | WRITE NEW |
| refresh("tampered_token") | 500 Internal Server Error | 401 Unauthorized | WRITE NEW |
### Behaviors That MUST NOT CHANGE (regression protection)
| Input | Current (correct) | After Fix (same) | Test |
|-------|------------------|------------------|------|
| login("user", "pass") | 200 + TokenPair | 200 + TokenPair | EXISTING: test_login_valid |
| login("user", "wrong") | 401 | 401 | EXISTING: test_login_invalid |
| refresh("expired_token") | 401 | 401 | EXISTING: test_refresh_expired |
| GET /me with valid token | 200 + User | 200 + User | EXISTING: test_get_me |
### Invariants (always true, before and after)
- Tokens table: every token has a user_id (FK constraint)
- Sessions table: expired sessions are never returned by get_active_session()
- Auth: no endpoint returns 200 without valid authentication
3.3: Capture baseline test results
# Run ALL tests and save results — this is the "before" snapshot
# for differential testing in Step 5
#
# Use the project's ACTUAL test commands from CLAUDE.md Test Commands table
# or project-profile.md testing conventions. Examples for common stacks:
#
# Python: pytest --tb=line -q
# Node: npm test -- --watchAll=false
# Go: go test ./... -count=1
# Rust: cargo test
# Ruby: bundle exec rspec --format progress
# Java: ./gradlew test
# Swift: swift test
# Flutter: flutter test
#
cd {project_root}
# Backend (use detected test command)
{backend_test_command} 2>&1 | tee /tmp/fix-baseline-backend.txt
# Frontend (if applicable, use detected test command)
{frontend_test_command} 2>&1 | tee /tmp/fix-baseline-frontend.txt
# Save pass/fail counts
echo "Baseline captured at $(date)" > .claude/.fix-baseline-summary.txt
grep -cE "passed|PASSED|ok|OK" /tmp/fix-baseline-backend.txt >> .claude/.fix-baseline-summary.txt || true
grep -cE "failed|FAILED|FAIL|ERROR" /tmp/fix-baseline-backend.txt >> .claude/.fix-baseline-summary.txt || true
Save to .claude/.fix-behavior-contract.md for the implementation and QA steps.
Goal: Apply the minimal fix within the scope lock, write regression tests.
4.1: Determine implementation approach
Based on severity:
| Severity | Approach |
|---|---|
| critical / --hotfix | Fix directly (no /planning, no team). Minimal change. |
| high | Fix directly with code-reviewer validation. |
| medium | Consider if /planning is needed for architectural changes. |
| low | Standard flow — may batch with other work. |
4.2: Spawn implementation agent
Select the right agent based on which layer the bug is in:
| Bug is in... | Spawn | subagent_type | Why |
|---|---|---|---|
| Backend (any language) | backend agent | python-backend | Adaptive — detects Python/Node/Go/Rust/Ruby/Java/Elixir at runtime |
| Frontend (any framework) | frontend agent | frontend | Adaptive — detects React/Vue/Svelte/Angular/etc. at runtime |
| Both layers | backend + frontend | Both agents | Scope lock keeps each in their zone |
| iOS/Android/Flutter/ML | custom agent (if /calibrate generated one) | Check .claude/agents/ for project-specific agents | /calibrate creates these for non-web platforms |
| Unknown | general-purpose | general-purpose | Fallback — reads project-profile.md for context |
If .claude/project-profile.md exists, read it to determine the platform and pick the right agent.
If /calibrate generated custom agents (e.g., ios-developer.md), use those for platform-specific bugs.
4.2b: Escalation protocol for fix agents
Include this in the implementation agent's prompt:
## When Your Fix Doesn't Work (MANDATORY)
1. After first failed attempt: re-read the root cause analysis from Step 1.
Is the root cause correct? If not, go back to Step 1.
2. After second failed attempt: consult knowledge base:
- .claude/knowledge/qa-knowledge/ (error keywords)
- .claude/knowledge/shared/conventions.md (project gotchas)
- git log --all --grep="<error keyword>" --oneline -10
3. After third failed attempt: STOP. Do not try another fix.
Generate a STUCK REPORT and send to team-lead:
- Error: [exact message]
- Root cause hypothesis: [from Step 1]
- Fix attempts: [1, 2, 3 with results]
- KB consultation results: [what you found]
- Recommendation: [re-investigate root cause / ask user for X / try different approach]
4. If a troubleshooter agent is available, team-lead may spawn one.
Agent prompt includes:
1. Root cause analysis from Step 1
2. Scope lock from Step 2 (Fix Zone ONLY)
3. Behavior contract from Step 3
4. Project profile context (from .claude/project-profile.md if exists)
5. Knowledge base conventions (from .claude/knowledge/shared/conventions.md if exists)
6. Past incidents in affected files (from .claude/qa-knowledge/incidents/)
INSTRUCTIONS:
- You may ONLY modify files listed in the Fix Zone: {fix_zone_files}
- If you need to modify a file NOT in the Fix Zone, STOP and explain why
- The root cause is: {root_cause_description} at {file}:{line}
- Your fix must satisfy the Behavior Contract:
- All "MUST CHANGE" rows must work after your fix
- All "MUST NOT CHANGE" rows must still work after your fix
- Write regression tests that:
a) Test the "MUST CHANGE" behaviors (prove the bug is fixed)
b) Would FAIL if the fix were reverted (prove the test catches this bug)
- Keep changes minimal — this is a bug fix, not a refactor
- Follow the project's coding conventions from project-profile.md / knowledge base
- Use the project's test framework (detected from profile or CLAUDE.md, not assumed)
4.3: Scope guard validation
After the agent finishes, verify scope compliance:
# What files were actually modified?
CHANGED=$(git diff --name-only)
# Check each against scope lock
for file in $CHANGED; do
if echo "{fix_zone_files}" | grep -q "$file"; then
echo "✓ $file (in Fix Zone)"
elif echo "{watch_zone_files}" | grep -q "$file"; then
echo "⚠ $file (in Watch Zone — needs justification)"
else
echo "✗ $file (in FROZEN ZONE — VIOLATION)"
fi
done
If ANY Frozen Zone file was modified → BLOCK and require explanation. If Watch Zone files were modified → WARN and require justification.
4.4: Regression test validation (Mutation Testing)
Verify the regression test actually catches the bug:
# 1. Save the fix
git diff > /tmp/fix.patch
# 2. Revert the fix temporarily
git checkout -- .
# 3. Run ONLY the new regression test — it MUST FAIL
pytest tests/test_auth_refresh_regression.py -v 2>&1
# Expected: FAIL (bug exists, test catches it)
# 4. Re-apply the fix
git apply /tmp/fix.patch
# 5. Run the regression test again — it MUST PASS
pytest tests/test_auth_refresh_regression.py -v 2>&1
# Expected: PASS (bug fixed, test confirms it)
If the test passes both with and without the fix → the test is USELESS. It doesn't actually verify the bug. Reject it and write a better test.
If the test fails both with and without the fix → the fix is BROKEN. The fix doesn't actually resolve the bug. Go back to Step 1.
Only valid result: FAIL without fix, PASS with fix.
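These outcomes can be encoded as a tiny classifier. A sketch — the two inputs are the regression test's result without and with the fix applied:

```shell
# Sketch: classify the mutation-test outcome.
# $1 = test result WITHOUT the fix, $2 = test result WITH the fix ("PASS"/"FAIL").
classify_regression_test() {
  case "$1/$2" in
    FAIL/PASS) echo VALID ;;         # test catches the bug, fix resolves it
    PASS/PASS) echo USELESS_TEST ;;  # test never fails → doesn't verify the bug
    FAIL/FAIL) echo BROKEN_FIX ;;    # fix doesn't actually resolve the bug
    *)         echo INVESTIGATE ;;   # any other combination needs manual review
  esac
}
```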
Goal: Two independent agents review the fix simultaneously from different angles — one checks correctness, one attacks it. Their cross-examination produces a more reliable verdict than a single reviewer could.
Team: fix-review-{slug}
Spawn these 2 agents in parallel:
| Agent | subagent_type | model | Role |
|---|---|---|---|
| fix-reviewer | code-reviewer | sonnet | 7-point correctness checklist |
| qa-attacker | qa-challenger | sonnet | Generate 3-5 attack scenarios against this specific fix |
4b.1: Spawn fix-reviewer
subagent_type: "code-reviewer"
model: "sonnet"
prompt: |
## Fix Review — Correctness Checklist
A bug fix has been implemented. Review it for correctness and completeness.
**Bug description**: {bug_description}
**Root cause**: {root_cause} at {file}:{line}
**Swarm verdict**: {convergence result from Step 1.3}
**Scope lock**:
- Fix Zone: {fix_zone_files}
- Watch Zone: {watch_zone_files}
**Behavior contract**:
{behavior_contract_table_from_step_3}
**The diff**:
Run `git diff` to see exactly what changed.
**Review checklist — answer each with ✓ or ✗ and explanation**:
1. **Root cause addressed?** Does this fix actually address the identified root cause,
or does it patch the symptom? Read the root cause, then read the diff — does the
change fix WHY the bug happened, not just WHERE it manifested?
2. **Complete fix?** Are there other code paths that have the same bug pattern?
Search for similar patterns in the codebase — the bug may exist in multiple places.
```bash
grep -rn "{pattern_from_root_cause}" --include="*.{ext}" | grep -v "{fixed_file}"
```
3. **Scope respected?** Are all changed files within the Fix Zone? Any Watch Zone
files touched without justification? Any Frozen Zone violations?
4. **Behavior contract honored?** For each "MUST NOT CHANGE" row — does the diff
risk changing that behavior? Trace each changed line's callers to check.
5. **Edge cases?** Does the fix handle:
- Null/undefined/empty inputs at the fix point?
- Concurrent access (if applicable)?
- Error paths (what if the fix itself throws)?
6. **Regression test quality?** Read the new test(s):
- Does the test actually exercise the bug scenario?
- Would the test fail if the fix were reverted? (Don't run it — reason about it)
- Does the test name clearly describe the bug it catches?
7. **Subtle issues?** Any of these:
- Type coercion or implicit conversion near the fix?
- Changed error messages that other code might match on?
- Modified function signatures that callers depend on?
- Performance implications (e.g., added a DB query in a hot path)?
**Output format**:
| Check | Result | Notes |
|---|---|---|
| Root cause addressed | ✓/✗ | {explanation} |
| Complete fix | ✓/✗ | {similar patterns found?} |
| Scope respected | ✓/✗ | {any violations?} |
| Behavior contract | ✓/✗ | {any risks?} |
| Edge cases | ✓/✗ | {any missing?} |
| Test quality | ✓/✗ | {assessment} |
| Subtle issues | ✓/✗ | {any found?} |
**Verdict**: APPROVE / REQUEST CHANGES / BLOCK
**Summary**: {1-2 sentence overall assessment}
{If REQUEST CHANGES or BLOCK: specific changes needed}
4b.2: Spawn qa-attacker (in parallel with fix-reviewer)
subagent_type: "qa-challenger"
model: "sonnet"
prompt: |
## Fix Attack — Adversarial QA
A bug fix has been implemented. Your job is to find ways it could fail or cause regressions.
Do NOT try to be fair — try to break it.
**Bug description**: {bug_description}
**Root cause**: {root_cause} at {file}:{line}
**The diff**: Run `git diff` to see exactly what changed.
Generate 3-5 attack scenarios targeting this specific fix. For each:
Attack {N}: {scenario name}
Input / trigger: {what causes this}
Expected (correct): {what should happen}
Attack outcome: {what could go wrong with this fix}
Severity: HIGH / MEDIUM / LOW
Exploits: {which aspect of the fix creates this risk}
Focus attacks on:
- Inputs the fix doesn't handle (null, empty, boundary values)
- Concurrent or race conditions introduced
- Callers of the changed function with different usage patterns
- Side effects on Watch Zone files
- The regression test's blind spots (inputs not covered)
- Conditions where the fix itself could throw
4b.3: Cross-examination (1 round after both complete)
Orchestrator performs cross-examination:
1. Share fix-reviewer's verdict with qa-attacker: "The fix-reviewer gave verdict {APPROVE/REQUEST CHANGES/BLOCK}. Do your attacks change their assessment? Which of your HIGH attacks did they miss?"
2. Share qa-attacker's attacks with fix-reviewer: "The qa-attacker found these scenarios: {attacks}. Are any of these covered by your findings? Do any change your verdict?"
4b.4: Verdict merge
| fix-reviewer | qa-attacker | Merged Verdict |
|---|---|---|
| APPROVE | 0 HIGH attacks | APPROVE |
| APPROVE | 1+ HIGH attacks | REQUEST CHANGES |
| REQUEST CHANGES | any | REQUEST CHANGES (union of feedback) |
| BLOCK | any | BLOCK |
| APPROVE (after cross-exam) | attacks downgraded | APPROVE |
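The merge table reduces to a function of the reviewer's verdict and the number of HIGH attacks still standing after cross-examination. A sketch:

```shell
# Sketch: merge the two reviewers' outputs per the verdict table.
# $1 = fix-reviewer verdict, $2 = count of HIGH attacks surviving cross-exam.
merge_verdict() {
  case "$1" in
    BLOCK)             echo "BLOCK" ;;             # blocker always wins
    "REQUEST CHANGES") echo "REQUEST CHANGES" ;;   # union of feedback
    APPROVE)
      if [ "$2" -ge 1 ]; then
        echo "REQUEST CHANGES"   # surviving HIGH attack overrides approval
      else
        echo "APPROVE"
      fi ;;
  esac
}
```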
4b.5: Handle merged verdict
| Verdict | Action |
|---|---|
| APPROVE | Proceed to Step 5 (Differential Testing). |
| REQUEST CHANGES | Send combined feedback (checklist findings + unaddressed attacks) to the implementing agent. Re-run Step 4b after changes. Max 2 rounds — if not resolved, escalate to user. |
| BLOCK | STOP. Present the blocker to the user. Common blockers: fix addresses symptom not root cause, same bug pattern exists elsewhere unfixed, scope violation. |
4b.6: Check for incomplete fixes (same pattern elsewhere)
If fix-reviewer found similar patterns (check #2), decide:
Reviewer found the same bug pattern in 2 other files:
- backend/app/services/session.py:45 — same missing null check
- backend/app/services/token.py:78 — same missing null check
Options:
[1] Fix all instances now (expand Fix Zone)
[2] Fix only the reported bug, create issues for the others
[3] Ask user
Default to option 2 for medium severity. Default to option 1 for critical/high.
Always inform the user of the other instances regardless.
Cost: Two Sonnet agent calls + 1 cross-examination round (~2-4 min). The attacker surfaces attack vectors the reviewer's structured checklist would miss.
Skip conditions: NOT skippable, even in --hotfix mode — a bad fix shipped fast is worse than a good fix shipped 2 minutes later. The fix-reviewer always runs; only the qa-attacker is dropped in hotfix mode.
Goal: Prove the fix changes ONLY bug behavior, nothing else.
5.1: Run full test suite after fix
# Use the SAME test commands as the baseline capture (Step 3.3)
# Backend (detected test command)
{backend_test_command} 2>&1 | tee /tmp/fix-after-backend.txt
# Frontend (if applicable, detected test command)
{frontend_test_command} 2>&1 | tee /tmp/fix-after-frontend.txt
5.2: Diff against baseline
# Compare before vs after
diff /tmp/fix-baseline-backend.txt /tmp/fix-after-backend.txt
Expected results:
| Change | Meaning | Action |
|---|---|---|
| FAIL → PASS (bug tests) | Bug is fixed | ✓ Expected |
| New tests PASS | Regression tests work | ✓ Expected |
| No change (all other tests) | No side effects | ✓ Expected |
| PASS → FAIL (unrelated test) | FIX HAS SIDE EFFECTS | ✗ Investigate — fix broke something else |
| PASS → FAIL (related test) | Fix may be incomplete or wrong | ✗ Investigate — may need broader fix |
If any PASS → FAIL: the fix is NOT safe. Either it has side effects (revert and re-investigate) or it is incomplete (broaden the fix and re-run).
Do NOT proceed to PR until all PASS→FAIL cases are resolved.
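Comparing per-test status makes the FAIL→PASS / PASS→FAIL classification mechanical, rather than eyeballing a raw diff. A sketch, assuming each results file has first been reduced to sorted `test_id STATUS` lines (the extraction step depends on the real runner's output format):

```shell
# Sketch: per-test differential between baseline and post-fix results.
# $1, $2 = files of "test_id STATUS" lines (STATUS = PASSED|FAILED),
# both sorted by test_id so join can match them up.
diff_results() {
  join "$1" "$2" | awk '
    $2 == "PASSED" && $3 == "FAILED" { print "REGRESSION: " $1 }
    $2 == "FAILED" && $3 == "PASSED" { print "FIXED: " $1 }'
}
```

Any `REGRESSION:` line is a PASS → FAIL case and blocks the PR; `FIXED:` lines should match exactly the "MUST CHANGE" rows of the behavior contract.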
5.3: Behavior contract verification
Check each row of the behavior contract table:
✓ MUST CHANGE: refresh("valid_token") → 200 + TokenPair (was 500)
✓ MUST CHANGE: refresh("tampered_token") → 401 (was 500)
✓ MUST NOT CHANGE: login("user", "pass") → 200 + TokenPair (unchanged)
✓ MUST NOT CHANGE: login("user", "wrong") → 401 (unchanged)
✓ MUST NOT CHANGE: refresh("expired_token") → 401 (unchanged)
✓ MUST NOT CHANGE: GET /me → 200 + User (unchanged)
✓ INVARIANT: All tokens have user_id (FK intact)
After all verification steps pass, this is the coordinated handoff to PR. No redundant QA runs — Step 5 already ran the full test suite for differential testing.
Step 5 ran the full test suite for differential comparison, but that's not the same as a proper QA pass. Ask the user:
Fix verified. What level of QA should I run before PR?
[1] commit — targeted checks on changed files only (~1-3 min)
[2] full — comprehensive checks across full codebase (~10-20 min)
[3] skip — differential test was enough, go straight to PR
Default: commit (recommended — catches issues differential testing misses like lint, types, domain rules)
Store choice as QA_LEVEL.
If user chose skip, skip to Step 6.3.
Otherwise, invoke /qa {QA_LEVEL} via the Skill tool:
/qa {QA_LEVEL}

On QA findings:
- yes → fix issues, re-run QA
- skip → proceed (user takes responsibility)
- abort → stop

Record QA results as QA_RESULT for the PR body.
Check if local services exist — read CLAUDE.md ## Local Dev Services table.
If the table exists and has entries:
- Run /restart if the project has that skill configured

After restart (or if no local services), ask the user:
Local servers restarted. Please test the fix manually — verify the bug is gone
and nothing else broke.
When you're done:
[ready] — looks good, proceed to PR
[issues] — found problems (describe them)
If issues: Address problems, re-run /qa commit only, ask again.
If ready: Proceed to Step 6.4.
Invoke /pr --skip-qa with bug-specific PR content. The --skip-qa flag tells /pr
to skip its own /qa commit since QA already ran in Step 6.2.
/pr will only:
PR template for bug fixes (passed to /pr):
## Bug Fix: {title}
**Severity**: {critical|high|medium|low}
**Root Cause**: {precise description from Step 1}
**Location**: {file}:{line}
### What was broken
{Description of the bug behavior}
### Why it was broken
{Root cause explanation — what assumption was violated}
### What this fix does
{Description of the changes — minimal, precise}
### Scope
- Fix Zone: {files modified}
- Watch Zone: {files checked but not modified}
- Frozen Zone: {verified untouched}
### Behavior Contract
| Behavior | Before | After |
|----------|--------|-------|
| {bug behavior} | {broken} | {fixed} |
| {preserved behavior} | {same} | {same} |
### Test Evidence
- Root cause: investigation swarm converged (Step 1.3) — {convergence type}
- Fix review: approved by review swarm (Step 4b) — fix-reviewer + qa-attacker
- Regression test: `{test_file}::{test_name}`
- ✓ Fails without fix (catches the bug)
- ✓ Passes with fix (confirms the fix)
- Differential test: {N} tests unchanged, {M} tests fixed, 0 regressions
- QA: {QA_LEVEL} mode — {QA_RESULT summary}
- User testing: confirmed manually
### Rollback Plan
{How to revert if this fix causes problems — typically: revert this commit}
Closes #{issue_number}
PR #{number} created: {url}
What's next?
[1] Fix another bug — /fix #N or /fix <description>
[2] Start a feature — /planning <feature>
[3] See project status — /onboard
[4] Done for now
If autopilot is active, skip this and continue the loop.
After the PR is created, update the knowledge base:
Create incident record:
# .claude/qa-knowledge/incidents/{date}-{slug}.md
---
status: covered
severity: {severity}
affected_files: [{file1}, {file2}]
root_cause: {description}
fix_pr: #{pr_number}
regression_test: {test_file}::{test_name}
created: {date}
---
## What happened
{Bug description}
## Root cause
{From Step 1}
## How QA missed it
{Why existing tests didn't catch this}
## Prevention
{What kind of test would have prevented this}
Update bug patterns:
# Append to .claude/qa-knowledge/bug-patterns.md
echo "### {date} — {title}" >> .claude/qa-knowledge/bug-patterns.md
echo "- Area: {affected_area}" >> .claude/qa-knowledge/bug-patterns.md
echo "- Pattern: {root_cause_pattern}" >> .claude/qa-knowledge/bug-patterns.md
echo "- Prevention: {test_type}" >> .claude/qa-knowledge/bug-patterns.md
Update knowledge base (if .claude/knowledge/ exists):
- shared/conventions.md if the bug revealed a coding convention violation
- shared/domain.md if the bug revealed a business rule not in code
- agents/{agent-name}.md if the implementing agent learned something

For critical production bugs, the pipeline is compressed:
| Step | Standard | Hotfix |
|---|---|---|
| 1. Root Cause | Investigation Swarm (3 agents, debate) | Single backward-tracer (Sonnet, 5 min max) |
| 2. Scope Lock | Full dependency graph | Direct file only — hop 0-1 |
| 3. Behavior Contract | Full table | Bug behavior + 3 critical paths only |
| 4. Implement | Spawn agent team | Fix directly — single agent |
| 4b. Fix Review | Review Swarm (2 agents, cross-examine) | Single fix-reviewer (no cross-examine) |
| 5. Differential Test | Full suite | Critical path tests only |
| 6. PR | Full template | Abbreviated — merge fast |
| 7. Knowledge Base | Full update | Post-merge (don't block the fix) |
Hotfix STILL requires:
| Severity | QA Mode | Review | Merge |
|---|---|---|---|
| critical | Full QA + critical paths + E2E | Bug-specific + SRE review | Expedited |
| high | Full QA | Bug-specific review | Standard |
| medium | Commit mode QA | Standard review | Standard |
| low | Commit mode QA | Standard review | Batched |
| Mode | Agents | Estimated Cost | When |
|---|---|---|---|
| Hotfix (investigation) | 1 Sonnet backward-tracer | ~$0.005 | --hotfix flag |
| Quick (Lite) | 1 Sonnet analyst | ~$0.007 | Simple / obvious bug |
| Verified (Lite swarm) | 1 Sonnet analyst + 1 Sonnet verifier | ~$0.014 | Moderate bugs |
| Full Swarm (investigation) | 2 Sonnet + 1 Haiku + debate rounds | ~$0.045 | Complex / unclear bugs |
| Review Swarm (hotfix) | 1 Sonnet fix-reviewer | ~$0.006 | --hotfix fix review |
| Review Swarm (standard) | 2 Sonnet + cross-exam | ~$0.018 | Standard fix review |
| Full pipeline (standard) | All of the above | ~$0.08 average | Default |
| Full pipeline (hotfix) | Compressed | ~$0.021 | --hotfix |
Delta from previous single-agent pattern: ~3.8x more per full pipeline run. One prevented misdiagnosis (wrong root cause → wrong fix → debugging cycle → re-fix) saves the entire investigation and implementation time — the swarm pays for itself on the first catch.
If no test command is documented, fall back to common defaults: npm test / jest / vitest for Node, go test ./... for Go, cargo test for Rust, dotnet test for C#, swift test for Swift. Read the CLAUDE.md Test Commands table for the project's actual commands.
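When no Test Commands table exists, the stack can also be guessed from marker files. A sketch — the marker-to-command mapping is an assumption and is always overridden by documented commands:

```shell
# Sketch: guess a default test command from common stack marker files.
# Documented commands in CLAUDE.md always take precedence over this guess.
detect_test_command() {  # $1 = project root; prints a command or UNKNOWN
  if   [ -f "$1/package.json" ];  then echo "npm test"
  elif [ -f "$1/go.mod" ];        then echo "go test ./..."
  elif [ -f "$1/Cargo.toml" ];    then echo "cargo test"
  elif [ -f "$1/Package.swift" ]; then echo "swift test"
  elif [ -f "$1/pyproject.toml" ] || [ -f "$1/pytest.ini" ]; then
    echo "pytest"   # assumption: pytest is the usual runner for these markers
  else
    echo "UNKNOWN"  # ask the user rather than guessing further
  fi
}
```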