Formal bug fix pipeline — root cause analysis, scope lock, behavior contract, differential testing, regression proof. Usage: /fix <description|#issue> [--severity critical|high|medium|low] [--hotfix]
A rigorous bug fix workflow grounded in formal methods. Every step is verifiable. Every guardrail is enforceable. No guessing.
The correctness pipeline:
| Step | Formal Method | What It Proves |
|---|---|---|
| 1. Root Cause Swarm | Competing Hypotheses + Adversarial Debate | Multiple strategies converge on the actual cause |
| 2. Scope Lock | Graph Reachability | Fix only touches what it needs to |
| 3. Behavior Contract | Hoare Logic | We know what should and shouldn't change |
| 4. Implement Fix | Scoped Agent | Changes stay within the Fix Zone |
| 4b. Review Swarm | Parallel Adversarial Review | Independent reviewer + QA attacker agree the fix is correct |
| 5. Differential Test | Bisimulation + Mutation | Fix changes ONLY bug behavior, nothing else |
| Input | Action |
|---|---|
| /fix POST /auth/refresh returns 500 | Free-text bug description |
| /fix #123 | Load bug from GitHub issue |
| /fix #123 --severity critical | With explicit severity |
| /fix #123 --hotfix | Critical: expedited path (skip non-essential steps) |
| /fix --severity high token refresh fails silently | Severity + description |
If no severity provided, assess automatically in Step 1.
If #N provided — load the GitHub issue:
gh issue view N --json title,body,labels,assignees,comments
Extract: description, reproduction steps, affected area, any labels (bug, critical, etc.)
If free-text — use as the bug description directly.
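The accepted argument forms can be classified with a small dispatcher. A minimal sketch in shell — the function and variable names are illustrative, not part of the command spec:

```shell
# Sketch: classify /fix arguments into issue number, severity, hotfix flag,
# and free-text description. Names here are illustrative only.
parse_fix_args() {
  ISSUE=""; SEVERITY=""; HOTFIX=0; DESCRIPTION=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --severity) SEVERITY="$2"; shift 2 ;;          # explicit severity
      --hotfix)   HOTFIX=1; shift ;;                 # expedited path
      \#*)        ISSUE="${1#\#}"; shift ;;          # "#123" → issue mode
      *)          DESCRIPTION="${DESCRIPTION:+$DESCRIPTION }$1"; shift ;;
    esac
  done
}

parse_fix_args "#123" --severity critical --hotfix
echo "issue=$ISSUE severity=$SEVERITY hotfix=$HOTFIX"   # issue=123 severity=critical hotfix=1
```

This mirrors the input table: a leading `#` selects issue mode, flags set severity and hotfix, and anything else accumulates into the free-text description.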
Read project context (Tier 0→2: read small files fully, grep directories before reading):
- CLAUDE.md for tech stack, test commands, architecture (Tier 0)
- .claude/project-profile.md if it exists (Tier 0)
- .claude/knowledge/shared/domain.md if it exists (Tier 1)
- .claude/qa-knowledge/incidents/ for affected keywords (Tier 2 — selective)
- .claude/qa-knowledge/bug-patterns.md for affected keywords (Tier 2 — selective)
- .claude/wikis/*/wiki/index.md for relevant domain context (Tier 1→2)

Check for similar past bugs:
# Search incidents by affected area (Tier 2 — grep-then-read)
grep -rl "{affected_keywords}" .claude/qa-knowledge/incidents/ 2>/dev/null
# Search bug patterns
grep -i "{affected_keywords}" .claude/qa-knowledge/bug-patterns.md 2>/dev/null
# Search topic wikis for domain context on affected area
for idx in .claude/wikis/*/wiki/index.md; do
grep -li "{affected_keywords}" "$idx" 2>/dev/null
done
If similar past bugs found, show them:
⚠ Similar past incidents found:
• 2026-03-15-token-expiry.md (severity: high, status: covered)
Root cause: token expiry check used wrong timezone
• 2026-02-28-auth-500.md (severity: critical, status: covered)
Root cause: missing null check on session object
Goal: Find the actual cause, not a symptom. This is NOT optional — no fix without understanding.
1.1: Reproduce the bug
From the description/issue, determine how to reproduce:
# If it's an API bug — try to hit the endpoint
curl -s -X POST http://localhost:8000/auth/refresh -H "Content-Type: application/json" \
-d '{"refresh_token": "test"}' | head -20
# If it's a test failure — run the specific test
pytest tests/test_auth.py::test_refresh -v
# If it's a UI bug — check the relevant component
If reproduction fails, ask the user for reproduction steps.
1.2: Locate the symptom
Find where the error occurs:
# Search for the error message, status code, or exception
grep -rn "500\|Internal Server Error\|{error_message}" --include="*.py" --include="*.ts"
# Find the route/handler for the affected endpoint
grep -rn "{endpoint_pattern}" --include="*.py" --include="*.ts"
1.2b: Use explore-light for fast codebase search (cost: 1x Haiku)
Before manually grepping, spawn explore-light to quickly locate relevant code:
subagent_type: "explore-light"
prompt: "Find all code related to {symptom description}. I need:
1. The route/handler for {endpoint or feature}
2. All functions that call or are called by the handler
3. Related test files
4. Any error handling or validation in this code path
Return file paths and line numbers."
This is 60x cheaper than doing the search yourself (Haiku vs Opus). Use the results to focus your manual reading in Step 1.3 — don't read every file, read only what explore-light identified.
1.3: Investigation Swarm — depth selection
Before spawning investigators, auto-detect the appropriate investigation depth:
| Signal | Level | Rationale |
|---|---|---|
| --hotfix flag | Skip (single backward trace) | Production fire |
| Traceback points to single line | Lite (1 round, 2 agents) | Obvious bug |
| Multiple possible causes / unclear reproduction | Full (3 rounds, 3 agents) | Need competing hypotheses |
| User says "this is simple" | Lite or skip | Trust user |
| Touches auth/payments/data | Full (always) | High-risk area |
If auto-detection is uncertain, ask:
Bug complexity estimate: {simple | moderate | complex}
Investigation mode:
[1] Quick — single analyst, fastest (simple bugs)
[2] Verified — analyst + challenger (moderate bugs)
[3] Swarm — 3 competing investigators with debate (complex/unclear bugs)
[auto] Let me decide based on evidence
Your pick? [auto]
Team: fix-investigate-{slug}
Spawn these 3 agents in parallel:
| Agent | subagent_type | model | Strategy |
|---|---|---|---|
| backward-tracer | code-reviewer | sonnet | Traces backward from symptom through call chain |
| forward-tracer | code-reviewer | sonnet | Checks git log for recent changes in affected area, traces forward |
| pattern-matcher | explore-light | haiku | Searches knowledge base (incidents/, bug-patterns.md) for similar patterns |
Each agent receives: bug description, symptom location (file:line), reproduction steps, and access to CLAUDE.md / project-profile.md. They do NOT share conclusions with each other until Round 2.
Early termination (check before proceeding to Round 2): if all 3 agents already agree on the same root-cause file:line after Round 1, converge immediately and skip the remaining rounds.
Round 1 — HYPOTHESIZE
Each agent submits independently to the orchestrator:
**Hypothesis**: {description of root cause}
**Location**: {file}:{line}
**Evidence**: {list of file:line citations supporting this hypothesis}
**Confidence**: high / medium / low
**Reasoning**: {how you traced from symptom to this cause}
Round 2 — CROSS-CHALLENGE (only if no early termination)
Orchestrator broadcasts all 3 hypotheses to all 3 agents. Each agent attempts to disprove the other two:
## Challenge to {agent}'s hypothesis
**Counter-evidence**: [file:line] — {why it contradicts}
**Verdict**: DISPROVED | WEAKENED | CONSISTENT
Round 3 — FINAL POSITION (only if no convergence after Round 2)
Each agent submits a final hypothesis, incorporating any valid challenges received.
Convergence scoring rubric:
| Signal | Score |
|---|---|
| Evidence citation: verified file:line | +2 per citation |
| Evidence citation: refuted by challenge | -1 per refuted |
| Corroborated by another agent | +3 |
| Contradicted with counter-evidence | -2 |
| Pattern match from KB (incidents/, bug-patterns.md) | +2 |
| Survived cross-challenge intact | +3 |
| Hypothesis disproved by challenge | -3 |
Convergence conditions:
| Outcome | When | Action |
|---|---|---|
| CONVERGE immediately | All 3 agree on same file:line (Round 1) | Proceed to Step 1.4 |
| CONVERGE | 2 agree, 3rd disproved (Round 2) | Adopt majority hypothesis |
| MERGE | 2 agree, 3rd compatible (Round 2) | Merge hypotheses — combined view is more complete |
| CONVERGE | Highest score leads by 3+ points (Round 3) | Adopt leading hypothesis |
| ESCALATE | Scores within 2 points (Round 3) | Present all 3 to user — do not guess |
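The Round 3 decision rule — converge on a 3+ point lead, escalate when the top two scores are within 2 — can be sketched as a small helper (illustrative, not part of the spec):

```shell
# Sketch: apply the Round 3 convergence rule to the three agents' final scores.
# Converge when the leader is ahead by 3+ points; otherwise escalate to the user.
decide_convergence() {
  sorted=$(printf '%s\n' "$@" | sort -rn)        # highest score first
  top=$(printf '%s\n' "$sorted" | sed -n 1p)
  second=$(printf '%s\n' "$sorted" | sed -n 2p)
  if [ $((top - second)) -ge 3 ]; then
    echo CONVERGE
  else
    echo ESCALATE
  fi
}

decide_convergence 9 5 4   # → CONVERGE (lead of 4)
decide_convergence 7 6 2   # → ESCALATE (top two within 2)
```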
Escalation prompt (if no convergence):
⚠ Investigation swarm could not converge — 3 competing hypotheses remain after debate:
backward-tracer (score: {N}):
Root cause: {hypothesis}
Location: {file}:{line}
Key evidence: {citations}
forward-tracer (score: {N}):
Root cause: {hypothesis}
Location: {file}:{line}
Key evidence: {citations}
pattern-matcher (score: {N}):
Root cause: {hypothesis}
Location: {file}:{line}
Key evidence: {citations}
Which should we pursue?
[1] backward-tracer's diagnosis
[2] forward-tracer's diagnosis
[3] pattern-matcher's diagnosis
[4] Investigate all — there may be multiple bugs
1.4: Identify root cause
The root cause is the earliest point in the dependency chain where behavior diverges from the specification. It's where "this code assumes X but X is not always true."
Document:
**Root Cause**: {precise description}
**Location**: {file}:{line}
**Why it breaks**: {what assumption is violated}
**When it breaks**: {specific conditions that trigger the bug}
**When it works**: {conditions under which the bug doesn't manifest}
**Swarm verdict**: {CONVERGE | MERGE | user-resolved} — {which agents agreed}
1.5: Assess severity (if not provided)
| Signal | Severity |
|---|---|
| Production users affected NOW | critical |
| Data corruption or security | critical |
| Feature broken but workaround exists | high |
| Edge case, rare trigger | medium |
| Cosmetic, no functional impact | low |
Goal: Determine exactly which files the fix may touch and which are OFF LIMITS.
2.1: Build dependency graph from root cause
Starting from the root cause location, trace outward:
Hop 0 (ROOT CAUSE): The file/function with the bug
→ Read the file, identify the broken function
Hop 1 (DIRECT DEPENDENCIES): Files the root cause imports from or is called by
→ grep for imports in the root cause file
→ grep for who imports/calls the root cause function
Hop 2 (INDIRECT): Files that depend on hop-1 files
→ Same grep pattern, one level out
Hop 3+: Everything else
Execute with actual code analysis:
# Hop 0: root cause file
ROOT_FILE="backend/app/services/auth.py"
# Hop 1: what does root cause import?
grep -E "^from|^import" "$ROOT_FILE" | grep -v "^from typing\|^import os\|^import json"
# Hop 1: who imports root cause?
grep -rl "from.*auth import\|import.*auth" backend/ --include="*.py"
# Hop 2: extend one more level from hop 1 results
# (repeat the above for each hop 1 file)
2.2: Classify zones
Write the scope lock document — this is the guardrail for the entire fix:
## Scope Lock — {bug description}
### Fix Zone (ALLOWED — may modify)
{Files at hop 0-1 that need changes to fix the bug}
- {file}:{lines} — {why this needs to change}
- {file}:{lines} — {why this needs to change}
- {test_file} — regression test goes here
### Watch Zone (CAUTION — may be affected)
{Files at hop 2 that might be affected by the fix}
- {file} — {how it relates, what to watch for}
### Frozen Zone (BLOCKED — must NOT modify)
{Everything else — especially:}
- All files outside the dependency chain
- Migration files (unless root cause is in schema)
- Config files (unless root cause is in config)
- Unrelated services/modules
- Frontend (if backend bug) or vice versa
2.3: Set up enforcement
Save the scope lock to a file that the implementation step reads:
# Write scope lock for the implementation agents
cat > .claude/.fix-scope-lock.json << 'EOF'
{
"bug": "{description}",
"severity": "{severity}",
"root_cause": "{file}:{line}",
"fix_zone": ["{file1}", "{file2}", "{test_file}"],
"watch_zone": ["{file3}", "{file4}"],
"frozen_zone_patterns": ["migrations/*", "frontend/*", "*.lock"],
"created_at": "{timestamp}"
}
EOF
The implementation agent reads this and is BLOCKED from editing files outside fix_zone.
The pre-edit-guard hook can enforce this if .claude/.fix-scope-lock.json exists.
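A minimal sketch of the check such a hook could perform. It is grep-based for portability (a real hook would more likely parse the JSON with jq), and the file paths are illustrative:

```shell
# Sketch: allow an edit only if the target path appears in the fix_zone array
# of the scope lock. If no lock exists, no fix is in flight — allow everything.
LOCK=".claude/.fix-scope-lock.json"

guard_edit() {  # $1 = path the agent wants to edit; prints ALLOW or BLOCK
  [ -f "$LOCK" ] || { echo ALLOW; return; }
  # Extract just the fix_zone array, then look for the exact quoted path.
  if grep -o '"fix_zone": \[[^]]*\]' "$LOCK" | grep -q "\"$1\""; then
    echo ALLOW
  else
    echo BLOCK
  fi
}
```

Watch Zone files are deliberately not matched here — under this sketch they would be blocked too, forcing the agent to surface a justification before the zone is expanded.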
Goal: Define exactly what the fix should change and what must stay the same.
3.1: Extract existing contracts
For each function in the Fix Zone, extract its behavioral contract:
From type signatures:
# Reading: def refresh_token(token: str) -> TokenPair
# Contract: str → TokenPair (not str → Exception)
From existing passing tests:
# Run tests that touch Fix Zone files BEFORE the fix
# Each passing test = a behavioral contract that must be preserved
pytest tests/test_auth.py -v --tb=no 2>&1
# Output:
# test_login_valid_creds PASSED → contract: valid creds → token
# test_login_invalid_creds PASSED → contract: invalid creds → 401
# test_refresh_valid_token FAILED → THIS IS THE BUG
# test_refresh_expired_token PASSED → contract: expired → 401
From API schemas/validators:
# Check route definitions for response models
grep -A10 "@router.post.*refresh" backend/app/api/auth.py
# Check Pydantic models, Zod schemas, etc.
From database constraints:
# Check model definitions for invariants
grep -A20 "class.*Token\|class.*Session" backend/app/models/*.py
3.2: Build the behavior contract table
## Behavior Contract — {bug description}
### Behaviors That MUST CHANGE (fix verification)
| Input | Current (broken) | Expected (after fix) | Test |
|-------|-----------------|---------------------|------|
| refresh("valid_token") | 500 Internal Server Error | 200 + new TokenPair | WRITE NEW |
| refresh("tampered_token") | 500 Internal Server Error | 401 Unauthorized | WRITE NEW |
### Behaviors That MUST NOT CHANGE (regression protection)
| Input | Current (correct) | After Fix (same) | Test |
|-------|------------------|------------------|------|
| login("user", "pass") | 200 + TokenPair | 200 + TokenPair | EXISTING: test_login_valid |
| login("user", "wrong") | 401 | 401 | EXISTING: test_login_invalid |
| refresh("expired_token") | 401 | 401 | EXISTING: test_refresh_expired |
| GET /me with valid token | 200 + User | 200 + User | EXISTING: test_get_me |
### Invariants (always true, before and after)
- Tokens table: every token has a user_id (FK constraint)
- Sessions table: expired sessions are never returned by get_active_session()
- Auth: no endpoint returns 200 without valid authentication
3.3: Capture baseline test results
# Run ALL tests and save results — this is the "before" snapshot
# for differential testing in Step 5
#
# Use the project's ACTUAL test commands from CLAUDE.md Test Commands table
# or project-profile.md testing conventions. Examples for common stacks:
#
# Python: pytest --tb=line -q
# Node: npm test -- --watchAll=false
# Go: go test ./... -count=1
# Rust: cargo test
# Ruby: bundle exec rspec --format progress
# Java: ./gradlew test
# Swift: swift test
# Flutter: flutter test
#
cd {project_root}
# Backend (use detected test command)
{backend_test_command} 2>&1 | tee /tmp/fix-baseline-backend.txt
# Frontend (if applicable, use detected test command)
{frontend_test_command} 2>&1 | tee /tmp/fix-baseline-frontend.txt
# Save pass/fail counts
echo "Baseline captured at $(date)" > .claude/.fix-baseline-summary.txt
grep -cE "passed|PASSED|ok|OK" /tmp/fix-baseline-backend.txt >> .claude/.fix-baseline-summary.txt || true
grep -cE "failed|FAILED|FAIL|ERROR" /tmp/fix-baseline-backend.txt >> .claude/.fix-baseline-summary.txt || true
Save to .claude/.fix-behavior-contract.md for the implementation and QA steps.
Goal: Apply the minimal fix within the scope lock, write regression tests.
4.1: Determine implementation approach
Based on severity:
| Severity | Approach |
|---|---|
| critical / --hotfix | Fix directly (no /planning, no team). Minimal change. |
| high | Fix directly with code-reviewer validation. |
| medium | Consider if /planning is needed for architectural changes. |
| low | Standard flow — may batch with other work. |
4.2: Spawn implementation agent
Select the right agent based on which layer the bug is in:
| Bug is in... | Spawn | subagent_type | Why |
|---|---|---|---|
| Backend (any language) | backend agent | python-backend | Adaptive — detects Python/Node/Go/Rust/Ruby/Java/Elixir at runtime |
| Frontend (any framework) | frontend agent | frontend | Adaptive — detects React/Vue/Svelte/Angular/etc. at runtime |
| Both layers | backend + frontend | Both agents | Scope lock keeps each in their zone |
| iOS/Android/Flutter/ML | custom agent (if /calibrate generated one) | Check .claude/agents/ for project-specific agents | /calibrate creates these for non-web platforms |
| Unknown | general-purpose | general-purpose | Fallback — reads project-profile.md for context |
If .claude/project-profile.md exists, read it to determine the platform and pick the right agent.
If /calibrate generated custom agents (e.g., ios-developer.md), use those for platform-specific bugs.
4.2b: Escalation protocol for fix agents
Include this in the implementation agent's prompt:
## When Your Fix Doesn't Work (MANDATORY)
1. After first failed attempt: re-read the root cause analysis from Step 1.
Is the root cause correct? If not, go back to Step 1.
2. After second failed attempt: consult knowledge base:
- .claude/knowledge/qa-knowledge/ (error keywords)
- .claude/knowledge/shared/conventions.md (project gotchas)
- git log --all --grep="<error keyword>" --oneline -10
3. After third failed attempt: STOP. Do not try another fix.
Generate a STUCK REPORT and send to team-lead:
- Error: [exact message]
- Root cause hypothesis: [from Step 1]
- Fix attempts: [1, 2, 3 with results]
- KB consultation results: [what you found]
- Recommendation: [re-investigate root cause / ask user for X / try different approach]
4. If a troubleshooter agent is available, team-lead may spawn one.
Agent prompt includes:
1. Root cause analysis from Step 1
2. Scope lock from Step 2 (Fix Zone ONLY)
3. Behavior contract from Step 3
4. Project profile context (from .claude/project-profile.md if exists)
5. Knowledge base conventions (from .claude/knowledge/shared/conventions.md if exists)
6. Past incidents in affected files (from .claude/qa-knowledge/incidents/)
INSTRUCTIONS:
- You may ONLY modify files listed in the Fix Zone: {fix_zone_files}
- If you need to modify a file NOT in the Fix Zone, STOP and explain why
- The root cause is: {root_cause_description} at {file}:{line}
- Your fix must satisfy the Behavior Contract:
- All "MUST CHANGE" rows must work after your fix
- All "MUST NOT CHANGE" rows must still work after your fix
- Write regression tests that:
a) Test the "MUST CHANGE" behaviors (prove the bug is fixed)
b) Would FAIL if the fix were reverted (prove the test catches this bug)
- Keep changes minimal — this is a bug fix, not a refactor
- Follow the project's coding conventions from project-profile.md / knowledge base
- Use the project's test framework (detected from profile or CLAUDE.md, not assumed)
4.3: Scope guard validation
After the agent finishes, verify scope compliance:
# What files were actually modified?
CHANGED=$(git diff --name-only)
# Check each against scope lock
for file in $CHANGED; do
if echo "{fix_zone_files}" | grep -q "$file"; then
echo "✓ $file (in Fix Zone)"
elif echo "{watch_zone_files}" | grep -q "$file"; then
echo "⚠ $file (in Watch Zone — needs justification)"
else
echo "✗ $file (in FROZEN ZONE — VIOLATION)"
fi
done
If ANY Frozen Zone file was modified → BLOCK and require explanation. If Watch Zone files were modified → WARN and require justification.
4.4: Regression test validation (Mutation Testing)
Verify the regression test actually catches the bug:
# 1. Save the fix
git diff > /tmp/fix.patch
# 2. Revert the fix temporarily
git checkout -- .
# 3. Run ONLY the new regression test — it MUST FAIL
pytest tests/test_auth_refresh_regression.py -v 2>&1
# Expected: FAIL (bug exists, test catches it)
# 4. Re-apply the fix
git apply /tmp/fix.patch
# 5. Run the regression test again — it MUST PASS
pytest tests/test_auth_refresh_regression.py -v 2>&1
# Expected: PASS (bug fixed, test confirms it)
If the test passes both with and without the fix → the test is USELESS. It doesn't actually verify the bug. Reject it and write a better test.
If the test fails both with and without the fix → the fix is BROKEN. The fix doesn't actually resolve the bug. Go back to Step 1.
Only valid result: FAIL without fix, PASS with fix.
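These outcomes can be encoded as a tiny classifier. A sketch — the two inputs are the regression test's result without and with the fix applied:

```shell
# Sketch: classify the mutation-test outcome.
# $1 = test result WITHOUT the fix, $2 = test result WITH the fix ("PASS"/"FAIL").
classify_regression_test() {
  case "$1/$2" in
    FAIL/PASS) echo VALID ;;         # test catches the bug, fix resolves it
    PASS/PASS) echo USELESS_TEST ;;  # test never fails → doesn't verify the bug
    FAIL/FAIL) echo BROKEN_FIX ;;    # fix doesn't actually resolve the bug
    *)         echo INVESTIGATE ;;   # any other combination needs manual review
  esac
}
```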
Goal: Two independent agents review the fix simultaneously from different angles — one checks correctness, one attacks it. Their cross-examination produces a more reliable verdict than a single reviewer could.
Team: fix-review-{slug}
Spawn these 2 agents in parallel:
| Agent | subagent_type | model | Role |
|---|---|---|---|
| fix-reviewer | code-reviewer | sonnet | 7-point correctness checklist |
| qa-attacker | qa-challenger | sonnet | Generate 3-5 attack scenarios against this specific fix |
4b.1: Spawn fix-reviewer
subagent_type: "code-reviewer"
model: "sonnet"
prompt: |
## Fix Review — Correctness Checklist
A bug fix has been implemented. Review it for correctness and completeness.
**Bug description**: {bug_description}
**Root cause**: {root_cause} at {file}:{line}
**Swarm verdict**: {convergence result from Step 1.3}
**Scope lock**:
- Fix Zone: {fix_zone_files}
- Watch Zone: {watch_zone_files}
**Behavior contract**:
{behavior_contract_table_from_step_3}
**The diff**:
Run `git diff` to see exactly what changed.
**Review checklist — answer each with ✓ or ✗ and explanation**:
1. **Root cause addressed?** Does this fix actually address the identified root cause,
or does it patch the symptom? Read the root cause, then read the diff — does the
change fix WHY the bug happened, not just WHERE it manifested?
2. **Complete fix?** Are there other code paths that have the same bug pattern?
Search for similar patterns in the codebase — the bug may exist in multiple places.
```bash
grep -rn "{pattern_from_root_cause}" --include="*.{ext}" | grep -v "{fixed_file}"
```
3. **Scope respected?** Are all changed files within the Fix Zone? Any Watch Zone
files touched without justification? Any Frozen Zone violations?
4. **Behavior contract honored?** For each "MUST NOT CHANGE" row — does the diff
risk changing that behavior? Trace each changed line's callers to check.
5. **Edge cases?** Does the fix handle:
- Null/undefined/empty inputs at the fix point?
- Concurrent access (if applicable)?
- Error paths (what if the fix itself throws)?
6. **Regression test quality?** Read the new test(s):
- Does the test actually exercise the bug scenario?
- Would the test fail if the fix were reverted? (Don't run it — reason about it)
- Does the test name clearly describe the bug it catches?
7. **Subtle issues?** Any of these:
- Type coercion or implicit conversion near the fix?
- Changed error messages that other code might match on?
- Modified function signatures that callers depend on?
- Performance implications (e.g., added a DB query in a hot path)?
**Output format**:
| Check | Result | Notes |
|---|---|---|
| Root cause addressed | ✓/✗ | {explanation} |
| Complete fix | ✓/✗ | {similar patterns found?} |
| Scope respected | ✓/✗ | {any violations?} |
| Behavior contract | ✓/✗ | {any risks?} |
| Edge cases | ✓/✗ | {any missing?} |
| Test quality | ✓/✗ | {assessment} |
| Subtle issues | ✓/✗ | {any found?} |
**Verdict**: APPROVE / REQUEST CHANGES / BLOCK
**Summary**: {1-2 sentence overall assessment}
{If REQUEST CHANGES or BLOCK: specific changes needed}
4b.2: Spawn qa-attacker (in parallel with fix-reviewer)
subagent_type: "qa-challenger"
model: "sonnet"
prompt: |
## Fix Attack — Adversarial QA
A bug fix has been implemented. Your job is to find ways it could fail or cause regressions.
Do NOT try to be fair — try to break it.
**Bug description**: {bug_description}
**Root cause**: {root_cause} at {file}:{line}
**The diff**: Run `git diff` to see exactly what changed.
Generate 3-5 attack scenarios targeting this specific fix. For each:
Attack {N}: {scenario name}
Input / trigger: {what causes this}
Expected (correct): {what should happen}
Attack outcome: {what could go wrong with this fix}
Severity: HIGH / MEDIUM / LOW
Exploits: {which aspect of the fix creates this risk}
Focus attacks on:
- Inputs the fix doesn't handle (null, empty, boundary values)
- Concurrent or race conditions introduced
- Callers of the changed function with different usage patterns
- Side effects on Watch Zone files
- The regression test's blind spots (inputs not covered)
- Conditions where the fix itself could throw
4b.3: Cross-examination (1 round after both complete)
Orchestrator performs cross-examination:
1. Share fix-reviewer's verdict with qa-attacker: "The fix-reviewer gave verdict {APPROVE/REQUEST CHANGES/BLOCK}. Do your attacks change their assessment? Which of your HIGH attacks did they miss?"
2. Share qa-attacker's attacks with fix-reviewer: "The qa-attacker found these scenarios: {attacks}. Are any of these covered by your findings? Do any change your verdict?"
4b.4: Verdict merge
| fix-reviewer | qa-attacker | Merged Verdict |
|---|---|---|
| APPROVE | 0 HIGH attacks | APPROVE |
| APPROVE | 1+ HIGH attacks | REQUEST CHANGES |
| REQUEST CHANGES | any | REQUEST CHANGES (union of feedback) |
| BLOCK | any | BLOCK |
| APPROVE (after cross-exam) | attacks downgraded | APPROVE |
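The merge table reduces to a function of the reviewer's verdict and the number of HIGH attacks still standing after cross-examination. A sketch:

```shell
# Sketch: merge the two reviewers' outputs per the verdict table.
# $1 = fix-reviewer verdict, $2 = count of HIGH attacks surviving cross-exam.
merge_verdict() {
  case "$1" in
    BLOCK)             echo "BLOCK" ;;             # blocker always wins
    "REQUEST CHANGES") echo "REQUEST CHANGES" ;;   # union of feedback
    APPROVE)
      if [ "$2" -ge 1 ]; then
        echo "REQUEST CHANGES"   # surviving HIGH attack overrides approval
      else
        echo "APPROVE"
      fi ;;
  esac
}
```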
4b.5: Handle merged verdict
| Verdict | Action |
|---|---|
| APPROVE | Proceed to Step 5 (Differential Testing). |
| REQUEST CHANGES | Send combined feedback (checklist findings + unaddressed attacks) to the implementing agent. Re-run Step 4b after changes. Max 2 rounds — if not resolved, escalate to user. |
| BLOCK | STOP. Present the blocker to the user. Common blockers: fix addresses symptom not root cause, same bug pattern exists elsewhere unfixed, scope violation. |
4b.6: Check for incomplete fixes (same pattern elsewhere)
If fix-reviewer found similar patterns (check #2), decide:
Reviewer found the same bug pattern in 2 other files:
- backend/app/services/session.py:45 — same missing null check
- backend/app/services/token.py:78 — same missing null check
Options:
[1] Fix all instances now (expand Fix Zone)
[2] Fix only the reported bug, create issues for the others
[3] Ask user
Default to option 2 for medium severity. Default to option 1 for critical/high.
Always inform the user of the other instances regardless.
Cost: Two Sonnet agent calls + 1 cross-examination round (~2-4 min). The attacker surfaces attack vectors the reviewer's structured checklist would miss.
Skip conditions: NOT skippable, even in --hotfix mode — a bad fix shipped fast is worse than a good fix shipped 2 minutes later. The fix-reviewer always runs; only the qa-attacker is dropped in hotfix mode.
Goal: Prove the fix changes ONLY bug behavior, nothing else.
5.1: Run full test suite after fix
# Use the SAME test commands as the baseline capture (Step 3.3)
# Backend (detected test command)
{backend_test_command} 2>&1 | tee /tmp/fix-after-backend.txt
# Frontend (if applicable, detected test command)
{frontend_test_command} 2>&1 | tee /tmp/fix-after-frontend.txt
5.2: Diff against baseline
# Compare before vs after
diff /tmp/fix-baseline-backend.txt /tmp/fix-after-backend.txt
Expected results:
| Change | Meaning | Action |
|---|---|---|
| FAIL → PASS (bug tests) | Bug is fixed | ✓ Expected |
| New tests PASS | Regression tests work | ✓ Expected |
| No change (all other tests) | No side effects | ✓ Expected |
| PASS → FAIL (unrelated test) | FIX HAS SIDE EFFECTS | ✗ Investigate — fix broke something else |
| PASS → FAIL (related test) | Fix may be incomplete or wrong | ✗ Investigate — may need broader fix |
If any PASS → FAIL: the fix is NOT safe. Either it has side effects (revert and re-investigate) or it is incomplete (broaden the fix and re-run).
Do NOT proceed to PR until all PASS→FAIL cases are resolved.
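Comparing per-test status makes the FAIL→PASS / PASS→FAIL classification mechanical, rather than eyeballing a raw diff. A sketch, assuming each results file has first been reduced to sorted `test_id STATUS` lines (the extraction step depends on the real runner's output format):

```shell
# Sketch: per-test differential between baseline and post-fix results.
# $1, $2 = files of "test_id STATUS" lines (STATUS = PASSED|FAILED),
# both sorted by test_id so join can match them up.
diff_results() {
  join "$1" "$2" | awk '
    $2 == "PASSED" && $3 == "FAILED" { print "REGRESSION: " $1 }
    $2 == "FAILED" && $3 == "PASSED" { print "FIXED: " $1 }'
}
```

Any `REGRESSION:` line is a PASS → FAIL case and blocks the PR; `FIXED:` lines should match exactly the "MUST CHANGE" rows of the behavior contract.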
5.3: Behavior contract verification
Check each row of the behavior contract table:
✓ MUST CHANGE: refresh("valid_token") → 200 + TokenPair (was 500)
✓ MUST CHANGE: refresh("tampered_token") → 401 (was 500)
✓ MUST NOT CHANGE: login("user", "pass") → 200 + TokenPair (unchanged)
✓ MUST NOT CHANGE: login("user", "wrong") → 401 (unchanged)
✓ MUST NOT CHANGE: refresh("expired_token") → 401 (unchanged)
✓ MUST NOT CHANGE: GET /me → 200 + User (unchanged)
✓ INVARIANT: All tokens have user_id (FK intact)
After all verification steps pass, this is the coordinated handoff to PR. No redundant QA runs — Step 5 already ran the full test suite for differential testing.
Step 5 ran the full test suite for differential comparison, but that's not the same as a proper QA pass. Ask the user:
Fix verified. What level of QA should I run before PR?
[1] commit — targeted checks on changed files only (~1-3 min)
[2] full — comprehensive checks across full codebase (~10-20 min)
[3] skip — differential test was enough, go straight to PR
Default: commit (recommended — catches issues differential testing misses like lint, types, domain rules)
Store choice as QA_LEVEL.
If user chose skip, skip to Step 6.3.
Otherwise, invoke /qa {QA_LEVEL} via the Skill tool:
/qa {QA_LEVEL}

On QA findings:
- yes → fix issues, re-run QA
- skip → proceed (user takes responsibility)
- abort → stop

Record QA results as QA_RESULT for the PR body.
Check if local services exist — read CLAUDE.md ## Local Dev Services table.
If the table exists and has entries:
- Run /restart if the project has that skill configured

After restart (or if no local services), ask the user:
Local servers restarted. Please test the fix manually — verify the bug is gone
and nothing else broke.
When you're done:
[ready] — looks good, proceed to PR
[issues] — found problems (describe them)
If issues: Address problems, re-run /qa commit only, ask again.
If ready: Proceed to Step 6.4.
Invoke /pr --skip-qa with bug-specific PR content. The --skip-qa flag tells /pr
to skip its own /qa commit since QA already ran in Step 6.2.
/pr will only:
PR template for bug fixes (passed to /pr):
## Bug Fix: {title}
**Severity**: {critical|high|medium|low}
**Root Cause**: {precise description from Step 1}
**Location**: {file}:{line}
### What was broken
{Description of the bug behavior}
### Why it was broken
{Root cause explanation — what assumption was violated}
### What this fix does
{Description of the changes — minimal, precise}
### Scope
- Fix Zone: {files modified}
- Watch Zone: {files checked but not modified}
- Frozen Zone: {verified untouched}
### Behavior Contract
| Behavior | Before | After |
|----------|--------|-------|
| {bug behavior} | {broken} | {fixed} |
| {preserved behavior} | {same} | {same} |
### Test Evidence
- Root cause: investigation swarm converged (Step 1.3) — {convergence type}
- Fix review: approved by review swarm (Step 4b) — fix-reviewer + qa-attacker
- Regression test: `{test_file}::{test_name}`
- ✓ Fails without fix (catches the bug)
- ✓ Passes with fix (confirms the fix)
- Differential test: {N} tests unchanged, {M} tests fixed, 0 regressions
- QA: {QA_LEVEL} mode — {QA_RESULT summary}
- User testing: confirmed manually
### Rollback Plan
{How to revert if this fix causes problems — typically: revert this commit}
Closes #{issue_number}
PR #{number} created: {url}
What's next?
[1] Fix another bug — /fix #N or /fix <description>
[2] Start a feature — /planning <feature>
[3] See project status — /onboard
[4] Done for now
If autopilot is active, skip this and continue the loop.
After the PR is created, update the knowledge base:
Create incident record:
# .claude/qa-knowledge/incidents/{date}-{slug}.md
---
status: covered
severity: {severity}
affected_files: [{file1}, {file2}]
root_cause: {description}
fix_pr: #{pr_number}
regression_test: {test_file}::{test_name}
created: {date}
---
## What happened
{Bug description}
## Root cause
{From Step 1}
## How QA missed it
{Why existing tests didn't catch this}
## Prevention
{What kind of test would have prevented this}
Update bug patterns:
# Append to .claude/qa-knowledge/bug-patterns.md
echo "### {date} — {title}" >> .claude/qa-knowledge/bug-patterns.md
echo "- Area: {affected_area}" >> .claude/qa-knowledge/bug-patterns.md
echo "- Pattern: {root_cause_pattern}" >> .claude/qa-knowledge/bug-patterns.md
echo "- Prevention: {test_type}" >> .claude/qa-knowledge/bug-patterns.md
Update knowledge base (if .claude/knowledge/ exists):
- shared/conventions.md if the bug revealed a coding convention violation
- shared/domain.md if the bug revealed a business rule not in code
- agents/{agent-name}.md if the implementing agent learned something

For critical production bugs, the pipeline is compressed:
| Step | Standard | Hotfix |
|---|---|---|
| 1. Root Cause | Investigation Swarm (3 agents, debate) | Single backward-tracer (Sonnet, 5 min max) |
| 2. Scope Lock | Full dependency graph | Direct file only — hop 0-1 |
| 3. Behavior Contract | Full table | Bug behavior + 3 critical paths only |
| 4. Implement | Spawn agent team | Fix directly — single agent |
| 4b. Fix Review | Review Swarm (2 agents, cross-examine) | Single fix-reviewer (no cross-examine) |
| 5. Differential Test | Full suite | Critical path tests only |
| 6. PR | Full template | Abbreviated — merge fast |
| 7. Knowledge Base | Full update | Post-merge (don't block the fix) |
Hotfix STILL requires:
| Severity | QA Mode | Review | Merge |
|---|---|---|---|
| critical | Full QA + critical paths + E2E | Bug-specific + SRE review | Expedited |
| high | Full QA | Bug-specific review | Standard |
| medium | Commit mode QA | Standard review | Standard |
| low | Commit mode QA | Standard review | Batched |
| Mode | Agents | Estimated Cost | When |
|---|---|---|---|
| Hotfix (investigation) | 1 Sonnet backward-tracer | ~$0.005 | --hotfix flag |
| Quick (Lite) | 1 Sonnet analyst | ~$0.007 | Simple / obvious bug |
| Verified (Lite swarm) | 1 Sonnet analyst + 1 Sonnet verifier | ~$0.014 | Moderate bugs |
| Full Swarm (investigation) | 2 Sonnet + 1 Haiku + debate rounds | ~$0.045 | Complex / unclear bugs |
| Review Swarm (hotfix) | 1 Sonnet fix-reviewer | ~$0.006 | --hotfix fix review |
| Review Swarm (standard) | 2 Sonnet + cross-exam | ~$0.018 | Standard fix review |
| Full pipeline (standard) | All of the above | ~$0.08 average | Default |
| Full pipeline (hotfix) | Compressed | ~$0.021 | --hotfix |
Delta from previous single-agent pattern: ~3.8x more per full pipeline run. One prevented misdiagnosis (wrong root cause → wrong fix → debugging cycle → re-fix) saves the entire investigation and implementation time — the swarm pays for itself on the first catch.
If no test command is documented, fall back to common defaults: npm test / jest / vitest for Node, go test ./... for Go, cargo test for Rust, dotnet test for C#, swift test for Swift. Read the CLAUDE.md Test Commands table for the project's actual commands.
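When no Test Commands table exists, the stack can also be guessed from marker files. A sketch — the marker-to-command mapping is an assumption and is always overridden by documented commands:

```shell
# Sketch: guess a default test command from common stack marker files.
# Documented commands in CLAUDE.md always take precedence over this guess.
detect_test_command() {  # $1 = project root; prints a command or UNKNOWN
  if   [ -f "$1/package.json" ];  then echo "npm test"
  elif [ -f "$1/go.mod" ];        then echo "go test ./..."
  elif [ -f "$1/Cargo.toml" ];    then echo "cargo test"
  elif [ -f "$1/Package.swift" ]; then echo "swift test"
  elif [ -f "$1/pyproject.toml" ] || [ -f "$1/pytest.ini" ]; then
    echo "pytest"   # assumption: pytest is the usual runner for these markers
  else
    echo "UNKNOWN"  # ask the user rather than guessing further
  fi
}
```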