From claude-swe-workflows
Proactively hunts bugs by assessing codebase risks via complexity, coverage gaps, and structural analysis, then writes reproducing tests for high-risk hotspots. Use before releases to surface confirmed issues rather than speculative reports.
npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows

This skill uses the workspace's default tool permissions.
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
Systematically hunts for bugs before they reach users. An assessor analyzes the codebase to identify high-risk hotspots by cross-referencing code complexity, test coverage gaps, and structural risk factors. Focused hunters then deep-dive into each hotspot, writing reproducing tests to validate or invalidate suspected bugs.
This is deliberately thorough. Each suspected bug gets a reproducing test — no speculative reports. The goal is confirmed findings with evidence, not a noisy list of maybes.
┌──────────────────────────────────────────────────────┐
│                  BUG HUNT WORKFLOW                   │
├──────────────────────────────────────────────────────┤
│ 1. Determine scope                                   │
│ 2. Spawn assessor (risk analysis)                    │
│    └─ Output: ranked hotspot list + coverage map     │
│ 3. For each hotspot:                                 │
│    └─ Spawn hunter (investigation + repro tests)     │
│    └─ Prior findings passed to subsequent hunters    │
│ 4. Synthesize findings                               │
│ 5. Present consolidated findings to user             │
│ 6. Optionally route findings to fixers               │
│ 7. Optionally commit reproducing tests               │
└──────────────────────────────────────────────────────┘
Default: production code only. Excluded by default: test code, vendored code, and generated code. Inform the user of these exclusions.

Ask the user:
- What is the scope of the hunt? (entire codebase or a specific area)
- Is there anything they're particularly worried about?
- Is there anything to skip beyond the default exclusions?

User concerns influence prioritization but don't replace systematic analysis.
Spawn a swe-bug-assessor agent:
You are the risk assessor for a proactive bug hunt. Your analysis will guide
focused investigators who will deep-dive into the hotspots you identify.
Scope: [entire codebase | user-specified scope]
User concerns: [any areas mentioned, or "none specified"]
Exclusions: [test code, vendored code, generated code, plus any user additions]
Perform your full methodology:
1. Map the codebase — language, framework, structure, entry points
2. Coverage analysis — use instrumented coverage if available, fall back to
manual inspection
3. Complexity analysis — identify functions with high cognitive complexity
4. Structural risk analysis — error handling gaps, input validation gaps,
shared mutable state, resource management issues, concurrency risks,
edge case blindness, consistency gaps
5. Git enrichment (optional) — churn hotspots, recent large changes
6. Cross-reference signals and produce a ranked hotspot list
Focus on hotspots where MULTIPLE signals converge — complex AND untested AND
structurally risky. Single-signal hotspots are lower priority.
Output your full assessment in your standard format.
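As a rough sketch of step 6 of the methodology, the snippet below shows one way cross-referenced signals could be scored and ranked. Everything in it is an illustrative assumption: the Hotspot fields, thresholds, and weights are invented, and the skill's assessor is an agent performing this analysis, not a program. The point it demonstrates is that hotspots where multiple signals converge outrank single-signal ones.

```go
package main

import (
	"fmt"
	"sort"
)

// Hotspot is a hypothetical record of the risk signals gathered for one
// function; field names and weights are assumptions for illustration only.
type Hotspot struct {
	Target     string  // e.g. "payment/converter.go:ConvertAmount"
	Coverage   float64 // fraction of lines covered, 0.0-1.0
	Complexity int     // cognitive-complexity estimate
	Structural int     // number of structural risk factors observed
	Churn      int     // recent commits touching the file (optional signal)
}

// score rewards convergence: every elevated signal adds to the score, and the
// total is multiplied by the number of elevated signals, so a hotspot that is
// complex AND untested AND structurally risky outranks any single-signal one.
func score(h Hotspot) float64 {
	s, signals := 0.0, 0
	if h.Coverage < 0.5 {
		s += (0.5 - h.Coverage) * 10
		signals++
	}
	if h.Complexity > 10 {
		s += float64(h.Complexity-10) * 0.5
		signals++
	}
	if h.Structural > 0 {
		s += float64(h.Structural) * 2
		signals++
	}
	if h.Churn > 3 {
		s += float64(h.Churn)
		signals++
	}
	return s * float64(signals)
}

func main() {
	hotspots := []Hotspot{
		{"payment/converter.go:ConvertAmount", 0.0, 18, 2, 1},
		{"payment/checkout.go:ProcessCheckout", 0.4, 12, 3, 5},
		{"internal/util/strings.go:Slugify", 0.0, 4, 0, 0},
	}
	sort.Slice(hotspots, func(i, j int) bool { return score(hotspots[i]) > score(hotspots[j]) })
	for _, h := range hotspots {
		fmt.Printf("%6.1f  %s\n", score(h), h.Target)
	}
}
```

Under this kind of composite scoring, an untested, deeply nested, structurally risky function rises to the top of the queue even if each individual signal looks only moderately bad.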
When the assessor reports back: Review the hotspot list. This drives the investigation phase.
For each hotspot in the assessor's list (ALL priorities), spawn a dedicated swe-bug-hunter agent:
You are a focused bug hunter investigating a specific hotspot.
## YOUR HOTSPOT
Target: [from assessor's report]
Files: [from assessor's report]
Risk signals: [from assessor's report]
Hypothesis: [from assessor's report]
Investigation approach: [from assessor's report]
## PRIOR FINDINGS (if any)
[Findings from previous hunters — confirmed bugs, patterns observed]
## YOUR MISSION
Deep-dive into this hotspot. Systematically probe for bugs. For each
suspected issue, write a reproducing test that encodes the correct expected
behavior.
- If the test FAILS: bug confirmed. Keep the test. Document the finding.
- If the test PASSES: hypothesis invalidated. Evaluate whether the test
  improves coverage:
  - Covers a previously untested path → keep it
  - Redundant with existing tests → delete it
Every confirmed finding must have a reproducing test. No speculative reports.
Note any patterns that might apply to other hotspots.
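To make "a reproducing test that encodes the correct expected behavior" concrete, here is a minimal Go sketch. The roundToMinorUnits function is a hypothetical stand-in (in a real hunt the test targets existing production code), but the shape is the one the hunter follows: assert what should happen, and keep the test when it fails.

```go
package payment

import "testing"

// roundToMinorUnits is a hypothetical stand-in for the function under
// suspicion; in a real hunt the test targets existing production code.
// Illustrative bug: it truncates every amount to a multiple of 100 minor
// units, destroying value for zero-decimal currencies such as JPY.
func roundToMinorUnits(amount int64, currency string) int64 {
	return amount / 100 * 100
}

// The reproducing test asserts the CORRECT behavior, so it fails while the
// bug exists and becomes a regression guard once the bug is fixed.
func TestRoundToMinorUnits_ZeroDecimalCurrency(t *testing.T) {
	// JPY has no minor units, so ¥101 must survive rounding unchanged.
	if got := roundToMinorUnits(101, "JPY"); got != 101 {
		// A failure here CONFIRMS the bug: keep the test, document the finding.
		t.Fatalf("roundToMinorUnits(101, JPY) = %d, want 101", got)
	}
}
```

Had the test passed instead, the hunter would keep it only if it exercised a previously untested path; otherwise it is deleted as redundant.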
Run hunters sequentially, not in parallel. Each hunter's findings and pattern observations are passed to the next. This enables cross-hotspot pattern detection — if hunter 2 finds that error handling is broken in module A, hunter 5 (investigating module B which shares error-handling utilities) gets that context.
Pass prior findings to each new hunter. As findings accumulate, each subsequent hunter receives confirmed bugs and observed patterns from previous investigations.
After all hunters have reported, synthesize:
- Cross-cutting analysis: look across the hunters' reports for issues that recur in multiple hotspots.
- Pattern escalation: systemic patterns may warrant a follow-up /refactor rather than piecemeal fixes.

Compile all findings into a single report:
## Bug Hunt Summary
Scope: [what was analyzed]
Assessment: [N hotspots identified across X files]
Hotspots investigated: [N]
Confirmed bugs: N (X critical, Y high, Z medium, W low)
Coverage improvements: N tests added
Systemic patterns: N
## CONFIRMED BUGS
### CRITICAL
- **[file:line — description]**
- Bug: [concrete description]
- Root cause: [why it exists]
- Impact: [what happens in practice]
- Reproducing test: [test file:test name]
- Fix guidance: [what needs to change]
### HIGH
[same format]
### MEDIUM
[same format]
### LOW
[same format]
## SYSTEMIC PATTERNS
[Cross-cutting issues observed across multiple hotspots]
- [pattern] — observed in [locations] — suggests [recommendation]
## COVERAGE IMPROVEMENTS
[Tests added that didn't find bugs but improved coverage]
- [test name] in [file] — covers [what]
## SUSPECTED BUT UNCONFIRMED
[Issues suspected but not validated with tests — lower confidence]
- [description] — couldn't test because [reason]
## AREAS NOT INVESTIGATED
[Hotspots deprioritized or areas outside scope that may warrant future attention]
Present to user interactively. Walk through CRITICAL findings first. For each, explain the bug, the impact, and show the reproducing test. Let the user ask questions before moving on.
After presenting findings, ask: "Would you like to route confirmed bugs to agents for fixing?"
If yes:
- Route each confirmed bug to an appropriate fixer agent (e.g., a language SME), passing the finding and its reproducing test as acceptance criteria.
- After fixes land, dispatch a qa-engineer to verify: the reproducing tests now pass and no regressions were introduced.
If no: The report and reproducing tests stand on their own.
If the user does not route to fixers (or after fixes are complete), ask: "Would you like to commit the reproducing tests? They document the bugs and improve coverage."
If yes: commit the reproducing tests, noting in the commit message which confirmed bugs they document.
If no: Leave tests uncommitted for the user to handle.
Sequential execution within investigation phase. The assessor runs first, then hunters run sequentially so findings accumulate for pattern detection.
Fresh instances for every agent. Each agent gets a clean context window dedicated entirely to its task.
State to maintain (as orchestrator):
Abort investigation of a hotspot:
Abort entire workflow:
Do NOT abort for:
Relationship to /review-security:
- /review-security is security-focused — blue team + red team methodology
- /bug-hunt targets correctness bugs — logic errors, edge cases, missing error handling
- /review-security asks "can an attacker exploit this?" while /bug-hunt asks "will this fail for a normal user?"

Relationship to /bug-fix:
- /bug-fix is reactive — fixes a known, reported bug
- /bug-hunt is proactive — finds bugs before they're reported
- Confirmed findings can be handed to /bug-fix for thorough remediation of complex issues

Relationship to /review-test:
- /review-test focuses on test quality — coverage gaps, brittle tests, missing fuzz tests
- /bug-hunt uses coverage data as one input signal but focuses on finding actual bugs, not improving test quality
- Coverage improvements from /bug-hunt are a side effect, not the primary goal

Relationship to /refactor:
- Systemic patterns found by /bug-hunt (e.g., "inconsistent error handling across 15 modules") may warrant a follow-up /refactor
- /bug-hunt identifies the pattern; /refactor fixes it systematically

Example session:

> /bug-hunt
What is the scope of the hunt?
> Focus on the payment processing module — we've had some edge case reports
Anything you're particularly worried about?
> Currency conversion and rounding — we support 30+ currencies now
Anything to skip beyond the defaults?
> No, defaults are fine
Starting proactive bug hunt...
[Phase 1 — Risk Assessment]
Spawning assessor...
Assessment report:
Coverage: 67% line coverage (instrumented via go test -cover)
Hotspots identified: 8 (3 critical, 3 high, 2 medium)
CRITICAL-1: payment/converter.go:ConvertAmount (lines 45-112)
Signals: 0% test coverage + deep nesting (6 levels) + floating-point
arithmetic
Hypothesis: Currency conversion may lose precision or handle edge
currencies incorrectly
CRITICAL-2: payment/checkout.go:ProcessCheckout (lines 23-89)
Signals: Partial coverage (happy path only) + error handling
inconsistency + 3 bug-fix commits in last month
Hypothesis: Error paths may leave order in inconsistent state
CRITICAL-3: payment/refund.go:CalculateRefund (lines 15-78)
Signals: No test coverage + complex conditional logic + shared mutable
state (order object)
Hypothesis: Partial refund calculations may be incorrect for multi-item
orders
HIGH-1: payment/currency/rates.go:FetchRates (lines 30-67)
Signals: No error path tests + external API dependency + no timeout
handling
...
[Phase 2 — Focused Investigation]
Spawning hunter for CRITICAL-1 (ConvertAmount)...
Test 1: TestConvertAmount_ZeroCurrencyPrecision — FAIL
Bug confirmed: JPY (0-decimal currency) conversion multiplies by 100
then divides by 100, losing the original integer value for odd amounts.
Impact: ¥101 → ¥100 (1 yen lost per odd-amount transaction)
Test 2: TestConvertAmount_SameCurrency — PASS
Kept: covers previously untested identity conversion path
Test 3: TestConvertAmount_NegativeAmount — FAIL
Bug confirmed: Negative amounts (credits/adjustments) bypass validation
and produce positive conversion results due to Abs() call without
sign restoration.
Impact: -$10.00 credit → +€8.50 charge
Test 4: TestConvertAmount_UnknownCurrency — PASS
Kept: covers error path for unsupported currency codes
Findings: 2 confirmed bugs, 2 coverage improvements
Spawning hunter for CRITICAL-2 (ProcessCheckout)...
Test 1: TestProcessCheckout_PaymentFailureCleanup — FAIL
Bug confirmed: When payment gateway returns error after inventory was
reserved, inventory reservation is not released. Order stuck in
"processing" state.
Impact: Phantom inventory holds that never clear (requires manual DB fix)
Pattern noted: cleanup-on-error-path is missing in 3 other functions
in this package (passed to next hunter)
...
[Phase 3 — Synthesis]
Confirmed bugs: 7 (3 critical, 3 high, 1 medium)
Coverage improvements: 9 tests added
Systemic pattern: Error-path cleanup is missing in 5 of 12 functions
that reserve resources — this is a codebase-wide pattern, not isolated.
## Bug Hunt Summary
[Full report...]
Would you like to route confirmed bugs to agents for fixing?
> Yes, fix the criticals
[Routing CRITICAL bugs to Go SME...]
[Reproducing tests serve as acceptance criteria — fix is done when they pass]
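For findings like CRITICAL-2 above, the reproducing test usually simulates the failing dependency and asserts that cleanup still happens. The sketch below is hypothetical: the transcript doesn't show the real checkout code, so the fakeInventory, failingGateway, and processCheckout shapes are invented. It illustrates why such a test doubles as the acceptance criterion for the fix: it fails until the missing cleanup is added.

```go
package payment

import (
	"errors"
	"testing"
)

// Hypothetical collaborators standing in for the real checkout dependencies.
type fakeInventory struct{ reserved, released bool }

func (f *fakeInventory) Reserve() error { f.reserved = true; return nil }
func (f *fakeInventory) Release()       { f.released = true }

type failingGateway struct{}

func (failingGateway) Charge() error { return errors.New("gateway timeout") }

// processCheckout is a stand-in containing the suspected bug: it reserves
// inventory, then returns early on a payment error without releasing it.
func processCheckout(inv *fakeInventory, gw failingGateway) error {
	if err := inv.Reserve(); err != nil {
		return err
	}
	if err := gw.Charge(); err != nil {
		return err // BUG: inv.Release() is never called on this path
	}
	return nil
}

// Reproducing test: encodes the correct behavior (reservation released after
// a payment failure), so it fails until the cleanup is added and then serves
// as a regression guard.
func TestProcessCheckout_PaymentFailureReleasesInventory(t *testing.T) {
	inv := &fakeInventory{}
	if err := processCheckout(inv, failingGateway{}); err == nil {
		t.Fatal("expected a payment error")
	}
	if !inv.released {
		t.Fatal("inventory reservation was not released after payment failure")
	}
}
```

Once the missing Release() call is added to the error path, a test of this shape passes and stays in the suite as permanent coverage of the cleanup behavior.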