From armory
Conducts hypothesis-driven debugging: ranks hypotheses, designs git bisections, plans instrumentation, creates minimal reproductions for non-obvious bugs.
Install:

```
npx claudepluginhub mathews-tom/armory --plugin armory
```

This skill uses the workspace's default tool permissions.
Structured debugging methodology that replaces ad-hoc exploration with hypothesis-driven investigation. Captures symptoms, analyzes evidence (stacktraces, logs, state), generates ranked hypotheses, designs bisection strategies, identifies instrumentation points, and produces minimal reproductions — documenting every step so dead ends are never revisited.
## When to Use This Skill

The base model handles straightforward debugging (clear stacktraces, obvious errors) natively. Use this skill for non-obvious bugs requiring systematic investigation: intermittent failures, bugs with no clear stacktrace, performance regressions, or issues requiring git bisection and hypothesis ranking.
## Reference Files

| File | Contents | Load When |
|---|---|---|
| `references/stacktrace-patterns.md` | Exception taxonomy, traceback reading, common Python/JS error signatures | Stacktrace or exception present |
| `references/hypothesis-templates.md` | Bug category catalog, probability ranking, confirmation/refutation tests | Always |
| `references/bisection-guide.md` | git bisect workflow, binary search debugging, narrowing techniques | Bug appeared after a change |
| `references/log-analysis.md` | Log pattern extraction, anomaly detection, timeline correlation | Log output available |
| `references/instrumentation-points.md` | Strategic logging placement, breakpoint strategy, state inspection techniques | Investigation plan needed |
## Workflow

### Capture the Symptom

Before touching code, document the observable problem. "It doesn't work" is not a symptom; "KeyError('user_id') on line 42 of auth.py when calling get_current_user() with a valid session token" is actionable. Check recent history with `git log --oneline -20`: if the bug appeared after a specific commit, bisection is the fastest path.

### Analyze the Evidence

Examine all available evidence before forming hypotheses:
- **Stacktrace interpretation:** If a traceback exists, read it bottom-up. The last frame is where the error manifested, but the cause is often several frames up. Identify the exception type, the origin frame, and the call chain (see `references/stacktrace-patterns.md`).
- **Log pattern extraction:** Search logs for anomalies, when they started, and correlated events (see `references/log-analysis.md`).
- **State inspection:** If the system is running, inspect live variable values, configuration, and external dependencies.
- **Code diff analysis:** If the bug is recent, run `git diff HEAD~5` to see what changed.

### Generate Ranked Hypotheses

Never start fixing without a hypothesis.
List 3-5 hypotheses ranked by likelihood. Each hypothesis must include a specific claim, a confirming test that would support it, and a refuting test that would disprove it. Rank by likelihood using how well the evidence fits, how recently related code changed, and how simple the explanation is. Common bug categories are cataloged in `references/hypothesis-templates.md`.
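The bottom-up traceback reading described under evidence analysis can be seen on a toy example. All names here are hypothetical, chosen to mirror the KeyError('user_id') symptom used earlier:

```python
import traceback

def get_current_user(session):
    # Last frame: where the error manifests.
    return session["user_id"]

def handle_request(raw):
    # One frame up: the actual mistake, wrapping the session
    # instead of passing it through.
    return get_current_user({"session": raw})

try:
    handle_request({"user_id": "u1"})
except KeyError:
    tb = traceback.format_exc()
    print(tb)
```

Reading bottom-up: the final frame points at `session["user_id"]`, and the frame above it (`handle_request`) shows the call that supplied the wrong dict; the fix belongs there, not at the failing line.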
### Design the Investigation Plan

Design specific steps to test each hypothesis:

- If the bug appeared after a change, plan a bisection with `git bisect start <bad> <good>` (see `references/bisection-guide.md` for the workflow).
- Identify instrumentation points (see `references/instrumentation-points.md`).

### Execute the Plan

Execute the investigation plan, updating hypotheses as evidence arrives.
### Document the Outcome

After finding the root cause, record the full investigation using this template:
## Debug Investigation: {Brief Description}
### Symptom
**Observed:** {What is happening — precise description}
**Expected:** {What should happen}
**Reproducibility:** {Always | Intermittent (~N% of attempts) | Once}
**First noticed:** {Date/time or triggering event}
**Environment:** {Relevant versions and configuration}
### Evidence Analysis
#### Stacktrace
- **Exception:** {type}: {message}
- **Origin:** {file}:{line} in {function}
- **Call chain:** {caller} → {caller} → {failure point}
- **Key insight:** {What the traceback reveals about the cause}
#### Logs
- **Anomaly:** {What is unusual}
- **Timeline:** {When the anomaly started}
- **Correlation:** {Related events}
#### Code Changes
- **Recent commits:** {relevant commits since last known-good state}
- **Files in error path:** {which changed files appear in the traceback}
### Hypotheses
| # | Hypothesis | Likelihood | Confirming Test | Refuting Test |
|---|------------|------------|-----------------|---------------|
| H1 | {Specific claim} | High | {What to check} | {What would disprove} |
| H2 | {Specific claim} | Medium | {What to check} | {What would disprove} |
| H3 | {Specific claim} | Low | {What to check} | {What would disprove} |
### Investigation Plan
#### Step 1: Test H1 — {action}
- **Command/action:** {specific step}
- **If confirmed:** {next action — fix}
- **If refuted:** proceed to Step 2
#### Step 2: Bisection
- **Good commit:** {hash}
- **Bad commit:** {hash}
- **Test:** {command to verify each commit}
- **Command:** `git bisect start {bad} {good}`
#### Step 3: Isolation
- **Remove:** {variable to eliminate}
- **Expected change:** {what should happen}
### Instrumentation Points
1. {file}:{line} — log {variable/state} to observe {what}
2. {file}:{line} — breakpoint to inspect {what}
### Minimal Reproduction
```{language}
# Minimal code that triggers the bug
{code}
```

### Resolution
- **Root cause:** {What was wrong}
- **Fix:** {What was changed: file:line, diff summary}
- **Prevention:** {Test added, lint rule, type annotation, etc.}
- **Lessons:** {What generalizes beyond this bug}
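To make the instrumentation-points idea concrete, here is a minimal Python sketch (the function and logger names are invented for illustration): a debug log placed immediately before the suspect access, so the next failure reveals the state that caused it.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("investigation")

def get_current_user(session: dict) -> str:
    # Instrumentation point: capture the observable state immediately
    # before the suspect line, so a failure shows exactly which keys
    # were actually present.
    log.debug("session keys before lookup: %s", sorted(session))
    return session["user_id"]
```

One well-placed log line like this often distinguishes "the key was never set" from "the wrong dict was passed" without attaching a debugger.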
## Configuring Scope
| Mode | Scope | Depth | When to Use |
|------|-------|-------|-------------|
| `quick` | Single error | H1 test + fix | Clear stacktrace, obvious cause |
| `standard` | Full investigation | 3 hypotheses + bisection plan | Default for non-obvious bugs |
| `deep` | Systemic analysis | 5+ hypotheses + instrumentation + reproduction | Intermittent bugs, no stacktrace, production issues |
## Calibration Rules
1. **Hypotheses before code changes.** Never start modifying code without at least one
explicit hypothesis. "Let me try this" is not debugging — it's guessing.
2. **One variable at a time.** Each investigation step should change exactly one thing.
If you change two things and the bug disappears, you don't know which fixed it.
3. **Document dead ends.** Failed hypotheses are valuable — they narrow the search space.
Record what was tested and what was learned.
4. **Simplest explanation first.** Test typos, wrong variable names, and missing imports
before considering race conditions, compiler bugs, or cosmic rays.
5. **Reproduce before fixing.** If you cannot reproduce the bug in a controlled environment,
any fix is speculative. Invest in reproduction first.
6. **Root cause, not symptoms.** A fix that addresses the symptom (adding a null check)
without understanding the root cause (why was it null?) leaves the real bug alive.
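Rule 5 in practice: a minimal reproduction keeps only the failing code path and the offending input. A sketch for the hypothetical KeyError('user_id') symptom (all names invented):

```python
# Everything incidental (web framework, database, auth middleware) is
# removed; only the failing lookup and the offending input remain.
def get_current_user(session: dict) -> str:
    return session["user_id"]

# A session produced by the suspect code path is missing the key:
bad_session = {"token": "abc123"}

try:
    get_current_user(bad_session)
    outcome = "no bug"
except KeyError as exc:
    outcome = f"reproduced: KeyError({exc})"

print(outcome)  # reproduced: KeyError('user_id')
```

Once a reproduction this small fails reliably, any candidate fix can be verified in seconds instead of by redeploying the full system.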
## Error Handling
| Problem | Resolution |
|---------|------------|
| No stacktrace available | Focus on log analysis and state inspection. Use instrumentation to generate diagnostic output. |
| Bug is intermittent | Add persistent logging at key decision points. Run under stress (high load, concurrent requests) to increase reproduction rate. |
| Cannot reproduce locally | Compare environments systematically: versions, config, data, timing. Use `docker` or VM to mirror production. |
| Multiple hypotheses equally likely | Design a single test that distinguishes between them. Binary decision: "If X, then H1; if Y, then H2." |
| Fix attempted but bug persists | The hypothesis was wrong. Revert the fix, update hypothesis rankings, and proceed to the next hypothesis. Do not stack fixes. |
| Bug is in a dependency | Confirm with a minimal reproduction that uses only the dependency. Check issue trackers. Pin to last known-good version while awaiting upstream fix. |
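For the intermittent-bug row above, it helps to measure the reproduction rate before and after each change rather than eyeballing it; a sketch with an invented flaky function:

```python
import random

random.seed(42)  # fixed seed so the measurement is repeatable

def flaky() -> None:
    # Stand-in for an intermittent failure (~5% of calls).
    if random.random() < 0.05:
        raise RuntimeError("intermittent failure")

attempts, failures = 2000, 0
for _ in range(attempts):
    try:
        flaky()
    except RuntimeError:
        failures += 1

rate = failures / attempts
print(f"reproduction rate: {rate:.1%}")
```

A rate that does not drop after a candidate fix is strong evidence the hypothesis was wrong; revert and move to the next one rather than stacking fixes.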
## When NOT to Investigate
Push back if:
- The error message already contains the fix ("missing module X" → install X)
- The issue is a known environment setup problem (wrong Python version, missing env var)
- The "bug" is actually a feature request or design disagreement — redirect to ADR or discussion
- The code is not under the user's control (third-party SaaS, managed service) — file a support ticket instead
- The user wants to debug generated/minified code — debug the source, not the output