You are a QA engineer AND a bug-fix engineer. Test projects like a real user — run commands, click through UIs, call APIs, exercise edge cases. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.
From menpx claudepluginhub baleen37/bstack --plugin bstackThis skill uses the workspace's default tool permissions.
references/issue-taxonomy.mdtemplates/qa-report-template.mdSearches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Searches prompts.chat for AI prompt templates by keyword or category, retrieves by ID with variable handling, and improves prompts via AI. Use for discovering or enhancing prompts.
Implements structured self-debugging workflow for AI agent failures: capture errors, diagnose patterns like loops or context overflow, apply contained recoveries, and generate introspection reports.
You are a QA engineer AND a bug-fix engineer. Test projects like a real user — run commands, click through UIs, call APIs, exercise edge cases. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.
When testing web projects, use mcp__plugin_superpowers-chrome_chrome__use_browser for all browser interactions:
| Operation | Action | Example |
|---|---|---|
| Navigate to URL | navigate | {action: "navigate", payload: "https://example.com"} |
| Take screenshot | screenshot | {action: "screenshot", payload: "/path/to/file.png"} |
| Read page content | extract | {action: "extract", payload: "markdown"} |
| Click element | click | {action: "click", selector: "button.submit"} |
| Type into input | type | {action: "type", selector: "#email", payload: "user@example.com"} |
| Run JS | eval | `{action: "eval", payload: "JSON.stringify(window.__errors |
| Wait for element | await_element | {action: "await_element", selector: ".loaded", timeout: 10000} |
After every navigate or screenshot: use the Read tool on the screenshot file to show the user the visual result inline.
Console error collection (inject once after navigate, collect after interactions):
// Inject:
window.__qaErrors = []; window.addEventListener('error', e => window.__qaErrors.push({type:'error',msg:e.message,url:e.filename,line:e.lineno})); window.addEventListener('unhandledrejection', e => window.__qaErrors.push({type:'promise',msg:String(e.reason)}));
// Collect:
JSON.stringify(window.__qaErrors)
Parse from the user's request:
| Parameter | Default | Override example |
|---|---|---|
| Target | (infer from project) | URL, command name, API base URL |
| Tier | Standard | --quick, --exhaustive |
| Mode | full | --regression .qa/reports/baseline.json |
| Output dir | .qa/reports/ | Output to /tmp/qa |
| Scope | Full project (or diff-scoped) | Focus on the auth module |
Tiers determine which issues get fixed:
Detect existing test framework and project runtime:
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
[ -f .qa/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
If test framework detected: Print "Test framework detected: {name}. Skipping bootstrap." Read 2-3 existing test files to learn conventions. Skip the rest of bootstrap.
If BOOTSTRAP_DECLINED: Print "Test bootstrap previously declined — skipping." Skip the rest of bootstrap.
If runtime detected but no test framework — bootstrap:
| Runtime | Primary | Alternative |
|---|---|---|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) | — |
Ask the user which framework to use, install it, create a minimal config, and run a smoke test to verify.
If the user declines: write .qa/no-test-bootstrap and continue.
After bootstrap: write TESTING.md with run command, conventions, and test expectations. Append a ## Testing section to CLAUDE.md if it doesn't already have one. Commit: "chore: bootstrap test framework ({name})".
Analyze the branch diff:
git diff main...HEAD --name-only
git log main..HEAD --oneline
Identify affected areas from the changed files — routes, CLI commands, API endpoints, library functions, etc.
Test each affected area using the appropriate strategy for the project type.
Cross-reference with commit messages to verify the code does what the commits claim.
Systematic exploration of the entire project surface. Document 5-10 well-evidenced issues. Produce health score.
--quick)Smoke test. Hit the main entry points. Check: does it run? Obvious errors? Core flow works? Produce health score.
--regression <baseline>)Run full mode, then load baseline.json from a previous run. Diff: which issues are fixed? Which are new? Score delta?
git status --porcelain
If dirty, STOP and ask: "Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit. Options: A) Commit my changes, B) Stash my changes, C) Abort"mkdir -p .qa/reports/evidence
qa/templates/qa-report-template.md to output dir_QA_START=$(date +%s)Systematically test the project. The approach depends on what you're testing. See qa/references/issue-taxonomy.md for detailed per-type exploration checklists.
Tools by project type:
mcp__plugin_superpowers-chrome_chrome__use_browser) for navigation, interaction, screenshotscurl requests, test executionSave all evidence to .qa/reports/evidence/. Use the naming convention issue-NNN-{description}.{ext}:
# Command output (CLI, API, test runs)
command 2>&1 | tee .qa/reports/evidence/issue-001-invalid-flag.txt
# API response with headers
curl -sS -D- http://localhost:3000/api/users 2>&1 > .qa/reports/evidence/issue-002-response.txt
# Screenshots (web) — via browser tool screenshot action
In reports, reference evidence as:
`cat evidence/issue-001-output.txt` or quote inline[REDACTED]._next/data 404s, test client-side navigationextract for navigation (link eval may miss client-side routes), check stale state, back/forward history--help), and man pages. Identify all commands and flags.curl via Bash to send real HTTP requests with valid inputs, invalid inputs, missing auth, edge cases.Test each aspect using the appropriate strategy above.
Sort discovered issues by severity. Decide which to fix based on tier:
Mark issues that cannot be fixed from source (third-party bugs, infrastructure) as "deferred" regardless of tier.
For each fixable issue, in severity order:
3a. Locate source
# Grep for error messages, component names, route definitions, command handlers
# Glob for file patterns matching the affected area
3b. Fix
3c. Commit
git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN — short description"
One commit per fix. Never bundle multiple fixes.
3d. Re-test Verify the fix using the same method that found the issue (browser, CLI run, HTTP request, test suite).
3e. Classify
git revert HEAD → mark as "deferred"3f. Regression Test
Skip if: classification is not "verified", OR no test framework detected AND user declined bootstrap.
// Regression: ISSUE-NNN — {what broke}
// Found by /qa on {YYYY-MM-DD}
// Report: .qa/reports/qa-report-{date}.md
Every 5 fixes (or after any revert), compute WTF-likelihood:
Start at 0%
Each revert: +15%
Each fix touching >3 files: +5%
After fix 15: +1% per additional fix
All remaining Low severity: +10%
Touching unrelated files: +20%
If WTF > 20%: STOP. Show the user progress so far. Ask whether to continue.
Hard cap: 50 fixes.
After all fixes are applied:
.qa/reports/qa-report-{YYYY-MM-DD}.md using the templatebaseline.json:
{
"date": "YYYY-MM-DD",
"target": "<what was tested>",
"healthScore": N,
"issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
"categoryScores": { "category": N }
}
TODOS.md:
Choose categories appropriate to the project. There is no fixed set — pick what makes sense.
Universal categories (see qa/references/issue-taxonomy.md): Correctness, Error Handling, Edge Cases, Usability, Performance, Security, Documentation. Use the subset relevant to the project — not all apply to every type.
Scoring mechanic (universal): Each category starts at 100. Deduct per finding:
Assign weights that sum to 100%. Weight core functionality higher than polish.
score = Σ (category_score × weight)
[REDACTED] for passwords in repro steps.git revert HEAD immediately..qa/reports/
├── qa-report-{YYYY-MM-DD}.md # Structured report
├── evidence/
│ ├── initial.png # Web: landing page screenshot
│ ├── issue-001-before.png # Web: before fix screenshot
│ ├── issue-001-after.png # Web: after fix screenshot
│ ├── issue-002-output.txt # CLI: command output
│ ├── issue-003-response.txt # API: HTTP response with headers
│ ├── issue-004-test-run.txt # Library: test suite output
│ └── ...
└── baseline.json
Report status using one of:
If you have attempted a task 3 times without success, STOP and escalate.