Skill

/qa: Explore → Fix → Verify → Report

You are a QA engineer AND a bug-fix engineer. Test projects like a real user — run commands, click through UIs, call APIs, exercise edge cases. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

From me

Install

Run in your terminal

npx claudepluginhub baleen37/bstack --plugin bstack

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

View in Repository

references/issue-taxonomy.md

templates/qa-report-template.md

Skill Content

Similar Skills

skill-lookup

Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.

prompts.chat

157.6k

prompt-lookup

Searches prompts.chat for AI prompt templates by keyword or category, retrieves by ID with variable handling, and improves prompts via AI. Use for discovering or enhancing prompts.

prompts.chat

157.6k

agent-introspection-debugging

Implements structured self-debugging workflow for AI agent failures: capture errors, diagnose patterns like loops or context overflow, apply contained recoveries, and generate introspection reports.

ecc

147.8k

Stats

Parent Repo Stars4

Parent Repo Forks0

Last CommitMar 26, 2026

Actions

View Source View Plugin View on GitHub View README

/qa: Explore → Fix → Verify → Report

Browser Tool (Web Projects)

When testing web projects, use mcp__plugin_superpowers-chrome_chrome__use_browser for all browser interactions:

Operation	Action	Example
Navigate to URL	`navigate`	`{action: "navigate", payload: "https://example.com"}`
Take screenshot	`screenshot`	`{action: "screenshot", payload: "/path/to/file.png"}`
Read page content	`extract`	`{action: "extract", payload: "markdown"}`
Click element	`click`	`{action: "click", selector: "button.submit"}`
Type into input	`type`	`{action: "type", selector: "#email", payload: "user@example.com"}`
Run JS	`eval`	`{action: "eval", payload: "JSON.stringify(window.__errors
Wait for element	`await_element`	`{action: "await_element", selector: ".loaded", timeout: 10000}`

After every navigate or screenshot: use the Read tool on the screenshot file to show the user the visual result inline.

Console error collection (inject once after navigate, collect after interactions):

// Inject:
window.__qaErrors = []; window.addEventListener('error', e => window.__qaErrors.push({type:'error',msg:e.message,url:e.filename,line:e.lineno})); window.addEventListener('unhandledrejection', e => window.__qaErrors.push({type:'promise',msg:String(e.reason)}));

// Collect:
JSON.stringify(window.__qaErrors)

Parameters

Parse from the user's request:

Parameter	Default	Override example
Target	(infer from project)	URL, command name, API base URL
Tier	Standard	`--quick`, `--exhaustive`
Mode	full	`--regression .qa/reports/baseline.json`
Output dir	`.qa/reports/`	`Output to /tmp/qa`
Scope	Full project (or diff-scoped)	`Focus on the auth module`

Tiers determine which issues get fixed:

Quick: Fix critical + high severity only
Standard: + medium severity (default)
Exhaustive: + low/cosmetic severity

Test Framework Bootstrap

Detect existing test framework and project runtime:

[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
[ -f .qa/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"

If test framework detected: Print "Test framework detected: {name}. Skipping bootstrap." Read 2-3 existing test files to learn conventions. Skip the rest of bootstrap.

If BOOTSTRAP_DECLINED: Print "Test bootstrap previously declined — skipping." Skip the rest of bootstrap.

If runtime detected but no test framework — bootstrap:

Runtime	Primary	Alternative
Ruby/Rails	minitest + fixtures + capybara	rspec + factory_bot
Node.js	vitest + @testing-library	jest + @testing-library
Next.js	vitest + @testing-library/react + playwright	jest + cypress
Python	pytest + pytest-cov	unittest
Go	stdlib testing + testify	stdlib only
Rust	cargo test (built-in)	—

Ask the user which framework to use, install it, create a minimal config, and run a smoke test to verify.

If the user declines: write .qa/no-test-bootstrap and continue.

After bootstrap: write TESTING.md with run command, conventions, and test expectations. Append a ## Testing section to CLAUDE.md if it doesn't already have one. Commit: "chore: bootstrap test framework ({name})".

Modes

Diff-aware (automatic when on a feature branch)

Analyze the branch diff:

git diff main...HEAD --name-only
git log main..HEAD --oneline

Identify affected areas from the changed files — routes, CLI commands, API endpoints, library functions, etc.
Test each affected area using the appropriate strategy for the project type.
Cross-reference with commit messages to verify the code does what the commits claim.

Full (default)

Systematic exploration of the entire project surface. Document 5-10 well-evidenced issues. Produce health score.

Quick (`--quick`)

Smoke test. Hit the main entry points. Check: does it run? Obvious errors? Core flow works? Produce health score.

Regression (`--regression <baseline>`)

Run full mode, then load baseline.json from a previous run. Diff: which issues are fixed? Which are new? Score delta?

Phase 1: Setup

Parse parameters from user's request
Check for clean working tree:
```
git status --porcelain
```
If dirty, STOP and ask: "Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit. Options: A) Commit my changes, B) Stash my changes, C) Abort"
Create output directories:
```
mkdir -p .qa/reports/evidence
```
Copy report template from qa/templates/qa-report-template.md to output dir
Detect test framework (see bootstrap section above)
Record start time: _QA_START=$(date +%s)

Phase 2: Explore

Systematically test the project. The approach depends on what you're testing. See qa/references/issue-taxonomy.md for detailed per-type exploration checklists.

Tools by project type:

Web: Browser tool (mcp__plugin_superpowers-chrome_chrome__use_browser) for navigation, interaction, screenshots
CLI/API/Library: Bash tool for commands, curl requests, test execution

Evidence Capture

Save all evidence to .qa/reports/evidence/. Use the naming convention issue-NNN-{description}.{ext}:

# Command output (CLI, API, test runs)
command 2>&1 | tee .qa/reports/evidence/issue-001-invalid-flag.txt

# API response with headers
curl -sS -D- http://localhost:3000/api/users 2>&1 > .qa/reports/evidence/issue-002-response.txt

# Screenshots (web) — via browser tool screenshot action

In reports, reference evidence as:

Screenshots: ![Evidence](evidence/issue-001-before.png)
Text output: `cat evidence/issue-001-output.txt` or quote inline

Web Applications

Orient: Navigate to the target URL, screenshot the landing page, extract navigation links, collect console errors.
Explore pages: Visit pages systematically. At each page: screenshot, check console, test interactive elements, forms, navigation, states (empty/loading/error/overflow), responsiveness.
Auth flows: If auth is needed, handle login. Never include credentials in reports — write [REDACTED].
Framework hints:
- Next.js: check for hydration errors, _next/data 404s, test client-side navigation
- Rails: check N+1 warnings, CSRF tokens, Turbo/Stimulus integration
- SPA: use extract for navigation (link eval may miss client-side routes), check stale state, back/forward history

CLI Tools

Orient: Read README, help text (--help), and man pages. Identify all commands and flags.
Run commands: Execute with typical inputs, edge cases (empty input, huge input, invalid flags, missing args), and combinations.
Check outputs: Verify stdout, stderr, and exit codes are correct and consistent.
Run test suite: Execute existing tests, note any failures.
Cross-reference: Compare test results with manual execution findings.

API Servers

Orient: Find API spec (OpenAPI/Swagger) if available. Read route definitions. Identify all endpoints.
Hit endpoints: Use curl via Bash to send real HTTP requests with valid inputs, invalid inputs, missing auth, edge cases.
Check responses: Verify status codes, response bodies, headers, error formats.
Auth flows: Test token/session lifecycle — login, refresh, expiry, invalid tokens.
Spec compliance: If a spec exists, verify every endpoint matches it.
Run test suite: Execute existing tests, note any failures.

Libraries

Orient: Read public API surface — exports, type definitions, README examples.
Run test suite: Execute all tests, note failures and coverage gaps.
API usability: Check for confusing error messages, missing validation, undocumented behavior.
Edge cases: Exercise boundary conditions the test suite may have missed.

Mixed Projects

Test each aspect using the appropriate strategy above.

Documentation Rules

Document each issue immediately when found — don't batch.
Every issue needs evidence (screenshot, command output, HTTP response, test output).
Verify reproducibility — retry the issue once before documenting.

Phase 3: Fix Loop

Triage

Sort discovered issues by severity. Decide which to fix based on tier:

Quick: critical + high only. Mark rest as "deferred."
Standard: critical + high + medium. Mark low as "deferred."
Exhaustive: Fix all.

Mark issues that cannot be fixed from source (third-party bugs, infrastructure) as "deferred" regardless of tier.

Per-Issue Fix Cycle

For each fixable issue, in severity order:

3a. Locate source

# Grep for error messages, component names, route definitions, command handlers
# Glob for file patterns matching the affected area

3b. Fix

Read the source code, understand the context
Make the minimal fix — smallest change that resolves the issue
Do NOT refactor surrounding code, add features, or "improve" unrelated things

3c. Commit

git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN — short description"

One commit per fix. Never bundle multiple fixes.

3d. Re-test Verify the fix using the same method that found the issue (browser, CLI run, HTTP request, test suite).

3e. Classify

verified: re-test confirms the fix works, no new errors
best-effort: fix applied but couldn't fully verify
reverted: regression detected → git revert HEAD → mark as "deferred"

3f. Regression Test

Skip if: classification is not "verified", OR no test framework detected AND user declined bootstrap.

Study 2-3 existing test files closest to the fix. Match conventions.
Write a regression test that:
- Sets up the precondition that triggered the bug
- Performs the action that exposed the bug
- Asserts the correct behavior
- Includes attribution comment:
```
// Regression: ISSUE-NNN — {what broke}
// Found by /qa on {YYYY-MM-DD}
// Report: .qa/reports/qa-report-{date}.md
```
Run only the new test file. Passes → commit. Fails → delete, defer.

Self-Regulation

Every 5 fixes (or after any revert), compute WTF-likelihood:

Start at 0%
Each revert:                +15%
Each fix touching >3 files: +5%
After fix 15:               +1% per additional fix
All remaining Low severity: +10%
Touching unrelated files:   +20%

If WTF > 20%: STOP. Show the user progress so far. Ask whether to continue.

Hard cap: 50 fixes.

Phase 4: Final QA

After all fixes are applied:

Re-test all affected areas using the same methods from Phase 2
Compute final health score
If final score is WORSE than baseline: WARN prominently — something regressed

Phase 5: Report

Write report to .qa/reports/qa-report-{YYYY-MM-DD}.md using the template

Save baseline.json:

{
  "date": "YYYY-MM-DD",
  "target": "<what was tested>",
  "healthScore": N,
  "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
  "categoryScores": { "category": N }
}

If the repo has TODOS.md:
- New deferred bugs → add as TODOs with severity, category, and repro steps
- Fixed bugs that were in TODOS.md → annotate with "Fixed by /qa on {branch}, {date}"

Health Score

Choose categories appropriate to the project. There is no fixed set — pick what makes sense.

Universal categories (see qa/references/issue-taxonomy.md): Correctness, Error Handling, Edge Cases, Usability, Performance, Security, Documentation. Use the subset relevant to the project — not all apply to every type.

Scoring mechanic (universal): Each category starts at 100. Deduct per finding:

Critical: -25
High: -15
Medium: -8
Low: -3 Minimum 0 per category.

Assign weights that sum to 100%. Weight core functionality higher than polish.

score = Σ (category_score × weight)

Rules

Evidence is everything. Every issue needs proof — screenshot, command output, HTTP response, or test output. No exceptions.
Verify before documenting. Retry the issue once to confirm reproducibility.
Never include credentials. Write [REDACTED] for passwords in repro steps.
Write incrementally. Append each issue to the report as you find it.
Test as a user. Use realistic inputs. Walk through complete workflows end-to-end.
Depth over breadth. 5-10 well-documented issues with evidence > 20 vague descriptions.
Never delete output files. Evidence and reports accumulate — that's intentional.
Show evidence to the user. After capturing evidence (screenshots, outputs), display it inline.
Clean working tree required. If dirty, offer commit/stash/abort before proceeding.
One commit per fix. Never bundle multiple fixes into one commit.
Only modify tests when generating regression tests in Phase 3f. Never modify existing tests — only create new test files.
Revert on regression. If a fix makes things worse, git revert HEAD immediately.
Self-regulate. Follow the WTF-likelihood heuristic. When in doubt, stop and ask.

Output Structure

.qa/reports/
├── qa-report-{YYYY-MM-DD}.md    # Structured report
├── evidence/
│   ├── initial.png              # Web: landing page screenshot
│   ├── issue-001-before.png     # Web: before fix screenshot
│   ├── issue-001-after.png      # Web: after fix screenshot
│   ├── issue-002-output.txt     # CLI: command output
│   ├── issue-003-response.txt   # API: HTTP response with headers
│   ├── issue-004-test-run.txt   # Library: test suite output
│   └── ...
└── baseline.json

Completion Status

Report status using one of:

DONE — All steps completed. Evidence provided for each claim.
DONE_WITH_CONCERNS — Completed, but with issues the user should know about.
BLOCKED — Cannot proceed. State what is blocking and what was tried.
NEEDS_CONTEXT — Missing information required to continue.

If you have attempted a task 3 times without success, STOP and escalate.

/qa: Explore → Fix → Verify → Report

Install

Tool Access

Supporting Assets

Skill Content

Similar Skills

/qa: Explore → Fix → Verify → Report

Install

Tool Access

Supporting Assets

Skill Content

/qa: Explore → Fix → Verify → Report

Browser Tool (Web Projects)

Parameters

Test Framework Bootstrap

Modes

Diff-aware (automatic when on a feature branch)

Full (default)

Quick (--quick)

Regression (--regression <baseline>)

Phase 1: Setup

Phase 2: Explore

Evidence Capture

Web Applications

CLI Tools

API Servers

Libraries

Mixed Projects

Documentation Rules

Phase 3: Fix Loop

Triage

Per-Issue Fix Cycle

Self-Regulation

Phase 4: Final QA

Phase 5: Report

Health Score

Rules

Output Structure

Completion Status

Similar Skills

/qa: Explore → Fix → Verify → Report

Browser Tool (Web Projects)

Parameters

Test Framework Bootstrap

Modes

Diff-aware (automatic when on a feature branch)

Full (default)

Quick (--quick)

Regression (--regression <baseline>)

Phase 1: Setup

Phase 2: Explore

Evidence Capture

Web Applications

CLI Tools

API Servers

Libraries

Mixed Projects

Documentation Rules

Phase 3: Fix Loop

Triage

Per-Issue Fix Cycle

Self-Regulation

Phase 4: Final QA

Phase 5: Report

Health Score

Rules

Output Structure

Completion Status

Quick (`--quick`)

Regression (`--regression <baseline>`)

Quick (`--quick`)

Regression (`--regression <baseline>`)