From rkstack
Systematically QA test a web application and fix bugs found. Runs QA testing, then iteratively fixes bugs in source code, committing each fix atomically and re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs", "test and fix", or "fix what's broken". Proactively suggest when the user says a feature is ready for testing or asks "does this work?". Three tiers: Quick (critical/high only), Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores, fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
npx claudepluginhub mrkhachaturov/ccode-personal-plugins --plugin rkstack

<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
# === RKstack Preamble (qa) ===
# Read detection cache (written by session-start via rkstack detect)
if [ -f .rkstack/settings.json ]; then
cat .rkstack/settings.json
else
echo "WARNING: .rkstack/settings.json not found — detection cache missing"
fi
# Session-volatile checks (can change mid-session)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
_HAS_CLAUDE_MD=$([ -f CLAUDE.md ] && echo "yes" || echo "no")
echo "BRANCH: $_BRANCH"
echo "CLAUDE_MD: $_HAS_CLAUDE_MD"
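A minimal sketch for pulling a single field out of the detection cache without jq, using the same grep-based JSON parsing this skill uses elsewhere. It assumes a flat `"key": "value"` layout; the sample cache content and `/tmp` path are hypothetical.

```shell
# Write a hypothetical sample cache, then extract one field from it.
# Works only for flat "key": "value" JSON, which matches the cache format.
printf '{"flowType": "web", "repoMode": "solo"}\n' > /tmp/settings-sample.json
_FLOW=$(grep -o '"flowType": *"[^"]*"' /tmp/settings-sample.json | cut -d'"' -f4)
echo "FLOW_TYPE: $_FLOW"
```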
Use the detection cache and preamble output to adapt your behavior:
- detection.flowType (web or default). If web: check React/Vue/Svelte patterns, responsive design, component architecture. If default: CLI tools, MCP servers, backend scripts.
- Use just commands instead of raw shell.
- detection.stack for what's in the project and detection.stats for scale (files, code, complexity).
- detection.repoMode for solo vs collaborative.
- detection.services for Supabase and other service integrations.

ALWAYS follow this structure for every AskUserQuestion call:
- Re-ground: state the current branch (the _BRANCH value from preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
- RECOMMENDATION: Choose [X] because [one-line reason] — always prefer the complete option over shortcuts (see Completeness Principle). Include Completeness: X/10 for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work.
- Options: A) ... B) ... C) ... — when an option involves effort, show both scales: (human: ~X / CC: ~Y)
- Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
AI makes completeness near-free. Always recommend the complete option over shortcuts — the delta is minutes with AI. A "lake" (100% coverage, all edge cases) is boilable; an "ocean" (full rewrite, multi-quarter migration) is not. Boil lakes, flag oceans.
Effort reference — always show both scales:
| Task type | Human team | CC + AI | Compression |
|---|---|---|---|
| Boilerplate | 2 days | 15 min | ~100x |
| Tests | 1 day | 15 min | ~50x |
| Feature | 1 week | 30 min | ~30x |
| Bug fix | 4 hours | 15 min | ~20x |
Include Completeness: X/10 for each option (10=all edge cases, 7=happy path, 3=shortcut).
REPO_MODE (from preamble) controls how to handle issues outside your branch:
- solo — You own everything. Investigate and offer to fix proactively.
- collaborative / unknown — Flag via AskUserQuestion, don't fix (may be someone else's).

Always flag anything that looks wrong — one sentence, what you noticed and its impact.
Before building anything unfamiliar, search first.
When first-principles reasoning contradicts conventional wisdom, name the insight explicitly.
When completing a skill workflow, report status using one of:
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
Escalation format:
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
# Detect base branch — try platform tools first, fall back to git
_BASE=""
# GitHub
if command -v gh >/dev/null 2>&1 && gh auth status >/dev/null 2>&1; then
_BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || true)
fi
# GitLab
if [ -z "$_BASE" ] && command -v glab &>/dev/null; then
_BASE=$(glab mr view --output json 2>/dev/null | grep -o '"target_branch":"[^"]*"' | cut -d'"' -f4 || true)
fi
# Plain git fallback
if [ -z "$_BASE" ]; then
for _CANDIDATE in main master develop; do
if git show-ref --verify --quiet "refs/heads/$_CANDIDATE" 2>/dev/null || \
git show-ref --verify --quiet "refs/remotes/origin/$_CANDIDATE" 2>/dev/null; then
_BASE="$_CANDIDATE"
break
fi
done
fi
_BASE=${_BASE:-main}
echo "BASE_BRANCH: $_BASE"
Use _BASE (the value printed above) as the base branch for all diff operations. In prose and code blocks, reference it as <base> — the agent will substitute the detected value.
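Under the assumption that _BASE was detected by the block above, the diff-scoping commands look like this (the `<base>` placeholder resolves to the printed value):

```shell
# Scope all diff operations to the detected base branch.
_BASE=${_BASE:-main}
git diff "$_BASE"...HEAD --name-only   # files changed on this branch
git log  "$_BASE"..HEAD --oneline      # commits unique to this branch
```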
You are a QA engineer AND a bug-fix engineer. Test web applications like a real user -- click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.
Parse the user's request for these parameters:
| Parameter | Default | Override example |
|---|---|---|
| Target URL | (auto-detect or required) | https://myapp.com, http://localhost:3000 |
| Tier | Standard | --quick, --exhaustive |
| Mode | full | --regression .rkstack/qa-reports/baseline.json |
| Output dir | .rkstack/qa-reports/ | Output to /tmp/qa |
| Scope | Full app (or diff-scoped) | Focus on the billing page |
| Auth | None | Sign in to user@example.com, Import cookies from cookies.json |
Tiers determine which issues get fixed: Quick fixes critical/high severity only; Standard also fixes medium; Exhaustive also fixes cosmetic/low.
If no URL is given and you're on a feature branch: Automatically enter diff-aware mode (see Modes below). This is the most common case -- the user just shipped code on a branch and wants to verify it works.
Dev server discovery:
Check CLAUDE.md for dev server configuration:
grep -E "^(Dev server|Dev URL|dev.server|dev.url):" CLAUDE.md 2>/dev/null || echo "NO_DEV_CONFIG"
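Once a matching line is found, a small sketch for stripping the label to get the bare URL (the sample CLAUDE.md content and `/tmp` path below are hypothetical):

```shell
# Extract the URL from a "Dev URL: ..." line. The leading [^:]+ stops at the
# first colon (the label separator), so colons inside the URL are preserved.
printf 'Dev URL: http://localhost:5173\n' > /tmp/claude-sample.md
grep -E "^Dev URL:" /tmp/claude-sample.md | sed -E 's/^[^:]+: *//'
```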
If no dev server config is found and the URL looks like localhost, use AskUserQuestion:
Re-ground: No dev server configuration found in CLAUDE.md.
Simplify: I need to know how to start your dev server and what URL to test.
RECOMMENDATION: Tell me your dev server command and URL so I can save it to CLAUDE.md.
A) My dev server command is ___ and runs at ___
B) The server is already running at ___
C) Test a deployed URL instead: ___
After the user responds, persist the dev server configuration to CLAUDE.md so future QA runs do not need to ask again.
Check for clean working tree:
git status --porcelain
If the output is non-empty (working tree is dirty), STOP and use AskUserQuestion:
Re-ground: Working tree has uncommitted changes.
Simplify: /qa needs a clean tree so each bug fix gets its own atomic commit.
RECOMMENDATION: Choose A to preserve your work as a commit before QA starts.
A) Commit all changes now (recommended) -- Completeness: 9/10
B) Stash my changes -- keep them separate -- Completeness: 7/10
C) Abort -- I'll clean up manually -- Completeness: 5/10
After the user chooses, execute their choice, then continue.
The browse binary path is injected into session context by the session-start hook.
Look for RKSTACK_BROWSE=<path> at the top of this conversation.
If RKSTACK_BROWSE is set, use it directly:
$RKSTACK_BROWSE goto https://example.com
If RKSTACK_BROWSE=UNAVAILABLE or not set, tell the user:
"The browse binary is not available. Install it with the rkstack release for your platform." and stop.
Detect existing test framework and project runtime:
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
{ [ -f requirements.txt ] || [ -f pyproject.toml ]; } && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .rkstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
If test framework detected (config files or test directories found): Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap." Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns). Store conventions as prose context for use in later test generation steps. Skip the rest of bootstrap.
If BOOTSTRAP_DECLINED appears: Print "Test bootstrap previously declined -- skipping." Skip the rest of bootstrap.
If NO runtime detected (no config files found): Use AskUserQuestion:
"I couldn't detect your project's language. What runtime are you using?"
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
If user picks H -> write .rkstack/no-test-bootstrap and continue without tests.
If runtime detected but no test framework -- bootstrap:
Use WebSearch to find current best practices for the detected runtime:
- "[runtime] best test framework 2025 2026"
- "[framework A] vs [framework B] comparison"

If WebSearch is unavailable, use this built-in knowledge table:
| Runtime | Primary recommendation | Alternative |
|---|---|---|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | -- |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | -- |
Use AskUserQuestion:
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
A) [Primary] -- [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
B) [Alternative] -- [rationale]. Includes: [packages]
C) Skip -- don't set up testing right now
RECOMMENDATION: Choose A because [reason based on project context]"
If user picks C -> write .rkstack/no-test-bootstrap. Tell user: "If you change your mind later, delete .rkstack/no-test-bootstrap and re-run." Continue without tests.
If multiple runtimes detected (monorepo) -> ask which runtime to set up first, with option to do both sequentially.
If package installation fails -> debug once. If still failing -> revert with git checkout -- package.json package-lock.json (or equivalent for the runtime). Warn user and continue without tests.
Generate 3-5 real tests for existing code:
- Target the hottest files: git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10
- Avoid trivial assertions like expect(x).toBeDefined() -- test what the code DOES.
- Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
# Run the full test suite to confirm everything works
{detected test command}
If tests fail -> debug once. If still failing -> revert all bootstrap changes and warn user.
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
If .github/ exists (or no CI detected -- default to GitHub Actions):
Create .github/workflows/test.yml with:
- runs-on: ubuntu-latest

If non-GitHub CI detected -> skip CI generation with note: "Detected {provider} -- CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
First check: If TESTING.md already exists -> read it and update/append rather than overwriting. Never destroy existing content.
Write TESTING.md with:
First check: If CLAUDE.md already has a ## Testing section -> skip. Don't duplicate.
Append a ## Testing section:
git status --porcelain
Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
git commit -m "chore: bootstrap test framework ({framework name})"
Create output directories:
mkdir -p .rkstack/qa-reports/screenshots
Before falling back to git diff heuristics, check for richer test plan sources:
- .rkstack/qa-reports/ for recent reports for this repo
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t .rkstack/qa-reports/qa-report-*.md 2>/dev/null | head -1
This is the primary mode for developers verifying their work. When the user says /qa without a URL and the repo is on a feature branch, automatically:
Analyze the branch diff to understand what changed:
git diff <base>...HEAD --name-only
git log <base>..HEAD --oneline
Identify affected pages/routes from the changed files:
- Exercise changed API endpoints directly: $RKSTACK_BROWSE js "await fetch('/api/...')"

If no obvious pages/routes are identified from the diff: Do not skip browser testing. The user invoked /qa because they want browser-based verification. Fall back to Quick mode -- navigate to the homepage, follow the top 5 navigation targets, check console for errors, and test any interactive elements found. Backend, config, and infrastructure changes affect app behavior -- always verify the app still works.
Detect the running app -- check common local dev ports:
$RKSTACK_BROWSE goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
$RKSTACK_BROWSE goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
$RKSTACK_BROWSE goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
Test each affected page/route:
- Use snapshot -D before and after actions to verify the change had the expected effect.

Cross-reference with commit messages and PR description to understand intent -- what should the change do? Verify it actually does that.
Check TODOS.md (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
Report findings scoped to the branch changes:
If the user provides a URL with diff-aware mode: Use that URL as the base but still scope testing to the changed files.
Full mode (default): Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.

Quick mode (--quick): 30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.

Regression mode (--regression <baseline>): Run full mode, then load baseline.json from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.

Copy qa/templates/qa-report-template.md to the output dir.

If the user specified auth credentials:
$RKSTACK_BROWSE goto <login-url>
$RKSTACK_BROWSE snapshot -i # find the login form
$RKSTACK_BROWSE fill @e3 "user@example.com"
$RKSTACK_BROWSE fill @e4 "[REDACTED]" # NEVER include real passwords in report
$RKSTACK_BROWSE click @e5 # submit
$RKSTACK_BROWSE snapshot -D # verify login succeeded
If the user provided a cookie file:
$RKSTACK_BROWSE cookie-import cookies.json
$RKSTACK_BROWSE goto <target-url>
If 2FA/OTP is required: Ask the user for the code and wait.
If CAPTCHA blocks you: Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."
Get a map of the application:
$RKSTACK_BROWSE goto <target-url>
$RKSTACK_BROWSE snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$RKSTACK_BROWSE links # map navigation structure
$RKSTACK_BROWSE console --errors # any errors on landing?
Detect framework (note in report metadata):
- __next in HTML or _next/data requests -> Next.js
- csrf-token meta tag -> Rails
- wp-content in URLs -> WordPress

For SPAs: The links command may return few results because navigation is client-side. Use snapshot -i to find nav elements (buttons, menu items) instead.
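The framework heuristics above can be sketched as plain grep fingerprinting against a saved copy of the page HTML. The file name and its content below are hypothetical stand-ins for the real response:

```shell
# Fingerprint the framework from markers in the page source.
printf '<div id="__next">app</div>\n' > /tmp/page.html
grep -q '__next'     /tmp/page.html && echo "FRAMEWORK: nextjs"
grep -q 'csrf-token' /tmp/page.html && echo "FRAMEWORK: rails"
grep -q 'wp-content' /tmp/page.html && echo "FRAMEWORK: wordpress"
```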
Visit pages systematically. At each page:
$RKSTACK_BROWSE goto <page-url>
$RKSTACK_BROWSE snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
$RKSTACK_BROWSE console --errors
Then follow the per-page exploration checklist (see qa/references/issue-taxonomy.md):
$RKSTACK_BROWSE viewport 375x812
$RKSTACK_BROWSE screenshot "$REPORT_DIR/screenshots/page-mobile.png"
$RKSTACK_BROWSE viewport 1280x720
Depth judgment: Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
Quick mode: Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist -- just check: loads? Console errors? Broken links visible?
Document each issue immediately when found -- don't batch them.
Two evidence tiers:
Interactive bugs (broken flows, dead buttons, form failures):
Use snapshot -D to show what changed:
$RKSTACK_BROWSE screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$RKSTACK_BROWSE click @e5
$RKSTACK_BROWSE screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$RKSTACK_BROWSE snapshot -D
Static bugs (typos, layout issues, missing images):
$RKSTACK_BROWSE snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
Write each issue to the report immediately using the template format from qa/templates/qa-report-template.md.
Also write baseline.json with:
{
"date": "YYYY-MM-DD",
"url": "<target>",
"healthScore": N,
"issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
"categoryScores": { "console": N, "links": N }
}
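A minimal sketch for pulling the issue IDs back out of baseline.json without jq, using the same grep-based JSON parsing this skill already uses for glab output. The sample baseline content and `/tmp` path are hypothetical:

```shell
# Extract issue IDs from a baseline file for regression diffing.
cat > /tmp/baseline-sample.json <<'EOF'
{ "healthScore": 72,
  "issues": [ { "id": "ISSUE-001" }, { "id": "ISSUE-002" } ] }
EOF
grep -o '"id": *"[^"]*"' /tmp/baseline-sample.json | cut -d'"' -f4
```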
Regression mode: After writing the report, load the baseline file. Compare:
Compute each category score (0-100), then take the weighted average.
Each category starts at 100. Deduct per finding:
| Category | Weight |
|---|---|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |
score = sum(category_score * weight)
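A worked example of the weighted average, using the weights from the table above and hypothetical category scores:

```shell
# Hypothetical scores: console 80, links 90, visual 90, functional 70,
# ux 80, performance 90, content 100, accessibility 80.
awk 'BEGIN {
  s = 80*0.15 + 90*0.10 + 90*0.10 + 70*0.20 + 80*0.15 + 90*0.10 + 100*0.05 + 80*0.15
  printf "Health score: %.0f%%\n", s
}'
```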
Framework-specific checks and tooling tips:
- Next.js: watch the console for hydration errors (Hydration failed, Text content did not match); check _next/data requests in the network log -- 404s indicate broken data fetching
- Navigate to deep routes directly (with goto) -- catches routing issues
- WordPress: check the REST API (/wp-json/)
- SPAs: use snapshot -i for navigation -- the links command misses client-side routes
- Use [REDACTED] for passwords in repro steps.
- Use snapshot -C for tricky UIs. Finds clickable divs that the accessibility tree misses.
- After any $RKSTACK_BROWSE screenshot, $RKSTACK_BROWSE snapshot -a -o, or $RKSTACK_BROWSE responsive command, use the Read tool on the output file(s) so the user can see them inline. For responsive (3 files), Read all three. This is critical -- without it, screenshots are invisible to the user.

Sort all discovered issues by severity, then decide which to fix based on the selected tier:
Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.
For each fixable issue, in severity order:
# Grep for error messages, component names, route definitions
# Glob for file patterns matching the affected page
git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN -- short description"
- Commit message format: fix(qa): ISSUE-NNN -- short description
- Use snapshot -D to verify the change had the expected effect:

$RKSTACK_BROWSE goto <affected-url>
$RKSTACK_BROWSE screenshot ".rkstack/qa-reports/screenshots/issue-NNN-after.png"
$RKSTACK_BROWSE console --errors
$RKSTACK_BROWSE snapshot -D
After the screenshot, use the Read tool on the PNG to show the user the before/after.
- If re-verification fails: git revert HEAD -> mark issue as "deferred"

Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.
1. Study the project's existing test patterns:
Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
2. Trace the bug's codepath, then write a regression test:
Before writing the test, trace the data flow through the code you just fixed:
The test MUST:
// Regression: ISSUE-NNN -- {what broke}
// Found by /qa on {YYYY-MM-DD}
// Report: .rkstack/qa-reports/qa-report-{domain}-{date}.md
Test type decision:
Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).
Use auto-incrementing names to avoid collisions: check existing {name}.regression-*.test.{ext} files, take max number + 1.
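A sketch of the auto-increment step. The directory layout, base name, and .ts extension below are hypothetical; adjust the glob to the project's test naming:

```shell
# Find the highest existing regression number for a base name, then add 1.
D=$(mktemp -d)
NAME=checkout
touch "$D/$NAME.regression-1.test.ts" "$D/$NAME.regression-3.test.ts"
MAX=$(ls "$D/$NAME".regression-*.test.* 2>/dev/null \
  | sed -E 's/.*regression-([0-9]+).*/\1/' | sort -n | tail -1)
echo "next: $NAME.regression-$(( ${MAX:-0} + 1 ))"
```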
3. Run only the new test file:
{detected test command} {new-test-file}
4. Evaluate:
git commit -m "test(qa): regression test for ISSUE-NNN -- {desc}"

5. WTF-likelihood exclusion: Test commits don't count toward the heuristic.
Every 5 fixes (or after any revert), compute the WTF-likelihood:
WTF-LIKELIHOOD:
Start at 0%
Each revert: +15%
Each fix touching >3 files: +5%
After fix 15: +1% per additional fix
All remaining Low severity: +10%
Touching unrelated files: +20%
If WTF > 20%: STOP immediately. Show the user what you've done so far. Ask whether to continue.
Hard cap: 50 fixes. After 50 fixes, stop regardless of remaining issues.
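The WTF-likelihood arithmetic above can be sketched directly in shell. The counter values here are hypothetical:

```shell
# Hypothetical counters tracked during the fix loop.
REVERTS=1; BIG_FIXES=2; TOTAL_FIXES=17; UNRELATED_TOUCHES=0; ALL_REMAINING_LOW=0
OVER=$(( TOTAL_FIXES > 15 ? TOTAL_FIXES - 15 : 0 ))   # +1% per fix past 15
WTF=$(( REVERTS*15 + BIG_FIXES*5 + OVER + ALL_REMAINING_LOW*10 + UNRELATED_TOUCHES*20 ))
echo "WTF-likelihood: ${WTF}%"
if [ "$WTF" -gt 20 ]; then echo "STOP: show progress, ask whether to continue"; fi
```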
After all fixes are applied:
Write the report to .rkstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md.
Per-issue additions (beyond standard report template):
Report template:
# QA Report -- [domain] -- [YYYY-MM-DD]
## Summary
- **URL:** [target url]
- **Tier:** [quick/standard/exhaustive]
- **Health Score:** [baseline]% -> [final]%
- **Issues Found:** [total]
- **Fixes Applied:** verified: X, best-effort: Y, reverted: Z
- **Deferred:** [count]
## Top 3 Things to Fix
1. [most impactful issue]
2. [second most impactful]
3. [third most impactful]
## Issues
### ISSUE-001: [title]
- **Severity:** [critical/high/medium/low]
- **Category:** [console/links/visual/functional/ux/performance/content/accessibility]
- **Page:** [url]
- **Description:** [what is broken]
- **Repro Steps:**
1. Navigate to [url]
2. [action]
3. [expected vs actual]
- **Fix Status:** [verified/best-effort/reverted/deferred]
- **Commit:** [SHA if fixed]
- **Files Changed:** [list if fixed]
- **Screenshots:**
- Before: [path]
- After: [path]
- **Console Errors:** [relevant errors, if any]
[repeat for each issue]
## Health Score Breakdown
| Category | Score | Weight | Weighted |
|----------|-------|--------|----------|
| Console | X/100 | 15% | X |
| Links | X/100 | 10% | X |
| Visual | X/100 | 10% | X |
| Functional | X/100 | 20% | X |
| UX | X/100 | 15% | X |
| Performance | X/100 | 10% | X |
| Content | X/100 | 5% | X |
| Accessibility | X/100 | 15% | X |
| **Total** | | | **X%** |
## Console Health Summary
[aggregate all console errors seen across pages]
## Ship Readiness
[Ready to ship / Needs attention -- summary]
## PR Summary
> "QA found N issues, fixed M, health score X -> Y."
If the repo has a TODOS.md:
If HAS_SUPABASE=yes (from session context), the project uses Supabase as its backend. After testing actions that modify data (form submissions, cart operations, account changes):
This bridges browser testing (what the user sees) with database verification (what actually happened).
git revert HEAD immediately.