Help us improve
Share bugs, ideas, or general feedback.
From Flagrare
Executes a disciplined bug bash against a live system by ingesting a test plan (Notion, markdown), driving the browser or API, reproducing each case with evidence, and running exploratory passes before writing results.
npx claudepluginhub flagrare/agent-skills --plugin flagrareHow this skill is triggered — by the user, by Claude, or both
Slash command
/flagrare:bug-bashThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A disciplined pass through a prescribed test plan against a real running system. The pass ends when every in-scope scenario has been run, every bug has been *reproduced* (not assumed), evidence is captured, and the results have landed where the user wants them.
Provides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Share bugs, ideas, or general feedback.
A disciplined pass through a prescribed test plan against a real running system. The pass ends when every in-scope scenario has been run, every bug has been reproduced (not assumed), evidence is captured, and the results have landed where the user wants them.
This skill exists because of a specific failure mode: when someone hands you a test plan and a transcript of past discussion, the temptation is to just write up "the team already found these bugs" without actually testing. That produces noise, not signal. The whole point of a bash is that you ran it and saw it.
Most "QA passes" rot in two directions. Either they're a single happy-path click-through that never finds the interesting bugs, or they're an essay summarizing what other people said in meetings. The first misses real defects; the second pollutes the bug tracker with unverified claims that someone else later has to chase down.
A good bash does the prescribed work and explores around it and keeps a strict line between "I reproduced this" and "someone mentioned this." That line is what makes the output trustworthy.
The second reason this skill exists: the source-of-truth (a Notion test case database, a PRD with acceptance criteria, a markdown checklist) is usually the most important artifact, and it's the part that doesn't get updated. Filling Eng QA columns, adding bugs to the right database, and posting notes in the team's tone is half the work.
State the goal in one sentence and surface it to the user so they can correct scope before the loop starts.
Goal: bash
[feature / ticket / PR]. Every prescribed test case is run against the live system with evidence. Every exploratory pass (viewports, multi-actor, edge inputs, codebase-driven concerns) is completed. Every bug recorded is one I reproduced — no second-hand claims. Results land in[user's chosen sink].
The goal owns the session. The bash is not done until each clause holds.
Ask the user where the test cases live. The skill must not assume a format. Use AskUserQuestion with:
After ingestion, restate the test cases back to the user as a numbered list with the What and Should for each. This is the chance to catch a misread before driving the browser.
If the source contains pre-set columns for results (a Notion "Eng QA" column, a "Status" cell, a checkbox), note them — those are where results will land in Step 7.
Inspect what's being tested and pick the surface area:
| Signal | Target |
|---|---|
| Test cases describe browser navigation, clicks, form submission, URL changes | ui |
| Test cases describe API endpoints, request/response shapes, status codes | backend |
| Both | both |
| Ambiguous | ask via AskUserQuestion |
Verify the right MCP / tool is reachable:
postman:send-request), curl via Bash, or an MCP that hits the service. Confirm before assuming.If the UI bash needs a browser MCP and neither Chrome DevTools nor Playwright is connected, ask via AskUserQuestion:
Read the matching reference for tactics:
references/ui.md — browser-driven flows, isolated contexts, screenshot evidence, viewport handlingreferences/backend.md — request shaping, auth, status/contract assertions, observability checksThree things to confirm before any scenario runs:
Where is the live system? The user almost always knows from context. If the answer isn't obvious, ask via AskUserQuestion:
Auth. Default to asking the user to log in themselves in the browser window the MCP is attached to. Automated login is a tarpit — MFA, captchas, OAuth redirects, and corporate SSO all break it. Wait for the user to confirm they're logged in.
Feature flags / preconditions. If the spec calls out a flag (release-foo is on, partner X has enabled = true), verify it's set before bashing. If a precondition can only be set by a developer (DB seed, environment toggle), name it and ask them to do it before continuing — don't run a session that's blocked from the start.
If the test plan requires multiple user roles (inviter / invitee / admin), use isolated browser contexts (new_page with isolatedContext) — they share no cookies, no storage. This is the only reliable way to test impersonation, invite-acceptance, or cross-tenant flows without log-out/log-in churn.
Walk the list in order. For each case:
[workspace]/bug-bash-screenshots/ with descriptive names (tc2-03-invite-link-generated.png, not screenshot1.png).pass, fail, skip-blocked, skip-external.skip-blocked means a prior case failure makes this one unreachable — fix the prior first or surface it.
skip-external is for cases that can't be run from this seat: expired states needing a DB write, mobile-only flows needing a real device, partner config needing a config push. Name what's missing and who would need to do it. Do not mark these as pass — they are not verified.
When a case fails, capture evidence and keep going. Don't fix-as-you-go — a bash is observation, not implementation. The exception is P0-equivalent "feature didn't load at all" — that blocks everything else. Surface it via AskUserQuestion:
Strict cases catch what the team thought to ask about. Exploratory catches what they didn't. Do this every time the strict pass is done — not "if time allows."
Five lenses, in this order. See references/exploratory.md for tactics on each.
For each exploratory finding, the rules are the same: reproduce it yourself, capture evidence, record it. If you can't reproduce a finding someone else mentioned, that is itself a useful result — file it as "could not reproduce" with the steps you tried.
Default sink is a local MD file at the user's working dir: bug-bash-[feature]-results.md. Always produce this — it's the audit trail the user reads first.
The MD has three sections:
# [Feature] bug bash — [date]
## Summary
| # | Test case | Status | Notes |
## Detailed results
[per test case: step / expected / actual / status / evidence ref]
## Bugs reproduced
[per bug: description / repro steps / evidence / severity / source: strict or exploratory]
## Could not verify
[external-blocked items with what's needed and who from]
Then ask the user via AskUserQuestion whether to also write back to the source:
When writing back, match the team's tone in that source. If the existing Eng QA cells use "✅ CB" style, match. If the bug DB entries are casual / first-person, match. Read the surrounding rows before composing yours. This is what makes the output feel like it belongs there.
If the user authored prior content in a recognizable voice (you can usually tell from earlier rows / comments), mirror it. Conversational, informal, no corporate softening unless that's what the team uses.
Surface the final state via AskUserQuestion:
The button-prompt is the audit trail. Don't declare done in prose.
skip-external, not pass. Naming what's missing is more useful than pretending it worked.[acceptance criteria written / feature implemented]
↓
/flagrare:bug-bash (this skill — verify the spec with evidence, find what the spec missed)
↓
[bugs filed back to source]
↓
[follow-up tickets created for sev-1/sev-2]
↓
[smoke-test reruns on each fix; bash rerun before release]
The bash is distinct from /flagrare:smoke-test. Smoke-test is "the feature I just implemented works end-to-end" — narrow, ten-minute budget, one author. Bug-bash is "the team's prescribed test plan plus exploratory coverage" — wider, longer budget, often runs against features other people built. Both are evidence-driven; they're not interchangeable.
references/ui.md — Chrome DevTools MCP / Playwright tactics: isolated contexts, screenshot conventions, viewport sweeps, evidence directory structure, common assertion patternsreferences/backend.md — request shaping, auth matrix, contract assertions, status-code checks, observability lookupsreferences/exploratory.md — the five-lens framework expanded: viewport sweep checklist, edge-input patterns, multi-actor scenarios, codebase-derived concerns, ingesting transcripts and follow-up context without falling into the "paraphrase the meeting" trap