browser-qa
Use when testing a web app end-to-end via the Chrome browser. Navigates all screens, discovers interactive elements, verifies functionality, checks for console errors and network failures, finds bugs, and automatically launches fix agents inline. Also supports targeted workflow testing and bug-fix cycles. Requires the Claude in Chrome extension to be connected.
npx claudepluginhub narailabs/narai-claude-plugins --plugin browser-qa
This skill uses the workspace's default tool permissions.
Bundled files:
- feedback/patterns.md
- feedback/run-log.jsonl
- reference/accessibility.md
- reference/expectations-validation.md
- reference/fix-agents.md
- reference/fix-mode.md
- reference/interaction-protocol.md
- reference/performance.md
- reference/reporting.md
- reference/responsive.md
- reference/testing-layers.md
- reference/workflow-mode.md
Systematically explore and test a web application using Chrome MCP browser tools. Auto-discovers screens, interactive elements, and testable surfaces from the DOM — no project-specific configuration needed. Automatically spawns fix agents for every bug found, then re-tests to confirm the fix. Goes beyond runtime errors — analyzes docs and code to build expectations of how the app should behave, then validates each expectation with dedicated subagents that test and fix independently.
Arguments received: $ARGUMENTS
Parse the arguments:
- URL — the app to test (e.g. http://localhost:5173)
- --mode smoke|functional|full = testing depth (default: full)
- --record = enable GIF recording of test session
- --focus "#route" = test only a specific route/area
- --no-autofix = skip auto-fixing bugs, just report them
- --a11y = run accessibility checks (included in full mode)
- --perf = run performance checks (included in full mode)
- --responsive = test at mobile/tablet/desktop breakpoints (included in full mode)
- --skip-auth = skip login, test only public pages
- --workflow "description" = test a specific user journey. Implies --mode functional. Skips broad discovery.
- --fix "description" = reproduce and fix a known bug. Runs reproduce → fix → verify cycle.

--workflow and --fix are mutually exclusive. If both provided, ask which was intended.
If no URL provided, ask the user.
Mode selection: If --mode is NOT explicitly set and neither --workflow nor --fix is used, ask the user to pick a mode before starting:
AskUserQuestion:
question: "Which testing mode should I use?"
header: "Test mode"
options:
- label: "Full (Recommended)"
description: "CRUD lifecycle, action verification, permutations, a11y, perf, responsive, dark mode. Targets ≥90% coverage."
- label: "Functional"
description: "CRUD lifecycle, action verification, form validation, navigation testing. No permutations or a11y/perf/responsive."
- label: "Smoke"
description: "Navigate and observe only — no interactions. Quick pass for DOM, console, and network errors."
Map the answer to the mode and proceed. If the user picks "Other", use their custom input as the mode or ask them to clarify.
Example invocations:
/browser-qa http://localhost:3000
/browser-qa http://localhost:3000 --mode full
/browser-qa http://localhost:3000 --workflow "register user, then verify dashboard loads"
/browser-qa http://localhost:3000 --fix "clicking save on settings throws TypeError"
CRITICAL: Do NOT proceed until the Chrome extension is confirmed working.
Call Claude_in_Chrome:tabs_context_mcp immediately.
If it succeeds: Proceed.
If it fails: Stop. Show the user:
Chrome extension not connected. Install from:
https://chromewebstore.google.com/detail/claude/fcoeoabgfenejglbffodgkkbkcdhcgfn
Setup: Install → grant permissions → ensure Chrome is running → restart Claude Code if needed.
Ask if user wants to retry or cancel. If cancel → end execution.
Before opening the browser, understand the project so fix agents have context.
Read project docs (CLAUDE.md, README.md) for tech stack, conventions, build/test commands
Explore project structure to identify frontend/backend source directories, component patterns, framework
Build a CODEBASE CONTEXT block for fix agent prompts:
CODEBASE CONTEXT:
- Tech stack: [framework] frontend, [backend] server
- Frontend source: [path]
- Backend source: [path]
- Build/test commands: [from docs]
- Key patterns: [notable conventions]
Store this for injection into all fix agent prompts.
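For illustration, a hypothetical filled-in block (every value below is invented):

CODEBASE CONTEXT:
- Tech stack: React (Vite) frontend, Express server
- Frontend source: src/
- Backend source: server/
- Build/test commands: npm run build, npm test
- Key patterns: Zustand stores in src/stores/, API client in src/api/client.ts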
Connect to Chrome: tabs_context_mcp → tabs_create_mcp → navigate to app URL
Verify app reachable: read_page + screenshot. If blank/error → the app may not be running; notify the user.
Auth wall detection: Look for login forms, "Sign in" buttons, OAuth buttons, /login redirects.
If --skip-auth set: Test only public pages. Note auth-gated screens as "skipped" in the report.
Optional: If --record, start GIF recording via gif_creator(action=start_recording).
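The auth-wall signals above can be approximated in page script. A minimal sketch; the selectors and keyword list are illustrative assumptions, not a spec:

```ts
// Heuristic auth-wall detection, evaluated in the page context.
function looksLikeAuthWall(): boolean {
  const hasPasswordField = !!document.querySelector('input[type="password"]');
  const authKeywords = /sign in|log in|login|sign up|continue with/i;
  const hasAuthControl = Array.from(document.querySelectorAll('button, a'))
    .some((el) => authKeywords.test(el.textContent ?? ''));
  const onLoginRoute = /\/(login|signin|auth)\b/.test(location.pathname + location.hash);
  return hasPasswordField || onLoginRoute || hasAuthControl;
}
```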
Clear console baseline: read_console_messages(clear=true)
Detect SPA vs MPA:
| Signal | Indicates |
|---|---|
| Hash routes (#/tasks) | SPA (hash routing) |
| Framework root (<div id="root">) | Likely SPA |
| Full page reload on link click | MPA |
| Page shell persists on navigation | SPA |
Record APP_TYPE: SPA or MPA. Navigation strategy differs:
| Behavior | SPA | MPA |
|---|---|---|
| Navigate | Click links or navigate | navigate to full URL |
| Wait after nav | 2s (client render) | 3s (full page load) |
| Console | Persists — clear per screen | Resets per page load |
| Network | Accumulates — note new requests | Resets — all are relevant |
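A sketch of the static signals from the tables above, run in page context. The selectors are assumptions; a definitive check clicks a nav link and watches whether the page shell survives (SPA) or a full reload occurs (MPA):

```ts
// Classify the app from static DOM/URL signals only.
function detectAppType(): 'SPA' | 'MPA' {
  const hashRouting = location.hash.startsWith('#/');
  const frameworkRoot = !!document.querySelector(
    '#root, #app, [data-reactroot], [ng-version]'
  );
  return hashRouting || frameworkRoot ? 'SPA' : 'MPA';
}
```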
If --workflow set → Workflow Mode. Skip Phases 3-5.
If --fix set → Fix Mode. Skip Phases 3-5.
Flag interactions:
- --workflow/--fix imply --mode functional
- --a11y/--perf with --workflow/--fix: After the workflow/fix completes, run Layer 6/7 checks on the screens that were visited during execution
- --responsive: Re-run workflow at each breakpoint, or verify fix at all breakpoints
- --no-autofix + --fix: Contradictory — ask user to clarify
- --focus: Limits broad testing (Phases 3-7) to the specified route only. Ignored when --workflow/--fix is set

Map the app's testable surfaces:
- read_page — find nav elements, sidebar, header links
- find(query="navigation links") — record each link's text and href
- read_page(filter=interactive) — catalog buttons, inputs, selects per screen
- Record each screen with its element count, e.g. "Screen: Tasks (#tasks) — 14 interactive elements"

--focus filtering: If --focus is set, limit discovery and all subsequent phases (4-7) to only the matching route/screen. Other screens are skipped and noted as "out of focus scope" in the report.
Validate the app works as intended — not just that it doesn't crash. Uses subagents for clean context per validation. Full procedures in expectations-validation.md.
Spawn a general-purpose subagent (no browser access) to analyze the codebase adaptively:
The agent returns a JSON array of structured expectations — each with an ID, category, screen, verification steps, and success criteria.
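For illustration, a sketch of one such entry as a TypeScript shape. The field names are inferred from the description above; the authoritative schema is defined in expectations-validation.md:

```ts
// One structured expectation produced by the analysis subagent.
interface Expectation {
  id: string;                          // e.g. "EXP-007"
  category: string;                    // e.g. "data-display"
  screen: string;                      // route or hash, e.g. "#dashboard"
  priority: 'high' | 'medium' | 'low'; // drives the mode filtering below
  steps: string[];                     // verification steps for the subagent
  success: string;                     // success criteria to assert
}
```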
Priority filtering by mode:
- smoke: Only high-priority expectations
- functional: high + medium priority
- full: All expectations

For each expectation, spawn a separate general-purpose subagent with clean context. Each validation agent follows the procedure in expectations-validation.md.
Run sequentially (agents share the browser tab). Report progress: "Validating 3/15: Dashboard shows user count..."
Failed expectations are logged with bug type expectation_failure. With --no-autofix: validation agents report failures but skip fix attempts.
For each screen, run verification layers. Follow the Interaction Stability Protocol for all interactions.
- read_page — verify meaningful content (not empty/error)
- read_console_messages(pattern="error|Error|warn|Warning|exception", clear=true)
- read_network_requests — check for 4xx/5xx, failed/aborted requests

(functional/full mode) For each interactive element, follow the detailed procedures in testing-layers.md:
New in functional/full mode:
New in full mode only:
(--a11y or full mode) Run WCAG-informed checks per accessibility.md:
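A minimal in-page sketch of a few such spot checks. The selectors are assumptions and ignore some valid patterns (e.g. aria-labelledby, wrapping <label> elements); the authoritative checks live in accessibility.md:

```ts
// Quick WCAG-informed counts, evaluated in the page.
const imagesWithoutAlt = document.querySelectorAll('img:not([alt])').length;
const unlabeledButtons = Array.from(document.querySelectorAll('button'))
  .filter((b) => !b.textContent?.trim() && !b.getAttribute('aria-label')).length;
const unlabeledInputs = Array.from(
  document.querySelectorAll('input:not([type="hidden"])')
).filter((i) => !i.id || !document.querySelector(`label[for="${i.id}"]`)).length;
console.log({ imagesWithoutAlt, unlabeledButtons, unlabeledInputs });
```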
Keyboard trap detection: full mode only (WCAG 2.1.2).
(--perf or full mode) Observational checks per performance.md:
Performance findings are advisory — they do NOT trigger auto-fix.
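A sketch of the kind of observational timing collection this layer might use, via the standard Performance APIs. What to log, and any budgets, are assumptions rather than requirements from performance.md:

```ts
// Navigation timing plus the latest LCP candidate, logged for the report.
const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
console.log('DOMContentLoaded (ms):', nav.domContentLoadedEventEnd - nav.startTime);
console.log('Load event (ms):', nav.loadEventEnd - nav.startTime);

new PerformanceObserver((list) => {
  const lcp = list.getEntries().at(-1); // latest largest-contentful-paint entry
  console.log('LCP (ms):', lcp?.startTime);
}).observe({ type: 'largest-contentful-paint', buffered: true });
```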
Maintain a running bug registry to avoid duplicate reports.
Dedup key: error_signature (constant part of error text) + bug_type (console_error, network_failure, dom_issue, visual_issue, functional_issue, a11y_issue, perf_issue, responsive_issue)
Before logging any bug:
- If the key matches an existing entry: add the current screen to its affected_screens.
- If new: create an entry with affected_screens: [current_screen].

Auto-fix dedup: Only spawn a fix agent the FIRST time a bug is encountered.
Registry format:
BUG REGISTRY:
- BUG-001: {key: "TypeError: Cannot read properties of undefined", type: console_error, severity: major, affected_screens: ["#tasks", "#agents"], first_seen: "#tasks"}
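A sketch of how the dedup key could be computed. The normalization rules (stripping numbers and hex ids) are assumptions; any stable signature works:

```ts
// Build a stable dedup key from volatile error text.
type BugType =
  | 'console_error' | 'network_failure' | 'dom_issue' | 'visual_issue'
  | 'functional_issue' | 'a11y_issue' | 'perf_issue' | 'responsive_issue';

function dedupKey(errorText: string, type: BugType): string {
  const signature = errorText
    .replace(/\b\d+\b/g, 'N')        // line/column numbers, counts
    .replace(/[0-9a-f]{8,}/gi, 'ID') // hashes, UUID fragments
    .trim()
    .slice(0, 200);                  // keep only the stable prefix
  return `${type}:${signature}`;
}
```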
Runs immediately when any bug is confirmed during Phase 5 (unless --no-autofix). Note: bugs found during Phase 4 (expectations validation) are already handled by the validation subagents — this phase covers runtime bugs from Phase 5.
For each bug, follow the process in fix-agents.md:
Runs with --responsive or --mode full. Follow procedures in responsive.md.
Re-test screens at breakpoints: Mobile (375px), Mobile landscape (812x375), Tablet (768px), Desktop (1280px).
For each: resize → navigate → screenshot → check for layout issues (overflow, overlap, missing elements, unreadable text, small touch targets).
Dark mode testing: Toggle color scheme via in-app toggle or media query emulation. Check for invisible text, missing theme variables, hardcoded colors.
Responsive bugs trigger auto-fix. Restore desktop viewport after testing.
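A sketch of the breakpoint table and two in-page checks. The tablet/desktop heights are assumptions, and the dark-mode check only catches same-element color matches; viewport resizing itself happens through the browser tooling, not page script:

```ts
// Breakpoints mirroring the list above.
const breakpoints = [
  { name: 'Mobile', width: 375, height: 812 },
  { name: 'Mobile landscape', width: 812, height: 375 },
  { name: 'Tablet', width: 768, height: 1024 },
  { name: 'Desktop', width: 1280, height: 800 },
];

// Horizontal overflow is the most common responsive layout bug.
const hasHorizontalOverflow = () =>
  document.documentElement.scrollWidth > document.documentElement.clientWidth;

// Rough invisible-text check for dark mode; transparent backgrounds are skipped.
const invisibleTextCandidates = () =>
  Array.from(document.querySelectorAll<HTMLElement>('body *')).filter((el) => {
    const s = getComputedStyle(el);
    return !!el.innerText?.trim()
      && s.color === s.backgroundColor
      && s.backgroundColor !== 'rgba(0, 0, 0, 0)';
  }).length;
```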
Generate a comprehensive report per reporting.md:
- Accessibility (--a11y/full): Issues with WCAG references
- Performance (--perf/full): Advisory findings
- Responsive (--responsive/full): Per-breakpoint findings
- Security spot checks (full): Advisory findings

If --record: Stop GIF recording, export, and offer it to the user.
| Always ask before | Ask once, then remember | Never ask about |
|---|---|---|
| Destructive buttons (delete, remove, reset) | Auth credentials | Navigation / reading pages |
| External actions (send, publish, deploy) | Destructive action preference | Screenshots / console / network |
| Real credentials (API keys, passwords) | Filling forms with test data | Spawning fix agents |
| OAuth/SSO flows (user does manually) | Clicking safe buttons (non-destructive, in-app) | A11y / perf / responsive checks |
| | | Retrying failed interactions |
smoke: Navigate each screen → DOM + console + network + screenshot. Expectations validation (high priority only). No interactions. Auto-fix. Deduplication active. Coverage target: element coverage only.
functional: smoke + interact with safe elements, fill forms, click buttons, verify state. CRUD lifecycle for all data entities. Action verification with pre/post state comparison. Expectations validation (high + medium priority). Form validation + navigation testing. Auto-fix. Stability protocol active. Coverage target: ≥90% element + action coverage, 100% CRUD coverage.
full (default): functional + all expectations validated + permutation testing (all select options, toggle states, form variations, tab views, flow branches) + error state testing + state persistence + security spot checks + accessibility (Layer 6) + performance (Layer 7) + responsive (Phase 7) + dark mode. Auto-fix. Coverage target: ≥90% all dimensions, ≥80% permutation coverage.
workflow: Test specific user journey via --workflow. See workflow-mode.md. Parses description into steps, executes sequentially, fixes bugs, re-runs entire workflow after each fix (max 3 re-runs). No expectations discovery (the workflow IS the expectation).
fix: Reproduce and fix known bug via --fix. See fix-mode.md. Structured cycle: understand → reproduce → evidence → fix (up to 2 attempts) → validate → regression check. No expectations discovery (the bug IS the expectation).
Individual layers can also be enabled via flags: --a11y, --perf, --responsive.
This skill improves itself over time. The following protocol runs automatically.
At the start of every run:
- Read feedback/patterns.md — apply all listed patterns as constraints on this run.
- Read feedback/run-log.jsonl — note recent corrections to avoid repeating them.

Do not mention the feedback loop to the user unless asked. Apply patterns silently.
After completing the user's task (before saying "done"):
Produce a structured internal review (not shown to user yet).
After @REVIEW, present findings to the user:
"I completed the task. Before wrapping up, I noticed some patterns that could improve how I handle browser QA testing in the future. Here's what I'd like to update:"
For EACH proposed change, present:
| Field | Content |
|---|---|
| What | The exact text to add/modify in feedback/patterns.md |
| Why | What triggered this — user correction, self-review finding, or repeated pattern |
| Impact | How this makes future runs better |
Then ask:
"Would you like me to apply any of these? You can approve, modify, or reject each one."
Rules:
- Never modify feedback/patterns.md without user approval.
- Approved patterns are appended to feedback/patterns.md.
- Every run is logged to feedback/run-log.jsonl.

The feedback loop may ONLY modify:
- feedback/patterns.md — learned behavioral patterns
- feedback/run-log.jsonl — run history

It may NEVER modify any other files.
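For illustration, a hypothetical feedback/run-log.jsonl entry; the field names are invented, not a fixed schema:

{"ts": "2025-01-15T10:22:00Z", "url": "http://localhost:3000", "mode": "full", "bugs_found": 3, "bugs_fixed": 2, "corrections": ["user declined auto-fix for BUG-002"]}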