browserbase-cli

Runs adversarial UI tests in a real browser via the browse CLI. Analyzes git diffs for changes or explores the full app. Verifies functional correctness, accessibility, responsive layout, and UX heuristics.

Install the plugin:

npx claudepluginhub browserbase/skills --plugin browserbase-cli
Test UI changes in a real browser. Your job is to **try to break things**, not confirm they work.
Three workflows:
The main agent coordinates — it plans test strategy, delegates to sub-agents, and merges results. Sub-agents do the actual browser testing.
You MUST complete all three planning rounds yourself and output them before launching any sub-agents. Planning happens in your own response — it is NOT delegated to sub-agents. Do not skip ahead to execution.
Round 1 — Functional: What are the core user flows? What should work? Write out each test as: action → expected result.
Round 2 — Adversarial: Re-read Round 1. What did you miss? Think about: different user types/roles, error paths, empty states, race conditions, edge inputs (empty, huge, special chars, rapid clicks).
Round 3 — Coverage gaps: Re-read Rounds 1–2. What about: accessibility (axe-core, keyboard-only), mobile viewports, console errors, visual consistency with the rest of the app?
Deduplicate: Merge all three rounds into one numbered list of tests. Remove overlaps. Assign each test to a group (e.g. Group A, Group B).
Then execute once — launch one sub-agent per group. Each sub-agent receives its specific list of tests to run, nothing more. Sub-agents do not explore or plan — they execute assigned tests and report results.
Output the three rounds, the merged plan, and the group assignments in your response before calling any Agent tool.
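A minimal sketch of what that output can look like (the test numbering, group names, and budgets are illustrative, not prescriptive):

Merged plan
===========
1. [happy] Valid email submits successfully          → Group A
2. [adversarial] Empty form submission               → Group A
3. [a11y] axe-core audit on /signup, keyboard-only   → Group B
4. [responsive] 375px viewport, no horizontal overflow → Group B

Groups: A → sub-agent 1 (signup form, ~40 steps), B → sub-agent 2 (a11y + responsive, ~25 steps)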
The main agent MUST include an explicit browse step limit in every sub-agent prompt. Sub-agents do not self-limit — they will run until done unless told otherwise.
As a rough heuristic: ~25 steps for a few targeted checks, ~40 for a full page with functional + adversarial + a11y, ~75 for multiple pages or a broad category. Adjust based on what the assigned tests actually require — these are starting points, not rules.
Every sub-agent prompt must include:
You have a budget of N browse steps (each `browse` command = 1 step). Count your steps as you go. When you reach N, stop immediately and report:
- STEP_PASS/STEP_FAIL for every test you completed
- STEP_SKIP|<test-id>|budget reached for every test you didn't get to
Do not retry or continue after hitting the budget.
Run only these tests: [numbered list from the merged plan]
Do not explore beyond the assigned tests.
Do NOT generate an HTML report or write any files. Return only step markers and your findings as text.
The main agent should NOT run browse commands itself (except to verify the dev server is up). All testing happens in sub-agents.
When a sub-agent hits its budget, the main agent accepts the partial results as-is. Do not re-run or retry the sub-agent. Include SKIPPED tests in the final report so the developer knows what wasn't covered.
Every sub-agent reports back with:
Tests: 8 | Passed: 5 | Failed: 2 | Skipped: 1 | Pages visited: 2
The main agent merges into a final report with:
Tests: 20 | Passed: 14 | Failed: 4 | Skipped: 2 | Agents: 3 | Pass rate: 70%
Do not report "steps used" — browse command counts are implementation plumbing, not a meaningful metric for reviewers.
You are an adversarial tester. Your goal is to find bugs, not prove correctness.
Every test step MUST produce a structured assertion. Do not write freeform "this looks good."
For each test step, emit exactly one marker:
STEP_PASS|<step-id>|<evidence>
or
STEP_FAIL|<step-id>|<expected> → <actual>|<screenshot-path>
- step-id: short identifier like homepage-cta, form-validation-error, modal-cancel
- evidence: what you observed that proves the step passed (element ref, text content, URL, eval result)
- expected → actual: what you expected vs what you got
- screenshot-path: path to the saved screenshot (failures only — see Screenshot Capture below)

Every STEP_FAIL MUST have an accompanying screenshot so the developer can see what went wrong visually.
When a test step fails:
# 1. Take a screenshot immediately after observing the failure
browse screenshot --path .context/ui-test-screenshots/<step-id>.png
# If --path is not supported, take the screenshot and save manually:
browse screenshot
# The browse CLI will output the screenshot path — move/copy it:
cp /tmp/browse-screenshot-*.png .context/ui-test-screenshots/<step-id>.png
Setup the screenshot directory at the start of any test run:
mkdir -p .context/ui-test-screenshots
Rules:
- Name each screenshot after its step-id (e.g., double-submit.png, axe-audit.png, modal-focus-trap.png)
- Save all screenshots to .context/ui-test-screenshots/ — this directory is gitignored and accessible to the developer and other agents
- In parallel runs, prefix the filename with the session name: <session>-<step-id>.png (e.g., signup-double-submit.png)
- Baseline screenshots for visual comparison: <step-id>-baseline.png

Assertions:
- browse eval returns structured data you can inspect. Examples: axe-core violation count, document.title, form field value, console error array, element count.
- browse snapshot returns an accessibility tree with element refs like @0-12 button "Save". An element either exists in the tree or it doesn't.

This is the core verification loop. Use it for every interaction:
# 1. BEFORE: capture state
browse snapshot
# Record: what elements exist, their text, their refs
# 2. ACT: perform the interaction
browse click @0-12
# 3. AFTER: capture new state
browse snapshot
# Compare: what changed? What appeared? What disappeared?
# 4. ASSERT: emit marker based on comparison
# If dialog appeared: STEP_PASS|modal-open|dialog "Confirm" appeared at @0-20
# If nothing changed:
browse screenshot --path .context/ui-test-screenshots/modal-open.png
# STEP_FAIL|modal-open|expected dialog to appear → snapshot unchanged|.context/ui-test-screenshots/modal-open.png
Ensure the browse CLI is installed:

which browse || npm install -g @browserbasehq/browse-cli
This skill runs many browse commands (snapshots, clicks, evals). To avoid approving each one, add browse to your allowed commands:
Add both patterns to .claude/settings.json (project-level) or ~/.claude/settings.json (user-level):
{
"permissions": {
"allow": [
"Bash(browse:*)",
"Bash(BROWSE_SESSION=*)"
]
}
}
The first pattern covers plain browse commands. The second covers parallel sessions (BROWSE_SESSION=signup browse open ...). Both are needed to avoid approval prompts.
| Target | Mode | Command | Auth |
|---|---|---|---|
| localhost / 127.0.0.1 | Local | browse env local | None needed (clean isolated local browser by default) |
| Deployed/staging site | Remote | browse env remote | cookie-sync → --context-id |
Rule: If the target URL contains localhost or 127.0.0.1, always use browse env local.
browse env local
browse open http://localhost:3000
browse env local uses a clean isolated local browser by default, which is best for reproducible localhost QA runs.
Use local-mode variants only when needed:
- browse env local --auto-connect — auto-discover existing local Chrome, fall back to isolated. Use this only when the test explicitly needs existing local login/cookies/state.
- browse env local <port|url> — attach to a specific CDP target (explicit local browser attach).

For a deployed or staging site, switch to remote mode and sync auth cookies first:

# Step 1: Sync cookies from local Chrome to Browserbase
node .claude/skills/cookie-sync/scripts/cookie-sync.mjs --domains your-app.com
# Output: Context ID: ctx_abc123
# Step 2: Switch to remote mode
browse env remote
browse open https://staging.your-app.com --context-id ctx_abc123 --persist
browse snapshot
# ... run tests ...
browse stop
Cookie-sync flags: --domains, --context, --stealth, --proxy "City,ST,US"
git diff --name-only HEAD~1 # or: git diff --name-only / git diff --name-only main...HEAD
git diff HEAD~1 -- <file> # read actual changes
Categorize changed files:
| File pattern | UI impact | What to test |
|---|---|---|
| *.tsx, *.jsx, *.vue, *.svelte | Component | Render, interaction, state, edge cases |
| pages/**, app/**, src/routes/** | Route/page | Navigation, page load, content, 404 handling |
| *.css, *.scss, *.module.css | Style | Visual appearance (screenshot), responsive |
| *form*, *input*, *field* | Form | Validation, submission, empty input, long input, special chars |
| *modal*, *dialog*, *dropdown* | Interactive | Open/close, escape, focus trap, cancel vs confirm |
| *nav*, *menu*, *header* | Navigation | Links, active states, routing, keyboard nav |
| Non-UI files only | None | Skip — report "no UI tests needed" |
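As a rough sketch, this triage can start with a quick grep over the diff. The patterns below mirror the table above and are assumptions about the repo layout — adjust them for the project at hand:

# List changed files and keep only those matching likely UI patterns
git diff --name-only HEAD~1 \
  | grep -iE '\.(tsx|jsx|vue|svelte|css|scss)$|^(pages|app|src/routes)/|form|input|modal|dialog|nav|menu' \
  || echo "No UI files changed — no UI tests needed"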
Detect framework: cat package.json | grep -E '"(next|react|vue|nuxt|svelte|@sveltejs|angular|vite)"'
| Framework | Default port | File → URL pattern |
|---|---|---|
| Next.js App Router | 3000 | app/dashboard/page.tsx → /dashboard |
| Next.js Pages Router | 3000 | pages/about.tsx → /about |
| Vite | 5173 | Check router config |
| Nuxt | 3000 | pages/index.vue → / |
| SvelteKit | 5173 | src/routes/+page.svelte → / |
| Angular | 4200 | Check routing module |
Before testing, verify the dev server is serving the code from the diff — not a stale branch.
If testing a PR or specific branch:
# Check what branch is currently checked out
git branch --show-current
# If it's not the PR branch, switch to it
git fetch origin <branch> && git checkout <branch>
# Install deps — the lockfile may differ between branches
yarn install # or npm install / pnpm install
If the dev server was already running on a different branch, restart it after checkout.
Find a running dev server:
for port in 3000 3001 5173 4200 8080 8000 5000; do
s=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port" 2>/dev/null)
if [ "$s" != "000" ]; then echo "Dev server on port $port (HTTP $s)"; fi
done
If nothing found: tell the user to start their dev server.
Verify it actually renders:
After browse open + browse snapshot, check that the accessibility tree contains real page content (navigation, headings, interactive elements) — not just an error overlay or empty body. Next.js dev servers can return HTTP 200 while showing a full-screen build error dialog. If the snapshot is empty or dominated by an error dialog, the server is broken — fix the build before testing.
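A minimal sketch of that sanity check, assuming browse snapshot prints the accessibility tree to stdout (the grep terms are illustrative markers of real content):

browse open http://localhost:3000
browse wait load
# The snapshot should contain real content (headings, nav, buttons), not just an error overlay
browse snapshot | grep -Eiq 'heading|button|navigation' \
  || echo "WARNING: snapshot has no real content — check for a build error overlay"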
For each changed area, plan both happy path AND adversarial tests:
Test Plan (based on git diff)
=============================
Changed: src/components/SignupForm.tsx (added email validation)
1. [happy] Valid email submits successfully
URL: http://localhost:3000/signup
Steps: fill valid email → submit → verify success message appears
2. [adversarial] Invalid email shows error
Steps: fill "not-an-email" → submit → verify error message appears
3. [adversarial] Empty form submission
Steps: click submit without filling anything → verify error, no crash
4. [adversarial] XSS in email field
Steps: fill "<script>alert(1)</script>" → submit → verify sanitized/rejected
5. [adversarial] Rapid double-submit
Steps: click submit twice quickly → verify no duplicate submission
6. [adversarial] Keyboard-only flow
Steps: Tab to email → type → Tab to submit → Enter → verify success
browse stop 2>/dev/null
mkdir -p .context/ui-test-screenshots
# localhost/default QA → clean, reproducible local run
browse env local
For each test, follow the before/after pattern:
# Navigate
browse open http://localhost:3000/path
browse wait load
# BEFORE snapshot
browse snapshot
# Note the current state: elements, refs, text
# ACT
browse click @0-ref
# or: browse fill "selector" "value"
# or: browse type "text"
# or: browse press Enter
# AFTER snapshot
browse snapshot
# Compare against BEFORE: what changed?
# ASSERT with marker
# STEP_PASS|step-id|evidence OR STEP_FAIL|step-id|expected → actual
## UI Test Results
### STEP_PASS|valid-email-submit|status "Thanks!" appeared at @0-42 after submit
- URL: http://localhost:3000/signup
- Before: form with email input @0-3, submit button @0-7
- Action: filled "user@test.com", clicked @0-7
- After: form replaced by status element with "Thanks! We'll be in touch."
### STEP_FAIL|double-submit|expected single submission → form submitted twice|.context/ui-test-screenshots/double-submit.png
- URL: http://localhost:3000/signup
- Before: form with submit button @0-7
- Action: clicked @0-7 twice rapidly
- After: two success toasts appeared, suggesting duplicate submission
- Screenshot: .context/ui-test-screenshots/double-submit.png
- Suggestion: disable submit button after first click, or debounce the handler
---
**Summary: 4/6 passed, 2 failed**
Failed: double-submit, xss-sanitization
Screenshots saved to `.context/ui-test-screenshots/` — open any failed step's screenshot to see the broken state.
Always browse stop when done.
After producing the text report, generate a standalone HTML report that a reviewer can open in a browser. The report embeds screenshots inline (base64) so it works as a single file — no external dependencies.
Why: Text reports are good for the agent conversation, but reviewers (PMs, designers, other engineers) want a visual artifact they can open, scan, and share. Screenshots inline make failures immediately obvious.
| Placeholder | Value |
|---|---|
| {{TITLE}} | Report title for <title> tag (e.g., "UI Test: PR #1234 — OAuth Settings") |
| {{TITLE_HTML}} | Report title for the visible <h1>. If a PR URL is available, wrap the PR reference in an <a> tag so it's clickable (e.g., UI Test: <a href="https://github.com/org/repo/pull/1234">PR #1234</a> — OAuth Settings). If no URL, use plain text same as {{TITLE}}. |
| {{META}} | One-line context: date, app URL, user, branch |
| {{TOTAL_TESTS}} | Total STEP_PASS + STEP_FAIL count |
| {{AGENT_COUNT}} | Number of sub-agents that ran |
| {{PASS_COUNT}} | Number of STEP_PASS |
| {{FAIL_COUNT}} | Number of STEP_FAIL |
| {{PASS_RATE}} | Integer percentage (e.g., "92") |
| {{RATE_CLASS}} | good (≥90%), warn (70–89%), bad (<70%) |
| {{FAILURES_SECTION}} | HTML for failed test cards (see below) |
| {{PASSES_SECTION}} | HTML for passed test cards (see below) |
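A small sketch of how the summary placeholders can be derived from the merged counts (the 14/4 values are illustrative; the thresholds come from the table above):

PASS_COUNT=14; FAIL_COUNT=4                      # from the merged STEP_PASS/STEP_FAIL markers
TOTAL_TESTS=$((PASS_COUNT + FAIL_COUNT))
PASS_RATE=$((100 * PASS_COUNT / TOTAL_TESTS))
if   [ "$PASS_RATE" -ge 90 ]; then RATE_CLASS=good
elif [ "$PASS_RATE" -ge 70 ]; then RATE_CLASS=warn
else RATE_CLASS=bad; fi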
Each test renders as a <details> card. Failed tests should be open by default so reviewers see them immediately:

<!-- Failed test card (open by default) -->
<div class="section">
<h2>Failures <span class="count">{{FAIL_COUNT}}</span></h2>
<details class="test-card fail" open>
<summary>
<span class="badge fail">FAIL</span>
<span class="step-id">step-id-here</span>
<span class="evidence">expected → actual</span>
</summary>
<div class="body">
<dl>
<dt>URL</dt><dd>http://localhost:3000/path</dd>
<dt>Action</dt><dd>What was done</dd>
<dt>Expected</dt><dd>What should have happened</dd>
<dt>Actual</dt><dd>What happened instead</dd>
</dl>
<div class="suggestion">Fix: description of suggested fix</div>
<div class="screenshot">
<img src="data:image/png;base64,..." alt="Screenshot of failure">
<div class="caption">step-id.png — captured at moment of failure</div>
</div>
</div>
</details>
</div>
<!-- Passed test card (collapsed by default) -->
<div class="section">
<h2>Passed <span class="count">{{PASS_COUNT}}</span></h2>
<details class="test-card pass">
<summary>
<span class="badge pass">PASS</span>
<span class="step-id">step-id-here</span>
<span class="evidence">evidence summary</span>
</summary>
<div class="body">
<dl>
<dt>URL</dt><dd>http://localhost:3000/path</dd>
<dt>Evidence</dt><dd>What was observed</dd>
</dl>
</div>
</details>
</div>
# Convert screenshot to base64 data URI
base64 -i .context/ui-test-screenshots/step-id.png | tr -d '\n'
# Use as: src="data:image/png;base64,<output>"
Read each screenshot file referenced in STEP_FAIL markers, base64-encode it, and embed it as an <img src="data:image/png;base64,..."> in the corresponding test card. For STEP_PASS, only embed a screenshot if one was explicitly taken (e.g., baseline screenshots).
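A sketch of embedding every failure screenshot, assuming the macOS-style base64 -i flag used above (on Linux, base64 -w0 <file> is the equivalent):

for f in .context/ui-test-screenshots/*.png; do
  b64=$(base64 -i "$f" | tr -d '\n')
  # Emit an inline <img> tag for the corresponding test card
  printf '<img src="data:image/png;base64,%s" alt="%s">\n' "$b64" "$(basename "$f")"
done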
Write the report to .context/ui-test-report.html:

# Write the generated HTML
cat > .context/ui-test-report.html << 'REPORT_EOF'
<!DOCTYPE html>
...generated report...
REPORT_EOF
# Open it for the reviewer
open .context/ui-test-report.html # macOS
# xdg-open .context/ui-test-report.html # Linux
Tell the user the report is saved to .context/ui-test-report.html and offer to open it.

Rules:
- Failed cards are open by default; passed cards are collapsed

Adversarial patterns: apply these to every interactive element you test. Read references/adversarial-patterns.md for the full pattern library (forms, modals, navigation, error states, keyboard accessibility).
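One minimal sketch of such a check, using only commands shown earlier in this skill — the selector input[name=email] and the ref @0-7 are placeholders for whatever the BEFORE snapshot actually reports:

browse fill "input[name=email]" "<script>alert(1)</script>"
browse click @0-7            # submit button ref from the BEFORE snapshot
browse snapshot              # assert: value rejected or sanitized, no injected dialog, form still usable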
The deterministic checks below produce structured data, not judgment calls. Use them as the strongest form of assertion.
| Check | What it catches | Assertion |
|---|---|---|
| axe-core | WCAG violations | violations.length === 0 |
| Console errors | Runtime exceptions, failed requests | empty error array |
| Broken images | Missing/failed image loads | no images with naturalWidth === 0 |
| Form labels | Inputs without accessible labels | every input has hasLabel: true |
For the exact browse eval recipes, read references/browser-recipes.md.
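For example, the broken-image check from the table might look like this — a sketch assuming browse eval accepts a JavaScript expression; the exact recipes live in references/browser-recipes.md:

# Count images that failed to load; 0 → STEP_PASS, anything else → STEP_FAIL with the count as evidence
browse eval "Array.from(document.images).filter(img => !img.complete || img.naturalWidth === 0).length"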
No diff, no plan — just open the app and try to break it. Use this when the user says "test my app", "find bugs", or "QA this site."
- Check package.json to detect the framework, then open the root URL and snapshot to see what's there

Don't try to be systematic about coverage. Just explore like a user would, but with the intent to break things. The agent is good at this — let it roam.
Check the 404 page (navigate to /does-not-exist) — is it custom or default?

Run independent test groups concurrently using named browse sessions (BROWSE_SESSION=<name>). Each session gets its own browser. Works with both local and remote mode.
Use when testing multiple pages or categories and you want faster wall clock time.
Read references/parallel-testing.md for the full workflow: session setup, agent fan-out, cookie-sync for auth, and result merging.
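A minimal sketch of two named sessions running side by side (the session names are illustrative):

BROWSE_SESSION=signup browse open http://localhost:3000/signup
BROWSE_SESSION=checkout browse open http://localhost:3000/checkout
# Each sub-agent prefixes every command with its own BROWSE_SESSION
BROWSE_SESSION=signup browse stop
BROWSE_SESSION=checkout browse stop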
Check whether changed UI matches the rest of the app visually. Read references/design-consistency.md when doing visual or design checks.
| Category | How | Assertion type |
|---|---|---|
| Accessibility | axe-core + keyboard nav | Deterministic (violation count) |
| Visual Quality | Screenshot + heuristic evaluation | Visual judgment (weakest — note specifics) |
| Responsive | Viewport sweep + screenshots | Visual + deterministic (overflow check) |
| Console Health | Console capture eval | Deterministic (error count) |
| UX Heuristics | Snapshot + Laws of UX + Nielsen's | Structured judgment (cite specific heuristic) |
| Error States | Navigate to empty/error states | Before/after comparison |
| Data Display | Snapshot on tables/dashboards | Element match (column count, formatting) |
| Design Consistency | Screenshot baseline + changed page comparison | Visual judgment (cite specific property) |
| Exploratory | Free navigation + adversarial testing | Before/after + judgment |
Reference guides (load on demand): references/adversarial-patterns.md, references/browser-recipes.md, references/parallel-testing.md, references/design-consistency.md.
For worked examples with exact commands, read EXAMPLES.md if you need to see the assertion protocol in action.
Final checklist:
- Take a browse screenshot immediately on STEP_FAIL and save it to .context/ui-test-screenshots/<step-id>.png
- Prefer browse env local first for reproducible runs; use --auto-connect only when existing local state is required
- Run browse stop when done — for parallel runs, stop every named session

Troubleshooting:
- Stuck or hung session: browse stop, retry. For zombies: pkill -f "browse.*daemon"
- No dev server responding: curl http://localhost:<port> — ask the user to start it
- browse eval with await fails: use .then() instead — browse eval doesn't support top-level await (see the sketch after this list)
- Stale element refs: run browse snapshot again — refs change on page updates
- Page not ready: browse wait load or browse wait selector ".expected" before snapshotting
- Deep link won't load: open / first, then click through
- Remote auth issues: re-run cookie-sync, pass --context <id>, try --stealth
- Parallel sessions: make sure every browse command uses BROWSE_SESSION=<name> — without it, commands go to the default session
- Stop a named session with BROWSE_SESSION=<name> browse stop. For zombies: pkill -f "browse.*<name>.*daemon"
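A sketch of the .then() workaround, assuming browse eval resolves a returned Promise before printing the result; the /api/health endpoint is purely illustrative:

browse eval "fetch('/api/health').then(r => r.status)"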