Browser automation via agent-browser CLI. Use when you need to navigate websites, verify deployed UI, test web apps, read online documentation, scrape data, fill forms, capture baseline screenshots before design work, or inspect current page state. Triggers on "check the page", "verify UI", "test the site", "read docs at", "look up API", "visit URL", "browse", "screenshot", "scrape", "e2e test", "login flow", "capture baseline", "see how it looks", "inspect current", "before redesign".
Automates web browsers to navigate, interact with elements, and capture screenshots for testing and data tasks.
/plugin marketplace add gmickel/gmickel-claude-marketplace/plugin install flow-next@gmickel-claude-marketplaceThis skill inherits all available tools. When active, it can use any tool Claude has access to.
references/advanced.mdreferences/auth.mdreferences/commands.mdreferences/debugging.mdreferences/proxy.mdreferences/session-management.mdreferences/snapshot-refs.mdBrowser automation via Vercel's agent-browser CLI. Runs headless by default; use --headed for visible window. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
# Check installed + print version
command -v agent-browser >/dev/null 2>&1 && agent-browser --version || echo "MISSING: npm i -g agent-browser && agent-browser install"
Always run the version check at the start of a browser session. agent-browser iterates quickly — check for updates if the version is more than a week old:
npm view agent-browser version # Latest published
agent-browser open https://example.com
agent-browser snapshot -i # Interactive elements with refs
agent-browser click @e1
agent-browser wait --load networkidle # Wait for SPA to settle
agent-browser snapshot -i # Re-snapshot after change
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
# Chain open + wait + snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
When to chain: Use && when you don't need intermediate output (e.g., open + wait + screenshot). Run separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload
agent-browser close # Close browser (aliases: quit, exit)
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive only (recommended)
agent-browser snapshot -i -C # Include cursor-interactive (divs with onclick, cursor:pointer)
agent-browser snapshot -i --json # JSON for parsing
agent-browser snapshot -c # Compact (remove empty)
agent-browser snapshot -d 3 # Limit depth
agent-browser snapshot -s "#main" # Scope to selector
agent-browser click @e1 # Click
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser dblclick @e1 # Double-click
agent-browser fill @e1 "text" # Clear + fill input
agent-browser type @e1 "text" # Type without clearing
agent-browser press Enter # Key press
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
agent-browser keyup Shift # Release key
agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck
agent-browser select @e1 "option" # Dropdown
agent-browser select @e1 "a" "b" # Multi-select
agent-browser scroll down 500 # Scroll direction + pixels
agent-browser scrollintoview @e1 # Scroll element visible
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
agent-browser get text @e1 # Element text
agent-browser get value @e1 # Input value
agent-browser get html @e1 # Element HTML
agent-browser get attr href @e1 # Attribute
agent-browser get title # Page title
agent-browser get url # Current URL
agent-browser get count "button" # Count matches
agent-browser get box @e1 # Bounding box (x, y, width, height)
agent-browser get styles @e1 # Computed styles (font, color, bg)
agent-browser is visible @e1 # Check visibility
agent-browser is enabled @e1 # Check enabled
agent-browser is checked @e1 # Check checkbox state
agent-browser wait @e1 # Wait for element visible
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text (-t)
agent-browser wait --url "**/dashboard" # Wait for URL pattern (-u)
agent-browser wait --load networkidle # Wait for network idle (-l)
agent-browser wait --fn "window.ready" # Wait for JS condition (-f)
agent-browser screenshot # Viewport to temp dir
agent-browser screenshot out.png # Save to file
agent-browser screenshot --full # Full page
agent-browser screenshot --annotate # Annotated with numbered element labels
agent-browser pdf out.pdf # Save as PDF
Compare accessibility tree or visual state before/after changes:
# Snapshot diff: compare current vs last snapshot
agent-browser snapshot -i # Baseline
agent-browser click @e2 # Action
agent-browser diff snapshot # See what changed
# Snapshot diff: compare vs saved file
agent-browser diff snapshot --baseline before.txt
# Visual pixel diff
agent-browser diff screenshot --baseline before.png
# Compare two URLs
agent-browser diff url https://staging.example.com https://prod.example.com
agent-browser diff url <url1> <url2> --wait-until networkidle
agent-browser diff url <url1> <url2> --selector "#main"
agent-browser diff url <url1> <url2> --screenshot # Visual diff
diff snapshot uses +/- like git diff. diff screenshot produces a diff image with changed pixels in red + mismatch percentage.
Alternative when you know the element (no snapshot needed):
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact # Exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" hover
Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. Also caches refs — interact immediately without separate snapshot.
agent-browser screenshot --annotate
# Output includes image path + legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"
agent-browser click @e2 # Use ref from annotated screenshot
Use when: unlabeled icon buttons, visual-only elements, canvas/charts (invisible to text snapshots), or spatial reasoning needed.
Use eval to run JS in the browser. Shell quoting can corrupt complex expressions — use --stdin or -b to avoid issues.
# Simple expressions: regular quoting OK
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Alternative: base64 encoding (bypasses all shell escaping)
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
Rules of thumb:
eval 'expression' with single quoteseval --stdin <<'EVALEOF'eval -b with base64Parallel isolated browsers (see auth.md for multi-user auth):
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
Auto-save/restore cookies and localStorage across browser restarts:
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved to ~/.agent-browser/sessions/
# Next time: state auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard
# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com
# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
# Auto-discover running Chrome with remote debugging
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot
# Or explicit CDP port
agent-browser --cdp 9222 snapshot
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
# List available iOS simulators
agent-browser device list
# Launch Safari on specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow: snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile gesture
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
Requires: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest).
Real devices: Use --device "<UDID>" (UDID from xcrun xctrace list devices).
Create agent-browser.json in project root for persistent settings:
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data"
}
Priority (lowest→highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG for custom path. All CLI options map to camelCase keys (--executable-path → "executablePath").
Default Playwright timeout is 60s. For slow pages, use explicit waits:
agent-browser wait --load networkidle # Best for slow pages
agent-browser wait "#content" # Wait for specific element
agent-browser wait @e1 # Wait for ref
agent-browser wait --url "**/dashboard" # Wait after redirects
agent-browser wait --fn "document.readyState === 'complete'"
agent-browser wait 5000 # Fixed duration (last resort)
Use wait --load networkidle after open for consistently slow sites.
Add --json for machine-readable output:
agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser is visible @e1 --json
# Video recording
agent-browser record start demo.webm
# ... actions ...
agent-browser record stop
agent-browser record restart take2.webm # Stop current + start new
# Chrome DevTools profiling
agent-browser profiler start
# ... actions ...
agent-browser profiler stop trace.json
See debugging.md for details.
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Verify result
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Later: reuse saved auth
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
More auth patterns in auth.md.
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
agent-browser snapshot -i --json
agent-browser --headed open example.com # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser highlight @e1 # Highlight element
agent-browser --debug open example.com # Verbose output
See debugging.md for traces, profiling, video, common issues.
Always close sessions when done to avoid leaked processes:
agent-browser close # Close default session
agent-browser --session name close # Close specific session
If previous session not closed properly, daemon may still be running. agent-browser close cleans it up.
"Browser not launched" error: Daemon stuck. Kill and retry:
pkill -f agent-browser && agent-browser open <url>
--headed not showing window: Daemon reuse bug. If daemon started headless, --headed is ignored. Kill daemon first:
agent-browser close
pkill -f "node.*daemon.js.*AGENT_BROWSER"
pkill -f "Google Chrome for Testing"
sleep 1
agent-browser open <url> --headed
Window exists but not visible (macOS):
osascript -e 'tell application "Google Chrome for Testing" to activate'
Element not found: Re-snapshot after page changes. DOM may have updated.
Ref lifecycle: Refs (@e1, @e2) are invalidated when the page changes. Always re-snapshot after clicks that navigate, form submissions, or dynamic content loading.
| Topic | File |
|---|---|
| Full command reference | commands.md |
| Snapshot refs, lifecycle, troubleshooting | snapshot-refs.md |
| Auth, OAuth, 2FA, state persistence | auth.md |
| Sessions, parallel browsers, state | session-management.md |
| Debugging, profiling, video recording | debugging.md |
| Proxy, geo-testing, rotating proxies | proxy.md |
| Network mocking, tabs, frames, dialogs, settings | advanced.md |
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.
Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.