Help us improve
Share bugs, ideas, or general feedback.
From agent-web-interface
Drives a real browser against live or staging websites. Navigate, click, type, snapshot, fill forms, and extract reliable selectors for multi-step flows (login, search, checkout, etc.).
npx claudepluginhub lespaceman/athena-workflow-marketplace --plugin agent-web-interfaceHow this skill is triggered — by the user, by Claude, or both
Slash command
/agent-web-interface:agent-web-interface <url> <goal> — e.g. https://example.com 'Add item to cart'<url> <goal> — e.g. https://example.com 'Add item to cart'The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Use this skill to open live web pages, carry out actions, move through multi-step flows, validate page state, and capture selectors for automation.
Reference for agent-browser commands to navigate pages, snapshot elements, interact (click/fill/type), extract data. For web testing, form automation, screenshots.
CLI for browser automation: navigate sites, snapshot elements for refs, fill forms, click buttons, screenshot, scrape data, test web apps. Chains commands, imports auth state.
Automates browser interactions via Chrome DevTools Protocol. Screenshots, clicks, types, navigates, reads page accessibility trees, extracts text, and executes JavaScript in web pages. Use when the user asks to interact with a website, test a web app, fill web forms, scrape web content, or automate browser tasks.
Share bugs, ideas, or general feedback.
Use this skill to open live web pages, carry out actions, move through multi-step flows, validate page state, and capture selectors for automation.
Common uses:
This skill drives the agent-web-interface MCP server — installing the skill does not install that server. If the mcp__agent-web-interface__* tools aren't available, add the server first:
claude mcp add agent-web-interface -- npx agent-web-interface@latest
For Claude Desktop / Cursor / VS Code, register it under the key agent-web-interface:
{
"mcpServers": {
"agent-web-interface": {
"command": "npx",
"args": ["agent-web-interface@latest"]
}
}
}
See the agent-web-interface README for browser modes (user, persistent, isolated) and options.
Why this skill must load before browser MCP calls: without it, browser work reliably produces brittle CSS selectors, narrative prose instead of structured observations, missed state transitions, and shallow page coverage. The selector-capture and state-observation patterns below are what the downstream exploration report, spec generation, and test-writing skills depend on.
Subagent dispatch: when a subagent is dispatched with browser MCP access, this is the first skill it should load — the exploration upstream of spec generation and test writing depends on the patterns below.
Parse the target URL and exploration goal from: $ARGUMENTS
list_pages and explicit page_id when session state may be ambiguoussnapshot output for quick orientationfind with label, kind, and region to narrow targetsget_form when the task is clearly form-drivenget_element for a chosen target, offsets, or selector extractioneidsAlways include:
find over manual scanning when snapshots are trimmed or the page is densefind aggressively with kind, label, and region before broad explorationeids as short-lived after large mutations; reacquire targets instead of assuming old ids still workget_form as a helper, not as ground truth; busy pages may contain multiple unrelated formsobservations, baseline, and diff to confirm whether an action actually changed the pageEvery navigation or action returns a <state> snapshot:
<state step="N" title="Page Title" url="https://...">
<meta view="1521x752" scroll="0,0" layer="main" />
<baseline reason="first|navigation" />
<diff type="mutation" added="N" removed="N" />
<observations>...</observations>
<region name="main">...</region>
</state>
| Element | Purpose |
|---|---|
<meta> | Viewport size, scroll position, active layer |
<baseline reason="..."> | Fresh snapshot - "first" (initial load) or "navigation" (URL change) |
<diff type="mutation"> | Incremental update with added/removed counts |
<observations> | What appeared/disappeared after the action |
<region> | Semantic page areas with interactive elements |
After actions (click, type, select), watch for changes:
<observations>
<appeared when="action">Your Bag is empty</appeared>
<appeared when="action" role="status"></appeared>
<disappeared when="action" role="status"></disappeared>
</observations>
<appeared>: New content visible after action<disappeared>: Content removed after actionrole attribute: Semantic role (status, alert, dialog)Page content is organized into semantic regions:
<region name="main">
<link id="..." href="...">Link text</link>
<btn id="...">Button text</btn>
<!-- trimmed 50 items. Use find with region=main to see all -->
</region>
<region name="nav" unchanged="true" count="90" />
main - Primary content areanav - Navigation menusheader - Page headerfooter - Page footerform - Form containersaside - Sidebarssearch - Search areasunchanged="true" count="N" - Region didn't change, shows element count<!-- trimmed N items --> - Use find with region filter to see all| Tag | Element | Key Attributes |
|---|---|---|
<link> | Hyperlink | id, href |
<btn> | Button | id, val, enabled |
<rad> | Radio button | id, val, checked, focused |
<sel> | Dropdown/select | id, expanded, focused |
<elt> | Input/generic | id, type, val, focused, enabled, selected |
| Attribute | Meaning |
|---|---|
id | Element ID (eid) - use this to target the element |
enabled="false" | Element is disabled (common in sequential forms) |
checked="true" | Radio/checkbox is selected |
focused="true" | Element has keyboard focus |
expanded="true" | Dropdown is open |
selected="true" | Option/tab is selected |
val | Element value |
Many sites use progressive enablement: later options stay disabled until earlier choices are made.
<!-- Step 1: Model selection enabled -->
<rad id="model1" val="pro">iPhone 17 Pro</rad>
<rad id="color1" enabled="false" val="silver">Silver</rad> <!-- disabled -->
<!-- After selecting model, colors become enabled -->
<rad id="model1" checked="true" val="pro">iPhone 17 Pro</rad>
<rad id="color1" val="silver">Silver</rad> <!-- now enabled -->
Common places this appears:
Strategy: If you see enabled="false", work upward to identify and complete the prerequisite step before continuing.
<result type="find" page_id="..." snapshot_id="..." count="N">
<match eid="abc123"
kind="button|link|radio|checkbox|textbox|combobox|heading|image"
label="Button text"
region="main|nav|header|footer"
selector="role=button[name="..."]"
visible="true"
enabled="true"
href="..." />
</result>
kind: Element type filterlabel: Case-insensitive substring matchregion: Restrict to semantic arealimit: Max results (default 10)include_readable: Include text content (default true)<node eid="abc123" kind="link" region="main" group="tbody-28"
x="147.875" y="11.5" w="97.97" h="16.5"
display="inline" zone="top-left">
Element label text
<selector primary='role=link[name="..."]' />
<attrs href="..." />
</node>
primary: Best Playwright selectorx, y, w, h, zonegroup: Logical grouping (for tables, lists)<forms page="page-id">
<form id="form-xxx" intent="search|login|signup|checkout" completion="100%">
<input eid="748" purpose="search">Search Wikipedia</input>
<combobox eid="750" purpose="selection" filled="true">EN</combobox>
<button eid="820" type="submit" primary="true">Search</button>
<next eid="748" reason="Optional field" />
</form>
</forms>
intent: Form purpose (search, login, checkout, etc.)completion: Percentage fillednext: Suggested next field to fill with reason<result type="list_pages" status="success">
<pages count="N">
<page page_id="page-xxx" url="https://..." title="Page Title" />
</pages>
</result>
Use page_id to target specific browser tabs.
The browser persists across conversation sessions — tabs from prior sessions remain open. On a new session, there is no "current" page; actions without page_id may target an arbitrary tab.
When encountering a "no page/session" error or resuming from a prior session:
list_pages to see all open tabs with page_id, URL, and titlepage_id explicitly to all subsequent calls (snapshot, find, click, etc.)Caveats:
list_pages shows the URL at open time. For SPAs, use snapshot with page_id to see actual current state.page_id to target the correct one.list_pages instead of relying on prior turn memory.<error>Field not found in any form: abc123</error>
Common errors:
When this happens:
find or get_form from the latest stateeids<canvas> elements render pixels, not DOM nodes — standard selectors don't work inside them. Use these tools for canvas-based UIs (drawing apps, games, visualizations):
inspect_canvas — the key tool. Pass a canvas eid and it auto-detects the rendering library (Fabric.js, Konva, PixiJS, Phaser, Three.js, EaselJS, or raw canvas), queries the scene graph for objects with positions/sizes/labels, and returns an annotated screenshot with coordinate grid overlay and bounding boxes. Supports configurable grid_spacing (use 10px for precise handle targeting).click with eid + x/y — click at offset relative to canvas top-left (e.g., select a shape)drag with eid + source/target coordinates — drag within canvas (e.g., move objects, scale/rotate handles)screenshot with eid — capture just the canvas to visually verify stateWorkflow: find → get_element (position) → inspect_canvas (discover objects) → click/drag (interact) → re-inspect to verify.
find when snapshot shows <!-- trimmed --><baseline> vs <diff> to know if you have full or partial statepage_id when working across sessions or with multiple tabseids/agent-web-interface https://airbnb.com Walk through the search and booking flow for stays in Tokyo
/agent-web-interface https://apple.com/store Configure an iPhone and add it to the bag, then summarize the steps
/agent-web-interface https://developer.mozilla.org Find the Fetch API docs and note how the search flow behaves
/agent-web-interface https://example.com/login Extract the login form selectors and field purposes