Agent Web Interface
Agent Web Interface is an MCP server that gives AI agents a compact, semantic interface to the browser.
Instead of exposing the full DOM or accessibility tree, it returns structured page snapshots: visible regions, readable content, interactive elements, stable element IDs, form context, screenshots, canvas inspection, and network activity. Agents can then navigate and act on pages using semantic IDs instead of brittle selectors or massive context dumps.
It is built for coding agents, browser agents, QA agents, research agents, and automation workflows that need reliable web interaction without wasting tokens on low-signal browser internals.
Why this exists
Browser automation is easy for scripts and hard for LLM agents.
Traditional browser tools expose raw DOM, full accessibility trees, screenshots, or low-level selectors. That works for deterministic code, but it is inefficient for language models: the model has to spend context and reasoning budget separating useful UI intent from implementation noise.
Agent Web Interface changes the interface boundary.
The browser still runs through Puppeteer and the Chrome DevTools Protocol (CDP), but the agent sees a smaller, more semantic representation of the page:
- What regions exist on the page
- What the user can read
- What the user can interact with
- Which elements are visible, enabled, selected, expanded, or required
- Which stable eid should be used for the next action
- What changed after the previous action
The goal is not to mirror the browser. The goal is to expose the page in the shape an agent can reason about.
The core abstraction
Agent Web Interface turns a browser page into an agent-readable snapshot.
A snapshot contains compact semantic information such as:
- Page regions: header, navigation, main content, footer
- Interactive elements: buttons, links, textboxes, checkboxes, radios, comboboxes
- Readable content: headings, paragraphs, alerts, labels
- Element state: visible, enabled, checked, selected, expanded, focused
- Layout hints: bounding boxes and screen zones
- Stable element IDs: eid values that can be reused by action tools
Agents act on these IDs:
{
  "eid": "btn-sign-in"
}
rather than reasoning from fragile CSS selectors or repeatedly scanning a large DOM tree.
This makes browser use more predictable for agents because observation and action are connected through a stable semantic contract.
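The snapshot contents described above can be pictured as a small typed structure. The following is a minimal sketch only: the type names, field names, and the `actionable` helper are hypothetical and chosen for illustration, and the real snapshot schema may differ. Only the `eid` field and the `btn-sign-in` value come from the text above.

```typescript
// Hypothetical shapes for illustration; not the server's actual schema.
type ElementKind = "button" | "link" | "textbox" | "checkbox" | "radio" | "combobox";
type Region = "header" | "navigation" | "main" | "footer";

interface ElementState {
  visible: boolean;
  enabled: boolean;
  checked?: boolean;
  selected?: boolean;
  expanded?: boolean;
  focused?: boolean;
}

interface SnapshotElement {
  eid: string; // stable ID reused by action tools
  kind: ElementKind;
  label: string;
  region: Region;
  state: ElementState;
}

interface PageSnapshot {
  url: string;
  elements: SnapshotElement[];
}

// Keep only elements an agent could act on right now.
function actionable(snapshot: PageSnapshot): SnapshotElement[] {
  return snapshot.elements.filter((el) => el.state.visible && el.state.enabled);
}

const example: PageSnapshot = {
  url: "https://example.com/login",
  elements: [
    { eid: "btn-sign-in", kind: "button", label: "Sign in", region: "main", state: { visible: true, enabled: true } },
    { eid: "btn-debug", kind: "button", label: "Debug", region: "footer", state: { visible: false, enabled: true } },
  ],
};

// Each actionable element maps directly to an action input of the form { eid }.
const actionInputs = actionable(example).map((el) => ({ eid: el.eid }));
```

Under these assumptions, an agent never needs a selector: the observation already carries the identifier that the next action expects.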
What it is
Agent Web Interface is:
- An MCP server for browser automation
- A semantic observation layer over Puppeteer and CDP
- A compact page representation for LLM agents
- A stable eid-based action interface
- A toolset for navigation, interaction, forms, screenshots, canvas, readability, and network inspection
Agent Web Interface is not:
- A replacement for Puppeteer
- A general-purpose browser
- A visual testing framework
- A scraping framework
- A CAPTCHA or anti-bot bypass tool
Puppeteer and CDP remain the execution layer. Agent Web Interface changes what the agent sees and how it decides what to do next.
How it works
At a high level:
- The agent calls a browser tool through MCP.
- Agent Web Interface controls Chrome through Puppeteer and CDP.
- The current page is reduced into semantic regions, readable content, and actionable elements.
- The agent receives a compact snapshot instead of a raw browser dump.
- The agent acts using stable element IDs.
- Agent Web Interface waits for the page to stabilize and returns the updated state.
This keeps browser lifecycle, page representation, and action execution separated.
AI Agent
↓ MCP
Agent Web Interface
↓ semantic snapshots + stable eids
Puppeteer / Chrome DevTools Protocol
↓
Chrome / Chromium
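To make the reduction step more concrete, here is a rough sketch of how raw page nodes might be filtered into actionable elements with reusable IDs. Everything in it is an assumption made for illustration: the `RawNode` shape, the role prefixes, and the slug-based ID scheme are hypothetical, and the way Agent Web Interface actually derives eid values is not described here.

```typescript
// Hypothetical input: a simplified node as a browser layer might report it.
interface RawNode {
  role: string;
  name: string;
  visible: boolean;
}

interface CompactElement {
  eid: string;
  role: string;
  name: string;
}

// Assumed prefixes for interactive roles; other roles are treated as noise.
const ROLE_PREFIX: Record<string, string> = { button: "btn", link: "lnk", textbox: "txt" };

// Lowercase the accessible name and collapse non-alphanumerics into hyphens.
function slug(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Drop hidden and non-interactive nodes, then assign deterministic IDs.
function reduceNodes(nodes: RawNode[]): CompactElement[] {
  const seen = new Map<string, number>();
  const out: CompactElement[] = [];
  for (const node of nodes) {
    const prefix = ROLE_PREFIX[node.role];
    if (!prefix || !node.visible) continue;
    const base = `${prefix}-${slug(node.name)}`;
    const count = (seen.get(base) ?? 0) + 1;
    seen.set(base, count);
    // Disambiguate repeated labels with a numeric suffix.
    out.push({ eid: count === 1 ? base : `${base}-${count}`, role: node.role, name: node.name });
  }
  return out;
}

const reduced = reduceNodes([
  { role: "generic", name: "wrapper", visible: true },
  { role: "button", name: "Sign in", visible: true },
  { role: "button", name: "Sign in", visible: true },
  { role: "link", name: "Help", visible: false },
]);
```

The point of the sketch is the shape of the pipeline: many raw nodes go in, and a short list of labeled, addressable elements comes out. IDs derived from role and label stay the same across re-renders as long as the label does, which is one plausible way to get stability.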
Example agent loop
A typical browser-agent loop looks like this:
- The agent calls navigate with a URL.
- Agent Web Interface returns a compact page snapshot.
- The agent calls find to locate a semantic element, such as a “Sign in” button or an email field.
- The agent calls click, type, select, or press using the returned eid.
- Agent Web Interface waits for the page to stabilize and returns the updated snapshot.
- The agent continues from the changed page state instead of re-reading the entire DOM.
This gives the model a browser interaction loop based on semantic state transitions rather than raw page internals.
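The loop above can be sketched against a stand-in for the server. This is a hedged illustration, not the real client API: the tool names navigate, find, and click come from the steps above, but the `CallTool` signature, the `query` parameter, the response shape, and the in-memory fake server are all hypothetical. A real MCP client would also be asynchronous; the sketch is synchronous to keep it short.

```typescript
// Hypothetical response and input shapes; the real tool contracts may differ.
interface Snapshot {
  url: string;
  elements: { eid: string; label: string }[];
}
interface ToolInput {
  url?: string;
  query?: string; // assumed parameter name for find
  eid?: string;
}
type CallTool = (tool: string, input: ToolInput) => Snapshot;

// In-memory stand-in for the server so the loop runs without a browser.
function makeFakeServer(): { callTool: CallTool; log: string[] } {
  const home: Snapshot = { url: "https://example.com/", elements: [{ eid: "btn-sign-in", label: "Sign in" }] };
  const login: Snapshot = { url: "https://example.com/login", elements: [{ eid: "txt-email", label: "Email" }] };
  let current: Snapshot = { url: "about:blank", elements: [] };
  const log: string[] = [];
  const callTool: CallTool = (tool, input) => {
    log.push(tool);
    if (tool === "navigate" && input.url === home.url) current = home;
    if (tool === "find") {
      // Return only the elements whose label matches the query.
      return { url: current.url, elements: current.elements.filter((e) => e.label === input.query) };
    }
    if (tool === "click" && input.eid === "btn-sign-in") current = login;
    return current; // updated state after the action
  };
  return { callTool, log };
}

// The agent side: observe, locate by meaning, act by eid, continue from the result.
function signInFlow(callTool: CallTool): Snapshot {
  callTool("navigate", { url: "https://example.com/" });
  const target = callTool("find", { query: "Sign in" }).elements[0];
  if (!target) throw new Error("Sign in control not found");
  return callTool("click", { eid: target.eid });
}

const server = makeFakeServer();
const afterClick = signInFlow(server.callTool);
```

Note that the agent code never inspects page internals. It carries an eid from one response into the next request and then reasons from the snapshot that the action returns.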
Example user journey
The following example shows how an agent might use Agent Web Interface inside a dashboard-style web app.
Task:
Find the failed payment from client@example.com, open it, add an internal note saying “Customer contacted. Waiting for bank confirmation.”, and confirm the note was saved.
1. Agent navigates to the payments dashboard
Tool call:
{
  "tool": "navigate",
  "input": {
    "url": "https://dashboard.example.com/payments"
  }
}
Tool response: