Agent Web Interface

Agent Web Interface is an MCP server that gives AI agents a compact, semantic interface to the browser.

Instead of exposing the full DOM or accessibility tree, it returns structured page snapshots: visible regions, readable content, interactive elements, stable element IDs, form context, screenshots, canvas inspection, and network activity. Agents can then navigate and act on pages using semantic IDs instead of brittle selectors or massive context dumps.

It is built for coding agents, browser agents, QA agents, research agents, and automation workflows that need reliable web interaction without wasting tokens on low-signal browser internals.

Why this exists

Browser automation is easy for scripts and hard for LLM agents.

Traditional browser tools expose raw DOM, full accessibility trees, screenshots, or low-level selectors. That works for deterministic code, but it is inefficient for language models: the model has to spend context and reasoning budget separating useful UI intent from implementation noise.

Agent Web Interface changes the interface boundary.

The browser still runs through Puppeteer and the Chrome DevTools Protocol (CDP), but the agent sees a smaller, more semantic representation of the page:

What regions exist on the page
What the user can read
What the user can interact with
Which elements are visible, enabled, selected, expanded, or required
Which stable element ID (eid) should be used for the next action
What changed after the previous action

The goal is not to mirror the browser. The goal is to expose the page in the shape an agent can reason about.

The core abstraction

Agent Web Interface turns a browser page into an agent-readable snapshot.

A snapshot contains compact semantic information such as:

Page regions: header, navigation, main content, footer
Interactive elements: buttons, links, textboxes, checkboxes, radios, comboboxes
Readable content: headings, paragraphs, alerts, labels
Element state: visible, enabled, checked, selected, expanded, focused
Layout hints: bounding boxes and screen zones
Stable element IDs: eid values that can be reused by action tools
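
As a rough illustration of the list above, a snapshot can be pictured as a small typed structure. This is a minimal sketch, not the server's actual schema: the class names, field names, and the helpers by_eid and actionable are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """One readable or interactive node in a snapshot (hypothetical shape)."""
    eid: str      # stable element ID, e.g. "btn-sign-in"
    role: str     # "button", "link", "textbox", ...
    name: str     # label the user sees
    region: str   # "header", "navigation", "main", "footer"
    state: dict[str, bool] = field(default_factory=dict)
    bbox: tuple[int, int, int, int] | None = None  # x, y, width, height


@dataclass
class Snapshot:
    """Compact view of one page (hypothetical shape)."""
    url: str
    title: str
    elements: list[Element] = field(default_factory=list)

    def by_eid(self, eid: str) -> Element:
        """Look up an element by its stable ID; raises KeyError if absent."""
        for element in self.elements:
            if element.eid == eid:
                return element
        raise KeyError(eid)

    def actionable(self) -> list[Element]:
        """Return elements that are both visible and enabled."""
        return [e for e in self.elements
                if e.state.get("visible") and e.state.get("enabled")]


snapshot = Snapshot(
    url="https://example.com/login",
    title="Sign in",
    elements=[
        Element("btn-sign-in", "button", "Sign in", "main",
                {"visible": True, "enabled": True}),
        Element("btn-reset", "button", "Reset", "main",
                {"visible": True, "enabled": False}),
    ],
)
```

In this sketch an agent would only ever need the eid, role, name, and state of an element to decide on its next action; the disabled "Reset" button is present in the snapshot but filtered out by actionable().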

Agents act on these IDs:

{
  "eid": "btn-sign-in"
}

rather than reasoning from fragile CSS selectors or repeatedly scanning a large DOM tree.

This makes browser use more predictable for agents because observation and action are connected through a stable semantic contract.

What it is

Agent Web Interface is:

An MCP server for browser automation
A semantic observation layer over Puppeteer and CDP
A compact page representation for LLM agents
A stable eid-based action interface
A toolset for navigation, interaction, forms, screenshots, canvas, readability, and network inspection

Agent Web Interface is not:

A replacement for Puppeteer
A general-purpose browser
A visual testing framework
A scraping framework
A CAPTCHA or anti-bot bypass tool

Puppeteer and CDP remain the execution layer. Agent Web Interface changes what the agent sees and how it decides what to do next.

How it works

At a high level:

1. The agent calls a browser tool through MCP.
2. Agent Web Interface controls Chrome through Puppeteer and CDP.
3. The current page is reduced into semantic regions, readable content, and actionable elements.
4. The agent receives a compact snapshot instead of a raw browser dump.
5. The agent acts using stable element IDs.
6. Agent Web Interface waits for the page to stabilize and returns the updated state.

This keeps browser lifecycle, page representation, and action execution separated.
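
The reduction step can be sketched as a filter over raw nodes that keeps visible interactive elements and gives each one a readable ID. This is an illustration under stated assumptions: the README does not describe how eids are derived, so the role prefixes, the slug rule, and the function names below are hypothetical, chosen only to produce IDs shaped like "btn-sign-in".

```python
import re

# Hypothetical role-to-prefix table; not taken from the project.
INTERACTIVE_ROLES = {"button": "btn", "link": "lnk", "textbox": "txt",
                     "checkbox": "chk", "radio": "rad", "combobox": "cmb"}


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def reduce_nodes(raw_nodes: list[dict]) -> list[dict]:
    """Keep visible interactive nodes and assign each a readable eid."""
    elements = []
    seen: dict[str, int] = {}
    for node in raw_nodes:
        prefix = INTERACTIVE_ROLES.get(node.get("role", ""))
        if prefix is None or not node.get("visible", False):
            continue  # drop non-interactive wrappers and hidden nodes
        base = f"{prefix}-{slugify(node.get('name', '')) or 'unnamed'}"
        seen[base] = seen.get(base, 0) + 1
        # Disambiguate repeated labels: btn-delete, btn-delete-2, ...
        eid = base if seen[base] == 1 else f"{base}-{seen[base]}"
        elements.append({"eid": eid, "role": node["role"],
                         "name": node.get("name", "")})
    return elements


raw = [
    {"role": "generic", "name": "", "visible": True},
    {"role": "button", "name": "Sign in", "visible": True},
    {"role": "link", "name": "Forgot password?", "visible": True},
    {"role": "button", "name": "Sign in", "visible": False},
]
reduced = reduce_nodes(raw)
```

Four raw nodes go in and two elements come out, which is the point of the reduction: the agent never sees the wrapper or the hidden duplicate. A real implementation would need a sturdier identity scheme than label slugs, since labels can change between page states.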

AI Agent
   ↓ MCP
Agent Web Interface
   ↓ semantic snapshots + stable eids
Puppeteer / Chrome DevTools Protocol
   ↓
Chrome / Chromium

Example agent loop

A typical browser-agent loop looks like this:

1. The agent calls navigate with a URL.
2. Agent Web Interface returns a compact page snapshot.
3. The agent calls find to locate a semantic element, such as a “Sign in” button or an email field.
4. The agent calls click, type, select, or press using the returned eid.
5. Agent Web Interface waits for the page to stabilize and returns the updated snapshot.
6. The agent continues from the changed page state instead of re-reading the entire DOM.

This gives the model a browser interaction loop based on semantic state transitions rather than raw page internals.
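
The loop can be exercised end to end against an in-memory stand-in. The sketch below is not the real server: FakeBrowserServer, its two hard-coded pages, and the "query" and "text" input fields are assumptions made for illustration. Only the tool names (navigate, find, click, type) and the eid-based addressing come from the description above.

```python
class FakeBrowserServer:
    """In-memory stand-in for the MCP server; models a two-page login flow."""

    PAGES = {
        "login": [
            {"eid": "txt-email", "role": "textbox", "name": "Email", "value": ""},
            {"eid": "btn-sign-in", "role": "button", "name": "Sign in"},
        ],
        "home": [
            {"eid": "lnk-sign-out", "role": "link", "name": "Sign out"},
        ],
    }

    def __init__(self):
        self.page = None
        self.elements = []

    def _load(self, page):
        self.page = page
        self.elements = [dict(e) for e in self.PAGES[page]]

    def _snapshot(self):
        return {"page": self.page, "elements": [dict(e) for e in self.elements]}

    def _element(self, eid):
        for element in self.elements:
            if element["eid"] == eid:
                return element
        raise KeyError(eid)

    def call(self, tool, tool_input):
        """Dispatch one tool call; every action returns the updated snapshot."""
        if tool == "navigate":
            self._load("login")  # the fake ignores the URL and always opens login
        elif tool == "find":
            query = tool_input["query"].lower()
            return {"matches": [dict(e) for e in self.elements
                                if query in e["name"].lower()]}
        elif tool == "type":
            self._element(tool_input["eid"])["value"] = tool_input["text"]
        elif tool == "click":
            if self._element(tool_input["eid"])["eid"] == "btn-sign-in":
                self._load("home")
        else:
            raise ValueError(f"unknown tool: {tool}")
        return self._snapshot()


def sign_in(server, url, email):
    """Observe, locate by meaning, act by eid, then continue from the new state."""
    server.call("navigate", {"url": url})
    email_eid = server.call("find", {"query": "email"})["matches"][0]["eid"]
    server.call("type", {"eid": email_eid, "text": email})
    button_eid = server.call("find", {"query": "sign in"})["matches"][0]["eid"]
    return server.call("click", {"eid": button_eid})


final = sign_in(FakeBrowserServer(), "https://example.com/login",
                "client@example.com")
```

The agent-side function never touches a selector or a DOM node. It only passes eids returned by find back into action tools, and it reads the result of each action from the snapshot that the call returns.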

Example user journey

The following example shows how an agent might use Agent Web Interface inside a dashboard-style web app.

Task:

Find the failed payment from client@example.com, open it, add an internal note saying “Customer contacted. Waiting for bank confirmation.”, and confirm the note was saved.

1. Agent navigates to the payments dashboard

Tool call:

{
  "tool": "navigate",
  "input": {
    "url": "https://dashboard.example.com/payments"
  }
}
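
The tool call above is shown in a simplified form. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method tools/call, carrying the tool name and its arguments in params. The helper below is a small sketch of that wrapping; the function name to_mcp_request and the request ID are illustrative, and in practice an MCP client library builds this envelope for you.

```python
import json


def to_mcp_request(call: dict, request_id: int) -> dict:
    """Wrap a {"tool", "input"} pair in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": call["tool"], "arguments": call["input"]},
    }


navigate_call = {
    "tool": "navigate",
    "input": {"url": "https://dashboard.example.com/payments"},
}
request = to_mcp_request(navigate_call, request_id=1)
print(json.dumps(request, indent=2))
```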

Tool response:
