Skill

abp-browser

ALWAYS use this skill when navigating websites, browsing the web, or interacting with web pages. Provides tool usage guide, best practices, and debugging tips for the ABP browser MCP tools.

From agent-browser-protocol

Install

Run in your terminal

npx claudepluginhub theredsix/abp-npm --plugin agent-browser-protocol

Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

Similar Skills

dispatching-parallel-agents

Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.

superpowers

137.7k

executing-plans

Executes pre-written implementation plans: critically reviews, follows bite-sized steps exactly, runs verifications, tracks progress with checkpoints, uses git worktrees, stops on blockers.

superpowers

137.7k

brainstorming

7 files

Guides idea refinement into designs: explores context, asks questions one-by-one, proposes approaches, presents sections for approval, writes/review specs before coding.

superpowers

137.7k

Stats

Parent Repo Stars0

Parent Repo Forks0

Last CommitFeb 24, 2026

Actions

View Source View Plugin View on GitHub View README

ABP Browser Control Guide

How ABP Works

ABP pauses JavaScript and virtual time between your actions. The page is frozen until your next tool call.

When you call any action tool (browser_action, browser_scroll, browser_navigate, etc.):

ABP resumes JS execution
ABP dispatches your action(s)
ABP waits ~500ms for the page to settle (rendering, network, scripts)
ABP captures a screenshot automatically
ABP re-pauses JS execution
You receive the response with the screenshot

One tool call = one complete turn. Screenshots are included automatically with every action response. There is no need to take a separate screenshot after performing an action.

Batching Actions

browser_action accepts 1-3 actions per call. Batch common workflows to reduce round-trips:

Click, type, submit: [{mouse_click, x, y}, {keyboard_type, text}, {keyboard_press, key: ENTER}]
Click and type: [{mouse_click, x, y}, {keyboard_type, text}]
Single click: [{mouse_click, x, y}]

Actions execute sequentially with a 20ms pause between each. One screenshot is taken after all actions complete.

Do NOT batch scrolling — use browser_scroll separately.

Waiting for Slow Content

Sometimes 500ms isn't enough for the page to finish loading (AJAX, animations, redirects). When the screenshot shows incomplete content:

Call browser_screenshot to wait and observe. It runs the same resume-wait-capture-pause cycle without performing any action, giving the page another chance to settle. Repeat until the content appears.

Markup Overlays

Pass markup to browser_screenshot to see visual overlays on the page. Available overlays:

clickable — Green outlines around clickable elements (links, buttons, onclick handlers)
typeable — Orange outlines around text inputs and textareas
scrollable — Purple dashed outlines around scrollable containers
selected — Blue outline around the currently focused element
grid — Red coordinate grid at 100px intervals with viewport coordinate labels

Use grid to map viewport coordinates for targeting clicks. Use clickable and typeable to identify interactive elements visually.

Tool Reference (15 tools)

All tab_id parameters are optional and default to the active tab.

Input:

browser_action — 1-3 actions: mouse_click (x, y), keyboard_type (text), keyboard_press (key, modifiers?), mouse_hover (x, y), mouse_drag (start_x, start_y, end_x, end_y). Keys are ALL-CAPS (ENTER, TAB, ESCAPE, CONTROL, META, etc.). Abbreviations accepted: CTRL, CMD, ESC, DEL.
browser_scroll — x, y (where wheel fires), delta_x?, delta_y? (positive=down/right)
browser_slider — orientation (horizontal/vertical), track bounds, current position, min, max, target_value. Calculates and executes drag automatically.

Navigation:

browser_navigate — url? OR action? (back, forward, reload)
browser_tabs — action? (list, new, close, info, activate, stop; default: list), tab_id?, url?

Observation:

browser_screenshot — markup?, disable_markup?, format?
browser_javascript — expression (required). Data extraction and DOM inspection ONLY — do NOT use for interaction; prefer browser_action.
browser_text — selector?

Situational:

browser_dialog — action? (check, accept, dismiss; default: check), prompt_text?
browser_downloads — action? (list, status, cancel, content; default: list), download_id?, state?, limit?, max_size?
browser_files — chooser_id (required), files?, content_files?, path?, cancel?
browser_select_picker — Respond to pending select/dropdown popups

Browser:

browser_get_status — no params
browser_shutdown — timeout_ms?

Debugging

Use browser_get_status to check if the browser is ready and connected.

Session data (history database, screenshots) is stored in a session directory. To find the paths, query the REST API:

curl http://localhost:8222/api/v1/browser/session-data

The session directory contains:

<session_dir>/
├── history.db           # SQLite database with sessions, actions, events
└── screenshots/         # Auto-saved before/after WebP screenshots per action

Query the database:

-- Recent actions
SELECT id, type, status, url, error FROM actions ORDER BY id DESC LIMIT 10;
-- Events for an action
SELECT * FROM events WHERE action_id = <id>;
-- Screenshot paths
SELECT screenshot_before_path, screenshot_after_path FROM actions WHERE id = <id>;

Tips

browser_javascript uses expression as its parameter name (not script)
browser_scroll requires x, y coordinates where the mouse wheel fires — target the element center
Scroll direction: delta_y positive = scroll down, negative = scroll up
JS is paused between actions — timers and animations don't advance until your next tool call
Key names are ALL-CAPS: ENTER, TAB, ESCAPE, BACKSPACE, ARROWUP, ARROWDOWN, etc.
Modifier keys for keyboard_press: SHIFT, CONTROL, ALT, META (or abbreviations CTRL, CMD, OPT)