From web-bench
Methodology for executing a single WebBench benchmark task via browser automation. Triggers: "execute task", "run task", "perform benchmark task", "browser task". Interprets the natural-language task description, defines the required browser actions, and specifies what final evidence to capture (for example screenshot + snapshot). Records an execution trace with actions taken and errors encountered. Does NOT evaluate success — use evaluate-task for that.
npx claudepluginhub lespaceman/athena-workflow-marketplace --plugin web-benchThis skill is limited to using the following tools:
Execute a single WebBench task using a browser-capable calling context. This skill does not own browser MCP tools; it defines the execution protocol the caller should follow while navigating, clicking, typing, and extracting data.
Automates browser tasks like form filling, data extraction, and multi-step web workflows using Yutori's Navigator agent. Useful for interacting with websites via clicks, typing, or navigation.
Automates Playwright browser tasks: navigates URLs, captures screenshots and accessibility snapshots, interacts with UI elements via click/type/fill, reports findings for E2E testing, deployments, and audits.
Provides cloud browser automation via Browser Use API for AI-driven web scraping, form filling, multi-step tasks, and screenshots when local browser unavailable.
Share bugs, ideas, or general feedback.
Execute a single WebBench task using a browser-capable calling context. This skill does not own browser MCP tools; it defines the execution protocol the caller should follow while navigating, clicking, typing, and extracting data.
You receive a single task object:
{"id": 42, "url": "https://acehardware.com", "category": "READ", "task": "Navigate to the news section and summarize..."}
Before any browser interaction by the calling context:
date +%s%3N
Save this as start_time_ms. You will need it for the result record.
navigate → task.url
The calling context should wait for the page to load and take an initial snapshot to understand the page structure.
Read the task description carefully. Classify it:
| Category | What to Do |
|---|---|
| READ | Navigate to the right page/section, extract the requested information. Your "result" is the extracted data. |
| CREATE | Fill forms, create accounts/entries/posts as described. Your "result" is confirmation the item was created. |
| UPDATE | Find existing data and modify it as described. Your "result" is confirmation the update was applied. |
| DELETE | Find and remove the specified item. Your "result" is confirmation of deletion. |
| FILE_MANIPULATION | Download the specified file. Your "result" is the filename and confirmation of download. |
Work through the task step by step:
snapshot or find to understand the current page stateclick, type, select, etc.)Key principles:
find with kind filters to locate interactive elements (buttons, links, textboxes)snapshot to get a full page state when you need orientationscreenshot to visually verify state when snapshots are ambiguous| Obstacle | Strategy |
|---|---|
| Cookie consent banner | Find and click "Accept" or "Close" |
| Login required | Record as blocker — do not attempt to create accounts or guess credentials |
| CAPTCHA | Record as blocker — cannot solve programmatically |
| Paywall | Record as blocker |
| Geo-restricted content | Record as blocker |
| Site down / 404 | Record as error |
| Pop-up / overlay blocking interaction | Dismiss it, then continue |
After completing the task (or hitting a blocker):
date +%s%3N
Save as end_time_ms. Compute duration_ms = end_time_ms - start_time_ms.
Construct a structured trace of what happened:
{
"task_id": 42,
"actions": [
{"step": 1, "action": "navigate", "target": "https://acehardware.com", "result": "loaded"},
{"step": 2, "action": "click", "target": "News section link", "result": "navigated to /news"},
{"step": 3, "action": "extract", "target": "headline text", "result": "Black Friday Deals..."}
],
"final_url": "https://acehardware.com/news",
"blockers": [],
"extracted_data": "The headline is 'Black Friday Deals'...",
"duration_ms": 34200
}
Return the execution trace to the calling context (system prompt / run-benchmark skill). Do NOT write results to disk — that is the system prompt's responsibility after evaluation.