Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.
/plugin marketplace add numman-ali/n-skills/plugin install zai-cli@n-skillsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
package.jsonreferences/scraping.mdscripts/start-relay.tsscripts/start-server.tsserver.shsrc/client.tssrc/index.tssrc/relay.tssrc/snapshot/__tests__/snapshot.test.tssrc/snapshot/browser-script.tssrc/snapshot/index.tssrc/snapshot/inject.tssrc/types.tstsconfig.jsonvitest.config.tsBrowser automation that maintains page state across script executions. Write small, focused scripts to accomplish tasks incrementally. Once you've proven out part of a workflow and there is repeated work to be done, you can write a script to do the repeated work in a single execution.
getAISnapshot() to discover elements and selectSnapshotRef() to interact with themTwo modes available. Ask the user if unclear which to use.
Launches a new Chromium browser for fresh automation sessions.
./skills/dev-browser/server.sh &
Add --headless flag if user requests it. Wait for the Ready message before running scripts.
Connects to user's existing Chrome browser. Use this when:
Important: The core flow is still the same. You create named pages inside of their browser.
Start the relay server:
cd skills/dev-browser && npm i && npm run start-extension &
Wait for Waiting for extension to connect... followed by Extension connected in the console. To know that a client has connected and the browser is ready to be controlled.
Workflow:
client.page("name") just like the normal mode to create new pages / connect to existing ones.If the extension hasn't connected yet, tell the user to launch and activate it. Download link: https://github.com/SawyerHood/dev-browser/releases
Run all scripts from
skills/dev-browser/directory. The@/import alias requires this directory's config.
Execute scripts inline using heredocs:
cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
// Create page with custom viewport size (optional)
const page = await client.page("example", { viewport: { width: 1920, height: 1080 } });
await page.goto("https://example.com");
await waitForPageLoad(page);
console.log({ title: await page.title(), url: page.url() });
await client.disconnect();
EOF
Write to tmp/ files only when the script needs reuse, is complex, or user explicitly requests it.
"checkout", "login", not "main"await client.disconnect() - pages persist on serverpage.evaluate() runs in browser - no TypeScript syntaxFollow this pattern for complex tasks:
Code passed to page.evaluate() runs in the browser, which doesn't understand TypeScript:
// ✅ Correct: plain JavaScript
const text = await page.evaluate(() => {
return document.body.innerText;
});
// ❌ Wrong: TypeScript syntax will fail at runtime
const text = await page.evaluate(() => {
const el: HTMLElement = document.body; // Type annotation breaks in browser!
return el.innerText;
});
For scraping large datasets, intercept and replay network requests rather than scrolling the DOM. See references/scraping.md for the complete guide covering request capture, schema discovery, and paginated API replay.
const client = await connect();
// Get or create named page (viewport only applies to new pages)
const page = await client.page("name");
const pageWithSize = await client.page("name", { viewport: { width: 1920, height: 1080 } });
const pages = await client.list(); // List all page names
await client.close("name"); // Close a page
await client.disconnect(); // Disconnect (pages persist)
// ARIA Snapshot methods
const snapshot = await client.getAISnapshot("name"); // Get accessibility tree
const element = await client.selectSnapshotRef("name", "e5"); // Get element by ref
The page object is a standard Playwright Page.
import { waitForPageLoad } from "@/client.js";
await waitForPageLoad(page); // After navigation
await page.waitForSelector(".results"); // For specific elements
await page.waitForURL("**/success"); // For specific URL
await page.screenshot({ path: "tmp/screenshot.png" });
await page.screenshot({ path: "tmp/full.png", fullPage: true });
Use getAISnapshot() to discover page elements. Returns YAML-formatted accessibility tree:
- banner:
- link "Hacker News" [ref=e1]
- navigation:
- link "new" [ref=e2]
- main:
- list:
- listitem:
- link "Article Title" [ref=e8]
- link "328 comments" [ref=e9]
- contentinfo:
- textbox [ref=e10]
- /placeholder: "Search"
Interpreting refs:
[ref=eN] - Element reference for interaction (visible, clickable elements only)[checked], [disabled], [expanded] - Element states[level=N] - Heading level/url:, /placeholder: - Element propertiesInteracting with refs:
const snapshot = await client.getAISnapshot("hackernews");
console.log(snapshot); // Find the ref you need
const element = await client.selectSnapshotRef("hackernews", "e2");
await element.click();
Page state persists after failures. Debug with:
cd skills/dev-browser && npx tsx <<'EOF'
import { connect } from "@/client.js";
const client = await connect();
const page = await client.page("hackernews");
await page.screenshot({ path: "tmp/debug.png" });
console.log({
url: page.url(),
title: await page.title(),
bodyText: await page.textContent("body").then((t) => t?.slice(0, 200)),
});
await client.disconnect();
EOF
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.