Documents the Canary sandbox scripting API for browser automation: opening pages, clicking, filling forms, extracting text, taking screenshots, and persisting data between steps.
`npx claudepluginhub wizenheimer/canary --plugin canary`
Slash command: `/canary:canary-scripting`
Canary scripts are plain **async JavaScript** run in a QuickJS sandbox with a Playwright-like API.
Both `canary-browser run` (one-off) and `canary run --session` (recorded step) execute scripts the same way: top-level `await` is allowed, and `browser`, `console`, and the file helpers are available as globals.
Covers the page methods `goto`, `locator`, `evaluate`, `waitForSelector`, and `screenshot`.

- User says: "navigate to a site and get the title in canary" or "how do I read text off the page?"
  - Use a named page so it persists across steps, then `goto` and `evaluate`/`locator`. See Quick start.
- User says: "click the login button", "fill the search box", "scrape the headlines"
  - `page.locator(selector)` then `.click()` / `.fill(value)` / `.textContent()`; or `page.evaluate(fn)` to pull structured data in one round-trip.
- User says: "take a screenshot" or "what's the saveScreenshot signature?"
  - `const buf = await page.screenshot({ fullPage: true }); await saveScreenshot(buf, "home.png");` Note that the buffer comes first, and that `saveScreenshot` is a top-level global, not `browser.saveScreenshot`.
- User says: "I don't know the selectors", "what's on this page?", "explore before acting"
  - `(await page.snapshotForAI()).full` returns an aria outline of the page. Read it to pick a role/text selector, then act. See Observing the page.
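A sketch of the "fill the search box" case as an inlined helper (scripts cannot `import`/`require`, so helpers live in the script itself). It chains `locator().fill()`, `locator().click()`, and `waitForSelector`, then reads every result title with a single `evaluate` call. The function name `searchSite` and the selectors `input[name="q"]`, `button[type="submit"]`, `#results`, and `#results h3` are hypothetical placeholders; take real ones from the page snapshot. It also assumes `page.waitForSelector(selector)` resolves once the element appears, as it does in Playwright.

```js
// Hypothetical helper: fill a search form, submit it, and collect result titles.
// All selectors are placeholders; replace them with ones read from the snapshot.
async function searchSite(page, query) {
  await page.locator('input[name="q"]').fill(query); // type into the search box
  await page.locator('button[type="submit"]').click(); // submit the form
  await page.waitForSelector("#results"); // ASSUMPTION: Playwright-like wait for the element
  // One evaluate call returns all titles at once instead of one round-trip per element.
  return page.evaluate(() =>
    [...document.querySelectorAll("#results h3")].map((el) => el.textContent.trim())
  );
}

// Usage inside a Canary script (top-level await is allowed):
// const page = await browser.getPage("main");
// await page.goto("https://example.com/search", { waitUntil: "domcontentloaded" });
// console.log(JSON.stringify(await searchSite(page, "canary")));
```

Passing `page` in as a parameter, rather than fetching it inside the helper, keeps the helper reusable across named pages and easy to exercise with a stub object outside the sandbox.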
## Quick start

```js
const page = await browser.getPage("main"); // named, persistent page
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
console.log(await page.title());

const headings = await page.evaluate(() =>
  [...document.querySelectorAll("h1, h2")].map((h) => h.textContent.trim())
);
console.log(JSON.stringify(headings));

await page.locator("a.more").click();
const buf = await page.screenshot({ fullPage: false });
await saveScreenshot(buf, "page.png"); // saveScreenshot(buffer, name)
```
## Observing the page

```js
const page = await browser.getPage("main");
const snap = await page.snapshotForAI(); // { full, incremental? }
console.log(page.url(), await page.title());
console.log(snap.full); // aria outline: pick a role/text selector from this
// then act: await page.getByRole("button", { name: "Continue" }).click();
// after changes, page.snapshotForAI({ track: "main" }) returns just the incremental diff
```
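Because `snap.full` is plain text, a script can also scan it instead of only logging it. The helper below is a sketch that lists every node carrying a `[ref=eN]` marker as `{ role, name, ref }`. It assumes each outline line looks like `- button "Continue" [ref=e12]` (Playwright's aria-snapshot style); Canary's exact format may differ, so check a real `snap.full` and adjust the regular expression. `listRefs` is a hypothetical name.

```js
// Hypothetical helper: list outline nodes that carry a [ref=eN] marker.
// ASSUMPTION: lines look like `- button "Continue" [ref=e12]` (Playwright aria-snapshot style).
function listRefs(outline) {
  // role, optional quoted accessible name, then anything up to the [ref=eN] marker
  const lineRe = /^\s*-\s+([a-z]+)(?:\s+"((?:[^"\\]|\\.)*)")?.*?\[ref=(e\d+)\]/;
  const nodes = [];
  for (const line of outline.split("\n")) {
    const m = line.match(lineRe);
    if (m) nodes.push({ role: m[1], name: m[2] ?? "", ref: m[3] });
  }
  return nodes;
}

// Usage inside a Canary script:
// const page = await browser.getPage("main");
// const { full } = await page.snapshotForAI();
// const buttons = listRefs(full).filter((n) => n.role === "button");
// console.log(JSON.stringify(buttons));
// if (buttons.length) await page.getByRole("button", { name: buttons[0].name }).click();
```

Acting through `getByRole` with the extracted role and name, rather than `aria-ref=eN`, keeps the selector usable after navigations and in later steps, where refs go stale.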
`page.snapshotForAI()` returns `{ full, incremental? }`. `full` is an aria outline of the page: roles, accessible names, and `[ref=eN]` markers on actionable nodes. Read it to pick a semantic selector, such as `page.getByRole("button", { name: "Continue" })` or `page.getByText("Sign in")`, then act.

Options `{ track?, depth?, timeout? }`:

- `track`: re-run `page.snapshotForAI({ track: "main" })` after the page changes to get just the incremental diff.
- `depth`: `{ depth: N }` caps the tree on huge pages.
- `timeout`: bounds the walk.

`page.locator("aria-ref=e12")` works for an immediate action in the same script only; refs go stale across steps and after navigations. Prefer re-deriving a semantic selector.

Workflow: read `(await page.snapshotForAI()).full` to see what is there, pick a semantic selector from it (`getByRole`, `getByText`), then interact. Never guess selectors blind.

Every script gets these globals:
- `browser`: pre-connected browser handle (see the script API)
- `console`: `log` / `info` / `warn` / `error`, captured per run
- `setTimeout` / `clearTimeout`: basic timers
- `saveScreenshot(buffer, name)`: saves a screenshot buffer (async; `await` it)
- `writeFile(name, data)` / `readFile(name)`: small-file persistence (async; `await` them)

Named pages from `browser.getPage(name)` persist across steps; `newPage()` tabs are closed when each script ends.

To pass data between steps, call `await writeFile("state.json", JSON.stringify(x))` in one step and `JSON.parse(await readFile("state.json"))` in the next.

Scripts cannot use `import`/`require`; inline any helpers.

For the complete API (every page/locator/browser method, signatures, the per-step screenshot rule, and sandbox limits), see `references/REFERENCE.md`.
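The `writeFile`/`readFile` pattern can be wrapped in two small helpers so that a step running before any state exists does not crash. This is a sketch: `saveState` and `loadState` are hypothetical names, and because this page does not say what `readFile` does for a file that does not exist yet, `loadState` falls back to a default whether it throws or returns an empty value.

```js
// Hypothetical helpers around the writeFile/readFile globals for passing state between steps.
async function saveState(name, value) {
  await writeFile(name, JSON.stringify(value)); // writeFile is async: await it
}

// ASSUMPTION: readFile may throw or return an empty value when the file is missing,
// so both cases fall back to the provided default.
async function loadState(name, fallback) {
  let text;
  try {
    text = await readFile(name); // readFile is async too
  } catch {
    return fallback; // treat a missing or unreadable file as "no state yet"
  }
  if (text == null || text === "") return fallback;
  return JSON.parse(text);
}

// Step 1:  await saveState("state.json", { visited: [page.url()] });
// Step 2:  const state = await loadState("state.json", { visited: [] });
```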