Controls macOS applications with Pi agents using semantic Accessibility API targets and optional screenshots for GUI automation including clicks, typing, scrolling, and window management.
npx claudepluginhub joshuarweaver/cascade-ai-ml-agents-misc-1 --plugin aradotso-trending-skills-37
This skill uses the workspace's default tool permissions.
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
pi-computer-use gives Pi agents a semantic computer-use surface for visible macOS windows. It prefers Accessibility (AX) targets (like @e1) over raw coordinates, returns semantic state after every action, and attaches screenshots only when AX coverage is too weak.
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Pin to a specific version:
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.1
npm install @injaneity/pi-computer-use
# or pin a version
npm install @injaneity/pi-computer-use@0.2.1
pi remove git:github.com/injaneity/pi-computer-use#v0.2.1
npm remove @injaneity/pi-computer-use
On the first session, macOS prompts for permissions for the native helper:
~/.pi/agent/helpers/pi-computer-use/bridge
Grant both Accessibility and Screen Recording.
Three components:
- `extensions/computer-use.ts`: registers public tools and the `/computer-use` command
- `src/bridge.ts`: manages window state, AX refs, fallback policy, batching, execution metadata
- `native/macos/bridge.swift`: talks to macOS Accessibility, ScreenCaptureKit, AppKit, CoreGraphics

| Tool | Purpose |
|---|---|
| `list_apps` | List running apps |
| `list_windows` | List windows for an app |
| `screenshot` | Capture window + return AX state |
| `click` | Click element or coordinate |
| `double_click` | Double-click element or coordinate |
| `move_mouse` | Move cursor |
| `drag` | Drag from point to point |
| `scroll` | Scroll element or coordinate |
| `keypress` | Press key combination |
| `type_text` | Type raw text |
| `set_text` | Replace element value via AX |
| `wait` | Pause execution |
| `arrange_window` | Position/resize window |
| `computer_actions` | Batch multiple actions |

Always start a session with screenshot to select the controlled window and obtain AX refs:
// 1. Discover apps and windows if target is ambiguous
list_apps()
list_windows({ app: "Safari" })
// 2. Select the window and get AX state
screenshot({ window: "@w1" })
// 3. Act on AX refs returned from screenshot
click({ window: "@w1", ref: "@e1" })
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })
AX refs like @e1, @e2 are returned by screenshot and carry capability metadata:
- `canSetValue`: supports `set_text`
- `canPress`: supports `click`
- `canFocus`: can receive focus
- `canScroll`: supports `scroll`
- `adjust`: supports value adjustment

// Click by AX ref; no coordinates needed
click({ ref: "@e1" })
// Scroll a specific element
scroll({ ref: "@e3", scrollY: 600 })
// Replace text field value atomically
set_text({ ref: "@e2", text: "hello world" })
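
The capability flags can drive which tool an agent reaches for. The sketch below is one way to model that choice. The `AxRef` interface, the `TextPlan` type, and the `planTextEntry` helper are hypothetical names introduced for illustration; only the flag names come from the list above, and the real shape of the AX state returned by `screenshot` may differ.

```typescript
// Hypothetical shape of an AX ref; only the capability flag names come from the text above.
interface AxRef {
  id: string; // e.g. "@e2"
  canSetValue?: boolean;
  canPress?: boolean;
  canFocus?: boolean;
  canScroll?: boolean;
  adjust?: boolean;
}

// Hypothetical plan type describing how text would be entered.
type TextPlan =
  | { tool: "set_text"; ref: string; text: string }
  | { tool: "click_then_type"; ref: string; text: string }
  | { tool: "needs_coordinates"; ref: string };

// Prefer the AX value path, then focus plus typing, and fall back to
// coordinates only when none of those capabilities is present.
function planTextEntry(ref: AxRef, text: string): TextPlan {
  if (ref.canSetValue) return { tool: "set_text", ref: ref.id, text };
  if (ref.canFocus || ref.canPress) return { tool: "click_then_type", ref: ref.id, text };
  return { tool: "needs_coordinates", ref: ref.id };
}

const textPlan = planTextEntry({ id: "@e2", canSetValue: true }, "hello world");
```

Under these assumptions, a ref with `canSetValue` maps to a single `set_text` call, which matches the preference for semantic targets over raw input.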
Use coordinates only when no suitable AX target exists. Always include stateId from the latest screenshot to guard against stale state:
click({ x: 320, y: 180, stateId: "abc123" })
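
To illustrate why `stateId` matters, here is a minimal model of stale-action detection. It is not the bridge's actual code: `StateTracker`, `nextState`, and `accepts` are hypothetical names, and the id format is invented for the example.

```typescript
// Hypothetical model of stale-action detection; not the bridge's actual code.
interface CoordinateClick {
  x: number;
  y: number;
  stateId: string;
}

class StateTracker {
  private current = "";
  private counter = 0;

  // Called whenever a new screenshot or semantic state is produced.
  nextState(): string {
    this.counter += 1;
    this.current = `s${this.counter}`;
    return this.current;
  }

  // Accept an action only if it was planned against the latest state.
  accepts(action: CoordinateClick): boolean {
    return action.stateId === this.current;
  }
}

const tracker = new StateTracker();
const firstState = tracker.nextState();
const plannedClick: CoordinateClick = { x: 320, y: 180, stateId: firstState };
const acceptedWhileFresh = tracker.accepts(plannedClick);
tracker.nextState(); // the window changed, so a new state was issued
const acceptedWhenStale = tracker.accepts(plannedClick);
```

In this model the same click is accepted while its state is current and rejected once a newer state exists, which is the situation a fresh `screenshot` resolves.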
Use computer_actions to batch obvious sequential steps. One semantic state update is returned after all actions:
computer_actions({
stateId: "abc123",
actions: [
{ type: "click", ref: "@e1" },
{ type: "set_text", ref: "@e2", text: "https://example.com" },
{ type: "keypress", keys: ["Enter"] }
]
})
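
A batch is easier to get right when it is checked before it is sent. The following sketch assumes the action shapes shown in the example above and validates refs against the set returned by the latest `screenshot`. `BatchAction`, `BatchRequest`, and `buildBatch` are hypothetical names, not part of the tool's API.

```typescript
// Hypothetical types mirroring the action shapes used in the batch example above.
type BatchAction =
  | { type: "click"; ref: string }
  | { type: "set_text"; ref: string; text: string }
  | { type: "keypress"; keys: string[] };

interface BatchRequest {
  stateId: string;
  actions: BatchAction[];
}

// Build a batch, rejecting refs that were not present in the latest AX state.
function buildBatch(
  stateId: string,
  knownRefs: Set<string>,
  actions: BatchAction[],
): BatchRequest {
  for (const action of actions) {
    if ("ref" in action && !knownRefs.has(action.ref)) {
      throw new Error(`Unknown ref ${action.ref}; take a fresh screenshot first`);
    }
  }
  return { stateId, actions };
}

const batch = buildBatch("abc123", new Set(["@e1", "@e2"]), [
  { type: "click", ref: "@e1" },
  { type: "set_text", ref: "@e2", text: "https://example.com" },
  { type: "keypress", keys: ["Enter"] },
]);
```

The resulting object has the same shape as the `computer_actions` argument above, so it could be passed through unchanged if the assumed types hold.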
Each action in the result includes execution metadata:
- `stealth`: background-safe AX path (no focus takeover)
- `default`: required focus or raw event fallback

// List windows for a specific app
list_windows({ app: "Finder" })
// Target a specific window in all subsequent calls
screenshot({ window: "@w2" })
// Arrange window by preset
arrange_window({ window: "@w1", preset: "left-half" })
// Arrange window with explicit frame
arrange_window({ window: "@w1", frame: { x: 0, y: 0, width: 1280, height: 800 } })
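
When a preset does not fit, an explicit `frame` can be computed from the display size. The sketch below assumes that `"left-half"` means the left half of the usable display area; this document does not specify how the real preset is computed, and `Frame` and `leftHalf` are hypothetical names.

```typescript
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumption: "left-half" means the left half of the usable display area.
// How the real preset is computed is not specified in this document.
function leftHalf(display: Frame): Frame {
  return {
    x: display.x,
    y: display.y,
    width: Math.floor(display.width / 2),
    height: display.height,
  };
}

const halfFrame = leftHalf({ x: 0, y: 0, width: 2560, height: 1440 });
// halfFrame is { x: 0, y: 0, width: 1280, height: 1440 }
```

The returned object has the same fields as the `frame` argument in the explicit `arrange_window` call above.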
Control when screenshots are attached with the image option:
screenshot({ window: "@w1", image: "auto" }) // default: attach when AX coverage is weak
screenshot({ window: "@w1", image: "always" }) // always attach
screenshot({ window: "@w1", image: "never" }) // never attach, AX state only
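
The three modes can be read as a small decision rule. The sketch below assumes AX coverage is expressed as a score between 0 and 1 and compared with a threshold; both the score and the `0.5` default are hypothetical, since this document does not say how coverage is measured.

```typescript
type ImageMode = "auto" | "always" | "never";

// Hypothetical coverage score in [0, 1] and threshold; the document does not
// say how the tool measures AX coverage.
function shouldAttachImage(mode: ImageMode, axCoverage: number, threshold = 0.5): boolean {
  if (mode === "always") return true;
  if (mode === "never") return false;
  // "auto": attach only when the AX tree describes too little of the window.
  return axCoverage < threshold;
}

const attachForSparseTree = shouldAttachImage("auto", 0.2);
const attachForRichTree = shouldAttachImage("auto", 0.9);
```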
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })
// @e1 = address bar (from AX state)
set_text({ ref: "@e1", text: "https://example.com" })
keypress({ keys: ["Enter"] })
screenshot({ window: "@w1" })
// Use refs from AX state
set_text({ ref: "@e3", text: "Jane Doe" })
set_text({ ref: "@e4", text: "jane@example.com" })
click({ ref: "@e5" }) // Submit button
keypress({ keys: ["Cmd", "T"] }) // New tab
keypress({ keys: ["Cmd", "Shift", "N"] }) // New incognito window
keypress({ keys: ["Escape"] })
scroll({ ref: "@e2", scrollY: 800 }) // Scroll element down
scroll({ ref: "@e2", scrollY: -400 }) // Scroll up
drag({ fromX: 100, fromY: 200, toX: 400, toY: 200 })
Enable strict AX mode to prevent focus changes, raw pointer events, raw keyboard events, and cursor takeover. All actions must succeed via background-safe AX paths:
// Via config (see Configuration section)
// Actions will report `stealth` in execution metadata when successful
When strict mode is active, any action that requires foreground focus fails with a strict mode error.
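
One way to reason about strict mode is to check the execution metadata of a batch for anything that did not take the background-safe path. This is a sketch with a hypothetical result shape: only the `stealth` and `default` labels come from this document, and the sample data is made up for illustration.

```typescript
// Hypothetical result shape; only the "stealth" and "default" labels come from the text.
type ExecutionPath = "stealth" | "default";

interface ActionResult {
  action: string;
  execution: ExecutionPath;
}

// Return the actions that would violate strict AX mode, i.e. those that
// needed focus or raw events instead of the background-safe path.
function strictViolations(results: ActionResult[]): string[] {
  return results.filter((r) => r.execution !== "stealth").map((r) => r.action);
}

// Sample data for illustration only.
const violations = strictViolations([
  { action: "set_text", execution: "stealth" },
  { action: "drag", execution: "default" },
]);
```

An empty result would mean every action in the batch was compatible with strict mode.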
Inspect effective config in Pi:
/computer-use
Config can be set via config files or environment variable overrides. Key options:
| Option | Description |
|---|---|
| `image` | `"auto"`, `"always"`, or `"never"`: screenshot attachment mode |
| `strictAX` | Enable background-safe strict AX mode |
| `browser` | Browser-aware targeting preference |
See docs/configuration.md for full config file format and environment variable overrides.
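
As a rough mental model of layered configuration, the sketch below merges defaults, config file values, and overrides, with later layers winning. It is an assumption about precedence, not the documented behaviour. `ComputerUseConfig` and `resolveConfig` are hypothetical names, `strictAX` is assumed to be a boolean, and `browser` is left out because its value type is not stated here.

```typescript
// Hypothetical config shape using two option names from the table above.
// The boolean type for strictAX is an assumption.
interface ComputerUseConfig {
  image: "auto" | "always" | "never";
  strictAX: boolean;
}

// Later layers win: defaults, then config file values, then overrides.
function resolveConfig(
  defaults: ComputerUseConfig,
  ...layers: Partial<ComputerUseConfig>[]
): ComputerUseConfig {
  const merged: ComputerUseConfig = { ...defaults };
  for (const layer of layers) {
    if (layer.image !== undefined) merged.image = layer.image;
    if (layer.strictAX !== undefined) merged.strictAX = layer.strictAX;
  }
  return merged;
}

const effective = resolveConfig(
  { image: "auto", strictAX: false }, // defaults
  { strictAX: true }, // e.g. values read from a config file
  { image: "never" }, // e.g. values supplied as overrides
);
```

The `/computer-use` command remains the authoritative way to see the effective config; this model only shows why a value set in one place can be replaced by another.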
# Install dependencies
npm install
# Run checks
npm test
# Run local checkout without loading installed copy
pi --no-extensions -e .
# Default QA benchmark
npm run benchmark:qa
# Full benchmark (may open apps)
npm run benchmark:qa:full
See benchmarks/README.md for metrics, regression policy, and comparison workflow.
Re-run and grant both Accessibility and Screen Recording to:
~/.pi/agent/helpers/pi-computer-use/bridge
On macOS, go to System Settings → Privacy & Security → Accessibility and Screen Recording.
Take a fresh screenshot to get updated stateId and new refs before acting. Stale-action detection uses stateId to reject outdated coordinates or refs.
Use list_windows({ app: "Safari" }) (or Chrome/Firefox) first, then explicitly pass window: "@wN" to screenshot and subsequent actions.
A strict mode error means an action could not complete via a background-safe AX path. Either disable strict mode or find an AX ref with canPress or canSetValue that supports the background path.
Ensure Pi installed the native helper:
ls ~/.pi/agent/helpers/pi-computer-use/bridge
If missing, reinstall: pi install git:github.com/injaneity/pi-computer-use#v0.2.1
- AX refs (`@e1`, `@e2`, …): semantic element handles from the macOS Accessibility API, stable within a state
- Window refs (`@w1`, `@w2`, …): stable handles from `list_windows`