Build browser/desktop automation agents using Claude's Computer Use capability — screenshot-taking, clicking, typing. Covers the reference container, virtualization safety, task decomposition, and when to use computer-use vs API integration. Use this skill when building agents that operate GUIs (browsers, legacy apps), automating workflows without APIs, or QA/testing agents. Activate when: Claude computer use, browser automation, desktop agent, screen control, computer_20250124, click and type agent.
```
npx claudepluginhub latestaiagents/agent-skills --plugin skills-authoring
```

This skill uses the workspace's default tool permissions.
**Computer Use lets Claude control a virtual computer — take screenshots, move cursor, click, type. Use it when there's no API, not as a first resort.**
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const tools = [
  { type: "computer_20250124", name: "computer", display_width_px: 1280, display_height_px: 800, display_number: 1 },
  { type: "text_editor_20250124", name: "str_replace_editor" }, // optional
  { type: "bash_20250124", name: "bash" }, // optional
];

const response = await client.beta.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    tools,
    messages: [{ role: "user", content: "Open the browser and find Claude's API pricing page." }],
  },
  { headers: { "anthropic-beta": "computer-use-2025-01-24" } },
);
```
The model returns tool_use blocks with actions like screenshot, left_click, type, key, mouse_move, scroll.
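Dispatching those actions is your side of the contract. A minimal sketch of an action dispatcher — the `Controller` interface and its method names are assumptions, not part of the API; a real implementation would wrap xdotool, a VNC client, or Playwright:

```typescript
// Hypothetical VM controller interface — substitute your own backend.
interface Controller {
  screenshot(): string; // returns a base64-encoded PNG
  click(x: number, y: number): void;
  type(text: string): void;
  key(combo: string): void;
  scroll(x: number, y: number, dir: string, amount: number): void;
}

// Map a tool_use input (block.input from the API) to controller calls.
// Most actions return a fresh screenshot so the model sees the result.
function executeAction(ctl: Controller, input: any): { text?: string; screenshot?: string } {
  switch (input.action) {
    case "screenshot":
      return { screenshot: ctl.screenshot() };
    case "left_click": {
      const [x, y] = input.coordinate;
      ctl.click(x, y);
      return { screenshot: ctl.screenshot() };
    }
    case "type":
      ctl.type(input.text);
      return { screenshot: ctl.screenshot() };
    case "key":
      ctl.key(input.text);
      return { screenshot: ctl.screenshot() };
    default:
      return { text: `unsupported action: ${input.action}` };
  }
}
```

The full action set (mouse_move, scroll, drag, etc.) follows the same pattern: read `input.action`, call the controller, return a screenshot.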
Anthropic ships a reference Docker container (ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest) bundling a virtual display, a browser, and an example agent loop.
Don't run this on a host with real user data. It's a demo. For production, use a sandboxed cloud VM (Firecracker, Vercel Sandbox, cloud-hypervisor).
```typescript
async function run(goal: string) {
  const messages: any[] = [{ role: "user", content: goal }];
  while (true) {
    const response = await client.beta.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 4096,
      tools,
      messages,
    }, { headers: { "anthropic-beta": "computer-use-2025-01-24" } });
    messages.push({ role: "assistant", content: response.content });
    if (response.stop_reason !== "tool_use") return response;
    const results = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const output = await executeAction(block.name, block.input); // your VM controller
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          // Screenshots go back as image blocks; source is
          // { type: "base64", media_type: "image/png", data: "<base64>" }
          content: output.screenshot ? [{ type: "image", source: output.screenshot }] : output.text,
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}
```
Every action returns a fresh screenshot so the model sees the result. Image tokens add up fast.
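One common way to bound that cost is pruning older screenshots from the conversation before each request. The message shapes below match the loop above; `pruneScreenshots` is a hypothetical helper, not an SDK function:

```typescript
type Block = { type: string; [k: string]: any };
type Msg = { role: string; content: string | Block[] };

// Keep only the most recent `keep` screenshots; replace older image blocks
// with a short text note so the model still knows a screenshot existed.
function pruneScreenshots(messages: Msg[], keep: number): Msg[] {
  let seen = 0;
  // Walk newest → oldest so the most recent screenshots survive.
  return messages
    .slice()
    .reverse()
    .map((m) => {
      if (typeof m.content === "string") return m;
      const content = m.content.map((b) => {
        if (b.type === "tool_result" && Array.isArray(b.content)) {
          const inner = b.content.map((c: Block) =>
            c.type === "image" && ++seen > keep
              ? { type: "text", text: "[screenshot pruned]" }
              : c,
          );
          return { ...b, content: inner };
        }
        return b;
      });
      return { ...m, content };
    })
    .reverse();
}
```

Call it right before `client.beta.messages.create` in the loop; recent screenshots carry almost all the signal, so keeping two or three is usually enough.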
Computer Use is slow and error-prone on long sequences. Break work into sub-goals:
Bad: "Go to Example.com, find the pricing page, fill out the demo form with fake data, submit."
Good: Step 1: "Navigate to example.com/pricing"
Step 2: "Locate the demo request button"
Step 3: "Fill the form: name=..., email=..."
Step 4: "Submit"
Verify each step's screenshot before moving on. Your app orchestrates — the model doesn't need to do everything in one loop.
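That orchestration can be sketched as a plain loop over sub-goals — `runStep` (which would wrap the agent loop above) and `verify` are assumed hooks, not library functions:

```typescript
// Run sub-goals sequentially, verifying each step's screenshot before
// moving on. Fail fast instead of letting the model drift.
async function runPlan(
  steps: string[],
  runStep: (goal: string) => Promise<string>,          // returns final screenshot (base64)
  verify: (goal: string, screenshotB64: string) => boolean,
): Promise<string> {
  for (const goal of steps) {
    const shot = await runStep(goal);
    if (!verify(goal, shot)) throw new Error(`step failed: ${goal}`);
  }
  return "done";
}
```

`verify` can be as cheap as a pixel/text heuristic or as heavy as a second model call asking "does this screenshot show the form submitted?".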
Don't use Computer Use for high-volume work. It's a specialty tool.
The biggest threat: the agent reads text on screen, some of which is attacker-controlled (email content, web pages). Injection can redirect it to click "Send $1000 to X".
Mitigations:
- Run in an isolated VM with no real credentials, accounts, or user data (see the sandboxing note above).
- Require human confirmation before consequential actions: payments, sending messages, account changes.
- Restrict which domains or applications the agent may reach where possible.
- Cap steps and spend per task, and log every action for audit.
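One concrete pattern is a policy gate that pauses before consequential actions. The pattern list and the `confirm`/`exec` hooks below are illustrative assumptions, not a complete defense:

```typescript
type Action = { action: string; text?: string; coordinate?: number[] };

// Crude keyword screen — injected instructions usually surface as typed text.
const SENSITIVE = [/send/i, /pay/i, /transfer/i, /delete/i, /password/i];

function needsConfirmation(a: Action): boolean {
  return a.action === "type" && !!a.text && SENSITIVE.some((re) => re.test(a.text!));
}

// Wrap your executor: sensitive actions wait for a human; everything else runs.
async function gatedExecute(
  a: Action,
  confirm: () => Promise<boolean>,        // e.g. a Slack approval or CLI prompt
  exec: (a: Action) => Promise<string>,
): Promise<string> {
  if (needsConfirmation(a) && !(await confirm())) return "blocked by operator";
  return exec(a);
}
```

Keyword matching is a weak filter on its own; treat it as one layer on top of VM isolation and step caps, not a substitute for them.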