From claude-code-expert
Guides GUI automation with computer use: when to use over shell/MCP/browser tools, visual validation for native apps, regression workflows, and verification patterns.
Install: `npx claudepluginhub markus41/claude --plugin claude-code-expert`
Slash command: `/claude-code-expert:computer-use` (model: `claude-opus-4-6`)
Computer use lets Claude interact with GUIs: click buttons, fill forms, take screenshots, and navigate native apps. This is powerful but expensive and slow — use it only when a more precise tool doesn't exist.
Before reaching for computer use, exhaust these options first:
| Task | Prefer This | Over Computer Use |
|---|---|---|
| API endpoint testing | Bash + curl | Clicking through UI |
| Database inspection | MCP postgres/sqlite | Navigating admin UI |
| File operations | Read/Write/Edit | Drag-and-drop UI |
| Web scraping | Firecrawl MCP | Screenshot + parse |
| Browser automation | Playwright MCP | Computer use click |
| CI status | GitHub API / gh CLI | Browser navigation |
| Log inspection | Bash + grep | Terminal screenshot |
Rule: If you can express the task as a shell command or API call, do that. Computer use is the fallback for GUI-only workflows.
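The rule above amounts to a fallback order. The sketch below encodes it as a small lookup; the task keys and the `choose_tool` helper are hypothetical names for illustration, not part of any plugin API, and the tool names are copied from the table.

```python
# Hypothetical helper encoding the "prefer a precise tool" rule from the table.
# Task keys and choose_tool() are illustrative names, not a real plugin API.
PREFERRED_TOOLS = {
    "api_endpoint_testing": "Bash + curl",
    "database_inspection": "MCP postgres/sqlite",
    "file_operations": "Read/Write/Edit",
    "web_scraping": "Firecrawl MCP",
    "browser_automation": "Playwright MCP",
    "ci_status": "GitHub API / gh CLI",
    "log_inspection": "Bash + grep",
}


def choose_tool(task: str, has_cli_or_api: bool = False) -> str:
    """Return the preferred tool for a task, falling back to computer use."""
    # A task listed in the table always has a more precise tool.
    if task in PREFERRED_TOOLS:
        return PREFERRED_TOOLS[task]
    # Anything expressible as a shell command or API call should use that.
    if has_cli_or_api:
        return "shell command / API call"
    # Only GUI-only workflows fall through to computer use.
    return "computer use"


if __name__ == "__main__":
    print(choose_tool("ci_status"))
    print(choose_tool("legacy_desktop_app"))
```

The ordering matters: the explicit table wins first, then any CLI/API route, and computer use is only the last resort.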
Use case: testing a desktop app that has no API or CLI.
Example: validate an Electron app's UI after a build.
1. Take a screenshot of the app after launch.
2. Click the "New Project" button.
3. Verify that the dialog opens with the correct fields.
4. Fill in the project name: "Test Project 2026".
5. Click Create and verify that the project appears in the list.
Use case: detecting layout regressions that unit tests can't catch.
Workflow:
1. Take a baseline screenshot of the current UI state.
2. Apply the change.
3. Take a comparison screenshot.
4. Highlight pixel differences greater than 1%.
5. Have a human review the diff.
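The diff step can be sketched without any imaging library by comparing decoded pixels directly. This minimal version assumes both screenshots have already been decoded into equal-sized lists of RGB tuples, and it reads the 1% threshold as "more than 1% of pixels changed", which is one possible interpretation. `changed_pixels`, `diff_ratio`, and `needs_review` are hypothetical names.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255
Image = List[List[Pixel]]     # rows of pixels, already decoded from a screenshot


def changed_pixels(baseline: Image, candidate: Image,
                   tolerance: int = 0) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates where any channel differs by more than `tolerance`."""
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("screenshots must have identical dimensions")
    diffs = []
    for y, (row_a, row_b) in enumerate(zip(baseline, candidate)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                diffs.append((x, y))
    return diffs


def diff_ratio(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Fraction of all pixels that changed between the two screenshots."""
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    return len(changed_pixels(baseline, candidate, tolerance)) / total


def needs_review(baseline: Image, candidate: Image, threshold: float = 0.01) -> bool:
    """Flag the diff for human review when more than `threshold` of pixels changed."""
    return diff_ratio(baseline, candidate) > threshold
```

In a real pipeline the PNGs would be decoded first (for example with Pillow, whose `ImageChops.difference` does the per-pixel comparison), and the coordinates from `changed_pixels` would become the highlight overlay the reviewer sees.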
Use case: admin panels, legacy enterprise software, and embedded UIs with no API.
Example: generate a report from a legacy admin panel.
1. Navigate to `http://admin.internal/reports`.
2. Click "Export" → "CSV" → "Last 30 days".
3. Wait for the download to finish.
4. Move the file to `/tmp/report-{date}.csv`.
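The wait-and-move steps at the end of this flow don't need the GUI at all. The sketch below polls a downloads directory until a finished CSV appears, then moves it to `/tmp/report-{date}.csv`. The downloads location, the partial-file suffixes, the ISO date format for `{date}`, and the helper names are all assumptions; adjust them for the browser and machine in use.

```python
import shutil
import time
from datetime import date
from pathlib import Path
from typing import Optional

# Suffixes browsers commonly use for in-progress downloads (assumption; varies by browser).
PARTIAL_SUFFIXES = {".crdownload", ".part", ".download"}


def wait_for_download(downloads: Path, pattern: str = "*.csv",
                      timeout: float = 60.0, poll: float = 0.5) -> Path:
    """Return the newest file matching `pattern` once no partial download remains."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        partials = [p for p in downloads.iterdir() if p.suffix in PARTIAL_SUFFIXES]
        matches = sorted(downloads.glob(pattern), key=lambda p: p.stat().st_mtime)
        if matches and not partials:
            return matches[-1]
        time.sleep(poll)
    raise TimeoutError(f"no completed {pattern} in {downloads} after {timeout}s")


def move_report(src: Path, dest_dir: Path = Path("/tmp"),
                day: Optional[date] = None) -> Path:
    """Move the export to dest_dir/report-YYYY-MM-DD.csv and return the new path."""
    day = day or date.today()
    dest = dest_dir / f"report-{day.isoformat()}.csv"
    shutil.move(str(src), str(dest))
    return dest
```

Computer use then only has to drive the three clicks; the file handling becomes deterministic and easy to verify.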
Use case: mobile simulator or desktop app testing that requires visual interaction.
Example: iOS simulator validation.
1. Launch the app: `xcrun simctl launch booted com.example.MyApp`
2. Take a screenshot.
3. Verify that "Welcome" text is visible in the header.
4. Tap the "Get Started" button (by coordinates or element description).
5. Verify that the onboarding screen loads.
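Launching and capturing can come from the shell, leaving only the tap and the visual checks to computer use. This sketch builds the `xcrun simctl` argument lists (the `launch` form comes from the example; `io <device> screenshot <file>` is simctl's screenshot subcommand) and runs them only when asked, since `xcrun` exists only on macOS with Xcode installed. The helper names are hypothetical.

```python
import subprocess
from typing import List


def simctl_launch(bundle_id: str, device: str = "booted") -> List[str]:
    """argv for launching an installed app on a simulator."""
    return ["xcrun", "simctl", "launch", device, bundle_id]


def simctl_screenshot(path: str, device: str = "booted") -> List[str]:
    """argv for saving a simulator screenshot to `path`."""
    return ["xcrun", "simctl", "io", device, "screenshot", path]


def run(argv: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False (needs macOS + Xcode)."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(simctl_launch("com.example.MyApp"))
    run(simctl_screenshot("/tmp/onboarding-start.png"))
```

Saving the screenshot to a known path also gives the verification step a concrete artifact to attach to its report.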
Computer use output is inherently visual and unstructured. Always verify results with a structured check after GUI actions:
After each GUI action:
1. Take a screenshot
2. Verify the expected visual state (specific text, element position, color)
3. If verification fails: log "FAIL: {what was expected vs. what was seen}"
4. If unsure: take another screenshot from a wider viewport
At the end:
- List each action and its verification result
- Count: {N} actions taken, {M} verified OK, {K} failed
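The end-of-run report can be kept honest with a tiny structured log. This sketch assumes each screenshot check has already been reduced to a short text description of what was seen, and it compares that to the expectation with exact string equality, a deliberate simplification. `Check` and `VerificationLog` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Check:
    action: str    # what was done, e.g. 'Click "New Project"'
    expected: str  # visual state we expected to see
    seen: str      # what the screenshot actually showed

    @property
    def ok(self) -> bool:
        return self.expected == self.seen


@dataclass
class VerificationLog:
    checks: List[Check] = field(default_factory=list)

    def record(self, action: str, expected: str, seen: str) -> None:
        check = Check(action, expected, seen)
        self.checks.append(check)
        if not check.ok:
            # Log the mismatch in the "FAIL: expected vs. seen" format.
            print(f"FAIL: expected {expected!r}, saw {seen!r}")

    def summary(self) -> str:
        """Per-action results followed by the N/M/K count line."""
        ok = sum(c.ok for c in self.checks)
        failed = len(self.checks) - ok
        lines = [f"{c.action}: {'OK' if c.ok else 'FAIL'}" for c in self.checks]
        lines.append(f"{len(self.checks)} actions taken, {ok} verified OK, {failed} failed")
        return "\n".join(lines)
```

For example, recording one matching and one mismatching check ends the summary with `2 actions taken, 1 verified OK, 1 failed`.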
| Confidence | Verification | Action |
|---|---|---|
| HIGH | Text matches exactly / element found by ID | Proceed |
| MEDIUM | Visual match but element found by position | Log and proceed |
| LOW | Can't find element / ambiguous screenshot | Stop, report to human |
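The table can be applied mechanically once each check records how its element was located. The sketch below is one reading of the rows; the `found_by` values, the `ambiguous` flag, and the `classify` name are hypothetical.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    # Each level maps to the action column of the table.
    HIGH = "Proceed"
    MEDIUM = "Log and proceed"
    LOW = "Stop, report to human"


def classify(found_by: Optional[str], text_matches: bool = False,
             ambiguous: bool = False) -> Confidence:
    """Map how an element was located to a confidence level.

    found_by: "id", "position", or None if the element was not found.
    """
    # Missing element or an ambiguous screenshot always stops the run.
    if found_by is None or ambiguous:
        return Confidence.LOW
    # Exact text match or an element ID is strong evidence.
    if text_matches or found_by == "id":
        return Confidence.HIGH
    # A visual match located only by position is weaker evidence.
    if found_by == "position":
        return Confidence.MEDIUM
    return Confidence.LOW
```

Checking `ambiguous` first means an unclear screenshot wins over any other signal, matching the "stop, report to human" row.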
Computer use can cause irreversible actions (delete files, send emails, submit forms). Apply these guardrails:
Keep screenshots of:
For complex GUI flows, describe the steps and ask for confirmation before executing:
Before I click "Submit", here's what will happen:
- Form data: {summary}
- This action cannot be undone
- Proceeding? (yes/no)
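When the agent is driven from a script rather than a chat, the same checkpoint can be enforced in code. This is a hypothetical `confirm_irreversible` helper, not part of any computer-use API. It accepts only an explicit "yes", and the `ask` parameter exists so the prompt can be tested or routed somewhere other than stdin.

```python
from typing import Callable


def confirm_irreversible(action: str, summary: str,
                         ask: Callable[[str], str] = input) -> bool:
    """Show what an irreversible GUI action will do; return True only on an explicit 'yes'."""
    prompt = (
        f'Before I click "{action}", here\'s what will happen:\n'
        f"- Form data: {summary}\n"
        "- This action cannot be undone\n"
        "- Proceeding? (yes/no) "
    )
    answer = ask(prompt).strip().lower()
    # Anything other than "yes" (including "y" or empty input) counts as no.
    return answer == "yes"
```

Defaulting to "no" on anything unexpected is the safe choice for actions that delete, send, or submit.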
For web UIs, Playwright MCP is almost always better than computer use:
| | Playwright MCP | Computer Use |
|---|---|---|
| Reliability | High (DOM-based) | Medium (pixel-based) |
| Speed | Fast | Slow (screenshot per action) |
| Testability | Scriptable, repeatable | Hard to reproduce exactly |
| Cost | Low | High (vision model per screenshot) |
| Works on | Web browsers | Any visual surface |
Use Playwright MCP for: Web app testing, scraping, form automation on websites.
Use Computer Use for: Native desktop apps, embedded UIs, legacy apps with no API.
Computer use is expensive:
Estimate before using: If a GUI flow has N steps, expect N × (screenshot tokens + generation tokens). For flows > 20 steps, consider whether a shell/API approach exists.
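The estimate is simple arithmetic, sketched below. Per-screenshot and per-step generation token counts are inputs you supply; the optional `approx_image_tokens` helper uses the rough width × height / 750 rule of thumb from Anthropic's vision documentation, which is only an approximation and ignores any resizing the client applies. Function names and the example numbers are hypothetical.

```python
import math


def approx_image_tokens(width_px: int, height_px: int) -> int:
    """Rough per-screenshot token estimate: width * height / 750, rounded up."""
    return math.ceil(width_px * height_px / 750)


def estimate_flow_tokens(steps: int, screenshot_tokens: int,
                         generation_tokens: int) -> int:
    """N x (screenshot tokens + generation tokens)."""
    return steps * (screenshot_tokens + generation_tokens)


def should_look_for_shell_or_api(steps: int, limit: int = 20) -> bool:
    """Flows longer than `limit` steps deserve a second look for a non-GUI route."""
    return steps > limit


if __name__ == "__main__":
    # Illustrative inputs only: a 1280x800 screenshot and 300 generation tokens per step.
    shot = approx_image_tokens(1280, 800)
    print(shot, estimate_flow_tokens(25, shot, 300), should_look_for_shell_or_api(25))
```

With those illustrative inputs a 25-step flow comes to roughly 41,650 tokens, and it trips the 20-step check.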
Computer use requires the Claude Desktop app (not CLI or Web). The Desktop app has the screen capture and input simulation capabilities that CLI lacks.
- CLI: ❌ Computer use not available
- Web: ❌ Computer use not available
- Desktop: ✅ Computer use available