From vibin
This skill should be used when the user wants to drive, control, screenshot, run PowerShell on, install software on, or interact with the agent-os Windows 11 sandbox VM. Triggers include: "run X on agent-os", "screenshot agent-os", "PowerShell on agent-os", "spin up agent-os", "drive the windows VM", "the windows sandbox", or "winbox". Also applies for web-dev browser verification against the agent-os Chrome endpoint. Does not apply for the user's personal Windows on steamy (use the nircmd skill). The sandbox is Claude's alone — install software, change settings, run shells freely without asking for confirmation.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vibin:agent-osThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A real Windows 11 desktop reserved for Claude, running on host `tootie` as the **`agent-os`** VM (container name `agent-os-win11`, image `dockur/windows`). "Winbox" is only the historical nickname; the skill name and official name are **agent-os**. Both `agent-os` and `winbox` remain trigger phrases for compatibility.
A real Windows 11 desktop reserved for Claude, running on host tootie as the agent-os VM (container name agent-os-win11, image dockur/windows). "Winbox" is only the historical nickname; the skill name and official name are agent-os. Both agent-os and winbox remain trigger phrases for compatibility.
Drive it through Windows-MCP (CursorTouch/Windows-MCP) — an MCP server installed inside the VM that exposes native click/type/shell/clipboard/filesystem/registry tools as mcp__windows-mcp__*.
The sandbox is Claude's. Install software, write to the registry, kill processes — don't ask first. Only think twice about actions that escape into host tootie (Docker daemon, mounted volumes, host network).
The previous version of this skill drove the VM through agent-browser against http://tootie:8006's noVNC canvas. That path still works as a visual fallback, but Windows-MCP is the new primary surface because:
Type reliably handles full strings including shifted symbols (!@#$%^&*(), :, ", etc.). The noVNC Shift+<digit> bug is gone.Snapshot returns interactive elements with ids — Claude can target controls by name, not by reasoning about pixels.PowerShell runs commands directly; no Win+R, no canvas focus juggling.press loop.Reach for noVNC at http://tootie:8006 only when you need to see the desktop visually for debugging (e.g. confirming a screenshot that Windows-MCP returned), or if Windows-MCP is unreachable.
When the task is web development, browser verification, screenshots, or page interaction, use this order unless the user explicitly asks for a specific machine or browser session:
Do not use Windows-MCP just because it is available when agent-browser can do a generic web check more directly. Do use Windows-MCP when the task depends on installed Windows software, the sandbox desktop, the filesystem, registry, native dialogs, or a browser profile inside agent-os.
Windows-MCP is exposed inside agent-os over HTTP + Bearer token and registered in ~/.claude.json as the windows-mcp server (Tailscale address; the bearer token lives in that config, not in this skill). Claude Code reaches it automatically — there is nothing to start.
If the server is unreachable: SSH into tootie, confirm the VM container is up (docker ps --format '{{.Names}}' | grep agent-os; the container name is agent-os-win11), and that the MCP service inside is listening. The container persists /storage across restarts so installed software survives.
SSH-only fallback for a native GUI. If neither Windows-MCP nor noVNC is reachable but ssh agent-os works, you can still launch + screenshot a native GUI — but only from the interactive console session (session 1). Plain SSH lands in session 0, which has no window station: a GUI .exe launched there crashes with This operation requires an interactive window station (os error 1459) and PowerShell CopyFromScreen returns a blank frame. Launch + capture via a schtasks /it scheduled task instead (prefix MSYS_NO_PATHCONV=1 so git-bash doesn't mangle the /flags); confirm the interactive session with ssh agent-os 'query user' (expect docker console 1 Active). Full recipe: plugins/testing/skills/desktop-app-testing/references/ssh-fallback-capture.md.
All tools are namespaced mcp__windows-mcp__<Name>. Use them directly — they're loaded when the windows-mcp server is up.
Screenshot — fast capture: returns cursor position, active/open windows, and a PNG. Use this before deciding what to do; it's the cheapest way to orient.Snapshot — full desktop state with interactive element ids (accessibility tree). Slower than Screenshot, but you get handles you can click by name instead of pixel coordinates. Use this whenever you'd otherwise be reasoning about coordinates.Click — click at (x, y) screen coordinates. Origin is top-left of the primary display.Move — move pointer to (x, y), or set drag=True for a drag-to.Scroll — vertical/horizontal scroll on the whole window or a region.MultiSelect — select multiple files/folders/checkboxes, optionally with Ctrl-held.Type — type text into a field. Handles full strings; takes an optional flag to clear existing text first. Use this instead of the old winbox_type per-char loop.Shortcut — keyboard shortcuts like Ctrl+c, Alt+Tab, Win+R. Use modifier names verbatim.MultiEdit — fill several input fields at once (each entry is (x, y, text)).App — launch an app from the Start menu, then optionally resize/move its window or switch between open apps. The fastest way to "open Edge" / "open Notepad".Process — list running processes or terminate by PID/name. Use to clean up runaways or check whether a service is alive.PowerShell — execute PowerShell. The single biggest speedup over the old GUI flow. Use this whenever the task can be expressed as a command: file ops, package installs (winget/choco), service queries, registry tweaks, network introspection.Clipboard — read or set Windows clipboard. Cleaner than typing for long/sensitive strings, and round-trips paste targets that don't accept Type.FileSystem — read/write/list files on the guest filesystem directly. Skip PowerShell for plain CRUD on files.Registry — read, write, delete, list registry values and keys. Use when a setting needs to persist or unlock a feature flag.Notification — send a Windows toast (title + message). Handy as a "I finished, look at me" cue when running long tasks.Wait — pause for N seconds. Use sparingly: prefer polling Screenshot/Snapshot over fixed sleeps. Useful right after App launch when the window hasn't painted yet.Scrape — pull text content from the current webpage (when a browser is foregrounded). Convenience for reading what's on screen in Edge/Chrome without OCR.App {"name": "Notepad"}
# wait until Notepad's title bar paints
Wait {"seconds": 1}
Type {"text": "hello from claude"}
Shortcut {"keys": "Ctrl+s"}
PowerShell {"command": "Get-Process | Where-Object {$_.CPU -gt 10} | Select-Object Name,CPU,Id -First 10 | ConvertTo-Json"}
You get stdout back as text. JSON-out makes the result trivial to parse. Use PowerShell for anything that's expressible as a command — it sidesteps every GUI hazard.
When a GUI app must be launched, driven, and screenshotted, run the automation through Windows-MCP's desktop-attached PowerShell, not through a plain SSH session. SSH can start the process, but it may not be able to foreground the app or send reliable synthetic input into the interactive desktop.
Use a PowerShell harness with WScript.Shell for launch/focus/input:
$ws = New-Object -ComObject WScript.Shell
$null = $ws.Run('"C:\path\to\app.exe"', 1, $false)
Start-Sleep -Seconds 2
$null = $ws.AppActivate('Window Title')
$ws.SendKeys('status{ENTER}')
Observed gotchas from the Axon Palette session:
Start-Process can launch the app but may leave synthetic keyboard input unreliable for GPUI windows. WScript.Shell.Run plus AppActivate worked better.Unblock-File on copied .exe/.ps1 files and set process env SEE_MASK_NOZONECHECKS=1 before launch when shelling into child executables.Snapshot/Screenshot fail because the Windows-MCP Python environment is missing cv2, capture screenshots inside PowerShell with System.Drawing.Graphics.CopyFromScreen and the target window rect from user32!GetWindowRect.SendKeys('<command>{ENTER}') was the reliable path for Axon Palette.Snapshot {} # returns elements with ids and labels
Click {"x": 412, "y": 287} # use coordinates returned for the element you want
Snapshot beats Screenshot whenever you need to interact — element coordinates come from the accessibility tree, not vision.
PowerShell {"command": "winget install --id Microsoft.PowerToys --silent --accept-package-agreements --accept-source-agreements"}
Clipboard {"action": "set", "text": "<your long or symbol-heavy string>"}
Click {"x": ..., "y": ...} # focus the field
Shortcut {"keys": "Ctrl+v"}
Notification {"title": "agent-os", "message": "winget install finished"}
Registry {"action": "write", "path": "HKCU\\Software\\YourApp", "name": "Setting", "type": "REG_SZ", "value": "x"}
When something looks wrong and you need eyeballs, the old path still works:
http://tootie:8006/vnc.html?autoconnect=1&resize=remoteagent-browser (see git history of this file for the canvas/dispatch helpers).Treat this strictly as a visual debugger. Once you've identified the problem, fix it through Windows-MCP — don't fall back into the per-char press loop just because noVNC is open.
These predate Windows-MCP but remain useful where the MCP path is awkward:
/oem install-time drop folder. Host path /home/jmagar/compose/windows/oem is mounted as \\host.lan\Data only during initial Windows install/OOBE. Once setup completes, the SMB share is gone. Use only for first-boot provisioning.tootie:33890. Exposed by the agent-os-win11 container in addition to noVNC. No agent-side RDP client installed today; install freerdp if a real interactive session is ever needed (now that Windows-MCP exists, the case for this is weaker).tootie:2222. The container forwards host port 2222 → guest port 22. Confirmed exposed on the running container; whether sshd is actually running and configured inside the guest depends on first-boot provisioning. If it answers, this is the cleanest scripted side-channel — no PowerShell round-trip through the MCP server.agent-os-win11 container persists /storage across restarts. Installed software, registry edits, and most filesystem state survive reboots.App/PowerShell, then ScheduleWakeup or move on. Come back with a Screenshot to check progress.PowerShell over clicking. It's faster, more reliable, and leaves a paper trail in the command rather than in pixel coordinates.Type — round-trip through Clipboard so they don't end up in screenshots or logs of the typing stream.npx claudepluginhub jmagar/dendrite --plugin vibinCreates bite-sized, testable implementation plans from specs or requirements, with file structure and task decomposition. Activates before coding multi-step tasks.