From plugin-windows-mcp
This skill should be used when the user asks to "click a button", "type text", "scroll a page", "move the mouse", "press a keyboard shortcut", "interact with GUI elements", "automate mouse clicks", "fill in a text field", "use keyboard shortcuts on Windows", or needs to perform any direct screen interaction using Windows MCP tools.
Slash command: /plugin-windows-mcp:screen-interaction
This skill covers the six tools for direct GUI interaction: Click, Type, Scroll, Move, Shortcut, and Wait. These tools form the foundation of all visual Windows automation.
Never click, type, or scroll blindly. Always capture the screen state first:
- Snapshot to get interactive element IDs and coordinates
- Screenshot or another Snapshot to verify the result afterward

Click: Perform mouse clicks at screen coordinates.
Key parameters: x, y (required), button (left/right/middle), double_click (boolean)
Patterns:
Workflow:
Snapshot -> identify element coordinates -> Click(x, y) -> Screenshot to verify
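The workflow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the plugin's actual API: `call_tool` is a hypothetical stand-in that records tool calls instead of sending them to a Windows MCP server, and the snapshot contents (an "OK" button at 640, 360) are invented for the example.

```python
from typing import Any

# Hypothetical recorder standing in for a real MCP client session.
calls: list[tuple[str, dict[str, Any]]] = []

def call_tool(name: str, **args: Any) -> None:
    """Record a tool call; a real client would send it to the server."""
    calls.append((name, args))

def snapshot() -> dict[str, tuple[int, int]]:
    """Pretend Snapshot: returns made-up element names and coordinates."""
    call_tool("Snapshot")
    return {"OK": (640, 360)}

def click_element(label: str, button: str = "left", double_click: bool = False) -> None:
    """Snapshot -> look up coordinates -> Click -> Screenshot to verify."""
    if button not in ("left", "right", "middle"):
        raise ValueError(f"unsupported button: {button}")
    x, y = snapshot()[label]
    call_tool("Click", x=x, y=y, button=button, double_click=double_click)
    call_tool("Screenshot")

click_element("OK")
print([name for name, _ in calls])  # ['Snapshot', 'Click', 'Screenshot']
```

The point of the wrapper is that a click can never be issued without a fresh snapshot before it and a verification capture after it.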
Type: Input text into the currently focused element.
Key parameters: text (required), clear (boolean — clears field before typing)
Patterns:
- Click on field, then Type(text="value")
- Click on field, then Type(text="new value", clear=true)

Important: Always ensure the target field has focus (via Click) before calling Type. Without focus, text may be typed into the wrong element.
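One way to make the focus rule hard to forget is to pair the two calls in a single helper. The sketch below only builds the call sequence as plain data; the helper name `type_into_field` and the tuple layout are assumptions for illustration, and dispatching the calls to the real tools is left out.

```python
from typing import Any

def type_into_field(x: int, y: int, text: str, clear: bool = False) -> list[tuple[str, dict[str, Any]]]:
    """Return the Click-then-Type call sequence for one field."""
    return [
        ("Click", {"x": x, "y": y}),                # give the field focus first
        ("Type", {"text": text, "clear": clear}),   # then type, optionally clearing
    ]

# Replace whatever the field currently holds with "new value".
steps = type_into_field(200, 150, "new value", clear=True)
```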
Scroll: Scroll vertically or horizontally within a window or region.
Key parameters: direction (up/down/left/right), amount, x/y (coordinates)
Patterns:
- … x/y coordinates inside that panel
- … amount values
- … amount, then verify with Screenshot

Best practice: Specify coordinates to target a particular scrollable area. Without coordinates, scrolling applies to the window under the cursor.
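A small argument builder can enforce the coordinate advice before a Scroll call is made. This is a sketch under assumptions: the helper `scroll_args` is hypothetical, and treating `amount` as an integer is a guess, since the text does not give its type or unit.

```python
from typing import Any, Optional

DIRECTIONS = ("up", "down", "left", "right")

def scroll_args(direction: str, amount: int,
                x: Optional[int] = None, y: Optional[int] = None) -> dict[str, Any]:
    """Build Scroll arguments; x and y must be given together or not at all."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    if (x is None) != (y is None):
        raise ValueError("pass both x and y, or neither")
    args: dict[str, Any] = {"direction": direction, "amount": amount}
    if x is not None:
        # Target a specific scrollable area instead of the window under the cursor.
        args.update(x=x, y=y)
    return args

panel_scroll = scroll_args("down", 3, x=900, y=400)
```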
Move: Move the mouse pointer, with optional drag support.
Key parameters: x, y (required), drag (boolean)
Patterns:
- Move(x, y): triggers tooltips, hover menus, highlights
- Click at source, then Move(x, y, drag=true) to destination
- Click on slider handle, then Move with drag=true

Shortcut: Execute keyboard shortcuts, which are often more reliable than GUI clicks.
Key parameters: keys (required — e.g., "ctrl+c", "alt+tab")
Essential Windows shortcuts:
| Shortcut | Action |
|---|---|
| ctrl+c / ctrl+v | Copy / Paste |
| ctrl+z / ctrl+y | Undo / Redo |
| ctrl+s | Save |
| ctrl+a | Select all |
| alt+tab | Switch window |
| alt+f4 | Close window |
| win+d | Show desktop |
| win+e | Open File Explorer |
| ctrl+shift+esc | Task Manager |
| win+r | Run dialog |
| ctrl+w | Close tab |
| ctrl+t | New tab (browsers) |
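Shortcut strings written by hand tend to vary in case and spacing. The sketch below normalizes them to the lowercase, plus-separated form used in the table. Whether the Shortcut tool itself tolerates other spellings is not stated, so this helper (`normalize_keys`, a hypothetical name) is a defensive assumption rather than a requirement.

```python
def normalize_keys(keys: str) -> str:
    """Lowercase each key and strip spaces around '+', e.g. 'Ctrl + C' -> 'ctrl+c'."""
    parts = [part.strip().lower() for part in keys.split("+")]
    if not all(parts):
        # Catches inputs such as "ctrl+" or "+c"; a literal plus key is not handled.
        raise ValueError(f"empty key in shortcut: {keys!r}")
    return "+".join(parts)

print(normalize_keys("Ctrl + Shift + Esc"))  # ctrl+shift+esc
```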
When to prefer Shortcut over Click:
- Saving (ctrl+s)
- Switching windows (alt+tab)
- Copying, pasting, and selecting all (ctrl+c, ctrl+v, ctrl+a)

Wait: Pause execution to allow the UI to catch up.
Key parameter: duration (seconds)
Recommended wait times:
| Scenario | Duration |
|---|---|
| Between rapid UI actions | 0.2-0.5s |
| After clicking a menu item | 0.3-0.5s |
| After switching tabs/windows | 0.5-1s |
| After launching an application | 1-3s |
| After triggering a dialog/popup | 0.5-1s |
| After saving a large file | 1-2s |
Prefer verification over long waits. Instead of Wait(5), use Wait(1) + Screenshot to check if the UI has updated, then proceed or wait more.
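The "short wait, then check" advice amounts to a polling loop with a timeout. Below is a minimal sketch: `check` stands in for "take a Screenshot and decide whether the UI has updated", and `sleep` stands in for the Wait tool. Both are injected callables, so the names and signature are assumptions for illustration, not part of the plugin.

```python
import time
from typing import Callable

def wait_until(check: Callable[[], bool], step: float = 1.0, timeout: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() every `step` seconds; give up once `timeout` seconds were waited."""
    waited = 0.0
    while True:
        if check():
            return True        # UI is ready, proceed immediately
        if waited >= timeout:
            return False       # still not ready after the full budget
        sleep(step)
        waited += step

# Simulated UI that becomes ready on the third check; sleeps are recorded, not performed.
states = iter([False, False, True])
slept: list[float] = []
ready = wait_until(lambda: next(states), step=1.0, timeout=5.0, sleep=slept.append)
print(ready, slept)  # True [1.0, 1.0]
```

Here the loop returns after two seconds of waiting instead of a fixed five, which is the saving the guideline is after.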
Form-filling workflow:
Snapshot -> identify fields
Click(field1_x, field1_y) -> Type(text="value1")
Click(field2_x, field2_y) -> Type(text="value2")
Click(submit_x, submit_y)
Screenshot -> verify submission
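The same form-filling sequence can be generated for any number of fields. The sketch below builds the ordered list of tool calls as data; the function name, the `(x, y, value)` field layout, and the coordinates are hypothetical, and executing the plan against the real tools is out of scope.

```python
from typing import Any

Step = tuple[str, dict[str, Any]]

def form_fill_plan(fields: list[tuple[int, int, str]], submit: tuple[int, int]) -> list[Step]:
    """Snapshot, then Click + Type per field, then Click submit and Screenshot."""
    plan: list[Step] = [("Snapshot", {})]
    for x, y, value in fields:
        plan.append(("Click", {"x": x, "y": y}))   # focus the field
        plan.append(("Type", {"text": value}))     # enter its value
    plan.append(("Click", {"x": submit[0], "y": submit[1]}))
    plan.append(("Screenshot", {}))                # verify the submission
    return plan

plan = form_fill_plan([(300, 200, "value1"), (300, 260, "value2")], submit=(300, 340))
print([tool for tool, _ in plan])
# ['Snapshot', 'Click', 'Type', 'Click', 'Type', 'Click', 'Screenshot']
```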
Menu navigation workflow:
Click(menu_x, menu_y) -> Wait(0.3)
Snapshot -> identify menu items
Click(item_x, item_y)
Copy-between-applications workflow:
App(switch to source) -> Wait(0.5)
Shortcut("ctrl+a") -> Shortcut("ctrl+c")
App(switch to target) -> Wait(0.5)
Click(target_field_x, target_field_y)
Shortcut("ctrl+v")
Drag-and-drop workflow:
Snapshot -> identify source and destination
Click(source_x, source_y)
Move(dest_x, dest_y, drag=true)
Screenshot -> verify result
- Shortcut("ctrl+z") to undo if possible, then retry
- Shortcut("ctrl+z"), click the correct field, retry
- Wait(0.3) before clicking items
- Snapshot to identify dialog buttons, dismiss or respond

For detailed interaction patterns and advanced techniques:
- references/patterns.md: Common multi-step interaction patterns, edge cases, and troubleshooting