From plugin-windows-mcp
This skill should be used when the user asks to "take a screenshot", "capture the screen", "identify elements on screen", "get a snapshot", "extract DOM content", "find interactive elements", "analyze screen state", "what is on the screen", or needs to observe and analyze the current Windows desktop state using Screenshot or Snapshot tools.
npx claudepluginhub mustafaakben/plugin-windows-mcp --plugin plugin-windows-mcp
This skill uses the workspace's default tool permissions.
This skill covers the two observation tools — Screenshot and Snapshot — which are the foundation of every automation workflow. Before acting on the screen, always observe it first.
| Criteria | Screenshot | Snapshot |
|---|---|---|
| Speed | Fast | Slower |
| Visual image | Yes | Yes |
| Cursor position | Yes | Yes |
| Active window info | Yes | Yes |
| Interactive element IDs | No | Yes |
| Element coordinates | No | Yes |
| Scrollable regions | No | Yes |
| DOM extraction | No | Yes (with use_dom) |
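
The comparison table can be condensed into a small selection helper. This is a minimal sketch in Python: the tool names and the use_dom parameter come from the table, while `choose_observation_tool` and its two flags are hypothetical names introduced only for illustration.

```python
def choose_observation_tool(needs_element_info: bool, needs_dom: bool) -> tuple[str, dict]:
    """Pick the cheapest observation tool that still returns what is needed.

    Hypothetical helper: not part of the plugin. It only encodes the
    comparison table as a function.
    """
    if needs_dom:
        # DOM extraction is only listed for Snapshot with use_dom enabled.
        return "Snapshot", {"use_dom": True}
    if needs_element_info:
        # Element IDs, coordinates, and scrollable regions are listed for Snapshot only.
        return "Snapshot", {"use_dom": False}
    # A purely visual check is served by the faster Screenshot, which takes no parameters.
    return "Screenshot", {}
```

Checking `needs_dom` first keeps the slowest option opt-in, which matches the advice to enable DOM extraction only when it is specifically needed.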
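Decision rule: use Screenshot when a fast visual check is enough, Snapshot when element IDs, coordinates, or scrollable regions are needed, and Snapshot with use_dom=True when DOM extraction is required.
Screenshot: Fast desktop capture returning a visual image, cursor position, and active window information.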
Takes no parameters — captures the entire desktop.
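When to use: quick visual checks, such as confirming that an action produced the expected change.
Output includes: a visual image, cursor position, and active window information.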
Snapshot: Full state capture with interactive element identification.
Key parameter: use_dom (boolean, default: false)
When to use:
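Before any Click, Type, or other interaction, to locate targets
Output includes: per the comparison table, a visual image, cursor position, active window information, interactive element IDs, element coordinates, scrollable regions, and DOM content (with use_dom=True).
Enable DOM extraction (use_dom=True) when automating web browsers. This extracts the page's HTML structure, which makes the page's text, links, and structured data available for extraction.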
Performance note: DOM extraction adds latency. Only enable when specifically needed for web automation.
Click a button:
Snapshot -> scan element list for target button
         -> note its (x, y) coordinates
         -> Click(x, y)
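
The button workflow above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's API: it assumes the Snapshot element list has already been parsed into dictionaries with `name`, `type`, `x`, and `y` keys (the real output format may differ), and the `click` callable stands in for however the Click tool is invoked in your setup.

```python
from typing import Callable, Optional

# Assumed shape of one parsed Snapshot element (hypothetical):
# {"name": "Save", "type": "Button", "x": 412, "y": 630}


def find_element(elements: list[dict], name: str, kind: str = "Button") -> Optional[dict]:
    """Return the first element whose name and type match, ignoring case."""
    for el in elements:
        if el["name"].lower() == name.lower() and el["type"].lower() == kind.lower():
            return el
    return None


def click_button(elements: list[dict], name: str, click: Callable[[int, int], None]) -> bool:
    """Locate a button in the element list and click its coordinates."""
    target = find_element(elements, name)
    if target is None:
        # Target is not in this snapshot; the caller should observe again.
        return False
    click(target["x"], target["y"])
    return True
```

Returning `False` instead of raising leaves the decision to re-observe or change approach with the caller.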
Fill a text field:
Snapshot -> scan element list for input/text field elements
         -> identify the field by its label or position
         -> Click(field_x, field_y) -> Type(text="value")
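
One way to "identify the field by its label or position" is to prefer a field whose own name matches the label and otherwise take the field nearest to the label element. The sketch below assumes parsed elements with `name`, `type`, `x`, and `y` keys and the type names `"Edit"` and `"Text"`. All of these are hypothetical, as are the `click` and `type_text` callables that stand in for the Click and Type tools.

```python
import math
from typing import Callable, Optional


def field_for_label(elements: list[dict], label: str) -> Optional[dict]:
    """Find the input field that belongs to `label` (assumed element shape)."""
    fields = [e for e in elements if e["type"] == "Edit"]
    # Case 1: the field itself carries the label as its name.
    for f in fields:
        if f["name"].lower() == label.lower():
            return f
    # Case 2: fall back to the field closest to a separate label element.
    anchors = [e for e in elements if e["type"] != "Edit" and e["name"].lower() == label.lower()]
    if not anchors or not fields:
        return None
    a = anchors[0]
    return min(fields, key=lambda f: math.hypot(f["x"] - a["x"], f["y"] - a["y"]))


def fill_field(elements: list[dict], label: str, value: str,
               click: Callable[[int, int], None],
               type_text: Callable[[str], None]) -> bool:
    """Click the field to focus it, then type the value."""
    field = field_for_label(elements, label)
    if field is None:
        return False
    click(field["x"], field["y"])
    type_text(value)
    return True
```

Nearest-field matching is a heuristic; on dense forms it may pick the wrong field, so verifying the result afterwards is still worthwhile.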
Verify an action:
Perform action -> Screenshot
               -> check if expected visual change occurred
               -> if not, retry or take a different approach
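
The verify-and-retry loop above might look like this in Python. It is a sketch only: `action`, `observe`, and `changed` are hypothetical callables supplied by the caller, with `observe` standing in for a Screenshot call and `changed` for whatever check decides that the expected visual change occurred.

```python
from typing import Callable


def act_and_verify(action: Callable[[], None],
                   observe: Callable[[], object],
                   changed: Callable[[object], bool],
                   retries: int = 2) -> bool:
    """Run `action`, observe the screen, and retry until the check passes.

    Makes at most `retries + 1` attempts in total.
    """
    for _ in range(retries + 1):
        action()
        if changed(observe()):
            return True
    # All attempts failed; the caller should take a different approach.
    return False
```

Keeping the retry count small avoids repeating an action that has side effects, such as submitting a form twice.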
Analyze the screen before acting:
Snapshot -> enumerate all elements
         -> categorize by type (buttons, fields, labels)
         -> identify the target interaction area
         -> plan the sequence of actions
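
The categorization step above can be sketched as a simple grouping. The element dictionaries and their `type` values are assumptions about how the Snapshot element list might be parsed, not a documented format.

```python
from collections import defaultdict


def categorize(elements: list[dict]) -> dict[str, list[dict]]:
    """Group parsed elements by their `type` field, preserving order within each group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for el in elements:
        groups[el["type"]].append(el)
    return dict(groups)
```

With the groups in hand, planning a sequence becomes a matter of looking up, for example, the fields first and the submit button last.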
Extract web content:
Snapshot(use_dom=True) -> read DOM content
                       -> extract text, links, or structured data
                       -> process the extracted information
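
As an illustration of the "extract text, links, or structured data" step, the sketch below pulls link text and targets out of an HTML string with Python's standard `html.parser`. It assumes the DOM content is available as HTML markup; the exact form in which Snapshot returns DOM content is not specified here, so treat the input format as an assumption.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs from anchor tags in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return every link in `html` as a (visible text, href) pair."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/docs">Docs</a>')` yields one pair containing the visible text and the target.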
Enable use_dom=True only when web content analysis is specifically required.
Multiple monitors: Screenshot and Snapshot capture the primary display. To interact with secondary monitors, specify coordinates in that monitor's range.
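
To make "coordinates in that monitor's range" concrete: on Windows, the virtual desktop places the primary display's top-left corner at (0, 0), so a monitor to its right starts at a positive x offset and one to its left at a negative one. The sketch below converts a point local to a monitor into desktop coordinates. The `Monitor` type and the example layouts are hypothetical, and how the plugin's tools treat coordinates outside the primary display should be confirmed against the plugin itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    """Position and size of a monitor in virtual-desktop pixels (illustrative type)."""
    left: int
    top: int
    width: int
    height: int


def to_desktop_coords(monitor: Monitor, local_x: int, local_y: int) -> tuple[int, int]:
    """Translate a point relative to a monitor's top-left corner into desktop coordinates."""
    if not (0 <= local_x < monitor.width and 0 <= local_y < monitor.height):
        raise ValueError("point lies outside the monitor")
    return monitor.left + local_x, monitor.top + local_y
```

With a 1920x1080 secondary monitor placed to the right of a 1920-pixel-wide primary, the local point (100, 50) maps to (2020, 50).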
Overlapping windows: Snapshot returns elements for all visible windows. Focus the target window first using App(switch) before taking a snapshot.
Dynamic content: Elements that load asynchronously may not appear immediately. Use Wait(0.5-1s) before Snapshot if content is still loading.
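
The wait-then-observe advice for dynamic content can be extended into a short polling loop. This is a sketch under assumptions rather than plugin behavior: `snapshot` and `is_ready` are hypothetical callables supplied by the caller, and `sleep` defaults to `time.sleep` as a stand-in for the Wait tool.

```python
import time
from typing import Callable, Optional


def snapshot_when_ready(snapshot: Callable[[], object],
                        is_ready: Callable[[object], bool],
                        wait_s: float = 0.5,
                        max_attempts: int = 4,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[object]:
    """Take snapshots until `is_ready` accepts one, waiting between attempts.

    Returns the accepted snapshot, or None if every attempt was rejected.
    """
    for attempt in range(max_attempts):
        state = snapshot()
        if is_ready(state):
            return state
        if attempt < max_attempts - 1:
            # Give asynchronously loaded elements time to appear.
            sleep(wait_s)
    return None
```

Bounding the number of attempts prevents an endless wait when the expected element never loads.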
High DPI displays: Coordinates are in screen pixels. The tools handle DPI scaling automatically.
For advanced capture techniques and analysis workflows:
references/analysis-guide.md — Advanced screen analysis, multi-window strategies, and element identification patterns