From soloflow
Verifies UI changes visually for mobile (Maestro), web (Playwright), and macOS (Peekaboo). Includes availability probes, path selection, screenshot capture, and dev-server preflight.
Slash command: /soloflow:visual-verify
This skill provides patterns for visually verifying UI changes. For mobile, SoloFlow **prefers Maestro MCP** and automatically falls back to the **Maestro CLI** when MCP is unreachable. For web, it uses the **Playwright MCP** server (no CLI fallback). For native macOS apps, it **prefers Peekaboo MCP** and falls back to the **Peekaboo CLI** when MCP is unreachable.
Pick a path once per verification run — never mix MCP and CLI Maestro calls in the same run (both own port 7001; mixing causes contention). The same single-path rule applies to Peekaboo even though it has no port collision: two concurrent UI driver calls against the same window race regardless of transport.
Before any Maestro path selection or device probe, check whether the project's dev server (Metro for React Native, Vite for web app shells, etc.) is reachable. The most common reason a visual_mobile run produces nothing actionable is that Metro is offline — the dev-client launches into the Expo Dev Launcher screen and can't load the task's JS changes. The verifier used to discover this only after a full Maestro setup chain (MCP probe → simulator wakefulness → device pick → screenshot), then emit a generic "Metro bundler offline" finding per task. Short-circuit that here with a single config-driven probe.
Run:
node "${CLAUDE_PLUGIN_ROOT}/scripts/sprint/probe-dev-server.js" --probe-only
Parse the JSON online field:
- { "online": null, "skipped": true }: verification.dev_server.enabled is false. The preflight is opt-in; continue to Path Selection below. Existing behavior unchanged.
- { "online": true }: the dev server is reachable. Continue to Path Selection.
- { "online": false, ... }: the dev server is offline. Stop immediately:
  - Report visual_mobile: skipped_metro_offline in the verification result (this replaces the old skipped_unable for this specific failure mode; it surfaces in done-report frontmatter and rolls up in sprint-closer's per-task mobile bucket).
  - Surface the message "{name} unreachable at {probe_url}. Start it (e.g., npx expo start --dev-client) before running visual verification." (Read name and probe_url from the verification.dev_server config, or rerun the probe without --probe-only to get them.)
This preflight runs only on the visual_mobile path. The web (Playwright) path has its own availability section below and does not depend on the dev server (Playwright launches its own browser).
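The three probe outcomes above can be sketched as a small classifier. This is a minimal sketch, not SoloFlow's implementation: the function name and the returned `action` labels are hypothetical, and only the `online` / `skipped` fields and the `skipped_metro_offline` outcome come from the text above.

```javascript
// Classify the JSON printed by the dev-server probe into a next action.
// `classifyProbe` and the `action` labels are hypothetical names for this sketch.
function classifyProbe(stdout) {
  const result = JSON.parse(stdout);
  if (result.online === null && result.skipped === true) {
    // Preflight is opt-in and disabled: behave as before.
    return { action: "continue", reason: "preflight disabled" };
  }
  if (result.online === true) {
    return { action: "continue", reason: "dev server reachable" };
  }
  if (result.online === false) {
    // Stop before any Maestro setup work is done.
    return { action: "stop", outcome: "skipped_metro_offline" };
  }
  throw new Error("unexpected probe output: " + stdout);
}

console.log(classifyProbe('{"online":false}'));
```

Throwing on an unrecognized shape is a design choice of the sketch: it keeps a malformed probe result from being silently treated as "online".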
Before the platform-based Path Selection below, check whether the project is a Chromium-driveable target and the user has opted into routing UI verification through Playwright. For Electron/Tauri the Playwright _electron API drives the actual shipped renderer; for Expo Web / Capacitor a file-pattern guard skips the preference when the task touches native-divergent code so iOS/Android-only regressions are not masked.
Resolve the toggle. Run:
node "${CLAUDE_PLUGIN_ROOT}/scripts/config/resolve.js" \
--key verification.visual_prefer_playwright --fallback false
If the value resolves to anything other than true, skip this pre-step entirely and fall through to the platform-based Path Selection below.
Read the cached project type. The per-sprint detection runs once in sprint-initiator and stashes the result in .soloflow/active/sprints/{sprint.id}/sprint.json under playwright_target. Read it:
node -e 'const p=JSON.parse(require("fs").readFileSync(process.argv[1])); console.log(JSON.stringify(p.playwright_target||null));' .soloflow/active/sprints/{sprint.id}/sprint.json
If the field is missing or kind is null → fall through to Path Selection. The project isn't Playwright-driveable; preference is a no-op.
Check visual_web and Playwright tool availability. Resolve verification.visual_web (fallback false). If false, OR mcp__playwright__* is not in your available-tools list, OR a lightweight Playwright probe errors → emit ONE config-gap queue entry (see §"Unavailable-but-preferred" below) and fall through to Path Selection.
CLAUDE.md E2E gate override (precedence). Native-verification gates in the project's CLAUDE.md win over the Playwright preference. If the current task's files_owned overlaps any file the project's CLAUDE.md E2E Verification Gates section mandates for Maestro / Peekaboo verification, skip the preference and fall through. The native gate must be honored.
Native-divergence guard (Expo Web / Capacitor only). If playwright_target.kind is expo-web or capacitor AND any file in files_owned matches one of:
- *.ios.{ts,tsx,js,jsx}
- *.android.{ts,tsx,js,jsx}
- *.native.{ts,tsx,js,jsx}
- a file importing Platform from react-native, OR react-native-gesture-handler, OR any of expo-camera, expo-notifications, expo-local-authentication, expo-secure-store, expo-linking
→ skip the preference (fall through to platform-based selection). These cases need the native driver to catch regressions in branches that don't exist on web. Electron and Tauri are exempt: the renderer Playwright drives IS the renderer that ships, so there is no divergence to worry about.
Commit the path. If all gates passed, set USE_PLAYWRIGHT=true and PLAYWRIGHT_TARGET={kind} for the rest of the run. Skip the Maestro / Peekaboo path selection entirely. Continue to Playwright (Web) Availability below for the final lightweight probe, then run the verification through mcp__playwright__*. Report visual_mobile / visual_macos as skipped_by_preference (with the kind in the reason) if the project's visual_mobile / visual_macos toggles were true — see verifier outcome classification.
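The file-name half of the native-divergence guard could look roughly like the sketch below. It is illustrative only: the function name is hypothetical, the import-based checks (Platform, react-native-gesture-handler, expo-*) need file contents and are left out, and any `kind` other than expo-web or capacitor is simply treated as exempt.

```javascript
// Matches *.ios.*, *.android.* and *.native.* source files from the guard list.
const NATIVE_SUFFIX = /\.(ios|android|native)\.(ts|tsx|js|jsx)$/;

// Hypothetical helper: true when the Playwright preference should be skipped
// because a platform-specific file is owned by the task.
function hasNativeDivergentFile(kind, filesOwned) {
  // Only Expo Web / Capacitor targets are guarded; other kinds are exempt.
  if (kind !== "expo-web" && kind !== "capacitor") return false;
  return filesOwned.some((file) => NATIVE_SUFFIX.test(file));
}

console.log(hasNativeDivergentFile("expo-web", ["src/Button.ios.tsx"]));
```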
When visual_prefer_playwright=true AND playwright_target.kind is non-null AND a Playwright availability check fails (e.g., visual_web=false, mcp__playwright__* unbound, or npx missing), emit ONE entry to .soloflow/human-review-queue.md with dedup_key: visual_prefer_playwright_unavailable and severity: low, then fall through to platform-based selection (don't double-skip). The dedup_key collapses multi-task sprints to one row.
Playwright drives the binary directly via _electron.launch({ args: [<main path>] }) — no dev server. Resolve the main path in this order:
1. verification.visual_electron_main (config).
2. package.json#main.
3. out/main.js, dist/main.js, electron/main.js (first existing wins).
If none resolves, emit skipped_unable with reason "could not locate Electron main; set verification.visual_electron_main" and the verifier proceeds to Level 3.
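The resolution order above might be sketched as follows. The function name, the nested config object shape, and the injectable `exists` parameter are assumptions made for illustration; only the three lookup steps and their order come from the text.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: resolve the Electron main entry in the documented order.
// `config` and `pkg` are already-parsed objects; `exists` is injectable for tests.
function resolveElectronMain(config, pkg, projectRoot, exists = fs.existsSync) {
  // 1. Explicit config value wins (assumes a nested `verification` object).
  const configured = config?.verification?.visual_electron_main;
  if (configured) return configured;
  // 2. package.json "main" field.
  if (pkg?.main) return pkg.main;
  // 3. Conventional locations; the first existing file wins.
  const candidates = ["out/main.js", "dist/main.js", "electron/main.js"];
  const found = candidates.find((rel) => exists(path.join(projectRoot, rel)));
  // null corresponds to the skipped_unable outcome described above.
  return found ?? null;
}

console.log(resolveElectronMain({}, { main: "index.js" }, process.cwd()));
```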
Before running any mobile verification, probe which path to use and record the decision for the rest of the run:
1. Check mcp__maestro__* availability. If the tool surface is bound to this session (your available-tools list contains mcp__maestro__list_devices, mcp__maestro__run_flow, etc.) AND a lightweight call succeeds, set USE_MAESTRO_MCP=true.
   - Use mcp__maestro__list_devices as the lightweight call. It returns quickly and surfaces any device/automation problems up-front.
   - If the tools are not bound or the call fails, set USE_MAESTRO_MCP=false and continue to step 2.
2. Run which maestro via Bash. If found, probe for a booted device:
IOS=$(xcrun simctl list devices booted 2>/dev/null | grep -c Booted || true)
AND=$(adb devices 2>/dev/null | awk '$2=="device"' | wc -l | tr -d ' ' || true)
   - If at least one device is booted, use the Maestro CLI Patterns (fallback) section below.
3. Otherwise, emit skipped_unable with reason "mcp__maestro__* not bound and CLI not installed / no device booted" and proceed. (The verifier agents know to escalate this via the config-gap recipe.)
Why the single-decision model: maestro mcp (the MCP server) and maestro test (the CLI) both bind port 7001. Switching paths mid-run, or running an MCP call while a CLI call is in flight, causes unpredictable failures. Choose the path up front and stay on it.
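The single-decision model above can be illustrated with a pure function that takes the probe results and returns one frozen decision for the run. The input flag names and the `path` labels are hypothetical; the caller is assumed to have gathered them from the available-tools list, a list_devices call, `which maestro`, and the booted-device counts.

```javascript
// Hypothetical helper: decide the Maestro path once, in the documented order.
function selectMaestroPath({ mcpBound, mcpProbeOk, cliInstalled, bootedDevices }) {
  if (mcpBound && mcpProbeOk) {
    return { path: "mcp", USE_MAESTRO_MCP: true };
  }
  if (cliInstalled && bootedDevices > 0) {
    return { path: "cli", USE_MAESTRO_MCP: false };
  }
  return {
    path: "none",
    USE_MAESTRO_MCP: false,
    outcome: "skipped_unable",
    reason: "mcp__maestro__* not bound and CLI not installed / no device booted",
  };
}

// Freeze the decision so nothing later in the run can switch paths.
const maestroDecision = Object.freeze(
  selectMaestroPath({ mcpBound: false, mcpProbeOk: false, cliInstalled: true, bootedDevices: 1 })
);
console.log(maestroDecision.path);
```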
Playwright has no CLI fallback.
1. Run which npx via Bash. If not found, skip web verification.
2. Run a lightweight mcp__playwright__* probe (e.g., a noop browser_install check). If the MCP server is unreachable, skip web verification.
Before running any macOS verification, probe which path to use and record the decision for the rest of the run. Same single-path rule as Maestro:
1. Check mcp__peekaboo__* availability. If the tool surface is bound to this session (mcp__peekaboo__see, mcp__peekaboo__click, etc. in your available-tools list) AND a lightweight call succeeds, set USE_PEEKABOO_MCP=true.
   - Use mcp__peekaboo__permissions as the lightweight call. It returns quickly and surfaces missing Accessibility / Screen Recording grants up-front.
   - If the tools are not bound or the call fails, set USE_PEEKABOO_MCP=false and continue to step 2.
2. Run which peekaboo via Bash. If found, probe permissions:
peekaboo permissions
   - If the output reports both Accessibility and Screen Recording as granted, use the Peekaboo CLI Patterns (fallback) below.
3. Otherwise, emit skipped_unable with reason "mcp__peekaboo__* not bound and peekaboo CLI not installed or required permissions not granted" and proceed. (The verifier agents know to escalate this via the config-gap recipe.)
Permission grants. Peekaboo (either path) needs the parent process (Terminal / Claude Code) granted both Accessibility and Screen Recording in System Settings → Privacy & Security. Without them, every action errors. Grants survive process restarts but must be re-applied after major macOS updates.
All MCP interactions go through mcp__maestro__* tools. Most interaction tools require a device_id you obtain from list_devices or start_device.
mcp__maestro__list_devices()
→ returns an array of devices (iOS simulators, Android emulators, physical devices).
Pick the first booted device's device_id. If none are booted, call mcp__maestro__start_device (optionally passing platform: "ios" or "android") to boot one and capture the returned device_id.
Cache this device_id locally for the rest of the run — every subsequent call needs it.
inspect_view_hierarchy
mcp__maestro__inspect_view_hierarchy(device_id)
→ returns the current screen's hierarchy as CSV (≈50 tokens — much cheaper than the CLI's plain-text dump at 200–600 tokens).
Use this for: confirming buttons exist, checking text content, verifying layout structure, reading accessibility labels.
take_screenshot
Take screenshots only when acceptance criteria require checking visual appearance (colors, images, animations, styling).
mcp__maestro__take_screenshot(device_id)
→ returns a screenshot image.
Cap at resolved verification.visual_screenshot_budget (fallback: 3) screenshots per run to manage token cost. Resolve per the recipe in docs/CUSTOMIZATION.md#config-resolution.
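One way to enforce the cap is a small counter created once per run. This is only a sketch: the helper name and its methods are hypothetical, and the resolved config value is assumed to be passed in by the caller (falling back to 3 when it is missing or invalid).

```javascript
// Hypothetical helper: track screenshots against the resolved budget.
function createScreenshotBudget(resolved) {
  // Fall back to 3 when the config value is absent or not a non-negative integer.
  const limit = Number.isInteger(resolved) && resolved >= 0 ? resolved : 3;
  let used = 0;
  return {
    // Returns true and counts the screenshot if budget remains, false otherwise.
    tryConsume() {
      if (used >= limit) return false;
      used += 1;
      return true;
    },
    remaining() {
      return limit - used;
    },
  };
}

const screenshotBudget = createScreenshotBudget(undefined);
console.log(screenshotBudget.remaining());
```

Calling tryConsume() before each take_screenshot keeps the decision in one place, so a run cannot exceed the budget by accident.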
run_flow_files (preferred for existing flows)
Runs one or more pre-written Maestro YAML flow files.
mcp__maestro__run_flow_files(device_id, flow_files=["maestro/signin-happy-path.yaml"], env={...})
→ returns execution results (pass/fail per step with messages).
Resolve verification.visual_maestro_flow_dirs per the config recipe (fallback: ["maestro/", ".maestro/", "test/maestro/"]) and discover flow files with Glob, then pass matching paths.
run_flow (inline YAML)
The MCP replacement for the CLI's ephemeral-flow pattern. Pass the YAML body directly: no tmp file, no rm cleanup.
mcp__maestro__run_flow(
device_id,
flow_yaml="appId: com.example.myapp\n---\n- launchApp\n- tapOn: \"Sign In\"\n- inputText: \"test@example.com\"\n- tapOn: \"Continue\"",
env={...}
)
After landing on the target screen, call inspect_view_hierarchy (or take_screenshot) to verify state.
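Hand-escaping the flow_yaml string is error-prone, so the inline flow above could be assembled from a step list instead. The sketch below assumes a hypothetical `buildFlowYaml` helper and a simple step encoding (a bare command name, or a [command, text] pair); it covers only the command shapes used in the example, not Maestro's full syntax.

```javascript
// Hypothetical helper: build a flow_yaml string for run_flow.
// A step is either a bare command ("launchApp") or a [command, text] pair.
function buildFlowYaml(appId, steps) {
  const lines = steps.map((step) =>
    Array.isArray(step)
      ? // JSON.stringify yields a double-quoted, escaped string, which YAML accepts.
        `- ${step[0]}: ${JSON.stringify(step[1])}`
      : `- ${step}`
  );
  return [`appId: ${appId}`, "---", ...lines].join("\n");
}

const flowYaml = buildFlowYaml("com.example.myapp", [
  "launchApp",
  ["tapOn", "Sign In"],
  ["inputText", "test@example.com"],
  ["tapOn", "Continue"],
]);
console.log(flowYaml);
```

The result is the same string as the flow_yaml argument in the call above.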
appId resolution order (same as CLI):
1. verification.visual_mobile_app_id from .soloflow/config.json.
2. Otherwise, search the directories in verification.visual_maestro_flow_dirs for the first appId: line:
grep -h '^appId:' maestro/*.yaml .maestro/*.yaml test/maestro/*.yaml 2>/dev/null | head -1
3. If neither resolves, emit skipped_unable and message: "cannot determine appId; set verification.visual_mobile_app_id or add a Maestro flow with appId."
Individual interaction tools (alternatives to run_flow):
- mcp__maestro__launch_app(device_id, appId): start the app.
- mcp__maestro__tap_on(device_id, text=..., id=..., index=..., use_fuzzy_matching=...): tap. Supports fuzzy match, index disambiguation, and state filters (enabled, checked, focused, selected).
- mcp__maestro__input_text(device_id, text): type into the focused field.
- mcp__maestro__back(device_id): back button.
- mcp__maestro__stop_app(device_id, appId): stop the app.
Prefer run_flow for multi-step sequences. It is serialized correctly by Maestro and captures better failure context than composing individual MCP calls.
- mcp__maestro__check_flow_syntax(flow_yaml): validate a YAML flow before running it. Useful when constructing ad-hoc flows.
- mcp__maestro__cheat_sheet(): returns the Maestro command cheat sheet. Refresh recall when needed.
- mcp__maestro__query_docs(question): ask the Maestro docs a natural-language question.
Animations. Include waitForAnimationToEnd inside flows you pass to run_flow / run_flow_files, or insert a brief wait before capturing hierarchy/screenshots. Otherwise you may capture a mid-transition frame.
Use only when the MCP probe in Path Selection failed. All CLI verification is invoked via Bash. The CLI talks to an already-booted iOS simulator or Android emulator via idb_companion / adb.
maestro hierarchy
maestro hierarchy dumps the current view hierarchy as plain text (~200–600 tokens depending on screen complexity; more expensive than the MCP's CSV, still cheaper than screenshots).
maestro hierarchy > /tmp/sf-maestro-hier-$$.txt
# Then Read the file (or pipe through grep for specific testIDs/labels)
Use this for: confirming buttons exist, checking text content, verifying layout structure, reading accessibility labels.
Screenshots only when acceptance criteria require checking visual appearance that hierarchy data cannot answer: colors, images, animations, visual styling.
iOS simulator:
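# macOS (BSD) mktemp only randomizes trailing X's, so make a temp dir and put the PNG inside it
SHOT="$(mktemp -d /tmp/sf-shot-XXXXXX)/shot.png"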
xcrun simctl io booted screenshot "$SHOT"
sips -Z 1400 "$SHOT" > /dev/null
# Then Read $SHOT as an image
Android emulator:
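# macOS (BSD) mktemp only randomizes trailing X's, so make a temp dir and put the PNG inside it
SHOT="$(mktemp -d /tmp/sf-shot-XXXXXX)/shot.png"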
adb exec-out screencap -p > "$SHOT"
sips -Z 1400 "$SHOT" > /dev/null # or: convert "$SHOT" -resize 1400x "$SHOT" on Linux
# Then Read $SHOT as an image
Downsizing to 1400px longest edge keeps the image readable while managing token cost.
Multi-booted iOS: If xcrun simctl list devices booted | grep -c Booted returns ≥2, booted errors with "multiple booted devices." Pick the first UDID explicitly:
UDID=$(xcrun simctl list devices booted | awk -F'[()]' '/Booted/{print $2; exit}')
xcrun simctl io "$UDID" screenshot "$SHOT"
Cap at resolved verification.visual_screenshot_budget screenshots per verification run (fallback: 3).
maestro test
maestro test maestro/signin-happy-path.yaml
echo "exit=$?"
Exit code 0 = all steps passed. Non-zero = a step failed; stdout/stderr identifies which step and why.
List available flows with Glob, then match flow names to the feature being verified. If a relevant flow exists, prefer it over ad-hoc verification — flows are repeatable and maintained by the project.
The CLI has no one-shot command for individual taps/inputs. Ad-hoc interactions use an ephemeral YAML written to a tmp path, executed with maestro test, then discarded. (The MCP's run_flow avoids this dance — this pattern is the fallback.)
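# macOS (BSD) mktemp only randomizes trailing X's, so keep the .yaml file inside a temp dir
FLOW_DIR=$(mktemp -d /tmp/sf-maestro-XXXXXX)
FLOW="$FLOW_DIR/flow.yaml"
cat > "$FLOW" <<'EOF'
appId: com.example.myapp
---
- launchApp
- tapOn: "Sign In"
- inputText: "test@example.com"
- tapOn: "Continue"
EOF
maestro test "$FLOW" 2>&1 | tee /tmp/sf-maestro-last.log
EXIT=${PIPESTATUS[0]}  # exit status of maestro test, not of tee
rm -rf "$FLOW_DIR"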
After the ephemeral flow lands the app on the target screen, run maestro hierarchy (or a screenshot) to verify state.
appId resolution order — same as the MCP path (see above).
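The resolution order shared by both paths might be sketched like this. The function name and the nested config shape are assumptions; flow file contents are passed in as strings so the sketch stays independent of how the files are discovered.

```javascript
// Hypothetical helper: resolve appId from config first, then from flow files.
function resolveAppId(config, flowFileContents) {
  // 1. Explicit config value (assumes a nested `verification` object).
  const configured = config?.verification?.visual_mobile_app_id;
  if (configured) return { appId: configured };
  // 2. First line starting with "appId:" in any flow file, like grep '^appId:' | head -1.
  for (const text of flowFileContents) {
    const match = /^appId:[ \t]*["']?([^"'\s]+)/m.exec(text);
    if (match) return { appId: match[1] };
  }
  // 3. Nothing found: report the documented outcome and message.
  return {
    outcome: "skipped_unable",
    message:
      "cannot determine appId; set verification.visual_mobile_app_id or add a Maestro flow with appId.",
  };
}

console.log(resolveAppId({}, ["appId: com.example.myapp\n---\n- launchApp"]));
```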
All MCP interactions go through mcp__peekaboo__* tools. Most interactions target a specific app — pass it as app: "AppName" or via bundle id. Path Selection (above) decides MCP vs. CLI; stay on the chosen path for the whole run.
mcp__peekaboo__app(action="launch", name="<AppName or bundle id>")
mcp__peekaboo__app(action="quit", name="<AppName>")
CLI equivalents:
peekaboo app launch "<AppName>"
peekaboo app quit "<AppName>"
see
see returns a screenshot and a structured accessibility annotation for the captured window/app. It is the macOS equivalent of inspect_view_hierarchy + take_screenshot rolled into one call. Request the JSON only when you do not need the image.
mcp__peekaboo__see(app="<AppName>", mode="window")
→ returns image + element list with role, title, frame, identifier.
CLI:
peekaboo see --app "<AppName>" --json-output > /tmp/sf-peekaboo-$$.json
Use this for: confirming buttons exist, checking text content, verifying layout structure, reading accessibility labels.
see already returns an image — there is no separate "screenshot-only" tool that costs less. To minimize token use when you do not need pixels, prefer the JSON-only CLI form (peekaboo see --json-output then parse) and avoid passing the captured image back to the model. Cap visible-screenshot captures at the resolved verification.visual_screenshot_budget (fallback: 3) for the run.
- mcp__peekaboo__click(app="<AppName>", on="<element query>"): click. Element query can be role+title, identifier, or coords: "x,y".
- mcp__peekaboo__type(text="<text>"): type into the focused field.
- mcp__peekaboo__press(key="<key>"): press a single key (return, escape, tab, etc.).
- mcp__peekaboo__hotkey(keys=["cmd","s"]): chord.
- mcp__peekaboo__scroll(app="<AppName>", direction="down", amount=3): scroll the focused or specified region.
- mcp__peekaboo__menu(app="<AppName>", path=["File","New"]): drive menu bar items by path. Indispensable for native Mac apps whose primary affordances live in the menubar.
- mcp__peekaboo__window(action="focus", app="<AppName>"): focus/bring forward.
CLI equivalents follow the peekaboo <verb> --app "<AppName>" <args> pattern, e.g.:
peekaboo click --app "MyApp" --on "Button:Save"
peekaboo type "hello"
peekaboo hotkey "cmd,s"
peekaboo menu --app "MyApp" --path "File,New"
Prefer the higher-level menu, click on=<query>, and hotkey tools over raw coords — they are stable across window-size changes and produce better evidence in failure reports.
- mcp__peekaboo__permissions: report Accessibility / Screen Recording grant status. Run once at the top of a session when MCP first errors.
- mcp__peekaboo__list(target="apps"|"windows"): discover what is currently running. Useful when an app: lookup by name fails.
Animations. Native AppKit/SwiftUI animations are typically <500 ms but not zero. After a click/menu that triggers a sheet or transition, insert a brief wait (mcp__peekaboo__sleep(ms=500)) before the next see; otherwise you may capture a mid-transition frame.
Use only when the MCP probe in Peekaboo (macOS) Availability failed. All CLI verification is invoked via Bash. Each command exits non-zero on failure and prints a structured error.
peekaboo app launch "MyApp"
peekaboo see --app "MyApp" --json-output > /tmp/sf-peekaboo-hier-$$.json
peekaboo click --app "MyApp" --on "Button:Save"
peekaboo see --app "MyApp" --path /tmp/sf-peekaboo-shot-$$.png # writes PNG, omit --json-output for image only
peekaboo app quit "MyApp"
For screenshots captured this way, downsize before reading to manage token cost:
sips -Z 1400 /tmp/sf-peekaboo-shot-$$.png > /dev/null
The CLI does not own a port lock, but treat operations against the same window as serialized — do not run two peekaboo commands targeting the same app in parallel.
Take screenshots only when visual appearance must be verified. For content and structure checks, reading page content is more token-efficient.
Prefer cheaper operations first:
| Operation | Path | Approx tokens |
|---|---|---|
| mcp__maestro__inspect_view_hierarchy | MCP | ~50 (CSV) |
| maestro hierarchy | CLI | ~200–600 (plain text) |
| peekaboo see --json-output (JSON only) | MCP/CLI | ~300–1500 (JSON, varies with window complexity) |
| Page content read (web) | n/a | variable |
| Screenshot (any path, after sizing) | MCP/CLI | ~1600 |
| mcp__peekaboo__see (image + JSON) | MCP | ~2000 |
A typical verification should use 1–2 hierarchy inspections and at most verification.visual_screenshot_budget screenshots (default 3). If you find yourself taking more screenshots, reconsider whether hierarchy data or page content would suffice. MCP hierarchy is roughly 4–12× cheaper than CLI hierarchy (~50 vs. ~200–600 tokens), which is one concrete reason MCP is the preferred path. For macOS, prefer the JSON-only form of see (CLI with --json-output) when you only need element presence; the bundled image otherwise inflates each call to ~2000 tokens.
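To make the budget arithmetic concrete, the sketch below sums the approximate costs from the table for a planned run. The operation keys and the function name are hypothetical, and ranges are represented by their upper bound so the estimate errs on the high side.

```javascript
// Approximate per-operation token costs taken from the table above.
// Ranges use their upper bound; the key names are hypothetical labels.
const TOKEN_COST = {
  mcpHierarchy: 50,
  cliHierarchy: 600,
  peekabooJson: 1500,
  screenshot: 1600,
  peekabooSee: 2000,
};

// Sum the estimated cost of a plan such as { mcpHierarchy: 2, screenshot: 3 }.
function estimateTokens(plan) {
  return Object.entries(plan).reduce((total, [op, count]) => {
    if (!Object.hasOwn(TOKEN_COST, op)) throw new Error(`unknown operation: ${op}`);
    return total + TOKEN_COST[op] * count;
  }, 0);
}

// Two MCP hierarchy reads plus the full default screenshot budget:
console.log(estimateTokens({ mcpHierarchy: 2, screenshot: 3 })); // 2*50 + 3*1600 = 4900
```

In this example the screenshots account for 4800 of the 4900 tokens, which is why hierarchy reads are preferred whenever they can answer the question.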
- Maestro: maestro test and maestro hierarchy hold a device lock via idb_companion; the MCP server holds the same lock. Don't run two Maestro operations in parallel against the same device, even through different tool calls.
- Peekaboo: concurrent click/see/type calls against the same window race each other (focus stealing, mid-animation captures, dropped events). Pick MCP or CLI once and serialize all calls against a given app.
Every visual check must map to a specific acceptance criterion from the task plan. Structure your findings as:
inspect_view_hierarchy / MCP run_flow / MCP screenshot / CLI hierarchy / CLI maestro test / screenshot / Playwright navigation