Capture annotated screenshots of a running Android app and translate UI element labels into actual tap coordinates, using `android screen capture` + `android screen resolve` and the structured `android layout` JSON. Use when the user wants to take a screenshot, automate UI interactions without hardcoding pixel coordinates, compare UI state before/after a change, or extract the view hierarchy. Triggers include "스크린샷" (screenshot), "screenshot", "버튼 클릭 자동화" (automate button taps), "UI 자동화" (UI automation), "tap", "view hierarchy", "layout 비교" (compare layouts), "이 버튼 좌표" (coordinates of this button). Do NOT use this skill for visual regression testing of release builds; use Playwright/Espresso for that.
npx claudepluginhub kez-lab/android-custom-skills

This skill uses the workspace's default tool permissions.
Three primitives, used in two main loops:
- `screen capture --annotate` produces a labeled image; `screen resolve` translates label numbers into (x, y); `adb shell input tap` performs the action.
- `android layout` produces a structured JSON snapshot of the on-screen view tree; `--diff` shows only what changed since the last snapshot.

This skill assumes a device is already connected (use android-emulator or android-deploy to get there).
android screen capture --annotate -o ui.png
↓ (user/agent picks a label)
android screen resolve --screenshot=ui.png --string="input tap #5"
↓ (resolved to e.g. "input tap 540 1283")
adb shell <resolved>
Concrete sequence:
# 1. Capture an annotated screenshot
android screen capture --annotate -o ui.png
# 2. Show the user (or yourself) the image to identify which numbered label
# corresponds to the target UI element.
# 3. Resolve label → coordinates
android screen resolve --screenshot=ui.png --string="input tap #5"
# Output: input tap 540 1283
# 4. Execute the tap
adb shell input tap 540 1283
The `--string` argument can interleave labels with literal text: `"input swipe #3 #7"` resolves both `#3` and `#7`.
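As a hedged sketch of label interleaving, the helper below builds an `input swipe` template from two label numbers and passes it through `android screen resolve`. The function name `swipe_between_labels` and its argument order are illustrative and not part of the CLI. It assumes `resolve` rewrites each `#N` into an `x y` pair, as the tap example suggests.

```shell
# Hypothetical helper: swipe from one annotated label to another.
# Usage: swipe_between_labels <screenshot.png> <from-label> <to-label>
swipe_between_labels() {
  local shot="$1" from="$2" to="$3" resolved
  # Both labels go into one template so they resolve against the same capture.
  resolved="$(android screen resolve --screenshot="$shot" --string="input swipe #$from #$to")" || return 1
  # Word splitting is intentional: the resolved string is a full `input` command.
  # shellcheck disable=SC2086
  adb shell $resolved
}
```

Called as `swipe_between_labels ui.png 3 7`, this would resolve the same template as the `"input swipe #3 #7"` example and hand the result to `adb shell`.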
The annotation numbering is generated at capture time. After any UI change (a tap that opens a new screen, a layout reflow, a snackbar appearing), the previous annotated image's label numbers no longer correspond to the same elements. Always re-capture before resolving if anything has changed since the last capture.
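One way to keep a stale image from being reused is to refresh it right after every tap. The sketch below does that and also refuses to tap when the label does not resolve. The name `tap_then_recapture` is hypothetical, and the "unchanged string means missing label" check relies on the behavior listed in this document's error table rather than on any documented exit code.

```shell
# Hypothetical helper: tap a label, then refresh the capture it was picked from.
# Usage: tap_then_recapture <screenshot.png> <label-number>
tap_then_recapture() {
  local shot="$1" label="$2" template resolved
  template="input tap #$label"
  resolved="$(android screen resolve --screenshot="$shot" --string="$template")" || return 1
  # Assumption: an unknown label comes back as the unchanged input string.
  if [ "$resolved" = "$template" ]; then
    echo "label #$label not found in $shot" >&2
    return 1
  fi
  # Word splitting is intentional: the resolved string is a full `input` command.
  # shellcheck disable=SC2086
  adb shell $resolved || return 1
  # The tap may have changed the screen, so overwrite the now-stale image.
  android screen capture --annotate -o "$shot"
}
```

After `tap_then_recapture /tmp/ui.png 5` returns, `/tmp/ui.png` holds a fresh annotated capture, so the next label is always picked from the current screen.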
# 1. Snapshot baseline
android layout --pretty -o before.json
# 2. Trigger the change (user action, server response, etc.)
# 3. See only what changed
android layout --diff
Use cases:
The internal snapshot is updated each time android layout runs, so two consecutive calls without intervening UI changes will show no diff.
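Because each `android layout` call refreshes the internal snapshot, the baseline and the diff need to bracket the action with nothing else in between. A minimal sketch of that ordering follows; `layout_diff_after` is a hypothetical name, and the action is whatever command the caller passes in.

```shell
# Hypothetical helper: snapshot the layout, run an action, then print the diff.
# Usage: layout_diff_after <command> [args...]
layout_diff_after() {
  # Baseline. This call also refreshes the snapshot that --diff compares against.
  android layout --pretty -o before.json || return 1
  # Run the caller's action, e.g. `adb shell input keyevent KEYCODE_BACK`.
  "$@" || return 1
  # Print only what changed since the baseline call above.
  android layout --diff
}
```

For example, `layout_diff_after adb shell input keyevent KEYCODE_BACK` would show what pressing Back changed. If the UI animates, a short `sleep` before the diff may be needed; that is an assumption, not something stated here.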
android screen capture --annotate -o /tmp/ui.png
adb shell $(android screen resolve --screenshot=/tmp/ui.png --string="input tap #2")
- `--annotate` is required for resolve. A plain capture (without `--annotate`) has no labels to resolve against.
- `--pretty` only for human reading. For diffs and automated parsing, the compact form is more efficient.
- `adb shell input` for taps, not `monkey`. Monkey is for stress testing, not deterministic automation.
- `/tmp/` or `build/` (gitignored) by default.

| Goal | Use | Don't use | Why |
|---|---|---|---|
| Find tap coordinates without measuring pixels | screen capture --annotate + resolve | manual inspection | Labels survive DPI / orientation differences |
| Detect what changed in the UI | layout --diff | screenshot diff | Structured, ignores anti-aliasing noise |
| Full view hierarchy for offline analysis | layout --pretty -o | adb shell uiautomator dump | JSON is structured and stable; uiautomator XML is verbose |
| Scripted button clicks in tests | Espresso / Compose UI test | this skill | Tests should not depend on pixel coordinates |
| Visual regression | Paparazzi / Roborazzi | this skill | Headless, hermetic, runs in CI |
| Pixel-perfect screenshot for marketing | Studio Device Screenshot | screen capture | This skill is for automation, not asset creation |
Avoid:

- Reusing `ui.png` after taps have changed the screen.
- Hardcoding `input tap 540 1283` in a script: on a different device it hits the wrong button.
- Omitting `--annotate` and then trying to resolve: it fails silently with bad output.
- Using `android layout` as a substitute for proper UI tests.
- Not checking `adb devices` before running.

| Error | Cause | Fix |
|---|---|---|
| `screen capture` produces empty PNG | No connected device or screen off | `adb devices`; `adb shell input keyevent KEYCODE_WAKEUP` |
| `resolve` returns the input unchanged | Label `#N` doesn't exist in screenshot | Re-run `screen capture --annotate`, look at fresh labels |
| `layout` returns `{}` | App not in foreground | `adb shell am start -n <pkg>/<activity>` first |
| Coordinates feel "off" by a constant | Status bar / cutout offsets in fullscreen mode | Re-capture; coordinates account for current insets |
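
The first row of the table can be turned into a preflight check that runs before any capture. The sketch below is one possible shape; `device_ready` is a hypothetical name, and it relies only on the standard `adb devices` output format (a header line followed by `serial<TAB>state` rows).

```shell
# Hypothetical preflight: fail fast when no device is ready, then wake the screen.
device_ready() {
  # Skip the header line and look for at least one row whose state is "device".
  if ! adb devices | awk 'NR > 1 && $2 == "device" { found = 1 } END { exit !found }'; then
    echo "no device in 'device' state; check adb devices" >&2
    return 1
  fi
  # Wake the display so the capture is not taken with the screen off.
  adb shell input keyevent KEYCODE_WAKEUP
}
```

A script could then start with `device_ready || exit 1` and only afterwards run `android screen capture --annotate`. Devices listed as `unauthorized` or `offline` are treated as not ready.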