Use a screenshot only when the JSON layout tree can't answer the question:
- The element doesn't show up in android layout
- android layout may fail or return partial state mid-frame
- You can't identify the element by resourceId or text

For everything else, use verify-android-layout first. A JSON dump is strictly cheaper than a vision-token screenshot for "did the element appear?" / "is the input focused?" / "did the text update?" style questions.
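The JSON-first check above can be sketched as a one-line query. This is a hedged sketch: the `android layout` command name comes from this doc, but its exact output shape is an assumption, so the helper (a name of our choosing) just greps the dump for a resourceId or text substring.

```shell
# Sketch of the JSON-first check. Assumes `android layout` prints the
# layout tree on stdout; the output format is an assumption, so we only
# test whether the dump mentions the identifier at all.
element_present() {
  # $1: a resourceId (or text) substring to look for
  android layout | grep -q "$1"
}
```

If a check fits this shape ("did the element appear?"), no screenshot is needed; fall back to a capture only when it doesn't.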
A single screenshot is a large image payload. Reading it in the main thread burns tokens fast — across an iteration loop, inline reads pollute the conversation and balloon context. Always delegate to a sub-agent. It reads the image, returns a short text answer, and the bytes never enter the main thread.
1. Capture to a tmp path:

   ```
   android screen capture -o /tmp/<descriptive-name>.png
   ```

   Add --device <serial> if multiple devices are connected.

2. Spawn a sub-agent (general-purpose or Explore) with model: "sonnet" and a self-contained prompt that includes: the screenshot path, the explicit criteria to check, and a cap on the answer length.

3. Act on the text answer. Do NOT Read the screenshot yourself.
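The capture step can be wrapped in a small helper so the --device flag is never forgotten on multi-device setups. A sketch under assumptions: the helper name is ours, and only the -o and --device flags described above are used.

```shell
# Hypothetical helper (our name, not part of the CLI) that builds the
# capture command, appending --device only when a serial is supplied.
capture_cmd() {
  name="$1"; serial="${2:-}"
  cmd="android screen capture -o /tmp/${name}.png"
  [ -n "$serial" ] && cmd="$cmd --device $serial"
  printf '%s\n' "$cmd"
}
```

For example, `capture_cmd reader-after-hold emulator-5554` prints the full command with the serial attached.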
When you need to interact with an element that doesn't show up in android layout (or you can't identify it by resourceId):

```
android screen capture --annotate -o /tmp/annotated.png
```

This overlays numbered labels and bounding boxes on every UI element. Have the sub-agent identify the label number for the element you want, then resolve it to coordinates:
```
adb shell input tap $(android screen resolve --screen /tmp/annotated.png --string "tap #34")
```
The chained form lets you tap a numbered annotation in a single command.
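When debugging, an unchained two-step variant lets you see the resolved coordinates before tapping. This is a sketch: tap_label is a hypothetical name, the resolve and tap commands are the ones shown above, and the assumption that resolve prints "X Y" on stdout follows from the chained form.

```shell
# Unchained variant (sketch): resolve the annotation to coordinates,
# log them for debugging, then tap. Assumes resolve prints "X Y".
tap_label() {
  screen="$1"; label="$2"
  coords=$(android screen resolve --screen "$screen" --string "tap #${label}") || return 1
  echo "tap #${label} -> ${coords}" >&2
  # coords intentionally unquoted: it must split into two arguments
  adb shell input tap $coords
}
```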
Read /tmp/reader-after-hold.png. Verify: (a) a single large word is centered in the upper third with a red ORP letter, (b) the bottom inline-context strip is visible, (c) no code block is shown — we expect a paused image break with caption "Pipeline diagram". Answer in under 40 words: did all three pass? If not, which failed and what's actually visible?
Read /tmp/annotated.png. Find the "Sign in" button — return only its label number (e.g. #7). If multiple candidates, pick the most prominent. One-token answer.
Read /tmp/webview-state.png. The page should show a logged-in user header with avatar in the top-right and a "Welcome back" greeting. Under 30 words: present or not, and what's actually shown?
The task is narrow: read one image, check 2–3 criteria, return a sentence. Sonnet is multimodal and much cheaper than Opus for this. Haiku also works for very simple criteria. Always pass model: "sonnet" when spawning the verification sub-agent — never let it default to Opus.
| Mistake | Fix |
|---|---|
| Reaching for a screenshot when JSON would answer | Try verify-android-layout first; screenshots are the fallback |
| Reading the screenshot inline "just to check quickly" | The bytes are massive even for a quick peek. Always delegate. |
| Vague criteria ("does it look right?") | Spell out what should/shouldn't be on screen and where |
| No return-format cap | Agents return long descriptions by default. Specify "under N words" or "YES/NO + one sentence" |
| Letting the sub-agent default to Opus | Pass model: "sonnet" explicitly every time |
| Forwarding the screenshot back to the main thread | Defeats the purpose. Sub-agent returns text only. |