From developer-workflow
Launches the manual-tester agent to explore running web and mobile apps, discovering bugs, UX issues, crashes, edge cases, and accessibility problems without specs or test plans.
npx claudepluginhub kirich1409/krozov-ai-tools --plugin developer-workflow
This skill uses the workspace's default tool permissions.
Explore a running application to discover bugs, UX issues, and edge cases — guided by testing
heuristics rather than a specification. This is fundamentally different from test-feature:
there is no spec to verify against, no pass/fail verdict, and no predefined test plan. The goal
is to find problems the team hasn't anticipated.
The skill launches the manual-tester agent in exploratory mode: instead of following a test
plan and comparing results against a spec, the agent navigates the app freely, applies heuristics
at each screen, probes edge cases, and reports everything it finds.
Determine what is being tested and how to connect.
Ask or infer from context:
- If the user says "the app is already running" or provides a device/URL, use it directly.
- Otherwise, follow the same launch logic as test-feature Step 2 (build, install, start dev server, etc.).
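For example, a minimal launch sketch for a web target and an Android target (the script name and package id below are assumptions; use whatever the project's own tooling defines):

```bash
# Web: start the dev server and hand the resulting URL to the agent
npm run dev                               # assumes the project defines a "dev" script

# Android: build, install, and launch a debug build on the connected device/emulator
./gradlew installDebug                    # assumes a standard Gradle debug variant
adb shell monkey -p com.example.app 1     # "com.example.app" is a placeholder package id
```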
The user may name a specific area to concentrate on: "payments flow", "settings screen", "onboarding", "the new search feature". If provided, exploration starts there and goes deep before branching out. If not provided, start from the app's entry point and explore systematically screen by screen.
Do not ask for a focus area if the user didn't mention one — just start broad.
Exploration is bounded by screen count to keep sessions productive:
| Scope | Screens | When to use |
|---|---|---|
| Quick | ~5 | Fast sanity check, single flow |
| Standard | ~15 | Default — good coverage of core flows |
| Deep | 30+ | Pre-release sweep, complex app |
Use Standard by default. Only ask for scope preference if the user's intent is ambiguous. If they say "quick check" or "just the settings screen", use Quick. If they say "full QA" or "before we release", use Deep.
Spawn the manual-tester agent with an exploratory prompt. The prompt structure differs from
test-feature — instead of a spec and test plan, it provides heuristics and exploration rules.
You are performing exploratory testing — there is no specification to verify against.
Your goal is to discover bugs, UX problems, crashes, and edge cases by navigating the app
and applying testing heuristics at every screen you visit.
## Target
[Device/URL/connection details from Step 1]
## Focus Area
[User-provided focus area, or: "None — start from the entry point and explore systematically"]
## Scope
Explore up to [N] unique screens or flows. Maintain a running list of visited screens.
Stop when you reach the boundary or have covered all reachable screens.
## Exploration Heuristics
At each screen, apply these checks:
**Visibility of system status** — Does the app show loading indicators, progress bars,
success confirmations, error messages? Try triggering a slow operation and watch.
**Error handling** — Enter invalid input in every field you find. Try submitting empty forms.
What happens with no network? Does the app show helpful error messages or fail silently?
**Navigation consistency** — Does the back button work as expected? Are there dead ends?
Can you reach the same screen from multiple paths and get consistent results?
**State preservation** — Rotate the device (mobile) or resize the browser window (web).
Background and foreground the app. Is state preserved across these transitions?
**Input edge cases** — For each input field you encounter, try ONE of these:
- A string longer than 200 characters
- Special characters: emoji (😀), RTL text (مرحبا), HTML tags (<b>test</b>)
- Empty submission (leave required fields blank)
Do not try all three on every field — pick the one most likely to cause trouble.
**Empty states** — Navigate to screens that display lists or data. What happens when
there is no data? Is there a meaningful empty state, or does the screen look broken?
**Performance** — Note any visible lag, janky animations, slow transitions, or unresponsive
UI. You don't need to measure precisely — just flag what feels wrong.
**Visual consistency** — Compare the current screen to others you've seen. Are fonts, spacing,
colors, and alignment consistent? Flag anything that looks out of place.
**Accessibility basics** — Are there buttons without labels? Touch targets that look too small
(below ~44×44 dp on mobile)? Text that's hard to read against its background?
## Reporting
Report issues in two categories:
**Bugs** — Use the standard BUG format (BUG-[SESSION_ID]-[n]) for anything that is clearly
wrong: crashes, broken functionality, visual defects, data loss, incorrect behavior.
**Observations** — Use a new OBSERVATION format for things that aren't clearly bugs but are
noteworthy: confusing UX, inconsistent patterns, surprisingly slow transitions, missing
feedback, questionable design choices. These are "a reasonable user might struggle here"
findings.
OBSERVATION-[SESSION_ID]-[n]: [Title]
Screen: [where you saw it]
Details: [what you noticed and why it matters to users]
Heuristic: [which heuristic flagged this — e.g., "error handling", "visual consistency"]
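For example, a hypothetical entry (screen and details are illustrative):
OBSERVATION-[SESSION_ID]-2: No confirmation after saving profile changes
Screen: Settings > Edit Profile
Details: Tapping Save returns to the previous screen with no success message, so a user cannot tell whether the change was applied.
Heuristic: visibility of system status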
**Coverage log** — After each screen, add it to your running coverage list with a one-line
note of what heuristics you applied and what (if anything) you found.
Do NOT produce a pass/fail verdict or a ship/no-ship recommendation.
There is no spec to verify against — you are discovering, not judging.
Let the agent run its full cycle. Do not interfere unless it asks a question or hits a P0 blocker (P0 escalation rule still applies — a crash or data loss warrants immediate attention even in exploratory mode).
When the manual-tester agent completes, process its output into a structured report.
## Exploratory Testing Report
**Date:** [date]
**App:** [name, version, or URL]
**Device / Browser:** [device name + OS version, or browser + viewport]
**Focus Area:** [area, or "General — full app exploration"]
**Scope:** [Quick / Standard / Deep] — [N] screens visited
---
### Coverage Map
| # | Screen / Flow | Heuristics Applied | Issues |
|---|---------------|-------------------|--------|
| 1 | [screen name] | [which checks were done] | BUG-..., OBSERVATION-..., or "—" |
| 2 | ... | ... | ... |
---
### Bugs ([n] total)
P0 Blockers: [n] | P1 Major: [n] | P2 Minor: [n] | P3 Cosmetic: [n]
[Full BUG entries in standard manual-tester format, ordered P0-first]
---
### Observations ([n] total)
[Full OBSERVATION entries, grouped by heuristic category]
---
### Summary
[2-3 sentence overall quality assessment based on what was found]
Based on findings, guide the user toward the right next action:
**P0 blockers found** — Critical issues need fixing before further testing. After fixes, re-run exploratory-test on the affected area, or use test-feature with a spec for targeted verification.
**P1/P2 bugs found, no blockers** — The app is functional but has issues. Consider creating a test plan for the affected areas using generate-test-plan, then verifying fixes with test-feature.
**Only P3 bugs or observations** — No critical issues. The app is in reasonable shape for the areas explored. If pre-release confidence is needed, consider a spec-based verification pass with test-feature.
**Nothing found** — The explored scope looks clean. Consider increasing scope to Deep, focusing on a specific area the team is concerned about, or moving on to spec-based verification.
When the user fixes reported bugs and wants to check again: