Skill

manual-testing

Verifies a feature works by running the real app, not by trusting green tests. Use when a backpressured loop reaches the before-done gate (Phase 3) and needs to confirm new behavior end-to-end — curling live API endpoints and driving a real browser via Playwright — including the cheap unhappy paths, before the task is called done.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command: /backpressured:manual-testing

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

**You are the machine that uses the feature the way a client or a person will, so a human isn't the first to find out it doesn't actually work.** Automated green is necessary, not sufficient: tests exercise the code, not the *running system*. Wiring, integration, env, and the error paths break in ways a passing unit test never sees. This is the "run it for real" gate.

SKILL.md

74 lines · ~1.9k tokens

Stats

Language: JavaScript

Stars: 22

Maintenance: Excellent

Last Commit: May 31, 2026

Manual Testing

Overview

You are the machine that uses the feature the way a client or a person will, so a human isn't the first to find out it doesn't actually work. Automated green is necessary, not sufficient: tests exercise the code, not the running system. Wiring, integration, env, and the error paths break in ways a passing unit test never sees. This is the "run it for real" gate.

Core principle: you only pass this on what you personally observed in the running app. Not "tests pass, so it works." Not "I curled the happy path." You booted it, you drove it, you saw each acceptance criterion — and the cheap failure cases — behave. Evidence, not inference.

When to Use

Use it in Phase 3 of a backpressured loop: after automated checks are green and before the task is called done. Run it here once before done, every time, even when the suite (including integration and end-to-end tests) is green. Automated tests are never a substitute (see Boundaries). The only thing that removes this gate is an explicit manual-testing opt-out in BACKPRESSURE.md's Skip section. It is slower than tests, so don't run it on every iteration; mid-loop, run it only when something warrants it. The before-done gate itself is not optional.

Not for: replacing automated tests or per-iteration checks (this is the end gate, not every patch). Gate by what exists: an API → curl it; a UI → drive the browser; genuinely nothing runnable (or an explicit BACKPRESSURE.md opt-out) → skip with a note saying why. A green test suite is not a reason to skip.

Step 1 — Get the app running, for real

Discover how to run it yourself — package.json scripts, Makefile, docker-compose, README, .env.example (or use project-specific run skills if they exist). Then actually boot it: start it in the background, tail the logs until it's genuinely healthy ("listening on :PORT", no startup errors), and note the URLs.

Set up the minimum real dependencies to exercise the feature: bring up the DB, run migrations, seed at least one real row, and obtain a valid auth token and a real id if the feature needs them. If it genuinely can't boot (missing secret, broken dep), report BLOCKED — do not fake it or skip to "tests pass."
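
The "genuinely healthy" check can be sketched as a small scanner over the captured startup logs. This is an illustration only, not part of the skill: the helper name `checkStartupLogs`, the "listening on" pattern, and the error keywords are assumptions and would need adjusting to the app's real log format.

```javascript
// Hypothetical helper: decide from captured startup logs whether the app
// is genuinely healthy. Both patterns are assumptions, not a standard.
const READY_PATTERN = /listening on .*:(\d+)/i;
const ERROR_PATTERN = /\b(error|fatal|unhandled|EADDRINUSE|ECONNREFUSED)\b/i;

function checkStartupLogs(logText) {
  const lines = logText.split(/\r?\n/).filter((line) => line.trim() !== "");
  const errors = lines.filter((line) => ERROR_PATTERN.test(line));
  const readyLine = lines.find((line) => READY_PATTERN.test(line));
  const port = readyLine ? Number(readyLine.match(READY_PATTERN)[1]) : null;

  if (errors.length > 0) {
    // Startup errors mean the boot cannot be trusted: report, don't fake it.
    return { status: "BLOCKED", port, errors };
  }
  if (port === null) {
    // No ready line yet: keep tailing the logs.
    return { status: "WAITING", port, errors };
  }
  return { status: "HEALTHY", port, errors };
}

const sampleLogs = "Running migrations\nServer listening on :3000\n";
console.log(checkStartupLogs(sampleLogs).status); // HEALTHY
```

A `WAITING` result means keep tailing; only `HEALTHY` with no error lines justifies moving on to Step 2.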

Step 2 — Walk each acceptance criterion through the running system

Exercise every acceptance criterion against the live app, not the test suite.

API: curl -i (status line visible) each endpoint. Check the body shape, headers, and content the consumer depends on — not just a 2xx. (CSV is actually CSV with the right Content-Type, not an empty body or a stack trace.)
UI: drive a real browser via the Playwright MCP — navigate, fill, click the actual flow. Don't substitute a curl for a user action.
Transient states flicker. A state like "button disabled while submitting" is too fast to catch by default — throttle the network (or observe the disabled attribute toggling) so you actually see it, rather than assuming it.
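
The API bullet above can be made concrete with a check on a captured response. This is a minimal sketch, assuming the output of `curl -i` has already been parsed into a plain object; the helper name `checkCsvResponse`, the object shape, and the column names are hypothetical, and the comma split deliberately ignores quoted fields.

```javascript
// Hypothetical check: given a captured response (status, headers, body),
// list every reason it is not the CSV a consumer would expect.
function checkCsvResponse(response, expectedColumns) {
  const problems = [];
  const { status, headers, body } = response;

  if (status < 200 || status > 299) {
    problems.push(`status ${status} is not 2xx`);
  }

  // Header names are case-insensitive, so normalise before looking one up.
  const lowered = Object.fromEntries(
    Object.entries(headers).map(([name, value]) => [name.toLowerCase(), value])
  );
  const contentType = lowered["content-type"] || "";
  if (!contentType.toLowerCase().startsWith("text/csv")) {
    problems.push(`Content-Type is "${contentType}", not text/csv`);
  }

  const rows = body.split(/\r?\n/).filter((row) => row.trim() !== "");
  if (rows.length === 0) {
    problems.push("body is empty");
  } else {
    // Naive split: good enough for a sanity check, not a real CSV parser.
    const header = rows[0].split(",").map((cell) => cell.trim());
    const missing = expectedColumns.filter((col) => !header.includes(col));
    if (missing.length > 0) {
      problems.push(`header row is missing: ${missing.join(", ")}`);
    }
    if (rows.length === 1) {
      // Assumes at least one row was seeded in Step 1.
      problems.push("header row only, no data rows");
    }
  }
  return problems;
}

// A 200 with the wrong Content-Type and an empty body yields two problems.
const emptyHtml = { status: 200, headers: { "Content-Type": "text/html" }, body: "" };
console.log(checkCsvResponse(emptyHtml, ["id", "email"]));
```

An empty result means the response looked right on these points only; the raw `curl -i` output should still be kept as evidence.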

Step 3 — Hit the cheap unhappy paths

The failure cases are where manual testing earns its keep over automated tests — and they're trivial by hand. Exercise the ones in the acceptance criteria and the obvious ones around them: error responses (400 / 401 / 403 / 404), empty or invalid input, missing fields, the permission-denied case. Confirm the app returns the status and the user-facing behavior the feature promises (the inline error renders; the toast shows), not a 500 or a silent no-op.
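
One way to keep the unhappy paths from being skipped is to write them down as a list of cases and compare each against what was actually observed. The sketch below is illustrative: the case names, the imagined feature, and the `findMismatches` helper are hypothetical, and the observed statuses are meant to come from real `curl -i` runs against the live app.

```javascript
// Hypothetical case list for an imagined "create report" endpoint.
const unhappyCases = [
  { name: "missing required field", expectedStatus: 400 },
  { name: "no auth token", expectedStatus: 401 },
  { name: "token without permission", expectedStatus: 403 },
  { name: "unknown report id", expectedStatus: 404 },
];

// Compare expected statuses with the statuses observed by hand.
// `observed` maps case name -> status code seen in the running app.
function findMismatches(cases, observed) {
  const mismatches = [];
  for (const testCase of cases) {
    const actual = observed[testCase.name];
    if (actual === undefined) {
      mismatches.push(`${testCase.name}: not exercised`);
    } else if (actual !== testCase.expectedStatus) {
      mismatches.push(
        `${testCase.name}: expected ${testCase.expectedStatus}, got ${actual}`
      );
    }
  }
  return mismatches;
}

// Here the 403 case returned a 500 and the 404 case was never run,
// so both are reported as unresolved.
const observed = {
  "missing required field": 400,
  "no auth token": 401,
  "token without permission": 500,
};
console.log(findMismatches(unhappyCases, observed));
```

Status codes are only half of the check: the user-facing behavior (inline error, toast) still has to be observed in the browser.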

Step 4 — The UI: confirm function always, judge appearance only without a reference

Two different things — don't conflate them:

Functional state — always yours, reference or not. Confirm the states the criteria promise actually happen: the submit button's disabled toggles, the inline error text renders, the toast appears. Capture screenshots / DOM observations of these as evidence regardless of whether a design reference exists — this is functional verification, not styling.
Appearance — conditional. No design reference provided → take sanity screenshots and look for obvious breakage (blank page, broken layout, missing element). Judge only that it isn't visibly broken; something that merely looks off (placement, wording) is a finding, not a failure. A Figma/Linear reference exists → styling fidelity is not your job — defer it to [[visual-review]]; you still confirm the feature functions (the bullet above).

Step 5 — Record the evidence, then judge

"Tested manually, works" is not a passing check. Record the actual curl commands and their status/bodies, what you clicked in the browser, and what the screenshots showed. Passing = you personally observed, in the running app, every acceptance criterion and the cheap unhappy paths behaving correctly. Anything wrong is unresolved work: fix it and re-run, or stop and name the blocker — never wave it through or call it done off green tests.
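
The rule above, that a pass needs both a correct observation and a record of it, can be sketched as a check over an evidence log. The entry shape, the `judge` helper, and the example URL are assumptions made for illustration; the skill does not prescribe a format.

```javascript
// Hypothetical evidence log: one entry per acceptance criterion or unhappy
// path, with what was observed ("pass" or a description of what went wrong)
// and the recorded evidence (command and response, clicks, screenshot notes).
function judge(entries) {
  const unresolved = [];
  for (const entry of entries) {
    if (!entry.evidence || entry.evidence.trim() === "") {
      // An unrecorded pass is an assertion, not a check.
      unresolved.push(`${entry.criterion}: no recorded evidence`);
    } else if (entry.observed !== "pass") {
      unresolved.push(`${entry.criterion}: observed ${entry.observed}`);
    }
  }
  // An empty log means nothing was observed, so it cannot pass either.
  const passed = entries.length > 0 && unresolved.length === 0;
  return { verdict: passed ? "PASS" : "NOT DONE", unresolved };
}

const entries = [
  {
    criterion: "export returns CSV",
    observed: "pass",
    evidence: "curl -i http://localhost:3000/api/export -> 200, text/csv, 3 rows",
  },
  {
    criterion: "submit button disables while submitting",
    observed: "pass",
    evidence: "", // claimed but never recorded
  },
];
console.log(judge(entries).verdict); // NOT DONE
```

Anything in `unresolved` is work to fix and re-run, or a blocker to name; it is never waved through.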

Boundaries

vs [[visual-review]]: manual-testing = does it work; visual-review = does it match the design. Functional verification is always manual-testing's; only appearance judgment splits by whether a reference exists (Step 4).
vs the reviewer subagent: the reviewer reads the diff; you run it.
vs automated tests: this is the for-real pass that catches what tests miss — a complement, never a substitute.

Common rationalizations

Rationalization	Reality
"Tests pass, so it works"	Tests exercise the code, not the running system. Wiring, env, and integration fail outside them. Run it.
"I curled the happy path, ship it"	You didn't drive the UI or hit a single error path. The happy path is the case you already knew worked.
"Manual testing is slow, skip it this once"	It's the before-done gate, not an optional extra. Run it every time, unless `BACKPRESSURE.md`'s Skip section opts out — an ad-hoc "just this once" is not an opt-out.
"Integration/E2E tests already drive the app, so this is covered"	They run automated, against the code, and share automated tests' blind spots. This gate still runs unless `BACKPRESSURE.md` explicitly opts out of manual testing.
"The disabled-while-submitting state is too fast to see"	Throttle the network and observe it. "Too fast to see" is not "verified".
"No design reference, so I can't check the UI"	You can still confirm it renders and works. Take sanity screenshots; look for obvious breakage.
"I tested it manually, it's fine"	With no recorded commands/responses/screenshots, that's an assertion, not a check.

Red flags — STOP

Calling the task done off green tests without running the app.
Curl happy-path only — never drove the UI, never hit an error path.
Claiming a manual pass with no recorded evidence of what you ran and saw.
Skipping the gate "just this once" (the only legitimate skip is an explicit opt-out in BACKPRESSURE.md's Skip section).
Judging UI styling as pass/fail with no design reference (that's a finding, or [[visual-review]]'s job — not a manual-testing failure).
