From agentsystem-core
Generates Playwright end-to-end tests for user flows like sign-in, form submit, payments, and navigation. Detects existing Playwright setup, inherits config, and uses role-based selectors.
npx claudepluginhub agentsystemlabs/core --plugin agentsystem-core
Slash command: /agentsystem-core:add-e2e-test
User-question protocol: Whenever this skill needs the user to pick between options, confirm an action, or answer a multiple-choice prompt, you MUST call the AskUserQuestion tool to render a proper interactive picker. Do NOT print numbered options as plain text and wait for the user to type a number; that produces a degraded UX. Free-form questions (open-ended typing) may be asked in prose, but any time you would write "1) … 2) … 3) …", use AskUserQuestion instead.
Browser tests are expensive — slow, flaky-prone, and dependent on a live server. They earn their cost only when they exercise the real user path. Anything that can be tested at the unit or integration layer should be tested there instead.
Confirm the flow with the user in one sentence: which page does the user start on, what action do they take, what observable outcome proves it worked? If any of those three are vague, ask before writing the test.
If the flow is essentially a function call dressed up as a UI interaction (e.g., "test that this util returns the right value"), redirect: that's a unit test, not an e2e test.
Exit: start URL, user action, and observable outcome are written down.
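The three-part confirmation can be held in a small record before any test is written. The sketch below is one hypothetical way to do that; the `FlowSpec` type and the `missingParts` helper are illustrative names, not part of this skill or of Playwright.

```typescript
// Hypothetical record of the three things the user must confirm.
interface FlowSpec {
  startUrl?: string;          // page the user starts on
  userAction?: string;        // what the user does
  observableOutcome?: string; // what proves it worked
}

// Returns the parts that are still missing or blank, so they can be asked about.
function missingParts(spec: FlowSpec): (keyof FlowSpec)[] {
  const keys: (keyof FlowSpec)[] = ["startUrl", "userAction", "observableOutcome"];
  return keys.filter((key) => {
    const value = spec[key];
    return value === undefined || value.trim() === "";
  });
}

// The outcome has not been stated yet, so it is the one part reported as missing.
const draft: FlowSpec = { startUrl: "/login", userAction: "submits valid credentials" };
console.log(missingParts(draft));
```

An empty result means the exit condition is met; anything else is a question to put to the user before writing the test.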
Check, in order:
1. @playwright/test in package.json devDependencies → installed.
2. playwright.config.ts / .js / .mts at repo root → configured.
3. tests/ or e2e/ directory with existing *.spec.ts files → conventions to inherit.
If Playwright is not installed:
Playwright not detected.
Proposed setup: npm init playwright@latest
This will:
- install @playwright/test
- create playwright.config.ts
- create example tests in tests/
- download browser binaries (~170MB)
Approve? (y/n)
Wait for y. Do not auto-install. If the user declines, stop.
Exit: Playwright is installed and configured; the test directory is identified.
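The three detection checks can be sketched as a pure function over the parsed package.json and a list of repo-relative file paths. This is an illustration under assumptions: `detectPlaywright`, `PackageJson`, and `PlaywrightStatus` are hypothetical names, paths are assumed to use forward slashes, and the version string in the usage example is a placeholder.

```typescript
// Minimal shape of package.json needed for the check.
interface PackageJson {
  devDependencies?: Record<string, string>;
}

interface PlaywrightStatus {
  installed: boolean;        // @playwright/test is listed in devDependencies
  configFile: string | null; // first playwright.config.* found at the repo root
  testDir: string | null;    // first of tests/ or e2e/ containing *.spec.ts files
}

const CONFIG_NAMES = ["playwright.config.ts", "playwright.config.js", "playwright.config.mts"];
const TEST_DIRS = ["tests", "e2e"];

// repoFiles: repo-relative paths, e.g. "e2e/login.spec.ts".
function detectPlaywright(pkg: PackageJson, repoFiles: string[]): PlaywrightStatus {
  const installed = pkg.devDependencies?.["@playwright/test"] !== undefined;
  const configFile = CONFIG_NAMES.find((name) => repoFiles.includes(name)) ?? null;
  const testDir =
    TEST_DIRS.find((dir) =>
      repoFiles.some((file) => file.startsWith(`${dir}/`) && file.endsWith(".spec.ts")),
    ) ?? null;
  return { installed, configFile, testDir };
}

// Placeholder inputs for illustration only.
const status = detectPlaywright(
  { devDependencies: { "@playwright/test": "^1.0.0" } },
  ["package.json", "playwright.config.ts", "e2e/login.spec.ts"],
);
console.log(status);
```

If `installed` is false, the next step is the setup proposal, never an automatic install.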
Read 1–2 existing spec files. Extract:
- Test location (tests/, e2e/, colocated)
- File naming (<flow>.spec.ts, <flow>.e2e.ts)
- Shared setup (test.beforeEach, auth.setup.ts, playwright/fixtures.ts)
- Auth approach (storageState from a setup project, login-per-test, cookies set in fixtures)
If no existing tests, defaults:
- Location: tests/ (or e2e/ if tests/ already holds unit tests).
- File naming: <flow>.spec.ts.
- Auth: tests/auth.setup.ts that logs in once and saves storageState, referenced from playwright.config.ts.
Exit: convention notes recorded.
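A minimal sketch of how the default auth setup might be referenced from playwright.config.ts. It assumes a setup project matched by the .setup.ts suffix and a state file at playwright/.auth/user.json; the project names and the state-file path are illustrative choices, not something this skill prescribes.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  projects: [
    // Runs tests/auth.setup.ts once and writes the storage state file.
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
        // Every test in this project starts with the saved cookies and localStorage.
        storageState: 'playwright/.auth/user.json', // assumed path
      },
      // Guarantees the setup project finishes before these tests start.
      dependencies: ['setup'],
    },
  ],
});
```

The state file holds session cookies, so it is usually kept out of version control.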
Before writing the real flow, write one trivial Playwright test:
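import { test, expect } from '@playwright/test'

test('app loads', async ({ page }) => {
  // '/' resolves against baseURL from the Playwright config
  await page.goto('/')
  // any non-empty title is enough to prove the page rendered
  await expect(page).toHaveTitle(/.+/)
})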
Start the dev server (or rely on the webServer config), then run the smoke test:
npx playwright test <path-to-smoke> --reporter=list
It must pass. If it fails: stop. Debug the baseURL, the dev server start, the webServer config — do not write more tests against a broken harness.
Exit: one smoke test passes against the running app.
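When the smoke test fails, the usual suspects sit in a few lines of playwright.config.ts. A sketch, assuming the app starts with `npm run dev` and listens on port 3000 (both hypothetical; substitute the project's own command and URL):

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // page.goto('/') resolves against this value.
    baseURL: 'http://localhost:3000',
  },
  webServer: {
    command: 'npm run dev',               // assumed start script
    url: 'http://localhost:3000',         // Playwright waits until this URL responds
    reuseExistingServer: !process.env.CI, // reuse an already running server outside CI
  },
});
```

A mismatch between `baseURL` and the port the dev server actually binds to is one plausible cause of a failing smoke test, so it is worth checking first.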
Build the test from the user's three sentences:
import { test, expect } from '@playwright/test'

test('<observable outcome>', async ({ page }) => {
  await page.goto('<start url>')
  // user actions: prefer role-based selectors
  await page.getByLabel('Email').fill('test@example.com')
  await page.getByLabel('Password').fill('correct-horse')
  await page.getByRole('button', { name: 'Sign in' }).click()
  // observable outcome
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible()
})
Selector preference, in order:
1. getByRole(...) (with accessible name)
2. getByLabel(...) for form inputs
3. getByText(...) for non-interactive content
4. getByTestId(...) only when the others fail
For assertions, always use Playwright's expect (auto-retries until timeout). Do not use Node's assert or chai; those don't retry and produce flaky tests for normal async UI updates.
If the flow needs a logged-in user, use the storageState fixture from Phase 3, not an in-test login (slow, flaky, and you already test login separately).
Exit: the test is written and passes when run against the dev server.
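For flows that need a logged-in user, the one-time login might look like the sketch below. It assumes a /login route, the same form labels as the sign-in example, credentials supplied through hypothetical E2E_EMAIL and E2E_PASSWORD environment variables, and a state file at playwright/.auth/user.json.

```typescript
import { test as setup, expect } from '@playwright/test';

// Assumed location; it must match the storageState path in playwright.config.ts.
const authFile = 'playwright/.auth/user.json';

setup('authenticate', async ({ page }) => {
  await page.goto('/login'); // hypothetical route
  await page.getByLabel('Email').fill(process.env.E2E_EMAIL ?? 'test@example.com');
  await page.getByLabel('Password').fill(process.env.E2E_PASSWORD ?? 'correct-horse');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Wait for a signed-in landmark so the cookies exist before they are saved.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

  // Persist cookies and localStorage for the other projects to reuse.
  await page.context().storageState({ path: authFile });
});
```

Tests that depend on this setup start already signed in and can go straight to the flow under test.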
Run the test 3 times in a row:
npx playwright test <path> --repeat-each 3
If any run fails: the test is flaky as-is. Common causes:
- A one-shot check or missing await on a network-dependent assertion → use expect(...).toBeVisible() (auto-retries) instead of expect(await locator.isVisible()).toBe(true).
- A fixed waitForTimeout instead of waiting on a condition.
- A navigation race → await page.waitForURL(...) before asserting on the new page.
Fix and re-run until 3-of-3 pass.
Exit: 3 consecutive runs green.
E2E test added: <path>
Flow: <one-line description>
Run: npx playwright test <path>
Stable: 3/3 runs passed
Note: this test starts the dev server; CI will need to run it the same way (or
against a deployed preview).
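The report template can be filled in mechanically once the repeated runs are in. A small sketch, assuming the pass/fail results are available as a boolean array; `ReportInput` and `formatReport` are hypothetical names, and the trailing CI note from the template is left out for brevity.

```typescript
interface ReportInput {
  path: string;    // spec file that was added
  flow: string;    // one-line description of the flow
  runs: boolean[]; // pass/fail result of each repeated run
}

function formatReport({ path, flow, runs }: ReportInput): string {
  const passed = runs.filter(Boolean).length;
  // The report is only meaningful after every repeated run is green.
  if (passed !== runs.length) {
    throw new Error(`Only ${passed}/${runs.length} runs passed; fix the flake before reporting.`);
  }
  return [
    `E2E test added: ${path}`,
    `Flow: ${flow}`,
    `Run: npx playwright test ${path}`,
    `Stable: ${passed}/${runs.length} runs passed`,
  ].join("\n");
}

// Example values are placeholders.
const report = formatReport({
  path: "tests/sign-in.spec.ts",
  flow: "user signs in and sees the dashboard",
  runs: [true, true, true],
});
console.log(report);
```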
NEVER use page.waitForTimeout(<ms>) to wait for UI to update.
Instead: use expect(...).toBeVisible() / toHaveText() / waitForURL() / waitForResponse() — Playwright auto-retries until the timeout.
Why: fixed timeouts are the #1 source of e2e flake. They pass on a fast machine, fail on a slow one, and produce noise that erodes trust in the suite. Auto-retrying assertions describe what you're actually waiting for.
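As an illustration of waiting on a condition rather than a clock, the sketch below saves a form and waits for the actual response and the resulting text. The route, the /api/profile endpoint, the field labels, and the confirmation text are all hypothetical.

```typescript
import { test, expect } from '@playwright/test';

test('saving a profile shows a confirmation', async ({ page }) => {
  await page.goto('/settings/profile'); // hypothetical route
  await page.getByLabel('Display name').fill('Ada');

  // Start listening before the click so the response cannot be missed.
  const saveResponse = page.waitForResponse(
    (response) => response.url().includes('/api/profile') && response.ok(),
  );
  await page.getByRole('button', { name: 'Save' }).click();
  await saveResponse;

  // Retries until the text matches or the assertion times out.
  await expect(page.getByRole('status')).toHaveText('Profile saved');
});
```

Each wait names the thing being waited for, so a failure message says which condition never became true.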
NEVER assert against await locator.isVisible() (returns a boolean once).
Instead: await expect(locator).toBeVisible() (retries the locator until visible or timeout).
Why: the boolean form runs once at the moment you call it — if the element appears 50ms later, your test fails for no reason. The expect-with-locator form is the whole point of Playwright's API.
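The difference between the two forms can be seen in a toy model with no browser involved. This is not how Playwright implements retries; `makeLateElement`, `checkOnce`, and `checkWithRetry` are hypothetical helpers that only mimic "one look" versus "keep looking until a limit".

```typescript
// Toy stand-in for an element that becomes visible on the Nth check.
function makeLateElement(visibleOnCheck: number): () => boolean {
  let checks = 0;
  return () => ++checks >= visibleOnCheck;
}

// One-shot check: the shape of expect(await locator.isVisible()).toBe(true).
function checkOnce(isVisible: () => boolean): boolean {
  return isVisible();
}

// Retrying check: the idea behind await expect(locator).toBeVisible().
function checkWithRetry(isVisible: () => boolean, maxAttempts: number): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (isVisible()) return true;
  }
  return false; // attempt budget exhausted, analogous to the assertion timing out
}

// The element shows up on the third look: one look misses it, retrying finds it.
const onceResult = checkOnce(makeLateElement(3));
const retryResult = checkWithRetry(makeLateElement(3), 10);
console.log({ onceResult, retryResult });
```

The retrying form still fails when the element never appears within the budget, which is the failure worth reporting.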
NEVER select by CSS class or implementation detail (.btn-primary, [data-react-id="..."]).
Instead: use getByRole/getByLabel/getByText/getByTestId in that order of preference.
Why: CSS selectors break on every visual refactor that changes class names; the test fails not because behavior regressed but because the selector did. Role-based selectors track what the user actually perceives.
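The preference order can be written down as a small decision function. A simplified sketch: `ElementFacts` and `pickStrategy` are hypothetical names, and the model ignores the finer points (such as getByLabel applying to form inputs and getByText to non-interactive content).

```typescript
type Strategy = "getByRole" | "getByLabel" | "getByText" | "getByTestId";

// What is known about the target element; every field is optional.
interface ElementFacts {
  role?: string;
  accessibleName?: string;
  label?: string;
  text?: string;
  testId?: string;
}

// Walks the preference order and returns the first strategy that applies.
function pickStrategy(el: ElementFacts): Strategy | null {
  if (el.role && el.accessibleName) return "getByRole";
  if (el.label) return "getByLabel";
  if (el.text) return "getByText";
  if (el.testId) return "getByTestId";
  // Nothing user-facing to select on: ask for a test id rather than fall back to CSS classes.
  return null;
}

// A button with an accessible name wins on role even though a test id exists.
const choice = pickStrategy({ role: "button", accessibleName: "Sign in", testId: "login-btn" });
console.log(choice);
```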
NEVER auto-install Playwright.
Instead: propose npm init playwright@latest and wait for y.
Why: Playwright downloads ~170MB of browser binaries and adds a non-trivial CI surface. The user owns that decision.
NEVER write an e2e test for logic that has no UI surface.
Instead: redirect to write-tests for an integration test that hits the function/API directly.
Why: e2e tests are 10–100× slower and more flaky than equivalent integration tests. Using them for non-UI logic burns CI time and adds flake without any of the e2e-specific value (real browser, real user input).
NEVER skip the smoke test gate.
Instead: write one trivial passing test, run it, and only then write the real flow.
Why: if the harness, baseURL, or dev server is misconfigured, every subsequent test fails for the same root cause, and the effort spent writing them is wasted. One green smoke test proves the loop closes.
NEVER establish auth state by logging in inside each test.
Instead: use a storageState setup project that logs in once and saves cookies; reference it from each test that needs a logged-in user.
Why: in-test login multiplies the test runtime by N (slow), races on shared user state, and conflates "is login broken" with "is the flow under test broken" when something fails.
NEVER bundle multiple unrelated flows into one test.
Instead: one test per flow. If two flows share setup, share the setup via a fixture.
Why: a 200-line test that signs up, then creates a project, then invites a teammate fails opaquely: when it goes red, no one can tell which step broke. Shorter tests fail fast and point at the regression.
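Sharing setup through a fixture might look like the sketch below, which uses Playwright's test.extend. The `projectPage` fixture, the routes, and every label and button name are hypothetical; only the shape of the pattern is the point.

```typescript
import { test as base, expect } from '@playwright/test';
import type { Page } from '@playwright/test';

// Hypothetical fixture: a page with a freshly created project already open.
const test = base.extend<{ projectPage: Page }>({
  projectPage: async ({ page }, use) => {
    await page.goto('/projects/new'); // hypothetical route
    await page.getByLabel('Project name').fill('Demo project');
    await page.getByRole('button', { name: 'Create project' }).click();
    await expect(page.getByRole('heading', { name: 'Demo project' })).toBeVisible();
    await use(page); // hand the prepared page to the test
  },
});

// Each flow gets its own short test; both reuse the same setup.
test('invites a teammate', async ({ projectPage }) => {
  await projectPage.getByRole('button', { name: 'Invite' }).click();
  await projectPage.getByLabel('Email').fill('teammate@example.com');
  await projectPage.getByRole('button', { name: 'Send invite' }).click();
  await expect(projectPage.getByText('Invitation sent')).toBeVisible();
});

test('renames the project', async ({ projectPage }) => {
  await projectPage.getByRole('button', { name: 'Rename' }).click();
  await projectPage.getByLabel('Project name').fill('Renamed project');
  await projectPage.getByRole('button', { name: 'Save' }).click();
  await expect(projectPage.getByRole('heading', { name: 'Renamed project' })).toBeVisible();
});
```

When one of these goes red, the test name already says which flow regressed, and a failure inside the fixture points at the shared setup instead.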