Use when the user asks to "test the UI", "screenshot the page", "verify the site still works", "check the deployment", "run regression tests", or "audit the look and feel" against a web-accessible URL. Discovers and runs existing test suites (npm test, dotnet test, pytest), then performs functional checks and visual screenshots at desktop/tablet/mobile viewports via Playwright MCP, and produces a markdown report. Skip for unit/integration tests with no UI, pure backend changes, or when no URL is reachable.
npx claudepluginhub marcelroozekrans/superpowers-extensions --plugin regression-test
Slash command: /regression-test:regression-test
<HARD-GATE>
Write tool (or the Playwright MCP screenshot calls) to actually create the files. Narrating "I've captured the screenshots" or "the report has been generated" without a tool call leaves nothing on disk and gives the user a false PASS signal.
- Read it back and confirm the summary table, per-page findings, and recommendations sections are present.
- Glob the screenshot directory and confirm the expected number of files (3 viewports × N pages) is present.

If the artifacts are missing at the verify step, the previous step did not complete; return to it and retry the tool call. Do not proceed.
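The verify step above can be approximated by a small script. This is a minimal sketch, not part of the skill itself: the function name `verify_artifacts` and the section markers passed in are hypothetical, since the exact heading text of the report is not specified here.

```python
from pathlib import Path


def verify_artifacts(report_path, screenshot_dir, page_count,
                     required_sections, viewport_count=3):
    """Return a list of problems; an empty list means the artifacts look complete.

    required_sections holds caller-supplied marker strings (assumed, not
    defined by the skill) that must appear somewhere in the report text.
    """
    problems = []

    report = Path(report_path)
    if not report.is_file():
        problems.append(f"report missing: {report}")
    else:
        text = report.read_text(encoding="utf-8")
        for section in required_sections:
            if section not in text:
                problems.append(f"report section missing: {section}")

    shots_dir = Path(screenshot_dir)
    found = len(list(shots_dir.glob("*.png"))) if shots_dir.is_dir() else 0
    # 3 viewports x N pages is treated as a minimum, because full-page
    # captures add further files to the same directory.
    expected = viewport_count * page_count
    if found < expected:
        problems.append(f"expected at least {expected} screenshots, found {found}")

    return problems
```

A non-empty result corresponds to the "return to the previous step and retry" case described above.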
This skill requires the Microsoft Playwright MCP server (@playwright/mcp) with the --caps=testing flag, which enables browser_verify_* assertion tools and browser_generate_locator used in Phase 3b.
Auto-configured: When installed via claude plugin install regression-test, the .mcp.json automatically configures the playwright MCP server with npx @playwright/mcp@latest --caps=testing. No manual setup needed.
Manual install (if not using the plugin):
claude mcp add playwright -- npx @playwright/mcp@latest --caps=testing
Optional flags:
# Headed mode (see the browser window) + testing assertions
claude mcp add playwright -- npx @playwright/mcp@latest --caps=testing --headless=false
# All capabilities (testing + PDF export + vision-based coordinates)
claude mcp add playwright -- npx @playwright/mcp@latest --caps=testing,pdf,vision
For full configuration options (browser choice, viewport defaults, proxy, storage state), see: https://github.com/microsoft/playwright-mcp
This skill defines a structured, repeatable process for regression testing web applications using the Microsoft Playwright MCP server (@playwright/mcp). The core principle is simple and non-negotiable:
"Never ship without seeing every page at every viewport."
Regression testing is the last line of defense before code reaches users. It combines three complementary strategies: running existing automated tests to catch known regressions, performing functional checks through real browser interaction to catch behavioral issues, and conducting visual evaluation through screenshots at multiple viewport sizes to catch layout, styling, and aesthetic problems.
This skill treats every page as important and every viewport as a potential source of breakage. Skipping a page or viewport is how bugs ship. The process is designed to be thorough, systematic, and documented with evidence.
When this skill is activated, begin with:
"Starting regression testing. I'll work through four phases: discovery, existing tests, browser-based functional and visual checks, and reporting. I'll ask for the application URL and any credentials when needed."
Invoke this skill in any of the following situations:
Use this checklist to track progress through the four phases:
The following Graphviz diagram illustrates the full regression testing workflow:
digraph regression_test {
rankdir=TB;
node [shape=box, style=rounded, fontname="Helvetica"];
edge [fontname="Helvetica", fontsize=10];
discovery [label="Phase 1:\nDiscovery"];
has_tests [label="Has existing\ntests?", shape=diamond];
run_tests [label="Phase 2:\nRun Existing Tests"];
ask_url [label="Ask user\nfor URL"];
navigate [label="Phase 3a:\nNavigate to URL"];
auth_check [label="Login form\ndetected?", shape=diamond];
credentials [label="Ask credentials\n& login"];
functional [label="Phase 3b:\nFunctional Checks"];
visual [label="Phase 3c:\nVisual Evaluation"];
more_pages [label="More pages\nto test?", shape=diamond];
report [label="Phase 4:\nGenerate Report"];
summary [label="Conversation\nSummary"];
discovery -> has_tests;
has_tests -> run_tests [label="Yes"];
has_tests -> ask_url [label="No"];
run_tests -> ask_url;
ask_url -> navigate;
navigate -> auth_check;
auth_check -> credentials [label="Yes"];
auth_check -> functional [label="No"];
credentials -> functional;
functional -> visual;
visual -> more_pages;
more_pages -> functional [label="Yes"];
more_pages -> report [label="No"];
report -> summary;
}
The discovery phase gathers information about the project's testing infrastructure, application structure, and how to access the running application. This phase is entirely local and does not require a browser.
Before running discovery, check for an existing recent report:
Glob docs/regression-report-*.md — list any prior reports.
If a report from within the last 30 minutes exists (compare the timestamp in the filename regression-report-YYYY-MM-DD-HHmm.md to current time), present the user with a choice:
"A recent regression report exists at
docs/regression-report-YYYY-MM-DD-HHmm.md (NN minutes old). Re-run anyway, or use the existing report?"
regression-test immediately and let the caller (audit-milestone, ui-review, pre-push-review) consume the existing report.
If no recent report exists OR the prior report is older than 30 minutes, proceed without prompting.
This dedup is important when regression-test is invoked transitively — audit-milestone, ui-workflow ui-review, and pre-push-review all call regression-test, and a milestone closeout can plausibly trigger three runs back-to-back. Without dedup, that's three full screenshot sweeps for unchanged code. The 30-minute window is short enough that real changes between calls won't be hidden, and long enough that closeout chains won't re-run unnecessarily.
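The 30-minute dedup check described above could look roughly like the following. It is a sketch under stated assumptions: the helper name `find_recent_report` is hypothetical, and it trusts the timestamp embedded in the filename rather than the file's modification time.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Matches regression-report-YYYY-MM-DD-HHmm.md and captures the timestamp.
REPORT_NAME = re.compile(r"regression-report-(\d{4}-\d{2}-\d{2}-\d{4})\.md$")


def find_recent_report(docs_dir="docs", now=None, window_minutes=30):
    """Return (path, age_in_minutes) for the newest report inside the window, else None."""
    now = now or datetime.now()
    newest = None
    for path in Path(docs_dir).glob("regression-report-*.md"):
        match = REPORT_NAME.search(path.name)
        if not match:
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H%M")
        except ValueError:
            continue  # digits in the right shape but not a real date/time
        if newest is None or stamp > newest[1]:
            newest = (path, stamp)

    if newest is None:
        return None
    age = now - newest[1]
    if timedelta(0) <= age <= timedelta(minutes=window_minutes):
        return newest[0], int(age.total_seconds() // 60)
    return None
```

A `None` result means discovery proceeds without prompting; otherwise the returned age fills in the "NN minutes old" part of the prompt.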
Search for test framework configuration files using the following globs:
- **/jest.config.*, **/jest.setup.* -- Jest
- **/vitest.config.* -- Vitest
- **/playwright.config.* -- Playwright
- **/cypress.config.*, **/cypress.json -- Cypress
- **/.mocharc.*, **/mocha.opts -- Mocha
- **/karma.conf.* -- Karma
- **/nightwatch.conf.* -- Nightwatch
- **/wdio.conf.* -- WebdriverIO

For detailed framework detection logic, patterns, and runner commands, see test-framework-detection.md.
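As an illustration of the config-file search above, the globs can be expressed as a lookup table and walked with `pathlib`. This is only a sketch: `detect_frameworks` is a hypothetical helper, and skipping `node_modules` is an assumption added here to avoid matching vendored packages.

```python
from pathlib import Path

# Config-file patterns from the list above, keyed by framework name.
FRAMEWORK_PATTERNS = {
    "Jest": ["jest.config.*", "jest.setup.*"],
    "Vitest": ["vitest.config.*"],
    "Playwright": ["playwright.config.*"],
    "Cypress": ["cypress.config.*", "cypress.json"],
    "Mocha": [".mocharc.*", "mocha.opts"],
    "Karma": ["karma.conf.*"],
    "Nightwatch": ["nightwatch.conf.*"],
    "WebdriverIO": ["wdio.conf.*"],
}


def detect_frameworks(root="."):
    """Map each detected framework to the sorted config files that revealed it."""
    found = {}
    root_path = Path(root)
    for framework, patterns in FRAMEWORK_PATTERNS.items():
        matches = []
        for pattern in patterns:
            # rglob(pattern) is the recursive equivalent of the **/pattern glob.
            for path in root_path.rglob(pattern):
                # Assumption: ignore anything under node_modules.
                if "node_modules" not in path.relative_to(root_path).parts:
                    matches.append(path)
        if matches:
            found[framework] = sorted(matches)
    return found
```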
Search for test files using common naming patterns:
- **/*.test.{js,ts,jsx,tsx}
- **/*.spec.{js,ts,jsx,tsx}
- **/__tests__/**/*.{js,ts,jsx,tsx}
- **/e2e/**/*.{js,ts,jsx,tsx}
- **/tests/**/*.{js,ts,jsx,tsx}

Examine package.json for test-related scripts. Look for keys containing test, e2e, spec, cypress, playwright, or jest. These are the preferred way to run tests because they include project-specific configuration.
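The package.json check can be sketched as a simple key filter. The helper name `find_test_scripts` is hypothetical, and the match is a plain substring test, so a script named `inspect` would also match on `spec`; treat the result as a list of candidates to review.

```python
import json
from pathlib import Path

# Keywords named in the paragraph above.
TEST_KEYWORDS = ("test", "e2e", "spec", "cypress", "playwright", "jest")


def find_test_scripts(package_json="package.json"):
    """Return the package.json scripts whose names contain a test-related keyword."""
    path = Path(package_json)
    if not path.is_file():
        return {}
    scripts = json.loads(path.read_text(encoding="utf-8")).get("scripts", {})
    return {
        name: command
        for name, command in scripts.items()
        if any(keyword in name.lower() for keyword in TEST_KEYWORDS)
    }
```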
Search the codebase for route definitions to build a list of pages to test:
- <Route path=, createBrowserRouter, useRoutes
- pages/ or app/ directories
- routes: arrays, path: properties
- RouterModule.forRoot, loadChildren
- app.get(, router.get(, @GetMapping, @RequestMapping

Check for the application URL in this order:
1. package.json scripts (dev, start, serve)
2. .env files with PORT, HOST, or URL variables
3. http://localhost:3000

If existing test suites were discovered in Phase 1, run them before proceeding to browser-based testing. Existing tests catch known regressions quickly and cheaply.
Prefer package.json scripts -- Always use npm test, npm run e2e, or similar scripts rather than invoking test runners directly. These scripts include the correct configuration, environment variables, and flags.
Include reporter flags for better output parsing:
- --verbose --no-coverage (readable output, skip coverage to save time)
- --reporter=verbose
- --reporter=list
- --reporter spec
- --reporter spec

Capture all output -- Record exit code, stdout, and stderr for every test run. Parse the output to extract:
Record pass/fail/skip counts -- Store these for inclusion in the final report.
Continue on failure -- If tests fail, record the failures but continue to Phase 3. Failed automated tests do not block browser-based testing; both sources of information are valuable.
Skip if none found -- If no test frameworks or test files were discovered in Phase 1, skip this phase entirely and proceed to Phase 3. Note in the report that no existing tests were found.
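The "capture all output" and "continue on failure" rules above might be implemented along these lines. This is a minimal sketch with a hypothetical `run_suite` helper; it records the exit code and raw streams but leaves pass/fail/skip counting to framework-specific parsing, which is not shown.

```python
import subprocess


def run_suite(command, cwd=".", timeout=1800):
    """Run one test command and capture everything needed for the report.

    A failing suite does not raise: the non-zero exit code is recorded so
    that browser-based testing can still go ahead afterwards.
    """
    completed = subprocess.run(
        command,
        cwd=cwd,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the streams to str
        timeout=timeout,      # assumed 30-minute ceiling; adjust per project
    )
    return {
        "command": command,
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "passed": completed.returncode == 0,
    }
```

For example, `run_suite(["npm", "test"])` would return the captured result for the project's test script, assuming `npm` is on the PATH.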
Before testing pages, establish a browser session and handle any authentication requirements.
Ask the user for the application URL if not already determined in Phase 1. Confirm the URL is accessible.
Navigate to the application using browser_navigate with the provided URL.
Detect login requirements by calling browser_snapshot and examining the page content. Look for indicators such as:
- /login, /signin, /auth paths

Ask the user for credentials if a login form is detected. Never guess or hardcode credentials. Prompt clearly:
"I detected a login form. Please provide the username/email and password to proceed with testing."
Authenticate using browser_fill_form to enter the credentials into the detected form fields, then browser_click to submit the login form.
Verify authentication by calling browser_wait_for to confirm the login succeeded. Wait for a post-login indicator such as a dashboard heading, navigation menu, user avatar, or the absence of the login form.
Perform functional checks on every identified page. These checks verify that pages load correctly, function properly, and are free from errors.
For each page in the route list:
Navigate to the page using browser_navigate.
Wait for content using browser_wait_for to ensure the page has fully loaded. Wait for a key content element such as a heading, main content area, or data table.
Capture page structure using browser_snapshot to obtain the accessibility tree. Review the snapshot for:
Check for console errors using browser_console_messages with level "error". Record any JavaScript errors, failed resource loads, or runtime exceptions. Console errors are significant findings.
Check network requests using browser_network_requests to identify:
Test interactive elements where applicable. For forms, use browser_fill_form with test data to verify fields accept input. For buttons and links, use browser_click to verify they respond. For dropdowns and menus, verify they open and close properly.
Verify key assertions using the Playwright MCP testing tools:
- browser_verify_text_visible -- Confirm expected text is displayed on the page
- browser_verify_element_visible -- Confirm key UI elements are present and visible
- browser_verify_value -- Confirm form fields and inputs contain expected values
- browser_verify_list_visible -- Confirm lists (navigation, menus, data lists) render correctly

Generate locators for important elements using browser_generate_locator when you need stable selectors for test assertions or to reference elements across pages.
Record all findings for each page: errors, warnings, structural issues, and functional problems.
Visual evaluation is the most distinctive capability of this skill. For every page, capture screenshots at every viewport size and evaluate them for visual quality.
| Viewport | Width | Height | Represents |
|---|---|---|---|
| Desktop | 1920 | 1080 | Standard desktop/laptop monitor |
| Tablet | 768 | 1024 | iPad and similar tablets |
| Mobile | 375 | 812 | iPhone and similar smartphones |
For each page, iterate through all three viewports:
Set the viewport using browser_resize with the appropriate width and height from the table above.
Capture a viewport screenshot using browser_take_screenshot with default settings (visible viewport area). This shows what the user sees immediately upon landing.
Capture a full-page screenshot using browser_take_screenshot with fullPage: true. This reveals the complete page content including below-the-fold areas.
Evaluate the screenshots by examining each captured image. Assess the following visual criteria:
For the complete rubric and scoring criteria, see visual-criteria.md.
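To make the per-page viewport loop concrete, the sequence of tool calls it implies can be enumerated ahead of time. This is an illustrative sketch only: `capture_plan` is a hypothetical helper, and the `page` and `viewport` keys are bookkeeping for this example, not parameters of the Playwright MCP tools.

```python
# Viewport sizes from the table above: name -> (width, height).
VIEWPORTS = {
    "desktop": (1920, 1080),
    "tablet": (768, 1024),
    "mobile": (375, 812),
}


def capture_plan(pages):
    """List, in order, the resize and screenshot steps for every page and viewport."""
    steps = []
    for page in pages:
        for name, (width, height) in VIEWPORTS.items():
            steps.append(("browser_resize", {"width": width, "height": height}))
            # Visible-viewport capture first, then the full-page capture.
            for full_page in (False, True):
                steps.append((
                    "browser_take_screenshot",
                    {"page": page, "viewport": name, "fullPage": full_page},
                ))
    return steps
```

Each page contributes nine steps (three resizes and six screenshots), which makes it easy to check afterwards that no viewport was skipped.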
All screenshots must be saved to a timestamped directory using the following convention:
docs/regression-screenshots/YYYY-MM-DD-HHmm/{page}-{viewport}.png
Where:
- YYYY-MM-DD-HHmm is the current date and time (e.g., 2026-03-01-1430)
- {page} is a slugified version of the page name or route (e.g., home, dashboard, settings-profile)
- {viewport} is the viewport name in lowercase (e.g., desktop, tablet, mobile)

Examples:
- docs/regression-screenshots/2026-03-01-1430/home-desktop.png
- docs/regression-screenshots/2026-03-01-1430/home-tablet.png
- docs/regression-screenshots/2026-03-01-1430/home-mobile.png
- docs/regression-screenshots/2026-03-01-1430/dashboard-desktop.png

Full-page screenshots append -full to the viewport name:
- docs/regression-screenshots/2026-03-01-1430/home-desktop-full.png

After all pages have been tested at all viewports, generate a comprehensive markdown report.
Save the report to:
docs/regression-report-YYYY-MM-DD-HHmm.md
Where YYYY-MM-DD-HHmm matches the timestamp used for screenshots.
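Because the report and the screenshots must share one timestamp, it can help to derive both paths from a single `datetime`. The sketch below uses hypothetical helpers (`slugify`, `artifact_paths`); mapping the root route `/` to `home` is an assumption chosen to match the examples, not a documented rule.

```python
import re
from datetime import datetime


def slugify(route):
    """Turn a route such as /settings/profile into settings-profile."""
    slug = re.sub(r"[^a-z0-9]+", "-", route.lower()).strip("-")
    return slug or "home"  # assumption: the root route is named "home"


def artifact_paths(route, viewport, when, full_page=False):
    """Return (screenshot_path, report_path) built from one shared timestamp."""
    stamp = when.strftime("%Y-%m-%d-%H%M")
    suffix = "-full" if full_page else ""
    screenshot = (
        f"docs/regression-screenshots/{stamp}/"
        f"{slugify(route)}-{viewport.lower()}{suffix}.png"
    )
    report = f"docs/regression-report-{stamp}.md"
    return screenshot, report
```

Taking the timestamp once at the start of the run and passing it to every call keeps all artifacts in the same directory even if the run crosses a minute boundary.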
The report must include the following sections:
| Metric | Value |
|---|---|
| Date | YYYY-MM-DD HH:mm |
| Application URL | (url) |
| Pages Tested | (count) |
| Viewports Tested | (count) |
| Existing Tests Passed | (count) |
| Existing Tests Failed | (count) |
| Console Errors Found | (count) |
| Network Errors Found | (count) |
| Visual Issues Found | (count) |
| Overall Status | PASS / FAIL / WARN |
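One way to keep the summary table complete is to render it from a fixed list of metric names, so a missing value fails loudly rather than silently dropping a row. This is a sketch with a hypothetical `render_summary` helper; how Overall Status is decided is left to the caller, since the rules for PASS, FAIL, and WARN are not spelled out here.

```python
# Row order mirrors the summary table above.
SUMMARY_ROWS = [
    "Date",
    "Application URL",
    "Pages Tested",
    "Viewports Tested",
    "Existing Tests Passed",
    "Existing Tests Failed",
    "Console Errors Found",
    "Network Errors Found",
    "Visual Issues Found",
    "Overall Status",
]


def render_summary(values):
    """Render the summary table as markdown; raise if any metric is missing."""
    missing = [row for row in SUMMARY_ROWS if row not in values]
    if missing:
        raise KeyError(f"missing summary metrics: {missing}")
    lines = ["| Metric | Value |", "|---|---|"]
    lines += [f"| {row} | {values[row]} |" for row in SUMMARY_ROWS]
    return "\n".join(lines)
```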
If existing tests were run in Phase 2, include:
For each page tested, include:
A prioritized list of issues found during testing, ordered by severity:
After generating the report, provide a concise summary directly in the conversation. This summary gives the user an immediate understanding of the results without opening the report file.
Include the following in the conversation summary:
These are mistakes that compromise the quality of a regression test. If you notice yourself doing any of these, stop and correct course:
Skipping pages -- Every discovered route must be tested. Do not skip pages because they "look similar" or "probably haven't changed." Each page can have unique layout, data, and rendering behavior.
Not checking all viewports -- Every page must be tested at all three viewport sizes (Desktop, Tablet, Mobile). Desktop-only testing misses the majority of responsive layout bugs.
Ignoring console errors -- Console errors are always significant. Do not dismiss them as "just warnings" or "not related to UI." Every error must be recorded and reported.
Not asking for credentials when authentication is detected -- If the browser snapshot shows a login form, you must ask the user for credentials. Do not attempt to bypass authentication or skip authenticated pages.
Rushing visual evaluation -- Each screenshot must be examined carefully. Do not generate a screenshot and immediately mark the page as "looks fine." Evaluate every criterion: layout, spacing, typography, color, responsiveness, completeness, and polish.
Generating a report without visiting pages -- The report must be based on actual browser visits and screenshots. Never generate a report from assumptions, cached data, or code analysis alone. Every finding must come from a real browser session.
These are excuses that sound reasonable but lead to incomplete testing. The correct response to each is provided.
| Rationalization | Why It's Wrong | Correct Action |
|---|---|---|
| "The homepage looks fine, that's enough" | Internal pages often have different layouts, components, and data dependencies that break independently | Test every discovered route |
| "Desktop viewport is sufficient" | Over 50% of web traffic is mobile; responsive breakpoints are a common source of layout bugs | Test all three viewports for every page |
| "Those are just warnings, not errors" | Warnings often indicate deprecations, performance issues, or impending failures that affect user experience | Record all console messages at error level and report them |
| "The automated tests passed, so we can skip browser testing" | Automated tests verify specific behaviors but do not assess visual quality, layout, or aesthetic polish | Always perform browser-based visual evaluation regardless of automated test results |
| "The app needs to be running first, so I can't test" | Ask the user to start the application or help them start it; do not silently skip browser testing | Ask the user for a running URL or help start the dev server |
| "There are too many pages to test them all" | Thoroughness is the core value; skipping pages is how bugs ship to production | Test every page; if truly excessive, confirm a subset with the user first |
| "Taking screenshots at every viewport makes the process too long" | Screenshots are the evidence that proves testing was done; they are the most valuable artifact of the process | Capture every screenshot; the time investment is worth the confidence gained |
Use this table for a fast reminder of what each phase involves and which MCP tools are needed.
| Phase | Key Actions | MCP Tools Used |
|---|---|---|
| Phase 1: Discovery | Detect frameworks, find tests, grep routes, determine URL | (none -- uses file search and grep) |
| Phase 2: Existing Tests | Run test suites, capture output, record results | (none -- uses shell commands) |
| Phase 3a: Setup & Auth | Navigate to URL, detect login, enter credentials | browser_navigate, browser_snapshot, browser_fill_form, browser_click, browser_wait_for |
| Phase 3b: Functional Checks | Load pages, check structure, find errors, verify assertions | browser_navigate, browser_wait_for, browser_snapshot, browser_console_messages, browser_network_requests, browser_fill_form, browser_click, browser_verify_text_visible, browser_verify_element_visible, browser_verify_value, browser_verify_list_visible, browser_generate_locator |
| Phase 3c: Visual Evaluation | Resize viewport, capture screenshots, evaluate visuals | browser_resize, browser_take_screenshot |
| Phase 4: Reporting | Generate markdown report with findings and screenshots | (none -- writes markdown file) |
This skill is designed to complement — not replace — the superpowers workflow skills. Here is how they fit together:
| Superpowers Skill | Relationship | Notes |
|---|---|---|
| verification-before-completion | This skill provides evidence. Verification-before-completion requires running verification commands and confirming output before success claims. A completed regression test report with screenshots is strong verification evidence. | The regression report satisfies the evidence requirement for visual and functional verification. |
| finishing-a-development-branch | Run before finishing. Regression testing is a quality gate that should complete before deciding how to integrate the branch. | Provides confidence that the branch hasn't introduced visual or functional regressions. |
| pre-push-review | Can be invoked by pre-push-review. During Phase 5 of pre-push-review, this skill is optionally invoked for browser-based regression testing if a web UI is available. | This skill can also run standalone outside of pre-push-review. |
The following companion documents provide detailed criteria and detection logic referenced throughout this skill: