Skill

regression-test

Use when the user asks to "test the UI", "screenshot the page", "verify the site still works", "check the deployment", "run regression tests", or "audit the look and feel" against a web-accessible URL. Discovers and runs existing test suites (npm test, dotnet test, pytest), then performs functional checks and visual screenshots at desktop/tablet/mobile viewports via Playwright MCP, and produces a markdown report. Skip for unit/integration tests with no UI, pure backend changes, or when no URL is reachable.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/regression-test:regression-test

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

<HARD-GATE>

Supporting Files

test-framework-detection.md
visual-criteria.md

SKILL.md

478 lines · ~6.7k tokens (exceeds 5k compaction limit)

Stats

Language: JavaScript

Parent stars: 2

Maintenance: Good

Last Commit: May 6, 2026

Regression Test Skill

This skill produces two artifacts on disk: screenshots under `docs/regression-screenshots/YYYY-MM-DD-HHmm/` and a markdown report at `docs/regression-report-YYYY-MM-DD-HHmm.md`. For both:

Use the Write tool (or the Playwright MCP screenshot calls) to actually create the files. Narrating "I've captured the screenshots" or "the report has been generated" without a tool call leaves nothing on disk and gives the user a false PASS signal.
VERIFY each artifact by re-reading before announcing the verdict:
- For the report: Read it back and confirm the summary table, per-page findings, and recommendations sections are present.
- For screenshots: Glob the screenshot directory and confirm the expected number of files is present: 3 viewports × N pages, or twice that when the full-page captures from Phase 3c are saved alongside the viewport captures.
Only then announce the conversation summary. An announcement before the artifacts exist on disk is the failure mode this gate prevents.

If the artifacts are missing at the verify step, the previous step did not complete — return to it and retry the tool call. Do not proceed.
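The verify step can be sketched in Python as two small checks. This is only an illustration: the function names are hypothetical, the section titles are passed in by the caller, and `captures_per_viewport` is an assumption (1 for viewport captures only, 2 when full-page captures are stored as well).

```python
from pathlib import Path


def missing_sections(report_text: str, required: list[str]) -> list[str]:
    """Return the required section titles that do not appear in the report."""
    return [title for title in required if title not in report_text]


def screenshot_count_ok(directory: Path, pages: int, viewports: int = 3,
                        captures_per_viewport: int = 1) -> bool:
    """Compare the number of PNG files on disk with the expected total."""
    expected = pages * viewports * captures_per_viewport
    actual = len(list(directory.glob("*.png")))
    return actual == expected
```

If either check fails, the gate above applies: return to the step that should have written the artifact instead of announcing a verdict.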

Prerequisites

This skill requires the Microsoft Playwright MCP server (@playwright/mcp) with the --caps=testing flag, which enables browser_verify_* assertion tools and browser_generate_locator used in Phase 3b.

Auto-configured: When installed via claude plugin install regression-test, the .mcp.json automatically configures the playwright MCP server with npx @playwright/mcp@latest --caps=testing. No manual setup needed.

Manual install (if not using the plugin):

claude mcp add playwright -- npx @playwright/mcp@latest --caps=testing

Optional flags:

# Headed mode (see the browser window) + testing assertions
claude mcp add playwright -- npx @playwright/mcp@latest --caps=testing --headless=false

# Testing plus optional capabilities (PDF export + vision-based coordinates)
claude mcp add playwright -- npx @playwright/mcp@latest --caps=testing,pdf,vision

For full configuration options (browser choice, viewport defaults, proxy, storage state), see: https://github.com/microsoft/playwright-mcp

Overview

This skill defines a structured, repeatable process for regression testing web applications using the Microsoft Playwright MCP server (@playwright/mcp). The core principle is simple and non-negotiable:

"Never ship without seeing every page at every viewport."

Regression testing is the last line of defense before code reaches users. It combines three complementary strategies: running existing automated tests to catch known regressions, performing functional checks through real browser interaction to catch behavioral issues, and conducting visual evaluation through screenshots at multiple viewport sizes to catch layout, styling, and aesthetic problems.

This skill treats every page as important and every viewport as a potential source of breakage. Skipping a page or viewport is how bugs ship. The process is designed to be thorough, systematic, and documented with evidence.

Announce Line

When this skill is activated, begin with:

"Starting regression testing. I'll work through four phases: discovery, existing tests, browser-based functional and visual checks, and reporting. I'll ask for the application URL and any credentials when needed."

When to Use

Invoke this skill in any of the following situations:

Before deployment -- A release is being prepared and the application needs a final quality gate before going live.
After UI changes -- CSS, layout, component, or design-system changes have been made and need visual verification across viewports.
Visual checks -- Someone wants to confirm that the application looks correct, is visually polished, and has no rendering issues.
After dependency upgrades -- Libraries, frameworks, or tooling have been updated and the application needs to be verified for unintended side effects.
When asked to regression test or smoke test -- The user explicitly requests a regression test, smoke test, or browser-based verification pass.
Visual quality check -- The user wants an expert-level aesthetic review of spacing, typography, color consistency, alignment, and responsiveness.

Checklist

Use this checklist to track progress through the four phases:

Phase 1: Discovery -- Detect test frameworks, find test files, identify routes and application URL
Phase 2: Existing Test Execution -- Run any existing automated test suites and capture results
Phase 3: Browser-Based Testing -- Authenticate, perform functional checks, and conduct visual evaluation for every page at every viewport
Phase 4: Reporting -- Generate a comprehensive markdown report with screenshots, findings, and recommendations

Process Flow

The following Graphviz diagram illustrates the full regression testing workflow:

digraph regression_test {
    rankdir=TB;
    node [shape=box, style=rounded, fontname="Helvetica"];
    edge [fontname="Helvetica", fontsize=10];

    discovery [label="Phase 1:\nDiscovery"];
    has_tests [label="Has existing\ntests?", shape=diamond];
    run_tests [label="Phase 2:\nRun Existing Tests"];
    ask_url [label="Ask user\nfor URL"];
    navigate [label="Phase 3a:\nNavigate to URL"];
    auth_check [label="Login form\ndetected?", shape=diamond];
    credentials [label="Ask credentials\n& login"];
    functional [label="Phase 3b:\nFunctional Checks"];
    visual [label="Phase 3c:\nVisual Evaluation"];
    more_pages [label="More pages\nto test?", shape=diamond];
    report [label="Phase 4:\nGenerate Report"];
    summary [label="Conversation\nSummary"];

    discovery -> has_tests;
    has_tests -> run_tests [label="Yes"];
    has_tests -> ask_url [label="No"];
    run_tests -> ask_url;
    ask_url -> navigate;
    navigate -> auth_check;
    auth_check -> credentials [label="Yes"];
    auth_check -> functional [label="No"];
    credentials -> functional;
    functional -> visual;
    visual -> more_pages;
    more_pages -> functional [label="Yes"];
    more_pages -> report [label="No"];
    report -> summary;
}

Phase 1: Discovery

The discovery phase gathers information about the project's testing infrastructure, application structure, and how to access the running application. This phase is entirely local and does not require a browser.

Recent-report dedup check

Before running discovery, check for an existing recent report:

Glob docs/regression-report-*.md — list any prior reports.
If a report from within the last 30 minutes exists (compare the timestamp in the filename regression-report-YYYY-MM-DD-HHmm.md to current time), present the user with a choice:

"A recent regression report exists at docs/regression-report-YYYY-MM-DD-HHmm.md (NN minutes old). Re-run anyway, or use the existing report?"
- If the user chooses to re-use, exit regression-test immediately and let the caller (audit-milestone, ui-review, pre-push-review) consume the existing report.
- If the user chooses to re-run, proceed with discovery as normal.
If no recent report exists OR the prior report is older than 30 minutes, proceed without prompting.

This dedup is important when regression-test is invoked transitively — audit-milestone, ui-workflow ui-review, and pre-push-review all call regression-test, and a milestone closeout can plausibly trigger three runs back-to-back. Without dedup, that's three full screenshot sweeps for unchanged code. The 30-minute window is short enough that real changes between calls won't be hidden, and long enough that closeout chains won't re-run unnecessarily.
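A minimal sketch of the 30-minute check, assuming the timestamp in the filename is local time in the YYYY-MM-DD-HHmm form described above. The helper names `report_age` and `find_recent_report` are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

REPORT_NAME = re.compile(r"regression-report-(\d{4}-\d{2}-\d{2}-\d{4})\.md$")


def report_age(filename: str, now: datetime) -> Optional[timedelta]:
    """Return how old a report is, based on the timestamp in its filename."""
    match = REPORT_NAME.search(filename)
    if match is None:
        return None
    written = datetime.strptime(match.group(1), "%Y-%m-%d-%H%M")
    return now - written


def find_recent_report(filenames, now, window=timedelta(minutes=30)):
    """Return (filename, age) for the newest report inside the window, else None."""
    candidates = []
    for name in filenames:
        age = report_age(name, now)
        if age is not None and timedelta(0) <= age <= window:
            candidates.append((age, name))
    if not candidates:
        return None
    age, name = min(candidates)  # smallest age = most recent report
    return name, age
```

A `None` result corresponds to "proceed without prompting"; any other result is the case where the user is asked whether to re-run.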

Detect Test Frameworks

Search for test framework configuration files using the following globs:

**/jest.config.*, **/jest.setup.* -- Jest
**/vitest.config.* -- Vitest
**/playwright.config.* -- Playwright
**/cypress.config.*, **/cypress.json -- Cypress
**/.mocharc.*, **/mocha.opts -- Mocha
**/karma.conf.* -- Karma
**/nightwatch.conf.* -- Nightwatch
**/wdio.conf.* -- WebdriverIO

For detailed framework detection logic, patterns, and runner commands, see test-framework-detection.md.
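The glob list above amounts to a lookup from config filename to framework. A rough Python sketch, assuming a list of relative paths has already been collected; `FRAMEWORK_PATTERNS` and `detect_frameworks` are illustrative names, not part of the skill or of test-framework-detection.md.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Filename patterns copied from the glob list above (directory part dropped).
FRAMEWORK_PATTERNS = {
    "Jest": ["jest.config.*", "jest.setup.*"],
    "Vitest": ["vitest.config.*"],
    "Playwright": ["playwright.config.*"],
    "Cypress": ["cypress.config.*", "cypress.json"],
    "Mocha": [".mocharc.*", "mocha.opts"],
    "Karma": ["karma.conf.*"],
    "Nightwatch": ["nightwatch.conf.*"],
    "WebdriverIO": ["wdio.conf.*"],
}


def detect_frameworks(paths: list[str]) -> list[str]:
    """Return the sorted names of frameworks whose config files appear in paths."""
    found = set()
    for path in paths:
        name = PurePosixPath(path).name
        for framework, patterns in FRAMEWORK_PATTERNS.items():
            if any(fnmatchcase(name, pattern) for pattern in patterns):
                found.add(framework)
    return sorted(found)
```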

Find Test Files

Search for test files using common naming patterns:

**/*.test.{js,ts,jsx,tsx}
**/*.spec.{js,ts,jsx,tsx}
**/__tests__/**/*.{js,ts,jsx,tsx}
**/e2e/**/*.{js,ts,jsx,tsx}
**/tests/**/*.{js,ts,jsx,tsx}
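Not every glob implementation expands `{js,ts,jsx,tsx}`; Python's standard `glob`, for example, does not. One way to approximate the same five patterns is a single regular expression over forward-slash paths. This is a sketch under that assumption, and `is_test_file` is a hypothetical helper name.

```python
import re

TEST_FILE = re.compile(
    r"""
    (?:^|/)(?:__tests__|e2e|tests)/.*\.(?:js|ts|jsx|tsx)$   # files under test directories
    | \.(?:test|spec)\.(?:js|ts|jsx|tsx)$                   # *.test.* and *.spec.* files
    """,
    re.VERBOSE,
)


def is_test_file(path: str) -> bool:
    """Return True when a forward-slash path matches one of the test-file patterns."""
    return TEST_FILE.search(path) is not None
```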

Read package.json Scripts

Examine package.json for test-related scripts. Look for keys containing test, e2e, spec, cypress, playwright, or jest. These are the preferred way to run tests because they include project-specific configuration.
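As a rough illustration, the keyword scan over script names could look like the following. The function name is hypothetical, and matching is done on the script key only, as described above.

```python
import json

KEYWORDS = ("test", "e2e", "spec", "cypress", "playwright", "jest")


def find_test_scripts(package_json_text: str) -> dict[str, str]:
    """Return the package.json scripts whose names contain a test-related keyword."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return {
        name: command
        for name, command in scripts.items()
        if any(keyword in name.lower() for keyword in KEYWORDS)
    }
```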

Grep for Routes

Search the codebase for route definitions to build a list of pages to test:

React Router: <Route path=, createBrowserRouter, useRoutes
Next.js: files in pages/ or app/ directories
Vue Router: routes: arrays, path: properties
Angular: RouterModule.forRoot, loadChildren
Generic: app.get(, router.get(, @GetMapping, @RequestMapping
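A few of these patterns can be turned into a quick route extractor. The sketch below covers only string-literal paths in React Router JSX, Express-style handlers, and Spring annotations; file-based routing (Next.js) and object-style route arrays are out of scope here. The regular expressions are illustrative assumptions and will miss less common spellings.

```python
import re

ROUTE_PATTERNS = [
    # React Router JSX: <Route path="/dashboard" ...>
    re.compile(r"""<Route\s+[^>]*?path=["']([^"']+)["']"""),
    # Express-style handlers: app.get('/health', ...) or router.get("/x", ...)
    re.compile(r"""\b(?:app|router)\.get\(\s*["']([^"']+)["']"""),
    # Spring annotations with a positional string: @GetMapping("/api/users")
    re.compile(r"""@(?:GetMapping|RequestMapping)\(\s*["']([^"']+)["']"""),
]


def extract_routes(source: str) -> list[str]:
    """Return the unique route paths found in source, grouped by pattern order."""
    routes: list[str] = []
    for pattern in ROUTE_PATTERNS:
        for match in pattern.finditer(source):
            if match.group(1) not in routes:
                routes.append(match.group(1))
    return routes
```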

Determine Application URL

Check for the application URL in this order:

Ask the user if they have a running instance
Look for dev server configuration in package.json scripts (dev, start, serve)
Check for .env files with PORT, HOST, or URL variables
Default assumption: http://localhost:3000
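The last two steps of that order (reading a .env file, then falling back to the default) can be sketched as below. The precedence of URL over HOST/PORT and the `http` scheme are assumptions made for the example; asking the user and reading package.json scripts still come first.

```python
def url_from_env(env_text: str, default: str = "http://localhost:3000") -> str:
    """Derive an application URL from .env contents, or fall back to the default."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    if values.get("URL"):
        return values["URL"]
    if values.get("PORT"):
        host = values.get("HOST") or "localhost"
        return f"http://{host}:{values['PORT']}"
    return default
```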

Phase 2: Existing Test Execution

If existing test suites were discovered in Phase 1, run them before proceeding to browser-based testing. Existing tests catch known regressions quickly and cheaply.

Execution Strategy

Prefer package.json scripts -- Always use npm test, npm run e2e, or similar scripts rather than invoking test runners directly. These scripts include the correct configuration, environment variables, and flags.
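Include reporter flags for better output parsing (when running through an npm script, pass them after a -- separator, for example npm test -- --verbose --no-coverage):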
- Jest: --verbose --no-coverage (readable output, skip coverage to save time)
- Vitest: --reporter=verbose
- Playwright: --reporter=list
- Cypress: --reporter spec
- Mocha: --reporter spec
Capture all output -- Record exit code, stdout, and stderr for every test run. Parse the output to extract:
- Total number of tests
- Number of passed tests
- Number of failed tests
- Number of skipped tests
- Names of failing tests and their error messages
Record pass/fail/skip counts -- Store these for inclusion in the final report.
Continue on failure -- If tests fail, record the failures but continue to Phase 3. Failed automated tests do not block browser-based testing; both sources of information are valuable.
Skip if none found -- If no test frameworks or test files were discovered in Phase 1, skip this phase entirely and proceed to Phase 3. Note in the report that no existing tests were found.
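Extracting the counts depends on each runner's output format. As one hedged example, the sketch below assumes Jest's usual summary line (for instance `Tests:       1 failed, 2 skipped, 7 passed, 10 total`) and should be fed the combined stdout and stderr; other runners would need their own parser. `parse_jest_summary` is a hypothetical name.

```python
import re


def parse_jest_summary(output: str) -> dict[str, int]:
    """Extract pass/fail/skip/total counts from a Jest-style 'Tests:' summary line."""
    counts = {"passed": 0, "failed": 0, "skipped": 0, "total": 0}
    for line in output.splitlines():
        if line.strip().startswith("Tests:"):
            for number, label in re.findall(r"(\d+) (passed|failed|skipped|total)", line):
                counts[label] = int(number)
    return counts
```

A result with `total` equal to 0 usually means the summary line was not found, which is worth recording in the report instead of being treated as a pass.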

Phase 3a: Setup and Authentication

Before testing pages, establish a browser session and handle any authentication requirements.

Steps

Ask the user for the application URL if not already determined in Phase 1. Confirm the URL is accessible.
Navigate to the application using browser_navigate with the provided URL.
Detect login requirements by calling browser_snapshot and examining the page content. Look for indicators such as:
- Login, sign-in, or authentication forms
- Username/email and password fields
- Redirects to /login, /signin, /auth paths
- "Please log in" or similar messages
Ask the user for credentials if a login form is detected. Never guess or hardcode credentials. Prompt clearly:

"I detected a login form. Please provide the username/email and password to proceed with testing."
Authenticate using browser_fill_form to enter the credentials into the detected form fields, then browser_click to submit the login form.
Verify authentication by calling browser_wait_for to confirm the login succeeded. Wait for a post-login indicator such as a dashboard heading, navigation menu, user avatar, or the absence of the login form.
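The login indicators listed above can be expressed as a simple heuristic. In this sketch the snapshot is treated as plain text, which is an assumption about how the browser_snapshot output is handled; the function name and keyword list are illustrative only, and a match should prompt a question to the user, never an automatic login attempt.

```python
import re
from urllib.parse import urlparse

LOGIN_PATHS = ("/login", "/signin", "/auth")
LOGIN_TEXT = re.compile(r"\b(?:log ?in|sign ?in|password)\b", re.IGNORECASE)


def looks_like_login(url: str, snapshot_text: str) -> bool:
    """Return True when the URL path or the snapshot text suggests a login page."""
    path = urlparse(url).path.lower()
    if any(path == prefix or path.startswith(prefix + "/") for prefix in LOGIN_PATHS):
        return True
    return LOGIN_TEXT.search(snapshot_text) is not None
```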

Phase 3b: Functional Checks

Perform functional checks on every identified page. These checks verify that pages load correctly, function properly, and are free from errors.

Per-Page Procedure

For each page in the route list:

Navigate to the page using browser_navigate.
Wait for content using browser_wait_for to ensure the page has fully loaded. Wait for a key content element such as a heading, main content area, or data table.
Capture page structure using browser_snapshot to obtain the accessibility tree. Review the snapshot for:
- Missing or empty headings
- Broken navigation links
- Missing form labels
- Empty content areas
- Incorrect page titles
Check for console errors using browser_console_messages with level "error". Record any JavaScript errors, failed resource loads, or runtime exceptions. Console errors are significant findings.
Check network requests using browser_network_requests to identify:
- Failed API calls (4xx, 5xx status codes)
- CORS errors
- Missing resources (404s)
- Slow responses
Test interactive elements where applicable. For forms, use browser_fill_form with test data to verify fields accept input. For buttons and links, use browser_click to verify they respond. For dropdowns and menus, verify they open and close properly.
Verify key assertions using the Playwright MCP testing tools:
- browser_verify_text_visible -- Confirm expected text is displayed on the page
- browser_verify_element_visible -- Confirm key UI elements are present and visible
- browser_verify_value -- Confirm form fields and inputs contain expected values
- browser_verify_list_visible -- Confirm lists (navigation, menus, data lists) render correctly
Generate locators for important elements using browser_generate_locator when you need stable selectors for test assertions or to reference elements across pages.

Record all findings for each page: errors, warnings, structural issues, and functional problems.
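For the network check, findings can be grouped by status class. The record shape used here (`url`, `status`, `duration_ms`) and the 1000 ms slow threshold are hypothetical; the actual browser_network_requests output would need to be mapped into this form first, and CORS failures, which have no status code, are not covered.

```python
def classify_requests(requests: list[dict], slow_ms: int = 1000) -> dict[str, list[str]]:
    """Group request URLs into client errors (4xx), server errors (5xx), and slow responses."""
    findings: dict[str, list[str]] = {"client_errors": [], "server_errors": [], "slow": []}
    for request in requests:
        status = request.get("status", 0)
        if 400 <= status < 500:
            findings["client_errors"].append(request["url"])
        elif status >= 500:
            findings["server_errors"].append(request["url"])
        if request.get("duration_ms", 0) > slow_ms:
            findings["slow"].append(request["url"])
    return findings
```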

Phase 3c: Visual Evaluation

Visual evaluation is the most distinctive capability of this skill. For every page, capture screenshots at every viewport size and evaluate them for visual quality.

Viewport Definitions

Viewport	Width	Height	Represents
Desktop	1920	1080	Standard desktop/laptop monitor
Tablet	768	1024	iPad and similar tablets
Mobile	375	812	iPhone and similar smartphones

Per-Page, Per-Viewport Procedure

For each page, iterate through all three viewports:

Set the viewport using browser_resize with the appropriate width and height from the table above.
Capture a viewport screenshot using browser_take_screenshot with default settings (visible viewport area). This shows what the user sees immediately upon landing.
Capture a full-page screenshot using browser_take_screenshot with fullPage: true. This reveals the complete page content including below-the-fold areas.
Evaluate the screenshots by examining each captured image. Assess the following visual criteria:
- Layout -- Are elements properly aligned? Is the grid system working? Are there overlapping elements, broken columns, or unexpected gaps?
- Spacing -- Are padding and margins consistent? Are elements too cramped or too spread out? Does whitespace feel balanced?
- Typography -- Are font sizes appropriate for the viewport? Is text readable? Are headings properly sized relative to body text? Is there any text overflow or truncation?
- Color -- Is the color scheme consistent? Are there sufficient contrast ratios? Do interactive elements have visible hover/focus states?
- Responsiveness -- Does the layout adapt properly to the viewport? Are images scaled correctly? Does horizontal scrolling occur where it should not? Are touch targets large enough on mobile?
- Completeness -- Is all expected content visible? Are images loading? Are icons rendering? Are there any placeholder or missing elements?
- Polish -- Does the page feel professionally finished? Are there rough edges, misaligned elements, inconsistent borders, or visual artifacts?

For the complete rubric and scoring criteria, see visual-criteria.md.

Screenshot Storage

All screenshots must be saved to a timestamped directory using the following convention:

docs/regression-screenshots/YYYY-MM-DD-HHmm/{page}-{viewport}.png

Where:

YYYY-MM-DD-HHmm is the current date and time (e.g., 2026-03-01-1430)
{page} is a slugified version of the page name or route (e.g., home, dashboard, settings-profile)
{viewport} is the viewport name in lowercase (e.g., desktop, tablet, mobile)

Examples:

docs/regression-screenshots/2026-03-01-1430/home-desktop.png
docs/regression-screenshots/2026-03-01-1430/home-tablet.png
docs/regression-screenshots/2026-03-01-1430/home-mobile.png
docs/regression-screenshots/2026-03-01-1430/dashboard-desktop.png

Full-page screenshots append -full to the viewport name:

docs/regression-screenshots/2026-03-01-1430/home-desktop-full.png
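The naming convention can be captured in a small path builder. This is a sketch: the slug rule (lowercase, non-alphanumerics collapsed to hyphens) and the mapping of the root route to `home` are assumptions chosen to reproduce the examples above.

```python
import re
from datetime import datetime


def slugify(route: str) -> str:
    """Turn a route such as /settings/profile into settings-profile."""
    slug = re.sub(r"[^a-z0-9]+", "-", route.lower()).strip("-")
    return slug or "home"  # assumption: the root route is named "home"


def screenshot_path(route: str, viewport: str, taken_at: datetime,
                    full_page: bool = False) -> str:
    """Build docs/regression-screenshots/<timestamp>/<page>-<viewport>[-full].png."""
    stamp = taken_at.strftime("%Y-%m-%d-%H%M")
    suffix = "-full" if full_page else ""
    return f"docs/regression-screenshots/{stamp}/{slugify(route)}-{viewport.lower()}{suffix}.png"
```

Computing the timestamp once per run and passing it to every call keeps all screenshots in the same directory and lets the report reuse the same value.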

Phase 4: Reporting

After all pages have been tested at all viewports, generate a comprehensive markdown report.

Report Location

Save the report to:

docs/regression-report-YYYY-MM-DD-HHmm.md

Where YYYY-MM-DD-HHmm matches the timestamp used for screenshots.

Report Structure

The report must include the following sections:

Summary Table

Metric	Value
Date	YYYY-MM-DD HH:mm
Application URL	(url)
Pages Tested	(count)
Viewports Tested	(count)
Existing Tests Passed	(count)
Existing Tests Failed	(count)
Console Errors Found	(count)
Network Errors Found	(count)
Visual Issues Found	(count)
Overall Status	PASS / FAIL / WARN

Existing Test Results

If existing tests were run in Phase 2, include:

Framework name and version
Command used to run tests
Pass/fail/skip counts
List of failing tests with error messages
Full output in a collapsed details block

Page-by-Page Results

For each page tested, include:

Page name and URL
Functional check results (console errors, network errors, structural issues)
Screenshots at each viewport (embedded as markdown images)
Visual evaluation notes per viewport
Severity rating: Critical, Major, Minor, or Pass

Recommendations

A prioritized list of issues found during testing, ordered by severity:

Critical -- Broken functionality, missing content, JavaScript errors blocking interaction
Major -- Layout breakage at specific viewports, failed network requests, accessibility issues
Minor -- Spacing inconsistencies, minor visual imperfections, non-blocking warnings
Suggestions -- Polish improvements, best-practice recommendations, enhancement ideas
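Ordering by severity is a stable sort over the four levels above. A minimal sketch, assuming each finding is a dictionary with a `severity` key holding one of those level names (the record shape is hypothetical):

```python
SEVERITY_ORDER = ["Critical", "Major", "Minor", "Suggestions"]


def prioritize(findings: list[dict]) -> list[dict]:
    """Sort findings by severity, keeping the original order within each level."""
    rank = {name: index for index, name in enumerate(SEVERITY_ORDER)}
    return sorted(findings, key=lambda finding: rank[finding["severity"]])
```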

Conversation Summary

After generating the report, provide a concise summary directly in the conversation. This summary gives the user an immediate understanding of the results without opening the report file.

Include the following in the conversation summary:

Overall status -- A single-word verdict: PASS, FAIL, or WARN
Issue counts -- Number of critical, major, and minor issues found
Top 3 findings -- The three most important things the user should know about, briefly described
Report path -- The full file path to the generated markdown report so the user can review the detailed findings

Red Flags

These are mistakes that compromise the quality of a regression test. If you notice yourself doing any of these, stop and correct course:

Skipping pages -- Every discovered route must be tested. Do not skip pages because they "look similar" or "probably haven't changed." Each page can have unique layout, data, and rendering behavior.
Not checking all viewports -- Every page must be tested at all three viewport sizes (Desktop, Tablet, Mobile). Desktop-only testing misses the majority of responsive layout bugs.
Ignoring console errors -- Console errors are always significant. Do not dismiss them as "just warnings" or "not related to UI." Every error must be recorded and reported.
Not asking for credentials when authentication is detected -- If the browser snapshot shows a login form, you must ask the user for credentials. Do not attempt to bypass authentication or skip authenticated pages.
Rushing visual evaluation -- Each screenshot must be examined carefully. Do not generate a screenshot and immediately mark the page as "looks fine." Evaluate every criterion: layout, spacing, typography, color, responsiveness, completeness, and polish.
Generating a report without visiting pages -- The report must be based on actual browser visits and screenshots. Never generate a report from assumptions, cached data, or code analysis alone. Every finding must come from a real browser session.

Common Rationalizations

These are excuses that sound reasonable but lead to incomplete testing. The correct response to each is provided.

Rationalization	Why It's Wrong	Correct Action
"The homepage looks fine, that's enough"	Internal pages often have different layouts, components, and data dependencies that break independently	Test every discovered route
"Desktop viewport is sufficient"	Over 50% of web traffic is mobile; responsive breakpoints are a common source of layout bugs	Test all three viewports for every page
"Those are just warnings, not errors"	Warnings often indicate deprecations, performance issues, or impending failures that affect user experience	Record all console messages at error level and report them
"The automated tests passed, so we can skip browser testing"	Automated tests verify specific behaviors but do not assess visual quality, layout, or aesthetic polish	Always perform browser-based visual evaluation regardless of automated test results
"The app needs to be running first, so I can't test"	Ask the user to start the application or help them start it; do not silently skip browser testing	Ask the user for a running URL or help start the dev server
"There are too many pages to test them all"	Thoroughness is the core value; skipping pages is how bugs ship to production	Test every page; if truly excessive, confirm a subset with the user first
"Taking screenshots at every viewport makes the process too long"	Screenshots are the evidence that proves testing was done; they are the most valuable artifact of the process	Capture every screenshot; the time investment is worth the confidence gained

Quick Reference

Use this table for a fast reminder of what each phase involves and which MCP tools are needed.

Phase	Key Actions	MCP Tools Used
Phase 1: Discovery	Detect frameworks, find tests, grep routes, determine URL	(none -- uses file search and grep)
Phase 2: Existing Tests	Run test suites, capture output, record results	(none -- uses shell commands)
Phase 3a: Setup & Auth	Navigate to URL, detect login, enter credentials	`browser_navigate`, `browser_snapshot`, `browser_fill_form`, `browser_click`, `browser_wait_for`
Phase 3b: Functional Checks	Load pages, check structure, find errors, verify assertions	`browser_navigate`, `browser_wait_for`, `browser_snapshot`, `browser_console_messages`, `browser_network_requests`, `browser_fill_form`, `browser_click`, `browser_verify_text_visible`, `browser_verify_element_visible`, `browser_verify_value`, `browser_verify_list_visible`, `browser_generate_locator`
Phase 3c: Visual Evaluation	Resize viewport, capture screenshots, evaluate visuals	`browser_resize`, `browser_take_screenshot`
Phase 4: Reporting	Generate markdown report with findings and screenshots	(none -- writes markdown file)

Relationship to Superpowers Skills

This skill is designed to complement — not replace — the superpowers workflow skills. Here is how they fit together:

Superpowers Skill	Relationship	Notes
`verification-before-completion`	This skill provides evidence. Verification-before-completion requires running verification commands and confirming output before success claims. A completed regression test report with screenshots is strong verification evidence.	The regression report satisfies the evidence requirement for visual and functional verification.
`finishing-a-development-branch`	Run before finishing. Regression testing is a quality gate that should complete before deciding how to integrate the branch.	Provides confidence that the branch hasn't introduced visual or functional regressions.
`pre-push-review`	Can be invoked by pre-push-review. During Phase 5 of pre-push-review, this skill is optionally invoked for browser-based regression testing if a web UI is available.	This skill can also run standalone outside of pre-push-review.

Supporting References

The following companion documents provide detailed criteria and detection logic referenced throughout this skill:

visual-criteria.md -- Detailed rubric for visual evaluation scoring, including layout, spacing, typography, color, responsiveness, completeness, and polish criteria with severity classifications.
test-framework-detection.md -- Comprehensive test framework detection patterns, configuration file locations, runner commands, and reporter flag reference for all supported frameworks.
