Debugs and fixes flaky Playwright E2E tests using LLM reports from GitHub Actions and Datadog. Use for investigating intermittent failures, triaging flakiness, or stabilizing tests.
`npx claudepluginhub clipboardhealth/core-utils --plugin core`

This skill uses the workspace's default tool permissions.
Work through these phases in order. Skip phases only when you already have the information they produce.
Capture these details first so the investigation is reproducible. If the user hasn't provided them, ask.
Downloads the playwright-llm-report artifact from a GitHub Actions run.
bash scripts/fetch-llm-report.sh "<github-actions-url>"
This downloads and extracts to /tmp/playwright-llm-report-{runId}/. The report is a single llm-report.json file.
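Before walking the fields below, it can help to get a quick overview of what failed. A minimal TypeScript sketch, assuming Node and the field names described in the structure below; the `<runId>` placeholder in the path comes from the fetch step above:

```typescript
// list-failures.ts: quick orientation pass over the report (sketch; field names per the structure below).
import { readFileSync } from "node:fs";

// Path produced by fetch-llm-report.sh; substitute the actual run ID.
const reportPath = "/tmp/playwright-llm-report-<runId>/llm-report.json";
const report = JSON.parse(readFileSync(reportPath, "utf8"));

console.log("summary:", JSON.stringify(report.summary));

for (const test of report.tests ?? []) {
  if (test.status !== "failed" && !test.flaky) continue;
  const attempts = (test.attempts ?? []).map((a: any) => a.status).join(" -> ");
  console.log(`\n${test.status}${test.flaky ? " (flaky)" : ""} at ${test.location?.file}:${test.location?.line}`);
  console.log(`  attempts: ${attempts}`);
  for (const err of test.errors ?? []) {
    console.log(`  error: ${String(err.message ?? "").split("\n")[0]}`);
  }
}
```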
LLM report structure:
- `summary` -- quick pass/fail counts
- `tests[].errors[].message` -- ANSI-stripped, clean error text
- `tests[].errors[].diff` -- extracted expected/actual from assertion errors
- `tests[].errors[].location` -- exact file and line of failure
- `tests[].flaky` -- true if test passed after retry
- `tests[].attempts[]` -- full retry history with per-attempt status, timing, stdio, attachments, steps, and network
- `tests[].attempts[].consoleMessages[]` -- warning/error/pageerror/page-closed/page-crashed trace entries only (2KB text cap with `[truncated]` marker, max 50 per attempt, high-signal entries prioritized over low-signal)
- `tests[].steps` / `tests[].network` / `tests[].timeline` -- convenience aliases from the final attempt
- `tests[].attempts[].timeline[]` -- unified, sorted-by-offsetMs array of all retained events (`kind: "step" | "network" | "console"`). Slimmed-down entries for quick temporal scanning; full details remain in the source arrays
- `offsetMs` -- milliseconds since the attempt's `startTime`. Always present on steps (from `TestStep.startTime`). Optional on network entries (from trace `_monotonicTime` or `startedDateTime`, converted via the trace's context-options anchor) and console entries (from the trace monotonic time field + anchor). Absent when the trace lacks a context-options event. Entries without `offsetMs` are excluded from the timeline
- `tests[].attempts[].network[].traceId` -- promoted from the `x-datadog-trace-id` header for direct access
- `tests[].attempts[].network[]` -- max 200 per attempt, priority-based: fetch/xhr requests, error responses (status >= 400), failed, and aborted requests are retained over static assets (script, stylesheet, image, font). Includes failure details (`failureText`, `wasAborted`), redirect chain (`redirectToUrl`, `redirectFromUrl`, `redirectChain`), timing breakdown (`timings`), `durationMs` derived from available timing components, and allowlisted headers (`requestHeaders`, `responseHeaders`)
- `tests[].attempts[].network[].responseHeaders` -- includes `x-datadog-trace-id` and `x-datadog-span-id` when present (values capped to 256 chars)
- `tests[].attempts[].failureArtifacts` -- for failing/timed-out/interrupted attempts: `screenshotBase64` (base64-encoded screenshot, max 512KB), `videoPath` (first video attachment path). Omitted entirely when neither screenshot nor video is available
- `tests[].attachments[].path` -- relative to the Playwright `outputDir`
- `tests[].stdout` / `tests[].stderr` -- capped at 4KB with `[truncated]` marker

Classify the flake to narrow the search space:
| Category | Signal | Timeline Pattern |
|---|---|---|
| Test-state leakage | Retries or earlier tests leave auth, cookies, storage, or server state behind | attempts[] — different outcomes across retries |
| Data collision | "Random" identities aren't unique enough and collide with existing users/entities | errors[] — duplicate key or conflict errors |
| Backend stale data | API returned 200 but response body shows old state | step(action) → network(GET, 200) → step(assert) FAIL — API succeeded but data was stale |
| Frontend cache stale | No network request after navigation/reload for the relevant endpoint | step(reload) → step(assert) FAIL — no intervening network call for expected endpoint |
| Silent network failure | CORS, DNS, or transport error prevented the request from completing | step(action) → console(error: "net::ERR_FAILED") → step(assert) FAIL |
| Render/hydration bug | API returned correct data but component didn't render it | network(GET, 200, correct data) → step(assert) FAIL — no console errors |
| Environment / infra | Transient 5xx, timeouts, DNS/network instability | network entries with 5xx status; consoleMessages[] with connection errors |
| Locator / UX drift | Selector is valid but brittle against small UI changes | errors[] — locator/selector text in error message |
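The Signal and Timeline Pattern columns can be turned into a rough first-pass heuristic. The sketch below only illustrates the cheap signals and is no substitute for walking the timeline; it assumes the per-attempt fields described in the next phase (`error`, `network[]`, `consoleMessages[]`), and the exact `error` shape is an assumption:

```typescript
// classify-flake.ts: crude first-pass heuristic mirroring the table above (sketch only).
type AttemptLike = {
  error?: string | { message?: string };  // shape assumed; may be a string or an object
  network?: { status?: number; failureText?: string; wasAborted?: boolean }[];
  consoleMessages?: { type: string; text: string }[];
};

function roughClassify(attempt: AttemptLike): string {
  const msg = typeof attempt.error === "string" ? attempt.error : attempt.error?.message ?? "";
  const net = attempt.network ?? [];
  const consoles = attempt.consoleMessages ?? [];

  if (net.some((r) => (r.status ?? 0) >= 500)) return "environment / infra (5xx responses)";
  if (consoles.some((c) => /net::ERR_|CORS|Failed to fetch/i.test(c.text))) return "silent network failure";
  if (/duplicate key|already exists|conflict/i.test(msg)) return "data collision";
  if (/locator|selector|strict mode/i.test(msg)) return "locator / UX drift";
  // Stale backend data, stale frontend cache, and render/hydration bugs need the
  // timeline walk in the next phase; there is no cheap single-field signal for them.
  return "unclassified: walk the timeline";
}
```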
Use attempts[].timeline[] as the primary analysis view. The timeline is a unified, offsetMs-sorted array of all steps, network requests, and console entries. Walk it to reconstruct the exact event sequence around the failure:
step(click "Submit") → network(POST /api/orders, 201) → step(waitForURL /confirmation) → console(error: "Cannot read property...") → step(expect toBeVisible) FAILED
For each timeline entry:
kind: "step" — test action with title, category, durationMs, depth, optional errorkind: "network" — HTTP request with method, url, status, optional durationMs, resourceType, traceId, failureText, wasAbortedkind: "console" — browser message with type (warning/error/pageerror/page-closed/page-crashed) and textAll entries share offsetMs (milliseconds since attempt start), giving a single temporal view.
If you don't have passing and failing attempts for the same test, skip to 3c.
Walk the failed attempt's timeline and the passed attempt's timeline side-by-side to identify the first divergence point:
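One mechanical way to find that point is to reduce each timeline to a comparable signature and report the first index where the two sequences differ. A rough sketch under the same assumptions about the timeline shape as above; real timelines may reorder network events, so treat the result as a starting point, not a verdict:

```typescript
// first-divergence.ts: locate where a failing attempt's timeline departs from a passing one (sketch).
type Entry = { kind: "step" | "network" | "console"; title?: string; method?: string; url?: string; type?: string };

// Reduce an entry to a comparable signature, ignoring offsets and statuses that naturally vary.
function signature(e: Entry): string {
  if (e.kind === "step") return `step:${e.title}`;
  if (e.kind === "network") return `net:${e.method} ${e.url?.split("?")[0]}`;
  return `console:${e.type}`;
}

function firstDivergence(passed: Entry[], failed: Entry[]): void {
  const length = Math.max(passed.length, failed.length);
  for (let i = 0; i < length; i++) {
    const p = passed[i];
    const f = failed[i];
    if (p && f && signature(p) === signature(f)) continue;
    console.log(`First divergence at index ${i}`);
    console.log(`  passed: ${p ? signature(p) : "(no further events)"}`);
    console.log(`  failed: ${f ? signature(f) : "(no further events)"}`);
    return;
  }
  console.log("Timelines have the same shape; compare offsets and durations instead.");
}
```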
Common divergence patterns mirror the timeline patterns in the classification table above: a network request that appears only in the passing attempt, a console error that appears only in the failing one, or an assertion step that fires before the expected network call.
Filter tests[] for entries where status is "failed" or flaky is true. For each:
- `errors[]`: Contains clean error text with extracted assertion diffs and file/line location. This is usually enough to understand what went wrong.
- `location`: Source file, line, and column — jump straight to the code.
- `attempts[]`: Full retry history. Compare attempt outcomes, durations, and errors to see if the failure is consistent or intermittent.

Each attempt includes:

- `status` and `durationMs` — spot timing differences between passing and failing attempts
- `error` — failure reason per attempt (may differ across retries)
- `consoleMessages[]` — browser warnings/errors (only warning, error, pageerror, page-closed, page-crashed entries; capped at 2KB / 50 per attempt)
- `failureArtifacts` — for failed/timed-out/interrupted attempts:
  - `screenshotBase64` — base64-encoded failure screenshot (max 512KB). Decode and inspect this to see exactly what the page showed at failure time — often reveals modals, loading spinners, error banners, or unexpected navigation that the assertion text alone doesn't explain.
  - `videoPath` — path to video recording
- `network[]` — HTTP requests/responses for that attempt
- `timeline[]` — unified sorted event stream

The `network[]` array (on tests or individual attempts) includes:

- `method`, `url`, `status` — identify 4xx/5xx responses
- `timings` — detailed breakdown: dnsMs, connectMs, sslMs, sendMs, waitMs, receiveMs
- `durationMs` — total request duration derived from timing components
- `requestHeaders`, `responseHeaders` — allowlisted headers
- `redirectChain` — full redirect sequence
- `traceId` — Datadog trace ID extracted from the `x-datadog-trace-id` response header. When present near a failure, you must use references/datadog-apm-traces.md for backend correlation to bridge the gap between the frontend test failure and a potential backend root cause.

Network is capped at 200 entries per attempt, prioritized: fetch/xhr and error responses are retained over static assets. Header values are capped at 256 chars. If all 200 entries are static assets (script/stylesheet/font) with no API calls, the capture is saturated.
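When a failing attempt carries these artifacts, two quick extractions usually pay off: write the screenshot to disk so it can be viewed, and list failed or 4xx/5xx requests with their Datadog trace IDs. A sketch assuming the attempt fields above; the `.png` extension is an assumption, so adjust it to the actual screenshot format:

```typescript
// inspect-attempt.ts: dump the failure screenshot and suspicious requests for one attempt (sketch).
import { writeFileSync } from "node:fs";

type NetworkEntry = { method?: string; url?: string; status?: number; traceId?: string; failureText?: string };
type AttemptLike = {
  failureArtifacts?: { screenshotBase64?: string; videoPath?: string };
  network?: NetworkEntry[];
};

function inspectAttempt(attempt: AttemptLike, outPath = "/tmp/failure-screenshot.png"): void {
  // Decode the base64 screenshot so it can be opened in an image viewer.
  const shot = attempt.failureArtifacts?.screenshotBase64;
  if (shot) {
    writeFileSync(outPath, Buffer.from(shot, "base64"));
    console.log(`screenshot written to ${outPath}`);
  }

  // Surface failed or 4xx/5xx requests along with their Datadog trace IDs.
  for (const req of attempt.network ?? []) {
    if ((req.status ?? 0) < 400 && !req.failureText) continue;
    const trace = req.traceId ? ` (traceId: ${req.traceId})` : "";
    console.log(`${req.method} ${req.url} -> ${req.status ?? req.failureText}${trace}`);
  }
}
```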
`tests[].steps[]` provides a step-by-step breakdown of test actions with timing (`offsetMs`, `durationMs`, `depth`). Prefer the timeline view (3a), which interleaves steps with network and console. Use steps directly when you need the full hierarchy (nested steps via `depth`).
Do not propose a fix without concrete artifacts. At minimum, include:
- `tests[].errors[]` (assertion diff, timeout message) or a trace/log entry
- `tests[].network[]` or `attempts[].network[]` (response status, timing, headers)
- `tests[].location` to jump to the source
- `failureArtifacts.screenshotBase64` showing page state at failure
- `network[].traceId` showing backend behavior for the failing request

Rate your confidence in the root cause on a 1-5 scale. Report this score alongside your evidence.
| Score | Meaning | Criteria |
|---|---|---|
| 5 | Certain | Root cause is directly visible in artifacts (e.g., assertion diff shows stale data, network response confirms 5xx, screenshot shows error banner) |
| 4 | High confidence | Evidence strongly supports the diagnosis but one link in the chain is inferred rather than observed (e.g., timeline shows the right sequence but no Datadog trace to confirm backend behavior) |
| 3 | Moderate confidence | Evidence is consistent with the diagnosis but alternative explanations remain plausible. Flag the alternatives explicitly |
| 2 | Low confidence | Limited evidence, mostly reasoning from code patterns rather than observed artifacts. Recommend gathering more data before committing to a fix |
| 1 | Speculative | No direct evidence for the root cause. The fix is a best guess. Recommend reproducing the failure locally or adding instrumentation before proceeding |
If confidence is 2 or below, do not propose a code fix. Instead, recommend specific instrumentation or reproduction steps to raise confidence.
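If it helps keep findings consistent, the score and its supporting evidence can be captured in a small structure. The shape below is purely illustrative, not something the report format or this skill prescribes:

```typescript
// diagnosis.ts: illustrative shape for reporting a root-cause call with evidence (not prescribed by the skill).
interface Diagnosis {
  test: string;                        // test title or location
  category: string;                    // one of the flake categories from the classification table
  confidence: 1 | 2 | 3 | 4 | 5;       // rubric above; 2 or below means no code fix yet
  evidence: string[];                  // pointers: errors[] excerpt, network entry, screenshot path, traceId
  alternatives?: string[];             // at confidence 3, list the plausible competing explanations
  proposedFix?: "test-harness" | "product" | "both"; // decided in the next phase; omit at confidence <= 2
  nextSteps?: string[];                // instrumentation or reproduction steps when confidence is low
}
```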
Apply fixes in this order of priority:
1. Validate scenario realism first. Is the failure path possible for real users, or is it purely a test-setup artifact? If it is not user-realistic, prioritize test/data/harness fixes over product changes.
2. Test harness fix (when the failure is non-product).
3. Product fix (when real users would hit the same issue).
4. Both, if user impact exists and the tests are fragile.
Lint and type-check touched files
When documenting the fix in a PR or issue, use this structure: