From mthines-agent-skills
Analyzes Playwright trace.zip archives to diagnose flaky tests, slow selectors, network bottlenecks, and hung actions with evidence-ranked fixes.
Install: npx claudepluginhub mthines/agent-skills
Slash command: /mthines-agent-skills:playwright-trace-analyzer
Files bundled with this skill:
- references/flake-patterns.md
- references/performance-patterns.md
- rules/action-timing.md
- rules/confidence-loop.md
- rules/console-and-errors.md
- rules/flake-diagnosis.md
- rules/input-detection.md
- rules/measurement-methodology.md
- rules/network-analysis.md
- scripts/fetch-gh-run.mjs
- scripts/trace-diff.mjs
- scripts/trace-extract.mjs
- scripts/trace-summary.mjs
- templates/analysis-report.md
Turn a Playwright trace.zip into a ranked, evidence-backed report of
flakes, slow steps, and root causes.
Index file. Detailed extraction rules, analysis playbooks, and report templates live under
rules/, references/, and templates/. Load only what the current phase needs; the body of SKILL.md is a thin orchestrator.
The user passes one or more of:
| Input | Detection signal |
|---|---|
| GitHub Actions run URL | Matches `https://github.com/<owner>/<repo>/actions/runs/<id>`; fetch artifacts via `gh run download` |
| `trace.zip` archive | Magic bytes `50 4b 03 04`; entries include `trace.trace`, `trace.network`, `*.png`, `resources/` |
| Unpacked trace directory | Contains `trace.trace` + `trace.network` (NDJSON) and a `resources/` subdir |
| Single `trace.trace` JSONL stream | NDJSON; each line has `type`, `callId`, `startTime`, `params` (e.g. `before`, `action`, `after`) |
| Single `trace.network` JSONL stream | NDJSON; entries with `type: "resource-snapshot"` or `requestEvent` / `responseEvent` |
| `report.json` (Playwright reporter) | Top-level `config`, `suites`, `stats`; complementary, never authoritative for timing |
If the user passes a report.json plus a trace.zip, treat the report as
a high-level test status map and the trace as the source of truth for
timing and network data.
If a test-results/ directory is passed, scan for the most recent
trace.zip per failed test and process them in order of failure recency.
See rules/input-detection.md for the
precise detection logic and unpack recipe.
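The detection signals in the table can be sketched as a small classifier. This is an illustration only, not the logic in rules/input-detection.md: the function name `detectInput`, its return labels, and the idea of passing the first bytes of the file as `head` are assumptions made for the example.

```javascript
// Hypothetical sketch: classify an input by URL shape or by its leading bytes.
// `head` should hold at least the first line of the file for NDJSON sniffing.
const RUN_URL = /^https:\/\/github\.com\/[^/]+\/[^/]+\/actions\/runs\/\d+/;
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

function detectInput(input, head = Buffer.alloc(0)) {
  // GitHub Actions run URL: decided from the string alone.
  if (RUN_URL.test(input)) return "gh-run-url";

  // ZIP archive: local file header signature 50 4b 03 04.
  if (head.length >= 4 && head.subarray(0, 4).equals(ZIP_MAGIC)) {
    return "trace-zip";
  }

  // NDJSON stream: the first line must parse as a JSON object.
  const firstLine = head.toString("utf8").split("\n", 1)[0].trim();
  if (firstLine.startsWith("{")) {
    try {
      const parsed = JSON.parse(firstLine);
      if (parsed && typeof parsed === "object") return "ndjson";
    } catch {
      // Not valid JSON; fall through to "unknown".
    }
  }
  return "unknown";
}
```

Checking the URL first keeps the function usable before any file has been read; the byte checks only matter for local paths.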
Six phases. Do not skip a gate.
| Phase | Name | Rule file | Gate |
|---|---|---|---|
| 0 | Intake | rules/input-detection.md | Format detected, archive unpacked, trace.trace + trace.network parseable |
| 1 | Measurement frame | rules/measurement-methodology.md | Failure mode named (timeout, assertion, error, slow-but-passing) and primary metric chosen (action ms, total wall-clock, request count) |
| 2 | Hotspot extraction | rules/action-timing.md, rules/network-analysis.md, rules/console-and-errors.md | Top-N slow actions, top-N slow requests, error/console list — all with concrete numbers |
| 3 | Root-cause | rules/flake-diagnosis.md | Each hotspot mapped to a code-level cause (selector, locator, network call, app event) with file path or line where possible |
| 4 | Confidence gate | rules/confidence-loop.md | /confidence analysis ≥ 90% — else iterate (max 2 deep-dives) |
| 5 | Fix plan | templates/analysis-report.md | Report written with ranked fixes, expected impact, and verification plan |
Load on demand — do not preload.
Pass-vs-fail comparison (when given two traces of the same test):
scripts/trace-diff.mjs.
After the first pass at root-cause analysis, invoke the confidence skill
in analysis mode:
Skill(skill="confidence", args="analysis")
Apply this gate:
| Score | Action |
|---|---|
| ≥ 90% | Proceed to Phase 5 (fix plan). |
| 70–89% | Run one deeper pass: re-read the trace, expand the action's before/after snapshots, correlate with network. |
| < 70% | Surface the gap to the user with a question — do not propose changes on speculation. |
After two deep-dive iterations without reaching 90%, stop and present
findings as a hypothesis with the evidence required to confirm it. See
rules/confidence-loop.md.
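One way to read the gate above as code, as a minimal sketch: the function name `confidenceGate` and the returned labels are hypothetical, and the ordering of the checks (the two-deep-dive cap taking priority over the score bands) is an interpretation of the text rather than something rules/confidence-loop.md is known to specify.

```javascript
// Hypothetical sketch of the confidence gate.
// score: confidence percentage (0-100); deepDivesDone: deep-dive passes so far.
function confidenceGate(score, deepDivesDone) {
  if (score >= 90) return "fix-plan";                    // proceed to Phase 5
  if (deepDivesDone >= 2) return "present-hypothesis";   // cap reached: stop iterating
  if (score >= 70) return "deep-dive";                   // one deeper pass
  return "ask-user";                                     // surface the gap as a question
}
```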
- […] "`page.click('text=Save')` waited 4,820ms across 3 attempts before the button became actionable" is.
- […] `before` and `after` DOM snapshots and every poll attempt; read them.
- […] `beforeEach` of 40 tests.
- If `/confidence` returns < 90%, dig deeper or admit uncertainty. Do not paper over a weak diagnosis with a confident-sounding fix.
- […] `page.waitForTimeout(N)` without measuring the underlying race condition.
- […] `text=` for `getByRole(...)` without checking whether the failure was selector resolution or actionability (visibility, pointer-events, animation).
- […] `auto-wait` delays (death-by-a-thousand-cuts is the common case in real suites).

node <skill_dir>/scripts/fetch-gh-run.mjs https://github.com/<owner>/<repo>/actions/runs/<id> [--out <dir>]
The script uses the gh CLI (gh run download) to fetch every artifact
whose name matches Playwright conventions (playwright-report*,
playwright-traces*, test-results*, *-traces, *-trace), unpacks
nested ZIPs, and writes a manifest of all trace.zip files discovered,
grouped by failed test where possible. Then continue with the unpacked
flow below.
If gh is not installed or unauthenticated, ask the user to download the trace artifact manually from the Actions run page and provide the local path — the trace.zip flow below is unaffected.
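For readers who want to see how a run URL maps onto a `gh run download` invocation, here is a rough sketch. It is not the implementation of scripts/fetch-gh-run.mjs: the function name `ghDownloadArgs` and the default output directory `gh-artifacts` are invented for the example, and artifact-name filtering is left out.

```javascript
// Hypothetical sketch: turn a GitHub Actions run URL into `gh` CLI arguments.
function ghDownloadArgs(runUrl, outDir = "gh-artifacts") {
  const match =
    /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/actions\/runs\/(\d+)/.exec(runUrl);
  if (!match) {
    throw new Error(`Not a GitHub Actions run URL: ${runUrl}`);
  }
  const [, owner, repo, runId] = match;
  // Equivalent to: gh run download <id> --repo <owner>/<repo> --dir <outDir>
  return ["run", "download", runId, "--repo", `${owner}/${repo}`, "--dir", outDir];
}
```

The returned array can be handed to `child_process.execFile("gh", args, ...)`, which avoids shell quoting problems with unusual repository names.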
trace.zip
Unpack and index.
node <skill_dir>/scripts/trace-extract.mjs <path/to/trace.zip> [--out <dir>]
Writes a normalised trace.trace.jsonl, trace.network.jsonl, and a
manifest of resources/snapshots into <dir> (defaults to a sibling
<name>.unpacked/).
Run the summary.
node <skill_dir>/scripts/trace-summary.mjs <dir>
Prints: total wall-clock, top-N slow actions, top-N slow requests, console errors, page errors, and the failing-action stack trace if present.
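The top-N slow actions part of the summary could be computed along these lines. This is a sketch under stated assumptions, not the code in scripts/trace-summary.mjs: it assumes each action appears as a `before` event carrying `startTime` and a matching `after` event carrying `endTime`, paired by `callId`, and that the action label lives in `apiName` or `method`. Those field names may differ between Playwright trace versions.

```javascript
// Hypothetical sketch: rank actions by duration from a trace.trace NDJSON string.
// Assumes before/after events paired by callId, with startTime and endTime in ms.
function topSlowActions(ndjson, n = 5) {
  const started = new Map();
  const actions = [];
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of aborting the summary
    }
    if (event.type === "before") {
      started.set(event.callId, event);
    } else if (event.type === "after" && started.has(event.callId)) {
      const begin = started.get(event.callId);
      actions.push({
        callId: event.callId,
        name: begin.apiName ?? begin.method ?? "unknown",
        ms: event.endTime - begin.startTime,
      });
    }
  }
  // Slowest first, truncated to the requested count.
  return actions.sort((a, b) => b.ms - a.ms).slice(0, n);
}
```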
(Optional) Diff a passing trace against a failing trace.
node <skill_dir>/scripts/trace-diff.mjs <pass-dir> <fail-dir>
Surfaces actions that diverge in duration, requests present in one but not the other, and the first action where the two timelines fork.
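Finding the first action where two timelines fork might look like the sketch below. It is not taken from scripts/trace-diff.mjs: the function name `firstFork`, the `{ name, ms }` action shape, and the 500 ms default tolerance are all assumptions chosen for illustration.

```javascript
// Hypothetical sketch: find the first index where a passing and a failing
// action list diverge, either by action name or by duration beyond a tolerance.
function firstFork(passActions, failActions, toleranceMs = 500) {
  const shared = Math.min(passActions.length, failActions.length);
  for (let i = 0; i < shared; i++) {
    const pass = passActions[i];
    const fail = failActions[i];
    if (pass.name !== fail.name) {
      return { index: i, reason: "different-action", pass, fail };
    }
    if (Math.abs(pass.ms - fail.ms) > toleranceMs) {
      return { index: i, reason: "duration", pass, fail };
    }
  }
  // One run has extra actions after the shared prefix.
  if (passActions.length !== failActions.length) {
    return {
      index: shared,
      reason: "length",
      pass: passActions[shared] ?? null,
      fail: failActions[shared] ?? null,
    };
  }
  return null; // timelines match within tolerance
}
```

A positional comparison like this is only meaningful when both traces come from the same test, as the diff step requires.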
Map suspects to source. Use
rules/flake-diagnosis.md Phases 3–4 to
go from action callId → test file/line (Playwright trace events embed
location: { file, line, column }).
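Going from a `callId` to a source position can be sketched as a simple lookup. This assumes, following the text above, that the relevant trace event carries a `location` object with `file`, `line`, and `column`; the function name `locateAction` and the example path in the usage note are hypothetical.

```javascript
// Hypothetical sketch: resolve a callId to "file:line:column" using the
// location object described above. Returns null when no location is recorded.
function locateAction(events, callId) {
  const event = events.find((e) => e.callId === callId && e.location);
  if (!event) return null;
  const { file, line, column } = event.location;
  return `${file}:${line}:${column}`;
}
```

For an event such as `{ callId: "c2", location: { file: "tests/save.spec.ts", line: 42, column: 7 } }`, the lookup yields a string most editors can open directly.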
The full extraction methodology (capture protocol, how to interpret the
network log, common flake shapes) is in
rules/flake-diagnosis.md. Don't preload
it; load it only when an input is detected.
- […] `dur` (ms).
- […] `responseEnd - requestStart` (ms) and status.
- […] `location` in the trace event), or to an app file when the cause is in product code.
- […] `/confidence analysis` reached ≥ 90% (or two deep-dives recorded with the remaining uncertainty surfaced to the user).
- […] `templates/analysis-report.md`, with ranked fixes, expected ms saved, and a re-run verification step.
- […] `--trace=on`, compare).