From posthog
Triages PostHog Visual Review runs gating PR merges with screenshot regressions from Storybook and Playwright. Assesses real vs flaky diffs, status, history, and backlog.
npx claudepluginhub anthropics/claude-plugins-official --plugin posthog
This skill uses the workspace's default tool permissions.
Visual Review is PostHog's screenshot-regression product: CI captures storybook + playwright screenshots,
diffs them against committed baseline hashes, and gates the PR until a human approves the visible changes.
A PR with visual changes carries a visual-review GitHub status check that stays red until each diffed
snapshot is approved or tolerated in the VR UI.
This skill teaches an agent how to answer the questions a human reviewer would actually ask, by chaining
the read-only VR MCP tools — instead of reaching for gh pr view and tab-hopping to the VR web UI.
Trigger this skill on any of:
- `visual-review` GitHub check or a PR comment from the posthog-bot mentioning visual review.

When the user asks for the rendered diff image itself, the VR web UI is faster; direct them there. This skill is for everything around the diff: status, scope, history, triage.
All read-only. None of these require write scopes; approval/toleration still happens in the web UI.
| Tool | Purpose |
|---|---|
| `posthog:visual-review-runs-list` | List runs, filter by `pr_number` / `commit_sha` / `branch` / `review_state`. Start here. |
| `posthog:visual-review-runs-retrieve` | Full detail for a single run (status, summary counts, supersession). |
| `posthog:visual-review-runs-snapshots-list` | Per-snapshot results inside a run: identifier, result, diff %, classification, baseline + current artifact URLs. |
| `posthog:visual-review-runs-snapshot-history-list` | A single story's last N runs across master/PRs; this is the flake check. |
| `posthog:visual-review-runs-counts-retrieve` | Aggregate counts for queue triage (how many runs in `needs_review`, etc.). |
| `posthog:visual-review-runs-tolerated-hashes-list` | Hashes the team has explicitly accepted as "known flake / acceptable variation". |
| `posthog:visual-review-repos-list` | Repos (one per GitHub repo). Usually only one matters; useful for filtering. |
| `posthog:visual-review-repos-retrieve` | Repo metadata: baseline file paths, PR-comment configuration. |
These appear in tool output and matter for interpretation:
- `review_state`: `needs_review` (open, awaiting human), `clean` (zero diffs), `processing` (CI still uploading), `stale` (a newer run on the same PR has superseded this one; check `superseded_by_id`).
- `run_type`: `storybook` (component snapshots) or `playwright` (full-page e2e snapshots).
- `result`: `unchanged`, `changed` (real diff), `new` (no baseline yet), `removed`.
- `classification_reason`: `tolerated_hash` (matches a known-tolerated hash, no action needed), `below_threshold` (under the noise floor), `exact` (byte-identical), `""` (real diff requiring review).
- `review_state`: `pending` or `approved`.
- `summary`: `total` / `changed` / `new` / `removed` / `unchanged` / `unresolved` / `tolerated_matched`; `unresolved` is what's actually blocking review.

The single most common job. Map a PR number to its run state in two calls.
1. `posthog:visual-review-runs-list { pr_number: <n>, limit: 5 }`: sort by `created_at` desc, take the latest non-stale one.
2. If `summary.changed > 0` or `summary.unresolved > 0`, drill in: `posthog:visual-review-runs-snapshots-list { id: <run_id> }` and report the changed snapshots.

Report back: PR number, run UUID, `review_state`, summary counts, and the `_posthogUrl` deep link so the
user can click straight to the diff viewer.
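
The selection logic in these two calls can be sketched in Python. This is a minimal sketch, not part of the skill itself: it assumes the output of the list tool has already been parsed into dictionaries carrying the `id`, `review_state`, `created_at`, and `summary` fields named in this document, and that `created_at` is an ISO 8601 string. The helper names `pick_latest_run` and `needs_drill_in` are hypothetical.

```python
from datetime import datetime


def pick_latest_run(runs):
    """Return the newest run that is not stale, or None if there is none."""
    live = [r for r in runs if r["review_state"] != "stale"]
    if not live:
        return None
    # Assumption: created_at is an ISO 8601 timestamp string.
    return max(live, key=lambda r: datetime.fromisoformat(r["created_at"]))


def needs_drill_in(run):
    """True when the summary reports changed or unresolved snapshots."""
    summary = run.get("summary", {})
    return summary.get("changed", 0) > 0 or summary.get("unresolved", 0) > 0
```

When `needs_drill_in` returns true, the next step is the snapshots-list call for that run's `id`; otherwise the run state and summary counts are enough to report.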
The most useful judgment a code-aware agent can add. Combine three signals: scope match, flake history, and the actual rendered images. The agent should look at the screenshots — not just describe metadata.
Scope check: `git diff master...HEAD --stat` (or against the PR's base branch) → list of touched paths.
Cross-reference with `posthog:visual-review-runs-snapshots-list { id }` filtered to `result: changed` → story identifiers.
Stories are namespaced like `<area>-<scene>--<story>--<theme>`; e.g. `scenes-app-settings-user--settings-user-profile--dark`
maps to `frontend/src/scenes/settings/user/...`. Use this to translate story id → likely source path.
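
The translation from story id to likely source path can be approximated with a token-overlap check. The sketch below is an illustration under stated assumptions rather than the product's actual mapping: it splits the identifier on `--`, then treats a touched path as in scope when at least two of the prefix tokens appear as path segments. The function names and the threshold of two are hypothetical choices.

```python
def parse_story_id(identifier):
    """Split '<area>-<scene>--<story>--<theme>' into its three '--' separated parts."""
    prefix, story, theme = identifier.rsplit("--", 2)
    return {"prefix": prefix, "story": story, "theme": theme}


def scope_overlap(identifier, touched_paths, min_shared=2):
    """Return touched paths that share at least min_shared segments with the story prefix."""
    tokens = set(parse_story_id(identifier)["prefix"].split("-"))
    matches = []
    for path in touched_paths:
        segments = set(path.lower().split("/"))
        # Assumption: prefix tokens roughly mirror directory names in the repo.
        if len(tokens & segments) >= min_shared:
            matches.append(path)
    return matches
```

An empty result for a changed story suggests the diff is out of scope for the PR, which is one of the signals to weigh rather than a verdict on its own.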
Visual inspection: for each changed snapshot, the tool result contains `current_artifact.download_url`
and `baseline_artifact.download_url`. These are pre-signed S3 URLs to PNG files; pull them and look:
curl -s -o /tmp/vr-baseline.png "<baseline_artifact.download_url>"
curl -s -o /tmp/vr-current.png "<current_artifact.download_url>"
Then `Read` both files (the Read tool renders images visually) and compare. Things to call out:
- Image dimensions (the `width` / `height` fields). Mismatched dimensions usually mean the story rendered to a different viewport or didn't fully render before the screenshot: a flake signal, not a regression.

Flake history: run the flake check below for any story that looks suspect.
Verdict: combine all three signals.
Always include a one-line description of what you saw in the images — the user uses this to decide whether to trust your verdict without opening the VR UI themselves.
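
The dimension comparison can also be cross-checked directly on the downloaded files. The following is a minimal sketch that reads width and height from the PNG header using only the standard library; it relies on the PNG layout (an 8-byte signature followed by the `IHDR` chunk) and is not tied to any Visual Review API. The function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk at the start of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", data[16:24])


def dimensions_differ(baseline_bytes, current_bytes):
    """True when the two PNGs were rendered at different sizes."""
    return png_dimensions(baseline_bytes) != png_dimensions(current_bytes)
```

With the files fetched by the `curl` commands above, the inputs would be the bytes of `/tmp/vr-baseline.png` and `/tmp/vr-current.png`, each read in binary mode.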
Once you have a suspect snapshot identifier:
`posthog:visual-review-runs-snapshot-history-list { id: <snapshot_id> }` → returns prior outcomes for the same story.
Verdicts:
- Prior runs `unchanged` and this run's diff is the outlier → likely a real regression caused by this PR.
- `changed` across unrelated branches/master → flaky story; recommend tolerating the hash via the UI.
- `removed` or large-jump dimension change → baseline likely stale; recommend re-baselining on master.

When the user is doing housekeeping rather than asking about a specific PR:
1. `posthog:visual-review-runs-counts-retrieve` → total queue size.
2. `posthog:visual-review-runs-list { review_state: needs_review, limit: 50 }` (paginate if needed).
3. Group by `branch` author or `run_type` to surface clusters (e.g., "12 PRs blocked on the same shared component change" usually means a single underlying root cause to address).
4. Prioritize runs with `summary.changed > 0` over runs that are only `new`: `new` means no baseline yet, which is usually trivial to approve; `changed` is the real review work.

For PR-status questions, lead with the verdict in one line, then 2-4 bullets of supporting context. Always
include the `_posthogUrl` deep link to the run: humans need to see the rendered images to make the call,
and the agent can only describe the metadata.
For triage / aggregate questions, a short table beats prose. Group by what the user is going to act on.
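
One way to turn a backlog listing into such a table is sketched below. It assumes each run dictionary exposes `branch`, `run_type`, and a `summary` with `changed` and `new` counts, as the filters and fields in this document suggest; the grouping key, column set, and function names are hypothetical choices.

```python
from collections import defaultdict


def cluster_runs(runs):
    """Group runs by (branch, run_type) and rank groups with changed snapshots first."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run["branch"], run["run_type"])].append(run)

    rows = []
    for (branch, run_type), members in groups.items():
        rows.append({
            "branch": branch,
            "run_type": run_type,
            "runs": len(members),
            "changed": sum(r["summary"].get("changed", 0) for r in members),
            "new": sum(r["summary"].get("new", 0) for r in members),
        })
    # Groups with changed snapshots come first; groups with none sink to the bottom.
    rows.sort(key=lambda row: (row["changed"] == 0, -row["changed"], row["branch"]))
    return rows


def render_table(rows):
    """Render the ranked groups as a Markdown table."""
    lines = ["| Branch | Run type | Runs | Changed | New |", "|---|---|---|---|---|"]
    for row in rows:
        lines.append(
            f"| {row['branch']} | {row['run_type']} | {row['runs']} | {row['changed']} | {row['new']} |"
        )
    return "\n".join(lines)
```

Sorting `changed` ahead of `new`-only groups mirrors the prioritization described for backlog triage: the rows at the top are the ones that need real review work.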
- Include the `_posthogUrl`.
- When the `visual-review` check is red on a PR you're working on, that's the trigger to run this skill.
- For `result: changed`, pull the baseline and current PNGs and look at them; metadata can only say "something changed", not whether the change is intended.