From arn-spark
This skill should be used when the user says "visual strategy", "arn visual strategy", "visual testing", "visual regression", "screenshot testing", "compare to prototype", "visual validation", "how do I test visuals", "set up visual tests", "baseline images", "screenshot comparison", "pixel diff", "visual diff", "does it match the prototype", or wants to set up visual regression testing for development — creating capture scripts, comparison scripts, and baseline images so that feature implementations are automatically compared against prototype screenshots to catch visual regressions during development.
`npx claudepluginhub appsvortex/arness --plugin arn-spark`

This skill uses the workspace's default tool permissions.
Set up automated visual regression testing so that during feature development, each implemented screen can be compared pixel-by-pixel against the approved prototype screenshots. The prototype screenshots serve as baseline images — the "gold standard" of what the UI should look like. As features are built, capture scripts take screenshots of the development build and comparison scripts diff them against these baselines, catching layout breaks, color mismatches, and misplaced elements before they reach the user.
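The comparison idea can be sketched in a few lines. This is a minimal, dependency-free illustration, not the generated compare script (which would decode real PNG files first, e.g. with a library like pngjs): it diffs two same-size RGBA pixel buffers and reports the fraction of pixels that changed.

```javascript
// Diff two same-size RGBA buffers; return the fraction of differing pixels.
// `tolerance` is the per-pixel RGB delta below which a pixel counts as equal.
function diffRgba(baseline, capture, width, height, tolerance = 0) {
  let changed = 0;
  for (let i = 0; i < width * height; i++) {
    const o = i * 4;
    const delta =
      Math.abs(baseline[o] - capture[o]) +         // R
      Math.abs(baseline[o + 1] - capture[o + 1]) + // G
      Math.abs(baseline[o + 2] - capture[o + 2]);  // B (alpha ignored)
    if (delta > tolerance) changed++;
  }
  return changed / (width * height); // 0.0 = identical, 1.0 = every pixel differs
}

// Two 2x1 "images": first pixel identical, second pixel green vs. blue.
const base = Uint8Array.from([255, 0, 0, 255, 0, 255, 0, 255]);
const cap  = Uint8Array.from([255, 0, 0, 255, 0, 0, 255, 255]);
console.log(diffRgba(base, cap, 2, 1)); // → 0.5 (one of two pixels changed)
```

A real diff threshold (see the `Diff threshold` field later in this doc) works the same way: the run passes when this ratio stays under the configured percentage.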
This is a conversational skill that runs in normal conversation (NOT plan mode). It uses the arn-spark-visual-test-engineer agent for proof-of-concept validation and script generation.
The primary artifacts are capture scripts, comparison scripts, and baseline images, plus integration so that /arn-code-execute-plan and /arn-code-execute-task automatically validate UI changes.

The core problem this solves: during feature development, visual regressions (a button on the wrong side, layout breaks, color mismatches) go undetected until the user manually inspects the application. This skill sets up the tooling so that every UI task automatically compares the development build against the prototype.
Read the project's CLAUDE.md for a ## Arness section. If no ## Arness section exists or Arness Spark fields are missing, inform the user: "Arness Spark is not configured for this project yet. Run /arn-brainstorming to get started — it will set everything up automatically." Do not proceed without it.
Extract:
- Vision directory (`.arness/vision`)
- Prototypes directory (`.arness/prototypes`)
- Spikes directory (`.arness/spikes`) -- for mini-spike workspaces

Check for prototype lock (strongly recommended):
- A `### Prototype Lock` subsection in the `## Arness` section
- A `LOCKED.md` manifest

If the prototype is not locked, recommend: "For stable baselines, run /arn-spark-prototype-lock first."

Check for prototype validation evidence:
- `[prototypes-dir]/clickable/final-report.md`
- `[prototypes-dir]/static/final-report.md`
- `[prototypes-dir]/clickable/v[N]/showcase/screens/`, journey screenshots, static showcase screenshots

Check for architecture vision (required for stack analysis):
Read architecture-vision.md for framework, application type, and platform targets.

Check for dev-setup document (for environment constraints):
Read dev-setup.md for development environment type, platforms, and CI configuration.

If no architecture vision: "No architecture vision found. Describe your technology stack and target platforms so I can design a visual testing strategy."
If no prototype screenshots: "No prototype screenshots found. Visual testing needs reference images. Either run the prototype skills first, or provide screenshots manually."
Load context from architecture vision, dev-setup, and the current environment. Detect the current OS via uname. Build a constraints profile:
"Here is what I understand about your stack and environment:
- Application type: [Browser app / Tauri desktop / Electron desktop / etc.]
- UI framework: [SvelteKit / React / Vue / etc.]
- Rendering context: [Browser viewport / Webview in native frame / Native window with transparency / etc.]
- Platform targets: [Linux, macOS, Windows]
- Development environment: [Native / WSL2 / Dev container / etc.]
- Current OS: [detected via uname]
Key constraints:
Is this accurate? Anything to add or correct?"
Wait for user confirmation.
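The uname-based detection mentioned above can be sketched as a small helper. This is illustrative only (not part of the generated tooling): the function and its category names are assumptions, though the "microsoft" substring is the conventional marker of a WSL kernel in `uname -sr` output.

```javascript
// Classify the development environment from `uname -sr` output.
// In a real script you would feed it e.g.
//   execSync("uname -sr").toString()   (from node:child_process)
function classifyUname(unameSr) {
  const s = unameSr.toLowerCase();
  if (s.includes("microsoft")) return "WSL2";          // WSL kernels embed "microsoft"
  if (s.startsWith("darwin")) return "macOS";
  if (s.startsWith("linux")) return "Linux (native)";
  return "Unknown";
}

console.log(classifyUname("Linux 5.15.90.1-microsoft-standard-WSL2")); // → WSL2
console.log(classifyUname("Darwin 23.4.0"));                           // → macOS
```

The WSL2 case matters most here: it is exactly the environment where native window capture cannot be validated locally (see the deferred-layer handling below).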
Based on the constraints profile, propose a multi-layer strategy. Read the strategy layers guide:
Read
${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/strategy-layers-guide.md
Match the project's application type against the Layer Decision Matrix to determine the recommended layers. Present the layered approach:
"Based on your stack, here is a layered visual testing strategy:
Layer 1: [Name] (Recommended first)
Layer 2: [Name] (Optional, fills gaps)
[Layer 3 if applicable]
My recommendation: Start with Layer 1. It catches 80-90% of visual regressions (layout, component rendering, color, typography) with minimal infrastructure. Add Layer 2 when you need to validate native integration or transparency.
Ask (using AskUserQuestion):
"Which layers do you want to set up?"
Options (based on the layers presented above, e.g.):
Journey interaction detection (Layer 2 only):
If Layer 2 is selected, check whether journey interaction testing is appropriate:
If any of these conditions apply, ask the user:
Inform the user: "Layer 2 can capture static screenshots (current behavior) or walk through the app like a user using journey-based interaction testing (via UI automation APIs). Journey mode captures screenshots at each step of a user flow — login, navigation, form submission, etc."
Ask (using AskUserQuestion):
"Do you want to enable journey interaction for Layer 2?"
Options:
Record the user's choice:
- Yes: record `Interaction: journey` for Layer 2
- No: record `Interaction: static` for Layer 2
- If the user is unsure, default to `static` and note that they can upgrade later via arn-spark-visual-readiness

For each selected layer that cannot be validated in the current environment (e.g., native capture from WSL2, CI-only capture):
"Layer [N] ([Name]) cannot be validated in this environment. When should it be activated?
Examples:
Activation criteria for Layer [N]:"
Wait for user input. Record the activation criteria text for each deferred layer.
Map prototype screenshots to visual test baselines. Scan for available screenshots:
- `[prototypes-dir]/clickable/v[N]/showcase/screens/`
- `[prototypes-dir]/clickable/v[N]/journeys/`
- `[prototypes-dir]/static/v[M]/showcase/`
- `[prototypes-dir]/locked/clickable-v[N]/`

If using a locked prototype, prefer those screenshots as they are guaranteed stable.
Present the baseline sources:
"I found these prototype screenshots that can serve as baselines:
From clickable prototype showcase (v[N]):
| Screen | Route | Screenshot | Baseline category |
|---|---|---|---|
| [Name] | [/route] | [path] | Layout reference |
| ... | ... | ... | ... |
From journey captures (v[N]):
| Journey | Step | Screenshot | Baseline category |
|---|---|---|---|
| [Name] | [step] | [path] | Flow reference |
| ... | ... | ... | ... |
From static prototype showcase (v[M]):
| Section | Component | Screenshot | Baseline category |
|---|---|---|---|
| [Name] | [variants] | [path] | Component reference |
| ... | ... | ... | ... |
Ask (using AskUserQuestion):
"How should I organize the baselines?"
Options:
For each selected layer, run a proof-of-concept. This follows the arn-spark-spike pattern: one mini-spike per layer.
IMPORTANT: Run spikes sequentially, one at a time. Do NOT launch multiple visual-test-engineer agents in parallel or in the background. The agent needs Bash and Write tool access, which requires user permission approval. Parallel or background agents cannot surface permission prompts to the user, causing all tool calls to be denied.
"I will validate each testing layer with a mini-spike to confirm the capture and comparison tooling works in your environment:
Spike: Layer 1 -- [Layer Name]
The spike validates the tooling, not the prototype. The prototype is just a convenient target because it already has known-good screenshots to compare against.
Ready to proceed?"
For each layer:
Read the spike checklist:
Read
${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/spike-checklist.md
Read the capture script template:
Read
${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/baseline-capture-script-template.js
Determine the spike workspace: [spikes-dir]/visual-strategy-spike-layer-[N]/
Invoke the arn-spark-visual-test-engineer agent (foreground, not background) with:
Additional context for journey interaction spikes:
If the layer has Interaction: journey, provide additional context to the arn-spark-visual-test-engineer agent:
- `${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/journey-schema.md`

The spike must validate journey readiness in addition to the standard Layer 2 spike criteria. See spike-checklist.md in this directory for the "Layer 2 -- Journey Interaction" checklist.
Wait for agent to complete fully before proceeding.
Present results and record layer status:
- Passed: record `status: active, validated: [date]`
- Passed with caveats: record `status: active, validated: [date], caveats: [list]`
- Failed: record `status: failed` -- ask the user whether to retry, adjust the approach, or drop the layer
- Cross-environment (deferred): tell the user "The spike scripts are in [workspace] for manual validation on the target OS." Record `status: deferred, activation_criteria: [from Step 2], deferred_reason: [evidence from spike]`

Proceed to the next layer only after presenting results.
After validating layers, invoke arn-spark-visual-test-engineer again (foreground) with:
The agent produces:
- Capture script (`scripts/visual-test-capture.mjs`) -- Playwright script that navigates to each screen of the development build (via the dev server), captures screenshots at specified viewport sizes, and saves them to visual-tests/captures/. This runs against the dev build, NOT the prototype.
- Compare script (`scripts/visual-test-compare.mjs`) -- loads development captures and prototype baselines, computes a pixel diff, generates a diff report with highlighted differences, and reports pass/fail per screen.
- Baseline setup script (`scripts/visual-test-baseline.mjs`) -- copies prototype screenshots to the baseline directory with proper naming (one-time setup).
- Cross-environment script (`scripts/visual-test-cross-env.sh`) -- WSL2-to-Windows build+capture+copy pipeline, or equivalent.

Present the scripts to the user before writing to the project.
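For orientation, a Layer 1 capture script might look roughly like the following. This is a hedged sketch, not the agent's actual output: it assumes Playwright is installed and a dev server is running, and the screen list, `DEV_URL` default, and `captureName` helper are all illustrative.

```javascript
// Sketch of a Playwright capture script (illustrative screens and URL).
const DEV_URL = process.env.DEV_URL || "http://localhost:5173";
const SCREENS = [
  { name: "dashboard", route: "/" },
  { name: "settings", route: "/settings" },
];

// Filename convention, theme-aware to match the baseline naming scheme.
function captureName(screen, theme) {
  return `${screen}--${theme}.png`;
}

async function main() {
  const { chromium } = await import("playwright"); // loaded lazily at run time
  const browser = await chromium.launch();
  const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
  for (const { name, route } of SCREENS) {
    for (const theme of ["light", "dark"]) {
      await page.emulateMedia({ colorScheme: theme });
      await page.goto(DEV_URL + route, { waitUntil: "networkidle" });
      await page.screenshot({
        path: `visual-tests/captures/${captureName(name, theme)}`,
      });
    }
  }
  await browser.close();
}

// Only launch a browser when executed as the capture script itself.
if (process.argv[1] && process.argv[1].endsWith("visual-test-capture.mjs")) {
  main().catch((err) => { console.error(err); process.exit(1); });
}
```

The compare script would then pair each `visual-tests/captures/*.png` with the baseline of the same name and diff them against the configured threshold.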
Invoke arn-spark-visual-test-engineer (foreground) with:
- Baseline directory (`visual-tests/baselines/`)

The agent:

- Copies the screenshots into place with theme-aware naming (`screen-name--light.png`, `screen-name--dark.png`)
- Writes a `baseline-manifest.json` mapping screen names to baseline file paths

Check if the project uses Git (from the ## Arness config or by checking for .git/). If not configured, skip this step silently.
If Git is configured:
Read .gitignore and check which paths are already covered. Then present:

"The visual testing strategy created these paths. Which should be excluded from Git?
| Path | Type | Recommendation |
|---|---|---|
| [path] | [ephemeral / shared] | [ignore / track] |
| ... | ... | ... |
Ephemeral paths (captures, diffs, spike workspaces) are regenerated on every run and are typically ignored. Shared paths (baselines, scripts, manifests) are reference artifacts the team needs and are typically tracked.
Want to proceed with these recommendations, or adjust?"
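If the user accepts the typical recommendations, the resulting ignore block might look like this sketch (the exact paths should match what the strategy actually created):

```gitignore
# Visual testing
visual-tests/captures/
visual-tests/diffs/
.arness/spikes/visual-strategy-spike-*/
# baselines/, scripts, and manifests stay tracked (shared team artifacts)
```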
Add the agreed paths to .gitignore under a `# Visual testing` comment block.

Present integration options:
Ask (using AskUserQuestion):
"How should visual tests integrate with your workflow?"
Options:
- Manual: run `node scripts/visual-test-capture.mjs && node scripts/visual-test-compare.mjs` when you want to check
- npm script: add `visual-test` to package.json scripts (Recommended)
- Arness pipeline: hook into /arn-code-execute-task completion verification

Based on the user's choice, configure the integration:
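If the npm-script integration is chosen, the package.json additions might look like the following sketch (script names as described in this document; the `&&` chaining of capture and compare is an assumption):

```json
{
  "scripts": {
    "visual-test:capture": "node scripts/visual-test-capture.mjs",
    "visual-test:compare": "node scripts/visual-test-compare.mjs",
    "visual-test": "npm run visual-test:capture && npm run visual-test:compare"
  }
}
```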
"visual-test:capture", "visual-test:compare", and "visual-test" (runs both) to package.json scripts/arn-spark-dev-setup first.Read the template:
Read
${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/visual-strategy-template.md
Populate the template with all collected information:
Write to [vision-dir]/visual-strategy.md.
Add or update a ### Visual Testing subsection in the ## Arness section of the project's CLAUDE.md:
### Visual Testing
- **Strategy doc:** [vision-dir]/visual-strategy.md
- **Baseline directory:** visual-tests/baselines/
- **Screen manifest:** visual-tests/screen-manifest.json
- **Capture script:** scripts/visual-test-capture.mjs
- **Compare script:** scripts/visual-test-compare.mjs
- **Layers:** [Layer 1 name, Layer 2 name, ...]
- **Diff threshold:** [N]% (pixel difference tolerance)
- **Integration:** [manual / npm-script / ci / arness-pipeline]
During feature development, the capture script runs against the development build (dev server) and the comparison script diffs those captures against the prototype baselines. This catches visual regressions as features are implemented. To update baselines after intentional design changes, run `[baseline update command]`.
[For each additional layer beyond Layer 1, whatever its status:]
#### Layer [N]: [Name]
- **Status:** [active / deferred]
- **Capture script:** [path]
- **Compare script:** [path]
- **Baseline directory:** [path]
- **Diff threshold:** [N]%
- **Requires dev server:** [yes / no]
- **Activation criteria:** [free-text condition, or "N/A" if active]
- **Environment:** [required OS/platform]
- **Spike result:** [Validated: evidence / Deferred: reason]
- **Interaction:** static | journey
- **Journey manifest:** [path to journey-manifest.json] (only if Interaction: journey)
- **Journey runner:** [path to platform-specific runner script] (only if Interaction: journey)
Note: The **Interaction:** field distinguishes between static screenshot capture (default) and journey-based interaction testing. When set to `journey`, the **Journey manifest:** and **Journey runner:** fields specify the paths to the auto-generated journey definitions and platform runner script. These are generated by arn-spark-visual-test-engineer during the spike or readiness activation. If the **Interaction:** field is absent, `static` is assumed (backward compatible).
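For orientation only, a journey manifest might look something like this hypothetical sketch. The real schema lives in journey-schema.md, so every field name and value here is an assumption:

```json
{
  "journeys": [
    {
      "name": "login",
      "steps": [
        { "action": "goto", "target": "/login", "screenshot": "login--01-start.png" },
        { "action": "fill", "target": "#email", "value": "user@example.com" },
        { "action": "click", "target": "button[type=submit]", "screenshot": "login--02-submitted.png" }
      ]
    }
  ]
}
```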
Top-level fields (no #### subsection) are implicitly Layer 1 and are always active. Skills that are not layer-aware continue reading top-level fields unchanged.
"Visual testing strategy configured.
Strategy: [layers summary]
Baselines: [N] screen baselines + [M] journey baselines from prototype v[X]
Scripts: [list with paths]
Spike results: Layer 1: [result]. Layer 2: [result].
Files created/updated:
CLAUDE.md updated with ### Visual Testing configuration.
Recommended next steps:
- Run /arn-spark-feature-extract to build the backlog
- Run /arn-planning to begin the development pipeline. Arness auto-configures on first use.

[If any layers were deferred:]
Deferred layers: [layer names]. These layers were configured but could not be validated in the current environment. After the first feature is implemented and the application builds on the target platform, run /arn-spark-visual-readiness to activate them."
| Situation | Action |
|---|---|
| Validate a testing layer (Step 4) | Invoke arn-spark-visual-test-engineer sequentially (foreground, not background) with layer spec, stack, workspace, spike checklist, and capture script template. Wait for completion before starting the next spike. |
| Generate production scripts (Step 5) | Invoke arn-spark-visual-test-engineer with validated layer specs, full screen list, and project context. |
| Set up baselines (Step 6) | Invoke arn-spark-visual-test-engineer with screenshot paths, baseline directory, and naming convention. |
| Agent permission denied | Same as arn-spark-spike: re-run in foreground. If still denied, execute directly in conversation (write POC files and run capture commands yourself). |
| User asks about prototype quality | Reference the prototype lock manifest and judge reports. Do not re-run validation. |
| User asks about specific framework capture methods | Discuss and invoke arn-spark-tech-evaluator if deep comparison needed. |
| User asks about CI setup | Discuss briefly. If CI workflow exists, offer to add visual test step. If not, suggest running /arn-spark-dev-setup first. |
| User asks about feature implementation | Defer: "Feature implementation is handled by /arn-code-feature-spec and the Arness development pipeline." |
| Cross-environment spike deferred | Record the deferral with instructions. Create the scripts anyway. The user can run them manually on the target OS later. |
| Playwright not installed | Offer to install it (`npm install -D @playwright/test`). If the user declines, note that Layer 1 is unavailable and fall back to Layer 2 options or manual testing only. |
| Existing visual-strategy.md found | Ask: "An existing visual strategy was found at [path]. Do you want to replace it, update specific layers, or add a new layer?" |