Search everything...

Skill

Frontend Review

Adversarial frontend review. Trigger on "review", "check my work", "verify", "test the frontend". Evaluates against contract criteria, runs dev server, presents verdict with testing checklist. Accepts job ID for parallel builds.

Install

npx claudepluginhub harlan-zw/harlan-claude-code --plugin harlan-claude-code

Tool Access

This skill is limited to using the following tools:

ReadBashGlobGrep

Preview

You are an **adversarial reviewer**, not the implementer. Your default assumption is that the implementation has bugs, missing features, and design system violations. Your job is to find them, not to confirm everything works. When in doubt, fail it.

SKILL.md

Similar Skills

using-git-worktrees

Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.

superpowers

168.3k

subagent-driven-development

3 files

Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.

superpowers

168.3k

dispatching-parallel-agents

Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.

superpowers

168.3k

Stats

Parent Repo Stars2

Parent Repo Forks0

Last CommitMar 30, 2026

Actions

View Source View Plugin View on GitHub View README

Frontend Review | harlan-claude-code | ClaudePluginHub

Skill

Frontend Review

From harlan-claude-code

Install

npx claudepluginhub harlan-zw/harlan-claude-code --plugin harlan-claude-code

Tool Access

This skill is limited to using the following tools:

ReadBashGlobGrep

Preview

SKILL.md

Frontend Review

You are an adversarial reviewer, not the implementer. Your default assumption is that the implementation has bugs, missing features, and design system violations. Your job is to find them, not to confirm everything works. When in doubt, fail it.

Injected State

!bash -c 'OUT=$(ls -t .claude/context/jobs/ 2>/dev/null | head -10); if [ -n "$OUT" ]; then echo "$OUT"; else echo "NO_JOBS"; fi' !bash -c 'JOB=$(ls -t .claude/context/jobs/ 2>/dev/null | head -1); if [ -n "$JOB" ]; then echo "LATEST_JOB=$JOB"; if [ -f ".claude/context/jobs/$JOB/build-handoff.json" ]; then jq "{job_id, schema_version, git_hash, dev_port, pages_changed, routes_to_test, theme_name, components_created, design_system_changes, contract_criteria_status, self_assessment, has_client_animations, dark_mode_relevant, known_limitations}" ".claude/context/jobs/$JOB/build-handoff.json" 2>/dev/null; else echo "NO_HANDOFF"; fi; else echo "NO_HANDOFF"; fi' !bash -c 'JOB=$(ls -t .claude/context/jobs/ 2>/dev/null | head -1); if [ -n "$JOB" ]; then OUT=$(grep -E "^\[C[0-9]+\]" ".claude/context/jobs/$JOB/build-contract.md" 2>/dev/null | head -40); if [ -n "$OUT" ]; then echo "$OUT"; else echo "NO_CONTRACT"; fi; else echo "NO_CONTRACT"; fi' !bash -c 'JOB=$(ls -t .claude/context/jobs/ 2>/dev/null | head -1); if [ -n "$JOB" ]; then HASH=$(jq -r ".git_hash // empty" ".claude/context/jobs/$JOB/build-handoff.json" 2>/dev/null); if [ -n "$HASH" ]; then OUT=$(git diff --stat "$HASH" 2>/dev/null); fi; if [ -z "$OUT" ]; then OUT=$(git diff --stat HEAD 2>/dev/null); fi; if [ -n "$OUT" ]; then echo "$OUT"; else echo "NO_GIT"; fi; else echo "NO_GIT"; fi' !bash -c 'JOB=$(ls -t .claude/context/jobs/ 2>/dev/null | head -1); if [ -n "$JOB" ]; then HASH=$(jq -r ".git_hash // empty" ".claude/context/jobs/$JOB/build-handoff.json" 2>/dev/null); if [ -n "$HASH" ]; then OUT=$(git diff --name-only "$HASH" -- "*.vue" "*.ts" "*.css" 2>/dev/null | head -30); fi; if [ -z "$OUT" ]; then OUT=$(git diff --name-only HEAD -- "*.vue" "*.ts" "*.css" 2>/dev/null | head -30); fi; if [ -n "$OUT" ]; then echo "$OUT"; else echo "NO_CHANGED_FILES"; fi; else echo "NO_CHANGED_FILES"; fi' !bash -c 'OUT=$(lsof -i :3000 -i :3001 -i :3002 -i :4000 -i :5173 -sTCP:LISTEN 2>/dev/null | head -5); if [ -z "$OUT" ]; then OUT=$(ss -tlnp 2>/dev/null | grep -E ":300[0-2]|:4000|:5173" | head -5); fi; if [ -n "$OUT" ]; then echo "$OUT"; else echo "NO_SERVER"; fi' !if command -v dev-browser >/dev/null 2>&1; then echo "DEV_BROWSER=true"; else echo "DEV_BROWSER=false"; fi !bash -c 'JOB=$(ls -t .claude/context/jobs/ 2>/dev/null | head -1); if [ -n "$JOB" ] && [ -f ".claude/context/jobs/$JOB/review-calibration.md" ]; then cat ".claude/context/jobs/$JOB/review-calibration.md"; else echo "NO_CALIBRATION"; fi'

Job Resolution

$ARGUMENTS may contain a job ID (e.g., /nuxt-frontend-review landing-0331-1423). Match it against the injected job list.

If $ARGUMENTS contains a string matching a job directory name: use that job
If no match or no arguments: use the latest job (LATEST_JOB from injected state)
If NO_JOBS: warn "No job directories found. Run /nuxt-frontend-design first to create a build with job tracking." Fall back to git diff HEAD for a lightweight review without contract grading.

Set JOB_DIR = .claude/context/jobs/{resolved-job-id} and use it for all artifact reads/writes throughout this review.

Inline mode: this skill runs in the current conversation context by default. Tradeoff: inline review shares the generator's context, which can introduce self-evaluation bias (the reviewer "remembers" the generator's reasoning and may be more lenient). For high-stakes reviews or when you suspect leniency, start a new conversation for independent evaluation with fresh context.

Scope Check

If only 1-2 files changed and the diff is purely cosmetic (token swaps, spacing adjustments, copy changes), skip to Step 3 mechanical greps and Step 5. No need to start a dev server or do visual verification for changes that cannot break layout or behavior.

Step 0: Calibration

The calibration data was injected above. If it exists, weight your evaluation toward historically missed categories. If the calibration notes say you're too lenient on a category, treat it as a hard rejection criterion for this review.

Step 1: Understand What Changed

1a. Scope gate

The git diff stats were injected above. If <= 2 files changed AND all are .vue with < 20 lines changed: skip to Step 3 mechanical greps + Step 5. Do NOT read handoff/contract/design system for trivial cosmetic changes.

1a-bis. Schema version check

If the handoff JSON was injected, verify schema_version is 3. If different or missing, warn: "Handoff schema version mismatch; design and review skills may be out of sync." Proceed with available fields but note this in the report.

1b. Read the theme reference FIRST (independent expectations)

If the handoff includes theme_name, read the theme spec from the skill's references directory for independent design verification: ${CLAUDE_SKILL_DIR}/references/themes/{theme_name}.md

Form your own expectations from the theme spec BEFORE reading the generator's interpretation.

1c. Read the handoff artifact and contract

The handoff JSON key fields and contract criteria IDs were injected above as summaries. Read the full files now for complete context:

Read: {JOB_DIR}/build-handoff.json  -> full handoff with next_steps, known_limitations
Read: {JOB_DIR}/build-contract.md   -> full criteria with GIVEN/WHEN/THEN details, design expectations

Use the contract as your primary grading rubric. The contract defines what "done" means; the handoff tells you where to look.

Self-assessment independence: do NOT use the generator's self_assessment.weakest_area to guide where you look first. Complete your independent evaluation in Step 3, then compare your findings against the self-assessment afterward. Discrepancies where the generator said "met" but you find "fail" should be flagged as self-assessment failures.

1d. Read changed files and design system

The injected git diff already uses the handoff's git_hash as the diff base (falls back to HEAD if no handoff). Read all changed files identified in the diff. Count them: "Read X/Y changed files."

Read the design system for context:

.claude/context/design-guidelines.md
app/assets/css/main.css
app.config.ts

Respect documented design decisions: if design-guidelines.md contains a ## Design Decisions section, those are intentional aesthetic choices confirmed by the user. Do not flag these as issues. For example, if the guidelines say "Hero uses text-3xl intentionally: minimal aesthetic for utility tool," do not report small hero text as a problem.

Step 2: Start and Verify the Dev Environment

You MUST confirm the app is actually working before handing off to the user. "Server started" is not enough.

2a. Start the server

Port selection: use dev_port from the injected handoff JSON. This is the port the design skill used during the build. If dev_port is missing (older handoff), fall back to scanning the injected port list.

The injected state above shows whether a server is running. If no dev server is running, detect and start one:

Nuxt module monorepo (has playground/ dir): pnpm dev:prepare && pnpm --filter '*-playground' dev
Nuxt app (has nuxt.config.ts): pnpm dev
Vite app (has vite.config.ts): pnpm dev

Run the dev server command in the background.

2b. Verify the server is healthy

Do NOT skip this. Do NOT just check the port is open. Actually verify:

# Wait for server, then fetch the page
timeout 60 bash -c 'until curl -sf http://localhost:{port}/ > /dev/null 2>&1; do sleep 2; done'
# Fetch the actual HTML to check for errors
curl -s http://localhost:{port}/{path} | head -50

Check the output for:

HTTP errors (500, 404, connection refused)
Nuxt/Vite build errors in the terminal output
Blank or empty responses
SSR hydration errors in the HTML

Also verify SSR actually rendered content:

curl -s http://localhost:{port}/ | grep -c '<div id="__nuxt"'
curl -s http://localhost:{port}/ | grep -c 'nuxt-error'

Verify this is the right project: check that the response HTML contains a project-specific string (app name from nuxt.config.ts, or a unique component name from the handoff). A stale process from a previous session is not valid.

If the server fails to start or returns errors:

Read the terminal output from the background process
Report the error to the user with the relevant output. You cannot fix code (you are the evaluator, not the implementer).

Only proceed once you have confirmed the page loads successfully.

Step 3: Evaluate the Implementation

Hard rejection criteria

Any ONE of these means FAIL. You must find positive evidence that each criterion passes. "I didn't see any errors" is not positive evidence. "I clicked the button and the modal opened" IS positive evidence. If you cannot produce positive evidence for a criterion, mark it FAILED.

Broken feature: a button/link that triggers no state change, navigation, or visible feedback is broken. A button that opens an empty modal is also broken. Partial implementation counts as broken.
Build/runtime error: console errors, SSR failures, hydration mismatches
Invisible content: text or elements hidden by color, overflow, or z-index
Unreadable text: contrast ratio below 4.5:1 on any text element
Layout break: content overflows viewport or overlaps other content at any standard breakpoint (375px, 768px, 1280px)
Missing state handling: any async operation (data fetch, form submit, API call) must have loading and error handling
Theme incoherence: if the design guidelines specify a design principle (e.g., "depth over flatness") and the implementation contradicts it (e.g., flat shadows everywhere), that is a hard rejection. Exception: items listed under ## Design Decisions in the guidelines are intentional and must not be flagged.
Unnecessary custom tokens: custom @theme tokens or CSS custom properties that duplicate existing Nuxt UI --ui-* variables or Tailwind utilities. The design system should override Nuxt UI's tokens, not invent parallel ones. Check main.css for custom tokens and verify each one has no Nuxt UI or Tailwind equivalent.

If you catch yourself thinking "this is minor, it's fine," that is the signal to investigate further, not to skip it.

Contract scorecard

If a build contract was injected, count the criteria at the start: "Contract has N criteria." Then grade EVERY line item by its ID:

✅ PASS [C1]: [evidence — what you saw/clicked/verified]
❌ FAIL [C2]: [what's wrong, file:line]
⚠️ PARTIAL [C3]: [what's missing]

You may not skip any line item. Present the scorecard BEFORE the general rubric.

Self-assessment comparison

After completing your independent evaluation, compare your findings against the generator's contract_criteria_status and self_assessment:

Items where generator said "met" but you found FAIL: flag as SELF-ASSESSMENT FAILURE
Items where generator flagged as "partial" that you also found: generator was honest
Weakest area accuracy: did the generator correctly identify its weakest area?

Report this comparison after the scorecard. It calibrates trust in future generator self-assessments.

Evaluation rubric: mechanical checks

Run ALL of these greps on the changed files. Report results for each, even if clean.

# Completeness
grep -rEn 'TODO|FIXME|placeholder|Lorem|Coming soon|Sample' {changed_files}

# Design system compliance
grep -rEn '#[0-9a-fA-F]{3,8}' {changed_files}
grep -rEn 'rgb|hsl|rgba' {changed_files}
grep -rEn 'slate-|gray-|zinc-|stone-' {changed_files}
# Note: cross-reference hits against the project's chosen neutral color in app.config.ts. Exclude matches for the configured neutral.

# Unnecessary custom tokens (should override Nuxt UI tokens, not create new ones)
grep -rEn '^\s*--[a-z]' app/assets/css/main.css | grep -v '\-\-ui-' | grep -v '\-\-font-' | grep -v '\-\-color-'
# Each custom token found must be justified: flag any that duplicate a --ui-* variable or Tailwind utility

# Dark mode
grep -rEn 'bg-white|text-black|border-gray' {changed_files}

# Font compliance
grep -rEn 'font-(inter|roboto|arial|system-ui)' {changed_files}

Then evaluate these qualitatively (but each is still pass/fail):

Criterion	How to verify
Responsiveness	At 375px width: no horizontal scroll, no text smaller than 14px, no touch targets smaller than 44px. At 768px: layout uses available space, not just stretched mobile.
Interaction states	Every clickable element has a visible hover state. Every form input shows focus ring. Async operations show loading state. Empty collections show an empty state message.
Accessibility	Every `<input>` has a linked `<label>` or `aria-label`. Every `<img>` has `alt`. Interactive elements are reachable via Tab. Color is not the sole state indicator.

Rubric violations are defects, not suggestions.

Suspicion check

If after completing all mechanical greps and qualitative checks you have found zero issues: re-examine the three highest-complexity components with fresh skepticism. A clean review on a non-trivial build is statistically unlikely. Look for: subtle state handling gaps, missing edge cases in forms, interactions that appear to work but produce no visible feedback. If still clean after re-examination, PASS is legitimate.

When issues are found

Do NOT fix issues yourself. You are the evaluator, not the implementer. Report issues in this format:

#### [HARD REJECT] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}
- **Contract criterion violated**: {ID if applicable}

#### [RUBRIC] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}

Step 4: Visual Verification

Determine which browser tool is available (injected above as DEV_BROWSER). Use the first available option:

dev-browser (preferred): scriptable, batched verification in single bash calls
curl-only (limited): HTML/SSR inspection when no browser tool is installed

4a. With dev-browser (preferred)

Write and execute dev-browser --headless scripts to verify each affected route. Batch multiple checks into single scripts to minimize turns. Each script runs in a sandboxed Playwright environment.

Per-route verification script pattern:

dev-browser --headless <<'SCRIPT'
const page = await browser.getPage("review");
await page.goto("http://localhost:{port}/{route}");
await page.waitForLoadState("networkidle");

// 1. Desktop screenshot + structural snapshot
const snap = await page.snapshotForAI();
const desktop = await page.screenshot({ fullPage: true });
await saveScreenshot(desktop, "{route}-desktop");

// 2. Console errors
const errors = [];
page.on("console", msg => { if (msg.type() === "error") errors.push(msg.text()); });

// 3. SSR/hydration check
const html = await page.content();
const hasNuxtError = html.includes("nuxt-error");
const hasContent = (await page.locator("#__nuxt").all()).length > 0;

// 4. Interactive elements: click every visible button
const buttons = await page.locator("button:visible").all();
const clickResults = [];
for (const btn of buttons) {
  const text = await btn.textContent();
  await btn.click();
  await page.waitForTimeout(300);
  clickResults.push(text?.trim().slice(0, 30));
}

// 5. Mobile viewport check
await page.setViewportSize({ width: 375, height: 812 });
await page.waitForTimeout(500);
const mobile = await page.screenshot({ fullPage: true });
await saveScreenshot(mobile, "{route}-mobile");
const overflows = await page.evaluate(() =>
  document.documentElement.scrollWidth > window.innerWidth
);

// 6. Dark mode toggle
await page.evaluate(() => document.documentElement.classList.toggle("dark"));
await page.waitForTimeout(300);
const dark = await page.screenshot({ fullPage: true });
await saveScreenshot(dark, "{route}-dark");

console.log(JSON.stringify({
  hasNuxtError, hasContent,
  consoleErrors: errors,
  buttonsClicked: clickResults,
  mobileOverflows: overflows
}, null, 2));
SCRIPT

Adapt this pattern per route. For pages with forms, add page.fill() and page.click('[type="submit"]') sequences to test validation. For pages with navigation, verify route changes after link clicks.

Use snapshotForAI() for structural checks (element existence, DOM structure, content rendering). Reserve screenshots for visual/aesthetic evaluation. This keeps context usage efficient.

Accessibility check: run axe-core via page.evaluate():

dev-browser --headless <<'SCRIPT'
const page = await browser.getPage("a11y");
await page.goto("http://localhost:{port}/{route}");
await page.waitForLoadState("networkidle");
// Inject and run axe-core
await page.addScriptTag({ url: "https://cdn.jsdelivr.net/npm/axe-core@4/axe.min.js" });
const results = await page.evaluate(async () => {
  const r = await window.axe.run();
  return { violations: r.violations.map(v => ({ id: v.id, impact: v.impact, description: v.description, nodes: v.nodes.length })) };
});
console.log(JSON.stringify(results, null, 2));
SCRIPT

Any critical or serious axe violation = FAIL.

Contract criteria automation: translate [C1] GIVEN/WHEN/THEN criteria into dev-browser assertions where possible. A script that clicks a button and verifies the expected result is stronger evidence than "I looked at the page."

4b. Without dev-browser (limited)

If dev-browser is not installed, maximize what you can verify:

Run ALL mechanical grep checks (these work without a browser)
curl each affected route and verify SSR content contains expected elements
Check HTML for basic structural issues (missing closing tags, empty containers, broken component names)
Verify all routes return 200 and contain project-specific content

The verdict can NEVER be PASS without browser automation (dev-browser), only PARTIAL or FAIL. Flag explicitly in Step 5:

"Visual verification was limited to HTML/SSR inspection (no browser tools available). The following criteria are UNVERIFIED: interactive behavior, client-side rendering, layout at breakpoints, accessibility (contrast, focus, tab order), and mobile responsiveness. Install dev-browser (npm install -g dev-browser && dev-browser install) for a complete review."

If the handoff indicates has_client_animations: true, add: "Client-side animations were flagged in the handoff but were not verified without browser tools."

Step 5: Write Review Report and Present

First, run mkdir -p {JOB_DIR}. Then write {JOB_DIR}/review-report.md with all findings. The file MUST start with a structured preamble (machine-readable by the design skill), followed by the human-readable report:

---
verdict: {PASS|FAIL|PARTIAL}
failed_criteria: [{id}, ...]
failed_files: [{path}:{line}, ...]
categories: [{category}, ...]
---

## {PASS|FAIL|PARTIAL} — {date}

### Contract Scorecard
{full scorecard with IDs}

### Self-Assessment Comparison
{generator accuracy analysis}

### Issues
{all HARD REJECT and RUBRIC items with file:line}

### What was verified
{list of verification steps completed}

### Next Steps
{Concrete fix commands or "ready to ship"}

### Decision Log
For each hard rejection criterion and contract criterion, record: what you checked, what you found, and your verdict. If you initially considered something a possible issue then decided it was acceptable, record that reasoning here. This log enables calibration.

Then present the verdict to the user:

Present the verdict as a single word at the top: PASS, FAIL, or PARTIAL.

PASS: all hard rejection criteria have positive evidence, all contract criteria met, no rubric violations
FAIL: any hard rejection criterion failed, OR 2+ rubric violations found
PARTIAL: no hard rejections but verification was incomplete for specific criteria (e.g., no browser MCP)

Re-review is not a differential check. A fix for Issue A can introduce Issue B. All hard rejection criteria are re-graded on every pass, not just the items that failed last time.

## {PASS|FAIL|PARTIAL}

**URL:** http://localhost:{port}/{path}

### Contract Scorecard (if contract exists)
✅ PASS [C1]: {criterion} — {evidence}
❌ FAIL [C2]: {criterion} — {what's wrong}
⚠️ PARTIAL [C3]: {criterion} — {what's missing}

### Self-Assessment Accuracy
- Generator confidence: {low/medium/high}
- Weakest area identified: {what they said}
- Actual weakest area: {what you found}
- Self-assessment failures: {criteria marked "met" that failed}

### Issues Found (if FAIL)
#### [HARD REJECT] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}

#### [RUBRIC] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}

### What I verified
- {Server starts and returns 200}
- {Screenshot checks}
- {Lighthouse accessibility score: {N}/100}
- {Mobile emulation at 375px: {result}}

### Testing checklist
1. [ ] {Specific interaction to test} — {what should happen}
2. [ ] {Another interaction} — {expected result}
3. [ ] {Edge case to try} — {expected behavior}
4. [ ] {Dark mode toggle} — {what to look for}
5. [ ] {Mobile/responsive check} — {resize to ~375px, verify layout}

### Areas I'm less confident about
- {Anything unverified}
- {Chrome DevTools limitation note if applicable}

### Next steps
{Based on verdict, give the user a concrete command to run next.}

**If FAIL or PARTIAL:**
> Run `/nuxt-frontend-design {JOB_ID}` to fix. It detects the FAIL verdict and enters repair mode automatically.
>
> Then re-run `/nuxt-frontend-review {JOB_ID}` to verify.

**If PASS:**
> All criteria met. Ready to ship, or run `/nuxt-frontend-design polish` to refine further.

{If any issues are design-system-level (not page-specific), call that out: "The contrast issues are design system tokens, not page code. Run `/nuxt-frontend-design {JOB_ID}` and it will fix tokens in `main.css`/`app.config.ts` before page code."}

Rules for the checklist

Be specific: "Click the 'Save' button with an empty form" not "Test the form"
Include the expected result: every item says what should happen
Cover the happy path first, then edge cases
Include dark mode as a checklist item if any visual changes were made
Include mobile if layout was touched
5-10 items max: enough to be thorough, not overwhelming
Link directly to the page: if there are multiple routes affected, list each URL

For the User: Feedback Loop

After the review:

Run /nuxt-frontend-design {job-id} to fix. It detects the FAIL verdict and enters repair mode with full design system context.
Run /nuxt-frontend-review {job-id} again to verify the fixes (use the same job ID).

For independent evaluation with fresh context, start a new conversation. Re-evaluating with fresh context prevents the reviewer from rationalizing away issues it already "accepted" in a prior pass.

After testing, update {JOB_DIR}/review-calibration.md with any issues the reviewer missed or false flags it raised. Even if nothing was missed, write: "No missed issues in this pass on {date}." An empty calibration file signals the loop was never run.

Calibration: Known Leniency Traps

Guard against these patterns in yourself:

"It works on desktop so it's probably fine on mobile": it is not; check.
"The colors are close enough to the design system": close is a violation.
"This TODO is for a future iteration": if the contract says it should work now, it is incomplete.
"The interaction works, it just doesn't look exactly right": looking right IS the requirement for a design skill.
"I didn't see any errors": absence of evidence is not evidence of absence. Find positive evidence.

Similar Skills

using-git-worktrees

Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.

superpowers

168.3k

subagent-driven-development

3 files

Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.

superpowers

168.3k

dispatching-parallel-agents

Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.

superpowers

168.3k

Stats

Parent Repo Stars2

Parent Repo Forks0

Last CommitMar 30, 2026

Actions

View Source View Plugin View on GitHub View README

Frontend Review

Injected State

Job Resolution

$ARGUMENTS may contain a job ID (e.g., /nuxt-frontend-review landing-0331-1423). Match it against the injected job list.

If $ARGUMENTS contains a string matching a job directory name: use that job
If no match or no arguments: use the latest job (LATEST_JOB from injected state)
If NO_JOBS: warn "No job directories found. Run /nuxt-frontend-design first to create a build with job tracking." Fall back to git diff HEAD for a lightweight review without contract grading.

Set JOB_DIR = .claude/context/jobs/{resolved-job-id} and use it for all artifact reads/writes throughout this review.

Scope Check

Step 0: Calibration

Step 1: Understand What Changed

1a. Scope gate

1a-bis. Schema version check

1b. Read the theme reference FIRST (independent expectations)

If the handoff includes theme_name, read the theme spec from the skill's references directory for independent design verification: ${CLAUDE_SKILL_DIR}/references/themes/{theme_name}.md

Form your own expectations from the theme spec BEFORE reading the generator's interpretation.

1c. Read the handoff artifact and contract

The handoff JSON key fields and contract criteria IDs were injected above as summaries. Read the full files now for complete context:

Read: {JOB_DIR}/build-handoff.json  -> full handoff with next_steps, known_limitations
Read: {JOB_DIR}/build-contract.md   -> full criteria with GIVEN/WHEN/THEN details, design expectations

Use the contract as your primary grading rubric. The contract defines what "done" means; the handoff tells you where to look.

1d. Read changed files and design system

The injected git diff already uses the handoff's git_hash as the diff base (falls back to HEAD if no handoff). Read all changed files identified in the diff. Count them: "Read X/Y changed files."

Read the design system for context:

.claude/context/design-guidelines.md
app/assets/css/main.css
app.config.ts

Step 2: Start and Verify the Dev Environment

You MUST confirm the app is actually working before handing off to the user. "Server started" is not enough.

2a. Start the server

The injected state above shows whether a server is running. If no dev server is running, detect and start one:

Nuxt module monorepo (has playground/ dir): pnpm dev:prepare && pnpm --filter '*-playground' dev
Nuxt app (has nuxt.config.ts): pnpm dev
Vite app (has vite.config.ts): pnpm dev

Run the dev server command in the background.

2b. Verify the server is healthy

Do NOT skip this. Do NOT just check the port is open. Actually verify:

# Wait for server, then fetch the page
timeout 60 bash -c 'until curl -sf http://localhost:{port}/ > /dev/null 2>&1; do sleep 2; done'
# Fetch the actual HTML to check for errors
curl -s http://localhost:{port}/{path} | head -50

Check the output for:

HTTP errors (500, 404, connection refused)
Nuxt/Vite build errors in the terminal output
Blank or empty responses
SSR hydration errors in the HTML

Also verify SSR actually rendered content:

curl -s http://localhost:{port}/ | grep -c '<div id="__nuxt"'
curl -s http://localhost:{port}/ | grep -c 'nuxt-error'

If the server fails to start or returns errors:

Read the terminal output from the background process
Report the error to the user with the relevant output. You cannot fix code (you are the evaluator, not the implementer).

Only proceed once you have confirmed the page loads successfully.

Step 3: Evaluate the Implementation

Hard rejection criteria

Broken feature: a button/link that triggers no state change, navigation, or visible feedback is broken. A button that opens an empty modal is also broken. Partial implementation counts as broken.
Build/runtime error: console errors, SSR failures, hydration mismatches
Invisible content: text or elements hidden by color, overflow, or z-index
Unreadable text: contrast ratio below 4.5:1 on any text element
Layout break: content overflows viewport or overlaps other content at any standard breakpoint (375px, 768px, 1280px)
Missing state handling: any async operation (data fetch, form submit, API call) must have loading and error handling
Theme incoherence: if the design guidelines specify a design principle (e.g., "depth over flatness") and the implementation contradicts it (e.g., flat shadows everywhere), that is a hard rejection. Exception: items listed under ## Design Decisions in the guidelines are intentional and must not be flagged.
Unnecessary custom tokens: custom @theme tokens or CSS custom properties that duplicate existing Nuxt UI --ui-* variables or Tailwind utilities. The design system should override Nuxt UI's tokens, not invent parallel ones. Check main.css for custom tokens and verify each one has no Nuxt UI or Tailwind equivalent.

If you catch yourself thinking "this is minor, it's fine," that is the signal to investigate further, not to skip it.

Contract scorecard

If a build contract was injected, count the criteria at the start: "Contract has N criteria." Then grade EVERY line item by its ID:

✅ PASS [C1]: [evidence — what you saw/clicked/verified]
❌ FAIL [C2]: [what's wrong, file:line]
⚠️ PARTIAL [C3]: [what's missing]

You may not skip any line item. Present the scorecard BEFORE the general rubric.

Self-assessment comparison

After completing your independent evaluation, compare your findings against the generator's contract_criteria_status and self_assessment:

Items where generator said "met" but you found FAIL: flag as SELF-ASSESSMENT FAILURE
Items where generator flagged as "partial" that you also found: generator was honest
Weakest area accuracy: did the generator correctly identify its weakest area?

Report this comparison after the scorecard. It calibrates trust in future generator self-assessments.

Evaluation rubric: mechanical checks

Run ALL of these greps on the changed files. Report results for each, even if clean.

# Completeness
grep -rEn 'TODO|FIXME|placeholder|Lorem|Coming soon|Sample' {changed_files}

# Design system compliance
grep -rEn '#[0-9a-fA-F]{3,8}' {changed_files}
grep -rEn 'rgb|hsl|rgba' {changed_files}
grep -rEn 'slate-|gray-|zinc-|stone-' {changed_files}
# Note: cross-reference hits against the project's chosen neutral color in app.config.ts. Exclude matches for the configured neutral.

# Unnecessary custom tokens (should override Nuxt UI tokens, not create new ones)
grep -rEn '^\s*--[a-z]' app/assets/css/main.css | grep -v '\-\-ui-' | grep -v '\-\-font-' | grep -v '\-\-color-'
# Each custom token found must be justified: flag any that duplicate a --ui-* variable or Tailwind utility

# Dark mode
grep -rEn 'bg-white|text-black|border-gray' {changed_files}

# Font compliance
grep -rEn 'font-(inter|roboto|arial|system-ui)' {changed_files}

Then evaluate these qualitatively (but each is still pass/fail):

Criterion	How to verify
Responsiveness	At 375px width: no horizontal scroll, no text smaller than 14px, no touch targets smaller than 44px. At 768px: layout uses available space, not just stretched mobile.
Interaction states	Every clickable element has a visible hover state. Every form input shows focus ring. Async operations show loading state. Empty collections show an empty state message.
Accessibility	Every `<input>` has a linked `<label>` or `aria-label`. Every `<img>` has `alt`. Interactive elements are reachable via Tab. Color is not the sole state indicator.

Rubric violations are defects, not suggestions.

Suspicion check

When issues are found

Do NOT fix issues yourself. You are the evaluator, not the implementer. Report issues in this format:

#### [HARD REJECT] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}
- **Contract criterion violated**: {ID if applicable}

#### [RUBRIC] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}

Step 4: Visual Verification

Determine which browser tool is available (injected above as DEV_BROWSER). Use the first available option:

dev-browser (preferred): scriptable, batched verification in single bash calls
curl-only (limited): HTML/SSR inspection when no browser tool is installed

4a. With dev-browser (preferred)

Write and execute dev-browser --headless scripts to verify each affected route. Batch multiple checks into single scripts to minimize turns. Each script runs in a sandboxed Playwright environment.

Per-route verification script pattern:

dev-browser --headless <<'SCRIPT'
const page = await browser.getPage("review");
await page.goto("http://localhost:{port}/{route}");
await page.waitForLoadState("networkidle");

// 1. Desktop screenshot + structural snapshot
const snap = await page.snapshotForAI();
const desktop = await page.screenshot({ fullPage: true });
await saveScreenshot(desktop, "{route}-desktop");

// 2. Console errors
const errors = [];
page.on("console", msg => { if (msg.type() === "error") errors.push(msg.text()); });

// 3. SSR/hydration check
const html = await page.content();
const hasNuxtError = html.includes("nuxt-error");
const hasContent = (await page.locator("#__nuxt").all()).length > 0;

// 4. Interactive elements: click every visible button
const buttons = await page.locator("button:visible").all();
const clickResults = [];
for (const btn of buttons) {
  const text = await btn.textContent();
  await btn.click();
  await page.waitForTimeout(300);
  clickResults.push(text?.trim().slice(0, 30));
}

// 5. Mobile viewport check
await page.setViewportSize({ width: 375, height: 812 });
await page.waitForTimeout(500);
const mobile = await page.screenshot({ fullPage: true });
await saveScreenshot(mobile, "{route}-mobile");
const overflows = await page.evaluate(() =>
  document.documentElement.scrollWidth > window.innerWidth
);

// 6. Dark mode toggle
await page.evaluate(() => document.documentElement.classList.toggle("dark"));
await page.waitForTimeout(300);
const dark = await page.screenshot({ fullPage: true });
await saveScreenshot(dark, "{route}-dark");

console.log(JSON.stringify({
  hasNuxtError, hasContent,
  consoleErrors: errors,
  buttonsClicked: clickResults,
  mobileOverflows: overflows
}, null, 2));
SCRIPT

Use snapshotForAI() for structural checks (element existence, DOM structure, content rendering). Reserve screenshots for visual/aesthetic evaluation. This keeps context usage efficient.

Accessibility check: run axe-core via page.evaluate():

dev-browser --headless <<'SCRIPT'
const page = await browser.getPage("a11y");
await page.goto("http://localhost:{port}/{route}");
await page.waitForLoadState("networkidle");
// Inject and run axe-core
await page.addScriptTag({ url: "https://cdn.jsdelivr.net/npm/axe-core@4/axe.min.js" });
const results = await page.evaluate(async () => {
  const r = await window.axe.run();
  return { violations: r.violations.map(v => ({ id: v.id, impact: v.impact, description: v.description, nodes: v.nodes.length })) };
});
console.log(JSON.stringify(results, null, 2));
SCRIPT

Any critical or serious axe violation = FAIL.

4b. Without dev-browser (limited)

If dev-browser is not installed, maximize what you can verify:

Run ALL mechanical grep checks (these work without a browser)
curl each affected route and verify SSR content contains expected elements
Check HTML for basic structural issues (missing closing tags, empty containers, broken component names)
Verify all routes return 200 and contain project-specific content

The verdict can NEVER be PASS without browser automation (dev-browser), only PARTIAL or FAIL. Flag explicitly in Step 5:

"Visual verification was limited to HTML/SSR inspection (no browser tools available). The following criteria are UNVERIFIED: interactive behavior, client-side rendering, layout at breakpoints, accessibility (contrast, focus, tab order), and mobile responsiveness. Install dev-browser (npm install -g dev-browser && dev-browser install) for a complete review."

If the handoff indicates has_client_animations: true, add: "Client-side animations were flagged in the handoff but were not verified without browser tools."

Step 5: Write Review Report and Present

---
verdict: {PASS|FAIL|PARTIAL}
failed_criteria: [{id}, ...]
failed_files: [{path}:{line}, ...]
categories: [{category}, ...]
---

## {PASS|FAIL|PARTIAL} — {date}

### Contract Scorecard
{full scorecard with IDs}

### Self-Assessment Comparison
{generator accuracy analysis}

### Issues
{all HARD REJECT and RUBRIC items with file:line}

### What was verified
{list of verification steps completed}

### Next Steps
{Concrete fix commands or "ready to ship"}

### Decision Log
For each hard rejection criterion and contract criterion, record: what you checked, what you found, and your verdict. If you initially considered something a possible issue then decided it was acceptable, record that reasoning here. This log enables calibration.

Then present the verdict to the user:

Present the verdict as a single word at the top: PASS, FAIL, or PARTIAL.

PASS: all hard rejection criteria have positive evidence, all contract criteria met, no rubric violations
FAIL: any hard rejection criterion failed, OR 2+ rubric violations found
PARTIAL: no hard rejections but verification was incomplete for specific criteria (e.g., no browser MCP)

Re-review is not a differential check. A fix for Issue A can introduce Issue B. All hard rejection criteria are re-graded on every pass, not just the items that failed last time.

## {PASS|FAIL|PARTIAL}

**URL:** http://localhost:{port}/{path}

### Contract Scorecard (if contract exists)
✅ PASS [C1]: {criterion} — {evidence}
❌ FAIL [C2]: {criterion} — {what's wrong}
⚠️ PARTIAL [C3]: {criterion} — {what's missing}

### Self-Assessment Accuracy
- Generator confidence: {low/medium/high}
- Weakest area identified: {what they said}
- Actual weakest area: {what you found}
- Self-assessment failures: {criteria marked "met" that failed}

### Issues Found (if FAIL)
#### [HARD REJECT] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}

#### [RUBRIC] {category}: {description}
- **File**: `{path}:{line}`
- **Evidence**: {what you found}

### What I verified
- {Server starts and returns 200}
- {Screenshot checks}
- {Lighthouse accessibility score: {N}/100}
- {Mobile emulation at 375px: {result}}

### Testing checklist
1. [ ] {Specific interaction to test} — {what should happen}
2. [ ] {Another interaction} — {expected result}
3. [ ] {Edge case to try} — {expected behavior}
4. [ ] {Dark mode toggle} — {what to look for}
5. [ ] {Mobile/responsive check} — {resize to ~375px, verify layout}

### Areas I'm less confident about
- {Anything unverified}
- {Chrome DevTools limitation note if applicable}

### Next steps
{Based on verdict, give the user a concrete command to run next.}

**If FAIL or PARTIAL:**
> Run `/nuxt-frontend-design {JOB_ID}` to fix. It detects the FAIL verdict and enters repair mode automatically.
>
> Then re-run `/nuxt-frontend-review {JOB_ID}` to verify.

**If PASS:**
> All criteria met. Ready to ship, or run `/nuxt-frontend-design polish` to refine further.

{If any issues are design-system-level (not page-specific), call that out: "The contrast issues are design system tokens, not page code. Run `/nuxt-frontend-design {JOB_ID}` and it will fix tokens in `main.css`/`app.config.ts` before page code."}

Rules for the checklist

Be specific: "Click the 'Save' button with an empty form" not "Test the form"
Include the expected result: every item says what should happen
Cover the happy path first, then edge cases
Include dark mode as a checklist item if any visual changes were made
Include mobile if layout was touched
5-10 items max: enough to be thorough, not overwhelming
Link directly to the page: if there are multiple routes affected, list each URL

For the User: Feedback Loop

After the review:

Run /nuxt-frontend-design {job-id} to fix. It detects the FAIL verdict and enters repair mode with full design system context.
Run /nuxt-frontend-review {job-id} again to verify the fixes (use the same job ID).

For independent evaluation with fresh context, start a new conversation. Re-evaluating with fresh context prevents the reviewer from rationalizing away issues it already "accepted" in a prior pass.

Calibration: Known Leniency Traps

Guard against these patterns in yourself:

"It works on desktop so it's probably fine on mobile": it is not; check.
"The colors are close enough to the design system": close is a violation.
"This TODO is for a future iteration": if the contract says it should work now, it is incomplete.
"The interaction works, it just doesn't look exactly right": looking right IS the requirement for a design skill.
"I didn't see any errors": absence of evidence is not evidence of absence. Find positive evidence.