From rkstack
Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems, AI slop patterns, and slow interactions -- then fixes them. Iteratively fixes issues in source code, committing each fix atomically and re-verifying with before/after screenshots. For plan-mode design review (before implementation), use /plan-design-review. Use when asked to "audit the design", "visual QA", "check if it looks good", or "design polish". Proactively suggest when the user mentions visual inconsistencies or wants to polish the look of a live site.
npx claudepluginhub mrkhachaturov/ccode-personal-plugins --plugin rkstackThis skill is limited to using the following tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Searches prompts.chat for AI prompt templates by keyword or category, retrieves by ID with variable handling, and improves prompts via AI. Use for discovering or enhancing prompts.
Guides idea refinement into designs: explores context, asks questions one-by-one, proposes approaches, presents sections for approval, writes/review specs before coding.
# === RKstack Preamble (design-review) ===
# Read detection cache (written by session-start via rkstack detect)
if [ -f .rkstack/settings.json ]; then
cat .rkstack/settings.json
else
echo "WARNING: .rkstack/settings.json not found — detection cache missing"
fi
# Session-volatile checks (can change mid-session)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
_HAS_CLAUDE_MD=$([ -f CLAUDE.md ] && echo "yes" || echo "no")
echo "BRANCH: $_BRANCH"
echo "CLAUDE_MD: $_HAS_CLAUDE_MD"
Use the detection cache and preamble output to adapt your behavior:
detection.flowType (web or default). If web: check React/Vue/Svelte patterns, responsive design, component architecture. If default: CLI tools, MCP servers, backend scripts.just commands instead of raw shell.detection.stack for what's in the project and detection.stats for scale (files, code, complexity).detection.repoMode for solo vs collaborative.detection.services for Supabase and other service integrations.ALWAYS follow this structure for every AskUserQuestion call:
_BRANCH value from preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)RECOMMENDATION: Choose [X] because [one-line reason] — always prefer the complete option over shortcuts (see Completeness Principle). Include Completeness: X/10 for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work.A) ... B) ... C) ... — when an option involves effort, show both scales: (human: ~X / CC: ~Y)Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
AI makes completeness near-free. Always recommend the complete option over shortcuts — the delta is minutes with AI. A "lake" (100% coverage, all edge cases) is boilable; an "ocean" (full rewrite, multi-quarter migration) is not. Boil lakes, flag oceans.
Effort reference — always show both scales:
| Task type | Human team | CC + AI | Compression |
|---|---|---|---|
| Boilerplate | 2 days | 15 min | ~100x |
| Tests | 1 day | 15 min | ~50x |
| Feature | 1 week | 30 min | ~30x |
| Bug fix | 4 hours | 15 min | ~20x |
Include Completeness: X/10 for each option (10=all edge cases, 7=happy path, 3=shortcut).
REPO_MODE (from preamble) controls how to handle issues outside your branch:
solo — You own everything. Investigate and offer to fix proactively.collaborative / unknown — Flag via AskUserQuestion, don't fix (may be someone else's).Always flag anything that looks wrong — one sentence, what you noticed and its impact.
Before building anything unfamiliar, search first.
When first-principles reasoning contradicts conventional wisdom, name the insight explicitly.
When completing a skill workflow, report status using one of:
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
Escalation format:
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards -- then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.
Parse the user's request for these parameters:
| Parameter | Default | Override example |
|---|---|---|
| Target URL | (auto-detect or ask) | https://myapp.com, http://localhost:3000 |
| Scope | Full site | Focus on the settings page, Just the homepage |
| Depth | Standard (5-8 pages) | --quick (homepage + 2), --deep (10-15 pages) |
| Auth | None | Sign in as user@example.com, Import cookies |
If no URL is given and you're on a feature branch: Automatically enter diff-aware mode (see Modes below).
If no URL is given and you're on main/master: Ask the user for a URL.
Dev server discovery:
Check CLAUDE.md for dev server configuration:
grep -E "^(Dev server|Dev URL|dev.server|dev.url):" CLAUDE.md 2>/dev/null || echo "NO_DEV_CONFIG"
If no config found and no URL was given, use AskUserQuestion to ask for the URL.
Check for DESIGN.md:
ls DESIGN.md design-system.md 2>/dev/null || echo "NO_DESIGN_FILE"
If found, read it -- all design decisions must be calibrated against it. Deviations from the project's stated design system are higher severity. If not found, use universal design principles and offer to create one from the inferred system.
Check for clean working tree:
git status --porcelain
If dirty, STOP and use AskUserQuestion:
Re-ground: Working tree has uncommitted changes.
Simplify: /design-review needs a clean tree so each design fix gets its own atomic commit.
RECOMMENDATION: Choose A to preserve your work before design review starts.
A) Commit all changes now (recommended) -- Completeness: 9/10 B) Stash my changes -- Completeness: 7/10 C) Abort -- Completeness: 5/10
The browse binary path is injected into session context by the session-start hook.
Look for RKSTACK_BROWSE=<path> at the top of this conversation.
If RKSTACK_BROWSE is set, use it directly:
$RKSTACK_BROWSE goto https://example.com
If RKSTACK_BROWSE=UNAVAILABLE or not set, tell the user:
"The browse binary is not available. Install it with the rkstack release for your platform." and stop.
Detect existing test framework and project runtime:
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .rkstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
If test framework detected (config files or test directories found): Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap." Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns). Store conventions as prose context for use in later test generation steps. Skip the rest of bootstrap.
If BOOTSTRAP_DECLINED appears: Print "Test bootstrap previously declined -- skipping." Skip the rest of bootstrap.
If NO runtime detected (no config files found): Use AskUserQuestion:
"I couldn't detect your project's language. What runtime are you using?"
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
If user picks H -> write .rkstack/no-test-bootstrap and continue without tests.
If runtime detected but no test framework -- bootstrap:
Use WebSearch to find current best practices for the detected runtime:
"[runtime] best test framework 2025 2026""[framework A] vs [framework B] comparison"If WebSearch is unavailable, use this built-in knowledge table:
| Runtime | Primary recommendation | Alternative |
|---|---|---|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | -- |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | -- |
Use AskUserQuestion: "I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options: A) [Primary] -- [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e B) [Alternative] -- [rationale]. Includes: [packages] C) Skip -- don't set up testing right now RECOMMENDATION: Choose A because [reason based on project context]"
If user picks C -> write .rkstack/no-test-bootstrap. Tell user: "If you change your mind later, delete .rkstack/no-test-bootstrap and re-run." Continue without tests.
If multiple runtimes detected (monorepo) -> ask which runtime to set up first, with option to do both sequentially.
If package installation fails -> debug once. If still failing -> revert with git checkout -- package.json package-lock.json (or equivalent for the runtime). Warn user and continue without tests.
Generate 3-5 real tests for existing code:
git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10expect(x).toBeDefined() -- test what the code DOES.Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
# Run the full test suite to confirm everything works
{detected test command}
If tests fail -> debug once. If still failing -> revert all bootstrap changes and warn user.
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
If .github/ exists (or no CI detected -- default to GitHub Actions):
Create .github/workflows/test.yml with:
runs-on: ubuntu-latestIf non-GitHub CI detected -> skip CI generation with note: "Detected {provider} -- CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
First check: If TESTING.md already exists -> read it and update/append rather than overwriting. Never destroy existing content.
Write TESTING.md with:
First check: If CLAUDE.md already has a ## Testing section -> skip. Don't duplicate.
Append a ## Testing section:
git status --porcelain
Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
git commit -m "chore: bootstrap test framework ({framework name})"
Create output directories:
mkdir -p .rkstack/design-reports/screenshots
Systematic review of all pages reachable from homepage. Visit 5-8 pages. Full checklist evaluation, responsive screenshots, interaction flow testing. Produces complete design audit report with letter grades.
--quick)Homepage + 2 key pages only. First Impression + Design System Extraction + abbreviated checklist. Fastest path to a design score.
--deep)Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist. For pre-launch audits or major redesigns.
When on a feature branch, scope to pages affected by the branch changes:
git diff main...HEAD --name-only--regression or previous design-baseline.json found)Run full audit, then load previous design-baseline.json. Compare: per-category grade deltas, new findings, resolved findings. Output regression table in report.
The most uniquely designer-like output. Form a gut reaction before analyzing anything.
$RKSTACK_BROWSE screenshot "$REPORT_DIR/screenshots/first-impression.png"This is the section users read first. Be opinionated. A designer doesn't hedge -- they react.
Extract the actual design system the site uses (not what a DESIGN.md says, but what's rendered):
# Fonts in use (capped at 500 elements to avoid timeout)
$RKSTACK_BROWSE js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).map(e => getComputedStyle(e).fontFamily))])"
# Color palette in use
$RKSTACK_BROWSE js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).flatMap(e => [getComputedStyle(e).color, getComputedStyle(e).backgroundColor]).filter(c => c !== 'rgba(0, 0, 0, 0)'))])"
# Heading hierarchy
$RKSTACK_BROWSE js "JSON.stringify([...document.querySelectorAll('h1,h2,h3,h4,h5,h6')].map(h => ({tag:h.tagName, text:h.textContent.trim().slice(0,50), size:getComputedStyle(h).fontSize, weight:getComputedStyle(h).fontWeight})))"
# Touch target audit (find undersized interactive elements)
$RKSTACK_BROWSE js "JSON.stringify([...document.querySelectorAll('a,button,input,[role=button]')].filter(e => {const r=e.getBoundingClientRect(); return r.width>0 && (r.width<44||r.height<44)}).map(e => ({tag:e.tagName, text:(e.textContent||'').trim().slice(0,30), w:Math.round(e.getBoundingClientRect().width), h:Math.round(e.getBoundingClientRect().height)})).slice(0,20))"
# Performance baseline
$RKSTACK_BROWSE perf
Structure findings as an Inferred Design System:
After extraction, offer: "Want me to save this as your DESIGN.md? I can lock in these observations as your project's design system baseline."
For each page in scope:
$RKSTACK_BROWSE goto <url>
$RKSTACK_BROWSE snapshot -i -a -o "$REPORT_DIR/screenshots/{page}-annotated.png"
$RKSTACK_BROWSE responsive "$REPORT_DIR/screenshots/{page}"
$RKSTACK_BROWSE console --errors
$RKSTACK_BROWSE perf
After the first navigation, check if the URL changed to a login-like path:
$RKSTACK_BROWSE url
If URL contains /login, /signin, /auth, or /sso: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run /setup-browser-cookies first if needed."
Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
1. Visual Hierarchy & Composition (8 items)
2. Typography (15 items)
text-wrap: balance or text-pretty on headings (check via $RKSTACK_BROWSE css <heading> text-wrap)font-variant-numeric: tabular-nums on number columns3. Color & Contrast (10 items)
color-scheme: dark on html element (if dark mode present)4. Spacing & Layout (12 items)
env(safe-area-inset-*) for notch devices5. Interaction States (10 items)
focus-visible ring present (never outline: none without replacement)cursor: not-allowedcursor: pointer on all clickable elements6. Responsive Design (8 items)
user-scalable=no or maximum-scale=1 in viewport meta7. Motion & Animation (6 items)
prefers-reduced-motion respected (check: $RKSTACK_BROWSE js "matchMedia('(prefers-reduced-motion: reduce)').matches")transition: all -- properties listed explicitlytransform and opacity animated (not layout properties like width, height, top, left)8. Content & Microcopy (8 items)
text-overflow: ellipsis, line-clamp, or break-words)9. AI Slop Detection (10 anti-patterns -- the blacklist)
The test: would a human designer at a respected studio ever ship this?
text-align: center on all headings, descriptions, cards)border-left: 3px solid <accent>)10. Performance as Design (6 items)
loading="lazy", width/height dimensions set, WebP/AVIF formatfont-display: swap, preconnect to CDN originsWalk 2-3 key user flows and evaluate the feel, not just the function:
$RKSTACK_BROWSE snapshot -i
$RKSTACK_BROWSE click @e3 # perform action
$RKSTACK_BROWSE snapshot -D # diff to see what changed
Evaluate:
Compare screenshots and observations across pages for:
.rkstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md
Baseline: Write design-baseline.json for regression mode:
{
"date": "YYYY-MM-DD",
"url": "<target>",
"designScore": "B",
"aiSlopScore": "C",
"categoryGrades": { "hierarchy": "A", "typography": "B" },
"findings": [{ "id": "FINDING-001", "title": "...", "impact": "high", "category": "typography" }]
}
Dual headline scores:
Per-category grades:
Grade computation: Each category starts at A. Each High-impact finding drops one letter grade. Each Medium-impact finding drops half a letter grade. Polish findings are noted but do not affect grade. Minimum is F.
Category weights for Design Score:
| Category | Weight |
|---|---|
| Visual Hierarchy | 15% |
| Typography | 15% |
| Spacing & Layout | 15% |
| Color & Contrast | 10% |
| Interaction States | 10% |
| Responsive | 10% |
| Content Quality | 10% |
| AI Slop | 5% |
| Motion | 5% |
| Performance Feel | 5% |
AI Slop is 5% of Design Score but also graded independently as a headline metric.
When previous design-baseline.json exists or --regression flag is used:
Use structured feedback, not opinions:
Tie everything to user goals and product objectives. Always suggest specific improvements alongside problems.
snapshot -a) to highlight elements.snapshot -C for tricky UIs. Finds clickable divs that the accessibility tree misses.$RKSTACK_BROWSE screenshot, $RKSTACK_BROWSE snapshot -a -o, or $RKSTACK_BROWSE responsive command, use the Read tool on the output file(s) so the user can see them inline. For responsive (3 files), Read all three. This is critical -- without it, screenshots are invisible to the user.Classifier -- determine rule set before evaluating:
Hard rejection criteria (instant-fail patterns -- flag if ANY apply):
Litmus checks (answer YES/NO for each):
Landing page rules (apply when classifier = MARKETING/LANDING):
App UI rules (apply when classifier = APP UI):
Universal rules (apply to ALL types):
AI Slop blacklist (the 10 patterns that scream "AI-generated"):
text-align: center on all headings, descriptions, cards)border-left: 3px solid <accent>)Source: OpenAI "Designing Delightful Frontends with GPT-5.4" (Mar 2026).
Record baseline design score and AI slop score at end of Phase 6.
.rkstack/design-reports/
├── design-audit-{domain}-{YYYY-MM-DD}.md # Structured report
├── screenshots/
│ ├── first-impression.png # Phase 1
│ ├── {page}-annotated.png # Per-page annotated
│ ├── {page}-mobile.png # Responsive
│ ├── {page}-tablet.png
│ ├── {page}-desktop.png
│ ├── finding-001-before.png # Before fix
│ ├── finding-001-after.png # After fix
│ └── ...
└── design-baseline.json # For regression mode
Sort all discovered findings by impact, then decide which to fix:
Mark findings that cannot be fixed from source code (e.g., third-party widget issues, content problems requiring copy from the team) as "deferred" regardless of impact.
For each fixable finding, in impact order:
# Search for CSS classes, component names, style files
# Glob for file patterns matching the affected page
git add <only-changed-files>
git commit -m "style(design): FINDING-NNN -- short description"
style(design): FINDING-NNN -- short descriptionNavigate back to the affected page and verify the fix:
$RKSTACK_BROWSE goto <affected-url>
$RKSTACK_BROWSE screenshot ".rkstack/design-reports/screenshots/finding-NNN-after.png"
$RKSTACK_BROWSE console --errors
$RKSTACK_BROWSE snapshot -D
Take before/after screenshot pair for every fix. Use the Read tool on both PNGs to show the user the improvement.
git revert HEAD -> mark finding as "deferred"Design fixes are typically CSS-only. Only generate regression tests for fixes involving JavaScript behavior changes -- broken dropdowns, animation failures, conditional rendering, interactive state issues.
For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /design-review.
If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
test patterns, write a regression test encoding the exact bug condition, run it, commit if
passes or defer if fails). Commit format: test(design): regression test for FINDING-NNN.
Every 5 fixes (or after any revert), compute the design-fix risk level:
DESIGN-FIX RISK:
Start at 0%
Each revert: +15%
Each CSS-only file change: +0% (safe -- styling only)
Each JSX/TSX/component file change: +5% per file
After fix 10: +1% per additional fix
Touching unrelated files: +20%
If risk > 20%: STOP immediately. Show the user what you've done so far. Ask whether to continue.
Hard cap: 30 fixes. After 30 fixes, stop regardless of remaining findings.
After all fixes are applied:
Write the report to .rkstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md.
Report template:
# Design Audit -- [domain] -- [YYYY-MM-DD]
## First Impression
[structured critique from Phase 1]
## Summary
- **URL:** [target url]
- **Design Score:** [baseline] -> [final]
- **AI Slop Score:** [baseline] -> [final]
- **Findings:** [total]
- **Fixes Applied:** verified: X, best-effort: Y, reverted: Z
- **Deferred:** [count]
## Quick Wins
[3-5 highest-impact fixes that take <30 minutes each]
## Findings
### FINDING-001: [title]
- **Category:** [hierarchy/typography/color/spacing/interaction/responsive/motion/content/slop/performance]
- **Impact:** [high/medium/polish]
- **Page:** [url]
- **Description:** [what is wrong]
- **Fix Status:** [verified/best-effort/reverted/deferred]
- **Commit:** [SHA if fixed]
- **Files Changed:** [list if fixed]
- **Screenshots:**
- Before: [path]
- After: [path]
[repeat for each finding]
## Design Score Breakdown
| Category | Before | After | Weight |
|----------|--------|-------|--------|
| Visual Hierarchy | [grade] | [grade] | 15% |
| Typography | [grade] | [grade] | 15% |
| Spacing & Layout | [grade] | [grade] | 15% |
| Color & Contrast | [grade] | [grade] | 10% |
| Interaction States | [grade] | [grade] | 10% |
| Responsive | [grade] | [grade] | 10% |
| Content Quality | [grade] | [grade] | 10% |
| AI Slop | [grade] | [grade] | 5% |
| Motion | [grade] | [grade] | 5% |
| Performance Feel | [grade] | [grade] | 5% |
## Inferred Design System
[extracted from Phase 2]
## Ship Readiness
[Assessment of visual quality for shipping]
## PR Summary
> "Design review found N issues, fixed M. Design score X -> Y, AI slop score X -> Y."
If the repo has a TODOS.md:
git revert HEAD immediately.