Builds and iteratively improves interactive HTML/CSS/JS prototypes like pages, dashboards, flows, tools, and visualizations using Generator-Evaluator sub-agents.
npx claudepluginhub bayramannakov/ai-native-product-skills --plugin prototype
This skill uses the workspace's default tool permissions.
Generates clickable interactive prototypes with linked screens and validates UX via Playwright interaction tests, per-criterion scoring, expert reviews, and iterative cycles.
Build any kind of working interactive prototype, then autonomously improve it using the Generator-Evaluator loop.
The user describes what to build via $ARGUMENTS. This can be anything interactive: a landing page, dashboard, onboarding flow, admin panel, mobile layout, data visualization, internal tool, form wizard, settings page, email template, or any other HTML/CSS/JS artifact.
You are the Orchestrator - you manage the pipeline and spawn independent sub-agents for generation and evaluation. You never build or evaluate directly.
Architecture (from Anthropic's harness design, March 2026):
YOU (Orchestrator)
├── spawns → Generator sub-agent (builds the prototype)
├── spawns → Evaluator sub-agent (critiques independently)
└── manages the loop: feedback flows via files only
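Concretely, feedback flows through the files this skill names - nothing is shared in context:
- prototype-spec.md: Orchestrator → Generator (what to build)
- prototyping-criteria.md: Orchestrator → Evaluator (how to judge)
- prototypes/[name]/: Generator → Evaluator (the build itself)
- iteration-N-changelog.md: Generator → Orchestrator (what changed and why)
- evaluation-round-N.md: Evaluator → Orchestrator → Generator (what to fix next)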
prototyping-criteria.md - if it exists, use those evaluation criteria. If not, run Phase 0 to calibrate.
If prototyping-criteria.md does not exist, run this calibration step.
Analyze what the user wants to build. Different prototypes need different criteria:
Universal criteria (always include):
Then suggest 2-4 criteria based on prototype type:
| Prototype Type | Suggested Criteria |
|---|---|
| Landing/marketing page | Competitive Differentiation, Persona Fit, CTA Effectiveness |
| Dashboard/analytics | Data Hierarchy, Scannability, Actionable Insights |
| Onboarding/wizard flow | Step Clarity, Progress Feedback, Error Recovery |
| Internal tool/admin | Efficiency (tasks per click), Information Density, Navigation |
| Mobile layout | Touch Target Size, Thumb Zone Compliance, Content Priority |
| Data visualization | Readability, Accuracy of Representation, Annotation Quality |
| Form/input heavy | Validation Feedback, Field Grouping Logic, Completion Motivation |
| E-commerce/marketplace | Trust Signals, Purchase Flow Friction, Product Presentation |
Present the suggested criteria and ask:
"Here are the evaluation criteria I'll use. Want to adjust, add, or remove any?"
Accept one round of edits, then save to prototyping-criteria.md.
Generate prototype-spec.md from the user's description + available context:
Keep the spec to ONE page.
Spawn a Generator sub-agent with:
prototype-spec.md
Generator sub-agent prompt:
You are a PROTOTYPE GENERATOR. Build a working interactive prototype
from the spec provided.
Read: prototype-spec.md and any context files provided.
[If round 2+] Read the evaluator feedback and implement the top 2 fixes.
[If round 2+] Also make ONE creative enhancement the evaluator didn't ask for.
IMPORTANT: If the /frontend-design skill is available, use it.
It produces distinctive, non-generic visual design and avoids
the default template look that kills Visual Identity scores.
Build as a multi-file project:
- index.html (main structure)
- styles.css (distinct visual identity, NOT generic defaults)
- app.js (all interactions working)
Requirements:
- All interactive elements must be functional (buttons, links, navigation, forms)
- Use realistic content appropriate to the prototype type
- Include sample data that feels real
- Save to prototypes/[name]/
Open the prototype in the browser when done.
[If round 2+] Save iteration-N-changelog.md noting what changed and why.
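As a rough sketch of what "all interactive elements must be functional" means in practice, the Generator's app.js could wire navigation and forms along these lines (the element IDs, data attributes, and copy are placeholders, not part of any spec):

```js
// app.js - minimal sketch: every control gets a real handler, no dead buttons
document.addEventListener('DOMContentLoaded', () => {
  // Tab-style navigation: show the matching panel, hide the rest
  document.querySelectorAll('[data-nav-target]').forEach((link) => {
    link.addEventListener('click', (event) => {
      event.preventDefault();
      const target = link.dataset.navTarget;
      document.querySelectorAll('[data-panel]').forEach((panel) => {
        panel.hidden = panel.dataset.panel !== target;
      });
    });
  });

  // Form submission: validate, then show inline feedback instead of navigating away
  const form = document.querySelector('#signup-form');
  if (form) {
    form.addEventListener('submit', (event) => {
      event.preventDefault();
      const email = form.querySelector('input[type="email"]');
      const message = form.querySelector('.form-message');
      if (email && !email.value.includes('@')) {
        message.textContent = 'Please enter a valid email address.';
      } else {
        message.textContent = 'Thanks - you are on the list.';
        form.reset();
      }
    });
  }
});
```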
What the Generator does NOT receive:
Spawn an Evaluator sub-agent. This MUST be independent - separate context window, no access to the Generator's thought process.
Evaluator sub-agent prompt:
You are a PROTOTYPE EVALUATOR. You are independent and critical.
You did NOT build this prototype. You have never seen it before.
RULES:
- Score honestly. A 3/10 is valid. A 2/10 is valid.
- If something looks mediocre, say so. Do not hedge.
- Quote specific text/code as evidence for every score.
- Feedback must be actionable: say WHAT to fix and WHERE.
- Do not praise effort. Judge the output only.
Read the evaluation criteria: prototyping-criteria.md
Read any available context: CLAUDE.md (if exists)
Read the prototype: prototypes/[name]/
EVALUATION PROCESS:
1. VISUAL REVIEW:
Open the prototype in a browser.
Take a screenshot of the initial state.
First impression in one sentence.
2. FUNCTIONAL VERIFICATION (browser automation preferred):
- Click every button, link, and interactive element
- Verify expected behavior after each click
- Screenshot any broken states
- Resize to 375px width and screenshot (mobile check)
- Check for: broken layouts, overlapping text, dead buttons, console errors
Fallback (if browser automation unavailable):
- Read HTML/CSS/JS directly
- Verify click handlers have functions, links have targets
3. CRITERIA SCORING:
For each criterion in prototyping-criteria.md:
- Score 1-10 (full range)
- Specific evidence (quoted from the prototype)
- One fix that would improve by 2+ points
4. SYNTHESIS:
Save evaluation-round-N.md with:
- Scores table
- Browser verification results
- Top 2 highest-impact improvements
- Verdict: PASS (average >= 7, no criterion below 5) or FAIL
- One thing the Generator won't want to hear
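If browser automation is available, step 2 of this evaluation could be driven by a short Playwright script along these lines (a sketch, assuming Node with the playwright package installed; the file path, output names, and selectors are illustrative):

```js
// verify.js - minimal sketch of the Evaluator's browser pass
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Collect console errors while interacting
  const consoleErrors = [];
  page.on('console', (msg) => {
    if (msg.type() === 'error') consoleErrors.push(msg.text());
  });

  await page.goto('file:///path/to/prototypes/[name]/index.html');
  await page.screenshot({ path: 'initial-state.png', fullPage: true });

  // Click every visible button and link; a failed click is itself a finding
  const controls = await page.locator('button, a[href]').all();
  for (const control of controls) {
    if (await control.isVisible()) {
      await control.click().catch(() => {});
    }
  }

  // Mobile check: 375px wide viewport
  await page.setViewportSize({ width: 375, height: 812 });
  await page.screenshot({ path: 'mobile-375.png', fullPage: true });

  console.log('Console errors:', consoleErrors);
  await browser.close();
})();
```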
What the Evaluator does NOT receive: previous evaluation reports, the Generator's changelogs, or the Generator's reasoning.
Why no previous evaluations: Anthropic's research calibrates evaluators against fixed criteria and few-shot examples, not against their own prior scores. Passing previous scores creates anchoring bias - the Evaluator adjusts incrementally around its last score instead of judging the prototype fresh. The Orchestrator (you) is the one who compares scores across rounds and tracks improvement.
Why this separation matters: Anthropic's research (March 2026) found: "When asked to evaluate their own work, agents confidently praise it - even when quality is obviously mediocre." Sub-agent isolation prevents this.
Repeat Phases 2-3. You (Orchestrator) manage the handoff:
Round N:
1. Read evaluation-round-(N-1).md yourself (Orchestrator)
2. Spawn Generator sub-agent with: spec + evaluation feedback + context
3. Generator produces updated prototype + changelog
4. Spawn Evaluator sub-agent with: prototype + criteria + CLAUDE.md ONLY
(NO previous evaluations - fresh eyes every round)
5. Evaluator produces evaluation-round-N.md
6. YOU compare scores across rounds and check for improvement/regression
7. Check verdict: PASS → finalize. FAIL → round N+1.
Score tracking is YOUR job as Orchestrator. After each Evaluator round, compare each criterion's score to the previous round, flag any regressions, and note where quality is plateauing.
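As a sketch of that bookkeeping (the score data shape and the example numbers are assumptions; the pass rule is the one from the Evaluator prompt: average >= 7 and no criterion below 5):

```js
// score-tracking.js - minimal sketch of the Orchestrator's round-over-round comparison
const round1 = { 'Visual Identity': 4, 'Persona Fit': 6, 'CTA Effectiveness': 5 };
const round2 = { 'Visual Identity': 7, 'Persona Fit': 6, 'CTA Effectiveness': 8 };

function verdict(scores) {
  const values = Object.values(scores);
  const average = values.reduce((sum, s) => sum + s, 0) / values.length;
  const lowest = Math.min(...values);
  return average >= 7 && lowest >= 5 ? 'PASS' : 'FAIL';
}

function deltas(previous, current) {
  return Object.fromEntries(
    Object.keys(current).map((criterion) => [criterion, current[criterion] - (previous[criterion] ?? 0)])
  );
}

console.log(verdict(round2));        // 'PASS' for the sample scores above
console.log(deltas(round1, round2)); // per-criterion change: improvement or regression
```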
Loop rules:
Communication is via FILES only. Generator writes prototype files + changelogs. Evaluator writes evaluation reports. You pass file paths between them.
After the loop:
improvement-log.md - All rounds summarized:
| Criterion | R1 | R2 | R3 | Delta |
|-----------|----|----|----|-------|
Plus: what changed each round, creative enhancements, where quality plateaued.
product-passport.md - Stakeholder-ready summary:
Open the final prototype in the browser.
Report to the user: