Help us improve
Share bugs, ideas, or general feedback.
From agentic-development-workflow
Validates any AEP artifact (product context, architecture, stories, code, documents) using a generator/evaluator pattern with parallel agents. Run after any AEP generation phase or when the user requests validation.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agentic-development-workflow:validateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Run a generator/evaluator pattern against any artifact produced by the AEP workflow. The generator attempts to use the artifact (dry-run), the evaluator checks it against reality (codebase, constraints, downstream protocols), and the results are consolidated into fixes applied to the artifact itself.
Share bugs, ideas, or general feedback.
Run a generator/evaluator pattern against any artifact produced by the AEP workflow. The generator attempts to use the artifact (dry-run), the evaluator checks it against reality (codebase, constraints, downstream protocols), and the results are consolidated into fixes applied to the artifact itself.
"When asked to evaluate work they've produced, agents tend to respond by confidently praising the work — even when, to a human observer, the quality is obviously mediocre." — Anthropic, "Harness Design for Long-Running Application Development"
Why separate agents: The agent that produced an artifact cannot honestly evaluate it. The generator/evaluator separation is the single most impactful quality improvement in agentic workflows. This skill applies the gen/eval pattern to product artifacts.
Uses the gen/eval utility pattern. Read these reference files for the underlying framework:
.claude/skills/aep-gen-eval/references/scoring-framework.md.claude/skills/aep-gen-eval/references/agent-contracts.md.claude/skills/aep-gen-eval/references/eval-protocol.md.claude/skills/aep-gen-eval/references/findings-format.mdWhere this fits:
/aep-envision → /aep-map → /aep-validate → /aep-dispatch → /aep-design → /aep-launch → /aep-build → /aep-wrap
▲ you are here
Also usable after any phase:
/aep-envision → /aep-validate (validate product context)
/aep-map → /aep-validate (validate architecture + stories)
/aep-design → /aep-validate (validate specs before launch)
/aep-build → /aep-validate (already built into Phase 5 — use that instead)
Session: Main, can be autonomous or interactive
Input: Any AEP artifact (product/index.yaml + product-context.yaml in split mode, or product-context.yaml alone in v1 mode, or OpenSpec change, design doc, code)
Output: The same artifact, with issues fixed. A validation report appended to changelog.
Identify what is being validated:
# Check for product context
ls product-context.yaml 2>/dev/null
# Check for OpenSpec changes
ls openspec/changes/ 2>/dev/null
# Check for design artifacts
ls .dev-workflow/ 2>/dev/null
If the user doesn't specify what to validate, auto-detect based on the most recently modified artifact.
File Resolution:
ls product/index.yaml 2>/dev/null && echo "SPLIT MODE" || echo "V1 MODE"
product-context.yaml only.The skill operates in one of four modes based on the artifact type. Each mode configures which agents to spawn and what they check.
When: After /aep-envision or /aep-map — validating product-context.yaml (and product/index.yaml in split mode)
Split-mode cross-file checks:
stories[].layer values must exist in product/index.yaml product.layers[]stories[].activity values must exist in product/index.yaml product.activities[]calibration.plan[].dimensions[] must reference product/index.yaml product.quality_dimensions[]opportunity or product section should exist in product-context.yaml when split mode is activeproduct/index.yaml must have personas, capabilities, and product sectionsMode A runs two passes — product design quality first, then technical correctness:
Agents: Product Design Evaluator + Vision Alignment Checker
| Agent | Role | What it checks |
|---|---|---|
| Product Design Evaluator | Review against user story mapping principles | Walking skeleton validity, layer ordering, INVEST compliance, dependency graph quality, activity coverage, narrative coherence |
| Vision Alignment Checker | Trace stories to opportunity brief | Every story maps to a stated user need, no scope creep, JTBD coverage |
Read the Product Design Evaluator prompt from .claude/skills/aep-gen-eval/references/agent-contracts.md.
Score using the story mapping dimensions from .claude/skills/aep-gen-eval/references/scoring-framework.md (Walking Skeleton Validity, Layer Ordering, Vision Alignment, INVEST Compliance).
Pass 1 hard failures:
Pass 1 activity checks (if product.activities exists):
layer_introduced: 0 has at least one Layer 0 story. An activity with no stories is a gap in the walking skeleton.activity references a valid activity id from product.activities.order — they should form a coherent user narrative: "User [activity 1], then [activity 2], then..." If it doesn't flow, the backbone needs restructuring.Agents: Generator + Evaluator + Protocol Checker
| Agent | Role | What it checks |
|---|---|---|
| Generator | Dry-run each story/layer | Can each story be implemented? Missing details, ambiguous criteria, dependency gaps |
| Evaluator | Compare design vs codebase | Package versions, import paths, existing patterns, file existence, API compatibility |
| Protocol Checker | Verify downstream compatibility | Dispatch-required fields, DAG validity, scoring compatibility, file conflict detection |
Why two passes: Pass 1 catches product design problems (wrong stories, bad layering, vision drift). Pass 2 catches technical problems (missing fields, broken references, codebase mismatches). Both are required before dispatching to autonomous agents — the agents will faithfully build whatever you give them, right or wrong.
When: After /aep-design — validating OpenSpec artifacts (proposal, design, specs, tasks)
Agents: Generator + Evaluator
| Agent | Role | What it checks |
|---|---|---|
| Generator | Walk through implementation mentally | Are tasks implementable? Missing technical details, unclear acceptance criteria |
| Evaluator | Check specs against codebase | Do referenced files exist? Are API assumptions correct? Do types match? |
When: After implementation — validating code changes
Agents: Generator + Evaluator (same as /aep-build Phase 5)
| Agent | Role | What it checks |
|---|---|---|
| Generator | Review code against spec | Does the code match what was specified? Missing features, incomplete flows |
| Evaluator | Test the running application | Functional testing, edge cases, security, performance |
Note: For code validation in a workspace, prefer
/aep-buildPhase 5 which has the full evaluator loop (spawned worktree-bound via executor.spawn_evaluator). Use this skill for code review on the integration branch or for lighter validation.
When: Validating any structured document (architecture doc, RFC, migration plan) Agents: Generator + Evaluator
| Agent | Role | What it checks |
|---|---|---|
| Generator | Follow the document's instructions | Can someone execute this document? Missing steps, ambiguous instructions |
| Evaluator | Check claims against reality | Do referenced tools/files/APIs exist? Are version numbers correct? |
For each agent, prepare a focused context package. Irrelevant context degrades evaluation quality.
The generator needs:
The generator does NOT need:
The evaluator needs:
The evaluator does NOT need:
The protocol checker needs:
Launch all agents in parallel. Each agent works independently — they do not see each other's output.
You are a GENERATOR agent performing a dry-run validation. Your job is to mentally
walk through using this artifact and identify gaps that would cause problems.
## The Artifact
{artifact_content}
## Your Task
For each item in this artifact, attempt to mentally execute it and report:
1. Can it be done? Yes/No
2. Missing details — anything vague or ambiguous that would cause guesswork
3. Dependency gaps — does this item have everything it needs?
4. Assumption mismatches — any implicit assumptions that could be wrong?
## Constraints
{technical_constraints}
## Output Format
For each item, output a brief assessment. Focus on PROBLEMS ONLY — don't describe
what's fine. At the end, produce a consolidated list of ALL changes needed.
You are an EVALUATOR agent. Your job is to compare this artifact against the
ACTUAL state of the codebase and find mismatches. Read real files and verify claims.
## The Artifact
{artifact_content}
## What to Verify
{verification_checklist}
## Your Task
Read the actual files referenced in this artifact. For each claim, check:
1. Does the referenced file/function/type exist?
2. Does it have the signature/shape the artifact assumes?
3. Are version numbers and dependency versions correct?
4. Do import paths resolve correctly?
## Output Format
Report ALL mismatches between the artifact and reality. Be specific — include
file paths and line numbers. End with a severity-ranked list of required fixes.
You are a PROTOCOL CHECKER. Your job is to verify this artifact is compatible
with the downstream protocol that will consume it.
## The Artifact
{artifact_content}
## The Downstream Protocol
{protocol_specification}
## Your Task
1. Are all required fields present on every item?
2. Is the dependency graph a valid DAG (no cycles, no missing references)?
3. Can the scoring/ranking algorithm be computed with the available fields?
4. Are there file-level conflicts between parallel items?
5. Can the downstream system create its required artifacts from this data?
## Output Format
Produce a compatibility report with specific fixes needed in the artifact.
After all agents return, consolidate their findings into a single action list.
| Category | Description | Action |
|---|---|---|
| Blocking | Would stop downstream consumers from working | Fix immediately |
| Important | Would cause friction, confusion, or rework | Fix before proceeding |
| Minor | Cosmetic, missing optional fields, documentation gaps | Fix if time permits |
Multiple agents may find the same issue from different angles. Merge these into a single finding with the combined evidence.
Show the consolidated findings with counts:
Validation complete: {N} blocking, {M} important, {K} minor issues found.
Blocking:
1. [issue] — found by Generator + Evaluator
2. [issue] — found by Protocol Checker
Important:
3. [issue] — found by Generator
...
Apply all blocking and important fixes to the artifact. Minor fixes are optional.
Rules for fixes:
open_question with a default assumption- date: <today>
author: aep-validate
summary: >
Generator/evaluator validation. Found {N} blocking, {M} important, {K} minor issues.
Fixed: [brief list of key fixes].
# Resolve $BASE (integration branch) — see git-ref "Integration Branch" (override → develop → main)
BASE=$(git config --get aep.integration-branch 2>/dev/null || true)
[ -z "$BASE" ] && { git show-ref --verify --quiet refs/heads/develop \
|| git show-ref --verify --quiet refs/remotes/origin/develop; } && BASE=develop
BASE=${BASE:-main}
git pull --ff-only origin "$BASE"
git add <validated-files>
git commit -m "fix: validate {artifact-name} — {N} issues found and fixed"
git push origin "$BASE"
When the agents evaluate, they should consider these dimensions (adapted from the evaluator criteria in /aep-build). Not all dimensions apply to every mode.
| Dimension | What to check |
|---|---|
| Completeness | Are all required sections present? Are enums listed explicitly? Are defaults specified? |
| Consistency | Do field names match across sections? Do stories reference valid module IDs? |
| Implementability | Can each story be implemented with the information given? Missing technical details? |
| Security | Are there security implications in the design that aren't addressed? (auth, data access, PII) |
| Downstream compatibility | Does the artifact work with its consumers? (dispatch, design, build) |
| Dimension | What to check |
|---|---|
| Completeness | Do specs cover all capabilities in the proposal? Are acceptance criteria testable? |
| Feasibility | Can the tasks be implemented with the stated approach? Are file paths correct? |
| Scope control | Are tasks properly bounded? Any scope creep beyond the proposal? |
See references/evaluator-criteria.md for the full 5-dimension scoring framework (Completeness, Correctness, UX Quality, Security, Code Quality).
| Dimension | What to check |
|---|---|
| Accuracy | Are all factual claims correct? Do referenced resources exist? |
| Executability | Can someone follow this document step by step? Are commands correct? |
| Completeness | Are there missing steps or assumptions? |
/aep-build Phase 5 — use the built-in evaluator loop instead (it has executor.spawn_evaluator, verification JSON, scoring framework)Create a validation-criteria.md file in your project's .dev-workflow/ directory to add project-specific validation checks. The agents will read this file if it exists.
# Project Validation Criteria
## Additional checks for Mode A (Product Context)
- All stories must have `business_value` field (required by our dispatch)
- Complexity must use S/M/L format (not small/medium/large)
- All file paths must be verified against the actual filesystem
## Additional checks for Mode B (Design)
- All API endpoints must include Zod validation schemas
- Database schema changes must include migration plan
By default, Mode A spawns 3 agents, Modes B-D spawn 2. You can adjust:
After validation, proceed to the appropriate downstream skill:
Product context validated → /aep-dispatch
Design validated → /aep-launch
Code validated → create PR (or /aep-build Phase 9)
Document validated → publish/share
npx claudepluginhub memorysaver/agentic-engineering-patterns --plugin agentic-development-workflowProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.