Implements validation-first design with a 6-gate pipeline that verifies AI agent outputs meet their specs via automated checks such as compilation, unit tests, and acceptance criteria.
Install via:

```shell
npx claudepluginhub juliusbrussee/cavekit
```

This skill uses the workspace's default tool permissions.
Every spec requirement must include testable acceptance criteria that an agent can automatically verify. This is not optional — it is the foundation that makes SDD work.
Why? AI agents are non-deterministic. Without automated validation, there is no way to know whether an agent's output is correct. Validation gates turn "the agent generated some code" into "the agent generated code that provably meets the specification."
The validation-first rule applies at every level:

- Requirement level: every acceptance criterion maps to a validation gate
- Task level: exit criteria must pass before a task is reported complete
- Phase level: phase gates control transitions between phases
Every implementation must pass through six ordered checkpoints. Each successive gate is more expensive to run, so catching failures early saves significant time.
What: The project compiles/transpiles without errors.
```shell
# Generic pattern — substitute your project's build command
{BUILD_COMMAND}
```
Why it matters: If the code does not build, nothing else can be validated. This is the cheapest possible check.
What it catches: syntax errors, type errors, missing imports, and unresolved dependencies.
Acceptance criteria pattern:
- [ ] `{BUILD_COMMAND}` completes with exit code 0
- [ ] No warnings related to {domain} (warnings in other domains are acceptable)
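As a minimal sketch, Gate 1 can be wrapped in a small check that runs the build and inspects the exit code; the command passed in is a placeholder for your project's real build command:

```python
# Gate 1 sketch: run the build and treat a non-zero exit code as failure.
# The command list is a placeholder for your project's real build command.
import subprocess

def gate1_compiles(build_command: list[str]) -> bool:
    """Return True when the build exits with code 0."""
    result = subprocess.run(build_command, capture_output=True)
    return result.returncode == 0
```

For a Make-based project this might be called as `gate1_compiles(["make", "build"])`.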
What: Unit tests pass on all changed files.
```shell
# Generic pattern
{TEST_COMMAND}

# Or targeted at changed files
{TEST_COMMAND} --filter {changed-files}
```
Why it matters: Unit tests verify individual functions and modules in isolation. They are fast, deterministic, and catch logic errors.
What it catches: logic errors, broken edge-case handling, and regressions in previously working functions.
Acceptance criteria pattern:
- [ ] All existing unit tests pass
- [ ] New unit tests cover all acceptance criteria for R{N}
- [ ] No test relies on external services or network access
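The last criterion (no external services or network access) can itself be enforced mechanically. A sketch, assuming CPython's socket module and a session-setup hook you control; real test frameworks offer fixtures or plugins for this:

```python
# Sketch: fail any unit test that opens a network connection by replacing
# socket.socket.connect with a guard. This is a blunt illustrative hook,
# not a specific framework's API.
import socket

class NetworkBlocked(RuntimeError):
    """Raised when a unit test attempts a network connection."""

def block_network() -> None:
    def guard(self, address):
        raise NetworkBlocked(f"unit tests must not connect to {address}")
    socket.socket.connect = guard
```

Calling `block_network()` once at test-session startup makes any networked test fail loudly instead of silently depending on an external service.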
What: End-to-end and integration tests verify that components work together.
```shell
# Generic pattern
{TEST_COMMAND} --e2e

# Or with a specific test runner
{E2E_TEST_COMMAND}
```
Why it matters: Unit tests verify components in isolation. Integration tests verify they work together. Many bugs only appear at integration boundaries.
What it catches: interface mismatches between modules, broken data flow across component boundaries, and incorrect wiring of dependencies.
Acceptance criteria pattern:
- [ ] User can complete {workflow} end-to-end
- [ ] API endpoint returns correct response for {scenario}
- [ ] Error propagation works correctly from {source} to {destination}
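A minimal sketch of what a Gate 3 check exercises: two hypothetical components (a parser and a store) verified together across their boundary rather than in isolation:

```python
# Integration sketch: verify two components cooperate across their boundary.
# The parser and store are hypothetical stand-ins for real modules.
def parse(raw: str) -> dict:
    key, _, value = raw.partition("=")
    return {key: value}

class Store:
    def __init__(self) -> None:
        self.data: dict = {}

    def save(self, record: dict) -> None:
        self.data.update(record)

def test_parse_then_save() -> None:
    # Unit tests would cover parse() and Store.save() separately;
    # this check covers the hand-off between them.
    store = Store()
    store.save(parse("user=alice"))
    assert store.data == {"user": "alice"}
```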
What: Performance benchmarks pass defined thresholds.
```shell
# Generic pattern
{BENCHMARK_COMMAND}

# Or specific checks
{TEST_COMMAND} --performance
```
Why it matters: Functional correctness is necessary but not sufficient. Performance regression can make a feature unusable even if it produces correct output.
What it catches: latency regressions, excessive memory use, and operations that block the main thread.
Acceptance criteria pattern:
- [ ] API response time < {N}ms at p95 under {M} concurrent users
- [ ] Page load time < {N}s on simulated 3G connection
- [ ] Memory usage does not exceed {N}MB during {operation}
- [ ] No operation blocks the main thread for > {N}ms
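A hedged sketch of a latency-budget check like the criteria above; the budget and workload are illustrative, and a real benchmark would add warmups and repeated samples:

```python
# Performance sketch: assert an operation completes within a millisecond
# budget. Single-shot timing is noisy; real benchmarks take many samples.
import time

def within_budget(operation, budget_ms: float) -> bool:
    start = time.perf_counter()
    operation()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms < budget_ms
```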
Note: Not every task needs performance gates. Apply Gate 4 when:

- The spec defines explicit performance requirements
- The change touches hot paths, large data volumes, or latency-sensitive operations
- A previous iteration introduced a performance regression
What: The application starts successfully and basic smoke tests pass.
```shell
# Generic pattern — start the application
{START_COMMAND}

# Verify it is running
curl -f http://localhost:{PORT}/health

# Or run smoke tests
{SMOKE_TEST_COMMAND}
```
Why it matters: Code can build and pass all tests but fail to start. Launch verification catches configuration issues, missing environment variables, port conflicts, and startup race conditions.
What it catches: configuration issues, missing environment variables, port conflicts, and startup race conditions.
Acceptance criteria pattern:
- [ ] Application starts with `{START_COMMAND}` and responds to health check
- [ ] Main screen/page renders without errors
- [ ] No error-level entries in application logs during startup
- [ ] Application shuts down gracefully on interrupt signal
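The health-check step can be scripted as a poll with a deadline, so slow starts pass and hung starts fail. A sketch, assuming an HTTP health endpoint; the URL and timeouts are placeholders:

```python
# Launch-verification sketch: poll a health endpoint until the application
# responds with 200 or the deadline passes. URL and timeouts are placeholders.
import time
import urllib.error
import urllib.request

def wait_healthy(url: str, timeout_s: float = 30.0) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet; keep polling until the deadline
        time.sleep(0.5)
    return False
```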
What: A human reviews the output for quality, design intent, and requirements that are difficult to automate.
Why it matters: Some things cannot be automated — UX quality, architectural elegance, naming consistency, documentation clarity. Gate 6 is where the human acts as the final quality filter.
What it catches: UX problems, architectural drift, inconsistent naming, and unclear documentation, the qualities automation cannot judge.
How it works in practice: the agent pauses at Gate 6 and requests review; the human inspects the output against the spec and either approves it or files issues that feed back into the iteration loop.
Acceptance criteria pattern:
- [ ] Implementation reviewed by human for spec intent alignment
- [ ] No architectural concerns raised
- [ ] Code style consistent with project conventions
| Gate | Purpose | Command Pattern | Typical Duration | Automated? |
|---|---|---|---|---|
| 1. Compilation | Code compiles cleanly | {BUILD_COMMAND} | Seconds | Yes |
| 2. Unit Verification | Individual functions behave correctly | {TEST_COMMAND} | Seconds-Minutes | Yes |
| 3. Integration | Modules cooperate as expected | {E2E_TEST_COMMAND} | Minutes | Yes |
| 4. Benchmarks | Speed and resource use within budget | {BENCHMARK_COMMAND} | Minutes | Yes |
| 5. Smoke Test | Application boots and responds | {START_COMMAND} + health check | Seconds | Yes |
| 6. Manual Audit | Meets design intent and quality bar | Human inspection | Variable | No |
For the full validation gate reference with detailed examples, see
references/validation-gates.md.
Every spec requirement must map to at least one validation gate. When writing specs (see ck:cavekit-writing), each acceptance criterion should indicate which gate verifies it.
### R1: User Authentication
**Acceptance Criteria:**
- [ ] Valid credentials return session token — **Gate 2** (unit test)
- [ ] Invalid credentials return 401 error — **Gate 2** (unit test)
- [ ] Session token grants access to protected endpoints — **Gate 3** (integration)
- [ ] Login page renders within 2s — **Gate 4** (performance)
- [ ] Application starts with auth module loaded — **Gate 5** (launch)
If a requirement cannot be mapped to any gate, it has one of two problems:

1. Its acceptance criteria are not testable: rewrite them until they are.
2. The project lacks the tooling to verify it: add the missing validation infrastructure.
Either way, an unmapped requirement will not be reliably met by an agent.
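This mapping rule can itself be linted. A sketch, assuming the spec is available in some structured form; the dict shape (requirement ID to list of gate numbers) is hypothetical:

```python
# Sketch: find requirements with no validation gate mapped to any criterion.
# The spec structure (requirement id -> list of gate numbers) is hypothetical.
def unmapped_requirements(spec: dict[str, list[int]]) -> list[str]:
    return [req_id for req_id, gates in spec.items() if not gates]
```

An empty result means every requirement is verifiable by at least one gate.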
Phase gates are mandatory verification checkpoints between Hunt phases. They ensure that the output of one phase is solid before the next phase builds on it.
| Transition | Gate Condition | How to Verify |
|---|---|---|
| Spec → Plan | All domains have specs with testable acceptance criteria | Review cavekit-overview.md; every R{N} has AC items |
| Plan → Implement | Plans reference specs, define sequence, include test strategies | Review plan files; every task maps to spec requirements |
| Implement → Iterate | Code builds (Gate 1), tests pass (Gate 2), impl tracking is current | Run {BUILD_COMMAND} and {TEST_COMMAND}; check impl tracking |
| Iterate → Monitor | Convergence detected: changes decreasing iteration-over-iteration | Compare diffs across last 3-5 iterations |
| Monitor → Spec | Gap found or new requirement identified | Gap analysis identifies unmet acceptance criteria |
Phase gates are enforced by the iteration loop. When a prompt includes phase gate checks, the agent runs each check and reports the result before claiming the phase is complete. A typical block looks like:
```markdown
## Exit Criteria (Phase Gate)

Before reporting completion:

- [ ] `{BUILD_COMMAND}` succeeds with exit code 0
- [ ] `{TEST_COMMAND}` passes with no new failures
- [ ] All files created/modified are listed in impl tracking
- [ ] All dead ends encountered are documented
- [ ] Test health table is updated with current counts
```
When working with agent teams (multiple agents dispatched via the Agent tool), the merge protocol ensures that integrating work from different agents does not break validation gates.
```text
Agent A completes work in its isolated branch
Agent B completes work in its isolated branch
Agent C completes work in its isolated branch

Merge sequence (one at a time):
1. Merge Agent A's branch → main
2. Run: {BUILD_COMMAND} → must pass
3. Run: {TEST_COMMAND} → must pass
4. Run: Launch verification → must pass
5. If all pass → proceed
6. If any fail → fix before merging next branch
7. Merge Agent B's branch → main
8. Run: {BUILD_COMMAND} → must pass
9. Run: {TEST_COMMAND} → must pass
10. ...repeat for each agent branch
```
Merging all agent branches simultaneously and then running tests makes it impossible to determine which merge caused a failure. Merging one at a time with validation between each merge pinpoints failures immediately.
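The protocol can be sketched as a loop that stops at the first branch whose merge breaks validation; the `merge` and `validate` callables stand in for `git merge` and the gate commands:

```python
# Merge-protocol sketch: integrate branches one at a time, validating after
# each merge, so a failure is pinpointed to a single branch.
def merge_sequentially(branches, merge, validate):
    merged = []
    for branch in branches:
        merge(branch)
        if not validate():
            return merged, branch  # this branch broke the gates
        merged.append(branch)
    return merged, None            # all branches integrated cleanly
```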
After a branch merges cleanly and the gates pass, delete it (`git branch -D <branch>`).

Completion signals are specific strings that agents emit when all exit criteria for a task or phase are met. They enable automation to detect when an agent is done.
1. The prompt defines the signal:

   ```text
   When ALL exit criteria are met, output exactly:
   <all-tasks-complete>
   ```

2. The agent emits the signal after verifying all exit criteria.
3. The iteration loop detects the signal and stops iterating.
```markdown
## Exit Criteria

Complete all of the following before emitting the completion signal:

- [ ] All T- tasks are DONE or BLOCKED with documented blockers
- [ ] `{BUILD_COMMAND}` succeeds
- [ ] `{TEST_COMMAND}` passes with no new failures
- [ ] Implementation tracking is updated
- [ ] All dead ends are documented

When ALL criteria above are met, output:

<all-tasks-complete>

If you cannot meet all criteria, document what is blocking
and do NOT output the completion signal.
```
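A sketch of the emit-only-on-success rule; the criterion names and blocked-report format are illustrative, while the signal string matches the prompt pattern above:

```python
# Completion-signal sketch: emit the signal only when every exit criterion
# holds; otherwise report what is blocking and withhold the signal.
SIGNAL = "<all-tasks-complete>"

def completion_report(criteria: list[tuple[str, bool]]) -> str:
    blockers = [name for name, passed in criteria if not passed]
    if blockers:
        return "BLOCKED: " + ", ".join(blockers)
    return SIGNAL
```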
Write or generate tests before implementing the feature. The test defines what "correct" means.
1. Read spec requirement R{N} acceptance criteria
2. Generate test cases that verify each criterion
3. Run tests → all fail (RED)
4. Implement the feature
5. Run tests → all pass (GREEN)
6. Refactor if needed
This is TDD-within-SDD. See superpowers:test-driven-development for the existing TDD skill.
Run gates in order. If an earlier gate fails, do not run later gates.
```text
Gate 1 (Build) → FAIL → fix build errors → retry Gate 1
Gate 1 (Build) → PASS → Gate 2 (Unit Tests)
Gate 2 (Tests) → FAIL → fix failing tests → retry Gate 2
Gate 2 (Tests) → PASS → Gate 3 (Integration)
...
```
Earlier gates are cheaper. Fixing a build error costs seconds. Fixing an integration error costs minutes. Fix cheap problems first.
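The ordering rule above can be sketched as a runner that stops at the first failing gate; gate names and check callables are placeholders:

```python
# Fail-fast sketch: run gates in order; skip all later gates after a failure.
def run_gates(gates):
    """gates: list of (name, check) where check() returns True on pass.
    Returns the first failing gate's name, or None when all pass."""
    for name, check in gates:
        if not check():
            return name  # stop here: later, costlier gates never run
    return None
```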
Not every iteration needs all gates. Use progressive depth based on the phase:
| Phase | Required Gates | Optional Gates |
|---|---|---|
| Early Implement | 1 (Build), 2 (Unit) | — |
| Mid Implement | 1, 2, 3 (Integration) | 4 (Performance) |
| Late Implement | 1, 2, 3, 5 (Launch) | 4 |
| Pre-Release | All 1-6 | — |
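The table above can be encoded directly, which lets an iteration loop pick gates programmatically; the phase keys here are illustrative renderings of the table rows:

```python
# Progressive-depth sketch: required gates per phase, mirroring the table.
REQUIRED_GATES = {
    "early-implement": [1, 2],
    "mid-implement": [1, 2, 3],
    "late-implement": [1, 2, 3, 5],
    "pre-release": [1, 2, 3, 4, 5, 6],
}

def gates_for(phase: str) -> list[int]:
    return REQUIRED_GATES[phase]
```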
When a gate that previously passed starts failing, treat it as a P0 issue: stop new work, identify which change broke the gate, and fix the regression before anything else.
### superpowers:verification-before-completion

The existing verification-before-completion skill provides a general framework for verifying work before marking it done. Validation-first design extends this with the specific 6-gate pipeline and phase gate system used in SDD.
How they work together:
- `superpowers:verification-before-completion` ensures the agent checks its work
- `ck:validation-first` defines exactly what checks to run and in what order

### ck:cavekit-writing

Every spec requirement must have acceptance criteria that map to validation gates. The spec-writing skill defines how to write those criteria. Validation-first design defines how to verify them.
### ck:impl-tracking

Validation results are recorded in the implementation tracking document's Test Health table. Gate failures become Issues. Gate-related dead ends are documented in the Dead Ends section.
### ck:methodology

Validation gates operate continuously across all Hunt phases. Phase gates control transitions between phases. The iteration loop uses gate results as convergence signals.