Skill

Proof Verify

Verifies implemented features against frozen acceptance criteria using independent agents post-build, ensuring unbiased confirmation specs are met.

testing

code-quality

npx claudepluginhub anastasiyaw/claude-code-config

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Plan-based verification: freeze acceptance criteria BEFORE building, verify AFTER with independent agents.

Supporting Assets

references/kb-aware-verification.md

SKILL.md

Similar Skills

dev-verify

Verifies task completion by enforcing fresh automated test runs, runtime evidence review, and spec re-read in /dev workflow Phase 7.

workflows

verification-loop

Runs structured verification pass on implemented changes: checks behavior vs design, happy/edge cases, failure recovery, docs alignment, records gaps/owners. Use for feature completion, risky fixes, milestones.

everything-game-dev-code

verification

Enforces evidence-based verification by running fresh tests, builds, linters, reviewing outputs before claiming work done, committing, or PRing.

oh-my-claude

Stats

Stars100

Forks13

Last CommitApr 22, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Proof Verify

Plan-based verification: freeze acceptance criteria BEFORE building, verify AFTER with independent agents.

When to Use

After completing a feature/fix that was built from a plan
When you need independent confirmation that work meets spec
When the builder should NOT verify their own work
Trigger phrases: "verify against plan", "check the implementation", "proof check", "independent review"

The Pattern

PHASE 1: PLAN (before any code)
  Create .proof/PLAN.md with numbered acceptance criteria
  Each AC: testable, specific, has a verification command or check
  Plan is FROZEN - no changes during build

PHASE 2: BUILD (normal work)
  Implement against the plan
  Mark progress in .proof/PROGRESS.md
  Builder does NOT self-verify

PHASE 3: VERIFY (after build, independent agent)
  Fresh agent reads PLAN.md (never saw the build process)
  Walks through each AC, runs verification commands
  Writes .proof/VERDICT.md with PASS/FAIL per criterion
  If any FAIL → .proof/PROBLEMS.md with specific fixes

PHASE 4: FIX (if needed)
  Builder reads PROBLEMS.md, makes minimal fixes
  Back to PHASE 3 (re-verify)
  Loop until all PASS

Phase 1: Create Plan

Create .proof/PLAN.md in the project root:

# Verification Plan

**Created:** YYYY-MM-DD HH:MM
**Task:** [one-line description]
**Builder:** [session ID or "current"]
**Status:** FROZEN

## Acceptance Criteria

### AC1: [short name]
**Description:** [what must be true]
**Verify:** [exact command or check to run]
**Expected:** [what success looks like]

### AC2: [short name]
**Description:** [what must be true]
**Verify:** [exact command or check to run]
**Expected:** [what success looks like]

### AC3: [short name]
...

## Out of Scope
- [explicitly what this plan does NOT cover]

## Constraints
- [time, resource, or technical constraints]

Rules for good ACs:

Testable - there is a command or check that produces PASS/FAIL
Specific - "function returns correct value" not "code works"
Independent - each AC can be verified without the others
3-8 ACs - fewer than 3 = loopholes, more than 8 = checklist gaming
Frozen - once written, do not modify during build

Phase 2: Build

Normal implementation. The only additions:

Create .proof/PROGRESS.md as you work:

# Build Progress

### AC1: [name]
- [x] Implemented in `src/foo.py:42`
- Files changed: `src/foo.py`, `tests/test_foo.py`

### AC2: [name]
- [x] Implemented in `src/bar.py:18`
- Files changed: `src/bar.py`
- Note: chose approach B because [reason]

After build is complete, write .proof/EVIDENCE.md:

# Evidence

### AC1: [name]
**Command:** `pytest tests/test_foo.py -v`
**Output:**
\```
tests/test_foo.py::test_returns_correct PASSED
tests/test_foo.py::test_handles_edge PASSED
\```
**Result:** PASS

### AC2: [name]
**Command:** `grep -c "TODO" src/bar.py`
**Output:** `0`
**Result:** PASS

Builder collects evidence but does NOT write the verdict. That is the verifier's job.

Phase 3: Verify (Independent Agent)

This is the critical phase. The verifier MUST be:

A fresh agent (new session or subagent) that never saw the build
Given ONLY: PLAN.md + access to the codebase
NOT given: PROGRESS.md, EVIDENCE.md, or any build context

Verifier prompt template

You are an independent verifier. Your job is to check whether
the implementation meets the acceptance criteria in .proof/PLAN.md.

Rules:
1. Read .proof/PLAN.md first. This is your ONLY specification.
2. For each AC, run the verification command yourself.
3. Do NOT read .proof/PROGRESS.md or .proof/EVIDENCE.md
   (those are the builder's claims - you verify independently).
4. Write your verdict to .proof/VERDICT.md in this format:

# Verification Verdict

**Verifier:** [your session ID]
**Date:** YYYY-MM-DD HH:MM
**Plan hash:** [first 8 chars of md5 of PLAN.md]

## Results

### AC1: [name]
**Status:** PASS | FAIL
**Evidence:** [what you saw when you ran the check]
**Notes:** [any observations]

### AC2: [name]
...

## Summary
- Total: N criteria
- Passed: X
- Failed: Y
- **Overall:** PASS | FAIL

5. If any AC fails, also create .proof/PROBLEMS.md:

# Problems

### AC2: [name]
**Expected:** [from PLAN.md]
**Actual:** [what you found]
**Suggested fix:** [smallest change that would fix it]
**Affected files:** [list]

6. Do NOT fix anything. You are read-only. Report only.

How to spawn the verifier

Option A: Subagent (same session)

Agent({
  description: "Independent verification against plan",
  prompt: "[verifier prompt above]",
  mode: "plan"  // read-only first
})

Option B: Fresh session (stronger isolation) Write handoff with instruction: "Start by reading .proof/PLAN.md and running verification."

Option C: Multiple verifiers (highest confidence) Spawn 2-3 verifiers independently. If they disagree on any AC, that AC needs investigation.

Phase 4: Fix Loop

If VERDICT.md shows any FAIL:

Builder reads PROBLEMS.md
Makes minimal fixes (not refactoring, not "while I'm here")
Updates EVIDENCE.md with new evidence for failed ACs
Verifier runs again (Phase 3)
Loop until all PASS

Typical: 1-2 fix rounds. If 3+ rounds on same AC → the AC itself might be wrong. Revisit PLAN.md.

File Structure

.proof/
  PLAN.md        # frozen acceptance criteria (Phase 1)
  PROGRESS.md    # builder's notes (Phase 2)
  EVIDENCE.md    # builder's evidence (Phase 2)
  VERDICT.md     # verifier's verdict (Phase 3)
  PROBLEMS.md    # verifier's findings (Phase 3, if failures)

Gotchas

Builder reads VERDICT, not the reverse. Verifier never sees builder's evidence. This prevents confirmation bias.
"PASS with concerns" is FAIL. Either it passes or it doesn't. No soft passes.
Plan hash in verdict. If someone edited PLAN.md mid-build, the hash won't match. Catch.
Time limit. If verification takes >30 min, the ACs are too vague. Rewrite them.
Don't verify style. ACs should be functional ("function returns X"), not stylistic ("code is clean"). Style is for code review, not proof loop.

Troubleshooting

Symptom	Cause	Fix
Verifier passes everything	ACs too vague	Rewrite with specific commands
3+ fix rounds on same AC	AC is wrong or untestable	Revisit PLAN.md
Verifier disagrees with builder's evidence	Different env or stale state	Both run from clean state
Builder keeps editing PLAN.md	Not frozen	Hash check catches this

Sources

Proof Loop (Principle 02) - the theoretical foundation
OpenClaw-RL - spec freeze → build → fresh verify
Agent-R - failed-then-fixed trajectories
oh-my-claudecode Ralph - PRD-driven persistence (practical inspiration)