Help us improve
Share bugs, ideas, or general feedback.
From vibekit
Verifies implementation against a spec with evidence-based checks and three independent self-consistency passes. Ensures every requirement is backed by verbatim evidence before merge.
npx claudepluginhub rizukirr/vibekit --plugin vibekitHow this skill is triggered — by the user, by Claude, or both
Slash command
/vibekit:verify-gateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Prove the feature is done. Not guess, not assume — prove.
Verifies implementation completion by running tests, code hygiene review, spec compliance validation, and drift checks; blocks claims on failures. Use before commits or merges.
Enforces evidence-based verification by running fresh tests, builds, linters, reviewing outputs before claiming work done, committing, or PRing.
Verifies feature completion by requiring automated tests that prove functionality works. Enforces phase gates and spec alignment before acceptance.
Share bugs, ideas, or general feedback.
Prove the feature is done. Not guess, not assume — prove.
This skill produces a verification report that: (a) lists every spec requirement, (b) names the evidence that each one is satisfied, (c) quotes that evidence verbatim, and (d) reaches a verdict through three independent self-consistency passes. No requirement is marked satisfied without evidence. No evidence is summarized or paraphrased.
exec-dispatch reports all tasks complete.finish-branch, merge, tag, release, or any outward-facing action.docs/specs/YYYY-MM-DD-<topic>-design.md or a path the user specifies.docs/plans/YYYY-MM-DD-<feature>.md.If any prerequisite fails, stop and surface what is missing.
Every claim in the verification report takes the form:
REQUIREMENT: <quoted from spec, exact>
STATUS: satisfied | not satisfied | partial | untestable
EVIDENCE:
<verbatim quote from test output, git log, file contents, or command output,
with the source cited as file:line or command-and-date>
If the EVIDENCE field cannot be filled with a real, quotable artifact, STATUS is not satisfied. No exceptions for "obvious" requirements.
Everything in the verification report that qualifies as evidence is verbatim:
Headers, section labels, and meta-narration can be terse. Evidence never is.
Read the spec. Extract every testable requirement. Put them in a checklist, each quoted verbatim from the spec. Also include:
If a spec section contains a requirement that is not directly testable (e.g., "the UI should feel responsive"), mark it untestable and propose a proxy metric (e.g., "first paint under 200ms on a reference device") that the user can confirm.
For each requirement, identify and collect:
sha — subject.Use the report-filter mindset: drop pleasantries, keep evidence byte-for-byte.
For each requirement, perform the verdict check three times, independently. Independence is enforced by dispatching each pass as a separate fresh subagent. In-session reasoning rounds are not a substitute — context leaks between rounds and the first answer anchors the rest.
Dispatch brief per pass:
ROLE: Verification pass N of 3 for requirement R-k.
TASK: Decide whether the supplied evidence satisfies the supplied requirement.
CONSTRAINTS:
- Read-only. No edits, no commits, no external tool calls beyond reading the cited artifacts.
- Base the decision on the evidence as quoted. Do not consult other requirements, other passes, or external knowledge.
- If the evidence is insufficient to answer, return `partial` with a one-line reason.
CONTEXT:
Requirement (verbatim from spec): <quoted sentence>
Evidence:
- <artifact ref + verbatim quote>
- <artifact ref + verbatim quote>
OUTPUT:
{
"pass": N,
"verdict": "yes | no | partial",
"reason": "<one line; required if verdict is no or partial>"
}
The three passes are dispatched with no shared context, no awareness of each other's verdicts. On each return, collect the three verdict values and apply the verdict rule below.
Cost control for large specs. Three passes per requirement means 3×N subagent dispatches. At N ≥ ~15 requirements this becomes expensive. Offer the user a critical-requirements-only option before dispatching: ask them to tag each requirement as critical | normal, run three-pass verification only on critical requirements, and run a single-pass verdict on the rest. Non-critical requirements with a single-pass no or partial still block the overall verdict. Single-pass is weaker evidence, and the report must mark those requirements as such so the user sees the trade.
Verdict rule:
yes → satisfied.no → not satisfied.partial → partial.disagreement: escalate. Do NOT pick the majority. Do NOT hide the disagreement.Disagreement means the evidence is ambiguous or the requirement is under-specified. Both are user-actionable problems.
After the three per-requirement passes, run one additional self-consistency pass on blast radius. Dispatch a separate fresh subagent with this brief:
ROLE: Surgical-diff auditor, read-only.
TASK: For every changed file and hunk between <base sha> and HEAD, verify that the change traces to a specific plan task or spec requirement.
CONSTRAINTS:
- Read-only. No edits.
- For each changed file, name the plan task whose "Files" section lists it, OR the spec requirement that authorizes it.
- For each hunk with no traceable origin, report it as an orphan with file:line range.
- Pre-existing dead code that was deleted without explicit user request is an orphan.
- Formatting / comment changes unrelated to a task are orphans.
CONTEXT:
Plan path: <path>
Spec path: <path>
Diff command: git diff <base sha>..HEAD
Files changed: <list from git diff --name-only>
OUTPUT:
{
"verdict": "clean | orphans-found",
"orphans": [
{"file": "<path>", "lines": "<range>", "reason": "<why no plan/spec match>"}
]
}
Verdict rule:
clean and zero orphans → recorded as repo-level pass.orphans-found → hard fail. The overall verdict cannot be ready. Orphans are listed in the report's Blockers section verbatim.This pass is non-optional and is not subject to the critical-requirements cost-control opt-out.
Independent of per-requirement verification, run and record:
- Test suite: <command> → <exit code> + <last 20 lines verbatim>
- Type checker: <command> → <exit code> + <output tail>
- Linter: <command> → <exit code> + <output tail>
- Build: <command> → <exit code> + <output tail>
- git status: <output verbatim>
- git log --oneline <base>..HEAD: <output verbatim>
Any non-zero exit code is a hard fail. The verdict cannot be ready if any repo-level check failed, regardless of per-requirement verdicts.
Save to docs/verifications/YYYY-MM-DD-<feature>-verify.md. Structure:
# Verification Report — <feature>
**Date:** YYYY-MM-DD
**Spec:** <path>
**Plan:** <path>
**Commit verified:** <sha>
## Repo-level checks
- Tests: <pass|fail> — `<command>` → exit <code>
```
- Types: ...
- Linter: ...
- Build: ...
- `git status`:
```
```
- Surgical-diff pass:
- `:` —
tests/...:42 — passed in run above.abc1234 — feat: ...<verbatim>
For each requirement where the three passes disagreed:
clean.not satisfied, partial, disagreement, or any orphan from the surgical-diff pass. See list below.If not ready, list the exact blockers.
### Step 6 — surface the verdict
Send the user a short message:
Verification report: . Verdict: <ready|not ready>. Blockers:
Suggested next step: <re-plan | re-run specific task | amend spec>
Do not claim `ready` if any repo-level check failed, any requirement is `not satisfied` or `partial`, any disagreement exists, or the surgical-diff pass returned `orphans-found`.
## Anti-patterns
- Claiming a requirement is satisfied with "I checked" as the evidence. "I checked" is not an artifact.
- Paraphrasing test output for brevity. Verbatim or drop; there is no middle.
- Running the three passes back-to-back in the same reasoning thread and letting the first answer anchor the other two. Independent means independent.
- Picking the majority answer on disagreement. Disagreement is a signal, not noise.
- Accepting a non-zero exit code from any repo-level check. No "it's just a lint warning".
- Writing a verdict of `ready` and then adding a caveat. If there is a caveat, it is not ready.
## Failure modes
- **No tests exist for a requirement.** Do not mark it `satisfied`. Either it is `untestable` with a proxy check documented, or it blocks the verdict.
- **A test exists but does not actually test the requirement.** Mark `partial` with evidence that the test's assertions are weak (quote the assertion lines). Escalate.
- **Build broken in an unrelated area.** Still a hard fail for the overall verdict. Surface to user; do not carve out an exception.
- **The spec itself is ambiguous.** Do not resolve ambiguity silently. Record a disagreement and escalate.
## Output of this skill
- A committed verification report at `docs/verifications/YYYY-MM-DD-<feature>-verify.md`.
- A short user-facing verdict message.
- If `ready`: the next skill in the pipeline (review-pack or finish-branch) may proceed.
- If `not ready`: the pipeline halts until the user acts.