Invoke this skill whenever the user has finished an implementation task and needs an outside check that it actually works and meets requirements. This covers: post-implementation sanity checks ("is it done?", "does it pass?", "am I good?"), pre-ship quality gates (lint + types + tests + AC check), acceptance criteria audits against story specs or beads issues, and explicit /verify commands. The distinguishing intent: the user is not asking for new code — they want judgment on work already done. This skill runs regression tests, evaluates each acceptance criterion with evidence, and writes tests to lock in new behavior. Do NOT use when the user wants to implement something new, refactor code, or run a single named test command they specified.
From famdeck. Install: `npx claudepluginhub ivintik/private-claude-marketplace --plugin famdeck`. This skill uses the workspace's default tool permissions.
Verification answers two questions: "Did I break anything?" (regression tests) and "Does the new thing work?" (acceptance evals). After both pass, new functionality gets locked down with tests so it becomes regression-protected for next time.
- `/verify` → Run all three phases
- `/verify --tests-only` → Phase 1 only (regression)
- `/verify --evals-only` → Phase 2 only (acceptance)
- `/verify --lock-down` → Phase 3 only (write tests for new code)
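The flag-to-phase mapping above can be sketched as a small dispatcher. This is an illustrative sketch, not the skill's actual implementation; the function and flag names are assumptions.

```python
import argparse

def parse_phases(argv):
    """Map /verify flags to the set of phases to run (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="/verify")
    parser.add_argument("--tests-only", action="store_true")  # Phase 1 only
    parser.add_argument("--evals-only", action="store_true")  # Phase 2 only
    parser.add_argument("--lock-down", action="store_true")   # Phase 3 only
    args = parser.parse_args(argv)
    if args.tests_only:
        return {1}
    if args.evals_only:
        return {2}
    if args.lock_down:
        return {3}
    return {1, 2, 3}  # bare /verify runs all three phases
```

A bare `/verify` returns all three phases; each flag narrows the run to a single phase.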
Run every verification tool the project has. The goal is binary: everything that passed before must still pass.
```bash
python -c "
from famdeck.quality.gates import run_quality_gates
report = run_quality_gates('$PWD')
print(report.summary())
"
```
If famdeck is not importable, detect and run tools manually:
| Stack | Test runner | Linter | Type checker |
|---|---|---|---|
| Python | pytest | ruff check . | pyright / mypy |
| TypeScript | npm test / vitest run | eslint . | tsc --noEmit |
| Go | go test ./... | golangci-lint run | (built-in) |
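The manual fallback can be sketched as marker-file detection that picks commands from the table above. The marker-to-command mapping here is an assumption for illustration, not famdeck's actual detection logic.

```python
import os

# Marker file -> (stack name, commands from the table above). Illustrative only.
STACK_MARKERS = {
    "pyproject.toml": ("python", ["pytest", "ruff check .", "pyright"]),
    "package.json":   ("typescript", ["npm test", "eslint .", "tsc --noEmit"]),
    "go.mod":         ("go", ["go test ./...", "golangci-lint run"]),
}

def detect_stack(root):
    """Return (stack_name, commands) for the first marker file found, or None."""
    for marker, stack in STACK_MARKERS.items():
        if os.path.exists(os.path.join(root, marker)):
            return stack
    return None
```

Each detected command would then be run (e.g. via `subprocess.run`) and its exit code mapped to PASS/FAIL.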
Run all detected tools and report each gate as PASS/FAIL. If any gate fails, the overall verdict is FAIL.
A gate is also FAIL if a tool is configured but cannot run (e.g. eslint referenced in package.json but not installed, pytest in pyproject.toml but not available) — report what's missing and how to fix it. Do not report broken tools as SKIP or PASS. If the tool can't run, the gate fails.
E2E tests are just tests — they run in this phase alongside unit and integration tests. For famdeck projects, E2E tests go through the OpenClaw gateway (see CLAUDE.md for E2E rules).
Verify that new functionality meets its acceptance criteria. This phase uses LLM judgment — you (the agent running this skill) act as the judge.
Check these sources in order:
BMAD story spec: If $ARGS is a story ID, look in _bmad-output/implementation-artifacts/stories/ for the spec file and extract the ## Acceptance Criteria section
Beads issue: If the current task is a beads issue, read its description for ACs
User-provided: If the user specified criteria in the prompt, use those
None found: If no ACs exist, determine whether there is genuinely no new functionality:
Think carefully before accepting "no new functionality." Look at:
If you see evidence of new or changed behavior → this is a FAIL: "New functionality detected but no acceptance criteria found. Task definition is incomplete — ACs are required before verification can pass."
If the changes are genuinely non-functional (pure refactor, file moves, dependency bumps, formatting, renaming) → Phase 2 is SKIPPED with explanation. Regression tests (Phase 1) become the sole quality gate and are strictly mandatory — the verdict depends entirely on Phase 1 passing.
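A first-pass filter for the "genuinely non-functional" check can look at the changed file paths (e.g. from `git diff --name-only`). This heuristic is a sketch under the assumption that paths alone are a coarse signal; real judgment should also read the diffs themselves.

```python
# Suffixes plausibly non-functional: docs, plain text, lockfile bumps.
# This list is illustrative, not exhaustive.
NON_FUNCTIONAL_SUFFIXES = (".md", ".txt", ".lock")

def looks_non_functional(changed_paths):
    """True only if every changed file is plausibly non-functional.

    Note: an empty list vacuously returns True; callers should treat
    "no changes at all" as its own case.
    """
    return all(p.endswith(NON_FUNCTIONAL_SUFFIXES) for p in changed_paths)
```

Any path outside the allow-list (e.g. a source file) means new or changed behavior must be assumed, triggering the FAIL rule above.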
For each AC, determine the best verification approach:
Deterministic (prefer when possible):
LLM judgment (when deterministic isn't enough):
For each AC, report:
There is no SKIP status. Every AC must be evaluated. If something prevents evaluation, that's BLOCKED (which counts as a failure). The reasoning: "I couldn't check" is not the same as "it's fine" — unverified criteria are unmet criteria.
Acceptance Eval: 1-1-setup-scaffold
| AC | Status | Score | Evidence |
|---|---|---|---|
| AC1: pyproject.toml with name/version | PASS | 1.0 | pyproject.toml has name="my-app", version="1.0.0" |
| AC2: tests/ directory with test file | PASS | 1.0 | tests/test_smoke.py exists |
| AC3: pytest exits 0 | PASS | 1.0 | 1 test passed, exit 0 |
| AC4: CI workflow exists | FAIL | 0.0 | .github/workflows/ci.yml not found |
Score: 0.75 (threshold: 0.85)
Verdict: FAIL — AC4 not met
Verdict rules — the overall eval verdict is FAIL if ANY of these are true: an AC is FAIL, an AC is BLOCKED, or the overall score is below the threshold.
Only if ALL ACs are PASS (1.0) and score >= threshold: verdict is PASS.
If no ACs exist AND no new functionality was introduced (pure refactor/cleanup), Phase 2 is skipped — verdict depends on Phase 1 alone.
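The verdict rules above can be expressed as a short function. This is a sketch with illustrative names; it assumes each AC result is a (status, score) pair as in the sample report.

```python
def verdict(ac_results, threshold=0.85):
    """ac_results: list of (status, score), status in {"PASS", "FAIL", "BLOCKED"}."""
    if not ac_results:
        # No ACs and no new functionality: Phase 2 is skipped, Phase 1 decides.
        return "SKIP"
    score = sum(s for _, s in ac_results) / len(ac_results)
    # Any non-PASS status (FAIL or BLOCKED) fails the eval outright.
    if any(status != "PASS" for status, _ in ac_results):
        return "FAIL"
    return "PASS" if score >= threshold else "FAIL"
```

On the sample report above, three 1.0 scores and one 0.0 average to 0.75, below the 0.85 threshold, so the verdict is FAIL.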
If verdict is FAIL, stop here. The agent should iterate on the implementation or fix the environment.
After Phases 1 and 2 pass, ensure the new functionality is covered by deterministic tests. This converts today's eval into tomorrow's regression guard.
Check what changed:
Guidelines for test writing:
After writing tests, run Phase 1 again to confirm the new tests pass.
Autopilot runs /verify automatically after completing each task:
1. `/verify` runs Phase 1 → if fail, agent fixes and retries
2. `/verify` runs Phase 2 → if fail, agent iterates on implementation
3. `/verify` runs Phase 3 → agent writes tests for new code
4. `/verify --tests-only` runs one final time to confirm everything passes

Bug fixes have implicit acceptance criteria: "The bug no longer reproduces." The eval phase for a bug fix: