Invoke this skill whenever the user has finished an implementation task and needs an outside check that it actually works and meets requirements. This covers: post-implementation sanity checks ("is it done?", "does it pass?", "am I good?"), pre-ship quality gates (lint + types + tests + AC check), acceptance criteria audits against story specs or beads issues, and explicit /verify commands. The distinguishing intent: the user is not asking for new code — they want judgment on work already done. This skill runs regression tests, evaluates each acceptance criterion with evidence, and writes tests to lock in new behavior. Do NOT use when the user wants to implement something new, refactor code, or run a single named test command they specified.
From famdeck. Install: `npx claudepluginhub ivintik/private-claude-marketplace --plugin famdeck`. This skill uses the workspace's default tool permissions.
Verification answers two questions: "Did I break anything?" (regression tests) and "Does the new thing work?" (acceptance evals). After both pass, new functionality gets locked down with tests so it becomes regression-protected for next time.
- `/verify` → Run all three phases
- `/verify --tests-only` → Phase 1 only (regression)
- `/verify --evals-only` → Phase 2 only (acceptance)
- `/verify --lock-down` → Phase 3 only (write tests for new code)
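The flag-to-phase mapping above can be sketched as a small dispatcher. This is an illustrative sketch, not the skill's actual implementation; the function and flag names are assumptions.

```python
import argparse

def parse_phases(argv):
    """Map /verify flags to the set of phases to run (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="/verify")
    parser.add_argument("--tests-only", action="store_true")  # Phase 1 only
    parser.add_argument("--evals-only", action="store_true")  # Phase 2 only
    parser.add_argument("--lock-down", action="store_true")   # Phase 3 only
    args = parser.parse_args(argv)
    if args.tests_only:
        return {1}
    if args.evals_only:
        return {2}
    if args.lock_down:
        return {3}
    return {1, 2, 3}  # bare /verify runs all three phases
```

A bare `/verify` returns all three phases; each flag narrows the run to a single phase.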
Run every verification tool the project has. The goal is binary: everything that passed before must still pass.
```bash
python -c "
from famdeck.quality.gates import run_quality_gates
report = run_quality_gates('$PWD')
print(report.summary())
"
```
If famdeck is not importable, detect and run tools manually:
| Stack | Test runner | Linter | Type checker |
|---|---|---|---|
| Python | pytest | ruff check . | pyright / mypy |
| TypeScript | npm test / vitest run | eslint . | tsc --noEmit |
| Go | go test ./... | golangci-lint run | (built-in) |
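The manual fallback can be sketched as marker-file detection that picks commands from the table above. The marker-to-command mapping here is an assumption for illustration, not famdeck's actual detection logic.

```python
import os

# Marker file -> (stack name, commands from the table above). Illustrative only.
STACK_MARKERS = {
    "pyproject.toml": ("python", ["pytest", "ruff check .", "pyright"]),
    "package.json":   ("typescript", ["npm test", "eslint .", "tsc --noEmit"]),
    "go.mod":         ("go", ["go test ./...", "golangci-lint run"]),
}

def detect_stack(root):
    """Return (stack_name, commands) for the first marker file found, or None."""
    for marker, stack in STACK_MARKERS.items():
        if os.path.exists(os.path.join(root, marker)):
            return stack
    return None
```

Each detected command would then be run (e.g. via `subprocess.run`) and its exit code mapped to PASS/FAIL.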
Run all detected tools and report each gate as PASS/FAIL. If any gate fails, the overall verdict is FAIL.
A gate is also FAIL if a tool is configured but cannot run (e.g. eslint referenced in package.json but not installed, pytest in pyproject.toml but not available) — report what's missing and how to fix it. Do not report broken tools as SKIP or PASS. If the tool can't run, the gate fails.
E2E tests are just tests — they run in this phase alongside unit and integration tests. For famdeck projects, E2E tests go through the OpenClaw gateway (see CLAUDE.md for E2E rules).
Verify that new functionality meets its acceptance criteria. This phase uses LLM judgment — you (the agent running this skill) act as the judge.
Check these sources in order:
BMAD story spec: If $ARGS is a story ID, look in _bmad-output/implementation-artifacts/stories/ for the spec file and extract the ## Acceptance Criteria section
Beads issue: If the current task is a beads issue, read its description for ACs
User-provided: If the user specified criteria in the prompt, use those
None found: If no ACs exist, determine whether there is genuinely no new functionality:
Think carefully before accepting "no new functionality." Look at:
If you see evidence of new or changed behavior → this is a FAIL: "New functionality detected but no acceptance criteria found. Task definition is incomplete — ACs are required before verification can pass."
If the changes are genuinely non-functional (pure refactor, file moves, dependency bumps, formatting, renaming) → Phase 2 is SKIPPED with explanation. Regression tests (Phase 1) become the sole quality gate and are strictly mandatory — the verdict depends entirely on Phase 1 passing.
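A first-pass filter for the "genuinely non-functional" check can look at the changed file paths (e.g. from `git diff --name-only`). This heuristic is a sketch under the assumption that paths alone are a coarse signal; real judgment should also read the diffs themselves.

```python
# Suffixes plausibly non-functional: docs, plain text, lockfile bumps.
# This list is illustrative, not exhaustive.
NON_FUNCTIONAL_SUFFIXES = (".md", ".txt", ".lock")

def looks_non_functional(changed_paths):
    """True only if every changed file is plausibly non-functional.

    Note: an empty list vacuously returns True; callers should treat
    "no changes at all" as its own case.
    """
    return all(p.endswith(NON_FUNCTIONAL_SUFFIXES) for p in changed_paths)
```

Any path outside the allow-list (e.g. a source file) means new or changed behavior must be assumed, triggering the FAIL rule above.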
For each AC, determine the best verification approach:
Deterministic (prefer when possible):
LLM judgment (when deterministic isn't enough):
For each AC, report:
There is no SKIP status. Every AC must be evaluated. If something prevents evaluation, that's BLOCKED (which counts as a failure). The reasoning: "I couldn't check" is not the same as "it's fine" — unverified criteria are unmet criteria.
Acceptance Eval: 1-1-setup-scaffold
| AC | Status | Score | Evidence |
|---|---|---|---|
| AC1: pyproject.toml with name/version | PASS | 1.0 | pyproject.toml has name="my-app", version="1.0.0" |
| AC2: tests/ directory with test file | PASS | 1.0 | tests/test_smoke.py exists |
| AC3: pytest exits 0 | PASS | 1.0 | 1 test passed, exit 0 |
| AC4: CI workflow exists | FAIL | 0.0 | .github/workflows/ci.yml not found |
Score: 0.75 (threshold: 0.85)
Verdict: FAIL — AC4 not met
Verdict rules — the overall eval verdict is FAIL if ANY of these are true: an AC is FAIL, an AC is BLOCKED, or the overall score is below the threshold.
Only if ALL ACs are PASS (1.0) and score >= threshold: verdict is PASS.
If no ACs exist AND no new functionality was introduced (pure refactor/cleanup), Phase 2 is skipped — verdict depends on Phase 1 alone.
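The verdict rules above can be expressed as a short function. This is a sketch with illustrative names; it assumes each AC result is a (status, score) pair as in the sample report.

```python
def verdict(ac_results, threshold=0.85):
    """ac_results: list of (status, score), status in {"PASS", "FAIL", "BLOCKED"}."""
    if not ac_results:
        # No ACs and no new functionality: Phase 2 is skipped, Phase 1 decides.
        return "SKIP"
    score = sum(s for _, s in ac_results) / len(ac_results)
    # Any non-PASS status (FAIL or BLOCKED) fails the eval outright.
    if any(status != "PASS" for status, _ in ac_results):
        return "FAIL"
    return "PASS" if score >= threshold else "FAIL"
```

On the sample report above, three 1.0 scores and one 0.0 average to 0.75, below the 0.85 threshold, so the verdict is FAIL.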
If verdict is FAIL, stop here. The agent should iterate on the implementation or fix the environment.
After Phases 1 and 2 pass, ensure the new functionality is covered by deterministic tests. This converts today's eval into tomorrow's regression guard.
Check what changed:
Guidelines for test writing:
After writing tests, run Phase 1 again to confirm the new tests pass.
Autopilot runs /verify automatically after completing each task:
1. `/verify` runs Phase 1 → if fail, agent fixes and retries
2. `/verify` runs Phase 2 → if fail, agent iterates on implementation
3. `/verify` runs Phase 3 → agent writes tests for new code
4. `/verify --tests-only` runs one final time to confirm everything passes

Bug fixes have implicit acceptance criteria: "The bug no longer reproduces." The eval phase for a bug fix: