Help us improve
Share bugs, ideas, or general feedback.
From agentops
Produces PASS/WARN/FAIL verdicts for artifacts, plans, code, PRs, or CI gates using configurable modes (quick, deep, mixed, debate, post-impl, pre-impl, PR). Use when you need a structured judgment before proceeding.
npx claudepluginhub boshu2/agentops --plugin agentopsHow this skill is triggered — by the user, by Claude, or both
Slash command
/agentops:validateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Role:** validator. Input = artifact (plan, spec, code, PR, fitness gate). Output = `verdict.v1` (PASS / WARN / FAIL with rationale + findings).
Runs parallel reviews from 6 reviewers (security, UX/DX, external Codex/Gemini CLIs, domain experts) on code, plans, or requirements for quality gates. Invoke via /review --mode code/plan/clarify.
Runs multi-agent verification loop post-implementation, dispatching specialized agents for review with autonomous subagent fixes and retries until unanimous approval.
Share bugs, ideas, or general feedback.
Role: validator. Input = artifact (plan, spec, code, PR, fitness gate). Output =
verdict.v1(PASS / WARN / FAIL with rationale + findings).
Status (2026-05-08): introduced ADDITIVE in Phase 1 (m6v5.D.1 / soc-78s2v). Existing validators (council/vibe/pre-mortem/red-team/review/scenario plus retired pr-validate and validation lanes) stayed until Phase 2 shim conversion (m6v5.D.2). Fix-C smoke (
soc-wb2aa) gates Phase 2.
/validate is a driving adapter for the validate_acceptance port in the
Intent-to-Loop Hexagon.
When the artifact contains a hexagon: block, preserve the bounded context,
context packet, guard adapters, and done state in the verdict.
When the artifact claims DONE/closed/green, apply the
Completion-Claim Kernel
before returning PASS.
| Mode | Purpose | Replaces (post-Phase 2) |
|---|---|---|
| (default) | 2-judge multi-judge consensus on any artifact | /council default |
--quick | Inline single-agent structured review | /council --quick |
--deep | 4-judge thorough review | /council --deep |
--mixed | Cross-vendor (Claude + Codex), N×2 judges | /council --mixed |
--debate | Adversarial 2-round refinement | /council --debate, /red-team |
--mode=post-impl | Code-readiness pipeline (complexity → bug-hunt → council) | /vibe |
--mode=pre-impl [--target=X] | Plan/spec validation; target ∈ {scenario,fitness,ratchet,scope,skill,health} | /pre-mortem, /scenario, /goals measure, /ratchet, /scope, /skill-auditor, ao doctor |
--mode=pr | PR-shape verdict (diff review + acceptance check) | /review |
Mode-budget assertion: 8 modes. Adding a 9th requires demoting an existing one OR refusing the addition (per Fix-F § continuous CI gate).
validation + pr-validate retired into these modesThe retired validation and pr-validate lanes were the Phase-1 placeholders for --mode=post-impl
and --mode=pr; both are now retired (cp-ki8) and their load-bearing contract folded
here so no capability is lost:
--mode=post-impl (was the validation lane) — full close-out + no-self-grading invariant.
Beyond the inline complexity → bug-hunt → council pipeline, this mode owns the
validate_acceptance port: every Given/When/Then from the intent issue must map to a
passing test (criterion→test roll-up; activity logs do not close beads), and the
acceptance verdict must be produced by a blind, context-isolated sub-agent judge that
did not author the code (author ≠ validator — ag-9jle.5 / ag-lmdx.4). Refuse to
certify acceptance when judge_id == author_id; the only escape is an inline-fallback
self-grade that is stamped as waived, not independently validated. Apply the
Completion-Claim Kernel
before accepting any DONE/closed/green claim. For epic-scope close-out this mode may
delegate to /vibe, /post-mortem, and /forge rather than inlining them.--mode=pr (was the pr-validate lane) — submission-readiness checks. In addition to the
diff/acceptance verdict, run, in order: (1) upstream alignment FIRST (BLOCKING —
git rev-list --count HEAD..origin/main; fail if many commits behind or merge would
conflict), (2) CONTRIBUTING.md compliance (BLOCKING), (3) isolation — single commit
type + thematic files + atomic scope, (4) scope-creep containment, (5) quality gate
(tests/lint, non-blocking). On FAIL, emit remediation steps (split-by-type cherry-pick,
rebase-on-upstream) so the verdict is actionable./validate path/to/plan.md # default 2-judge consensus
/validate --quick path/to/plan.md # inline single-agent
/validate --deep path/to/spec.md # 4-judge thorough
/validate --mode=pre-impl path/to/plan.md # pre-mortem mode
/validate --mode=post-impl recent # vibe mode (post-implement)
/validate --mode=pr 123 # PR review by PR number
/validate --mode=pre-impl --target=fitness # fitness gate against GOALS.md
Default uses runtime-native subagent spawning. Falls back to --quick (inline) when no multi-agent capability detected.
Parse --mode and --target. Default mode is multi-judge. Validate combinations:
| Mode | Allowed --target |
|---|---|
| default, --quick, --deep, --mixed, --debate | n/a |
| --mode=post-impl | n/a (pipeline scope is recent code changes) |
| --mode=pre-impl | scenario, fitness, ratchet, scope, skill, health (default: pre-mortem on plan) |
| --mode=pr | n/a (PR ID/path is positional) |
Reject invalid combinations (e.g., --mode=pr --target=fitness).
# resolve artifact:
ARTIFACT="${1:-recent}" # path, PR ID, or "recent"
# load FAIL patterns:
# (folded into skill body; not a separate hook)
For --mode=pre-impl, also load:
.agents/planning-rules/*.md (compiled planning rules).agents/findings/registry.jsonl (active findings).agents/pre-mortem-checks/*.md (compiled prevention)For --mode=post-impl, run pre-checks:
/bug-hunt skill needed)For --mode=pr, fetch the PR diff (gh pr diff <id> or path).
spawn_agent available → Codex sub-agentTeamCreate available → Claude native teamtask (read-only skill tool, OpenCode) → opencode subagent--quick (inline single-agent)Log selected backend in the verdict frontmatter.
| Mode | Judges | Perspectives |
|---|---|---|
| default | 2 | independent (no labeled perspectives) |
| --deep | 4 | missing-requirements, feasibility, scope, spec-completeness |
| --mixed | 2N (default N=3) | same N perspectives across Claude + Codex |
| --debate | 2+ rounds | adversarial; 2 rounds with critique-rebuttal |
| --quick | 0 (inline self) | structured review |
| --mode=post-impl | 2 + pipeline | complexity → bug-hunt → 2-judge council |
| --mode=pre-impl | 2-4 | per target preset |
| --mode=pr | 2 | diff-review + acceptance-check |
Each judge gets:
For --mode=pre-impl --target=plan:
For --mode=post-impl:
For --mode=pre-impl --target=fitness:
Each judge returns a per-judge result. Consolidate:
Output path: .agents/council/YYYY-MM-DD-validate-<topic-slug>.md
---
id: validate-YYYY-MM-DD-<slug>
type: verdict
date: YYYY-MM-DD
mode: <mode>
target: <target or n/a>
artifact: <path>
backend: <codex-subagents | claude-teams | opencode | inline>
---
# Validate Verdict — <topic>
## Council Verdict: PASS / WARN / FAIL
| Failure mode | Risk | Severity | Addressed? |
|---|---|---|---|
| ... | ... | ... | ... |
## Pseudocode Fixes (when WARN/FAIL)
(copy-pastable into affected issues per pre-mortem 4.6 contract)
## FAIL Pattern Check
(top 8 patterns — status per pattern)
## Verdict
PASS — proceed
WARN — review concerns, accept risk, or apply fixes
FAIL — block; revise artifact and rerun
The exact heading ## Council Verdict: PASS / WARN / FAIL is mandatory — ao rpi phased (when present) parses with anchored regex.
For --mode=pre-impl reusable findings: append to .agents/findings/registry.jsonl (atomic temp+rename).
--target | What gets graded | Replaces |
|---|---|---|
| (default) | Plan/spec for an upcoming /implement | /pre-mortem |
| scenario | Holdout scenario gate | /scenario |
| fitness | GOALS.md fitness gates | /goals measure, ao goals measure |
| ratchet | Brownian Ratchet checkpoint | /ratchet, ao ratchet status |
| scope | Frozen-dirs declaration | /scope |
| skill | SKILL.md hygiene + audit | /skill-auditor, /heal-skill (audit half) |
| health | Repo health probe | ao doctor |
Each target has its own inline check rubric until Phase 2 extraction.
VERDICT: PASS
(blank line)
COMMANDS RUN:
<actual commands + verbatim output snippets>
REASONS:
- bullet citing a COMMANDS RUN line
A verdict with no COMMANDS RUN: section is unverified — reject it and
dispatch a fresh validator. A verdict whose COMMANDS RUN: lists only commands
the author ran (not the judge) is a counterfeit judge — treat as FAIL and
re-route to a genuinely independent validator. No ## headings or parentheticals
on VERDICT: or COMMANDS RUN: lines; the gate parses them anchored.
For assurance closes (the control-plane verdict-gate, cp-icb6), the floor is
≥2 verdicts from ≥2 distinct model families, author family excluded, fail-closed.
This skill supports that policy via --mixed mode and the verdict form above;
the policy itself lives in the gate, not here. A same-model council is valid for
non-assurance decisions (design brainstorms, quick checks) — do not refuse those.
Tier mapping:
The A7 ruling (2026-06-09, memory validation-family-policy-risk-tiered):
Gemini is currently benched for STRICT validation — use Codex + Fable for A1/A2
tiers. Gemini may return for STRICT when Bo graduates it from the bench. Do not
present Gemini paths as live for tier A1/A2 until then.
When two implementations of the same intent exist, do NOT award based on authorship or surface aesthetics. Run both on a differentiating fixture (an input that exposes their behavioral difference), record the outputs verbatim, and graft the loser's unique assets onto the winner. "My worker wrote it" is not evidence.
Before dispatching a validator, register intent on the bead graph (update status, assign actor). Two parallel validators on the same bead produce a dedup incident, not a cross-family quorum. Check for an existing actor before spawning.
A judge re-runs the cited commands on the actual artifacts. It does not read the
author's evidence file and agree. Attest judge_source: <model> inside COMMANDS RUN
so the gate can confirm the judge identity. A judge that ran nothing is a reader, not
a verifier — discard its verdict.
A worker's evidence file may only contain numbers and outputs that were captured — pasted verbatim from a command's output — never reconstructed from memory. The canonical failure: "36 checks — 35 pass" stated with confidence was inference; the measured reality was 36 run / 34 pass / 1 fail / 1 skip, on a different commit.
/implement for fixes).## Council Verdict: ... text format.skills/rpi/SKILL.md — orchestrator that fires /validate --mode=pre-impl after /planskills/curate/SKILL.md — miner role (paired canonical skill)schemas/verdict.v1.schema.json — output contract