```shell
npx claudepluginhub boshu2/agentops --plugin agentops
```

This skill uses the workspace's default tool permissions.
> **Purpose:** Is this code ready to ship?
Bundled files: references/deep-audit-protocol.md, references/deep-checks.md, references/examples.md, references/go-patterns.md, references/go-standards.md, references/json-standards.md, references/markdown-standards.md, references/patterns.md, references/python-standards.md, references/report-format.md, references/rust-standards.md, references/shell-standards.md, references/test-pyramid-weighting.md, references/typescript-standards.md, references/verification-report.md, references/vibe-coding.md, references/vibe-suppressions.md, references/write-time-quality.md, references/yaml-standards.md, scripts/prescan.sh
Quick start:

```shell
/vibe                                     # validates recent changes
/vibe recent                              # same as above
/vibe src/auth/                           # validates specific path
/vibe --quick recent                      # fast inline check, no agent spawning
/vibe --structured recent                 # 6-phase verification report (build→types→lint→tests→security→diff)
/vibe --deep recent                       # 3 judges instead of 2
/vibe --sweep recent                      # deep audit: per-file explorers + council
/vibe --mixed recent                      # cross-vendor (Claude + Codex)
/vibe --preset=security-audit src/auth/   # security-focused review
/vibe --explorers=2 recent                # judges with explorer sub-agents
/vibe --debate recent                     # two-round adversarial review
/vibe --tier=quality recent               # use quality tier for council calls
```
Before reviewing, pull relevant learnings from prior code reviews and known patterns:
```shell
if command -v ao &>/dev/null; then
  ao lookup --query "<target-scope> code review patterns" --limit 3 2>/dev/null || true
fi
```
Apply retrieved knowledge (mandatory when results returned):
If learnings or patterns are returned, do NOT just load them as passive context. For each returned item:
- Surface it as a known_risk in your review — state the pattern, what to look for, and whether the code exhibits it.

After applying, record the citation:
```shell
ao metrics cite "<learning-path>" --type applied 2>/dev/null || true
```
Skip silently if ao is unavailable or returns no results.
Project reviewer config: If .agents/reviewer-config.md exists, its full config (reviewers, plan_reviewers, skip_reviewers) is passed to council for judge selection. See skills/council/SKILL.md Step 1b.
Before scanning for changed files via git diff, check if a crank checkpoint exists:
```shell
if [ -f .agents/vibe-context/latest-crank-wave.json ]; then
  echo "Crank checkpoint found — using files_changed from checkpoint"
  FILES_CHANGED=$(jq -r '.files_changed[]' .agents/vibe-context/latest-crank-wave.json 2>/dev/null)
  WAVE_COUNT=$(jq -r '.wave' .agents/vibe-context/latest-crank-wave.json 2>/dev/null)
  echo "Wave $WAVE_COUNT checkpoint: $(echo "$FILES_CHANGED" | wc -l | tr -d ' ') files changed"
fi
```
When a crank checkpoint is available, use its files_changed list instead of re-detecting via git diff. This ensures vibe validates exactly the files that crank modified.
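The checkpoint-first fallback can be sketched as a small helper. This is a sketch, not the skill's actual implementation; `detect_files` is a hypothetical name, and the `HEAD~3` window mirrors the auto-detect step below.

```shell
# Prefer the crank checkpoint's files_changed list when present;
# otherwise fall back to re-detecting changed files via git diff.
detect_files() {
  local checkpoint=.agents/vibe-context/latest-crank-wave.json
  if [ -f "$checkpoint" ]; then
    jq -r '.files_changed[]' "$checkpoint" 2>/dev/null
  else
    git diff --name-only HEAD~3 2>/dev/null | head -20
  fi
}

FILES_CHANGED=$(detect_files)
```

Keeping the detection behind one function makes the "checkpoint wins over git diff" rule a single decision point rather than scattered conditionals.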
If target provided: Use it directly.
If no target or "recent": Auto-detect from git:
```shell
# Check recent commits
git diff --name-only HEAD~3 2>/dev/null | head -20
```
If nothing found, ask user.
Pre-flight: If no files are found, return immediately with: "PASS (no changes to review) — no modified files detected." Do NOT spawn agents for empty file lists.
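As a minimal sketch, the pre-flight guard amounts to an empty-string check before any agent is spawned (variable names here are illustrative):

```shell
# Pre-flight guard: bail out before spawning any agents
# when the detection step produced no files.
FILES_CHANGED=""   # illustrative: nothing was detected
if [ -z "$FILES_CHANGED" ]; then
  VERDICT="PASS (no changes to review) — no modified files detected."
  # return immediately; do NOT spawn agents
fi
echo "$VERDICT"
```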
If --structured flag is set, run a 6-phase mechanical verification pipeline instead of the council flow. This produces a machine-readable verification report suitable for PR gates and CI integration.
Phases: Build → Types → Lint → Tests → Security → Diff Review.
Read references/verification-report.md for the full report template and per-phase commands. Each phase is fail-fast — if Build fails, skip remaining phases and report NOT READY.
After all phases complete, write the structured report to .agents/council/YYYY-MM-DD-verification-<target>.md and output the summary table to the user.
When to use: Pre-PR gate, CI integration, when you need a mechanical pass/fail rather than judgment-based review.
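The fail-fast ordering can be sketched as a simple loop. `run_phase` is a stand-in for the per-phase commands documented in references/verification-report.md; here it just simulates a lint failure so the short-circuit is visible.

```shell
# Fail-fast pipeline sketch for --structured mode.
# run_phase is a hypothetical helper; this stub pretends 'lint' fails.
run_phase() { [ "$1" != "lint" ]; }

VERDICT=READY
for phase in build types lint tests security diff-review; do
  if ! run_phase "$phase"; then
    VERDICT="NOT READY ($phase failed)"
    break   # skip all remaining phases
  fi
done
echo "$VERDICT"
```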
If the --quick flag is set, skip the heavy pre-processing steps (2.5 and 2a–2f) and jump to Step 4 with an inline council after running Steps 2.3, 2.4, 2g, and Step 3. Domain checklists, compiled-prevention loading, the test-pyramid inventory, and inline product context are cheap and high-value, so they still run in quick mode. Complexity analysis (Step 2) also still runs — it's cheap and informative.
Why: Steps 2.5 and 2a–2f add 30–90 seconds of pre-processing that mainly feeds multi-judge council packets. In --quick mode (single inline agent), those inputs are not worth the cost, but the test-pyramid and product-context checks still shape the inline review meaningfully.
Filter by language present in the change set first. Run only the analyzers whose language actually appears in the diff. A docs/shell/BATS-only epic must NOT trigger gocyclo against the entire cli/ tree (it has hung in past runs); a Python-free epic must NOT trigger radon.
```shell
# Detect which languages are present in the diff (or in <path> for full audits).
# Use `git diff --name-only <base>...HEAD` for a PR; fall back to listing
# files under <path> when no diff base is available.
mkdir -p .agents/council
HAS_GO=false; HAS_PY=false
DIFF_FILES="$(git diff --name-only "${BASE:-HEAD~1}"...HEAD 2>/dev/null || find <path> -type f)"
echo "$DIFF_FILES" | grep -q '\.go$' && HAS_GO=true
echo "$DIFF_FILES" | grep -q '\.py$' && HAS_PY=true
echo "$(date -Iseconds) preflight: HAS_GO=$HAS_GO HAS_PY=$HAS_PY" >> .agents/council/preflight.log
```
For Python (only when HAS_PY=true):
```shell
if [ "$HAS_PY" = "true" ]; then
  echo "$(date -Iseconds) preflight: checking radon" >> .agents/council/preflight.log
  if ! which radon >> .agents/council/preflight.log 2>&1; then
    echo "⚠️ COMPLEXITY SKIPPED: radon not installed (pip install radon)"
  else
    radon cc <path> -a -s 2>/dev/null | head -30
    radon mi <path> -s 2>/dev/null | head -30
  fi
else
  echo "ℹ️ COMPLEXITY SKIPPED: no .py files in diff"
fi
```
For Go (only when HAS_GO=true):
```shell
if [ "$HAS_GO" = "true" ]; then
  echo "$(date -Iseconds) preflight: checking gocyclo" >> .agents/council/preflight.log
  if ! which gocyclo >> .agents/council/preflight.log 2>&1; then
    echo "⚠️ COMPLEXITY SKIPPED: gocyclo not installed (go install github.com/fzipp/gocyclo/cmd/gocyclo@latest)"
  else
    gocyclo -over 10 <path> 2>/dev/null | head -30
  fi
else
  echo "ℹ️ COMPLEXITY SKIPPED: no .go files in diff"
fi
```
For other languages: Skip complexity with an explicit note: "⚠️ COMPLEXITY SKIPPED: No analyzer for <language>"
Interpret results:
| Score | Rating | Action |
|---|---|---|
| A (1-5) | Simple | Good |
| B (6-10) | Moderate | OK |
| C (11-20) | Complex | Flag for council |
| D (21-30) | Very complex | Recommend refactor |
| F (31+) | Untestable | Must refactor |
Include complexity findings in council context.
Detect code patterns in the target files and load matching domain-specific checklists from standards/references/:
| Trigger | Checklist | Detection |
|---|---|---|
| SQL/ORM code | sql-safety-checklist.md | Files contain SQL queries, ORM imports (database/sql, sqlalchemy, prisma, activerecord, gorm, knex), or migration files in changeset |
| LLM/AI code | llm-trust-boundary-checklist.md | Files import anthropic, openai, google.generativeai, or match *llm*, *prompt*, *completion* patterns |
| Concurrent code | race-condition-checklist.md | Files use goroutines, threading, asyncio, multiprocessing, sync.Mutex, concurrent.futures, or shared file I/O patterns |
| Codex skills | codex-skill.md | Files under skills-codex/, or files matching *codex*SKILL.md, convert.sh, skills-codex-overrides/, or converter scripts |
For each matched checklist, load it via the Read tool and include relevant items in the council packet as context.domain_checklists. Multiple checklists can be loaded simultaneously.
Skip silently if no patterns match. This step runs in both --quick and full modes (domain checklists are cheap to load and high-value).
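One trigger from the table above can be sketched as a grep over the changed files. The import patterns come from the table; the sample file and variable names are illustrative, and the real detection may differ.

```shell
# Sketch: detect SQL/ORM imports in changed files and queue the
# matching domain checklist. The sample file is illustrative.
tmp=$(mktemp -d)
printf 'import sqlalchemy\n' > "$tmp/models.py"

FILES_CHANGED="$tmp/models.py"
matched=""
for f in $FILES_CHANGED; do
  if grep -qE 'database/sql|sqlalchemy|prisma|activerecord|gorm|knex' "$f" 2>/dev/null; then
    matched="sql-safety-checklist.md"
  fi
done
echo "matched=$matched"
```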
Steps 2.4-2f, 2h, 3-3.6 (Deep Checks & Pre-Council Prep): Read references/deep-checks.md for compiled prevention, prior findings, pre-council deep analysis checks, product context, spec loading, suppressions, pre-mortem correlation, and model cost tiers. Loaded automatically unless --quick mode is set. In --quick mode, skip directly to Step 2g.
Compiled prevention inputs: Load .agents/pre-mortem-checks/ and .agents/planning-rules/ when available. These compiled artifacts contain known_risks from prior findings that inform the review — carry matched finding IDs into council context so judges can assess whether the flywheel prevented rediscovery.
Skip if --quick. Load prior findings from .agents/findings/registry.jsonl.
Skip if --quick. Run compiled constraint tests from .agents/constraints/.
Skip if --quick. Verify file metadata consistency.
Skip if --quick. Run organizational-lint checks.
Skip if --quick. Search for relevant prior learnings via ao lookup.
Skip if --quick. Run proactive bug-hunt audit on target files.
Skip if --quick. When --mixed is passed and Codex CLI is available, send the first 2000 chars of the diff to Codex for a parallel review. Cap input at 2000 chars to stay within Codex context budgets.
Skip the separate judge-fanout step if --quick. When PRODUCT.md exists and the user did not pass an explicit --preset override, quick mode still loads DX expectations inline in the single-agent review. In non-quick modes, add a DX (developer-experience) judge: 2 independent judges + 1 DX judge (3 judges total). The DX judge evaluates whether the code aligns with the product's stated personas and value propositions.
Assess test coverage against the test pyramid standard (loaded via /standards).
Read skills/vibe/references/test-pyramid-weighting.md for test pyramid weighting — L3+ tests found all production bugs, weight them 5x.
Test Pyramid Weighting: Weight test coverage by level: L0–L1 at 1x, L2 at 3x, L3+ at 5x. Unit-only coverage is a WARN signal, not a PASS. See references/test-pyramid-weighting.md.
Run even in --quick mode — this is cheap (file existence checks) and high-signal.
Identify changed modules from git diff or target scope
For each changed module, check coverage pyramid (L0–L3):
For boundary-touching code, check bug-finding pyramid (BF1–BF5):
Compute weighted pyramid score for changed code paths:
Formula:
weighted_score = (L0_count x 1 + L1_count x 1 + L2_count x 3 + L3_count x 5 + L4_count x 5) / max_possible
Where max_possible = total_test_count x 5 (the score if every test were L3+).
Count tests at each level for changed code paths:
Interpretation:
- weighted_score >= 0.6 — strong pyramid, L2+ tests present
- 0.3 <= weighted_score < 0.6 — acceptable, but recommend more integration tests
- weighted_score < 0.3 AND all tests are L0-L1 only — WARN: unit-only test coverage (feeds into the vibe verdict as a WARN signal, not a separate gate)

Satisfaction exposure: The weighted_score is also exposed as satisfaction_score (with source "test-pyramid-weighted") in the test_pyramid output block AND promoted to the top-level verdict JSON as satisfaction_score (verdict schema field, skills/council/schemas/verdict.json: number 0.0-1.0, "Probabilistic satisfaction score (0.0 = unsatisfied, 1.0 = fully satisfied). Optional — absent means not computed."). Downstream consumers (e.g., /validation STEP 1.8 holdout evaluation) can use satisfaction_score as a normalized quality signal.
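As a sketch, the formula for an example change set with 2 L0 and 8 L1 tests and no L2+ coverage works out like this (variable names are illustrative):

```shell
# Compute the weighted pyramid score:
# weights are 1x for L0-L1, 3x for L2, 5x for L3+.
L0=2; L1=8; L2=0; L3=0; L4=0
total=$((L0 + L1 + L2 + L3 + L4))
score=$(awk -v w=$((L0*1 + L1*1 + L2*3 + L3*5 + L4*5)) -v max=$((total * 5)) \
  'BEGIN { printf "%.2f", w / max }')
echo "weighted_score=$score"   # 10 / 50
```

With these counts the score is 0.20, which trips the unit-only WARN threshold described above.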
Include in council packet and vibe report output:
## Test Pyramid Score
| Level | Count | Weight | Contribution |
|-------|-------|--------|--------------|
| L0 | 2 | 1x | 2 |
| L1 | 8 | 1x | 8 |
| L2 | 0 | 3x | 0 |
| L3 | 0 | 5x | 0 |
| L4 | 0 | 5x | 0 |
| **Total** | **10** | | **10 / 50 = 0.20** |
WARN: weighted_score 0.20 < 0.3 and all tests are L0-L1 only
Build coverage table and include in council packet as context.test_pyramid:
```json
"test_pyramid": {
  "coverage": {
    "L0": {"status": "pass", "files": ["test_spec_enforcement.py"]},
    "L1": {"status": "pass", "files": ["test_module.py"]},
    "L2": {"status": "gap", "reason": "crosses subsystem boundary, no integration test"}
  },
  "bug_finding": {
    "BF4_chaos": {"status": "gap", "reason": "external API calls without failure injection"},
    "BF1_property": {"status": "na", "reason": "no data transformations in scope"}
  },
  "weighted_score": 0.20,
  "satisfaction_score": 0.20,
  "satisfaction_source": "test-pyramid-weighted",
  "score_breakdown": {"L0": 2, "L1": 8, "L2": 0, "L3": 0, "L4": 0},
  "max_possible": 50,
  "warn_unit_only": true,
  "verdict": "WARN: weighted_score 0.20 < 0.3, all tests L0-L1 only"
}
```
Verdict rules:
- weighted_score < 0.3 AND all tests L0-L1 only — WARN: unit-only coverage (include in council findings)
- weighted_score >= 0.6 — no mention needed

When coverage gaps are found, run /test <module> to generate test candidates for uncovered code.
With spec found — use code-review preset:
```shell
/council --preset=code-review validate <target>
```
- error-paths: Trace every error-handling path. What's uncaught? What fails silently?
- api-surface: Review every public interface. Is the contract clear? Breaking changes?
- spec-compliance: Compare implementation against the spec. What's missing? What diverges?

The spec content is injected into the council packet context so the spec-compliance judge can compare implementation against it.
Without spec — 2 independent judges (no perspectives):
```shell
/council validate <target>
```
2 independent judges (no perspective labels). Use --deep for 3 judges on high-stakes reviews. Override with --quick (inline single-agent check) or --mixed (cross-vendor with Codex).
Council receives:
- The spec, when one was found (in context.spec)
- The sweep manifest, when --deep or --sweep is set (in context.sweep_manifest — judges shift to adjudication mode, see references/deep-audit-protocol.md)

All council flags pass through: --quick (inline), --mixed (cross-vendor), --preset=<name> (override perspectives), --explorers=N, --debate (adversarial 2-round), --tier=<name> (model cost tier: quality/balanced/budget). See the Quick Start examples and /council docs.
Each judge reviews for:
| Aspect | What to Look For |
|---|---|
| Correctness | Does code do what it claims? |
| Security | Injection, auth issues, secrets |
| Edge Cases | Null handling, boundaries, errors |
| Quality | Dead code, duplication, clarity |
| Complexity | High cyclomatic scores, deep nesting |
| Architecture | Coupling, abstractions, patterns |
| Council Verdict | Vibe Result | Action |
|---|---|---|
| PASS | Ready to ship | Merge/deploy |
| WARN | Review concerns | Address or accept risk |
| FAIL | Not ready | Fix issues |
Write to: .agents/council/YYYY-MM-DD-vibe-<target>.md (use date +%Y-%m-%d)
Read references/report-format.md for the full vibe report markdown template. The report includes: complexity analysis, council verdict table, shared/critical/informational findings, all findings (when --deep/--sweep), recommendation, and decision checkboxes.
Tell the user:
After council verdict:
```shell
ao ratchet record vibe --output "<report-path>" 2>/dev/null || true
```

Read the council report. For each finding, format as:
```
FINDING: <description> | FIX: <fix or recommendation> | REF: <ref or location>
```
Fallback for v1 findings (no fix/why/ref fields):
```
fix = finding.fix || finding.recommendation || "No fix specified"
ref = finding.ref || finding.location || "No reference"
```
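The fallback chain maps naturally onto shell default expansion. This is a minimal sketch; the finding field values are illustrative, not from a real report.

```shell
# v1 fallback chain via ${var:-default} expansion.
finding_fix=""                              # v1 finding: no fix field
finding_recommendation="Quote the variable"
finding_ref=""                              # no ref field
finding_location="src/run.sh:42"

fix="${finding_fix:-${finding_recommendation:-No fix specified}}"
ref="${finding_ref:-${finding_location:-No reference}}"
echo "FIX: $fix | REF: $ref"
```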
If verdict is WARN or FAIL, persist reusable findings to .agents/findings/registry.jsonl and optionally mirror the broader narrative to a learning file.
Registry write rules:
- Each entry carries dedup_key, provenance, pattern, detection_question, checklist_item, applicable_when, and confidence
- applicable_when must use the controlled vocabulary from the finding-registry contract
- Deduplicate entries by dedup_key

If a broader prose summary still helps, also write the existing anti-pattern learning file to .agents/learnings/YYYY-MM-DD-vibe-<target>.md. Skip both if verdict is PASS.
After the registry update, if hooks/finding-compiler.sh exists, run:
```shell
bash hooks/finding-compiler.sh --quiet 2>/dev/null || true
```
This keeps the same-session post-mortem path synchronized with the latest reusable findings. session-end-maintenance.sh remains the idempotent backstop.
After validation completes, clean up stale test beads (bd list --status=open | grep -iE "test bead|test quest") via bd close to prevent bead pollution. Skip if bd unavailable.
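The stale-bead filter can be sketched as below. The `bd list` output format is an assumption, so a canned sample stands in; each matching bead would then be passed to `bd close`.

```shell
# Filter open beads down to stale test beads (sample output stands in
# for `bd list --status=open`, whose real format may differ).
OPEN_BEADS='bd-101 open Fix auth regression
bd-102 open test bead: scratch run
bd-103 open Test Quest placeholder'

STALE=$(echo "$OPEN_BEADS" | grep -iE "test bead|test quest")
echo "$STALE"
# each matching bead id would then go to `bd close`
```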
```
/implement issue-123
     │
     ▼
(coding, quick lint/test as you go)
     │
     ▼
/vibe  ← You are here
     │
     ├── Complexity analysis (find hotspots)
     ├── Bug hunt audit (find concrete bugs)
     └── Council validation (multi-model judgment)
           │
           ├── PASS → ship it
           ├── WARN → review, then ship or fix
           └── FAIL → fix, re-run /vibe
```
User says: "Run a quick validation on the latest changes."
Do:

```shell
/vibe recent
```

Examples:

```shell
/vibe recent
```
Runs complexity on recent changes, then council reviews.

```shell
/vibe src/auth/
```
Complexity + council on the auth directory.

```shell
/vibe --deep recent
```
Complexity + 3 judges for a thorough review.

```shell
/vibe --mixed recent
```
Complexity + Claude + Codex judges.
See references/examples.md for additional examples: security audit with spec compliance, developer-experience code review with PRODUCT.md, and fast inline checks.
| Problem | Cause | Solution |
|---|---|---|
| "COMPLEXITY SKIPPED: radon not installed" | Python complexity analyzer missing | Install with pip install radon or skip complexity (council still runs). |
| "COMPLEXITY SKIPPED: gocyclo not installed" | Go complexity analyzer missing | Install with go install github.com/fzipp/gocyclo/cmd/gocyclo@latest or skip. |
| Vibe returns PASS but constraint tests fail | Council LLMs miss mechanical violations | Check .agents/council/<timestamp>-vibe-*.md for constraint test results. Failed constraints override council PASS. Fix violations and re-run. |
| Codex review skipped | --mixed not passed, Codex CLI not on PATH, or no uncommitted changes | Codex review is opt-in — pass --mixed to enable. Also requires Codex CLI on PATH and uncommitted changes. |
| "No modified files detected" | Clean working tree, no recent commits | Make changes or specify target path explicitly: /vibe src/auth/. |
| Spec-compliance judge not spawned | No spec found in beads/plans | Reference bead ID in commit message or create plan doc in .agents/plans/. Without spec, vibe uses 2 independent judges (3 with --deep). |
The hooks/write-time-quality.sh PostToolUse hook runs automatically after every Write/Edit tool call, catching common anti-patterns at edit time rather than review time. It checks:
- Go: fmt.Print in library code
- Python: bare except:, eval/exec, missing type hints on public functions
- Shell: missing set -euo pipefail, unquoted variables

The hook is non-blocking (always exits 0) and outputs warnings via JSON. See references/write-time-quality.md for the full design.
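In the spirit of the hook, one shell check can be sketched as follows. The function name and JSON shape are illustrative, not the hook's actual output contract.

```shell
# Sketch of one write-time check: warn when a shell file lacks
# `set -euo pipefail`, but never block the edit (always return 0).
check_shell_strict_mode() {
  local file="$1"
  if ! grep -q 'set -euo pipefail' "$file"; then
    printf '{"level":"warn","file":"%s","msg":"missing set -euo pipefail"}\n' "$file"
  fi
  return 0   # non-blocking: the hook always exits 0
}

f=$(mktemp)
printf '#!/usr/bin/env bash\necho hi\n' > "$f"
check_shell_strict_mode "$f"
```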
- skills/council/SKILL.md — Multi-model validation council
- skills/complexity/SKILL.md — Standalone complexity analysis
- skills/bug-hunt/SKILL.md — Proactive code audit and bug investigation
- .agents/specs/conflict-resolution-algorithm.md — Conflict resolution between agent findings