From claude-commands
Reviews evidence bundles for claims/PRs against standards, verifying bundle integrity via checksums and producing PASS/PARTIAL/FAIL verdicts. Invoke with /er.
npx claudepluginhub joshuarweaver/cascade-code-general-misc-2 --plugin jleechanorg-claude-commandsThis skill uses the workspace's default tool permissions.
**Purpose**: Review evidence for a claim, a path, or a PR. Produce a PASS / PARTIAL / FAIL verdict with specific artifact-level citations. This skill is the enforcement layer for the `evidence-standards` skill (the "what to produce") — this file is the "how to judge it".
Defines evidence standards for testing with git provenance capture via Python, server runtime checks, Three Evidence Rule, and real/mock decision tree. Verifies test claims with metadata like commit diffs and PIDs.
Validates AI agent claims like 'tests pass' or 'fixed' against evidence trails and tool outputs. Detects stubs and unproven assertions. Auto-triggers at workflow end.
Guides writing evidence packages for consensus-loop watch file in code reviews. Covers claims, changed files, executable tests, results, risks, tag lifecycles, and rejection fixes.
Share bugs, ideas, or general feedback.
Purpose: Review evidence for a claim, a path, or a PR. Produce a PASS / PARTIAL / FAIL verdict with specific artifact-level citations. This skill is the enforcement layer for the evidence-standards skill (the "what to produce") — this file is the "how to judge it".
Invoked by: /er [subject or path]
Standards source: evidence-standards skill (/es). When in doubt about a requirement, read the standards first.
| Verdict | Meaning |
|---|---|
| PASS | Every claim has a matching artifact of STRONG quality and every mandatory check below passes. |
| PARTIAL | Claims are supported but one or more mandatory checks fail or soft-warn (e.g., WARN in verification_report.json, missing downloadable MP4). A PR at PARTIAL is not merge-ready. |
| FAIL | A claim is contradicted by an artifact, or integrity is broken (sha256 mismatch, dirty capture producing the claim, scope exclusion). |
| INCONCLUSIVE | Not enough artifact data exists to decide. Request more. |
Evidence bundles use one of two checksum modes (declared in metadata.json → checksum_mode):
bundle_checksum: a single top-level checksums.sha256 file lists every tracked artifactper_file_checksums: each artifact has its own sibling <file>.sha256Detect the mode, then verify accordingly. If metadata.json is absent or silent, infer: top-level checksums.sha256 → bundle_checksum; otherwise fall back to per-file.
cd <bundle_dir>
if [ -f checksums.sha256 ]; then
sha256sum -c checksums.sha256
else
# per_file_checksums mode — verify each sibling .sha256
find . -name '*.sha256' -exec sh -c 'd=$(dirname "$1") && b=$(basename "$1") && (cd "$d" && sha256sum -c "$b")' _ {} \;
fi
INVALID label may be used per-artifact in the claim map (Phase 2 below) to mark contradicted or corrupted individual artifacts, but the overall verdict vocabulary stays PASS | PARTIAL | FAIL | INCONCLUSIVEverification_report.json is optional — it is produced by bundles that went through an explicit verifier pass, but is not required by evidence-standards. Treat its presence as a ceiling constraint, not a prerequisite:
[ -f verification_report.json ] && jq -r '.overall_verdict' verification_report.json
PASS → proceedWARN → verdict ceiling is PARTIAL (never promote WARN to PASS without resolving each recorded violation)grep -A10 "Scope note" README.md
Tmux / Terminal video (required for any code change, test run, deploy):
Browser UI video (required when PR adds or modifies any testing_ui/test_*.py file):
If ANY of the above is missing → verdict is PARTIAL (not PASS), regardless of other evidence quality.
GIFs and MP4s must be on a public repository — private repo release assets return 404 for anonymous viewers and do NOT render as inline images in PR descriptions.
# For each <owner>/<repo> hosting a video asset:
gh api repos/<owner>/<repo> --jq '.private'
# Must be: false
# And verify the asset itself is uploaded and accessible:
gh api repos/<owner>/<repo>/releases/tags/<tag> \
--jq '.assets[] | {name: .name, state: .state}'
# All states must be "uploaded"
Private repo assets = PARTIAL / FAIL for inline rendering.
A PASS verdict requires the PR to meet the "clean computer" standard from evidence-standards:
git clone <url> + git checkout <branch>Failure mode: if the only instructions are "see the repo" or "run the tests" without exact commands → PARTIAL.
For each claim, identify the single primary artifact that proves it. Rate quality:
| Quality | Meaning |
|---|---|
| STRONG | Claim directly observable in a raw artifact (log line, screenshot frame, test output, sha256-verified file) |
| WEAK | Claim is indirect — derived from self-reporting (evidence.md, summary.md) without raw backing |
| MISSING | No artifact supports the claim |
| INVALID | An artifact exists but contradicts the claim, or fails integrity check |
Run all six checks in the "Mandatory Pre-PASS Checks" section above. Record the result of each.
Produce output in this format:
## Evidence Review Verdict
**Subject**: <what was reviewed>
**Bundle**: <path>
**Overall**: PASS | PARTIAL | FAIL | INCONCLUSIVE
**Confidence**: HIGH | MEDIUM | LOW
### Claim Map
| # | Claim | Artifact | Quality | Notes |
|---|-------|----------|---------|-------|
| 1 | ... | run.json | STRONG | Line 42 shows ...|
| 2 | ... | (none) | MISSING | No artifact found |
### Mandatory Checks
- [x] sha256sum -c → 38/38 OK
- [x] verification_report.json overall_verdict = PASS
- [x] Scope note matches claimed domain
- [x] Terminal GIF + MP4 + caption present
- [ ] Browser UI GIF: 404 — private repo hosting (→ PARTIAL)
- [x] Gist has clone + test commands
### Violations
1. <specific evidence item that fails>
### Accepted Exceptions
1. <with rationale from verification_report.json>
### Recommendations
1. <non-blocking suggestions for future bundles>
evidence.md cites itself instead of raw artifacts → WEAKevidence/ but PR description has no gist/release link → PR fails "clean computer" checkecho "PASS" in terminal video instead of real test runner output: hard block → FAILyour-project.com release returned 404. Moved to public agent-orchestrator release. → Added mandatory check 5.verification_report.json with rationale.verification_report.json is a ceiling — never promote to PASS without resolving each recorded violation.