From claude-commands
Proves, rejects, or deletes backend adjustment registry entries by running isolated red/green comparator evidence with real LLM captures.
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands
How this skill is triggered (by the user, by Claude, or both):
Slash command
/claude-commands:adjustment-proof
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Use this skill before accepting any claim that a backend adjustment is proven
Use this skill before accepting any claim that a backend adjustment is proven necessary or safe to delete. A subagent report, PR prose, or one successful organic run is not proof. Proof requires an isolated red/green comparator for each adjustment.
This skill extends:
- .claude/skills/zfc-adjuster/SKILL.md
- .claude/skills/evidence-standards.md
- .claude/skills/zfc-leveling-roadmap/SKILL.md for level-up/rewards changes
Allowed verdicts:
- proven: Current-head green evidence passes, and a separate worktree with exactly one adjustment disabled fails for the matching behavior.
- delete_candidate: Current-head green evidence passes, and a separate worktree with exactly one adjustment disabled also passes. This is not enough to delete by itself; it means the adjustment lacks positive necessity proof.
- insufficient: Required artifacts are missing, stale, non-real, not isolated, or do not demonstrate the claimed behavior.
Do not use delete from a single negative run. Deletion requires a separate design decision after reviewing broader coverage, risk, and whether the code is now redundant.
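The verdict rules above can be sketched as a small decision function. This is an illustrative sketch only, not the logic inside scripts/adjustment_proof_matrix.py: the `ProofInputs` type, its field names, and the choice to map a failing green run to insufficient are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofInputs:
    """Hypothetical summary of one red/green comparison (names are assumptions)."""
    green_passed: bool                 # current-head run, all adjustments enabled
    red_passed: bool                   # separate worktree, exactly one adjustment disabled
    artifacts_valid: bool              # artifacts present, fresh, real, and isolated
    red_failure_matches: bool = False  # red failed for the behavior the adjustment covers


def classify(p: ProofInputs) -> str:
    """Map one comparison to an allowed verdict; there is deliberately no 'delete'."""
    # Without valid artifacts or a passing green run, nothing can be concluded.
    if not p.artifacts_valid or not p.green_passed:
        return "insufficient"
    # Red also passing means necessity was not shown, which is not proof of safety.
    if p.red_passed:
        return "delete_candidate"
    # Red failed: it only counts when the failure is the matching behavior.
    return "proven" if p.red_failure_matches else "insufficient"
```

Note that the function never returns delete: under the rules above, deletion is a separate design decision rather than a verdict.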
For each adjustment ID:
scripts/adjustment_proof_matrix.py validate.
One disabled worktree per adjustment is mandatory. A bundle named for one adjustment cannot prove or delete another adjustment.
Each green and red evidence directory must contain:
- run.json
- metadata.json
- http_request_responses.jsonl
- llm_request_responses.jsonl
For LLM-interacting level-up/rewards paths, unit tests are supporting evidence only. They cannot establish proven or delete_candidate.
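As a quick pre-check before running the validator, the required-artifact list can be tested against an evidence directory. The sketch below is a hypothetical helper (`missing_artifacts` is not part of the repository's script) and checks only that each file exists; it does not judge whether the captures are fresh or real.

```python
from pathlib import Path

# The four artifacts each green and red evidence directory must contain.
REQUIRED_ARTIFACTS = (
    "run.json",
    "metadata.json",
    "http_request_responses.jsonl",
    "llm_request_responses.jsonl",
)


def missing_artifacts(evidence_dir):
    """Return the required artifact names that are absent from evidence_dir."""
    root = Path(evidence_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
```

An empty return value means only that the files are present. Staleness, real-LLM provenance, and isolation still have to be established by the validator.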
Use the script to list registered adjustments:
./scripts/adjustment_proof_matrix.py list
Create a deterministic proof plan for one adjustment:
./scripts/adjustment_proof_matrix.py plan \
--adjustment-id level_up_atomicity.suppress_unpaired_rewards_box \
--test-command 'MCP_TEST_MODE=real MOCK_SERVICES_MODE=false ./venv/bin/python testing_mcp/core/test_level_up_organic.py --level-up-scenario single-organic'
Follow the generated commands to create independent worktrees. In the red worktree, disable only the named adjustment. Prefer the narrowest possible edit: remove or bypass the exact correction/suppression branch that implements the registry entry. Do not change prompts, test harnesses, unrelated adjusters, or scenario inputs in the red worktree.
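One way to sanity-check that the red worktree edit stayed narrow is to list the files it changed and flag any that fall under paths that must not be touched. The sketch below is an assumption-laden illustration: `changed_paths` and `out_of_scope` are hypothetical helpers, and the forbidden prefixes must be supplied by the caller because this document does not say where prompts or scenario inputs live.

```python
import subprocess


def changed_paths(worktree, base="HEAD"):
    """List tracked files in `worktree` that differ from `base` (git diff --name-only)."""
    out = subprocess.run(
        ["git", "-C", str(worktree), "diff", "--name-only", base],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def out_of_scope(paths, forbidden_prefixes):
    """Return the changed paths that fall under any forbidden prefix."""
    prefixes = tuple(forbidden_prefixes)
    return [p for p in paths if p.startswith(prefixes)]
```

For example, `out_of_scope(changed_paths(red_worktree), ["testing_mcp/"])` would flag edits to the test harness directory used by the proof command. A non-empty result suggests the red run changed more than the single adjustment and should not be treated as isolated.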
After both runs finish, collect and validate:
./scripts/adjustment_proof_matrix.py collect \
--adjustment-id level_up_atomicity.suppress_unpaired_rewards_box \
--green-worktree /tmp/your-project.com/adjustment-proof/<sha>/<slug>/green \
--red-worktree /tmp/your-project.com/adjustment-proof/<sha>/<slug>/red-disabled \
--green-evidence /tmp/your-project.com/<branch>/<green-run>/iteration_001 \
--red-evidence /tmp/your-project.com/<branch>/<red-run>/iteration_001 \
--test-command 'MCP_TEST_MODE=real MOCK_SERVICES_MODE=false ./venv/bin/python testing_mcp/core/test_level_up_organic.py --level-up-scenario single-organic'
./scripts/adjustment_proof_matrix.py validate \
/tmp/your-project.com/adjustment-proof/<sha>/<slug>/proof_manifest.json
The validator emits one of the allowed verdicts. Use that verdict in the PR body or registry update; do not upgrade it manually.
Fail the proof as insufficient when:
run.json or metadata.json
proven
If red and green both pass, record delete_candidate and recommend either deleting in a separate cleanup PR with broader evidence or keeping the registry entry as runtime evidence missing until that cleanup decision is made.
Use precise wording:
- proven: "Disabling <adjustment_id> alone caused <failure> in red while the same command passed on green at <sha>."
- delete_candidate: "Disabling <adjustment_id> alone did not reproduce a failure under this proof command. This does not prove deletion is safe across all paths."
- insufficient: "Evidence did not isolate <adjustment_id> or lacked required real-LLM artifacts."
Never write "all blockers resolved" or "all adjustments proven" unless every registered adjustment has its own validated manifest.
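To keep PR wording from drifting away from the templates above, the sentences can be generated from the validator's verdict rather than typed by hand. This is a minimal sketch: `verdict_sentence` and the `WORDING` table are hypothetical names, and the sketch assumes the verdict string has already been emitted by the validator.

```python
# Templates copied from the wording rules above; placeholders are filled per adjustment.
WORDING = {
    "proven": (
        "Disabling {adjustment_id} alone caused {failure} in red while "
        "the same command passed on green at {sha}."
    ),
    "delete_candidate": (
        "Disabling {adjustment_id} alone did not reproduce a failure under this "
        "proof command. This does not prove deletion is safe across all paths."
    ),
    "insufficient": (
        "Evidence did not isolate {adjustment_id} or lacked required real-LLM artifacts."
    ),
}


def verdict_sentence(verdict, adjustment_id, failure="", sha=""):
    """Render the exact wording for a verdict; reject anything outside the allowed set."""
    if verdict not in WORDING:
        raise ValueError(f"not an allowed verdict: {verdict!r}")
    if verdict == "proven" and not (failure and sha):
        raise ValueError("a proven verdict needs both the observed failure and the green sha")
    return WORDING[verdict].format(adjustment_id=adjustment_id, failure=failure, sha=sha)
```

Rejecting unknown verdicts keeps words such as "delete" or "resolved" out of generated text, which matches the rule against upgrading a verdict manually.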