From goalkeeper
Reviews a goal against its definition-of-done and approves or rejects with a structured fix-list. Auto-fired by the goal skill or invoked on demand via /goal-judge.
npx claudepluginhub itsuzef/goalkeeper --plugin goalkeeper
How this skill is triggered (by the user, by Claude, or both)
Slash command
/goalkeeper:goal-judge
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
You are operating the **goal-judge** skill — the gate that decides whether a goal is actually done, not just superficially passing the validator. The judge is what differentiates goalkeeper from a naive auto-loop.
The judge is invoked from one of three places:
- The /goal skill's execution loop auto-fires the judge when the validator passes (the historical default).
- The /goal-chain orchestrator invokes the judge AFTER the executor subagent returns with STATUS: validator_pass. The executor never invokes the judge itself; that responsibility moved up to the chain orchestrator in v0.3.
- The user runs /goal-judge directly for a non-binding read on an in-progress goal (does not advance state).

The verdict logic and grading rubric are identical across all three sources. Only the invocation context differs.
The judge's inputs:
- .claude/goals/active.json → <slug>
- .claude/goals/<slug>/contract.md (especially definition_of_done)
- .claude/goals/<slug>/log.md (the full progress log)
- state.started_at_commit: git baseline captured at activation; use as the diff origin
- state.started_at_dirty_paths: paths that were already dirty at activation; the judge should NOT credit/blame those
- state.validator_baseline_result: "pass" | "fail" | "not_runnable" | null. Was the validator passing at activation? Captured by /goal-prep.
- state.validator_baseline_failing_paths: paths the validator flagged at baseline; if the final validator failure is on these same paths, it's pre-existing dirt, not goal-caused
- args: optional; --mode=inline|subagent overrides judge_mode from the contract
- When invoked from /goal-chain after executor return, the orchestrator passes the executor's structured summary (STATUS, SUMMARY, VALIDATOR_OUTPUT_TAIL, FILES_CHANGED, BLOCKERS) as additional context. The judge uses this as a leading hint but MUST still independently verify against the contract; the executor's self-report is not authoritative.

Don't improvise the judge prompt. Each judge invocation must produce the same prompt-shape so verdicts are comparable across runs.
slug = <read .claude/goals/active.json>.slug
state = <read .claude/goals/<slug>/state.json>
contract_md = <read .claude/goals/<slug>/contract.md verbatim>
log_md = <read .claude/goals/<slug>/log.md verbatim>
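The four reads above could be sketched in Python roughly as follows. The function name `load_judge_inputs` and the returned dictionary keys are illustrative assumptions; only the file locations come from the text above.

```python
import json
from pathlib import Path


def load_judge_inputs(repo_root: Path) -> dict:
    """Read the inputs the judge prompt is built from (hypothetical helper)."""
    goals_dir = repo_root / ".claude" / "goals"
    active = json.loads((goals_dir / "active.json").read_text(encoding="utf-8"))
    slug = active.get("slug")
    if not slug:
        # A terminal active.json carries slug = null, so there is nothing to judge.
        raise ValueError("no active goal: active.json has no slug")
    goal_dir = goals_dir / slug
    return {
        "slug": slug,
        "state": json.loads((goal_dir / "state.json").read_text(encoding="utf-8")),
        # Contract and log are kept verbatim: no parsing, no trimming.
        "contract_md": (goal_dir / "contract.md").read_text(encoding="utf-8"),
        "log_md": (goal_dir / "log.md").read_text(encoding="utf-8"),
    }
```

Failing loudly on a missing slug keeps the judge from assembling a prompt for a goal that has already ended.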
Default exclusions (always apply):
DEFAULT_EXCLUDES=(
':!package-lock.json' ':!yarn.lock' ':!pnpm-lock.yaml'
':!Cargo.lock' ':!poetry.lock' ':!go.sum'
':!Gemfile.lock' ':!composer.lock'
':!dist/**' ':!build/**' ':!out/**' ':!target/**' ':!.next/**'
':!**/*.min.js' ':!**/*.min.css'
':!coverage/**' ':!.nyc_output/**' ':!test-results/**'
':!.vscode/**' ':!.idea/**' ':!.DS_Store'
)
Append contract diff_excludes if present (each entry becomes :!<glob>).
If contract has diff_includes (rare narrowing), use those positively instead of default-minus-excludes — e.g. git diff <baseline>..HEAD -- packages/api/ packages/web/.
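As a rough illustration of the exclusion rules, the pathspec list could be assembled like this. The sketch assumes the contract's `diff_excludes` and `diff_includes` have already been parsed into a dictionary; how contract.md stores them is not specified here, and `build_pathspecs` is a hypothetical name.

```python
DEFAULT_EXCLUDES = [
    ":!package-lock.json", ":!yarn.lock", ":!pnpm-lock.yaml",
    ":!Cargo.lock", ":!poetry.lock", ":!go.sum",
    ":!Gemfile.lock", ":!composer.lock",
    ":!dist/**", ":!build/**", ":!out/**", ":!target/**", ":!.next/**",
    ":!**/*.min.js", ":!**/*.min.css",
    ":!coverage/**", ":!.nyc_output/**", ":!test-results/**",
    ":!.vscode/**", ":!.idea/**", ":!.DS_Store",
]


def build_pathspecs(contract: dict) -> list[str]:
    """Return the pathspecs to place after `--` in the git commands."""
    includes = contract.get("diff_includes") or []
    if includes:
        # Rare narrowing: positive paths replace default-minus-excludes.
        return list(includes)
    # Each contract entry becomes an exclude pathspec of the form :!<glob>.
    user_excludes = [f":!{glob}" for glob in contract.get("diff_excludes") or []]
    return DEFAULT_EXCLUDES + user_excludes
```

For example, `build_pathspecs({"diff_excludes": ["docs/**"]})` yields the defaults followed by `:!docs/**`.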
If state.started_at_commit is non-null (git repo):
# Committed work since baseline
git diff <state.started_at_commit>..HEAD -- "${DEFAULT_EXCLUDES[@]}" <user_excludes...>
# Uncommitted working-tree work (staged + unstaged; plain `git diff` omits staged changes, hence HEAD)
git diff HEAD -- "${DEFAULT_EXCLUDES[@]}" <user_excludes...>
# Untracked new files (not shown by git diff)
git ls-files --others --exclude-standard -- "${DEFAULT_EXCLUDES[@]}"
Concatenate the three outputs in that order. For untracked new files, also Read them so the judge sees their full content (not just the path list).
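A minimal sketch of running the three commands and concatenating their output in order, assuming Python drives the git calls. `build_diff_commands` and `collect_diff` are hypothetical names, and the second command passes `HEAD` explicitly so that staged changes are included alongside unstaged ones.

```python
import subprocess


def build_diff_commands(baseline: str, pathspecs: list[str]) -> list[list[str]]:
    """Build the argv lists for committed, uncommitted, and untracked work."""
    return [
        # Committed work since the baseline commit.
        ["git", "diff", f"{baseline}..HEAD", "--", *pathspecs],
        # Working tree versus HEAD: covers staged and unstaged changes.
        ["git", "diff", "HEAD", "--", *pathspecs],
        # Untracked files, which git diff does not show.
        ["git", "ls-files", "--others", "--exclude-standard", "--", *pathspecs],
    ]


def collect_diff(baseline: str, pathspecs: list[str], cwd: str) -> str:
    """Run the commands in order and concatenate their stdout."""
    outputs = []
    for argv in build_diff_commands(baseline, pathspecs):
        result = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return "".join(outputs)
```

The untracked-file listing only yields paths, so the judge still needs to read those files in full.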
If state.started_at_commit is null (not a git repo): use git status if available; otherwise note "no-git — review log + files only" in the prompt.
# Modified files (committed + uncommitted)
git diff --name-only <state.started_at_commit>..HEAD -- "${DEFAULT_EXCLUDES[@]}" <user_excludes...>
git diff --name-only HEAD -- "${DEFAULT_EXCLUDES[@]}" <user_excludes...>
# Untracked
git ls-files --others --exclude-standard -- "${DEFAULT_EXCLUDES[@]}"
Dedupe and absolutize (prefix with the repo root). This is the file list the judge subagent must Read end-to-end.
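The dedupe-and-absolutize step might look like the sketch below, assuming each git command's output is passed in as a newline-separated string of repo-relative paths. `build_file_list` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def build_file_list(repo_root: str, *name_outputs: str) -> list[str]:
    """Merge path listings, drop duplicates, and prefix the repo root."""
    seen = set()
    files = []
    for output in name_outputs:
        for line in output.splitlines():
            rel = line.strip()
            if not rel or rel in seen:
                continue
            seen.add(rel)
            # git prints forward-slash paths, so join them POSIX-style.
            files.append(str(PurePosixPath(repo_root) / rel))
    return files
```

First-seen order is preserved, so committed files come before uncommitted and untracked ones.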
For each path in state.started_at_dirty_paths: if the path also appears in step 4's file list, mark it for the judge as "pre-existing — verify these changes belong to the goal." Do NOT remove it from the file list (the judge still inspects it), just flag it. The judge's Pre-existing-dirt check verdict line addresses this set explicitly.
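One way to express the flagging rule is to annotate each entry rather than filter the list. This sketch assumes `started_at_dirty_paths` holds repo-relative paths and the file list holds absolute ones; `flag_preexisting_dirt` is a hypothetical name.

```python
from pathlib import PurePosixPath


def flag_preexisting_dirt(
    file_list: list[str], dirty_paths: list[str], repo_root: str
) -> list[tuple[str, bool]]:
    """Pair every file with True if it was already dirty at activation."""
    dirty_abs = {str(PurePosixPath(repo_root) / p) for p in dirty_paths}
    # Nothing is removed: flagged files stay in the list for inspection.
    return [(path, path in dirty_abs) for path in file_list]
```

Flagged entries are the ones the judge's Pre-existing-dirt check line should address.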
If state.validator_baseline_result == "fail", the validator was ALREADY failing at activation. Capture the current validator failure paths and compare:
- Goal-caused: the failing path is not in state.validator_baseline_failing_paths, OR state.validator_baseline_result was "pass". → blocks approval.
- Pre-existing: the failing path is in state.validator_baseline_failing_paths AND the goal did not modify it (not in step 4 file list). → does NOT block approval if all DoD items are otherwise met. Surface in NOTES so the user can decide whether to fix opportunistically.

When validator_baseline_result == null (prep didn't run the validator, or the goal was activated without prep), the judge has no baseline to subtract from; treat all validator failures as goal-caused. The user can manually amend state.json if they know better.
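The baseline comparison can be sketched as a small classifier. Two choices here are assumptions rather than rules from the text: a baseline-failing path that the goal did modify is treated as goal-caused, and a "not_runnable" baseline is handled like a missing one. All path arguments are assumed to be normalized to the same form.

```python
def classify_validator_failures(
    current_failing: list[str],
    baseline_result,
    baseline_failing,
    goal_modified: list[str],
) -> dict:
    """Split current validator failures into goal-caused and pre-existing."""
    baseline = set(baseline_failing or [])
    modified = set(goal_modified)
    goal_caused, pre_existing = [], []
    for path in current_failing:
        is_pre_existing = (
            baseline_result == "fail"   # validator was already failing
            and path in baseline        # on this same path
            and path not in modified    # and the goal left it alone
        )
        (pre_existing if is_pre_existing else goal_caused).append(path)
    return {
        "blocks_approval": bool(goal_caused),
        "goal_caused": goal_caused,
        "pre_existing": pre_existing,  # surface these in NOTES
    }
```

With a null baseline every failure lands in `goal_caused`, which matches the conservative default described above.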
Read judge_mode from the contract, default subagent. Override with --mode= arg if present.

In subagent mode, spawn the agent with a self-contained prompt assembled from steps 1-5 above. The subagent has not seen this conversation, so give it everything it needs.
The judge subagent must do BOTH of these, because diffs lose context (renames, surrounding code, file-level structure):
- Review the filtered diff.
- Read each modified or added file in full with the Read tool.
Use this prompt template (fill in the bracketed sections from steps 1-5):
You are an independent judge reviewing a goalkeeper goal. You have not seen the executing agent's reasoning — review the artifacts only.
# Contract
[paste contract.md verbatim]
# Progress log
[paste log.md verbatim]
# Diff scope
Baseline: [started_at_commit short SHA or "no-git"]
Validator baseline: [state.validator_baseline_result or "unknown"]
Pre-existing validator-failing paths (failures on these are NOT goal-caused):
[list from state.validator_baseline_failing_paths, or "none/unknown"]
Default + contract exclusions applied (lockfiles, build outputs, coverage, IDE files).
Pre-existing dirty paths at activation (do NOT credit as goal work, but flag if any goal work touched them):
[list from state.started_at_dirty_paths, or "none"]
# Files modified or added since baseline
[absolute path list, one per line]
# Diff (excerpt)
[paste filtered git diff output, or "No git repo — review log + files only" if not a git repo]
# Your task
**Output the verdict ONCE.** Pre-think your reasoning before producing the structured response. Do not self-correct or revise individual DoD lines mid-response — finalize each MET/NOT MET decision before writing the verdict block.
For each item in `definition_of_done`, decide whether it is met. Use BOTH the diff above AND the Read tool to read each modified/added file in full — diffs hide context. Be strict:
- A criterion is "met" only if the diff or files demonstrate it concretely. "Probably done" = not met.
- Watch for placeholders, stubs, .todo markers, skipped tests, commented-out work, or "TODO: real implementation" comments. These are AUTOMATIC rejection regardless of validator status.
- Watch for tests that assert on existence rather than behavior (`expect(fn).toBeDefined()` is not a test).
- Watch for non-goal violations — the contract's `non_goals` list is binding.
- Watch for changes to pre-existing dirty paths that may not be the goal's intent.
- The validator passing is necessary but NOT sufficient. Do not approve solely because validator exited zero.
Respond in this exact format:
VERDICT: approve
or
VERDICT: reject
REASONS:
- <one bullet per DoD item; for each, "MET" or "NOT MET" with a one-sentence justification grounded in a specific file/line>
- Non-goal violations: NONE / <list>
- Anti-placeholder check: CLEAN / <findings>
- Pre-existing-dirt check: NONE / <list of suspicious paths>
- Pre-existing validator-failure check: NONE / <list of paths failing at baseline that still fail; mark "not goal-caused">
FIX_LIST: (only if reject)
- <specific actionable item the executing agent should do next>
- <one item per problem, ordered by priority>
NOTES: (optional)
<any non-blocking observations>
Capture the subagent's response.
In inline mode: same task, same prompt structure, but you do it yourself in this turn. Do NOT consult prior reasoning from this conversation about the work; re-read contract, log, and diff fresh.
Parse the verdict (approve or reject).
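Parsing the response format could be done along these lines. This is a sketch under the assumption that the judge echoes the section headers exactly as in the template; `parse_judge_response` is a hypothetical name, and parenthetical hints after a header such as "(only if reject)" are ignored.

```python
import re

HEADER_RE = re.compile(r"^(VERDICT|REASONS|FIX_LIST|NOTES):(.*)$")


def _strip_bullet(line: str) -> str:
    """Remove a leading '- ' list marker if present."""
    return line[2:].strip() if line.startswith("- ") else line


def parse_judge_response(text: str) -> dict:
    """Extract the verdict, reasons, fix-list, and notes from a response."""
    verdict = None
    sections = {"REASONS": [], "FIX_LIST": [], "NOTES": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1)
            if current == "VERDICT":
                if verdict is not None:
                    # The prompt requires the verdict to appear exactly once.
                    raise ValueError("VERDICT appears more than once")
                verdict = match.group(2).strip().lower()
            continue
        if current in sections:
            sections[current].append(_strip_bullet(line))
    if verdict not in ("approve", "reject"):
        raise ValueError(f"unrecognized verdict: {verdict!r}")
    return {
        "verdict": verdict,
        "reasons": sections["REASONS"],
        "fix_list": sections["FIX_LIST"],
        "notes": "\n".join(sections["NOTES"]),
    }
```

Rejecting a response with a missing or repeated verdict keeps a malformed answer from being treated as an approval.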
On approve, update state.json: last_judge_verdict = "approve", approved_at = <ISO8601>. Then append to log.md:
## <ISO8601> — judge approved
<one-line summary>
Reasons:
<REASONS block from the judge>
If .claude/goals/chain.json exists and contains <slug> at the current cursor, append to chain.json.link_approvals:
{"slug": "<slug>", "approved_at": "<ISO8601 now>"}
Then hand off to the goal-chain skill to advance the cursor and activate the next goal. (The goal-chain skill is responsible for setting chain.json.status = done and writing the terminal active.json when the cursor reaches the end.)

Otherwise (no chain), set state.status = done. Write .claude/goals/active.json to the canonical terminal shape (see goal.md "Canonical state shapes"):
{
"slug": null,
"ended_at": "<ISO8601 now>",
"ended_reason": "done",
"previous_slug": "<slug>"
}
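For illustration, the terminal shape could be produced by a helper like the one below. The name `terminal_active_state` and the injectable `now` parameter are assumptions added for the sketch; the four keys are the ones shown above.

```python
from datetime import datetime, timezone


def terminal_active_state(slug: str, ended_reason: str = "done", now=None) -> dict:
    """Build the terminal active.json payload for a finished goal."""
    now = now or datetime.now(timezone.utc)
    return {
        "slug": None,  # serialized as null: no goal is active any more
        "ended_at": now.isoformat(timespec="seconds"),
        "ended_reason": ended_reason,
        "previous_slug": slug,
    }
```

Passing `now` explicitly makes the timestamp reproducible, and `json.dumps` of the result writes `slug` as `null`.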
Tell the user: "Goal <slug> approved and marked done."

On reject, update state.json: last_judge_verdict = "reject", rejection_count += 1. Then append to log.md:
## <ISO8601> — judge rejected
Reasons:
<REASONS block>
Fix-list:
<FIX_LIST block — copied verbatim>
Rejection count: <n>/<max>
If rejection_count >= max_rejections, set state.status = needs_human and state.needs_human_at = <ISO8601 now>. Append:
## <ISO8601> — paused (max rejections)
Stop. Do NOT schedule a next iteration. Do NOT modify active.json; it stays in active shape because needs_human is paused-awaiting-human, not termination. Surface the fix-list to the user verbatim and instruct: fix manually, then /goal-resume (which will ask whether to reset the rejection counter).

If the user invokes /goal-judge directly outside the auto-gate flow, treat it as advisory:
- Do NOT modify state.json or rejection_count, or schedule wakeups. Advisory runs are read-only.

Placeholder markers to watch for: .todo, .skip, xtest, xit, # TODO: real implementation, pass # placeholder, etc.
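A simple scanner for the markers named above might look like this. The regular expressions are one possible encoding of those markers and are not exhaustive (the list ends in "etc."); `find_placeholders` is a hypothetical name.

```python
import re

# One pattern per marker; extend as needed for other test frameworks.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\.todo\b"),
    re.compile(r"\.skip\b"),
    re.compile(r"\bxtest\b"),
    re.compile(r"\bxit\b"),
    re.compile(r"#\s*TODO: real implementation"),
    re.compile(r"\bpass\s+#\s*placeholder"),
]


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line with a marker."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PLACEHOLDER_PATTERNS):
            findings.append((lineno, line.strip()))
    return findings
```

A hit is only a lead for the judge to inspect in context, since a pattern match alone cannot tell a real stub from a legitimate use of the same token.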