**Evidence Gate v2** — Tier 0 (Precheck) + Tier 1 (Mechanical) + Tier 2 (Contract+Rubric). Returns pass/fail/block/error.
Install: `npx claudepluginhub choam2426/geas`. This skill uses the workspace's default tool permissions.
Objective verification of whether a worker's output meets the TaskContract requirements. The gate verdict is strictly objective — it answers "does this meet the contract?" with one of four verdicts: **pass**, **fail**, **block**, or **error**.
`iterate` is NOT a gate verdict. Product judgment happens in the Final Verdict, which is a separate pipeline step.
Inputs are read from:

- `.geas/missions/{mission_id}/tasks/{task-id}.json`
- `.geas/missions/{mission_id}/tasks/{task-id}/worker-self-check.json`
- `.geas/missions/{mission_id}/evidence/{task-id}/`

Naming: `{agent-type}-review.json` when the agent produces multiple artifacts (e.g., `design-authority-review.json` for specialist review, distinct from the `design-authority.json` design guide), or `{agent-type}.json` when the agent produces one artifact (e.g., `quality-specialist.json` for testing, `design-authority.json` for the design guide).

| gate_profile | Tier 0 | Tier 1 | Tier 2 | Description |
|---|---|---|---|---|
| implementation_change | Run | Run | Run | Standard task involving implementation changes |
| artifact_only | Run | Skip | Run (rubric only, no build/test) | Tasks without code changes such as documentation or design |
| closure_ready | Run | Skip | Simplified (completeness check only) | Final cleanup tasks such as release or config |
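The table above can be transcribed directly as a tier plan lookup. This is a minimal sketch in Python (the dict and helper names are illustrative, not part of the skill's actual implementation):

```python
# Tier plan per gate_profile, transcribed from the table above.
GATE_PROFILES = {
    "implementation_change": {"tier_0": "run", "tier_1": "run", "tier_2": "run"},
    "artifact_only": {"tier_0": "run", "tier_1": "skip", "tier_2": "rubric_only"},
    "closure_ready": {"tier_0": "run", "tier_1": "skip", "tier_2": "completeness_only"},
}

def tier_plan(gate_profile: str) -> dict:
    """Look up which tiers run for a given gate_profile."""
    return GATE_PROFILES[gate_profile]
```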
**Tier 0 (Precheck)**

Verify that all prerequisites are in place before running expensive checks.

1. **Required artifacts existence:** `worker-self-check.json` must exist at `.geas/missions/{mission_id}/tasks/{task-id}/worker-self-check.json` (`implementation_change` profile). Missing artifact → **block**; does NOT consume `retry_budget`. Gate re-entry is not allowed until the artifact is created.
2. **Task state eligibility:** `implementation_change` profile: task must be in `integrated` state. `artifact_only` profile: task must be in `reviewed` state. `closure_ready` profile: task must be in `reviewed` or `integrated` state. Wrong state → **error**; the orchestration_authority inspects and corrects the state.
3. **Baseline check:** for `implementation_change`, verify `base_commit` ancestry — the integration branch must contain the declared base commit. Failure → **block**; re-enter after performing revalidation.
4. **Required reviewer presence:** missing reviewer → **block**; does NOT consume `retry_budget`.

On any Tier 0 failure: Stop — do not proceed to Tier 1 or Tier 2.
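The first two prechecks can be sketched as follows (file layout and state names from this document; the baseline-ancestry and reviewer checks are omitted for brevity, and the function shape is an assumption, not the skill's actual code):

```python
# Sketch of Tier 0 prechecks: artifact existence and state eligibility.
import os

ELIGIBLE_STATES = {
    "implementation_change": {"integrated"},
    "artifact_only": {"reviewed"},
    "closure_ready": {"reviewed", "integrated"},
}

def tier0_precheck(mission_id: str, task_id: str,
                   gate_profile: str, task_state: str) -> dict:
    task_dir = f".geas/missions/{mission_id}/tasks/{task_id}"
    # Required artifact: missing worker-self-check.json -> block.
    if not os.path.exists(f"{task_dir}/worker-self-check.json"):
        return {"status": "block", "details": "worker-self-check.json missing"}
    # Task state eligibility per profile: wrong state -> error.
    if task_state not in ELIGIBLE_STATES[gate_profile]:
        return {"status": "error",
                "details": f"state {task_state!r} not eligible for {gate_profile}"}
    return {"status": "pass", "details": "All prerequisites verified"}
```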
**Tier 1 (Mechanical)**

Run the eval commands from the TaskContract and check results.
Skip conditions: skip for `artifact_only` and `closure_ready` profiles. Record as `{"status": "skipped", "details": "Profile does not require mechanical verification"}`.
Inputs: `eval_commands` from the TaskContract.

```bash
# Run each eval_command from the TaskContract
{eval_command_1}
{eval_command_2}
...
```
Stop on first failure — no point running contract checks if the code doesn't build.
Important: You MUST execute `eval_commands` and record the results. Do not assume "pass". If no commands exist, record as `{"status": "skipped", "details": "No eval_commands configured"}`. Having commands but not running them is a gate violation.
If previous evidence already contains verify_results, compare them against a fresh run. Trust the fresh run.
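Tier 1 can be sketched as a simple runner that executes each command, stops on the first failure, and never assumes "pass" (a minimal sketch; the skill's actual runner is not specified here):

```python
# Sketch of Tier 1: execute each eval_command, stop on first failure.
import subprocess

def tier1_mechanical(eval_commands: list) -> dict:
    if not eval_commands:
        return {"status": "skipped", "details": "No eval_commands configured"}
    for cmd in eval_commands:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            # Stop on first failure: no point running contract checks
            # if the code doesn't build.
            return {"status": "fail",
                    "details": f"{cmd!r} exited with {result.returncode}"}
    return {"status": "pass", "details": "All eval_commands passed"}
```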
**Tier 2 (Contract+Rubric)**

Multi-part evaluation of contract compliance and quality.

For the `closure_ready` profile: only run Part A (completeness check); skip Parts B, C, and D.
**Part A (completeness check).** For each criterion in `acceptance_criteria`: read the worker's `criteria_results` and verify their self-assessment, recording `{ "criterion": "...", "met": true/false, "evidence": "..." }` for each. All criteria must be met to proceed to Part B.
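The Part A loop can be sketched as follows. This captures the shape only — actually verifying a worker's self-assessment against evidence is a judgment step, abbreviated here to reading the `met` flag:

```python
# Sketch of Part A: record each criterion's result; all must be met.
def check_acceptance_criteria(criteria_results: list) -> tuple:
    results = [
        {"criterion": c["criterion"], "met": bool(c.get("met")),
         "evidence": c.get("evidence", "")}
        for c in criteria_results
    ]
    return results, all(r["met"] for r in results)
```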
**Part B (scope check).** Compare files changed against the task's `scope.paths`: read `scope.paths` from the TaskContract (or implementation contract) and flag any changed file outside `scope.paths` as a potential scope violation.

**Part C (known risks).** Verify that each item in the implementation contract's `known_risks` has been handled, checking `.geas/missions/{mission_id}/evolution/debt-register.json`. Any known_risk with no handling status → Tier 2 fails.
**Part D (rubric).** Read the `rubric` array from the TaskContract. For each dimension, map evaluator evidence to scores:

- `design_authority` review evidence → `code_quality` score
- `quality_specialist` evidence → `core_interaction`, `feature_completeness`, `regression_safety` scores
- `quality_specialist` or `communication_specialist` evidence → `ux_clarity`, `visual_coherence` scores (UI-sensitive tasks)

Read `rubric_scores` from each review and compare every score against its `threshold`:

| dimension | evaluator | default threshold |
|---|---|---|
| core_interaction | quality_specialist | 3 |
| feature_completeness | quality_specialist | 4 |
| code_quality | design_authority | 4 |
| regression_safety | quality_specialist | 4 |
UI-sensitive tasks add:
| dimension | evaluator | default threshold |
|---|---|---|
| ux_clarity | quality_specialist or communication_specialist | 3 |
| visual_coherence | quality_specialist or communication_specialist | 3 |
Read `confidence` from `worker-self-check.json`. If `confidence <= 2`, add +1 to every rubric dimension threshold.
Example: if confidence is 2, thresholds become: core_interaction 3→4, feature_completeness 4→5, code_quality 4→5, regression_safety 4→5.
If `possible_stubs[]` from the worker self-check is non-empty:

- `feature_completeness` is capped at a maximum of 2
- a stub count above the cap for the task's `risk_level` produces **block**

Default stub cap by risk_level:

| risk_level | stub cap |
|---|---|
| low | 3 |
| normal | 2 |
| high | 0 |
| critical | 0 |
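The stub rule can be sketched as below. The reading that exceeding the risk_level cap produces block is an interpretation of the rule above, not a confirmed implementation detail:

```python
# Sketch of the stub rule: any possible_stubs caps feature_completeness
# at 2; a stub count above the risk_level cap is treated as block here.
STUB_CAPS = {"low": 3, "normal": 2, "high": 0, "critical": 0}

def apply_stub_rule(possible_stubs: list, risk_level: str,
                    feature_completeness: int) -> tuple:
    """Return (adjusted feature_completeness, blocked?)."""
    if not possible_stubs:
        return feature_completeness, False
    blocked = len(possible_stubs) > STUB_CAPS[risk_level]
    return min(feature_completeness, 2), blocked
```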
Record rubric results:
```json
{
  "rubric_scores": [
    { "dimension": "core_interaction", "score": 4, "threshold": 3, "passed": true },
    { "dimension": "code_quality", "score": 3, "threshold": 4, "passed": false }
  ],
  "blocking_dimensions": ["code_quality"]
}
```
All rubric dimensions must meet their threshold for Tier 2 to pass. The blocking_dimensions list tells the verify-fix-loop exactly what to target.
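The threshold comparison, confidence bump, and `blocking_dimensions` collection can be sketched together (default thresholds taken from the tables above; the function itself is illustrative):

```python
# Sketch of Part D scoring: compare each score to its (possibly bumped)
# threshold and collect the dimensions that fail.
DEFAULT_THRESHOLDS = {"core_interaction": 3, "feature_completeness": 4,
                      "code_quality": 4, "regression_safety": 4}

def evaluate_rubric(scores: dict, confidence: int,
                    thresholds: dict = DEFAULT_THRESHOLDS) -> dict:
    bump = 1 if confidence <= 2 else 0   # low confidence raises every threshold
    rubric_scores, blocking = [], []
    for dim, base in thresholds.items():
        t = base + bump
        passed = scores.get(dim, 0) >= t
        rubric_scores.append({"dimension": dim, "score": scores.get(dim, 0),
                              "threshold": t, "passed": passed})
        if not passed:
            blocking.append(dim)
    return {"rubric_scores": rubric_scores, "blocking_dimensions": blocking}
```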
**fail vs block distinction**

- **fail**: implementation quality issue. Can be fixed via the verify-fix-loop. Consumes 1 from `retry_budget`.
- **block**: structural prerequisite not met. Cannot be resolved by modifying the implementation alone. Does NOT consume `retry_budget`. Re-enter the gate after resolving the blocking cause.

Conditions that produce **block** are the structural prerequisite failures checked in Tier 0 (missing required artifacts, baseline failure, missing reviewer) and the stub-cap rule in Part D.
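The verdict accounting can be sketched as one function, combining the retry_budget rules here with the consecutive-error rule that applies to the error verdict (function shape is illustrative):

```python
# Sketch of verdict accounting: only fail consumes retry_budget;
# 3 consecutive error verdicts transition the task to blocked.
def account_verdict(verdict: str, retry_budget: int,
                    error_streak: int) -> tuple:
    if verdict == "fail":
        retry_budget -= 1
    error_streak = error_streak + 1 if verdict == "error" else 0
    return retry_budget, error_streak, error_streak >= 3
```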
If the gate verdict is **error**:

- `retry_budget` is NOT consumed
- if the gate returns **error** 3 consecutive times, the task transitions to `blocked` and the cause is recorded

**Output**

Write to `.geas/missions/{mission_id}/tasks/{task-id}/gate-result.json` conforming to `schemas/gate-result.schema.json`.
```json
{
  "version": "1.0",
  "artifact_type": "gate_result",
  "artifact_id": "gate-{task-id}-{timestamp}",
  "producer_type": "quality_specialist",
  "created_at": "<actual ISO 8601 from date -u>",
  "task_id": "{task-id}",
  "gate_profile": "implementation_change",
  "verdict": "pass",
  "tier_results": {
    "tier_0": { "status": "pass", "details": "All prerequisites verified" },
    "tier_1": { "status": "pass", "details": "All eval_commands passed" },
    "tier_2": { "status": "pass", "details": "All criteria met, rubric passed" }
  },
  "rubric_scores": [
    { "dimension": "core_interaction", "score": 4, "threshold": 3, "passed": true },
    { "dimension": "feature_completeness", "score": 4, "threshold": 4, "passed": true },
    { "dimension": "code_quality", "score": 4, "threshold": 4, "passed": true },
    { "dimension": "regression_safety", "score": 5, "threshold": 4, "passed": true }
  ],
  "blocking_dimensions": [],
  "retry_budget_before": 3,
  "retry_budget_after": 3
}
```
Also log a detailed event to `.geas/ledger/events.jsonl`:
```json
{
  "event": "gate_result",
  "task_id": "{task-id}",
  "result": "pass",
  "gate_profile": "implementation_change",
  "tier_results": {
    "tier_0": { "status": "pass" },
    "tier_1": { "status": "pass" },
    "tier_2": { "status": "pass" }
  },
  "blocking_dimensions": [],
  "timestamp": "<actual ISO 8601 from date -u>"
}
```
"verified" in .geas/missions/{mission_id}/tasks/{task-id}.jsonretry_budget (retry_budget_after = retry_budget_before - 1)/verify-fix-loop with the failure detailsescalation_policy:
"design-authority-review": spawn the design_authority for architectural review, write a DecisionRecord"product-authority-decision": spawn the product_authority for a strategic decision (continue/cut/pivot)"pivot": invoke pivot protocol"escalated".geas/missions/{mission_id}/decisions/{dec-id}.jsonretry_budget (retry_budget_after = retry_budget_before)When the gate results in an escalation or significant decision, write a DecisionRecord:
```bash
mkdir -p .geas/missions/{mission_id}/decisions
```
Write to `.geas/missions/{mission_id}/decisions/{dec-id}.json` conforming to `schemas/decision-record.schema.json`.

This creates a durable record of WHY a decision was made, not just WHAT happened.