**Evidence Gate v2** — Tier 0 (Precheck) + Tier 1 (Mechanical) + Tier 2 (Contract+Rubric). Returns pass/fail/block/error.
Install: `npx claudepluginhub choam2426/geas`. This skill uses the workspace's default tool permissions.
Objective verification of whether a worker's output meets the TaskContract requirements. The gate verdict is strictly objective — it answers "does this meet the contract?" with one of four verdicts: **pass**, **fail**, **block**, or **error**.
`iterate` is NOT a gate verdict. Product judgment happens in the Final Verdict, which is a separate pipeline step.
Inputs are read from:

- `.geas/missions/{mission_id}/tasks/{task-id}.json`
- `.geas/missions/{mission_id}/tasks/{task-id}/worker-self-check.json`
- `.geas/missions/{mission_id}/evidence/{task-id}/`

Naming: `{agent-type}-review.json` when the agent produces multiple artifacts (e.g., `design-authority-review.json` for specialist review, distinct from the `design-authority.json` design guide), or `{agent-type}.json` when the agent produces one artifact (e.g., `quality-specialist.json` for testing, `design-authority.json` for the design guide).

| gate_profile | Tier 0 | Tier 1 | Tier 2 | Description |
|---|---|---|---|---|
| implementation_change | Run | Run | Run | Standard task involving implementation changes |
| artifact_only | Run | Skip | Run (rubric only, no build/test) | Tasks without code changes such as documentation or design |
| closure_ready | Run | Skip | Simplified (completeness check only) | Final cleanup tasks such as release or config |
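The table above can be transcribed directly as a tier plan lookup. This is a minimal sketch in Python (the dict and helper names are illustrative, not part of the skill's actual implementation):

```python
# Tier plan per gate_profile, transcribed from the table above.
GATE_PROFILES = {
    "implementation_change": {"tier_0": "run", "tier_1": "run", "tier_2": "run"},
    "artifact_only": {"tier_0": "run", "tier_1": "skip", "tier_2": "rubric_only"},
    "closure_ready": {"tier_0": "run", "tier_1": "skip", "tier_2": "completeness_only"},
}

def tier_plan(gate_profile: str) -> dict:
    """Look up which tiers run for a given gate_profile."""
    return GATE_PROFILES[gate_profile]
```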
**Tier 0 (Precheck)**

Verify that all prerequisites are in place before running expensive checks.

1. **Required artifacts existence:** `worker-self-check.json` must exist at `.geas/missions/{mission_id}/tasks/{task-id}/worker-self-check.json` (`implementation_change` profile). Missing artifact → **block**; does NOT consume `retry_budget`. Gate re-entry is not allowed until the artifact is created.
2. **Task state eligibility:** `implementation_change` profile: task must be in `integrated` state. `artifact_only` profile: task must be in `reviewed` state. `closure_ready` profile: task must be in `reviewed` or `integrated` state. Wrong state → **error**; the orchestration_authority inspects and corrects the state.
3. **Baseline check:** for `implementation_change`, verify `base_commit` ancestry — the integration branch must contain the declared base commit. Failure → **block**; re-enter after performing revalidation.
4. **Required reviewer presence:** missing reviewer → **block**; does NOT consume `retry_budget`.

On any Tier 0 failure: Stop — do not proceed to Tier 1 or Tier 2.
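The first two prechecks can be sketched as follows (file layout and state names from this document; the baseline-ancestry and reviewer checks are omitted for brevity, and the function shape is an assumption, not the skill's actual code):

```python
# Sketch of Tier 0 prechecks: artifact existence and state eligibility.
import os

ELIGIBLE_STATES = {
    "implementation_change": {"integrated"},
    "artifact_only": {"reviewed"},
    "closure_ready": {"reviewed", "integrated"},
}

def tier0_precheck(mission_id: str, task_id: str,
                   gate_profile: str, task_state: str) -> dict:
    task_dir = f".geas/missions/{mission_id}/tasks/{task_id}"
    # Required artifact: missing worker-self-check.json -> block.
    if not os.path.exists(f"{task_dir}/worker-self-check.json"):
        return {"status": "block", "details": "worker-self-check.json missing"}
    # Task state eligibility per profile: wrong state -> error.
    if task_state not in ELIGIBLE_STATES[gate_profile]:
        return {"status": "error",
                "details": f"state {task_state!r} not eligible for {gate_profile}"}
    return {"status": "pass", "details": "All prerequisites verified"}
```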
**Tier 1 (Mechanical)**

Run the eval commands from the TaskContract and check results.
Skip conditions: skip for `artifact_only` and `closure_ready` profiles. Record as `{"status": "skipped", "details": "Profile does not require mechanical verification"}`.
Inputs: `eval_commands` from the TaskContract.

```bash
# Run each eval_command from the TaskContract
{eval_command_1}
{eval_command_2}
...
```
Stop on first failure — no point running contract checks if the code doesn't build.
Important: You MUST execute `eval_commands` and record the results. Do not assume "pass". If no commands exist, record as `{"status": "skipped", "details": "No eval_commands configured"}`. Having commands but not running them is a gate violation.
If previous evidence already contains verify_results, compare them against a fresh run. Trust the fresh run.
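Tier 1 can be sketched as a simple runner that executes each command, stops on the first failure, and never assumes "pass" (a minimal sketch; the skill's actual runner is not specified here):

```python
# Sketch of Tier 1: execute each eval_command, stop on first failure.
import subprocess

def tier1_mechanical(eval_commands: list) -> dict:
    if not eval_commands:
        return {"status": "skipped", "details": "No eval_commands configured"}
    for cmd in eval_commands:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            # Stop on first failure: no point running contract checks
            # if the code doesn't build.
            return {"status": "fail",
                    "details": f"{cmd!r} exited with {result.returncode}"}
    return {"status": "pass", "details": "All eval_commands passed"}
```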
**Tier 2 (Contract+Rubric)**

Multi-part evaluation of contract compliance and quality.

For the `closure_ready` profile: only run Part A (completeness check); skip Parts B, C, and D.
**Part A (completeness check).** For each criterion in `acceptance_criteria`: read the worker's `criteria_results` and verify their self-assessment, recording `{ "criterion": "...", "met": true/false, "evidence": "..." }` for each. All criteria must be met to proceed to Part B.
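The Part A loop can be sketched as follows. This captures the shape only — actually verifying a worker's self-assessment against evidence is a judgment step, abbreviated here to reading the `met` flag:

```python
# Sketch of Part A: record each criterion's result; all must be met.
def check_acceptance_criteria(criteria_results: list) -> tuple:
    results = [
        {"criterion": c["criterion"], "met": bool(c.get("met")),
         "evidence": c.get("evidence", "")}
        for c in criteria_results
    ]
    return results, all(r["met"] for r in results)
```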
**Part B (scope check).** Compare files changed against the task's `scope.paths`: read `scope.paths` from the TaskContract (or implementation contract) and flag any changed file outside `scope.paths` as a potential scope violation.

**Part C (known risks).** Verify that each item in the implementation contract's `known_risks` has been handled, checking `.geas/missions/{mission_id}/evolution/debt-register.json`. Any known_risk with no handling status → Tier 2 fails.
**Part D (rubric).** Read the `rubric` array from the TaskContract. For each dimension, map evaluator evidence to scores:

- `design_authority` review evidence → `code_quality` score
- `quality_specialist` evidence → `core_interaction`, `feature_completeness`, `regression_safety` scores
- `quality_specialist` or `communication_specialist` evidence → `ux_clarity`, `visual_coherence` scores (UI-sensitive tasks)

Read `rubric_scores` from each review and compare every score against its `threshold`:

| dimension | evaluator | default threshold |
|---|---|---|
| core_interaction | quality_specialist | 3 |
| feature_completeness | quality_specialist | 4 |
| code_quality | design_authority | 4 |
| regression_safety | quality_specialist | 4 |
UI-sensitive tasks add:
| dimension | evaluator | default threshold |
|---|---|---|
| ux_clarity | quality_specialist or communication_specialist | 3 |
| visual_coherence | quality_specialist or communication_specialist | 3 |
Read `confidence` from `worker-self-check.json`. If `confidence <= 2`, add +1 to every rubric dimension threshold.
Example: if confidence is 2, thresholds become: core_interaction 3→4, feature_completeness 4→5, code_quality 4→5, regression_safety 4→5.
If `possible_stubs[]` from the worker self-check is non-empty:

- `feature_completeness` is capped at a maximum of 2
- a stub count above the cap for the task's `risk_level` produces **block**

Default stub cap by risk_level:

| risk_level | stub cap |
|---|---|
| low | 3 |
| normal | 2 |
| high | 0 |
| critical | 0 |
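The stub rule can be sketched as below. The reading that exceeding the risk_level cap produces block is an interpretation of the rule above, not a confirmed implementation detail:

```python
# Sketch of the stub rule: any possible_stubs caps feature_completeness
# at 2; a stub count above the risk_level cap is treated as block here.
STUB_CAPS = {"low": 3, "normal": 2, "high": 0, "critical": 0}

def apply_stub_rule(possible_stubs: list, risk_level: str,
                    feature_completeness: int) -> tuple:
    """Return (adjusted feature_completeness, blocked?)."""
    if not possible_stubs:
        return feature_completeness, False
    blocked = len(possible_stubs) > STUB_CAPS[risk_level]
    return min(feature_completeness, 2), blocked
```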
Record rubric results:
```json
{
  "rubric_scores": [
    { "dimension": "core_interaction", "score": 4, "threshold": 3, "passed": true },
    { "dimension": "code_quality", "score": 3, "threshold": 4, "passed": false }
  ],
  "blocking_dimensions": ["code_quality"]
}
```
All rubric dimensions must meet their threshold for Tier 2 to pass. The blocking_dimensions list tells the verify-fix-loop exactly what to target.
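The threshold comparison, confidence bump, and `blocking_dimensions` collection can be sketched together (default thresholds taken from the tables above; the function itself is illustrative):

```python
# Sketch of Part D scoring: compare each score to its (possibly bumped)
# threshold and collect the dimensions that fail.
DEFAULT_THRESHOLDS = {"core_interaction": 3, "feature_completeness": 4,
                      "code_quality": 4, "regression_safety": 4}

def evaluate_rubric(scores: dict, confidence: int,
                    thresholds: dict = DEFAULT_THRESHOLDS) -> dict:
    bump = 1 if confidence <= 2 else 0   # low confidence raises every threshold
    rubric_scores, blocking = [], []
    for dim, base in thresholds.items():
        t = base + bump
        passed = scores.get(dim, 0) >= t
        rubric_scores.append({"dimension": dim, "score": scores.get(dim, 0),
                              "threshold": t, "passed": passed})
        if not passed:
            blocking.append(dim)
    return {"rubric_scores": rubric_scores, "blocking_dimensions": blocking}
```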
**fail vs block distinction**

- **fail**: implementation quality issue. Can be fixed via the verify-fix-loop. Consumes 1 from `retry_budget`.
- **block**: structural prerequisite not met. Cannot be resolved by modifying the implementation alone. Does NOT consume `retry_budget`. Re-enter the gate after resolving the blocking cause.

Conditions that produce **block** are the structural prerequisite failures checked in Tier 0 (missing required artifacts, baseline failure, missing reviewer) and the stub-cap rule in Part D.
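The verdict accounting can be sketched as one function, combining the retry_budget rules here with the consecutive-error rule that applies to the error verdict (function shape is illustrative):

```python
# Sketch of verdict accounting: only fail consumes retry_budget;
# 3 consecutive error verdicts transition the task to blocked.
def account_verdict(verdict: str, retry_budget: int,
                    error_streak: int) -> tuple:
    if verdict == "fail":
        retry_budget -= 1
    error_streak = error_streak + 1 if verdict == "error" else 0
    return retry_budget, error_streak, error_streak >= 3
```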
If the gate verdict is **error**:

- `retry_budget` is NOT consumed
- if the gate returns **error** 3 consecutive times, the task transitions to `blocked` and the cause is recorded

**Output**

Write to `.geas/missions/{mission_id}/tasks/{task-id}/gate-result.json` conforming to `schemas/gate-result.schema.json`.
```json
{
  "version": "1.0",
  "artifact_type": "gate_result",
  "artifact_id": "gate-{task-id}-{timestamp}",
  "producer_type": "quality_specialist",
  "created_at": "<actual ISO 8601 from date -u>",
  "task_id": "{task-id}",
  "gate_profile": "implementation_change",
  "verdict": "pass",
  "tier_results": {
    "tier_0": { "status": "pass", "details": "All prerequisites verified" },
    "tier_1": { "status": "pass", "details": "All eval_commands passed" },
    "tier_2": { "status": "pass", "details": "All criteria met, rubric passed" }
  },
  "rubric_scores": [
    { "dimension": "core_interaction", "score": 4, "threshold": 3, "passed": true },
    { "dimension": "feature_completeness", "score": 4, "threshold": 4, "passed": true },
    { "dimension": "code_quality", "score": 4, "threshold": 4, "passed": true },
    { "dimension": "regression_safety", "score": 5, "threshold": 4, "passed": true }
  ],
  "blocking_dimensions": [],
  "retry_budget_before": 3,
  "retry_budget_after": 3
}
```
Also log a detailed event to `.geas/ledger/events.jsonl`:
```json
{
  "event": "gate_result",
  "task_id": "{task-id}",
  "result": "pass",
  "gate_profile": "implementation_change",
  "tier_results": {
    "tier_0": { "status": "pass" },
    "tier_1": { "status": "pass" },
    "tier_2": { "status": "pass" }
  },
  "blocking_dimensions": [],
  "timestamp": "<actual ISO 8601 from date -u>"
}
```
"verified" in .geas/missions/{mission_id}/tasks/{task-id}.jsonretry_budget (retry_budget_after = retry_budget_before - 1)/verify-fix-loop with the failure detailsescalation_policy:
"design-authority-review": spawn the design_authority for architectural review, write a DecisionRecord"product-authority-decision": spawn the product_authority for a strategic decision (continue/cut/pivot)"pivot": invoke pivot protocol"escalated".geas/missions/{mission_id}/decisions/{dec-id}.jsonretry_budget (retry_budget_after = retry_budget_before)When the gate results in an escalation or significant decision, write a DecisionRecord:
```bash
mkdir -p .geas/missions/{mission_id}/decisions
```
Write to `.geas/missions/{mission_id}/decisions/{dec-id}.json` conforming to `schemas/decision-record.schema.json`.

This creates a durable record of WHY a decision was made, not just WHAT happened.