church-ai-eval-planner

Repo Church AI Eval Planner

Use during `church:canonize` and `church:fellowship` for AI-dependent work.

Required Inputs

- AI feature/spec
- Success requirements
- Failure modes, if already known
- Existing eval/test artifacts

Work

1. Identify critical AI failure modes and user harms.
2. Define eval datasets, rubrics, thresholds, and monitoring.
3. Require guardrails for unsafe, low-confidence, or ungrounded outputs.
4. Route missing eval coverage to the gap ledger.
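
The last step above could be sketched roughly as follows. This is a minimal illustration, not the planner's actual implementation: `EvalCase`, `GapLedger`, and `route_missing_coverage` are hypothetical names, and the sketch assumes a failure mode counts as covered when some eval names it as its dimension.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One planned eval (hypothetical shape, mirroring the Eval Matrix columns)."""
    dimension: str         # failure mode or quality dimension the eval covers
    dataset: str
    metric: str
    pass_threshold: float


@dataclass
class GapLedger:
    """Hypothetical stand-in for the gap ledger: a plain list of entries."""
    entries: list[str] = field(default_factory=list)

    def add(self, failure_mode: str) -> None:
        self.entries.append(f"Missing eval coverage: {failure_mode}")


def route_missing_coverage(
    failure_modes: list[str], evals: list[EvalCase], ledger: GapLedger
) -> GapLedger:
    """Record every identified failure mode that no eval covers."""
    covered = {case.dimension for case in evals}
    for mode in failure_modes:
        if mode not in covered:
            ledger.add(mode)
    return ledger
```

Called with the failure modes `["hallucination", "prompt injection"]` and a single eval covering `"hallucination"`, the ledger would gain one entry, for prompt injection.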

Output

Every specialist report must end with a standard footer covering traceability, evidence quality, acceptance/test coverage, edge cases, open closure items, owner, and recheck command.
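
One way to check the footer requirement mechanically is sketched below. The field names come from the sentence above; representing the footer as a dictionary and the helper name `missing_footer_fields` are assumptions made for illustration only.

```python
# Field names taken from the footer requirement; the dict layout is assumed.
REQUIRED_FOOTER_FIELDS = (
    "traceability",
    "evidence quality",
    "acceptance/test coverage",
    "edge cases",
    "open closure items",
    "owner",
    "recheck command",
)


def missing_footer_fields(footer: dict[str, str]) -> list[str]:
    """Return the required footer fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_FOOTER_FIELDS
        if not footer.get(name, "").strip()
    ]
```

A report would pass this check only when the returned list is empty.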

## AI Eval Plan
Outcome:

## Eval Matrix
| Dimension | Dataset | Metric/rubric | Pass threshold |
| --- | --- | --- | --- |

## Guardrails
| Failure mode | Guardrail | Test |
| --- | --- | --- |
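
As an illustration of what a row in the Guardrails table might translate to in code, here is a small sketch covering the three cases named under Work: unsafe, low-confidence, and ungrounded outputs. Everything in it is assumed for the example: the `ModelOutput` shape, the `0.7` confidence cutoff, the fallback message, and the rule that an output with no cited sources counts as ungrounded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelOutput:
    """Hypothetical model response with the signals the guardrails need."""
    text: str
    confidence: float                      # assumed to lie in [0.0, 1.0]
    sources: list[str] = field(default_factory=list)
    flagged_unsafe: bool = False           # assumed to come from a safety check


FALLBACK = "This answer needs manual review."


def apply_guardrails(
    output: ModelOutput, min_confidence: float = 0.7
) -> tuple[str, Optional[str]]:
    """Return (text to show, triggered failure mode or None)."""
    if output.flagged_unsafe:
        return FALLBACK, "unsafe"
    if output.confidence < min_confidence:
        return FALLBACK, "low-confidence"
    if not output.sources:
        return FALLBACK, "ungrounded"
    return output.text, None
```

Each branch maps to one failure mode, so the Test column can point at a unit test that constructs the matching `ModelOutput` and asserts on the returned label.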

Quality Bar

Do not approve AI behavior without measurable evals or a clearly documented manual review gate.
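
The quality bar can be read as an approval gate, sketched here under stated assumptions: `EvalRow` and `can_approve` are hypothetical names, higher scores are taken to be better, and a measured eval that falls below its threshold is treated as blocking approval, which is an interpretation rather than something the text spells out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalRow:
    """One Eval Matrix row plus its measured result (hypothetical shape)."""
    dimension: str
    pass_threshold: Optional[float] = None
    score: Optional[float] = None
    manual_review_gate: str = ""           # description of the documented gate


def can_approve(rows: list[EvalRow]) -> bool:
    """Approve only if every row has a passing measured eval or a documented manual gate."""
    if not rows:
        return False                       # nothing measured, nothing documented
    for row in rows:
        measured = row.pass_threshold is not None and row.score is not None
        if measured:
            if row.score < row.pass_threshold:
                return False
        elif not row.manual_review_gate.strip():
            return False
    return True
```

A row with neither a measured score nor a documented manual gate makes the whole plan fail the gate.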
