From stdd-agents
Use when creating or updating agent evaluation suites. Defines eval structure, rubrics, and validation patterns.
npx claudepluginhub craigtkhill/stdd-agents --plugin stdd-agents
This skill uses the workspace's default tool permissions.
Guidelines for creating comprehensive evaluation suites.
Use this skill when:
All evaluations in evals/ follow a consistent structure with both code-based and LLM-as-judge validations.
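As a rough illustration of checking that structure, the sketch below scans an `evals/` directory and reports which evals are missing a template file. It assumes each eval lives in its own subdirectory holding a `spec.yaml` and a `rubric.md`; that layout and the function name `find_incomplete_evals` are assumptions for illustration, not something this skill specifies.

```python
from pathlib import Path

# File names taken from the templates in this skill; requiring both files in
# every eval directory is an assumption made for this sketch.
REQUIRED_FILES = ("spec.yaml", "rubric.md")


def find_incomplete_evals(evals_root):
    """Map each eval directory name to the required files it is missing."""
    missing = {}
    for eval_dir in sorted(Path(evals_root).iterdir()):
        if not eval_dir.is_dir():
            continue  # ignore loose files sitting next to the eval directories
        absent = [name for name in REQUIRED_FILES if not (eval_dir / name).is_file()]
        if absent:
            missing[eval_dir.name] = absent
    return missing
```

Calling `find_incomplete_evals("evals")` returns an empty dict when every eval directory contains both files, which makes it easy to use as a pre-commit or CI check.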
Use this template for all eval spec.yaml files:
feature:
  name: "[Feature Name] Evaluation"
  as_a: evaluator
  i_want: validate feature behavior
solutions:
  - Ground truth validation
  - Code-based validation
  - LLM-as-judge validation
requirements:
  - id: REQ-EVAL-XX-001
    eval: G
    description: Description of ground truth requirement
  - id: REQ-EVAL-XX-002
    eval: C
    description: Description of code-based requirement
  - id: REQ-EVAL-XX-003
    eval: L
    description: Description of LLM-judged requirement
  - id: REQ-EVAL-XX-004
    eval: O
    description: Description of planned requirement
Template Rules:
- Requirement IDs use the form REQ-EVAL-XX-NNN
- XX = 2-3 letter eval abbreviation (e.g., AG for action_generation, AS for action_scenarios)
- NNN = Sequential 3-digit number starting at 001
- [G] = Ground truth validation (matches expected output)
- [C] = Code-based validation (deterministic checks)
- [L] = LLM-as-judge validation (quality assessment)
- [O] = Not yet implemented (planned for future)
Use this template for all rubric.md files:
# [Feature Name] Reasoning Trace Rubric
## Format
`[PASS/FAIL] RUBRIC-ID: Criterion description`
## Based on: [Concrete example with specific values]
### [Category Name]
- [ ] RUB-XX-001: Specific, objective criterion
- [ ] RUB-XX-002: Another specific criterion
Template Rules:
- Rubric IDs use the form RUB-XX-NNN, where XX matches the spec.yaml abbreviation
- Write each criterion as a `- [ ]` checkbox so the LLM judge can mark it pass/fail
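The ID and format rules above lend themselves to a small code-based check. The following is a minimal sketch, not part of the skill itself: the function names are hypothetical, the regular expressions are inferred from the template rules, and restricting the abbreviation to uppercase letters is an assumption based on the AG and AS examples.

```python
import re

# Patterns inferred from the template rules; uppercase-only abbreviations are
# an assumption based on the AG / AS examples.
REQ_ID = re.compile(r"^REQ-EVAL-([A-Z]{2,3})-(\d{3})$")
RUB_ID = re.compile(r"^RUB-([A-Z]{2,3})-(\d{3})$")
EVAL_CODES = {"G", "C", "L", "O"}
JUDGE_LINE = re.compile(r"^\[(PASS|FAIL)\] (RUB-[A-Z]{2,3}-\d{3}): (.+)$")


def check_requirements(requirements):
    """Return a list of problems found in spec.yaml-style requirement dicts."""
    problems = []
    for position, req in enumerate(requirements, start=1):
        req_id = req.get("id", "")
        match = REQ_ID.match(req_id)
        if match is None:
            problems.append(f"{req_id!r}: does not match REQ-EVAL-XX-NNN")
        elif int(match.group(2)) != position:
            # IDs are expected to run 001, 002, ... in document order.
            problems.append(f"{req_id}: expected sequence number {position:03d}")
        if req.get("eval") not in EVAL_CODES:
            problems.append(f"{req_id}: eval must be one of G, C, L, O")
    return problems


def check_rubric_ids(rubric_ids, abbreviation):
    """Return problems for rubric IDs that are malformed or use another abbreviation."""
    problems = []
    for rubric_id in rubric_ids:
        match = RUB_ID.match(rubric_id)
        if match is None:
            problems.append(f"{rubric_id!r}: does not match RUB-XX-NNN")
        elif match.group(1) != abbreviation:
            problems.append(f"{rubric_id}: abbreviation should be {abbreviation}")
    return problems


def parse_judge_line(line):
    """Split one judge output line into (passed, rubric_id, criterion)."""
    match = JUDGE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not in '[PASS/FAIL] RUBRIC-ID: ...' format: {line!r}")
    verdict, rubric_id, criterion = match.groups()
    return verdict == "PASS", rubric_id, criterion
```

In practice the `requirements` list would come from loading spec.yaml with a YAML parser, and `parse_judge_line` would be applied to each line the LLM judge returns; both steps are left out here to keep the sketch dependency-free.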