Use when creating or updating agent evaluation suites. Defines eval structure, rubrics, and validation patterns.
/plugin marketplace add craigtkhill/stdd-agents
/plugin install stdd-agents@stdd-agents
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Guidelines for creating comprehensive evaluation suites.
Use this skill when:
All evaluations in evals/ follow a consistent structure with both code-based and LLM-as-judge validations.
Use this template for all spec.md files:
# [Feature Name] Evaluation Specification
## Requirements
Format: `[IS-EVAL-IMPLEMENTED] IDENTIFIER: example case`
- G = matches ground truth
- C = implemented via code
- L = implemented via LLM as judge using rubric
- O = not yet implemented
### [Category Name 1]
- [G] REQ-EVAL-XX-001: Description of ground-truth requirement
- [C] REQ-EVAL-XX-002: Description of code-based requirement
### [Category Name 2]
- [L] REQ-EVAL-XX-003: Description of LLM-judged requirement
- [O] REQ-EVAL-XX-004: Description of LLM-judged requirement
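Requirement lines in a spec.md built from this template have a regular shape (status marker, identifier, description), so they can be checked mechanically. The following is a minimal Python sketch of such a check, not part of the skill itself: the regular expression, the function names (`parse_requirements`, `check_sequential`, `status_counts`), and the sample requirement texts are all hypothetical and only illustrate the format.

```python
import re
from collections import Counter
from typing import NamedTuple

# Assumed line shape, taken from the template: "- [G] REQ-EVAL-XX-001: text"
REQ_LINE = re.compile(
    r"^- \[(?P<status>[GCLO])\] "
    r"(?P<req_id>REQ-EVAL-[A-Z]{2,3}-\d{3}): "
    r"(?P<text>.+)$"
)


class Requirement(NamedTuple):
    status: str   # one of G, C, L, O
    req_id: str   # e.g. REQ-EVAL-AG-001
    text: str     # free-text description


def parse_requirements(spec_text: str) -> list[Requirement]:
    """Collect every line that matches the requirement format; skip the rest."""
    reqs = []
    for line in spec_text.splitlines():
        m = REQ_LINE.match(line.strip())
        if m:
            reqs.append(Requirement(m["status"], m["req_id"], m["text"]))
    return reqs


def check_sequential(reqs: list[Requirement]) -> list[str]:
    """Report IDs whose number does not follow 001, 002, ... in file order."""
    problems = []
    for expected, req in enumerate(reqs, start=1):
        actual = int(req.req_id.rsplit("-", 1)[1])
        if actual != expected:
            problems.append(f"{req.req_id}: expected number {expected:03d}")
    return problems


def status_counts(reqs: list[Requirement]) -> Counter:
    """Count requirements per status marker, e.g. to see how many are still [O]."""
    return Counter(r.status for r in reqs)


# Illustrative spec fragment; the requirement texts are made up for this sketch.
SAMPLE = """\
### Output format
- [G] REQ-EVAL-AG-001: Output matches the expected action list
- [C] REQ-EVAL-AG-002: Output is valid JSON
### Quality
- [L] REQ-EVAL-AG-003: Reasoning is relevant to the goal
- [O] REQ-EVAL-AG-004: Tone is appropriate
"""

if __name__ == "__main__":
    reqs = parse_requirements(SAMPLE)
    print(status_counts(reqs))
    print(check_sequential(reqs))
```

Counting the `O` markers gives a rough view of how much of a spec is still unimplemented; whether numbering should be strictly sequential across categories is an assumption of this sketch.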
Template Rules:
- Requirement IDs use the form REQ-EVAL-XX-NNN
- XX = 2-3 letter eval abbreviation (e.g., AG for action_generation, AS for action_scenarios)
- NNN = Sequential 3-digit number starting at 001
- [G] = Ground truth validation (matches expected output)
- [C] = Code-based validation (deterministic checks)
- [L] = LLM-as-judge validation (quality assessment)
- [O] = Not yet implemented (planned for future)

Use this template for all rubric.md files:
# [Feature Name] Reasoning Trace Rubric
## Format
`[PASS/FAIL] RUBRIC-ID: Criterion description`
## Based on: [Concrete example with specific values]
### [Category Name]
- [ ] RUB-XX-001: Specific, objective criterion
- [ ] RUB-XX-002: Another specific criterion
Template Rules:
- Rubric IDs use the form RUB-XX-NNN (matches spec.md abbreviation)
- `- [ ]` format for LLM judge to mark pass/fail
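One way to consume a rubric in this format is to compare the judge's verdict lines against the rubric's criterion IDs, so that a skipped criterion is caught rather than silently counted as a pass. The sketch below assumes the judge answers with one `[PASS] RUB-XX-NNN: ...` or `[FAIL] RUB-XX-NNN: ...` line per criterion, following the Format line of the rubric template; the function names and the sample rubric and judge output are hypothetical.

```python
import re

# Unmarked checklist item from rubric.md: "- [ ] RUB-XX-001: criterion"
RUBRIC_ITEM = re.compile(r"^- \[ \] (RUB-[A-Z]{2,3}-\d{3}): (.+)$")
# Assumed judge verdict line: "[PASS] RUB-XX-001: criterion" (leading "- " optional)
VERDICT = re.compile(r"^(?:- )?\[(PASS|FAIL)\] (RUB-[A-Z]{2,3}-\d{3}):")


def rubric_ids(rubric_text: str) -> list[str]:
    """Return the criterion IDs listed in the rubric, in file order."""
    ids = []
    for line in rubric_text.splitlines():
        m = RUBRIC_ITEM.match(line.strip())
        if m:
            ids.append(m.group(1))
    return ids


def collect_verdicts(judge_output: str) -> dict[str, bool]:
    """Map each criterion ID the judge marked to True (PASS) or False (FAIL)."""
    verdicts = {}
    for line in judge_output.splitlines():
        m = VERDICT.match(line.strip())
        if m:
            verdicts[m.group(2)] = m.group(1) == "PASS"
    return verdicts


def summarize(rubric_text: str, judge_output: str) -> dict:
    """List criteria the judge failed or never marked; pass only if both are empty."""
    expected = rubric_ids(rubric_text)
    verdicts = collect_verdicts(judge_output)
    missing = [i for i in expected if i not in verdicts]
    failed = [i for i in expected if verdicts.get(i) is False]
    return {"missing": missing, "failed": failed, "passed": not missing and not failed}


# Illustrative rubric and judge output; the criteria are made up for this sketch.
RUBRIC = """\
### Reasoning
- [ ] RUB-AG-001: States the user's goal before proposing actions
- [ ] RUB-AG-002: Lists at least one alternative action
- [ ] RUB-AG-003: Explains why the chosen action was selected
"""

JUDGE_OUTPUT = """\
[PASS] RUB-AG-001: States the user's goal before proposing actions
[FAIL] RUB-AG-002: Lists at least one alternative action
"""

if __name__ == "__main__":
    print(summarize(RUBRIC, JUDGE_OUTPUT))
```

Treating an unmarked criterion as a separate `missing` case, rather than as a failure, is a design choice of this sketch: it distinguishes a judge that disagreed from a judge that did not follow the rubric.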