Skill

scenario

Author and manage holdout behavioral validation scenarios with acceptance vectors for AI agent evaluation.

Install

npx claudepluginhub boshu2/agentops --plugin agentops

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

references/scenario-schema.md
scripts/validate.sh

SKILL.md

Stats

Stars: 314

Forks: 32

Last Commit: Apr 24, 2026

Scenario Skill

Author and manage holdout scenarios for behavioral validation. Scenarios define what the system should do in narrative form, with measurable acceptance vectors and satisfaction scoring. They live in .agents/holdout/ so implementing agents cannot see them during development.

Quick Start

# Initialize holdout directory
/scenario init

# Add a scenario from a description
/scenario add "user can authenticate with valid credentials"

# List all active scenarios
/scenario list

# Validate scenarios against the schema
/scenario validate

Execution Steps

Step 1: Initialize Holdout Directory

ao scenario init

Creates .agents/holdout/ with a README.md explaining holdout isolation rules. If the directory already exists, this is a no-op.

The README makes clear:

Implementing agents MUST NOT read .agents/holdout/
Only evaluator agents and humans should author scenarios
Hook enforcement prevents implementing agents from accessing holdout files
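
A minimal sketch of the idempotent behavior described above. The helper name `init_holdout` and the README wording are illustrative assumptions, not the actual `ao scenario init` implementation, and hook installation is not shown.

```python
from pathlib import Path

# Illustrative README text; the real file's wording is not shown in this document.
README_TEXT = """Holdout isolation rules

- Implementing agents MUST NOT read .agents/holdout/
- Only evaluator agents and humans should author scenarios
- Hook enforcement prevents implementing agents from accessing holdout files
"""


def init_holdout(repo_root: Path) -> bool:
    """Create .agents/holdout/ with a README; return False if it already exists."""
    holdout = repo_root / ".agents" / "holdout"
    if holdout.exists():
        # Directory already present: leave it untouched (no-op).
        return False
    holdout.mkdir(parents=True)
    (holdout / "README.md").write_text(README_TEXT, encoding="utf-8")
    return True
```

Calling `init_holdout` a second time on the same root returns `False` and changes nothing, which mirrors the no-op rule.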

Step 2: Author Scenarios

Provide a narrative description, and the skill generates a schema-compliant JSON scenario file.

ao scenario add "user can authenticate with valid credentials"

The skill will:

Generate an ID (s-YYYY-MM-DD-NNN)
Prompt for or infer the narrative, expected outcome, and acceptance vectors
Set default satisfaction threshold (0.8)
Write to .agents/holdout/s-YYYY-MM-DD-NNN.json
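
One way the ID step could work, as a sketch only. It assumes that `NNN` is a zero-padded per-day sequence starting at 001, which this document does not state; `next_scenario_id` is a hypothetical helper, not part of the `ao` CLI.

```python
from datetime import date


def next_scenario_id(existing_ids: list[str], today: date) -> str:
    """Return the next free s-YYYY-MM-DD-NNN ID for the given day.

    Assumption: NNN counts scenarios created on the same day, starting at 001.
    """
    prefix = f"s-{today.isoformat()}-"
    used = [
        int(scenario_id[len(prefix):])
        for scenario_id in existing_ids
        if scenario_id.startswith(prefix) and scenario_id[len(prefix):].isdigit()
    ]
    return f"{prefix}{max(used, default=0) + 1:03d}"


print(next_scenario_id(["s-2026-04-24-001"], date(2026, 4, 24)))  # s-2026-04-24-002
```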

You can also author scenarios manually by writing JSON that conforms to schemas/scenario.v1.schema.json. See the Scenario Schema Reference (references/scenario-schema.md) for the field documentation.

Step 3: Validate Scenarios

ao scenario validate

Validates every .json file in .agents/holdout/ against schemas/scenario.v1.schema.json. Reports:

Schema violations (missing fields, wrong types)
Duplicate IDs
Stale scenarios (status is "active" but the scenario's date is more than 90 days old)
Acceptance vectors with no check command
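
The non-schema checks in this list can be sketched as below. The dictionary keys (`id`, `status`, `date`, `acceptance_vectors`, `check`) are hypothetical stand-ins; the real field names are defined in schemas/scenario.v1.schema.json, which is not reproduced here. Schema validation itself is omitted.

```python
from collections import Counter
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)  # staleness window named in the list above


def lint_scenarios(scenarios: list[dict], today: date) -> list[str]:
    """Report duplicate IDs, stale active scenarios, and vectors without a check.

    Key names are assumptions made for illustration only.
    """
    issues = []
    id_counts = Counter(s["id"] for s in scenarios)
    for scenario_id, count in id_counts.items():
        if count > 1:
            issues.append(f"duplicate id: {scenario_id}")
    for s in scenarios:
        age = today - date.fromisoformat(s["date"])
        if s["status"] == "active" and age > STALE_AFTER:
            issues.append(f"stale: {s['id']}")
        for index, vector in enumerate(s.get("acceptance_vectors", [])):
            if not vector.get("check"):
                issues.append(f"no check command: {s['id']} vector {index}")
    return issues
```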

Step 4: List Scenarios

ao scenario list

Displays all scenarios with:

ID, goal, status, source, date
Satisfaction threshold
Count of acceptance vectors

Filter options:

ao scenario list --status active
ao scenario list --status draft
ao scenario list --status retired

Step 5: Integration with Validation

Scenarios are consumed by STEP 1.8 in the /validation skill. During validation, the evaluator agent:

Loads all active scenarios from .agents/holdout/
Runs each acceptance vector's check command
Computes a satisfaction score per scenario (0.0-1.0)
Aggregates into an overall holdout score
Fails the validation gate if any scenario falls below its threshold
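
The evaluation loop above could be sketched as follows. Several details are assumptions: the mapping from exit code to score (0 gives 1.0, anything else gives 0.0), the plain mean used for the overall holdout score, and the names `run_check` and `holdout_gate`. The /validation skill may compute these differently.

```python
import subprocess

DEFAULT_THRESHOLD = 0.8  # default satisfaction threshold from Step 2


def run_check(command: list[str]) -> float:
    """Run one check command; assumed scoring: 1.0 on exit code 0, else 0.0."""
    result = subprocess.run(command, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0


def holdout_gate(
    scenario_scores: dict[str, float],
    thresholds: dict[str, float],
) -> tuple[float, list[str]]:
    """Return (overall score, IDs of scenarios scoring below their threshold).

    The overall score is an unweighted mean here, which is an assumption.
    The gate fails when the returned list is non-empty.
    """
    failing = [
        scenario_id
        for scenario_id, score in scenario_scores.items()
        if score < thresholds.get(scenario_id, DEFAULT_THRESHOLD)
    ]
    if not scenario_scores:
        return 0.0, failing
    overall = sum(scenario_scores.values()) / len(scenario_scores)
    return overall, failing
```

A binary exit-code mapping is consistent with the troubleshooting note that a check command returning non-zero yields a score of 0.0; a check that emits a fractional score would need a different `run_check`.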

Key Rules

Holdout Isolation

Scenarios are holdout data. The implementing agent must never see them. This prevents the agent from overfitting to specific test cases instead of building correct general behavior.

Scenarios live in .agents/holdout/, which is outside the codebase
A hook enforces that implementing agents cannot read holdout files
Only evaluator agents, humans, and the /validation skill may access scenarios
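
The hook's mechanism is not described here. As an illustration of the kind of path test such a hook might apply, the sketch below resolves a requested path and checks whether it falls under the holdout directory. `is_holdout_path` is a hypothetical helper, not the plugin's actual hook.

```python
from pathlib import Path

HOLDOUT_DIR = Path(".agents") / "holdout"


def is_holdout_path(candidate: str, repo_root: str) -> bool:
    """Return True if candidate resolves to the holdout directory or a file in it.

    Resolving first means that "../" segments cannot be used to slip past
    a plain string-prefix comparison.
    """
    root = Path(repo_root).resolve()
    holdout = (root / HOLDOUT_DIR).resolve()
    target = (root / candidate).resolve()
    return target == holdout or holdout in target.parents
```

Comparing resolved paths rather than raw strings also avoids false matches on sibling directories whose names merely start with `holdout`.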

Satisfaction Scoring

Scenarios use continuous satisfaction scoring (0.0-1.0), not boolean pass/fail. This enables:

Partial credit for incomplete implementations
Trend tracking across iterations
Threshold tuning per scenario based on criticality

Each acceptance vector produces a score, and the scenario's overall score is the weighted average across all vectors.
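
As a worked example of the weighted average, the sketch below represents each acceptance vector as a `(score, weight)` pair. That representation and the function name `scenario_score` are assumptions for illustration; the schema's actual field names are not shown in this document.

```python
def scenario_score(vectors: list[tuple[float, float]]) -> float:
    """Weighted average of (score, weight) pairs, each score in 0.0-1.0."""
    total_weight = sum(weight for _, weight in vectors)
    if total_weight <= 0:
        raise ValueError("at least one vector with positive weight is required")
    return sum(score * weight for score, weight in vectors) / total_weight


# Three vectors: one fully satisfied with double weight, one half satisfied, one failed.
print(scenario_score([(1.0, 2.0), (0.5, 1.0), (0.0, 1.0)]))  # 0.625
```

With the default threshold of 0.8, a scenario scoring 0.625 would fall below its threshold even though its most heavily weighted vector passed.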

Authorship Rules

Scenarios should be written by humans or by evaluator agents
The implementing agent MUST NOT author its own scenarios
The source field tracks provenance: human, agent, or prod-telemetry
When an evaluator agent writes scenarios, it should operate in a separate session with no access to implementation details

Scenario Lifecycle

| Status | Meaning |
| --- | --- |
| `active` | Scenario is evaluated during validation |
| `retired` | Scenario passed consistently; kept for reference |
| `blocked` | Scenario cannot be evaluated (missing dependency) |
| `draft` | Scenario is incomplete; not yet evaluated |

Reference Documents

Scenario Schema Reference (references/scenario-schema.md) -- full field documentation and example JSON for the scenario schema

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| `validate` reports missing fields | Schema version mismatch | Check `version` field matches schema expectation |
| Scenario not picked up by validation | Status is not `active` | Set `"status": "active"` in the JSON |
| Implementing agent read holdout | Hook not installed | Run `ao scenario init` to verify hook setup |
| Duplicate ID error | Two scenarios share an ID | Rename one using `s-YYYY-MM-DD-NNN` format |
| Stale scenario warning | Active scenario older than 90 days | Review and retire or refresh the scenario |
| Score always 0.0 | Check command returns non-zero | Debug the check command independently |
