npx claudepluginhub boshu2/agentops --plugin agentops
This skill uses the workspace's default tool permissions.
Author and manage holdout scenarios for behavioral validation. Scenarios
define what the system should do in narrative form, with measurable
acceptance vectors and satisfaction scoring. They live in .agents/holdout/
so implementing agents cannot see them during development.
# Initialize holdout directory
/scenario init
# Add a scenario from a description
/scenario add "user can authenticate with valid credentials"
# List all active scenarios
/scenario list
# Validate scenarios against the schema
/scenario validate
ao scenario init
Creates .agents/holdout/ with a README.md explaining holdout isolation
rules. If the directory already exists, this is a no-op.
The README makes clear the isolation rules that apply to .agents/holdout/.
Provide a narrative description and the skill generates a schema-compliant JSON scenario file.
ao scenario add "user can authenticate with valid credentials"
The skill will:
- Assign a scenario ID (s-YYYY-MM-DD-NNN)
- Write the scenario to .agents/holdout/s-YYYY-MM-DD-NNN.json

You can also author scenarios manually by writing JSON that conforms to
schemas/scenario.v1.schema.json. See Scenario Schema Reference.
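The s-YYYY-MM-DD-NNN ID format can be illustrated with a small sketch. This is not the skill's actual implementation: the helper name `next_scenario_id` is hypothetical, and it assumes that NNN is a zero-padded, per-day sequence number, which the text above does not confirm.

```python
import re
from datetime import date
from pathlib import Path

# Assumption: NNN is a zero-padded sequence number that restarts each day.
ID_PATTERN = re.compile(r"^s-(\d{4}-\d{2}-\d{2})-(\d{3})$")


def next_scenario_id(holdout_dir: Path, today: date) -> str:
    """Return the next unused s-YYYY-MM-DD-NNN ID for the given day.

    Hypothetical helper: scans existing file names in holdout_dir and
    picks the highest sequence number used for `today`, plus one.
    """
    day = today.isoformat()
    used = []
    for path in holdout_dir.glob("s-*.json"):
        match = ID_PATTERN.match(path.stem)
        if match and match.group(1) == day:
            used.append(int(match.group(2)))
    return f"s-{day}-{max(used, default=0) + 1:03d}"
```

When hand-authoring a scenario, a helper like this could be pointed at .agents/holdout/ to avoid reusing an ID that is already taken.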
ao scenario validate
Validates every .json file in .agents/holdout/ against
schemas/scenario.v1.schema.json. Reports:
ao scenario list
Displays all scenarios with:
Filter options:
ao scenario list --status active
ao scenario list --status draft
ao scenario list --status retired
Scenarios are consumed by STEP 1.8 in the /validation skill. During
validation, the evaluator agent:
- Reads scenarios from .agents/holdout/

Scenarios are holdout data. The implementing agent must never see them; otherwise it could overfit to specific test cases instead of building correct general behavior.

- Scenarios live in .agents/holdout/, which is outside the codebase
- The evaluator agent in the /validation skill accesses scenarios

Scenarios use continuous satisfaction scoring (0.0-1.0), not boolean pass/fail. This enables:
Each acceptance vector produces a score, and the scenario's overall score is the weighted average across all vectors.
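As a rough illustration of the weighted average described above, here is a minimal sketch. The `VectorResult` type and its `score` and `weight` field names are assumptions made for this example and are not taken from schemas/scenario.v1.schema.json.

```python
from dataclasses import dataclass


@dataclass
class VectorResult:
    """One acceptance vector's result (field names are illustrative assumptions)."""
    score: float   # satisfaction in the range 0.0-1.0
    weight: float  # relative importance of this vector


def scenario_score(results: list[VectorResult]) -> float:
    """Return the weighted average of the per-vector scores."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one vector with positive weight is required")
    return sum(r.score * r.weight for r in results) / total_weight


results = [
    VectorResult(score=1.0, weight=2.0),
    VectorResult(score=0.5, weight=1.0),
    VectorResult(score=0.0, weight=1.0),
]
print(scenario_score(results))  # (1.0*2 + 0.5*1 + 0.0*1) / 4 = 0.625
```

A fully satisfied, heavily weighted vector pulls the overall score up without hiding the failing one, which is the kind of gradation a boolean pass/fail cannot express.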
The source field tracks provenance: human, agent, or prod-telemetry.

| Status | Meaning |
|---|---|
| active | Scenario is evaluated during validation |
| retired | Scenario passed consistently; kept for reference |
| blocked | Scenario cannot be evaluated (missing dependency) |
| draft | Scenario is incomplete; not yet evaluated |

| Problem | Cause | Fix |
|---|---|---|
| validate reports missing fields | Schema version mismatch | Check version field matches schema expectation |
| Scenario not picked up by validation | Status is not active | Set "status": "active" in the JSON |
| Implementing agent read holdout | Hook not installed | Run ao scenario init to verify hook setup |
| Duplicate ID error | Two scenarios share an ID | Rename one using s-YYYY-MM-DD-NNN format |
| Stale scenario warning | Active scenario older than 90 days | Review and retire or refresh the scenario |
| Score always 0.0 | Check command returns non-zero | Debug the check command independently |
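
Two of the problems in the table, duplicate IDs and stale active scenarios, can be checked by hand with a short script. The sketch below is not part of the ao CLI. It assumes each scenario file has top-level `id` and `status` keys and that the date embedded in the ID reflects the scenario's age; the function name `audit_holdout` is hypothetical.

```python
import json
import re
from collections import Counter
from datetime import date, timedelta
from pathlib import Path

ID_PATTERN = re.compile(r"^s-(\d{4})-(\d{2})-(\d{2})-\d{3}$")
STALE_AFTER = timedelta(days=90)  # threshold taken from the table above


def audit_holdout(holdout_dir: Path, today: date) -> dict[str, list[str]]:
    """Report duplicate IDs and active scenarios older than 90 days.

    Assumptions: every *.json file is one scenario object with "id" and
    "status" keys, and the date inside the ID is the creation date.
    """
    scenarios = [
        json.loads(path.read_text())
        for path in sorted(holdout_dir.glob("*.json"))
    ]
    counts = Counter(s.get("id", "") for s in scenarios)
    duplicates = sorted(i for i, n in counts.items() if i and n > 1)

    stale = set()
    for s in scenarios:
        match = ID_PATTERN.match(s.get("id", ""))
        if s.get("status") != "active" or not match:
            continue
        created = date(*map(int, match.groups()))
        if today - created > STALE_AFTER:
            stale.add(s["id"])
    return {"duplicates": duplicates, "stale": sorted(stale)}
```

Because it reads holdout files, a script like this belongs on the evaluator side, not in the implementing agent's session.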

- /validation -- consumes scenarios at STEP 1.8 for holdout evaluation
- /council -- multi-model review can generate scenario suggestions
- /vibe -- code quality validation (complementary to behavioral scenarios)