Skill

agentops-dataset

Creates or extends a JSONL dataset for AgentOps release-readiness evaluation gates by inferring the agent's domain from the codebase and generating realistic rows.

testing

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agentops-accelerator:agentops-dataset

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Generate a small, realistic JSONL dataset for the agent under evaluation.

SKILL.md

85 lines · ~851 tokens

Stats

Language: Python

Parent stars: 10

Parent forks: 8

Maintenance: Excellent

Last Commit: Jul 16, 2026

AgentOps Dataset

Generate a small, realistic JSONL dataset for the agent under evaluation. Default location: .agentops/data/smoke.jsonl (referenced from agentops.yaml). These rows are repo-side release-gate inputs: keep them reviewable and deterministic. They are not a full replacement for Foundry dataset management.

Step 0 - Prerequisites

If agentops is missing, install it: pip install "agentops-accelerator @ git+https://github.com/Azure/agentops.git@main".
Run agentops eval analyze first. If it reports missing dataset columns or recommends agentops-dataset, use this skill before the first eval run.
If agentops.yaml does not exist, run agentops init first (the init wizard will prompt for the agent reference, project endpoint, and dataset path, then create a starter .agentops/data/smoke.jsonl).
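
The two file-system checks above can be scripted before any dataset work starts. This is a minimal sketch, not part of AgentOps: the helper name check_prerequisites is hypothetical, and it assumes agentops.yaml sits at the repository root and that the CLI is exposed as an `agentops` executable on PATH.

```python
import shutil
from pathlib import Path


def check_prerequisites(repo_root=".", which=shutil.which):
    """Return the Step 0 actions that still appear to be needed.

    `which` is injectable so the check can be exercised without
    touching the real PATH.
    """
    actions = []
    # shutil.which returns None when no matching executable is found.
    if which("agentops") is None:
        actions.append("install agentops-accelerator with pip")
    # Assumption: agentops.yaml lives at the repository root.
    if not (Path(repo_root) / "agentops.yaml").is_file():
        actions.append("run agentops init")
    return actions
```

An empty list means both checks passed. Running agentops eval analyze is still a manual step.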

Step 1 - Pick the columns

Read agentops.yaml (and the agent code) to figure out the agent type, then choose the row schema:

Agent type	Required columns	Optional columns
Direct model / Q&A	`input`, `expected`	-
RAG	`input`, `expected`, `context`	-
Conversational	`input`, `expected`	-
Tool-using agent	`input`, `expected`, `tool_calls`	`tool_definitions`

input is always the user prompt. expected is the gold answer. context is the retrieved passage(s). tool_calls is a list of {name, arguments} describing the expected tool invocations.
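
The column table can be expressed as a small lookup so each row is checked before it is written. This is an illustrative sketch only: the agent-type keys ("direct", "rag", "conversational", "tool"), the helper missing_columns, and the get_order_status tool in the sample row are hypothetical names, not AgentOps identifiers.

```python
# Required columns per agent type, copied from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
REQUIRED_COLUMNS = {
    "direct": {"input", "expected"},
    "rag": {"input", "expected", "context"},
    "conversational": {"input", "expected"},
    "tool": {"input", "expected", "tool_calls"},
}


def missing_columns(row, agent_type):
    """Return the required columns absent from `row`, sorted by name."""
    return sorted(REQUIRED_COLUMNS[agent_type] - set(row))


# A tool-using row: tool_calls is a list of {name, arguments} objects.
# The tool name and its arguments are placeholders.
tool_row = {
    "input": "Where is order A-1001?",
    "expected": "Order A-1001 has shipped.",
    "tool_calls": [
        {"name": "get_order_status", "arguments": {"order_id": "A-1001"}}
    ],
}
```

Calling missing_columns(tool_row, "tool") returns an empty list, while the same row checked as "rag" reports that context is missing.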

Step 2 - Ground the rows in the codebase

Read the README, system prompt, tool definitions, and any sample fixtures.
Generate 5–10 rows that exercise the agent's actual capabilities.
If the domain is unclear, generate a tiny generic draft and clearly flag it as a placeholder.

Step 3 - Write the JSONL

One JSON object per line, no trailing commas, UTF-8:

{"input": "What is the refund policy?", "expected": "Refunds within 30 days...", "context": "Refund policy: ..."}

Save to the path referenced by dataset: in agentops.yaml (default .agentops/data/smoke.jsonl).
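
If the rows are assembled in Python, serializing each one with the standard json module avoids trailing commas and encoding slips. A minimal sketch, assuming the rows are plain dictionaries; write_jsonl is a hypothetical helper, not an AgentOps API.

```python
import json
from pathlib import Path


def write_jsonl(rows, path):
    """Write one compact JSON object per line, encoded as UTF-8."""
    target = Path(path)
    # Create .agentops/data/ (or whichever folder is configured) if needed.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            # ensure_ascii=False keeps non-ASCII text readable in review.
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
```

json.dumps never emits raw newlines inside a value, so each row stays on a single line even when a field contains line breaks.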

This file is the AgentOps source of truth. In Foundry cloud evaluation, AgentOps syncs it to a stable Foundry dataset version by default and reuses the same Foundry dataset version while the JSONL content is unchanged. If the user forces dataset_sync.mode: inline, Foundry may show generated eval-data-* backing assets in the project Data/Datasets page.

Step 4 - Sanity-check

Run a quick eval and confirm rows are picked up:

agentops eval run

Open .agentops/results/latest/report.md and confirm the row count matches.
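
To know which row count to expect in the report, count the rows in the dataset file itself. The sketch below is an assumption-laden helper (count_rows is a hypothetical name): it skips blank lines and fails fast on the first line that is not valid JSON. It does not parse report.md, whose layout is not described here.

```python
import json
from pathlib import Path


def count_rows(path):
    """Count the JSON rows in a JSONL file, ignoring blank lines."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate a trailing blank line
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
            count += 1
    return count
```

Compare the returned number with the row count shown in the report.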

Guardrails

Do not invent customer data, real names, or sensitive content.
Keep rows short - datasets are meant to be quick gates, not full QA suites.
If the user already has a domain dataset, prefer pointing agentops.yaml at that file rather than generating new rows.
If the user asks why Foundry shows eval-data-*, explain that those are cloud-eval backing assets from inline compatibility mode; normal cloud runs should use the stable agentops-* Foundry dataset.
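
The size-related guardrails can be approximated with a small lint pass over the generated rows. This is a sketch under stated assumptions: the 5–10 row range comes from Step 2, but the 500-character field limit is an arbitrary threshold picked for illustration, and lint_rows is a hypothetical helper. It cannot detect invented customer data or sensitive content; that still needs human review.

```python
import json

MIN_ROWS, MAX_ROWS = 5, 10   # range suggested in Step 2
MAX_FIELD_CHARS = 500        # assumption: arbitrary "keep rows short" limit


def lint_rows(rows):
    """Return human-readable warnings about dataset size and row length."""
    warnings = []
    if not MIN_ROWS <= len(rows) <= MAX_ROWS:
        warnings.append(
            f"expected {MIN_ROWS}-{MAX_ROWS} rows, found {len(rows)}"
        )
    for index, row in enumerate(rows, start=1):
        for column, value in row.items():
            # Measure non-string values (e.g. tool_calls) by their JSON form.
            text = value if isinstance(value, str) else json.dumps(value)
            if len(text) > MAX_FIELD_CHARS:
                warnings.append(
                    f"row {index}: {column} is {len(text)} characters"
                )
    return warnings
```

Treat the warnings as prompts for review rather than hard failures, since an existing domain dataset may legitimately fall outside these limits.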
