From agentops-toolkit
Create or extend a JSONL evaluation dataset for AgentOps. Trigger on "create dataset", "generate test data", "JSONL", "more eval rows". Infer the agent's domain from the codebase and produce realistic rows; never fabricate data when the domain is unclear.
npx claudepluginhub azure/agentops --plugin agentops-accelerator
Slash command: /agentops-toolkit:agentops-dataset
Generate a small, realistic JSONL dataset for the agent under
evaluation. Default location: .agentops/data/smoke.jsonl (referenced
from agentops.yaml).
- Run pip install "agentops-toolkit @ git+https://github.com/Azure/agentops.git@develop" if agentops is missing.
- Run agentops eval analyze first. If it reports missing dataset columns or recommends agentops-dataset, use this skill before the first eval run.
- If agentops.yaml does not exist, run agentops init first (the init wizard will prompt for the agent reference, project endpoint, and dataset path, then create a starter .agentops/data/smoke.jsonl).

Read agentops.yaml (and the agent code) to figure out the agent type,
then choose the row schema:
| Agent type | Required columns | Optional columns |
|---|---|---|
| Direct model / Q&A | input, expected | - |
| RAG | input, expected, context | - |
| Conversational | input, expected | - |
| Tool-using agent | input, expected, tool_calls | tool_definitions |
input is always the user prompt. expected is the gold answer.
context is the retrieved passage(s). tool_calls is a list of
{name, arguments} describing the expected tool invocations.
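
The column rules above can be checked before a row is written. The following is a minimal sketch, not part of AgentOps: the agent-type keys ("model", "rag", "conversational", "tool") and both helper names are hypothetical labels chosen for illustration, and the required columns are copied from the table.

```python
# Required columns per agent type, copied from the schema table.
# The dictionary keys are hypothetical labels, not AgentOps identifiers.
REQUIRED_COLUMNS = {
    "model": {"input", "expected"},
    "rag": {"input", "expected", "context"},
    "conversational": {"input", "expected"},
    "tool": {"input", "expected", "tool_calls"},
}


def missing_columns(row: dict, agent_type: str) -> set:
    """Return the required columns that are absent from a row."""
    return REQUIRED_COLUMNS[agent_type] - set(row)


def valid_tool_calls(row: dict) -> bool:
    """True when tool_calls is a list of objects with name and arguments."""
    calls = row.get("tool_calls")
    return isinstance(calls, list) and all(
        isinstance(call, dict) and {"name", "arguments"} <= set(call)
        for call in calls
    )
```

For example, missing_columns({"input": "q", "expected": "a"}, "rag") returns {"context"}, which signals that the row is not yet usable for a RAG agent.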
One JSON object per line, no trailing commas, UTF-8:
{"input": "What is the refund policy?", "expected": "Refunds within 30 days...", "context": "Refund policy: ..."}
Save to the path referenced by dataset: in agentops.yaml (default
.agentops/data/smoke.jsonl).
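
One way to produce a file in this format is with Python's standard json module, sketched below. The function names to_jsonl and write_jsonl are hypothetical helpers, not AgentOps APIs. The sketch assumes the rows are plain dictionaries that json can serialize.

```python
import json
from pathlib import Path


def to_jsonl(rows) -> str:
    """Serialize rows as one JSON object per line.

    json.dumps never emits trailing commas, and ensure_ascii=False keeps
    non-ASCII characters readable instead of escaping them.
    """
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def write_jsonl(rows, path) -> None:
    """Write the rows to path as UTF-8, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(to_jsonl(rows), encoding="utf-8")
```

Calling write_jsonl(rows, ".agentops/data/smoke.jsonl") would write to the default location; if agentops.yaml names a different dataset path, pass that path instead.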
This file is the AgentOps source of truth. In Foundry cloud evaluation,
AgentOps syncs it to a stable Foundry dataset version by default and reuses the
same Foundry dataset version while the JSONL content is unchanged. If the user
forces dataset_sync.mode: inline, Foundry may show generated eval-data-*
backing assets in the project Data/Datasets page.
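
To check locally whether the JSONL content changed between runs, a content digest is one option. This is only an illustrative sketch: the source does not say how AgentOps detects unchanged content, and content_digest is a hypothetical helper.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()
```

Reading the dataset with open(path, "rb") and comparing the digest before and after an edit shows whether the bytes differ. Whether a difference leads to a new Foundry dataset version is decided by AgentOps, not by this check.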
Run a quick eval and confirm rows are picked up:
agentops eval run
Open .agentops/results/latest/report.md and confirm the row count
matches.
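
To get the expected row count for that comparison, the dataset can be counted independently. The sketch below assumes blank lines are not rows and that every row must be a JSON object; count_rows is a hypothetical helper, and the source does not say how AgentOps itself treats blank lines.

```python
import json


def count_rows(lines) -> int:
    """Count JSON-object rows in an iterable of lines, skipping blank lines.

    Raises ValueError if a non-blank line is not a JSON object.
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are not rows
        row = json.loads(line)  # raises json.JSONDecodeError (a ValueError)
        if not isinstance(row, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        count += 1
    return count
```

A file object opened with open(path, encoding="utf-8") can be passed directly as lines, since iterating over it yields one line at a time.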
- If the user already has a dataset, point agentops.yaml at that file rather than generating new rows.
- If the user asks about Foundry assets named eval-data-*, explain that those are cloud-eval backing assets from inline compatibility mode; normal cloud runs should use the stable agentops-* Foundry dataset.