Manages Arize ML evaluation datasets with ax CLI: list/get metadata, CRUD, append examples, export data, handle versions/spaces.
Examples are free-form JSON objects with user-defined fields (e.g., question, answer, context). System-managed fields on examples (id, created_at, updated_at) are auto-generated by the server -- never include them in create or append payloads.
Proceed directly with the task — run the ax command you need. Do NOT check versions, env vars, or profiles upfront.
If an ax command fails, troubleshoot based on the error:
- command not found or version error → see references/ax-setup.md
- 401 Unauthorized / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong, check .env for ARIZE_API_KEY and use it to create/update the profile via references/ax-profiles.md. If .env has no key either, ask the user for their Arize API key (https://app.arize.com/admin > API Keys).
- Missing space ID → check .env for ARIZE_SPACE_ID, or run `ax spaces list -o json`, or ask the user.
- Missing project → check .env for ARIZE_DEFAULT_PROJECT, or ask, or run `ax projects list -o json --limit 100` and present the results as selectable options.

## ax datasets list

Browse datasets in a space. Output goes to stdout.
```bash
ax datasets list
ax datasets list --space-id SPACE_ID --limit 20
ax datasets list --cursor CURSOR_TOKEN
ax datasets list -o json
```
| Flag | Type | Default | Description |
|---|---|---|---|
| `--space-id` | string | from profile | Filter by space |
| `--limit, -l` | int | 15 | Max results (1-100) |
| `--cursor` | string | none | Pagination cursor from previous response |
| `-o, --output` | string | table | Output format: table, json, csv, parquet, or file path |
| `-p, --profile` | string | default | Configuration profile |
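The cursor flow above can be sketched as a simple loop. This is an illustrative sketch, not part of the CLI: `fetch_page` is a hypothetical stand-in for a call like `ax datasets list -o json --cursor CURSOR` that returns the page's items plus the next cursor token (or none on the last page):

```python
# Hypothetical sketch: collect all datasets by following cursor tokens.
# fetch_page(cursor) stands in for one `ax datasets list` call.
def paginate(fetch_page):
    """Accumulate items across pages until no next cursor is returned."""
    items, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if not cursor:
            return items

# Fake two-page backend for illustration only
pages = {None: (["ds_1", "ds_2"], "tok"), "tok": (["ds_3"], None)}
all_items = paginate(lambda c: pages[c])
print(all_items)  # ['ds_1', 'ds_2', 'ds_3']
```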
## ax datasets get

Quick metadata lookup -- returns dataset name, space, timestamps, and version list.
```bash
ax datasets get DATASET_ID
ax datasets get DATASET_ID -o json
```
| Flag | Type | Default | Description |
|---|---|---|---|
| `DATASET_ID` | string | required | Positional argument |
| `-o, --output` | string | table | Output format |
| `-p, --profile` | string | default | Configuration profile |
| Field | Type | Description |
|---|---|---|
| `id` | string | Dataset ID |
| `name` | string | Dataset name |
| `space_id` | string | Space this dataset belongs to |
| `created_at` | datetime | When the dataset was created |
| `updated_at` | datetime | Last modification time |
| `versions` | array | List of dataset versions (id, name, dataset_id, created_at, updated_at) |
## ax datasets export

Download all examples to a file. Use `--all` for datasets larger than 500 examples (unlimited bulk export).
```bash
ax datasets export DATASET_ID
# -> dataset_abc123_20260305_141500/examples.json

ax datasets export DATASET_ID --all
ax datasets export DATASET_ID --version-id VERSION_ID
ax datasets export DATASET_ID --output-dir ./data
ax datasets export DATASET_ID --stdout
ax datasets export DATASET_ID --stdout | jq '.[0]'
```
| Flag | Type | Default | Description |
|---|---|---|---|
| `DATASET_ID` | string | required | Positional argument |
| `--version-id` | string | latest | Export a specific dataset version |
| `--all` | bool | false | Unlimited bulk export (use for datasets > 500 examples) |
| `--output-dir` | string | . | Output directory |
| `--stdout` | bool | false | Print JSON to stdout instead of file |
| `-p, --profile` | string | default | Configuration profile |
Agent auto-escalation rule: If an export returns exactly 500 examples, the result is likely truncated — re-run with --all to get the full dataset.
Export completeness verification: After exporting, confirm the row count matches what the server reports:
```bash
# Get the server-reported count from dataset metadata
ax datasets get DATASET_ID -o json | jq '.versions[-1] | {version: .id, examples: .example_count}'
# Compare to what was exported
jq 'length' dataset_*/examples.json
# If counts differ, re-export with --all
```
Output is a JSON array of example objects. Each example has system fields (id, created_at, updated_at) plus all user-defined fields:
```json
[
  {
    "id": "ex_001",
    "created_at": "2026-01-15T10:00:00Z",
    "updated_at": "2026-01-15T10:00:00Z",
    "question": "What is 2+2?",
    "answer": "4",
    "topic": "math"
  }
]
```
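The completeness check above can also run locally in Python once the metadata and export are in hand. A minimal sketch; the `example_count` and `versions` field names are taken from the jq recipe shown earlier, not independently verified:

```python
import json

def export_is_complete(metadata_json: str, exported_json: str) -> bool:
    """Compare the server-reported example_count of the latest version
    against the number of rows actually exported."""
    meta = json.loads(metadata_json)
    exported = json.loads(exported_json)
    server_count = meta["versions"][-1]["example_count"]
    return server_count == len(exported)

# Illustrative payloads mirroring the outputs above
meta = '{"versions": [{"id": "v1", "example_count": 2}]}'
rows = '[{"question": "a"}, {"question": "b"}]'
print(export_is_complete(meta, rows))  # True
```

If this returns False, re-run the export with `--all` as described above.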
## ax datasets create

Create a new dataset from a data file.
```bash
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.csv
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.json
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.jsonl
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.parquet
```
| Flag | Type | Required | Description |
|---|---|---|---|
| `--name, -n` | string | yes | Dataset name |
| `--space-id` | string | yes | Space to create the dataset in |
| `--file, -f` | path | yes | Data file: CSV, JSON, JSONL, or Parquet |
| `-o, --output` | string | no | Output format for the returned dataset metadata |
| `-p, --profile` | string | no | Configuration profile |
Use --file - to pipe data directly — no temp file needed:
```bash
echo '[{"question": "What is 2+2?", "answer": "4"}]' | ax datasets create --name "my-dataset" --space-id SPACE_ID --file -

# Or with a heredoc
ax datasets create --name "my-dataset" --space-id SPACE_ID --file - << 'EOF'
[{"question": "What is 2+2?", "answer": "4"}]
EOF
```
To add rows to an existing dataset, use ax datasets append --json '[...]' instead — no file needed.
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Column headers become field names |
| JSON | .json | Array of objects |
| JSON Lines | .jsonl | One object per line (NOT a JSON array) |
| Parquet | .parquet | Column names become field names; preserves types |
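The .json vs .jsonl distinction in the table above is a common failure point when generating files from code. A minimal sketch writing the same records both ways:

```python
import json

records = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "What is 3+3?", "answer": "6"},
]

# data.json: one top-level array
as_json = json.dumps(records)

# data.jsonl: one object per line, NOT wrapped in an array
as_jsonl = "\n".join(json.dumps(r) for r in records)

print(as_json.startswith("["))   # True
print(len(as_jsonl.splitlines()))  # 2
```

Writing `as_json` into a file named `data.jsonl` produces exactly the failure mode described below.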
Format gotchas:
- CSV: null becomes an empty string. Use JSON/Parquet to preserve types.
- JSONL: a top-level JSON array ([{...}, {...}]) in a .jsonl file will fail — use the .json extension instead.
- Parquet: use pandas/pyarrow to read locally: pd.read_parquet("examples.parquet").

## ax datasets append

Add examples to an existing dataset. Two input modes -- use whichever fits.
Generate the payload directly -- no temp files needed:
```bash
ax datasets append DATASET_ID --json '[{"question": "What is 2+2?", "answer": "4"}]'

ax datasets append DATASET_ID --json '[
  {"question": "What is gravity?", "answer": "A fundamental force..."},
  {"question": "What is light?", "answer": "Electromagnetic radiation..."}
]'

ax datasets append DATASET_ID --file new_examples.csv
ax datasets append DATASET_ID --file additions.json

ax datasets append DATASET_ID --json '[{"q": "..."}]' --version-id VERSION_ID
```
| Flag | Type | Required | Description |
|---|---|---|---|
| `DATASET_ID` | string | yes | Positional argument |
| `--json` | string | mutex | JSON array of example objects |
| `--file, -f` | path | mutex | Data file (CSV, JSON, JSONL, Parquet) |
| `--version-id` | string | no | Append to a specific version (default: latest) |
| `-o, --output` | string | no | Output format for the returned dataset metadata |
| `-p, --profile` | string | no | Configuration profile |
Exactly one of --json or --file is required.
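When a script builds the `--json` argument, constructing it with `json.dumps` and quoting it with `shlex.quote` avoids shell-escaping bugs from embedded quotes and newlines. A sketch; the assembled command string is illustrative, not executed here:

```python
import json
import shlex

examples = [{"question": 'What is "gravity"?', "answer": "A fundamental force..."}]

# json.dumps handles embedded quotes; shlex.quote makes the payload safe
# to splice into a shell command line.
payload = json.dumps(examples)
cmd = f"ax datasets append DATASET_ID --json {shlex.quote(payload)}"
print(cmd)
```

Round-tripping `payload` through `json.loads` is a cheap sanity check before invoking the CLI.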
Schema validation before append: If the dataset already has examples, inspect its schema before appending to avoid silent field mismatches:
```bash
# Check existing field names in the dataset
ax datasets export DATASET_ID --stdout | jq '.[0] | keys'
# Verify your new data has matching field names
echo '[{"question": "..."}]' | jq '.[0] | keys'
# Both outputs should show the same user-defined fields
```
Fields are free-form: extra fields in new examples are added, and missing fields become null. However, typos in field names (e.g., queston vs question) create new columns silently -- verify spelling before appending.
## ax datasets delete

```bash
ax datasets delete DATASET_ID
ax datasets delete DATASET_ID --force   # skip confirmation prompt
```
| Flag | Type | Default | Description |
|---|---|---|---|
| `DATASET_ID` | string | required | Positional argument |
| `--force, -f` | bool | false | Skip confirmation prompt |
| `-p, --profile` | string | default | Configuration profile |
Users often refer to datasets by name rather than ID. Resolve a name to an ID before running other commands:
```bash
# Find dataset ID by name
ax datasets list -o json | jq '.[] | select(.name == "eval-set-v1") | .id'
# If the list is paginated, fetch more
ax datasets list -o json --limit 100 | jq '.[] | select(.name | test("eval-set")) | {id, name}'
```
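The same name lookup can be done in a script on the parsed list output. A sketch assuming the `ax datasets list -o json` payload is an array of objects with `id` and `name` fields, as shown above:

```python
def resolve_dataset_id(datasets, name):
    """Return the ID of the first dataset whose name matches exactly,
    or None if absent (the caller may then fetch the next page)."""
    return next((d["id"] for d in datasets if d["name"] == name), None)

datasets = [
    {"id": "ds_abc123", "name": "eval-set-v1"},
    {"id": "ds_def456", "name": "eval-set-v2"},
]
print(resolve_dataset_id(datasets, "eval-set-v1"))  # ds_abc123
print(resolve_dataset_id(datasets, "missing"))      # None
```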
## Common workflows

Create an evaluation dataset:
1. Prepare a data file with your fields (e.g., input, expected_output), or pipe inline via --file - (see the Create Dataset section).
2. Create it: ax datasets create --name "eval-set-v1" --space-id SPACE_ID --file eval_data.csv
3. Verify with ax datasets get DATASET_ID.

Add examples to an existing dataset:

```bash
# Find the dataset
ax datasets list
# Append inline or from a file (see Append Examples section for full syntax)
ax datasets append DATASET_ID --json '[{"question": "...", "answer": "..."}]'
ax datasets append DATASET_ID --file additional_examples.csv
```

Export and inspect a dataset:
1. ax datasets list -- find the dataset
2. ax datasets export DATASET_ID -- download to file
3. jq '.[] | .question' dataset_*/examples.json

Work with a specific version:

```bash
# List versions
ax datasets get DATASET_ID -o json | jq '.versions'
# Export that version
ax datasets export DATASET_ID --version-id VERSION_ID
```

Update a dataset:
- Export the current data first: ax datasets export DATASET_ID
- Append new rows: ax datasets append DATASET_ID --file new_rows.csv
- Or create a fresh dataset: ax datasets create --name "eval-set-v2" --space-id SPACE_ID --file updated_data.json

Shape exported data:

```bash
# Count examples
ax datasets export DATASET_ID --stdout | jq 'length'
# Extract a single field
ax datasets export DATASET_ID --stdout | jq '.[].question'
# Convert to CSV with jq
ax datasets export DATASET_ID --stdout | jq -r '.[] | [.question, .answer] | @csv'
```
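The jq-to-CSV recipe above can equally be done in Python with the csv module, which handles quoting of embedded commas. A sketch over an exported examples array; the field names mirror the example output shown earlier:

```python
import csv
import io
import json

exported = '[{"id": "ex_001", "question": "What is 2+2?", "answer": "4"}]'
rows = json.loads(exported)

# csv.writer quotes fields containing commas or quotes automatically
buf = io.StringIO()
writer = csv.writer(buf)
for r in rows:
    writer.writerow([r["question"], r["answer"]])

print(buf.getvalue().strip())  # What is 2+2?,4
```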
Examples are free-form JSON objects. There is no fixed schema -- columns are whatever fields you provide. System-managed fields are added by the server:
| Field | Type | Managed by | Notes |
|---|---|---|---|
| `id` | string | server | Auto-generated UUID. Required on update, forbidden on create/append |
| `created_at` | datetime | server | Immutable creation timestamp |
| `updated_at` | datetime | server | Auto-updated on modification |
| (any user field) | any JSON type | user | String, number, boolean, null, nested object, array |
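Since the server rejects platform-managed columns on create and append, exported examples need those fields stripped before being re-submitted. A minimal sketch:

```python
SYSTEM_FIELDS = {"id", "created_at", "updated_at"}

def strip_system_fields(examples):
    """Drop server-managed fields so exported examples can be re-used
    in a create or append payload."""
    return [{k: v for k, v in ex.items() if k not in SYSTEM_FIELDS}
            for ex in examples]

exported = [{
    "id": "ex_001",
    "created_at": "2026-01-15T10:00:00Z",
    "updated_at": "2026-01-15T10:00:00Z",
    "question": "What is 2+2?",
    "answer": "4",
}]
print(strip_system_fields(exported))  # [{'question': 'What is 2+2?', 'answer': '4'}]
```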
Related skills: arize-trace, arize-experiment, arize-prompt-optimization

## Troubleshooting

| Problem | Solution |
|---|---|
| `ax: command not found` | See references/ax-setup.md |
| `401 Unauthorized` | API key is wrong, expired, or doesn't have access to this space. Fix the profile using references/ax-profiles.md. |
| `No profile found` | No profile is configured. See references/ax-profiles.md to create one. |
| `Dataset not found` | Verify dataset ID with `ax datasets list` |
| File format error | Supported: CSV, JSON, JSONL, Parquet. Use `--file -` to read from stdin. |
| `platform-managed column` | Remove id, created_at, updated_at from create/append payloads |
| `reserved column` | Remove time, count, or any source_record_* field |
| `Provide either --json or --file` | Append requires exactly one input source |
| `Examples array is empty` | Ensure your JSON array or file contains at least one example |
| `not a JSON object` | Each element in the --json array must be a {...} object, not a string or number |
See references/ax-profiles.md § Save Credentials for Future Use.