Install the plugin:

npx claudepluginhub arize-ai/arize-claude-code-plugin --plugin arize-platform

This skill uses the workspace's default tool permissions.
Manage datasets in the Arize AI platform using the `ax` CLI.
Creates, manages, and queries Arize datasets and examples using ax CLI. Handles CRUD, appending examples, exporting data, file-based creation for test sets, golden datasets, and ML evaluations.
The user must have:
- The `ax` CLI installed (`pip install arize-ax-cli`)
- A configured profile (`ax config init`)
- Working access, verifiable with `ax datasets list`
Options:
- `--output <format>` - Output format: table (default), json, csv, parquet
- `--profile <name>` - Use specific configuration profile
- `--limit <n>` - Limit number of results
- `--offset <n>` - Skip first n results (pagination)

Examples:
# List as table (default)
ax datasets list
# List as JSON
ax datasets list --output json
# List with pagination
ax datasets list --limit 10 --offset 0
# Use production profile
ax datasets list --profile production
Extracting Dataset IDs:
To find a specific dataset ID for use in other operations:
# Get all dataset IDs and names as JSON
ax datasets list --output json | jq '.[] | {id: .id, name: .name}'
# Find a dataset ID by name
ax datasets list --output json | jq -r '.[] | select(.name == "Training Data") | .id'
# Save dataset ID to a variable
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "Training Data") | .id')
echo "Found dataset: $DATASET_ID"
# Use the ID in subsequent commands
ax datasets get "$DATASET_ID"
ax datasets delete "$DATASET_ID"
Without jq (using grep):
# List with grep to find dataset
ax datasets list --output json | grep -A 2 "Training Data" | grep "id"
# More reliable pattern
ax datasets list --output json | grep -B 1 '"name": "Training Data"' | grep "id" | cut -d'"' -f4
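The jq lookup above can be wrapped in a small helper that fails loudly when no dataset matches. `find_dataset_id` is our own name, not an `ax` subcommand, and this sketch assumes jq is installed:

```shell
# find_dataset_id: resolve a dataset name to its ID.
# Hypothetical helper built on the jq pattern above; prints the ID on
# stdout, or returns non-zero when no dataset has that name.
find_dataset_id() {
  name="$1"
  id=$(ax datasets list --output json \
    | jq -r --arg n "$name" '.[] | select(.name == $n) | .id')
  if [ -z "$id" ]; then
    echo "dataset not found: $name" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

Usage: `DATASET_ID=$(find_dataset_id "Training Data") || exit 1`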
Retrieve information about a specific dataset:
ax datasets get <dataset-id>
Options:
- `--output <format>` - Output format
- `--profile <name>` - Configuration profile to use

Examples:
# Get dataset details
ax datasets get ds_abc123xyz
# Get as JSON
ax datasets get ds_abc123xyz --output json
# Get from production environment
ax datasets get ds_abc123xyz --profile production
Create a dataset from a file:
ax datasets create --file <path> [options]
Supported File Formats:
- CSV (`.csv`)
- JSON (`.json`, `.jsonl`)
- Parquet (`.parquet`)

Options:
- `--name <name>` - Dataset name (required or inferred from filename)
- `--description <text>` - Dataset description
- `--profile <name>` - Configuration profile to use

Examples:
# Create from CSV
ax datasets create --file data.csv --name "Training Data" --description "Production training set"
# Create from JSON
ax datasets create --file examples.json --name "Test Examples"
# Create from Parquet
ax datasets create --file dataset.parquet --name "Large Dataset"
# Use staging profile
ax datasets create --file data.csv --name "Test Data" --profile staging
Remove a dataset from Arize:
ax datasets delete <dataset-id>
Options:
- `--profile <name>` - Configuration profile to use
- `--yes` or `-y` - Skip confirmation prompt

Examples:
# Delete with confirmation
ax datasets delete ds_abc123xyz
# Delete without confirmation
ax datasets delete ds_abc123xyz --yes
# Delete from production
ax datasets delete ds_abc123xyz --profile production
⚠️ Warning: Deletion is permanent. Always verify the dataset ID before deleting.
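The verify-then-delete habit can be scripted. `safe_delete` is a hypothetical wrapper, not an `ax` subcommand; it simply chains the two commands shown above:

```shell
# safe_delete: only delete a dataset that ax can actually resolve.
# Hypothetical guard; refuses to delete IDs that fail a get.
safe_delete() {
  id="$1"
  if ax datasets get "$id" >/dev/null 2>&1; then
    ax datasets delete "$id" --yes
  else
    echo "refusing to delete unknown dataset: $id" >&2
    return 1
  fi
}
```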
Export dataset examples to various formats:
ax datasets get <dataset-id> --output <format>
Export Formats:
- `json` - JSON format
- `csv` - Comma-separated values
- `parquet` - Apache Parquet format

Examples:
# Export to JSON
ax datasets get ds_abc123xyz --output json > dataset.json
# Export to CSV
ax datasets get ds_abc123xyz --output csv > dataset.csv
# Export to Parquet
ax datasets get ds_abc123xyz --output parquet > dataset.parquet
When working across different environments (dev, staging, production):
# List datasets in production
ax datasets list --profile production
# Create dataset in staging
ax datasets create --file test_data.csv --profile staging
# Get dataset from dev environment
ax datasets get ds_dev_123 --profile dev
For accounts with many datasets, use pagination:
# First page (10 items)
ax datasets list --limit 10 --offset 0
# Second page
ax datasets list --limit 10 --offset 10
# Third page
ax datasets list --limit 10 --offset 20
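Paging can be automated with a loop. This sketch assumes the CLI returns an empty JSON array (`[]`) once the offset passes the last dataset; verify that behavior against your version of `ax`. `list_all_datasets` is our own helper name:

```shell
# list_all_datasets: walk every page, printing each page's JSON.
# Stops when a page comes back as an empty array (assumed behavior).
list_all_datasets() {
  limit="${1:-10}"
  offset=0
  while :; do
    page=$(ax datasets list --output json --limit "$limit" --offset "$offset")
    [ "$page" = "[]" ] && break
    printf '%s\n' "$page"
    offset=$((offset + limit))
  done
}
```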
# 1. List all datasets and find the one you want
ax datasets list --output json | jq '.[] | {id: .id, name: .name}'
# 2. Extract the specific dataset ID by name
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "Production Data") | .id')
# 3. Get detailed information about that dataset
ax datasets get "$DATASET_ID"
# 4. Export the dataset if needed
ax datasets get "$DATASET_ID" --output csv > dataset_export.csv
# 1. Create dataset
ax datasets create --file data.csv --name "My Dataset"
# 2. Find the new dataset ID
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "My Dataset") | .id')
echo "Created dataset: $DATASET_ID"
# 3. Verify details
ax datasets get "$DATASET_ID"
# 1. Export existing dataset
ax datasets get ds_abc123 --output csv > dataset.csv
# 2. Modify the CSV file (manual editing)
# 3. Create new version
ax datasets create --file dataset.csv --name "Updated Dataset v2"
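Step 2 doesn't have to be manual: simple substitutions can be scripted with a sed wrapper. `csv_replace` and the label values are our own illustration, not part of `ax` or the Arize schema:

```shell
# csv_replace <in> <out> <old> <new>
# Rewrite every occurrence of <old> to <new> in an exported CSV.
# Simple sketch: does not escape sed metacharacters in <old>/<new>.
csv_replace() {
  sed "s/$3/$4/g" "$1" > "$2"
}
```

Usage: `csv_replace dataset.csv dataset_v2.csv old_label new_label`, then create the new version from `dataset_v2.csv` as in step 3.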
# 1. Export from production
ax datasets get ds_prod_123 --profile production --output json > prod_data.json
# 2. Import to staging
ax datasets create --file prod_data.json --name "Production Copy" --profile staging
# 1. List all datasets
ax datasets list --output json > all_datasets.json
# 2. Review and identify datasets to delete (manual review)
# 3. Delete old datasets
ax datasets delete ds_old_001 --yes
ax datasets delete ds_old_002 --yes
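Step 3 scales better as a loop over the reviewed list. `delete_listed` is a hypothetical helper, and the input file (one ID per line) is assumed to come out of your manual review in step 2:

```shell
# delete_listed <file>: delete every dataset ID listed in <file>.
# Skips blank lines; relies on --yes, so review the list carefully first.
delete_listed() {
  while IFS= read -r id; do
    [ -n "$id" ] && ax datasets delete "$id" --yes
  done < "$1"
}
```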
Table (default): human-readable table with columns for ID, Name, Created, and Status.
JSON: structured output with full dataset metadata:
{
"id": "ds_abc123xyz",
"name": "Training Data",
"description": "Production training set",
"created_at": "2024-01-15T10:30:00Z",
"num_examples": 1000,
"size_bytes": 52428800
}
CSV: comma-separated values, useful for importing into spreadsheets or pandas.
Parquet: efficient columnar format, ideal for large datasets and data processing.
If commands fail, verify your setup:

- Confirm the CLI responds: `ax datasets list`
- Inspect the active configuration: `ax config show` (or `ax config show --expand`)
- Re-initialize if needed: `ax config init`

If a file is rejected on create, remember that supported formats are CSV, JSON (including JSONL), and Parquet; check that the file extension matches its contents.
Tips:

- For very large datasets, use `--limit` to restrict output size, or export to a file: `ax datasets get ds_abc123 --output json > dataset.json`
- For datasets with many examples, paginate with `--limit` and `--offset`
- Capture IDs in variables for reuse: `DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "My Dataset") | .id')`
- Useful jq patterns: `ax datasets list --output json | jq '.[] | .id'` and `ax datasets list --output json | jq '.[] | {id, name}'`
- Run `ax datasets get "$DATASET_ID"` to confirm a dataset before deleting it
- Keep separate profiles (`prod`, `staging`, `dev`) for each environment

Use this skill when users want to:

- Create, list, get, or delete Arize datasets
- Export dataset examples to JSON, CSV, or Parquet
- Build test sets or golden datasets from files
Don't use this skill for:
- Analytics queries (use /arize-graphql-analytics instead)
- Initial CLI setup and configuration (use /setup-arize-cli instead)