Install the plugin:

npx claudepluginhub arize-ai/arize-claude-code-plugin --plugin arize-platform

This skill uses the workspace's default tool permissions.
Manage datasets in the Arize AI platform using the `ax` CLI.
Creates, manages, and queries Arize datasets and examples using ax CLI. Handles CRUD, appending examples, exporting data, file-based creation for test sets, golden datasets, and ML evaluations.
The user must have:
- The `ax` CLI installed (`pip install arize-ax-cli`)
- A configured profile (`ax config init`)
- Working access, verifiable with `ax datasets list`
Options:
- `--output <format>` - Output format: table (default), json, csv, parquet
- `--profile <name>` - Use specific configuration profile
- `--limit <n>` - Limit number of results
- `--offset <n>` - Skip first n results (pagination)

Examples:
# List as table (default)
ax datasets list
# List as JSON
ax datasets list --output json
# List with pagination
ax datasets list --limit 10 --offset 0
# Use production profile
ax datasets list --profile production
Extracting Dataset IDs:
To find a specific dataset ID for use in other operations:
# Get all dataset IDs and names as JSON
ax datasets list --output json | jq '.[] | {id: .id, name: .name}'
# Find a dataset ID by name
ax datasets list --output json | jq -r '.[] | select(.name == "Training Data") | .id'
# Save dataset ID to a variable
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "Training Data") | .id')
echo "Found dataset: $DATASET_ID"
# Use the ID in subsequent commands
ax datasets get "$DATASET_ID"
ax datasets delete "$DATASET_ID"
Without jq (using grep):
# List with grep to find dataset
ax datasets list --output json | grep -A 2 "Training Data" | grep "id"
# More reliable pattern
ax datasets list --output json | grep -B 1 '"name": "Training Data"' | grep "id" | cut -d'"' -f4
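The jq lookup above can be wrapped in a small helper that fails loudly when no dataset matches. `find_dataset_id` is our own name, not an `ax` subcommand, and this sketch assumes jq is installed:

```shell
# find_dataset_id: resolve a dataset name to its ID.
# Hypothetical helper built on the jq pattern above; prints the ID on
# stdout, or returns non-zero when no dataset has that name.
find_dataset_id() {
  name="$1"
  id=$(ax datasets list --output json \
    | jq -r --arg n "$name" '.[] | select(.name == $n) | .id')
  if [ -z "$id" ]; then
    echo "dataset not found: $name" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

Usage: `DATASET_ID=$(find_dataset_id "Training Data") || exit 1`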
Retrieve information about a specific dataset:
ax datasets get <dataset-id>
Options:
- `--output <format>` - Output format
- `--profile <name>` - Configuration profile to use

Examples:
# Get dataset details
ax datasets get ds_abc123xyz
# Get as JSON
ax datasets get ds_abc123xyz --output json
# Get from production environment
ax datasets get ds_abc123xyz --profile production
Create a dataset from a file:
ax datasets create --file <path> [options]
Supported File Formats:
- CSV (`.csv`)
- JSON (`.json`, `.jsonl`)
- Parquet (`.parquet`)

Options:
- `--name <name>` - Dataset name (required or inferred from filename)
- `--description <text>` - Dataset description
- `--profile <name>` - Configuration profile to use

Examples:
# Create from CSV
ax datasets create --file data.csv --name "Training Data" --description "Production training set"
# Create from JSON
ax datasets create --file examples.json --name "Test Examples"
# Create from Parquet
ax datasets create --file dataset.parquet --name "Large Dataset"
# Use staging profile
ax datasets create --file data.csv --name "Test Data" --profile staging
Remove a dataset from Arize:
ax datasets delete <dataset-id>
Options:
- `--profile <name>` - Configuration profile to use
- `--yes` or `-y` - Skip confirmation prompt

Examples:
# Delete with confirmation
ax datasets delete ds_abc123xyz
# Delete without confirmation
ax datasets delete ds_abc123xyz --yes
# Delete from production
ax datasets delete ds_abc123xyz --profile production
⚠️ Warning: Deletion is permanent. Always verify the dataset ID before deleting.
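The verify-then-delete habit can be scripted. `safe_delete` is a hypothetical wrapper, not an `ax` subcommand; it simply chains the two commands shown above:

```shell
# safe_delete: only delete a dataset that ax can actually resolve.
# Hypothetical guard; refuses to delete IDs that fail a get.
safe_delete() {
  id="$1"
  if ax datasets get "$id" >/dev/null 2>&1; then
    ax datasets delete "$id" --yes
  else
    echo "refusing to delete unknown dataset: $id" >&2
    return 1
  fi
}
```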
Export dataset examples to various formats:
ax datasets get <dataset-id> --output <format>
Export Formats:
- `json` - JSON format
- `csv` - Comma-separated values
- `parquet` - Apache Parquet format

Examples:
# Export to JSON
ax datasets get ds_abc123xyz --output json > dataset.json
# Export to CSV
ax datasets get ds_abc123xyz --output csv > dataset.csv
# Export to Parquet
ax datasets get ds_abc123xyz --output parquet > dataset.parquet
When working across different environments (dev, staging, production):
# List datasets in production
ax datasets list --profile production
# Create dataset in staging
ax datasets create --file test_data.csv --profile staging
# Get dataset from dev environment
ax datasets get ds_dev_123 --profile dev
For accounts with many datasets, use pagination:
# First page (10 items)
ax datasets list --limit 10 --offset 0
# Second page
ax datasets list --limit 10 --offset 10
# Third page
ax datasets list --limit 10 --offset 20
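Paging can be automated with a loop. This sketch assumes the CLI returns an empty JSON array (`[]`) once the offset passes the last dataset; verify that behavior against your version of `ax`. `list_all_datasets` is our own helper name:

```shell
# list_all_datasets: walk every page, printing each page's JSON.
# Stops when a page comes back as an empty array (assumed behavior).
list_all_datasets() {
  limit="${1:-10}"
  offset=0
  while :; do
    page=$(ax datasets list --output json --limit "$limit" --offset "$offset")
    [ "$page" = "[]" ] && break
    printf '%s\n' "$page"
    offset=$((offset + limit))
  done
}
```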
# 1. List all datasets and find the one you want
ax datasets list --output json | jq '.[] | {id: .id, name: .name}'
# 2. Extract the specific dataset ID by name
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "Production Data") | .id')
# 3. Get detailed information about that dataset
ax datasets get "$DATASET_ID"
# 4. Export the dataset if needed
ax datasets get "$DATASET_ID" --output csv > dataset_export.csv
# 1. Create dataset
ax datasets create --file data.csv --name "My Dataset"
# 2. Find the new dataset ID
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "My Dataset") | .id')
echo "Created dataset: $DATASET_ID"
# 3. Verify details
ax datasets get "$DATASET_ID"
# 1. Export existing dataset
ax datasets get ds_abc123 --output csv > dataset.csv
# 2. Modify the CSV file (manual editing)
# 3. Create new version
ax datasets create --file dataset.csv --name "Updated Dataset v2"
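Step 2 doesn't have to be manual: simple substitutions can be scripted with a sed wrapper. `csv_replace` and the label values are our own illustration, not part of `ax` or the Arize schema:

```shell
# csv_replace <in> <out> <old> <new>
# Rewrite every occurrence of <old> to <new> in an exported CSV.
# Simple sketch: does not escape sed metacharacters in <old>/<new>.
csv_replace() {
  sed "s/$3/$4/g" "$1" > "$2"
}
```

Usage: `csv_replace dataset.csv dataset_v2.csv old_label new_label`, then create the new version from `dataset_v2.csv` as in step 3.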
# 1. Export from production
ax datasets get ds_prod_123 --profile production --output json > prod_data.json
# 2. Import to staging
ax datasets create --file prod_data.json --name "Production Copy" --profile staging
# 1. List all datasets
ax datasets list --output json > all_datasets.json
# 2. Review and identify datasets to delete (manual review)
# 3. Delete old datasets
ax datasets delete ds_old_001 --yes
ax datasets delete ds_old_002 --yes
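Step 3 scales better as a loop over the reviewed list. `delete_listed` is a hypothetical helper, and the input file (one ID per line) is assumed to come out of your manual review in step 2:

```shell
# delete_listed <file>: delete every dataset ID listed in <file>.
# Skips blank lines; relies on --yes, so review the list carefully first.
delete_listed() {
  while IFS= read -r id; do
    [ -n "$id" ] && ax datasets delete "$id" --yes
  done < "$1"
}
```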
Table (default): human-readable table with columns for ID, Name, Created, and Status.
JSON: structured output with full dataset metadata:
{
"id": "ds_abc123xyz",
"name": "Training Data",
"description": "Production training set",
"created_at": "2024-01-15T10:30:00Z",
"num_examples": 1000,
"size_bytes": 52428800
}
CSV: comma-separated values, useful for importing into spreadsheets or pandas.
Parquet: efficient columnar format, ideal for large datasets and data processing.
If commands fail, verify your setup:

- Confirm the CLI responds: `ax datasets list`
- Inspect the active configuration: `ax config show` (or `ax config show --expand`)
- Re-initialize if needed: `ax config init`

If a file is rejected on create, remember that supported formats are CSV, JSON (including JSONL), and Parquet; check that the file extension matches its contents.
Tips:

- For very large datasets, use `--limit` to restrict output size, or export to a file: `ax datasets get ds_abc123 --output json > dataset.json`
- For datasets with many examples, paginate with `--limit` and `--offset`
- Capture IDs in variables for reuse: `DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "My Dataset") | .id')`
- Useful jq patterns: `ax datasets list --output json | jq '.[] | .id'` and `ax datasets list --output json | jq '.[] | {id, name}'`
- Run `ax datasets get "$DATASET_ID"` to confirm a dataset before deleting it
- Keep separate profiles (`prod`, `staging`, `dev`) for each environment

Use this skill when users want to:

- Create, list, get, or delete Arize datasets
- Export dataset examples to JSON, CSV, or Parquet
- Build test sets or golden datasets from files
Don't use this skill for:
- Analytics queries (use /arize-graphql-analytics instead)
- Initial CLI setup and configuration (use /setup-arize-cli instead)