Guides dataset upload, validation, management, and preparation for DataRobot ML projects using Python SDK. Useful for data quality checks before training.
npx claudepluginhub datarobot-oss/datarobot-agent-skills --plugin datarobot-agent-skills

This skill uses the workspace's default tool permissions.
This skill provides guidance for preparing and managing data in DataRobot, including uploading datasets, validating data quality, and managing dataset versions.
Most common use case: Upload and validate a dataset
- upload_dataset(file_path, dataset_name) to upload data
- validate_dataset(dataset_id) to check data quality
- get_dataset_schema(dataset_id) to review structure

Example: "Upload sales_data.csv and check if it's ready for training"
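The helper names above are shorthand for common operations rather than functions in the DataRobot SDK. A minimal sketch, assuming a configured SDK client, of how the first two could wrap SDK calls:

```python
# Illustrative wrappers for the helper names used above; they are not
# part of the DataRobot SDK itself.
def upload_dataset(file_path, dataset_name):
    """Upload a file to DataRobot and return the new dataset's ID."""
    import datarobot as dr  # assumes dr.Client(...) has already been configured
    dataset = dr.Dataset.create_from_file(file_path=file_path, name=dataset_name)
    return dataset.id


def validate_dataset(dataset_id):
    """Basic readiness check: the dataset exists and is non-empty."""
    import datarobot as dr
    dataset = dr.Dataset.get(dataset_id)
    return dataset.row_count > 0 and dataset.column_count > 0
```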
Use this skill when you need to:

- Upload datasets to DataRobot
- Validate data quality before training
- Review dataset structure and schema
- Manage dataset versions
User request: "Upload my sales_data.csv file and check if it's ready for training."
Agent workflow:

1. Upload sales_data.csv with dr.Dataset.create_from_file()
2. Retrieve the row and column counts to confirm the upload
3. Review the dataset structure and report whether it is ready for training
User request: "Prepare a prediction dataset based on the training data structure from project abc123."
Agent workflow:

1. Retrieve the training dataset structure from project abc123
2. Confirm the prediction file contains the same feature columns
3. Upload the prediction dataset to DataRobot
This skill guides you to use the DataRobot Python SDK directly. Install the SDK if needed:
pip install datarobot
Use these DataRobot SDK methods for data management:
Dataset Operations:
- dr.Dataset.create_from_file(file_path, name) - Upload dataset
- dr.Dataset.get(dataset_id) - Get dataset details
- dr.Dataset.list() - List all datasets
- dataset.row_count - Get row count
- dataset.column_count - Get column count

Dataset Information:

- dataset.name - Dataset name
- dataset.id - Dataset ID
- dataset.created_at - Creation timestamp

See the Common Patterns section below for complete examples.
This skill includes executable helper scripts that Claude can run directly:
- scripts/upload_dataset.py - Upload a dataset file to DataRobot

Usage example:
# Upload dataset
python scripts/upload_dataset.py sales_data.csv "Sales Data Q4 2024"
Claude can run this script directly or use it as reference when writing code.
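The bundled script's exact contents are not reproduced here; a minimal sketch of what such an upload script might look like (argument names are illustrative):

```python
# Hypothetical sketch of scripts/upload_dataset.py; the actual bundled
# script may differ.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Upload a dataset file to DataRobot"
    )
    parser.add_argument("file_path", help="Path to the data file, e.g. sales_data.csv")
    parser.add_argument("name", help="Display name for the dataset in DataRobot")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    import datarobot as dr  # deferred so --help works without the SDK installed
    dataset = dr.Dataset.create_from_file(file_path=args.file_path, name=args.name)
    print(f"Dataset ID: {dataset.id}")
```

When run from the command line as in the usage example above, the script would call main() with the two positional arguments.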
import datarobot as dr
import os
# Initialize client
client = dr.Client(
token=os.getenv("DATAROBOT_API_TOKEN"),
endpoint=os.getenv("DATAROBOT_ENDPOINT")
)
# Upload dataset
dataset = dr.Dataset.create_from_file(
file_path="sales_data.csv",
name="Sales Data Q4 2024"
)
print(f"Dataset ID: {dataset.id}")
print(f"Rows: {dataset.row_count}, Columns: {dataset.column_count}")
# Get dataset details
dataset_info = dr.Dataset.get(dataset.id)
print(f"Dataset name: {dataset_info.name}")
print(f"Created: {dataset_info.created_at}")
import datarobot as dr
# List all datasets
datasets = dr.Dataset.list()
print(f"Found {len(datasets)} datasets")
# Search for specific dataset
for dataset in datasets:
if "sales" in dataset.name.lower():
print(f"Found: {dataset.name} (ID: {dataset.id})")
# Get specific dataset
dataset = dr.Dataset.get("abc123")
print(f"Dataset: {dataset.name}")
print(f"Size: {dataset.row_count} rows x {dataset.column_count} columns")
Common checks to perform:

- Row and column counts match expectations
- No unexpected missing values or empty columns
- The target column is present and populated
- No duplicate rows that could bias training
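Several of these checks can be run locally before uploading; a stdlib-only sketch for CSV files (function name and report fields are illustrative, not part of the SDK):

```python
import csv


def quality_report(file_path):
    """Return basic quality stats for a CSV file before upload."""
    with open(file_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(reader)
    # Count blank cells and exact-duplicate rows across the whole file.
    empty_cells = sum(1 for row in rows for cell in row if cell.strip() == "")
    duplicate_rows = len(rows) - len({tuple(row) for row in rows})
    return {
        "columns": len(header),
        "rows": len(rows),
        "empty_cells": empty_cells,
        "duplicate_rows": duplicate_rows,
    }
```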
Common errors and solutions:

- Authentication errors: verify DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT are set correctly
- File not found: check the path passed to dr.Dataset.create_from_file()
- Unsupported file format: use a common tabular format such as CSV
- Slow or failed uploads on large files: retry, or reduce the file size before uploading
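Since missing credentials are the most frequent failure, it can help to check the environment before constructing the client; a sketch (the function name is illustrative, and it treats both variables as required even though the setup below falls back to a default endpoint):

```python
import os


def check_credentials():
    """Raise early, with a clear message, if DataRobot credentials are unset."""
    missing = [
        name
        for name in ("DATAROBOT_API_TOKEN", "DATAROBOT_ENDPOINT")
        if not os.getenv(name)
    ]
    if missing:
        raise EnvironmentError(
            f"Missing environment variables: {', '.join(missing)}"
        )
```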
pip install datarobot
import datarobot as dr
import os
client = dr.Client(
token=os.getenv("DATAROBOT_API_TOKEN"),
endpoint=os.getenv("DATAROBOT_ENDPOINT", "https://app.datarobot.com")
)
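Once the client is configured, a lightweight call can confirm that the credentials work; a sketch (the function name is illustrative):

```python
def verify_connection():
    """Confirm the configured client can reach DataRobot."""
    import datarobot as dr  # dr.Client(...) above sets the global connection
    datasets = dr.Dataset.list()
    print(f"Connection OK; {len(datasets)} datasets visible")
```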