From dominodatalab
Work with Domino Datasets - high-performance, versioned filesystem storage. Covers dataset creation, snapshots for versioning, sharing across projects, mounting paths (/domino/datasets/), and performance optimization. Use when managing data storage, creating reproducible data versions, or sharing data between projects.
npx claudepluginhub anthropics/claude-plugins-official --plugin dominodatalab
How this skill is triggered: by the user, by Claude, or both
Slash command
/dominodatalab:datasets
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
This skill helps users work with Domino Datasets - high-performance, versioned filesystem storage for data science projects.
Activate this skill when users want to:
A Domino Dataset is:
training-data)
from domino import Domino

domino = Domino("project-owner/project-name")

# Create a new dataset (python-domino names these arguments
# dataset_name and dataset_description)
dataset = domino.datasets_create(
    dataset_name="training-data",
    dataset_description="Training data for classification model"
)
Dataset paths differ based on your project type. Domino has two project types with different mount structures.
DFS projects use /domino as the root:
/domino
   |--/datasets
      |--/local                <== Local datasets and snapshots
         |--/clapton           <== Read-write dataset for owner and editor, read-only for reader
         |--/mingus            <== Read-write dataset for owner and editor, read-only for reader
         |--/snapshots         <== Snapshot folder organized by dataset
            |--/clapton        <== Read-write for owner and editor, read-only for reader
               |--/tag1        <== Mounted under latest tag
               |--/1           <== Always mounted under the snapshot number
               |--/2
            |--/mingus
               |--/tag2
               |--/1
               |--/2
      |--/ella                 <== Read-write shared dataset for owner and editor, read-only for reader
      |--/davis                <== Read-write shared dataset for owner and editor, read-only for reader
      |--/snapshots            <== Shared dataset snapshots organized by dataset
         |--/ella              <== Read-write for owner and editor, read-only for reader
            |--/tag3           <== Mounted under latest tag
            |--/1              <== Always mounted under the snapshot number
            |--/2
         |--/davis
            |--/tag4
            |--/1
            |--/2
| Dataset Type | Path |
|---|---|
| Local datasets | /domino/datasets/local/{dataset-name}/ |
| Local snapshots | /domino/datasets/local/snapshots/{dataset-name}/{tag-or-number}/ |
| Shared datasets | /domino/datasets/{dataset-name}/ |
| Shared snapshots | /domino/datasets/snapshots/{dataset-name}/{tag-or-number}/ |
Git-based projects use /mnt as the root:
/mnt
   |--/data                    <== Local datasets and snapshots
      |--/clapton              <== Read-write dataset for owner and editor, read-only for reader
      |--/mingus               <== Read-write dataset for owner and editor, read-only for reader
      |--/snapshots            <== Snapshot folder organized by dataset
         |--/clapton           <== Read-write for owner and editor, read-only for reader
            |--/tag1           <== Mounted under latest tag
            |--/1              <== Always mounted under the snapshot number
            |--/2
         |--/mingus
            |--/tag2
            |--/1
            |--/2
   |--/imported
      |--/data
         |--/ella              <== Read-write shared dataset for owner and editor, read-only for reader
         |--/davis             <== Read-write shared dataset for owner and editor, read-only for reader
         |--/snapshots         <== Shared dataset snapshots organized by dataset
            |--/ella           <== Read-write for owner and editor, read-only for reader
               |--/tag3        <== Mounted under latest tag
               |--/1           <== Always mounted under the snapshot number
               |--/2
            |--/davis
               |--/tag4
               |--/1
               |--/2
| Dataset Type | Path |
|---|---|
| Local datasets | /mnt/data/{dataset-name}/ |
| Local snapshots | /mnt/data/snapshots/{dataset-name}/{tag-or-number}/ |
| Shared datasets | /mnt/imported/data/{dataset-name}/ |
| Shared snapshots | /mnt/imported/data/snapshots/{dataset-name}/{tag-or-number}/ |
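The two path tables can be folded into one small helper. The following is a minimal sketch, not part of the Domino SDK: the function name `dataset_path`, the `project_type` keys `"dfs"` and `"git"`, and the `shared` and `snapshot` parameters are all hypothetical names chosen for illustration. Only the mount roots come from the tables above.

```python
from typing import Optional

# Mount roots taken from the two path tables above.
# project type: (local dataset root, shared dataset root)
_ROOTS = {
    "dfs": ("/domino/datasets/local", "/domino/datasets"),
    "git": ("/mnt/data", "/mnt/imported/data"),
}


def dataset_path(project_type: str, name: str, shared: bool = False,
                 snapshot: Optional[str] = None) -> str:
    """Build the mount path for a dataset or one of its snapshots.

    `snapshot` is a tag or a snapshot number given as a string.
    """
    local_root, shared_root = _ROOTS[project_type]
    root = shared_root if shared else local_root
    if snapshot is None:
        return f"{root}/{name}/"
    return f"{root}/snapshots/{name}/{snapshot}/"


print(dataset_path("git", "training-data"))
print(dataset_path("dfs", "training-data", snapshot="v1.0"))
print(dataset_path("git", "ella", shared=True, snapshot="2"))
```

Keeping the roots in one dictionary means code that runs in both project types only needs to know which type it is in, not the full mount layout.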
Check which paths exist in your execution:
import os

if os.path.exists("/domino/datasets"):
    print("DFS Project")
    dataset_root = "/domino/datasets/local"
elif os.path.exists("/mnt/data"):
    print("Git-Based Project")
    dataset_root = "/mnt/data"
else:
    # Neither mount exists, e.g. when running outside a Domino execution
    dataset_root = None
Both project types follow the same permission model: owners and editors have read-write access to a dataset, and readers have read-only access.
import pandas as pd
# Git-Based Project
df = pd.read_csv("/mnt/data/training-data/customers.csv")
# DFS Project
df = pd.read_csv("/domino/datasets/local/training-data/customers.csv")
# List files
import os
files = os.listdir("/mnt/data/training-data/") # Git-Based
files = os.listdir("/domino/datasets/local/training-data/") # DFS
# For large uploads, use CLI
domino upload /local/path/to/data /mnt/data/training-data/
import shutil
# Copy from local to dataset
shutil.copy("local_file.csv", "/mnt/data/training-data/")
# Write directly
df.to_csv("/mnt/data/training-data/processed.csv", index=False)
A snapshot is a read-only, immutable version of your dataset at a point in time. Use snapshots for:
# Via Python SDK
snapshot = domino.datasets_snapshot(
    dataset_name="training-data",
    tag="v1.0"
)
Or via UI:
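v1.0, production)
# Live dataset contents (the read-write mount)
df = pd.read_csv("/mnt/data/training-data/data.csv")
# Specific tagged snapshot (Git-based layout: /mnt/data/snapshots/{dataset-name}/{tag-or-number}/)
df = pd.read_csv("/mnt/data/snapshots/training-data/v1.0/data.csv")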
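Because each snapshot is mounted under its number, and tagged snapshots also appear under the tag, the snapshots visible to an execution can be discovered by listing the dataset's snapshot folder. This is a rough sketch under that assumption. `list_snapshots` is a hypothetical helper, not a Domino API, and the demo runs against a temporary directory that stands in for a folder such as /mnt/data/snapshots/training-data/.

```python
import os
import tempfile


def list_snapshots(snapshot_dir):
    """Split the entries of a dataset's snapshot folder into numbers and tags."""
    numbers, tags = [], []
    for entry in sorted(os.listdir(snapshot_dir)):
        if entry.isdigit():
            numbers.append(int(entry))  # snapshot number, e.g. "2"
        else:
            tags.append(entry)          # anything else is treated as a tag
    return sorted(numbers), tags


# Demo against a temporary directory with hypothetical contents.
with tempfile.TemporaryDirectory() as demo_dir:
    for entry in ("1", "2", "10", "v1.0"):
        os.mkdir(os.path.join(demo_dir, entry))
    numbers, tags = list_snapshots(demo_dir)
    print(numbers, tags)
```

Snapshot numbers are converted to integers before sorting so that 10 sorts after 2 rather than before it.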
Tags provide friendly names for snapshots:
- production: Current production data
- v1.0, v2.0: Version numbers
- 2024-01-15: Date-based tags

Tags can be moved to different snapshots:
# Move 'production' tag to latest snapshot
domino.datasets_tag(
    dataset_name="training-data",
    snapshot_id="snapshot-123",
    tag="production"
)
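# Import dataset from another project
# Configured in project settings; in Git-based projects, shared datasets
# mount under /mnt/imported/data/ (in DFS projects, under /domino/datasets/)
df = pd.read_csv("/mnt/imported/data/shared-dataset/data.csv")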
| Data Type | Storage |
|---|---|
| Large training data | Domino Dataset |
| Model artifacts | /mnt/artifacts/ |
| Code | Git/Project files |
| Temporary files | /tmp/ |
/mnt/data/my-dataset/
├── raw/
│ ├── customers.csv
│ └── transactions.csv
├── processed/
│ ├── features.parquet
│ └── labels.parquet
└── metadata/
└── schema.json
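One way to set up this layout is to create the subfolders up front. The sketch below uses only the standard library. `create_layout` is a hypothetical helper name, and a temporary directory stands in for /mnt/data/my-dataset/ so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def create_layout(dataset_root):
    """Create the raw/, processed/, and metadata/ subfolders if missing."""
    root = Path(dataset_root)
    for sub in ("raw", "processed", "metadata"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Report the subfolders that now exist, in sorted order.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# A temporary directory stands in for /mnt/data/my-dataset/ here.
with tempfile.TemporaryDirectory() as demo_root:
    created = create_layout(demo_root)
    print(created)
```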
# Parquet for tabular data (faster, smaller)
df.to_parquet("/mnt/data/dataset/data.parquet")
# Feather for pandas DataFrames
df.to_feather("/mnt/data/dataset/data.feather")
# HDF5 for numerical arrays
import h5py
with h5py.File("/mnt/data/dataset/data.h5", "w") as f:
    f.create_dataset("features", data=features)
Include README and schema:
import json

# Write metadata
metadata = {
    "created": "2024-01-15",
    "source": "Customer database",
    "columns": {"id": "int", "name": "string", "value": "float"}
}
with open("/mnt/data/dataset/metadata.json", "w") as f:
    json.dump(metadata, f)
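Stored metadata becomes more useful when something checks data files against it. The following sketch compares a CSV header with the `columns` mapping from the metadata. `check_columns` is a hypothetical helper, and the example uses in-memory strings in place of files on a dataset mount.

```python
import csv
import io
import json


def check_columns(metadata_json, csv_text):
    """Return (missing, unexpected) column names relative to the metadata."""
    expected = set(json.loads(metadata_json)["columns"])
    header = next(csv.reader(io.StringIO(csv_text)))
    actual = set(header)
    return sorted(expected - actual), sorted(actual - expected)


# In-memory stand-ins for metadata.json and a data file.
metadata_json = json.dumps(
    {"columns": {"id": "int", "name": "string", "value": "float"}}
)
csv_text = "id,name,score\n1,Ada,0.5\n"

missing, unexpected = check_columns(metadata_json, csv_text)
print(missing, unexpected)
```

Here the file lacks the documented `value` column and carries an undocumented `score` column, so both lists are non-empty.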
# Create snapshot before processing
domino.datasets_snapshot(
    dataset_name="training-data",
    tag="pre-processing"
)
# Then modify data
process_data()
# Read in chunks
chunks = pd.read_csv(
    "/mnt/data/dataset/large_file.csv",
    chunksize=100000
)
for chunk in chunks:
    process(chunk)
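As a concrete instance of per-chunk processing, the sketch below keeps a running sum and count so that a column mean can be computed without holding the whole file in memory. `chunked_mean` is a hypothetical helper, and a small in-memory CSV stands in for a large file on a dataset mount.

```python
import io

import pandas as pd


def chunked_mean(source, column, chunksize):
    """Compute the mean of one column by accumulating over chunks."""
    total = 0.0
    count = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += chunk[column].sum()    # sum() skips missing values
        count += chunk[column].count()  # count() counts non-missing values
    return total / count if count else float("nan")


# Small in-memory CSV standing in for a large file.
demo_csv = io.StringIO("value\n1\n2\n3\n4\n5\n")
mean_value = chunked_mean(demo_csv, "value", chunksize=2)
print(mean_value)
```

Only aggregates that can be combined across chunks (sums, counts, minima, maxima) work this way. A median, for example, needs a different approach.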
import dask.dataframe as dd
# Read without loading into memory
df = dd.read_parquet("/mnt/data/dataset/large_data.parquet")
# Process lazily
result = df.groupby("category").mean().compute()
import numpy as np
# Memory-map large arrays
data = np.memmap(
    "/mnt/data/dataset/features.dat",
    dtype='float32',
    mode='r',
    shape=(1000000, 100)
)
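A memory-mapped array is only read from disk where it is sliced, so it pairs naturally with batch processing. The sketch below writes a small float32 file, reopens it read-only, and averages each column a few rows at a time. `column_means` is a hypothetical helper, and a temporary file stands in for a file under a dataset mount.

```python
import os
import tempfile

import numpy as np


def column_means(path, shape, batch_rows):
    """Average each column of a float32 memmap, reading batch_rows at a time."""
    data = np.memmap(path, dtype="float32", mode="r", shape=shape)
    totals = np.zeros(shape[1], dtype="float64")
    for start in range(0, shape[0], batch_rows):
        # Only this slice of rows is pulled into memory.
        totals += data[start:start + batch_rows].sum(axis=0, dtype="float64")
    del data  # drop the mapping before the caller removes the file
    return totals / shape[0]


with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, "features.dat")
    shape = (10, 3)
    # Create the file with mode "w+", fill it, and flush it to disk.
    writer = np.memmap(path, dtype="float32", mode="w+", shape=shape)
    writer[:] = np.arange(30, dtype="float32").reshape(shape)
    writer.flush()
    del writer
    means = column_means(path, shape, batch_rows=4)
    print(means)
```

The `shape` and `dtype` must match what was written, because a raw memmap file stores no header describing its contents.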