Skill

version-ml-data

Version ML datasets using DVC with remote storage backends, build reproducible data pipelines, and track data lineage alongside Git. Use for large datasets, experiment reproducibility, and compliance auditing.

Git

Python

ai-ml

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agent-almanac:version-ml-data

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

ReadWriteEditBashGrepGlob

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

> See [Extended Examples](references/EXAMPLES.md) for complete configuration files and templates.

Supporting Files

references/EXAMPLES.md

SKILL.md

331 lines · ~2.8k tokens

Stats

LanguageR

Stars21

Forks3

MaintenanceExcellent

Last CommitJun 18, 2026

Actions

View Source View Plugin View on GitHub View README

version-ml-data

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

version-ml-data

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

Version ML Data

When to Use

Inputs

Procedure

Step 1: Initialize DVC in Git Repository

Step 2: Configure Remote Storage Backend

Step 3: Version Datasets with DVC

Step 4: Build Reproducible Data Pipelines

Step 5: Share and Reproduce Data Versions

Step 6: Integrate with MLflow and CI/CD

Validation

Common Pitfalls

Related Skills

Similar Skills

Version ML Data

When to Use

Inputs

Procedure

Step 1: Initialize DVC in Git Repository

Step 2: Configure Remote Storage Backend

Step 3: Version Datasets with DVC

Step 4: Build Reproducible Data Pipelines

Step 5: Share and Reproduce Data Versions

Step 6: Integrate with MLflow and CI/CD

Validation

Common Pitfalls

Related Skills

Similar Skills