From dataset-splitter
Splits datasets such as CSV files into training, validation, and test sets, with configurable ratios and optional stratification, using Python for ML workflows. Activates on requests to split a dataset.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin dataset-splitter
This skill is limited to using the following tools:
Split datasets into training, validation, and testing sets with configurable ratios and stratification options.
This skill automates dividing a dataset into subsets for training, validating, and testing machine learning models. Keeping the subsets separate lets a model be tuned on validation data and then evaluated on test data it has not seen during training.
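Stratification, mentioned above, can be illustrated with a minimal sketch. It is not the skill's actual implementation. The function name `stratified_split`, the `label_of` callback, and the default 20% test ratio are assumptions made for this example, and only the Python standard library is used so the sketch runs without extra dependencies.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_ratio=0.2, seed=42):
    """Split rows into (train, test) so each label keeps roughly the same share.

    `label_of` is a hypothetical callback that returns the class label of a row.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible

    # Group rows by their class label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # Take the same fraction from every class.
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Splitting each class separately is what keeps a rare class from landing entirely in one subset, which a plain random split cannot guarantee.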
This skill activates when you ask to split a dataset into training, validation, and test sets.
User request: "Split the data in 'my_data.csv' into 70% training, 15% validation, and 15% testing sets."
The skill will:
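A minimal sketch of how a 70/15/15 request like this one could be handled, assuming the rows have already been loaded into a list. The helper name `split_rows` and the fixed seed are assumptions for illustration, not part of the skill.

```python
import random


def split_rows(rows, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle rows and cut them into train, validation, and test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")

    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder absorbs any rounding
    return train, val, test
```

Giving the test set whatever remains after the first two cuts means every row ends up in exactly one subset, even when the ratios do not divide the row count evenly.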
User request: "Create a train-test split of 'large_dataset.csv' with an 80/20 ratio."
The skill will:
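One way an 80/20 request like this could be carried out on a CSV file is sketched here with the standard library only. The function name `train_test_split_csv` and the output paths are hypothetical. The sketch also assumes the file has a header row and fits in memory.

```python
import csv
import random


def train_test_split_csv(src_path, train_path, test_path, train_ratio=0.8, seed=42):
    """Split a CSV with a header row into two files; both keep the header.

    Returns (number of training rows, number of test rows).
    """
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first row is a header
        rows = list(reader)    # assumes the data fits in memory

    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_ratio)

    for path, part in ((train_path, rows[:cut]), (test_path, rows[cut:])):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)

    return cut, len(rows) - cut
```

It could be called as `train_test_split_csv("large_dataset.csv", "train.csv", "test.csv")`, where the two output names are placeholders. For a file too large to load at once, a streaming approach that assigns each row to a subset as it is read would be a better fit.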
This skill can be integrated with other data processing and model training tools within the Claude Code ecosystem to create a complete machine learning workflow.
The skill produces structured output relevant to the task.