Skill

ml-best-practices

Guides ML workflows for clustering, classification, regression, time series forecasting, statistical testing, model comparison, and data analysis. Structures notebooks with markdown analysis cells, visualizations, and summaries.

ai-ml

data-engineering

npx claudepluginhub gemini-cli-extensions/data-agent-kit-starter-pack --plugin data-agent-kit-starter-pack

Tool Access

This skill uses the workspace's default tool permissions.

Stats

Stars: 30

Forks: 2

Last commit: Apr 14, 2026

ML Best Practices

I want to read a story about the data, not just run code. Ensure every code cell is followed by a markdown cell analyzing the results. End the notebook with a summary comprehensively answering the prompt.

If the user's request closely matches one of the example plans below, adapt that plan to fully answer the request. If no plan matches, follow the "No match" plan:

Clustering:

Identify distinct groups based on their features.

Understand the schema and field descriptions.
Visualize features referenced in the prompt (e.g., with histograms, scatterplots).
Transform dates into timestamps.
Before applying encoders, check if the dataset already contains pre-encoded features and prefer existing numerical representations.
Prefer to keep data instead of dropping it when possible.
Transform ordinal data with an ordinal encoder.
Transform nominal data with a one-hot encoder.
Standardize numerical features.
Run clustering (e.g., k-means) over a range of cluster counts and record the silhouette score for each.
Choose the optimal number of clusters based on the silhouette score.
Use dimensionality reduction (e.g., PCA) to project the data into two dimensions.
Scatterplot the samples in two dimensions with cluster labels as the hue.
Scatterplot the samples in two dimensions with a discrete feature as the hue.
Describe the clusters in text by feature distributions or typical feature values.
Conclusion: comprehensively answer the prompt in a final markdown cell.
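The silhouette-driven steps above could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's prescribed implementation: it assumes scikit-learn, uses k-means as the clustering algorithm, and substitutes a synthetic, already-encoded feature matrix for real data. `choose_k_by_silhouette` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k_by_silhouette(X, k_values=range(2, 9), random_state=0):
    """Fit k-means for each candidate k and return (best_k, {k: silhouette})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)  # highest silhouette wins
    return best_k, scores


# Synthetic stand-in for an already-encoded feature matrix: 4 well-separated groups.
X_raw, _ = make_blobs(n_samples=400, centers=10 * np.eye(4), cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X_raw)  # standardize numerical features

best_k, scores = choose_k_by_silhouette(X)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# Project to two dimensions for the scatterplots (visualization only).
X_2d = PCA(n_components=2).fit_transform(X)
print("best k:", best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

PCA is used here only to draw the 2-D scatterplots; the clustering itself runs on the full standardized feature matrix. Plotting `scores` against k can also show how clearly the chosen value stands out.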

Time Series Forecasting:

Develop a predictive model to estimate future values based on historical trends. How might different modeling approaches impact the prediction accuracy?

Understand the schema and field descriptions.
Visualize the target feature over time at a reasonable granularity.
Always perform a chronological split on the data to create training, validation, and test sets.
Are there seasonal trends?
Test for stationarity.
Discuss possible modeling approaches. How might different modeling approaches impact the prediction accuracy?
Train two time series forecasting models to predict the target feature. Use the seasonality and stationarity findings from the previous steps to set model hyperparameters (e.g., the seasonal period or the differencing order).
Predict the target feature for the training and validation sets.
Optionally, tune model hyperparameters on the validation set.
Visualize the actual and predicted target feature vs time for each model on the training and validation sets.
Evaluate the validation performance with error metrics.
Select a model.
Retrain the selected model on the combined training and validation sets.
Predict the test values with the selected model.
Visualize the actual target feature alongside the predicted test values.
Conclusion: comprehensively answer the prompt in a final markdown cell.
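A chronological split can be as simple as sorting by time and slicing. The sketch below assumes pandas, a hypothetical `date` column, and 70/15/15 proportions chosen only for illustration; `chronological_split` is a hypothetical helper, not part of any library.

```python
import numpy as np
import pandas as pd


def chronological_split(df, time_col, val_frac=0.15, test_frac=0.15):
    """Sort by time, then slice: oldest rows -> train, middle -> validation, newest -> test."""
    df = df.sort_values(time_col).reset_index(drop=True)
    n = len(df)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    n_train = n - n_val - n_test
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test


# Hypothetical daily series, shuffled on purpose to show that the sort matters.
series = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=100, freq="D"),
    "y": np.arange(100, dtype=float),
})
shuffled = series.sample(frac=1.0, random_state=0)

train, val, test = chronological_split(shuffled, "date")

# After model selection, refit on training + validation data (never the test set)
# before forecasting the test period.
train_val = pd.concat([train, val])
print(len(train), len(val), len(test), len(train_val))
```

Unlike a random split, every validation and test timestamp falls after every training timestamp, so a model is never scored on periods it has already seen.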

Exploratory Data Analysis / Anomaly Detection:

Identify and describe any outliers, unusual patterns, or significant trends observed in the data. Provide visualizations to support your findings.

Understand the schema and field descriptions.
Visualize the target feature distribution in a way that shows outliers.
Identify and describe any outliers in the target feature.
Visualize relationships between the target feature and other features.
Identify and describe unusual patterns or significant trends.
Visualize patterns and trends.
Conclusion: comprehensively answer the prompt in a final markdown cell.
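One common way to flag outliers in the target feature is Tukey's 1.5 × IQR rule. The sketch below assumes pandas and a small hypothetical series; the rule is a swapped-in example, since the plan above does not prescribe a specific outlier method.

```python
import pandas as pd


def iqr_outliers(s, k=1.5):
    """Return values of `s` outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR], plus the fences."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)], (lower, upper)


# Hypothetical target values with one obvious anomaly.
target = pd.Series([10, 12, 11, 13, 12, 11, 10, 14, 12, 95], name="target")
outliers, (lower, upper) = iqr_outliers(target)
print(outliers.tolist(), lower, upper)
```

Matplotlib box plots (including pandas' `Series.plot.box()`) use the same 1.5 × IQR whisker length by default, so the points drawn beyond the whiskers should correspond to these fences.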

Classification:

Given the data, can we classify by the target feature?

Understand the schema and field descriptions.
Identify rows that don't make sense. How many are there and what do they contain?
Identify rows without a target value. How many are there and what do they contain?
Drop rows that don't match the schema or don't have the target value (if it is reasonable to do so).
Split data into training, validation, and test sets.
Create features to represent when data are missing, if this is meaningful.
Handle missing data. Prefer to keep data instead of dropping it when possible.
Before applying encoders, check if the dataset already contains pre-encoded features and prefer existing numerical representations.
Transform ordinal data with an ordinal encoder.
Transform nominal data with a one-hot encoder.
Standardize numerical features.
Train multiple models.
If there is evidence of overfitting, regularize and retrain the model.
If there is evidence of underfitting, consider adding or engineering features.
Evaluate the models.
Create confusion matrices.
Conclusion: comprehensively answer the prompt in a final markdown cell.
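The classification steps map naturally onto a scikit-learn `Pipeline` wrapping a `ColumnTransformer`. This sketch assumes scikit-learn and a small synthetic DataFrame whose columns (`size`, `color`, `weight`, `label`) are hypothetical; the two candidate models and the 60/20/20 split are illustrative choices, not requirements of the plan.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300
# Hypothetical toy data: one ordinal, one nominal, and one numeric feature.
df = pd.DataFrame({
    "size": rng.choice(["small", "medium", "large"], n),
    "color": rng.choice(["red", "green", "blue"], n),
    "weight": rng.normal(50, 10, n),
})
df["label"] = ((df["weight"] + 10 * (df["size"] == "large")) > 55).astype(int)

X, y = df.drop(columns="label"), df["label"]
# Split BEFORE fitting any preprocessing: 60% train, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

preprocess = ColumnTransformer([
    ("ordinal", OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
    ("nominal", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["weight"]),
])

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, clf in candidates.items():
    # clone() gives each model its own unfitted copy of the preprocessing.
    pipe = Pipeline([("preprocess", clone(preprocess)), ("clf", clf)])
    pipe.fit(X_train, y_train)  # preprocessing is fit on training rows only
    results[name] = {
        "train_acc": pipe.score(X_train, y_train),
        "val_acc": pipe.score(X_val, y_val),
        "val_confusion": confusion_matrix(y_val, pipe.predict(X_val)),
    }

for name, r in results.items():
    print(name, round(r["train_acc"], 3), round(r["val_acc"], 3))
    print(r["val_confusion"])
```

Because the preprocessing lives inside each pipeline, it only ever sees `X_train` during fitting. Comparing `train_acc` with `val_acc` is a first check for overfitting (a large gap) or underfitting (both low), and `X_test` stays untouched until a final model is chosen.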

Regression:

Predict the continuous-valued target feature.

Understand the schema and field descriptions.
Identify rows that don't make sense. How many are there and what do they contain?
Identify rows without a target value. How many are there and what do they contain?
Develop an understanding of the data and determine how to handle missing values. This should make sense in the business context.
Identify any potential sources of group leakage. Aggregate where appropriate to prevent this.
Visualize the target feature.
Split data into training, validation, and test sets.
Handle missing data. Prefer to keep data instead of dropping it when possible.
Before applying encoders, check if the dataset already contains pre-encoded features and prefer existing numerical representations.
Transform ordinal data with an ordinal encoder.
Transform nominal data with a one-hot encoder. Restrict high-cardinality categorical features to a tractable size (e.g., by grouping rare categories).
Standardize numerical features.
Train multiple models.
Visualize the actual vs predicted values on training and validation data.
If there is evidence of overfitting, regularize and retrain the model.
If there is evidence of underfitting, consider adding or engineering features.
Evaluate the model error.
Conclusion: comprehensively answer the prompt in a final markdown cell.

Comparing ML Models:

Evaluate and compare multiple models to determine which is most suitable for production based on predictive power, robustness, and viability.

Understand the schema and align metrics with business goals (e.g., cost of false positives vs. false negatives).
Establish baselines: define a naive baseline (majority class/mean) and a simple ML baseline (e.g., Logistic/Linear Regression).
Ensure rigorous validation: use identical, fixed data splits for all models and perform $k$-fold cross-validation.
If data is temporal, use chronological splits for validation.
Select and report metrics beyond accuracy (e.g., F1-Score, PR-AUC, MAE, RMSE) that reflect business impact.
Use bootstrapping to calculate 95% confidence intervals for key metrics, and treat a difference between models as statistically significant only if the interval for that difference excludes zero.
Perform slice-based error analysis: evaluate model performance across key subpopulations and demographics to identify bias or specific failure modes.
Inspect and compare confusion matrices, residual plots, and calibration curves.
Evaluate operational trade-offs: consider inference latency, training time, compute cost, and model size.
Assess interpretability using tools like SHAP or LIME where transparency is required.
Conclusion: Recommend the optimal model for the specific use case, justifying the choice with both performance and production viability.
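The baseline, fixed-split, and confidence-interval steps can be sketched with scikit-learn and NumPy. This assumes a synthetic imbalanced dataset, F1 as the business-relevant metric, and a percentile bootstrap over a single held-out test set; `bootstrap_diff_ci` is a hypothetical helper, and other resampling schemes are equally valid.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic, imbalanced binary problem (about 20% positives).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
# One fixed split and one fixed set of folds, shared by every model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "naive": DummyClassifier(strategy="most_frequent"),  # majority-class baseline
    "logreg": LogisticRegression(max_iter=1000),          # simple ML baseline
}
# zero_division=0 because the majority-class baseline never predicts a positive.
cv_f1 = {name: cross_val_score(m, X_train, y_train, cv=cv,
                               scoring=make_scorer(f1_score, zero_division=0)).mean()
         for name, m in models.items()}
test_pred = {name: m.fit(X_train, y_train).predict(X_test) for name, m in models.items()}


def bootstrap_diff_ci(y_true, pred_a, pred_b, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(A) - metric(B), resampling test rows."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # sample row indices with replacement
        diffs[i] = metric(y_true[idx], pred_a[idx]) - metric(y_true[idx], pred_b[idx])
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi


f1 = partial(f1_score, zero_division=0)
lo, hi = bootstrap_diff_ci(y_test, test_pred["logreg"], test_pred["naive"], f1)
print(cv_f1, (round(lo, 3), round(hi, 3)))
```

If the 95% interval for the difference excludes zero, the gap between the two models is unlikely to be an artifact of which test rows happened to be drawn.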

No match:

Understand the schema and field descriptions.
Identify rows that don't make sense. How many are there and what do they contain?
Identify rows without a target value. How many are there and what do they contain?
Drop rows that don't match the schema or don't have the target value (if it is reasonable to do so).
Create features to represent when data are missing, if this is meaningful.
Handle missing data. Prefer to keep data instead of dropping it when possible.
Before applying encoders, check if the dataset already contains pre-encoded features and prefer existing numerical representations.
Transform ordinal data with an ordinal encoder.
Transform nominal data with a one-hot encoder.
Standardize numerical features.
Conclusion: comprehensively answer the prompt in a final markdown cell.

Essential ML Practices

IMPORTANT: ALWAYS follow these ML practices.

Strict Featurization Ordering: For supervised learning, ALWAYS split the dataset into training and test data BEFORE fitting preprocessing pipelines (e.g., scaling, encoding). Fit the pipelines on the training data only, then use the fitted pipelines to transform both the training and test data.
Handling Missing or NULL Values: ALWAYS check for and handle missing and NULL values. First, analyze their frequency. Then decide whether to keep them, drop them, or impute them with a contextually appropriate value, and explain your reasoning.
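The split-then-fit ordering looks like this with a `StandardScaler`. This is a minimal sketch assuming scikit-learn and random synthetic data; the same pattern applies to encoders, imputers, or a full `Pipeline`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=15.0, size=(500, 3))
y = (X[:, 0] > 100).astype(int)

# 1. Split first, before any preprocessing is fit.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Fit the scaler on the training rows only.
scaler = StandardScaler().fit(X_train)

# 3. Reuse the fitted scaler on both splits; never call fit() on X_test.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

The test rows are scaled with the training mean and standard deviation, which is why their scaled mean is close to, but not exactly, zero. Fitting a separate scaler on the test data would hide exactly the distribution shift the test set is meant to reveal.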