Master machine learning foundations - algorithms, preprocessing, feature engineering, and evaluation
Provides core machine learning workflows using scikit-learn for preprocessing, feature engineering, and model evaluation. Use when users need to build ML pipelines, handle missing data, encode features, or perform cross-validation.
/plugin marketplace add pluginagentmarketplace/custom-plugin-machine-learning
/plugin install machine-learning-assistant@pluginagentmarketplace-machine-learning

This skill inherits all available tools. When active, it can use any tool Claude has access to.

Bundled resources:
- assets/config.yaml
- assets/schema.json
- references/GUIDE.md
- references/PATTERNS.md
- scripts/validate.py

Master the building blocks of machine learning: from raw data to trained models.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# 1. Load and split data (X, y assumed already loaded)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2. Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42))
])

# 3. Train and evaluate
pipeline.fit(X_train, y_train)
score = pipeline.score(X_test, y_test)
print(f"Accuracy: {score:.4f}")
| Step | Purpose | Implementation |
|---|---|---|
| Missing Values | Handle NaN/None | SimpleImputer(strategy='median') |
| Scaling | Normalize ranges | StandardScaler() or MinMaxScaler() |
| Encoding | Convert categories | OneHotEncoder() or OrdinalEncoder() |
| Outliers | Remove extremes | IQR method or Z-score |
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer

# Define column types
numeric_features = ['age', 'income', 'score']
categorical_features = ['gender', 'city', 'category']

# Create preprocessor
preprocessor = ColumnTransformer([
    ('num', Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ]), numeric_features),
    ('cat', Pipeline([
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('encoder', OneHotEncoder(handle_unknown='ignore'))
    ]), categorical_features)
])
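The ColumnTransformer above covers imputation, scaling, and encoding; the outlier step from the table has no single scikit-learn transformer. A minimal IQR-based sketch (the remove_outliers_iqr helper and the k=1.5 multiplier are illustrative, not part of this skill's bundled scripts):

```python
import pandas as pd

def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where `column` falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]
```

Apply this to the training split only; filtering before the split leaks test-set statistics into training.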
| Technique | Use Case | Example |
|---|---|---|
| Polynomial | Non-linear relationships | PolynomialFeatures(degree=2) |
| Binning | Discretize continuous | KBinsDiscretizer(n_bins=5) |
| Log Transform | Right-skewed data | np.log1p(x) |
| Interaction | Feature combinations | x1 * x2 |
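A short sketch of the table's techniques on toy data (the right-skewed array X is synthetic, for illustration only):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, KBinsDiscretizer

X = np.random.RandomState(42).exponential(size=(100, 2))  # right-skewed toy data

# Polynomial: degree-2 terms, including the x1 * x2 interaction column
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Binning: discretize each feature into 5 quantile bins
X_binned = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile').fit_transform(X)

# Log transform: compress the right tail
X_log = np.log1p(X)
```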
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report

# Cross-validation (model is any estimator or pipeline, e.g. the one above)
cv_scores = cross_val_score(model, X, y, cv=5, scoring='f1_weighted')
print(f"CV F1: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")

# Detailed report on the held-out test set (requires a fitted model)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
| Strategy | When to Use |
|---|---|
| KFold | Standard, balanced data |
| StratifiedKFold | Imbalanced classification |
| TimeSeriesSplit | Temporal data |
| GroupKFold | Grouped samples |
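Any of these splitters can be passed directly as cross_val_score's cv argument. A sketch, assuming the pipeline, X, and y from the quick start above:

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold, TimeSeriesSplit

# Imbalanced classification: preserve class ratios in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=skf, scoring='f1_weighted')

# Temporal data: always train on the past, validate on the future
tscv = TimeSeriesSplit(n_splits=5)
ts_scores = cross_val_score(pipeline, X, y, cv=tscv, scoring='f1_weighted')
```

Note that TimeSeriesSplit assumes rows are already in chronological order.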
# TODO: Create a pipeline that:
# 1. Imputes missing values
# 2. Scales features
# 3. Trains a logistic regression
# TODO: Implement 5-fold stratified CV
# and report mean and std of F1 score
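One possible solution sketch for both exercises (X and y are assumed loaded; max_iter=1000 is a common choice to ensure convergence after scaling):

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Impute -> scale -> logistic regression
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression(max_iter=1000))
])

# 5-fold stratified CV with F1
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='f1_weighted')
print(f"F1: {scores.mean():.4f} (+/- {scores.std():.4f})")
```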
import pytest
import numpy as np
from sklearn.datasets import make_classification

def test_preprocessing_pipeline():
    """Test that preprocessing handles missing values."""
    X, y = make_classification(n_samples=100, n_features=10, random_state=42)
    X[0, 0] = np.nan  # Introduce a missing value
    pipeline = create_preprocessing_pipeline()  # pipeline under test, defined by your implementation
    X_transformed = pipeline.fit_transform(X)
    assert not np.isnan(X_transformed).any()
    assert X_transformed.shape[0] == X.shape[0]

def test_no_data_leakage():
    """Verify preprocessing doesn't leak test data."""
    X, _ = make_classification(n_samples=100, n_features=10, random_state=42)
    X_train, X_test = X[:80], X[80:]
    pipeline = create_preprocessing_pipeline()
    pipeline.fit(X_train)
    X_test_transformed = pipeline.transform(X_test)
    # Scaler statistics must come from the training split only
    assert pipeline.named_steps['scaler'].mean_ is not None
| Problem | Cause | Solution |
|---|---|---|
| NaN in prediction | Missing imputer | Add SimpleImputer to pipeline |
| Shape mismatch | Inconsistent features | Use ColumnTransformer |
| Memory error | Too many one-hot features | Use max_categories or hashing |
| Poor CV variance | Data leakage | Check preprocessing order |
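For the memory-error row, a sketch of capping one-hot width (the max_categories parameter requires scikit-learn >= 1.1):

```python
from sklearn.preprocessing import OneHotEncoder

# Only the 10 most frequent categories get their own column;
# everything else is grouped into a single infrequent bucket.
encoder = OneHotEncoder(max_categories=10, handle_unknown='infrequent_if_exist')
```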
Version: 1.4.0 | Status: Production Ready