Build production-ready classification and regression models with hyperparameter tuning
Use when creating ML models to compare algorithms, tune parameters, and handle class imbalance.
Installation:

```
/plugin marketplace add pluginagentmarketplace/custom-plugin-machine-learning
/plugin install machine-learning-assistant@pluginagentmarketplace-machine-learning
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Bundled files:

- assets/config.yaml
- assets/schema.json
- references/GUIDE.md
- references/PATTERNS.md
- scripts/validate.py

Build, tune, and evaluate classification and regression models.
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Split data (stratify preserves class proportions in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='f1_weighted')
print(f"CV F1: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")
print(f"Test Accuracy: {model.score(X_test, y_test):.4f}")
```
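On imbalanced data, a single accuracy number can hide poor performance on the minority class, so a per-class breakdown is often more informative. A minimal sketch using `classification_report`; the `make_classification` dataset stands in for your own `X`, `y`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data as a stand-in for a real dataset
X, y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Per-class precision/recall/F1, plus macro and weighted averages
report = classification_report(y_test, model.predict(X_test))
print(report)
```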
| Algorithm | Best For | Complexity |
|---|---|---|
| Logistic Regression | Baseline, interpretable | O(n*d) |
| Random Forest | Tabular, general | O(n·d·n_trees) |
| XGBoost | Competitions, accuracy | O(n·d·n_trees) |
| SVM | High-dim, small data | O(n²) |
```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

classifiers = {
    'lr': LogisticRegression(max_iter=1000, class_weight='balanced'),
    'rf': RandomForestClassifier(n_estimators=100, class_weight='balanced'),
    'xgb': XGBClassifier(n_estimators=100, eval_metric='logloss')
}
```
| Algorithm | Best For | Key Param |
|---|---|---|
| Ridge | Multicollinearity | alpha |
| Lasso | Feature selection | alpha |
| Random Forest | Non-linear | n_estimators |
| XGBoost | Best accuracy | learning_rate |
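The regression table maps directly onto concrete estimators. A hedged sketch with illustrative parameter values, kept to scikit-learn models (swap in `XGBRegressor` with a `learning_rate` the same way):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge, Lasso

# alpha controls regularization strength: higher means a simpler model
regressors = {
    'ridge': Ridge(alpha=1.0),
    'lasso': Lasso(alpha=0.1),  # drives weak coefficients to exactly zero
    'rf': RandomForestRegressor(n_estimators=100, random_state=42),
}

# Synthetic linear data as a stand-in for a real dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10, random_state=42)
for name, reg in regressors.items():
    reg.fit(X, y)
    print(f"{name}: R^2 = {reg.score(X, y):.3f}")
```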
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Sample integer parameters from ranges rather than fixed grids
param_dist = {
    'n_estimators': randint(50, 300),
    'max_depth': randint(3, 15),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10)
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=50,
    cv=5,
    scoring='f1_weighted',
    n_jobs=-1,
    random_state=42
)
search.fit(X_train, y_train)
print(f"Best params: {search.best_params_}")
print(f"Best CV score: {search.best_score_:.4f}")
```
| Technique | Implementation |
|---|---|
| Class Weights | class_weight='balanced' |
| SMOTE | imblearn.over_sampling.SMOTE() |
| Threshold Tuning | Adjust prediction threshold |
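Of the three techniques, threshold tuning is the only one without a ready-made class. A sketch assuming a probabilistic classifier; the 0.3 and 0.1 cutoffs are illustrative, and in practice the threshold is chosen from a precision-recall curve:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# .predict() uses a fixed 0.5 cutoff; lowering it trades precision
# for recall on the rare positive class.
proba = model.predict_proba(X_test)[:, 1]
counts = {}
for threshold in (0.5, 0.3, 0.1):
    preds = (proba >= threshold).astype(int)
    counts[threshold] = int(preds.sum())
    print(f"threshold={threshold}: positives predicted = {counts[threshold]}")
```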
```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline, not sklearn's

pipeline = Pipeline([
    ('smote', SMOTE(random_state=42)),
    ('classifier', RandomForestClassifier(random_state=42))
])
```
```python
import pandas as pd
from sklearn.model_selection import cross_validate

def compare_models(models, X, y, cv=5):
    """Cross-validate each model and summarize train/test metrics."""
    results = []
    for name, model in models.items():
        cv_results = cross_validate(
            model, X, y, cv=cv,
            scoring=['accuracy', 'f1_weighted', 'roc_auc_ovr_weighted'],
            return_train_score=True
        )
        results.append({
            'model': name,
            'train_acc': cv_results['train_accuracy'].mean(),
            'test_acc': cv_results['test_accuracy'].mean(),
            'test_f1': cv_results['test_f1_weighted'].mean(),
            'test_auc': cv_results['test_roc_auc_ovr_weighted'].mean()
        })
    return pd.DataFrame(results).round(4)
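A condensed, self-contained run of the same comparison on synthetic data (two models, ROC AUC omitted for brevity; the inline loop mirrors what `compare_models` does internally):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=300, random_state=42)
models = {
    'lr': LogisticRegression(max_iter=1000),
    'rf': RandomForestClassifier(n_estimators=50, random_state=42),
}

rows = []
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5,
                        scoring=['accuracy', 'f1_weighted'],
                        return_train_score=True)
    rows.append({'model': name,
                 'train_acc': cv['train_accuracy'].mean(),
                 'test_acc': cv['test_accuracy'].mean(),
                 'test_f1': cv['test_f1_weighted'].mean()})

# One row per model; a large train/test gap flags overfitting
df = pd.DataFrame(rows).round(4)
print(df)
```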
Exercises:

```python
# TODO: Compare 3 different classifiers using cross-validation
#       and report the F1 score for each.

# TODO: Use RandomizedSearchCV to tune XGBoost.
#       Find the optimal n_estimators, max_depth, and learning_rate.
```
```python
import pytest
from sklearn.datasets import make_classification

# get_classifier() and get_balanced_classifier() are the factory
# functions under test, defined in your own module.

def test_classifier_trains():
    """Test classifier can fit and predict."""
    X, y = make_classification(n_samples=100, random_state=42)
    model = get_classifier()
    model.fit(X[:80], y[:80])
    predictions = model.predict(X[80:])
    assert len(predictions) == 20
    assert set(predictions).issubset({0, 1})

def test_handles_imbalance():
    """Test model handles imbalanced classes."""
    X, y = make_classification(n_samples=100, weights=[0.9, 0.1], random_state=42)
    model = get_balanced_classifier()
    model.fit(X, y)
    predictions = model.predict(X)
    # Should predict both classes
    assert len(set(predictions)) == 2
```
| Problem | Cause | Solution |
|---|---|---|
| Overfitting | Model too complex | Reduce depth, add regularization |
| Underfitting | Model too simple | Increase complexity |
| Class imbalance | Skewed data | Use SMOTE or class weights |
| Slow training | Large data | Use LightGBM, reduce estimators |
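The first two rows of the troubleshooting table come down to the train-versus-validation gap. A quick diagnostic sketch: a large gap suggests overfitting (reduce depth, add regularization), while low scores on both sides suggest underfitting. The `max_depth` values here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_informative=5, random_state=42)

# Unconstrained trees tend to memorize training data; a depth cap regularizes
results = {}
for max_depth in (None, 4):
    model = RandomForestClassifier(n_estimators=100, max_depth=max_depth,
                                   random_state=42)
    model.fit(X, y)
    train_acc = model.score(X, y)
    cv_acc = cross_val_score(model, X, y, cv=5).mean()
    results[max_depth] = (train_acc, cv_acc)
    print(f"max_depth={max_depth}: train={train_acc:.3f}, cv={cv_acc:.3f}, "
          f"gap={train_acc - cv_acc:.3f}")
```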
Version: 1.4.0 | Status: Production Ready