Scikit-learn model training skill with cross-validation, hyperparameter tuning, pipeline construction, and model serialization. Enables automated ML model development using scikit-learn's comprehensive toolkit.
Train machine learning models using scikit-learn with cross-validation, hyperparameter tuning, and pipeline construction.
This skill provides comprehensive capabilities for training machine learning models using scikit-learn. It supports the full model development workflow from data preprocessing through model training, evaluation, and serialization.
```bash
# Core dependencies (quote the version specifier so the shell
# does not treat ">" as a redirect)
pip install "scikit-learn>=1.0.0" joblib pandas numpy

# For ONNX export
pip install skl2onnx onnxruntime

# For additional preprocessing
pip install category_encoders imbalanced-learn
```
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report
import joblib

# X is the feature matrix, y the target labels
# Split data (stratify keeps class proportions in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42
)
model.fit(X_train, y_train)

# Cross-validation
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
print(f"CV Accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std() * 2:.3f})")

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Save model
joblib.dump(model, 'model.joblib')
```
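A model serialized with joblib can be reloaded later for inference. A minimal round-trip sketch using a bundled toy dataset (the temp-file path is illustrative):

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small model on a bundled toy dataset
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# Round-trip through joblib serialization
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)

# The reloaded estimator produces identical predictions
assert (loaded.predict(X) == model.predict(X)).all()
```

Note that joblib pickles the estimator, so the scikit-learn version used for loading should match the one used for saving.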
```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier

# Define preprocessing (column names must exist in the input DataFrame)
numeric_features = ['age', 'income', 'score']
categorical_features = ['category', 'region']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

# Create full pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', GradientBoostingClassifier())
])

# Train (preprocessing is fitted only on the training data)
pipeline.fit(X_train, y_train)
```
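A key benefit of bundling preprocessing into the pipeline is that cross-validation then refits the transformers inside each fold, so no statistics leak from validation folds into training. A minimal self-contained sketch of this, using a simpler scaler-plus-classifier pipeline on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Passing the whole pipeline (not pre-scaled data) to cross_val_score means
# the scaler is refit on each training fold, avoiding leakage from the
# validation fold into the fitted scaling statistics
pipe = Pipeline([('scaler', StandardScaler()),
                 ('clf', LogisticRegression(max_iter=1000))])
scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print(f"CV accuracy: {scores.mean():.3f}")
```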
```python
from sklearn.model_selection import GridSearchCV

# Define parameter grid (keys are prefixed with the pipeline step name)
param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [3, 5, 10, None],
    'classifier__learning_rate': [0.01, 0.1, 0.2]
}

# Grid search
grid_search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring='f1_weighted',
    n_jobs=-1,
    verbose=2
)
grid_search.fit(X_train, y_train)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best score: {grid_search.best_score_:.3f}")

# Get best model (refit on the full training set by default)
best_model = grid_search.best_estimator_
```
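When the grid is large, `RandomizedSearchCV` samples a fixed number of candidates instead of exhausting every combination, trading exhaustiveness for a bounded number of fits. A self-contained sketch on synthetic data (the parameter lists and `n_iter` are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Lists are sampled uniformly; scipy distributions can be used instead
param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1, 0.2],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions,
    n_iter=5,          # try only 5 of the 27 combinations
    cv=3,
    scoring='f1_weighted',
    random_state=42,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```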
```python
from sklearn.feature_selection import SelectFromModel, RFE
from sklearn.ensemble import RandomForestClassifier

# Method 1: SelectFromModel (keeps features above an importance threshold)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold='median'
)
X_selected = selector.fit_transform(X_train, y_train)

# Method 2: Recursive Feature Elimination
rfe = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    n_features_to_select=10,
    step=1
)
X_rfe = rfe.fit_transform(X_train, y_train)

# Get selected feature names (X must be a DataFrame for .columns)
selected_features = X.columns[rfe.support_].tolist()
```
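A fitted selector must also be applied to held-out data with `transform` (never refit on the test set) so train and test end up with the same columns. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=42),
          n_features_to_select=10)
X_train_sel = rfe.fit_transform(X_train, y_train)
X_test_sel = rfe.transform(X_test)   # reuse the fitted selector, never refit

# Train and test now share the same reduced feature set
assert X_train_sel.shape[1] == X_test_sel.shape[1] == 10
```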
```javascript
const sklearnTrainingTask = defineTask({
  name: 'sklearn-model-training',
  description: 'Train a scikit-learn model with cross-validation',
  inputs: {
    modelType: { type: 'string', required: true },
    trainDataPath: { type: 'string', required: true },
    targetColumn: { type: 'string', required: true },
    hyperparameters: { type: 'object', default: {} },
    cvFolds: { type: 'number', default: 5 },
    scoringMetric: { type: 'string', default: 'accuracy' }
  },
  outputs: {
    modelPath: { type: 'string' },
    cvScores: { type: 'array' },
    bestScore: { type: 'number' },
    featureImportances: { type: 'object' }
  },
  async run(inputs, taskCtx) {
    return {
      kind: 'skill',
      title: `Train ${inputs.modelType} model`,
      skill: {
        name: 'sklearn-model-trainer',
        context: {
          operation: 'train_with_cv',
          modelType: inputs.modelType,
          trainDataPath: inputs.trainDataPath,
          targetColumn: inputs.targetColumn,
          hyperparameters: inputs.hyperparameters,
          cvFolds: inputs.cvFolds,
          scoringMetric: inputs.scoringMetric
        }
      },
      io: {
        inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
        outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
      }
    };
  }
});
```
Classification models:

| Model | Use Case | Pros | Cons |
|---|---|---|---|
| LogisticRegression | Binary/multiclass, interpretable | Fast, interpretable | Linear boundary |
| RandomForestClassifier | General purpose | Robust, handles nonlinearity | Can overfit |
| GradientBoostingClassifier | High accuracy needed | State-of-the-art performance | Slower training |
| SVC | Small/medium datasets | Effective in high dimensions | Slow on large data |
| XGBClassifier | Competition/production | Fast, accurate | Many hyperparameters |
Regression models:

| Model | Use Case | Pros | Cons |
|---|---|---|---|
| LinearRegression | Baseline, interpretable | Simple, fast | Assumes linearity |
| Ridge/Lasso | Regularization needed | Prevents overfitting | Still linear |
| RandomForestRegressor | General purpose | Handles nonlinearity | Can overfit |
| GradientBoostingRegressor | High accuracy | Excellent performance | Slower |
| SVR | Small datasets | Robust to outliers | Slow scaling |
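The tables above can be turned into a quick baseline comparison by cross-validating several candidates on the same data; a sketch (the candidate set, folds, and metric are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

candidates = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'RandomForestClassifier': RandomForestClassifier(n_estimators=100, random_state=42),
    'GradientBoostingClassifier': GradientBoostingClassifier(random_state=42),
}

# Score each candidate with the same CV setup so results are comparable
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='f1_weighted')
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")
```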