From harnessml
Use when evaluating your model ensemble — what's in it, what's missing, and whether the models are actually providing diverse perspectives on the problem.
```
npx claudepluginhub msilverblatt/harness-ml --plugin harnessml
```

This skill uses the workspace's default tool permissions.
An ensemble of 6 models that all make the same predictions is effectively one model. The power of ensembling comes from combining models that make different mistakes — when one model is wrong, others compensate.
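The compensation effect can be shown with a toy sketch. The predictions below are hypothetical: three models, each 70% accurate, but wrong on different examples, so a majority vote recovers the full signal.

```python
import numpy as np

# Hypothetical predictions: three models, each 70% accurate,
# but wrong on *different* examples. Majority vote compensates.
y_true = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
m1 = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])  # wrong on last 3
m2 = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1])  # wrong on middle 3
m3 = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])  # wrong on first 3
vote = ((m1 + m2 + m3) >= 2).astype(int)       # majority vote

acc = lambda p: float((p == y_true).mean())
print(acc(m1), acc(m2), acc(m3), acc(vote))    # 0.7 0.7 0.7 1.0
```

If the three models were instead wrong on the *same* three examples, the vote would be wrong there too: decorrelated errors, not model count, is what the vote exploits.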
Diversity comes from three sources:
Source #1 is the strongest. Source #3 is the weakest.
Linear models
Inductive bias: Monotonic relationships, additive effects.
Strengths: Stable, interpretable, calibrates well, handles high-dimensional sparse data. Often the best-calibrated model in the ensemble even when not the most discriminative.
Weaknesses: Cannot learn interactions or non-linear relationships without explicit feature engineering.
When they help the ensemble: They anchor predictions in linear signal that trees might overfit. If your linear model has a high meta-learner coefficient, it means the trees are noisy on cases where the linear signal is clear.
Gradient-boosted trees
Inductive bias: Axis-aligned splits, interaction detection, sequential error correction.
Strengths: Handle mixed types, learn interactions natively, robust to outliers, built-in feature importance.
Why try all three:
Two of the three often end up with >0.95 prediction correlation, and the redundant pair should be differentiated with different feature sets. Which two varies by dataset, so try all three.
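Spotting the redundant pair is a pairwise-correlation check over validation predictions. A minimal sketch (the prediction arrays here are synthetic, with two near-clones built in):

```python
import numpy as np

# Synthetic validation predictions from three hypothetical boosters.
# Pairwise Pearson correlation > 0.95 flags near-duplicate models.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
preds = {
    "booster_a": signal + 0.05 * rng.normal(size=200),  # near-clone of b
    "booster_b": signal + 0.05 * rng.normal(size=200),
    "booster_c": signal + 0.80 * rng.normal(size=200),  # more distinct
}
names = list(preds)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = np.corrcoef(preds[names[i]], preds[names[j]])[0, 1]
        flag = "  <- near-duplicate, diversify features" if r > 0.95 else ""
        print(f"{names[i]} vs {names[j]}: r={r:.3f}{flag}")
```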
Random forest
Inductive bias: Bagging reduces variance through decorrelated trees.
Strengths: Naturally resistant to overfitting, gives calibrated probability estimates, useful as a diversity anchor against boosted models.
When it helps the ensemble: Its errors are decorrelated from boosted trees because it doesn't sequentially correct — it averages. This is exactly the diversity ensembles need.
Neural networks
Inductive bias: Learned representations, smooth decision boundaries.
Strengths: Can learn complex non-linear relationships, representation learning, flexible architecture.
Weaknesses: Need more data, harder to regularize, less interpretable, training sensitive to hyperparameters.
When they help the ensemble: On datasets with enough rows (10k+), neural networks learn different representations than trees. If your neural net has low correlation with tree models, it's capturing genuinely different signal.
Other models worth considering:
SVM: Different decision boundary geometry. Useful when classes are separable in high-dimensional space.
HistGBM: sklearn's histogram-based gradient boosting. Similar to LightGBM but integrates with sklearn pipelines. Useful when you want a second tree-based model with different implementation details.
GAM (Generalized Additive Model): Learns smooth non-linear relationships for each feature independently. Highly interpretable. Useful as a "step up from linear" that doesn't overfit like trees can.
NGBoost: Probabilistic predictions with uncertainty. If your use case values uncertainty quantification, NGBoost provides it natively.
`features(action="diversity")`
`pipeline(action="diagnostics")`
Remove each model from the ensemble one at a time and measure the impact. Models whose removal hurts performance are carrying their weight. Models whose removal is neutral or positive should be removed or replaced.
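The ablation loop can be sketched in a few lines. The scoring here is a simple equal-weight average over synthetic validation predictions, with model names echoing the examples elsewhere in this document; a real harness would rescore its actual ensemble.

```python
import numpy as np

# Leave-one-out ablation sketch: score an equal-weight average of
# hypothetical validation predictions with each model removed.
def rmse(p, y):
    return float(np.sqrt(np.mean((p - y) ** 2)))

y_val = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
preds = {
    "lr_main":  np.array([0.2, 0.8, 0.7, 0.3, 0.9]),
    "xgb_main": np.array([0.1, 0.9, 0.8, 0.2, 0.8]),
    "noise":    np.array([0.5, 0.4, 0.6, 0.6, 0.4]),  # contributes nothing
}

full = rmse(np.mean(list(preds.values()), axis=0), y_val)
ablation = {}
for name in preds:
    rest = [p for n, p in preds.items() if n != name]
    ablation[name] = rmse(np.mean(rest, axis=0), y_val)
    verdict = "carries weight" if ablation[name] > full else "remove or replace"
    print(f"without {name}: rmse {ablation[name]:.3f} "
          f"(full {full:.3f}) -> {verdict}")
```

Removing the noise model lowers RMSE (it should go), while removing either real model raises it (they carry weight).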
Don't dismiss a model family after one configuration. A logistic regression with the wrong features looks terrible. A logistic regression with well-engineered features can be the best-calibrated model in the ensemble.
Minimum fair trial:
The most powerful diversification lever. Give different models different feature sets:
```
models(action="update", name="lr_main", features=[...])   # linear-friendly
models(action="update", name="xgb_main", features=[...])  # all features
models(action="update", name="mlp_main", features=[...])  # normalized numerics
```
This creates genuine diversity because models literally see different data.
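The same idea in plain code, independent of any harness: each model trains on a different column slice of the same matrix. The column indices and model names here are hypothetical.

```python
import numpy as np

# Per-model feature views (column indices are hypothetical): each model
# sees a different slice of the same matrix, so their errors decorrelate
# by construction.
X = np.random.default_rng(1).normal(size=(100, 8))

feature_views = {
    "lr_main":  [0, 1, 2],        # linear-friendly columns only
    "xgb_main": list(range(8)),   # trees see everything
    "mlp_main": [0, 1, 2, 3, 4],  # normalized numerics
}

views = {name: X[:, cols] for name, cols in feature_views.items()}
for name, Xv in views.items():
    print(name, Xv.shape)
```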