Use before creating any experiment. This is the thinking step. If you skip it, you'll run experiments that don't teach you anything.
npx claudepluginhub msilverblatt/harness-ml --plugin harnessml

This skill uses the workspace's default tool permissions.
Before you create an experiment, answer these five questions. If you can't answer all five, you're not ready.
Start from a question, not an action. "Try lower learning rate" is an action; a question names what you want to learn, e.g. "Is the model memorizing the low-sample folds?" The question determines what you look at in the results. Without it, you'll just check whether the aggregate metric went up or down, and learn nothing.
One change per experiment. If you change learning_rate AND add features AND switch calibration, you can't attribute the result to anything.
Exception: Mechanically coupled changes. Lowering learning_rate requires raising n_estimators to compensate — that's one logical change, not two. Document the coupling in the hypothesis.
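The one-change rule can be enforced mechanically before a run. A minimal sketch, assuming dict-style configs; the keys, values, and `couplings` set are hypothetical, mirroring the learning_rate/n_estimators coupling above:

```python
# Sketch: refuse to run an experiment whose config differs from the
# baseline by more than one logical change. A declared coupling
# (e.g. learning_rate + n_estimators) counts as one change.

def changed_keys(baseline, candidate):
    keys = set(baseline) | set(candidate)
    return {k for k in keys if baseline.get(k) != candidate.get(k)}

def logical_changes(baseline, candidate, couplings=()):
    changed = changed_keys(baseline, candidate)
    groups = []
    for coupled in couplings:            # each coupling is one logical change
        if changed & set(coupled):
            groups.append(changed & set(coupled))
            changed -= set(coupled)
    groups.extend({k} for k in changed)  # every remaining key stands alone
    return groups

baseline = {"learning_rate": 0.1, "n_estimators": 200, "reg_lambda": 0}
candidate = {"learning_rate": 0.05, "n_estimators": 400, "reg_lambda": 0}
groups = logical_changes(baseline, candidate,
                         couplings=[{"learning_rate", "n_estimators"}])
assert len(groups) == 1, f"too many changes: {groups}"
```

Document the coupling in the hypothesis either way; the code only stops you from running an unattributable experiment by accident.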
Be specific. Not "the metric improves." Think about which folds or segments will move, in which direction, and by roughly how much.
This is what makes the result interpretable. If you predicted fold 3 would improve and it did, that's confirmation. If folds you didn't expect to change moved instead, that's a different signal entirely.
Equally important: decide in advance what you'll conclude if the metric gets worse. A negative result should still rule something out.
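A fold-level prediction can be checked mechanically once results are in. A sketch under hypothetical numbers (deltas are after-minus-before Brier scores, so negative means improvement; the threshold is illustrative):

```python
# Sketch: classify a per-fold result against the prediction.
# deltas: fold -> (after - before) Brier score; negative = improved.

def interpret(deltas, predicted_folds, min_gain=0.005):
    improved = {f for f, d in deltas.items() if d <= -min_gain}
    worsened = {f for f, d in deltas.items() if d >= min_gain}
    if worsened:
        return "worse: revisit the mechanism"
    if improved == set(predicted_folds):
        return "confirmed: predicted folds improved"
    if improved:
        return "unexpected: different folds moved"
    return "neutral: no fold cleared the threshold"

deltas = {1: 0.001, 2: -0.008, 3: -0.001, 6: -0.006}
print(interpret(deltas, predicted_folds=[2, 6]))
# → confirmed: predicted folds improved
```

The "unexpected" branch is the interesting one: folds you didn't predict moving is a different signal, not a partial win.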
Write it in this structure:
Changing [single variable] because [mechanism/reasoning] expecting [specific predicted outcome with magnitude] which would mean [what it tells us about the model/data].
Example:
Changing reg_lambda from 0 to 0.5 on xgb_main because folds 2 and 6 show >0.02 train-test Brier gap, suggesting the model memorizes low-sample folds. Expecting folds 2 and 6 to improve by 0.005-0.01, other folds neutral, overall Brier improves 0.003-0.005. Which would mean the ensemble is currently overfitting on edge cases and uniform regularization is sufficient to address it.
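The template can be kept as structured data so every experiment records the same four fields. A minimal sketch; the class and field names are my own, and the field values abbreviate the example above:

```python
from dataclasses import dataclass

# Sketch: the hypothesis template as a record, so every experiment
# logs the same four fields. Names are illustrative, not part of
# the skill's API.

@dataclass
class Hypothesis:
    change: str          # the single variable being changed
    mechanism: str       # why you believe the change will act
    prediction: str      # specific expected outcome, with magnitude
    interpretation: str  # what a confirmed prediction would mean

    def render(self) -> str:
        return (f"Changing {self.change} because {self.mechanism} "
                f"expecting {self.prediction} which would mean "
                f"{self.interpretation}.")

h = Hypothesis(
    change="reg_lambda from 0 to 0.5 on xgb_main",
    mechanism="folds 2 and 6 show >0.02 train-test Brier gap",
    prediction="folds 2 and 6 improve by 0.005-0.01, overall Brier by 0.003-0.005",
    interpretation="the ensemble is overfitting edge cases",
)
print(h.render())
```

A record like this also makes it trivial to diff what you predicted against what happened when you write up the result.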
Before your first attempt, sketch a rough plan for the strategy: which value you'll try first, what you'll fall back to if it fails, and when you'll stop.
This prevents the most common failure: trying one config, seeing it fail, and abandoning the entire strategy. You committed to investigating the question, not to one parameterization of it.
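The plan can be as small as an ordered ladder of attempts plus a stop rule, committed to before the first run. A sketch with hypothetical values:

```python
# Sketch: commit to a short ladder of attempts up front, so one
# failed config doesn't end the investigation. All values are
# hypothetical.

plan = {
    "question": "is the model overfitting low-sample folds?",
    "attempts": [
        {"reg_lambda": 0.5},   # first guess
        {"reg_lambda": 1.0},   # fallback if 0.5 is too weak
        {"reg_lambda": 0.25},  # fallback if 1.0 overshoots
    ],
    "stop_rule": "abandon only after every attempt fails to move folds 2 and 6",
}

for i, attempt in enumerate(plan["attempts"], start=1):
    print(f"attempt {i}: {attempt}")
```

The point is the commitment, not the format: you decide up front how many parameterizations the question deserves.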
Sometimes the right move is not to run an experiment at all. If you can't state a mechanism, go back to domain-research or diagnosis to generate a real hypothesis first.