Use when executing an experiment. Load `experiment-design` first to ensure you've thought through the hypothesis. This skill covers the mechanics and the discipline of execution.
experiments(action="create",
description="...",
hypothesis="..."
)
The hypothesis is required. It should follow the structure from experiment-design: what you're changing, why, what you expect, and what it would mean.
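As a sketch, using the reg_lambda overlay from later in this doc as the change under test (the wording and the specific reasoning are illustrative, not prescriptive):
experiments(action="create",
    description="Set reg_lambda to 0.5 on xgb_main",
    hypothesis="Changing: models.xgb_main.params.reg_lambda to 0.5. Why: fold diagnostics suggest the current regularization is mis-tuned. Expect: a small Brier improvement on the affected folds. Meaning: if it holds, the parameterization is the bottleneck, not the features."
)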
For follow-up experiments within the same strategy, link to the parent:
experiments(action="create",
description="...",
hypothesis="...",
parent_id="exp-001",
branching_reason="Adjusting parameterization based on fold 3 diagnosis"
)
experiments(action="write_overlay",
experiment_id="exp-001",
overlay={"models.xgb_main.params.reg_lambda": 0.5}
)
Overlays modify config in isolation. Production config is never touched until you explicitly promote.
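An overlay can carry more than one key. The dotted paths below follow the same layout as the example above; the max_depth entry is an illustrative addition, not a recommended value:
experiments(action="write_overlay",
    experiment_id="exp-001",
    overlay={
        "models.xgb_main.params.reg_lambda": 0.5,
        "models.xgb_main.params.max_depth": 4
    }
)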
experiments(action="run",
experiment_id="exp-001",
primary_metric="brier"
)
Or for quick iterations within a strategy:
experiments(action="quick_run",
description="...",
hypothesis="...",
overlay={...}
)
This is the most important step. Do not skip it.
Load the diagnosis skill. Read the results with intention:
pipeline(action="diagnostics")
pipeline(action="compare_latest")
Do not move on until you can explain what happened and why.
experiments(action="log_result",
experiment_id="exp-001",
conclusion="...",
verdict="...",
metrics={...},
baseline_metrics={...}
)
The conclusion is about what you learned, not just what the numbers say. A good conclusion explains what happened, why it happened, and what that implies for the next step. A bad conclusion restates the metrics.
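As a sketch of the difference, with illustrative numbers and wording:
experiments(action="log_result",
    experiment_id="exp-001",
    conclusion="Regularization helped only on folds with sparse history, where the model was fitting noise; gains elsewhere are within run-to-run variance. The next lever is probably features, not parameters.",
    verdict="partial",
    metrics={"brier": 0.1872},          # illustrative values
    baseline_metrics={"brier": 0.1890}  # illustrative values
)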
Also record the learning in the project notebook:
notebook(action="write", type="finding", content="...", experiment_id="exp-001")
The experiment journal captures the structured result. The notebook captures the insight — what this means for the bigger picture. If your theory of the target changed, write a theory entry too.
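Assuming the notebook tool accepts a theory type alongside finding (inferred from the wording above, not confirmed), that entry would look like:
notebook(action="write", type="theory", content="...", experiment_id="exp-001")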
Every experiment ends with one of the verdicts described in the table below: keep, partial, revert, or inconclusive.
When pursuing a strategy (e.g., "add regularization", "try a neural network", "engineer temporal features"), the experiment journal should show a trail of attempts with diagnosis between each one, not isolated experiments; a sketch of such a trail follows.
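A sketch of such a trail, with hypothetical experiment IDs and a placeholder branching reason:
experiments(action="create", description="...", hypothesis="...")   # exp-001: first attempt
# run, diagnose, log_result, then branch
experiments(action="create", description="...", hypothesis="...",
    parent_id="exp-001",
    branching_reason="Diagnosis showed the effect is fold-specific")   # exp-002: follow-up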
When an experiment improves the model AND you understand why:
experiments(action="promote", experiment_id="exp-001", primary_metric="brier")
Promote when the improvement is understood and consistent. Do not promote just because the primary metric improved. An unexplained improvement is a time bomb.
| Verdict | Meaning |
|---|---|
| keep | Improvement is understood and consistent. Promote. |
| partial | Directional improvement or improvement on a subset. Worth logging for future combination. |
| revert | After exhaustive attempts, the strategy is structurally flawed. Document why. |
| inconclusive | Ambiguous results. May revisit with more information. |
Verdicts are consequences of understanding, not goals. You don't run experiments to reach a verdict — you run them to learn. The verdict follows naturally from what you learned.