AI-driven ML experimentation framework for Claude Code
npx claudepluginhub msilverblatt/harness-ml

AI-driven ML experimentation — add data, discover features, build models, run experiments via natural language
An Agent-Computer Interface (ACI) for machine learning.
Built on protomcp and the Model Context Protocol.
Note: HarnessML is an active research project exploring what agent-driven ML workflows can look like. It is functional and demonstrates the core ideas, but it has not been stress-tested for production use. Expect rough edges, missing error messages, and workflows that assume a cooperative agent. Feedback and contributions welcome.
Training machine learning models with coding agents is a frustrating experience. They generate endless boilerplate, waste tokens debugging it, and forget why they were running an experiment in the first place. Coding agents want to write code, so every experiment becomes an engineering project instead of a scientific one.
HarnessML fixes this. Built on protomcp, the agent calls structured MCP tools instead of writing training loops. Data ingestion, feature engineering, cross-validation, calibration, ensembling, and diagnostics all run through deterministic tool calls. The experiment lifecycle is a server-defined workflow with dynamic tool visibility — the agent literally cannot skip logging results or start a new experiment without completing the current one, because the tools for those steps aren't visible until the right moment.
Every experiment requires a hypothesis. Every run is fingerprinted and logged. Experiments survive session boundaries.
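The dynamic tool visibility described above can be sketched roughly as a phase machine. This is a minimal illustration only, not HarnessML's actual server code; the phase names and tool ids are assumptions:

```python
# Hypothetical sketch of workflow-gated tool visibility: the server only
# advertises the tools that are valid in the current lifecycle phase, so
# the agent cannot call (or even see) out-of-order steps.
from enum import Enum

class Phase(Enum):
    IDLE = "idle"                      # no experiment open
    EXPERIMENT_OPEN = "experiment_open"  # hypothesis declared, ready to run
    RESULTS_PENDING = "results_pending"  # backtest done, must log results

# Illustrative tool ids; the real tool surface may differ.
TOOLS_BY_PHASE = {
    Phase.IDLE: {"experiments.create", "data.ingest", "models.add"},
    Phase.EXPERIMENT_OPEN: {"pipeline.run_backtest"},
    Phase.RESULTS_PENDING: {"experiments.log_result", "pipeline.compare_latest"},
}

def visible_tools(phase: Phase) -> set[str]:
    """What the agent sees over MCP right now; nothing more."""
    return TOOLS_BY_PHASE[phase]
```

Under this scheme, "cannot skip logging results" falls out naturally: `experiments.create` simply is not in the visible set until the pending results are logged.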
https://github.com/user-attachments/assets/b5205517-c9c2-403b-8bbb-d0b15a79e807
In Claude Code:
/plugin marketplace add msilverblatt/harness-ml
/plugin install harnessml@msilverblatt-harness-ml
This installs the MCP server, experiment discipline skills, and everything Claude needs to start building models.
git clone https://github.com/msilverblatt/harness-ml.git && cd harness-ml
uv sync
uv run harness-setup
For full setup options, see For Humans.
The agent never writes training loops. It declares intent through MCP tool calls:
# Ingest data
data(action="ingest", path="data/raw/housing.csv")
# Add models
models(action="add", name="xgb_main", type="xgboost", features=[...])
models(action="add", name="lgb_main", type="lightgbm", features=[...])
# Train and evaluate
pipeline(action="run_backtest")
# → CV splits, training, calibration, ensemble, metrics, logging — all automatic
# Compare to previous
pipeline(action="compare_latest")
# → "RMSE: 24,312 → 22,847 (improved)"
# Iterate with discipline
experiments(action="create", hypothesis="Adding neighborhood features captures location premium")
pipeline(action="run_backtest")
# → Isolated overlay — production config untouched until promoted
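The "isolated overlay" idea can be sketched as a config layer that shadows production settings without mutating them. The function names here are illustrative assumptions, not HarnessML's API:

```python
# Hypothetical sketch: an open experiment keeps its changes in a separate
# overlay dict; the pipeline runs against the merged view, and production
# config only changes on an explicit promote.
def effective_config(production: dict, overlay: dict) -> dict:
    """The config a backtest actually runs with while an experiment is open."""
    merged = dict(production)
    merged.update(overlay)  # experiment changes win, non-destructively
    return merged

def promote(production: dict, overlay: dict) -> dict:
    """Only a deliberate promote folds the overlay into production."""
    return effective_config(production, overlay)

prod = {"features": ["sqft", "beds"], "model": "xgboost"}
exp = {"features": ["sqft", "beds", "neighborhood"]}

run_cfg = effective_config(prod, exp)
assert prod["features"] == ["sqft", "beds"]  # production untouched
```

Abandoning the experiment is then just dropping the overlay; nothing needs to be rolled back.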
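Run fingerprinting can be sketched as a stable hash over the resolved config and the data snapshot. Which fields HarnessML actually hashes is an assumption here; the point is that identical runs get identical ids:

```python
# Hypothetical sketch of run fingerprinting: canonicalize the run's inputs
# and hash them, so duplicate runs are detectable and every logged result
# is attributable to an exact configuration.
import hashlib
import json

def run_fingerprint(config: dict, data_version: str) -> str:
    payload = json.dumps(
        {"config": config, "data": data_version},
        sort_keys=True,  # canonical key order -> stable hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

fp1 = run_fingerprint({"model": "xgboost", "cv": 5}, "housing@v3")
fp2 = run_fingerprint({"cv": 5, "model": "xgboost"}, "housing@v3")
assert fp1 == fp2  # key order does not change the fingerprint
```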
https://github.com/user-attachments/assets/c180d2b2-7ed1-4805-a08a-01b6fb3738ac
examples/titanic · examples/ames-housing · examples/wine-quality
A companion web dashboard that runs alongside the agent, giving you full observability into what it's doing and how the model is performing. See the harness-studio repo for details.
Project vitals, experiment verdict breakdown, primary metric trend with error bars, live MCP activity feed, and a mini pipeline DAG — all updating in real time as the agent works.
Interactive pipeline topology. Click any node for full config details. Models added by experiments show with dashed borders and EXP badges. Running nodes pulse during training.