AI-driven ML experimentation framework for Claude Code
npx claudepluginhub msilverblatt/harness-ml

AI-driven ML experimentation — add data, discover features, build models, run experiments via natural language
An Agent-Computer Interface (ACI) for machine learning.
Built on protomcp and the Model Context Protocol.
Note: HarnessML is an active research project exploring what agent-driven ML workflows can look like. It is functional and demonstrates the core ideas, but it has not been stress-tested for production use. Expect rough edges, missing error messages, and workflows that assume a cooperative agent. Feedback and contributions welcome.
Training machine learning models with coding agents is a frustrating experience. They generate endless boilerplate, waste tokens debugging it, and forget why they were running an experiment in the first place. Coding agents want to write code, so every experiment becomes an engineering project instead of a scientific one.
HarnessML fixes this. Built on protomcp, the agent calls structured MCP tools instead of writing training loops. Data ingestion, feature engineering, cross-validation, calibration, ensembling, and diagnostics all run through deterministic tool calls. The experiment lifecycle is a server-defined workflow with dynamic tool visibility — the agent literally cannot skip logging results or start a new experiment without completing the current one, because the tools for those steps aren't visible until the right moment.
Every experiment requires a hypothesis. Every run is fingerprinted and logged. Experiments survive session boundaries.
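The dynamic tool visibility described above can be sketched roughly as a phase machine. This is a minimal illustration only, not HarnessML's actual server code; the phase names and tool ids are assumptions:

```python
# Hypothetical sketch of workflow-gated tool visibility: the server only
# advertises the tools that are valid in the current lifecycle phase, so
# the agent cannot call (or even see) out-of-order steps.
from enum import Enum

class Phase(Enum):
    IDLE = "idle"                      # no experiment open
    EXPERIMENT_OPEN = "experiment_open"  # hypothesis declared, ready to run
    RESULTS_PENDING = "results_pending"  # backtest done, must log results

# Illustrative tool ids; the real tool surface may differ.
TOOLS_BY_PHASE = {
    Phase.IDLE: {"experiments.create", "data.ingest", "models.add"},
    Phase.EXPERIMENT_OPEN: {"pipeline.run_backtest"},
    Phase.RESULTS_PENDING: {"experiments.log_result", "pipeline.compare_latest"},
}

def visible_tools(phase: Phase) -> set[str]:
    """What the agent sees over MCP right now; nothing more."""
    return TOOLS_BY_PHASE[phase]
```

Under this scheme, "cannot skip logging results" falls out naturally: `experiments.create` simply is not in the visible set until the pending results are logged.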
https://github.com/user-attachments/assets/b5205517-c9c2-403b-8bbb-d0b15a79e807
In Claude Code:
/plugin marketplace add msilverblatt/harness-ml
/plugin install harnessml@msilverblatt-harness-ml
This installs the MCP server, experiment discipline skills, and everything Claude needs to start building models.
git clone https://github.com/msilverblatt/harness-ml.git && cd harness-ml
uv sync
uv run harness-setup
For full setup options, see For Humans.
The agent never writes training loops. It declares intent through MCP tool calls:
# Ingest data
data(action="ingest", path="data/raw/housing.csv")
# Add models
models(action="add", name="xgb_main", type="xgboost", features=[...])
models(action="add", name="lgb_main", type="lightgbm", features=[...])
# Train and evaluate
pipeline(action="run_backtest")
# → CV splits, training, calibration, ensemble, metrics, logging — all automatic
# Compare to previous
pipeline(action="compare_latest")
# → "RMSE: 24,312 → 22,847 (improved)"
# Iterate with discipline
experiments(action="create", hypothesis="Adding neighborhood features captures location premium")
pipeline(action="run_backtest")
# → Isolated overlay — production config untouched until promoted
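The "isolated overlay" idea can be sketched as a config layer that shadows production settings without mutating them. The function names here are illustrative assumptions, not HarnessML's API:

```python
# Hypothetical sketch: an open experiment keeps its changes in a separate
# overlay dict; the pipeline runs against the merged view, and production
# config only changes on an explicit promote.
def effective_config(production: dict, overlay: dict) -> dict:
    """The config a backtest actually runs with while an experiment is open."""
    merged = dict(production)
    merged.update(overlay)  # experiment changes win, non-destructively
    return merged

def promote(production: dict, overlay: dict) -> dict:
    """Only a deliberate promote folds the overlay into production."""
    return effective_config(production, overlay)

prod = {"features": ["sqft", "beds"], "model": "xgboost"}
exp = {"features": ["sqft", "beds", "neighborhood"]}

run_cfg = effective_config(prod, exp)
assert prod["features"] == ["sqft", "beds"]  # production untouched
```

Abandoning the experiment is then just dropping the overlay; nothing needs to be rolled back.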
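Run fingerprinting can be sketched as a stable hash over the resolved config and the data snapshot. Which fields HarnessML actually hashes is an assumption here; the point is that identical runs get identical ids:

```python
# Hypothetical sketch of run fingerprinting: canonicalize the run's inputs
# and hash them, so duplicate runs are detectable and every logged result
# is attributable to an exact configuration.
import hashlib
import json

def run_fingerprint(config: dict, data_version: str) -> str:
    payload = json.dumps(
        {"config": config, "data": data_version},
        sort_keys=True,  # canonical key order -> stable hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

fp1 = run_fingerprint({"model": "xgboost", "cv": 5}, "housing@v3")
fp2 = run_fingerprint({"cv": 5, "model": "xgboost"}, "housing@v3")
assert fp1 == fp2  # key order does not change the fingerprint
```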
https://github.com/user-attachments/assets/c180d2b2-7ed1-4805-a08a-01b6fb3738ac
examples/titanic · examples/ames-housing · examples/wine-quality
A companion web dashboard that runs alongside the agent, giving you full observability into what it's doing and how the model is performing. See the harness-studio repo for details.
Project vitals, experiment verdict breakdown, primary metric trend with error bars, live MCP activity feed, and a mini pipeline DAG — all updating in real time as the agent works.
Interactive pipeline topology. Click any node for full config details. Models added by experiments show with dashed borders and EXP badges. Running nodes pulse during training.