probabl-skills

Name: probabl-skills
Author: probabl-ai

Orchestrate end-to-end machine learning experimentation workflows using the Python data science stack, from project scaffolding and exploratory analysis through pipeline design, evaluation, testing, and iteration tracking.

Publisher marketplace: probabl-skills@probabl-skills · marketplace and plugin share one repository (probabl-ai/skills)

npx claudepluginhub probabl-ai/skills --plugin probabl-skills

Popularity

Stars

Top 10%

Med: 0·Avg: 527

Copy clicks

Med: 0·Avg: 1

What's Inside

Skills (14)

audit-ml-pipeline

/audit-ml-pipeline

Owns the `audit/` folder: one `# %%` (jupytext percent) Python file per experiment, aligned 1:1 with `experiments/NN_<short_name>.py` and `journal/NN_<short_name>.md`, that loads the experiment's skore report **read-only** and uses bare-last-expression cells whose `__repr__` carries the audit's signal. The agent executes the audit file via the bundled in-process runner (`audit-ml-pipeline/scripts/run_cells.py`, built on IPython `InteractiveShell.run_cell`), which streams a markdown digest of each cell's stdout + last-expression repr to stdout (optionally also to a file). The digest fuels narrative work (the `JOURNAL.md` Status + History update, follow-up questions about a past experiment, cross-experiment comparison). Stops at "audit/NN_*.py is placed, executed, and the digest is available." Never calls `skore.evaluate(...)` or `project.put(...)`.

TRIGGER (any of):
- `iterate-ml-experiment` § 4 record-outcome: audit is dispatched FIRST (replaces scratch probes for metric extraction).
- The user asks "audit experiment 02", "show me what 03 looks like", "re-audit 04 against the new report".
- An experiment was re-run (same `put()` key overwritten) and the matching audit file needs re-execution.
- The user wants a human-readable narrative of a past experiment without firing the full `iterate-from-skore` flow.

SKIP when:
- the design note isn't approved yet (route to `iterate-ml-experiment`);
- the experiment hasn't been run (no report on disk);
- the agent feature isn't installed (delegate to `python-env-manager` § "Agent feature");
- the user is mining the report to source the *next* experiment (`iterate-from-skore`);
- the user wants to explore the **raw dataset** rather than a finished run's skore report (`explore-ml-data`: audit reads a report, not the data).

HOW TO USE: confirm the four-way stem pairing exists (`journal/NN_*.md` approved + `experiments/NN_*.py` exists + smoke test passed + report under that key in the Project), then place `audit/NN_<short_name>.py` from `templates/audit.py`, substituting the package name + the literal Project init block copied from `experiments/<stem>.py`. Execute via the bundled runner: `pixi run -e agent python .agents/skills/audit-ml-pipeline/scripts/run_cells.py audit/<stem>.py`. **Read the Stop conditions and emit the Pre-flight checklist before any write or shell command.** Always invoke `python-api` for skore symbol signatures; never write them from memory.
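To make the "percent cells with a bare last expression" idea concrete, here is a minimal standard-library sketch of a cell runner. It is not the bundled `run_cells.py` (which the description says is built on IPython); the helper names `split_cells`, `run_cell`, and `digest`, the sample `AUDIT` script, and its metric values are all invented for illustration.

```python
import ast
import contextlib
import io
from typing import Optional


def split_cells(source: str) -> list:
    """Split a jupytext percent script into cell bodies at `# %%` markers."""
    cells, current = [], []
    for line in source.splitlines():
        if line.startswith("# %%"):
            if any(part.strip() for part in current):
                cells.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        cells.append("\n".join(current))
    return cells


def run_cell(code: str, namespace: dict) -> tuple:
    """Run one cell; return (captured stdout, repr of the bare last expression or None)."""
    tree = ast.parse(code)
    last_expr: Optional[ast.Expression] = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Detach the trailing expression so its value can be captured separately.
        last_expr = ast.Expression(tree.body.pop().value)
    buffer = io.StringIO()
    value = None
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<cell>", "exec"), namespace)
        if last_expr is not None:
            value = eval(compile(last_expr, "<cell>", "eval"), namespace)
    return buffer.getvalue(), (None if value is None else repr(value))


def digest(source: str) -> str:
    """Build a small markdown digest: one section per cell, in execution order."""
    namespace: dict = {}
    sections = []
    for index, code in enumerate(split_cells(source), start=1):
        stdout, last_repr = run_cell(code, namespace)
        sections.append(f"### Cell {index}")
        if stdout:
            sections.append("stdout: " + stdout.rstrip())
        if last_repr is not None:
            sections.append("repr: " + last_repr)
    return "\n".join(sections)


# Hypothetical audit script; the metric values are made up.
AUDIT = (
    "# %%\n"
    "scores = {'accuracy': 0.91, 'recall': 0.84}\n"
    "print('loaded', len(scores), 'metrics')\n"
    "\n"
    "# %%\n"
    "max(scores, key=scores.get)\n"
)

print(digest(AUDIT))
```

All cells share one namespace, so later cells can read names defined earlier, and only a trailing bare expression contributes a repr to the digest.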

build-ml-pipeline

/build-ml-pipeline

Declare the pipeline from data source to predictor as a **skrub DataOps graph** (not as a bare `sklearn.Pipeline`). Every step is either a pure-Python function (stateless) attached via `.skb.apply_func`, or a sklearn-compatible estimator (stateful) attached via `.skb.apply`. Stops at the declared object: no fit, split, tuning, persistence, or evaluation.

TRIGGER (any of):
- Writing or editing code that declares any link in the chain *data source → predictor*: loaders, preprocessing, encoders / imputers / scalers, feature steps, composition objects (`Pipeline`, `ColumnTransformer`, skrub `tabular_pipeline`, `nn.Module`), or the final estimator.
- A pure-Python data-processing function destined for the pipeline path (cleans / derives / reshapes), whether wrapped via `FunctionTransformer`, `@skrub.deferred` / `skrub.var`, a custom `BaseEstimator` subclass, or just called in the training path before the estimator.
- A step is added, removed, swapped, or reordered inside an existing pipeline declaration.
- A bare `sklearn.Pipeline` / `make_pipeline` is being used as the top-level: fire to redirect into a skrub DataOps graph.
- The user asks to build / declare / set up a pipeline / classifier / regressor for X.

SKIP when:
- `.fit(...)` calls / training loops / `Trainer.fit` / epoch loops;
- train/test split or cross-validation splitting;
- hyperparameter search;
- persistence (`joblib.dump`, checkpointing);
- evaluation / metrics / scoring;
- inference over a pre-trained model;
- pure EDA;
- library-choice questions with no concrete declaration in play.

HOW TO USE: consult before the first declarative line and on every structural edit (added/swapped step, changed input columns, changed estimator family). Don't re-consult for cosmetic edits. **First, read the Stop conditions and emit the Pre-flight checklist as visible text before any code.** Always invoke `python-api` to confirm skrub / sklearn symbol names and signatures before typing; don't guess from memory.

data-science-python-stack

/data-science-python-stack

Opinionated Python stack for data-science / ML work: one library per job, organized into tiers (mandatory / user choice / optional / transitive). SKILL.md is the index; per-library `references/<library>.md` files carry scope, "pick this when" / "pick something else when", and pairings.

TRIGGER when (any of these):
- (1) **a library import fails** in this stack's domain: the answer is install, not substitute (see § "Missing dependency");
- (2) **a library choice has to be made**, explicitly (the user asks "which library for X?") or implicitly (code is about to introduce a new dependency, or the project is being scaffolded and the tabular library hasn't been picked yet);
- (3) starting a new Python data-science / ML project;
- (4) the user or current code reaches for a substitute outside the stack (xgboost, lightgbm, black, isort, flake8, poetry, hatch), or reaches for `mlflow` to log params/metrics, or for `cross_val_score` + handwritten reporting. Redirect: tracking → `skore` Project API, evaluation / reporting → `skore` report classes, `mlflow` stays only for model serving / registry.

SKIP when:
- the project is non-Python;
- the work is web / backend / infra unrelated to data science;
- the library is already chosen and installed and the task is implementation inside it (bug fix, feature work, refactor) with no new dependency in play.

HOW TO USE: **read this SKILL.md end-to-end before recommending or installing anything**, because picking from a single index entry hides the tier (whether the library is mandatory, a user-choice, optional, or already transitively present) and the pairings, and both matter. Then read the linked `references/<library>.md` for the chosen library's scope and tradeoffs. Don't silently substitute one library for another; if no entry fits, surface the gap to the user.
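The "install, not substitute" rule for a failed import can be sketched with the standard library alone. The helpers `missing_packages` and `install_hint` below are hypothetical names, not part of the skill; the sketch only detects what is missing and reports it, leaving the install itself to the project's environment manager.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be resolved in this environment."""
    # Only pass top-level names: find_spec on a dotted name imports the parent first.
    return [name for name in import_names if find_spec(name) is None]


def install_hint(missing):
    """Turn the missing list into a message that asks for an install, never a swap."""
    if not missing:
        return "All required packages are importable."
    return (
        "Missing: " + ", ".join(missing)
        + ". Install them with the project's environment manager;"
        + " do not swap in a substitute library."
    )


# Import names can differ from distribution names (`sklearn` vs `scikit-learn`).
required = ["skrub", "skore", "sklearn"]
print(install_hint(missing_packages(required)))
```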

evaluate-ml-pipeline

/evaluate-ml-pipeline

Methodology for evaluating a single sklearn-compatible learner (in particular, the `SkrubLearner` produced by `build-ml-pipeline`). Owns: which entry point to call (`skore.evaluate` first, the explicit report classes when needed), which cross-validator to pick from scikit-learn's catalogue, how to consume the structural metadata (`groups`, `times`, …) attached at build time via `.skb.mark_as_X(split_kwargs=...)`. Stops at "what does the report say". Defaults (metrics, plots) come from skore; only override on explicit user request.

TRIGGER when:
- code calls `cross_val_score`, `cross_validate`, `classification_report`, or any handwritten metric print (`print(mean_squared_error(...))`);
- code calls `.skb.cross_validate(...)` (route through skore for richer output);
- user asks how to score, evaluate, or compare a single learner;
- user asks how to pick a cross-validator;
- user wants to see a report / metrics / diagnostic plots for a fitted learner.

SKIP when:
- declaring the pipeline (use `build-ml-pipeline`);
- hyperparameter / model search (separate skill);
- fitting, persisting, or serving the final model;
- tracking or comparing experiments across multiple runs over time (separate skill).

HOW TO USE: invoke before any evaluation call. **First, read the "Stop conditions" block at the top of the body and emit the Pre-flight checklist as visible text in your response; both are mandatory before any evaluation code is written.** The structural facts about the data (group keys, time ordering) should already be encoded at the X marker via `split_kwargs`; if they aren't and you can't tell from the data, return to `build-ml-pipeline` and ask the user. For symbol-level lookups, defer to `python-api` (skore symbols) and `python-api` (splitters); don't guess names from memory.
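To illustrate how structural facts such as group keys or time ordering can drive the choice of cross-validator, here is a small sketch using scikit-learn splitters. The decision order in `pick_splitter` is a common rule of thumb chosen for this example, not the skill's documented logic, and the function name and toy arrays are hypothetical.

```python
import numpy as np
from sklearn.model_selection import (
    GroupKFold,
    KFold,
    StratifiedKFold,
    TimeSeriesSplit,
)


def pick_splitter(*, has_groups, is_time_ordered, is_classification, n_splits=5):
    """Map structural facts about the data to a scikit-learn cross-validator."""
    if is_time_ordered:
        # Train on the past, test on the future.
        return TimeSeriesSplit(n_splits=n_splits)
    if has_groups:
        # Keep every group entirely on one side of each split.
        return GroupKFold(n_splits=n_splits)
    if is_classification:
        # Preserve class proportions in each fold.
        return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return KFold(n_splits=n_splits, shuffle=True, random_state=0)


# Toy data: six rows belonging to three groups (for example, three customers).
X = np.arange(12).reshape(6, 2)
groups = np.array([0, 0, 1, 1, 2, 2])

cv = pick_splitter(
    has_groups=True, is_time_ordered=False, is_classification=False, n_splits=3
)
for train_idx, test_idx in cv.split(X, groups=groups):
    # No group appears on both sides of a split.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```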

explore-ml-data

/explore-ml-data

Owns data understanding BEFORE any model is designed. Places and executes `data/eda.py` (a jupytext `# %%` script) via the shared in-process runner, reads the streamed digest, then writes a persisted `data/eda.md` report (plus linked `data/eda_<table>.html` skrub `TableReport` pages) and the `## Data understanding (EDA)` section of `journal/JOURNAL.md`. The point is to surface the dataset facts (shape, dtypes, missingness, cardinality, target balance / skew, datetime / group structure, feature associations) that JUSTIFY the later learner / splitter / metric decisions, so the user understands *why* the modelling choices are made. Uses `skrub.TableReport` for dataframe overviews and the shared runner `audit-ml-pipeline/scripts/run_cells.py`. Stops at "EDA executed, `data/eda.md` + HTML written, JOURNAL EDA section updated." Never designs the model, never edits `src/<pkg>/`, never modifies the user's raw data files.

TRIGGER (any of):
- `iterate-ml-experiment` § 0 bootstrap, BEFORE the baseline design note: the G-EDA gate fires here (run / skip).
- The user asks to "explore the data", "do an EDA", "profile the dataset", "what does the data look like", "understand the data".
- A new or changed data source needs (re-)understanding before the next experiment.

SKIP when:
- the workspace isn't scaffolded / bootstrapped yet: `iterate-ml-experiment` § 0 owns bootstrap ordering and will dispatch here at the G-EDA step, so don't run standalone ahead of scaffolding (route to `iterate-ml-experiment` / `organize-ml-workspace`);
- there is no data to explore yet;
- the user wants to inspect a finished run's skore report rather than the raw dataset (`audit-ml-pipeline`);
- the user is past data understanding and wants pipeline / evaluation mechanics (`build-ml-pipeline` / `evaluate-ml-pipeline`);
- a pure symbol lookup (`python-api`);
- EDA is already recorded (`data/eda.md` + the JOURNAL EDA section exist) and the user is not asking to refresh it.

HOW TO USE: run the Detection step (does `data/eda.md` + the JOURNAL EDA section already exist?), emit the Pre-flight checklist as visible text, read the Stop conditions, then place `data/eda.py` from `templates/eda.py`, execute it via the shared runner, read the digest, and author `data/eda.md` + the JOURNAL EDA section. Always resolve skrub / pandas / polars symbols via `python-api`, never from memory.
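A few of the dataset facts listed above (shape, dtypes, missingness, cardinality, target balance) can be computed with plain pandas. This is a minimal sketch, not the skill's `templates/eda.py`; the `dataset_facts` helper and the toy table are invented for illustration, and the skill itself relies on `skrub.TableReport` for the richer overview.

```python
import pandas as pd


def dataset_facts(df: pd.DataFrame, target: str) -> dict:
    """Collect basic facts that later justify learner, splitter, and metric choices."""
    per_column = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing_fraction": df.isna().mean(),
            "n_unique": df.nunique(dropna=True),
        }
    )
    return {
        "shape": df.shape,
        "per_column": per_column,
        # Class proportions: a strong imbalance argues for stratified splitting.
        "target_balance": df[target].value_counts(normalize=True),
    }


# Toy table, invented for illustration only.
toy = pd.DataFrame(
    {
        "city": ["Paris", "Lyon", None, "Paris"],
        "amount": [10.0, None, 7.5, 3.0],
        "label": [1, 0, 0, 0],
    }
)

facts = dataset_facts(toy, target="label")
print(facts["shape"])
print(facts["per_column"])
print(facts["target_balance"])
```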

Stats

Version: 0.6.0

Released: Jun 23, 2026

Language: Python

Stars: 55

Forks: 4

Maintenance: Excellent

License: BSD-3-Clause

Last Commit: Jul 10, 2026

Added: May 17, 2026

README

Probabl Skills

A set of skills to partner with you throughout your machine learning experimentation journey. It helps you with:

- organizing your workspace
- building your machine learning pipeline with the right libraries while ensuring good methodologies
- evaluating and storing your results so you can easily audit and get insights from them
- coupling it with Skore Hub to get a comprehensive view of your experiments and their results
- iterating on your next experiments using insights from Skore diagnostics and your own feedback

The aim is to let you focus on the science while AI agents handle the implementation, guided by two important ingredients: great libraries for maintainability and good methodologies to run experiments correctly.

In practice, from a prompt such as:

╭────────────────────────────────────────────────────────────────────────╮
│ > Given the context in the file `data/README.md` and the data located  │
│   in `data/`, let's build a first machine learning pipeline that will  │
│   serve as baseline for the next experiments that we are going to run  │
│   together.                                                            │
╰────────────────────────────────────────────────────────────────────────╯

you can expect your agent to start experimenting with you. The skills work well with models such as Claude Opus and Sonnet and give great results with smaller models such as Qwen 3.6 30B or DeepSeek v4 Flash. As for agent harnesses, we tested them with Claude Code, OpenCode, Cursor, and GitHub Copilot and found no significant difference in terms of skill invocation.

Install

Install the skills with the skore CLI, which is available from PyPI and conda-forge.

First install skore-cli:

# with pip
pip install skore-cli
# with uv
uv tool install skore-cli
# with pixi
pixi global install skore-cli

Then run the following command:

skore skills install

You can use uvx or pixi exec to install the skore CLI and directly run the command in an isolated environment:

uvx --from skore-cli skore skills install

pixi exec --spec skore-cli skore skills install

If you prefer npx, then you can use:

npx skills add probabl-ai/skills

Alternative — Claude Code plugin marketplace

If you only use Claude Code and prefer the native plugin flow, this repo is also a Claude Code plugin marketplace:

/plugin marketplace add probabl-ai/skills

/plugin install probabl-skills@probabl-skills

Running /plugin update pulls new releases.

Skills in detail

ML pipeline lifecycle

| Skill | Description |
| --- | --- |
| explore-ml-data | Explore the dataset before designing any model. |
| build-ml-pipeline | Build a machine learning pipeline from the data source to the learner, including multi-table engineering. |
| evaluate-ml-pipeline | Evaluate a complex machine learning pipeline and get structured reports including metrics, plots, and diagnostics. |
| test-ml-pipeline | Make sure that your machine learning pipeline is production-ready, both statistically and functionally. |
| smoke-test-ml-pipeline | Stress test your machine learning pipeline on future data to make sure it works. |
| audit-ml-pipeline | Once testing and the experiment are done, audit the run by loading its skore report and investigating it. |

Iteration loop

| Skill | Description |
| --- | --- |
| iterate-ml-experiment | Design experiments, keep track of them, and iterate on them. |
| iterate-from-skore | Use skore to run diagnostics and checks that can be reported and addressed in the next experiment. |
| iterate-from-user | Stay in the loop as a user and propose new experiments from free text, a scientific article URL, or a resource link (GitHub issue / spec / reference repo). |

Workspace and tooling

| Skill | Description |
| --- | --- |
| organize-ml-workspace | Keep an organized workspace to track your experiments. |
| python-code-style | Enforce good practices for your Python code out of the box. |
| python-env-manager | Bootstrap the experiment setup with your favorite Python environment manager. |
| data-science-python-stack | Opinionated one-library-per-job Python stack, organized into mandatory / user-choice / optional / transitive tiers. |
