From probabl-skills
Converts ML pipeline audit digest rows with severity `issue` or `tip` into actionable Backlog items by following each row's documentation URL. Meant to be triggered after an experiment has finished and the user wants to mine the diagnostic report for next steps.
npx claudepluginhub probabl-ai/skills --plugin probabl-skills
Slash command
/probabl-skills:iterate-from-skore
Source: the audit digest at scratch/audit/<stem>/audit.md,
produced by audit-ml-pipeline at § 4 record-outcome.
Output: a set of Backlog-candidate rows + a short human
summary, handed back to iterate-ml-experiment. The parent skill
writes the rows to JOURNAL.md Backlog and re-presents the
sourcing menu so the user can promote one via B<N>.
The digest carries two sections that matter here:
- `## Checks summary`: a DataFrame whose rows each have code,
  severity (passed / issue / tip), and documentation_url.
  Each issue / tip row → one Backlog candidate, with the
  documentation_url driving the Item text.
- `## Metrics summary`: task-appropriate headline metrics
  (regression / classification / multiclass). Used to ground the
  human summary paragraph ("the run achieved X but the SKD003
  check flagged Y"). Does not drive Backlog rows on its own.

Nothing else. The audit template intentionally stops at these two sections; deeper accessors (residuals, importance, calibration, …) are out of scope here.
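As an illustration of how the two sections can be picked apart as plain text, here is a minimal Python sketch. It assumes the checks DataFrame was rendered as a markdown pipe table with `code`, `severity`, and `documentation_url` columns; the actual layout of audit.md may differ. The sample digest, the `example.invalid` URLs, the SKD001 and SKD007 codes, and the helper names (`section`, `parse_table`, `actionable_checks`) are all hypothetical. The skill itself does this by reading the digest as text, not by running Python.

```python
import re

# Hypothetical digest excerpt; the real audit.md layout may differ.
SAMPLE_DIGEST = """\
## Checks summary

| code   | severity | documentation_url              |
|--------|----------|--------------------------------|
| SKD001 | passed   | https://example.invalid/SKD001 |
| SKD003 | issue    | https://example.invalid/SKD003 |
| SKD007 | tip      | https://example.invalid/SKD007 |

## Metrics summary

| metric | value |
|--------|-------|
| r2     | 0.81  |
"""


def section(digest: str, title: str) -> str:
    """Return the text under '## <title>' up to the next '## ' heading."""
    pattern = rf"^## {re.escape(title)}\s*\n(.*?)(?=^## |\Z)"
    match = re.search(pattern, digest, flags=re.MULTILINE | re.DOTALL)
    return match.group(1) if match else ""


def parse_table(block: str) -> list:
    """Parse a markdown pipe table into a list of row dicts."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def actionable_checks(digest: str) -> list:
    """Keep only the rows whose severity is issue or tip."""
    rows = parse_table(section(digest, "Checks summary"))
    return [row for row in rows if row.get("severity") in {"issue", "tip"}]


actionable = actionable_checks(SAMPLE_DIGEST)
for row in actionable:
    print(row["code"], row["severity"], row["documentation_url"])
```

Only the checks table is filtered by severity; the metrics table is read separately and never turns into rows.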
The audit already opened the Project, loaded the report, called the two accessors, and rendered the output as markdown. Re-doing that work here would duplicate the cost of materialising Display objects, risk drift between two walks, and require the agent environment this skill should not need. Reading the digest as text is cheaper and deterministic.
This skill never writes journal/ files (including
JOURNAL.md) — the parent owns those. It returns two artifacts as
conversation text:
- Backlog-candidate rows: one row per actionable check from the digest. Each row carries:
  - Item: one-line experiment idea derived from the check's
    documentation_url content. Phrase as an experiment idea,
    not as a metric reading.
  - Source: audit:<stem>:checks.<code> (e.g.
    audit:01_baseline:checks.SKD003). The citation is
    load-bearing for dedup.
- Summary: one paragraph for the user covering how many findings were surfaced, the top 2-3 by severity, and the headline numbers from the metrics summary as context. Keep it dense.
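The row shape above can be sketched as a small data structure. This is an illustrative sketch, not part of the skill: `BacklogCandidate` and `source_citation` are hypothetical names, and only the `audit:<stem>:checks.<code>` citation format comes from the contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogCandidate:
    """One Backlog-candidate row (hypothetical container)."""
    item: str    # one-line experiment idea, not a metric reading
    source: str  # citation used for dedup


def source_citation(stem: str, code: str) -> str:
    """Build the Source citation in the audit:<stem>:checks.<code> format."""
    return f"audit:{stem}:checks.{code}"


candidate = BacklogCandidate(
    item="<one-line experiment idea derived from the docs URL>",
    source=source_citation("01_baseline", "SKD003"),
)
print(candidate.source)
```

Keeping the citation a pure function of stem and check code is what makes it usable as a dedup key.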
If the parent's Backlog already contains a row with the same
Source citation, drop the candidate — do not duplicate. The
summary should note the number of dropped duplicates ("4 new
findings; 2 were already in Backlog from prior mining").
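A minimal sketch of that dedup rule, assuming candidates are (item, source) pairs and the existing Backlog citations have already been read from JOURNAL.md as strings. The `dedup` helper and the SKD007 and SKD011 codes are hypothetical.

```python
def dedup(candidates, existing_sources):
    """Drop candidates whose Source citation is already known.

    Returns (kept_rows, dropped_count). A citation repeated within
    the candidates themselves is also kept only once.
    """
    seen = set(existing_sources)
    kept, dropped = [], 0
    for item, source in candidates:
        if source in seen:
            dropped += 1
            continue
        seen.add(source)
        kept.append((item, source))
    return kept, dropped


candidates = [
    ("<idea A>", "audit:01_baseline:checks.SKD003"),
    ("<idea B>", "audit:01_baseline:checks.SKD007"),
    ("<idea C>", "audit:01_baseline:checks.SKD011"),
]
existing = ["audit:01_baseline:checks.SKD007"]

kept, dropped = dedup(candidates, existing)
print(len(kept), "kept;", dropped, "dropped as duplicates")
```

The dropped count is returned alongside the rows so the summary can report it.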
If the digest's checks summary has no issue / tip rows (only
passed), return zero candidate rows and a summary that says so
explicitly: "the report looks clean on the checks surface; no
actionable findings on this turn." The parent will note this in
JOURNAL.md Status and the user picks `user` from the sourcing menu next.
If the digest at scratch/audit/<stem>/audit.md cannot be read
(file missing, audit never executed, audit errored), do not
fabricate findings from memory and do not re-run probes. Return
zero rows and a summary that explains the access failure. The
parent surfaces the gap to the user; recovery is owned by
audit-ml-pipeline (re-run the audit runner, fix the auth, …).
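The two zero-row outcomes (clean report, unreadable digest) can be sketched as an early-exit check. This is an illustrative sketch under assumptions: `read_digest` and `early_exit` are hypothetical helpers, and the wording of the access-failure summary is invented for the example. Only the clean-report sentence is quoted from the text above.

```python
from pathlib import Path
from typing import List, Optional, Tuple

ACTIONABLE = {"issue", "tip"}


def read_digest(path: Path) -> Optional[str]:
    """Return the digest text, or None if the file cannot be read."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError:  # missing file, permission problem, ...
        return None


def early_exit(digest: Optional[str], severities: List[str]) -> Optional[Tuple[list, str]]:
    """Return (rows, summary) for the two zero-row cases, else None."""
    if digest is None:
        # Never fabricate findings: zero rows plus an explanation.
        return [], ("could not read the audit digest; no findings were derived. "
                    "Recovery is owned by audit-ml-pipeline.")
    if not any(sev in ACTIONABLE for sev in severities):
        return [], ("the report looks clean on the checks surface; "
                    "no actionable findings on this turn.")
    return None  # at least one issue / tip row: continue with the normal flow


missing = read_digest(Path("this/path/does/not/exist/audit.md"))
print(early_exit(missing, []))
print(early_exit("digest text", ["passed", "passed"]))
```

Both cases return an empty row list, so the parent can tell "nothing to do" from "nothing could be read" only through the summary text.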
- Never write journal/ files. That includes JOURNAL.md. This skill returns rows as conversation text; the parent writes them.
- Never open the Project. That is owned by audit-ml-pipeline; never call project.get(...) from iterate-from-skore.
- Only `## Checks summary` rows drive Backlog candidates. The metrics summary is context for the human paragraph; it does not produce Backlog rows on its own. Deeper diagnostic surfaces (residuals, feature importance, calibration, …) are not in the audit template and not in scope here.
- Follow the documentation_url. For each issue / tip check, fetch the linked skore docs page (via WebFetch) and derive the Backlog Item from what the page recommends. Do not invent mitigations from training-data memory of skore.
- Do not promote candidates. Promotion is the user's choice, made from the parent's sourcing menu (via B<N>).
- Dedup by Source citation. Read JOURNAL.md Backlog before emitting; skip any candidate whose Source matches an existing row.
- No Python execution. Reading the digest is a Read tool call; fetching the doc URL is a WebFetch call. No pixi run python …, no python -c …. The only side effect this skill triggers is re-executing the audit runner (via audit-ml-pipeline) when the digest is missing.

1. Locate the digest. The digest for a done experiment lives at scratch/audit/<stem>/audit.md. If multiple done experiments exist, default to the most recent; surface the choice to the user only if they ask.
2. Read the digest with the Read tool.
3. Parse the `## Checks summary` section. For every row whose severity is issue or tip:
   - Fetch the documentation_url with WebFetch. The page describes what the check tests and what to try next.
   - Derive the Item from the page's recommended mitigations, phrased as a one-line experiment idea.
   - Cite the Source as audit:<stem>:checks.<code> (e.g. audit:01_baseline:checks.SKD003).
4. Dedup against the JOURNAL.md Backlog. Drop candidates whose citation already exists.
5. Read `## Metrics summary` for context only: the headline metrics anchor the human summary paragraph.

Backlog candidates (from: audit digest of <prev_stem>):
- Item: <one-line experiment idea derived from the docs URL>
Source: audit:<prev_stem>:checks.<code>
- Item: ...
Source: ...
- ...
Dropped as duplicates (already in Backlog): <N>
Summary:
<one paragraph for the user — counts, top 2-3 highlights, the
headline metrics for context, and the doc URLs of the surfaced
checks. Dense, not chatty.>
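For illustration only, the template above can be filled in mechanically once the rows, the duplicate count, and the summary paragraph exist. The `render` helper below is hypothetical, and the two-space indent before `Source:` is an assumption about the intended layout.

```python
def render(prev_stem, rows, dropped, summary):
    """Fill in the return template as plain conversation text."""
    lines = [f"Backlog candidates (from: audit digest of {prev_stem}):"]
    for item, source in rows:
        lines.append(f"- Item: {item}")
        lines.append(f"  Source: {source}")  # indent is an assumption
    lines.append(f"Dropped as duplicates (already in Backlog): {dropped}")
    lines.append("Summary:")
    lines.append(summary)
    return "\n".join(lines)


text = render(
    "01_baseline",
    [("<one-line experiment idea>", "audit:01_baseline:checks.SKD003")],
    2,
    "<one dense paragraph for the user>",
)
print(text)
```

The result is returned as text, never written to disk, in line with the rule that the parent owns the journal files.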
iterate-ml-experiment consumes this:
- It writes the rows to the JOURNAL.md Backlog with stable B<N> indices appended at the end.
- It re-presents the sourcing menu, where the user can promote a B<N> row directly or pick `user` if the findings prompt a different direction.

- iterate-ml-experiment: the caller; owns the design notes (including JOURNAL.md).
- audit-ml-pipeline: the producer of the digest this skill reads. The two skills share the same diagnostic surface but have opposite directions: audit-ml-pipeline opens the Project and renders the digest (write side); iterate-from-skore consumes the digest as text and follows the check doc URLs (read side).
- evaluate-ml-pipeline: for "what does the report say" before "what should we try next". The narrative read side; not used by this skill.
- iterate-from-user: the sibling sourcing strategy; sources from the user (article, resource, or free text) when the digest's findings aren't the right starting point.