Skill

iterate-from-skore

Converts ML pipeline audit digest rows with severity `issue` or `tip` into actionable Backlog items by following each row's documentation URL. Meant to be triggered after an experiment has finished and the user wants to mine the diagnostic report for next steps.

Python

ai-ml

npx claudepluginhub probabl-ai/skills --plugin probabl-skills

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/probabl-skills:iterate-from-skore

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Source: the audit digest at `scratch/audit/<stem>/audit.md`,

SKILL.md

217 lines · ~2.5k tokens

Similar Skills

audit-ml-pipeline

Audits ML experiments by executing read-only bare-expression Python files with skore reports and streaming a markdown digest. Triggered after experiment iteration, on user request, or after re-runs.

5 files

probabl-skills

backlog-generator

Transforms audit findings into sprint-ready work items with effort estimates, acceptance criteria, and stakeholder rationale. Use after audits to create tickets from reports.

design-system-ops

solo-retro

Performs post-pipeline retrospectives: parses logs, counts productive vs wasted iterations, identifies failure patterns, scores runs, suggests fixes to skills/scripts.

2 files10 tools

solo

Stats

LanguageShell

Stars13

Forks1

MaintenanceExcellent

Last CommitJun 1, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

Iterate from skore

Source: the audit digest at scratch/audit/<stem>/audit.md, produced by audit-ml-pipeline at § 4 record-outcome. Output: a set of Backlog-candidate rows + a short human summary, handed back to iterate-ml-experiment. The parent skill writes the rows to JOURNAL.md Backlog and re-presents the sourcing menu so the user can promote one via B<N>.

What this skill consumes

The digest carries two sections that matter here:

## Checks summary — a DataFrame whose rows each have code, severity (passed / issue / tip), and documentation_url. Each issue / tip row → one Backlog candidate, with the documentation_url driving the Item text.
## Metrics summary — task-appropriate headline metrics (regression / classification / multiclass). Used to ground the human summary paragraph ("the run achieved X but the SKD003 check flagged Y"). Does not drive Backlog rows on its own.

Nothing else. The audit template intentionally stops at these two sections; deeper accessors (residuals, importance, calibration, …) are out of scope here.

Why read the digest (not re-walk the Project)

The audit already opened the Project, loaded the report, called the two accessors, and rendered the output as markdown. Re-doing that work here would duplicate the cost of materialising Display objects, risk drift between two walks, and require the agent environment this skill should not need. Reading the digest as text is cheaper and deterministic.

Output contract (read this before the body)

This skill never writes journal/ files (including JOURNAL.md) — the parent owns those. It returns two artifacts as conversation text:

Backlog-candidate rows — one row per actionable check from the digest. Each row carries:
- Item: one-line experiment idea derived from the check's documentation_url content. Phrase as an experiment idea, not as a metric reading.
- Source: audit:<stem>:checks.<code> (e.g. audit:01_baseline:checks.SKD003). The citation is load-bearing for dedup.
Summary — one paragraph for the user: how many findings were surfaced, the top 2-3 by severity, the headline numbers from the metrics summary as context. Keep it dense.

If the parent's Backlog already contains a row with the same Source citation, drop the candidate — do not duplicate. The summary should note the number of dropped duplicates ("4 new findings; 2 were already in Backlog from prior mining").

Empty-checks outcome

If the digest's checks summary has no issue / tip rows (only passed), return zero candidate rows and a summary that says so explicitly: "the report looks clean on the checks surface; no actionable findings on this turn." The parent will note this in JOURNAL.md Status and the user picks user next.

Inaccessible-digest fallback

If the digest at scratch/audit/<stem>/audit.md cannot be read (file missing, audit never executed, audit errored), do not fabricate findings from memory and do not re-run probes. Return zero rows and a summary that explains the access failure. The parent surfaces the gap to the user; recovery is owned by audit-ml-pipeline (re-run the audit runner, fix the auth, …).

Stop conditions

Don't write journal/ files. That includes JOURNAL.md. This skill returns rows as conversation text; the parent writes them.
Don't re-open the skore Project from this skill. The audit already did. Reading the digest as text is the contract — see § "Why read the digest". If the digest is missing, re-execute the audit runner via audit-ml-pipeline; never call project.get(...) from iterate-from-skore.
Only ## Checks summary rows drive Backlog candidates. The metrics summary is context for the human paragraph; it does not produce Backlog rows on its own. Deeper diagnostic surfaces (residuals, feature importance, calibration, …) are not in the audit template and not in scope here.
Follow the documentation_url. For each issue / tip check, fetch the linked skore docs page (via WebFetch) and derive the Backlog Item from what the page recommends. Do not invent mitigations from training-data memory of skore.
Don't pick a single "winning" finding for the user. Emit one row per actionable check. The user picks via the parent's sourcing menu (B<N>).
Dedup against existing Backlog rows by Source citation. Read JOURNAL.md Backlog before emitting; skip any candidate whose Source matches an existing row.
Don't author acceptance criteria. Backlog rows are experiment ideas, not goals with target deltas. The user judges the result after the run.
No Python execution from this skill. Reading the digest is a Read tool call; fetching the doc URL is a WebFetch call. No pixi run python …, no python -c …. The only side effect this skill triggers is re-executing the audit runner (via audit-ml-pipeline) when the digest is missing.

The inspection loop

Locate the digest. The audit digest for the latest done experiment lives at scratch/audit/<stem>/audit.md. If multiple done experiments exist, default to the most recent — surface the choice to the user only if they ask.
Read the digest as text. Use the Read tool.
Walk the ## Checks summary section. For every row whose severity is issue or tip:
- Follow documentation_url with WebFetch. The page describes what the check tests and what to try next.
- Draft the Backlog Item from the page's recommended mitigations, phrased as a one-line experiment idea.
- Citation: audit:<stem>:checks.<code> (e.g. audit:01_baseline:checks.SKD003).
Dedup against the existing Backlog. Read JOURNAL.md Backlog. Drop candidates whose citation already exists.
Read the ## Metrics summary for context only — the headline metrics anchor the human summary paragraph.
Compose the return block below.

What is returned

Backlog candidates (from: audit digest of <prev_stem>):
  - Item:    <one-line experiment idea derived from the docs URL>
    Source:  audit:<prev_stem>:checks.<code>
  - Item:    ...
    Source:  ...
  - ...

Dropped as duplicates (already in Backlog): <N>

Summary:
  <one paragraph for the user — counts, top 2-3 highlights, the
  headline metrics for context, and the doc URLs of the surfaced
  checks. Dense, not chatty.>

iterate-ml-experiment consumes this:

Writes the candidate rows into JOURNAL.md Backlog with stable B<N> indices appended at the end.
Surfaces the summary verbatim to the user.
Re-presents the sourcing menu with the enriched Backlog visible so the user can pick a B<N> row directly or pick user if the findings prompt a different direction.

Companion skills

iterate-ml-experiment — the caller; owns the design notes (including JOURNAL.md).
audit-ml-pipeline — the producer of the digest this skill reads. The two skills share the same diagnostic surface but have opposite directions: audit-ml-pipeline opens the Project and renders the digest (write side); iterate-from-skore consumes the digest as text and follows the check doc URLs (read side).
evaluate-ml-pipeline — for "what does the report say" before "what should we try next". The narrative read side; not used by this skill.
iterate-from-user — the sibling sourcing strategy; sources from the user (article, resource, or free text) when the digest's findings aren't the right starting point.

iterate-from-skore

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

iterate-from-skore

Popularity

Invocation

Context Preview

SKILL.md

Iterate from skore

What this skill consumes

Why read the digest (not re-walk the Project)

Output contract (read this before the body)

Empty-checks outcome

Inaccessible-digest fallback

Stop conditions

The inspection loop

What is returned

Companion skills

Similar Skills

Help us improve

Iterate from skore

What this skill consumes

Why read the digest (not re-walk the Project)

Output contract (read this before the body)

Empty-checks outcome

Inaccessible-digest fallback

Stop conditions

The inspection loop

What is returned

Companion skills