Iterative, self-checking exploratory analysis of a dataset that proposes hypotheses, writes and runs analysis code to test them, and records findings only when verified.
Slash command: `/agent-loops:data-analysis`
A hypothesis → verify reflection loop over a dataset. The artifact is a findings report; the feedback signal is verification — a finding only counts if re-running the computation confirms it at a meaningful effect size. The discipline this enforces: no insight without a number behind it. A plausible claim the data does not support is discarded, not softened; every line in the report can be reproduced from the dataset.
Use this for open-ended, self-checking exploration of a bound dataset where each finding must survive an independent re-computation. Default to broad exploration across the columns; if the user gives a focus question, let it steer the hypotheses. Not for diagnosing one known anomaly or for checking an external claim against the literature.
Resolve bindings interactively. If `loop.run.yaml` exists in the working dir, load it, confirm the
values in one line, and skip to the loop. Otherwise: on Claude Code (the `AskUserQuestion` tool is
available) infer a likely value for each binding and present it as the recommended option; on other
hosts ask each as a quoted plain-text prompt. Then write `loop.run.yaml` (format:
`examples/run.example.yaml`) and confirm the values before creating any other files.
| binding | meaning | default | how to infer |
|---|---|---|---|
| `<dataset>` | data file to analyze (CSV/TSV/Parquet/…); read-only ground truth | none | scan the working dir for a data file |
| `<question>` | optional analysis focus; omit to explore broadly | none | ask the user; else leave unbound |
| `<report>` | output findings file | `<sandbox_root>/findings.md` | n/a |
| `<analysis_cmd>` | interpreter that runs analysis snippets in the user's env | `python3` | `pyproject.toml`/`.venv`/`uv` in the working dir |
| `<sandbox_root>` | where snippets + ledger live | `./sandbox` | n/a |
| `<budget>` | max iterations | 8 | n/a |
| `<patience>` | stop after N consecutive iters with no new verified finding | 2 | n/a |
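
The defaults in the table can be pictured as a small record. This is only a sketch: the `Bindings` class and the `orders.csv` filename are hypothetical, and the skill itself stores these values in `loop.run.yaml`, whose format is not reproduced here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Bindings:
    """Hypothetical container mirroring the bindings table; not part of the skill."""
    dataset: str                       # required, no default
    question: Optional[str] = None     # None means explore broadly
    report: Optional[str] = None       # resolved from sandbox_root when unset
    analysis_cmd: str = "python3"
    sandbox_root: str = "./sandbox"
    budget: int = 8
    patience: int = 2

    def __post_init__(self) -> None:
        # <report> defaults to <sandbox_root>/findings.md
        if self.report is None:
            self.report = str(Path(self.sandbox_root) / "findings.md")


b = Bindings(dataset="orders.csv")  # "orders.csv" is a made-up filename
print(b.report, b.analysis_cmd, b.budget, b.patience)
```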
Analysis snippets run in the user's environment via `<analysis_cmd>`, so they may use whatever the
user has installed. Keep helper code stdlib-first (`csv`, `statistics`): if a snippet needs
pandas/numpy, probe with `try/except ImportError` and degrade to a stdlib path, or offer a
consented `uv pip install "pandas==<ver>"`; never assume the package is installed.
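
A minimal sketch of the probe-and-degrade pattern described above. The helper name `column_mean` and the toy CSV are assumptions made for illustration; both code paths skip blank cells so they agree on the result.

```python
import csv
import os
import statistics
import tempfile

try:
    import pandas as pd  # optional; may not be installed
except ImportError:
    pd = None


def column_mean(path, column):
    """Mean of a numeric column, ignoring blank cells (hypothetical helper)."""
    if pd is not None:
        # pandas reads blank cells as NaN, and mean() skips NaN by default
        return float(pd.read_csv(path)[column].mean())
    # stdlib fallback: same result using only csv + statistics
    with open(path, newline="") as fh:
        values = [float(r[column]) for r in csv.DictReader(fh) if r[column].strip()]
    return statistics.fmean(values)


# Toy data, written to a temp dir so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "w", newline="") as fh:
        fh.write("id,value\n1,10\n2,20\n3,\n4,30\n")
    demo_mean = column_mean(sample, "value")
print(demo_mean)
```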
Copy this checklist and tick items off:
- [ ] Profile `<dataset>` (shape, types, ranges, missingness); record nothing as a finding.
- [ ] Propose a hypothesis (steered by `<question>`; not already settled).
- [ ] Write `<sandbox_root>/iter<N>/analysis.py`, run with `<analysis_cmd>`, redirect to `out.txt`.
- [ ] If it holds at a meaningful effect size, add it to `<report>` (verified); else log refuted, do not add it.
- [ ] Stop when dry (`<patience>`) or at `<budget>`.

**Iteration 0: profile.** Write and run a snippet that reports the shape of `<dataset>`: columns,
inferred types, row count, and a quick summary (ranges, category counts, missingness). This grounds
the hypotheses; record nothing as a finding yet.
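
One way the iteration-0 profile might look using only the standard library. The `profile` and `infer_type` helpers and the three toy rows are hypothetical, and the type inference is deliberately crude (int, float, or str).

```python
import csv
import os
import tempfile
from collections import Counter


def infer_type(values):
    """Classify non-empty cells as int, float, or str."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def profile(path):
    """Row count plus per-column type, missingness, and range or top categories."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        columns = reader.fieldnames or []
    summary = {"rows": len(rows), "columns": {}}
    for col in columns:
        cells = [r[col] for r in rows]
        present = [c for c in cells if c is not None and c.strip()]
        kind = infer_type(present) if present else "empty"
        info = {"type": kind, "missing": len(cells) - len(present)}
        if kind in ("int", "float"):
            nums = [float(c) for c in present]
            info["range"] = (min(nums), max(nums))
        elif kind == "str":
            info["top"] = Counter(present).most_common(3)
        summary["columns"][col] = info
    return summary


# Toy rows only; a real run would point at <dataset> and redirect output to out.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    with open(path, "w", newline="") as fh:
        fh.write("segment,value\nenterprise,185\nconsumer,109\nconsumer,\n")
    demo = profile(path)
print(demo)
```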
Then, until stop (dry or budget):
1. Propose a hypothesis, letting `<question>` steer it; do not repeat a hypothesis already settled.
2. Write `<sandbox_root>/iter<N>/analysis.py` that loads `<dataset>` and computes the
   relevant statistic plus an effect size (a group-mean difference, a rate gap, a correlation;
   not just a yes/no). Run it with `<analysis_cmd>`, redirecting output to
   `<sandbox_root>/iter<N>/out.txt` (never flood your context).
3. If the result holds at a meaningful effect size, add it to `<report>`: the claim, the exact
   numbers, the effect size, and the method (so it is reproducible). Mark it `verified`.
4. Otherwise, mark it `refuted` in the ledger and do not add it to
   the report. A null result is a real outcome, not a failure to hide.

Log every iteration in `<sandbox_root>/ledger.tsv`, tab-separated, never commas in the text. Header:
`iter`, `hypothesis`, `effect`, `status` (four tab-separated columns, in that order)

`status` ∈ {`profile`, `verified`, `refuted`}. Example:

| iter | hypothesis | effect | status |
|---|---|---|---|
| 0 | dataset profile | - | profile |
| 1 | enterprise orders average higher value than consumer | 185 vs 109 (+70%) | verified |
| 2 | returns differ by region | North 0.16 vs South 0.14 (negligible) | refuted |
| 3 | mobile has a higher return rate than web/store | 0.30 vs 0.10 | verified |
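
A sketch of a single iteration's bookkeeping: compute a group-mean gap, classify it, and append a ledger row. The threshold `MIN_REL_GAP`, the helper names, and the toy values are all assumptions; the skill does not fix a numeric cutoff for "meaningful".

```python
import csv
import os
import statistics
import tempfile

LEDGER_HEADER = ["iter", "hypothesis", "effect", "status"]
MIN_REL_GAP = 0.2  # assumed cutoff for a "meaningful" effect; not set by the skill


def mean_gap(a, b):
    """Group means and the relative difference of mean(a) over mean(b)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return ma, mb, (ma - mb) / mb


def append_ledger(path, iteration, hypothesis, effect, status):
    """Append one tab-separated row, writing the header if the file is new."""
    if status not in {"profile", "verified", "refuted"}:
        raise ValueError(f"unknown status: {status}")
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        if is_new:
            writer.writerow(LEDGER_HEADER)
        writer.writerow([iteration, hypothesis, effect, status])


# Toy values chosen so the group means are 185 and 109.
enterprise = [180.0, 190.0, 185.0]
consumer = [100.0, 110.0, 117.0]
ma, mb, rel = mean_gap(enterprise, consumer)
status = "verified" if abs(rel) >= MIN_REL_GAP else "refuted"
effect = f"{ma:.0f} vs {mb:.0f} ({rel:+.0%})"

with tempfile.TemporaryDirectory() as sandbox:
    ledger = os.path.join(sandbox, "ledger.tsv")
    append_ledger(ledger, 0, "dataset profile", "-", "profile")
    append_ledger(ledger, 1, "enterprise orders average higher value than consumer",
                  effect, status)
    with open(ledger, newline="") as fh:
        ledger_rows = list(csv.reader(fh, delimiter="\t"))
print(ledger_rows)
```

Writing the header only when the file is new keeps the ledger append-only, so earlier iterations are never rewritten.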
Report the best outcome: the `<report>` path, the count of verified findings, and the hypotheses
refuted (so the user sees what was checked and ruled out, not just what survived).
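
The closing summary could be derived straight from the ledger. In this sketch the `summarize` helper is hypothetical, and the ledger text is the example ledger from this document rewritten with explicit tabs.

```python
import csv
import io


def summarize(ledger_text):
    """Return (count of verified findings, list of refuted hypotheses)."""
    rows = list(csv.DictReader(io.StringIO(ledger_text), delimiter="\t"))
    verified = [r for r in rows if r["status"] == "verified"]
    refuted = [r["hypothesis"] for r in rows if r["status"] == "refuted"]
    return len(verified), refuted


example_ledger = (
    "iter\thypothesis\teffect\tstatus\n"
    "0\tdataset profile\t-\tprofile\n"
    "1\tenterprise orders average higher value than consumer\t185 vs 109 (+70%)\tverified\n"
    "2\treturns differ by region\tNorth 0.16 vs South 0.14 (negligible)\trefuted\n"
    "3\tmobile has a higher return rate than web/store\t0.30 vs 0.10\tverified\n"
)
n_verified, refuted = summarize(example_ledger)
print(n_verified, refuted)
```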
- Every finding in `<report>` carries the figures and the method that produced it; if you cannot
  compute it, you cannot claim it.
- Treat `<dataset>` as read-only: never modify it, because it is the ground truth every finding is
  checked against. The sandbox is self-contained (no `../` escapes).

Stop when either holds:

- Dry: `<patience>` consecutive iterations add no new verified finding.
- Budget: `<budget>` iterations reached.

Install the plugin: `npx claudepluginhub gaasher/agent-loop-skills --plugin agent-loops`
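
The two stop conditions (dry after `<patience>` iterations, or `<budget>` reached) can be checked from the ledger's status column. This is a sketch with a hypothetical `should_stop` helper; it assumes the iteration-0 profile row does not count toward the budget, which the text does not state either way.

```python
def should_stop(statuses, patience=2, budget=8):
    """Return "budget", "dry", or None given ledger statuses in iteration order."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    # Assumption: the iteration-0 profile row is not counted as an iteration.
    iterations = [s for s in statuses if s != "profile"]
    if len(iterations) >= budget:
        return "budget"
    recent = iterations[-patience:]
    if len(recent) == patience and all(s != "verified" for s in recent):
        return "dry"
    return None


print(should_stop(["profile", "verified", "refuted", "refuted"]))
print(should_stop(["profile", "verified", "refuted", "verified"]))
```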