From agent-loops
Diagnoses root causes of known data anomalies (metric spikes/drops, outliers) by forming candidate explanations, testing each against the data, and eliminating refuted ones until one cause survives and is confirmed.
Slash command: /agent-loops:anomaly-investigation
A form → test → eliminate → confirm loop — root-cause analysis as a search. The artifact is an investigation log; the feedback signal is the count of live candidate explanations, driven down toward a single cause that is confirmed, not merely consistent. Each iteration you test one candidate against the data and drop the ones the data refutes, narrowing the field until one survives.
The discipline this enforces: a cause is "root" only when it both survives an honest attempt to refute it and makes a positive prediction that checks out (e.g. "if this is the cause, removing it restores normal" — and it does). A story that merely could explain the anomaly is a hypothesis, not a finding.
Use this when an anomaly is already in hand — you know roughly what looks wrong and want the cause diagnosed by elimination against the data. Default to a broad initial slate of mutually distinguishable causes, then test the one that splits the field fastest; if the anomaly is vague, your first job is to make it precise (iteration 0). Not for open-ended exploration of a dataset with no anomaly to chase (use data-analysis), and not for verifying an external claim against the literature (use claim-verify).
Resolve bindings interactively. If loop.run.yaml exists in the working dir, load it, confirm the
values in one line, and skip to the loop. Otherwise: on Claude Code (the AskUserQuestion tool is
available) infer a likely value for each binding and present it as the recommended option; on other
hosts ask each as a quoted plain-text prompt. Then write loop.run.yaml (format:
examples/run.example.yaml) and confirm the values before creating any other files.
| binding | meaning | default | how to infer |
|---|---|---|---|
| <dataset> | data (or logs) to investigate; read-only ground truth | - | scan the working dir for a data/log file |
| <anomaly> | what looks wrong: the metric, where/when, and how big the deviation is | - | ask the user; make precise in iter 0 |
| <analysis_cmd> | interpreter that runs analysis snippets in the user's env | python3 | pyproject.toml/.venv/uv in the working dir |
| <log> | output investigation log | <sandbox_root>/investigation.md | - |
| <sandbox_root> | where snippets + ledger live | ./sandbox | - |
| <budget> | max iterations | 8 | - |
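
As a rough illustration of how the bindings and their defaults fit together, the sketch below models them as a small Python dataclass. The `Bindings` class, its field names, and the sample values are hypothetical and not part of the skill; the actual on-disk format is the one in examples/run.example.yaml, which is not reproduced here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Bindings:
    """Hypothetical container mirroring the bindings table; not part of the skill."""
    dataset: str                     # no default: supplied by the user or inferred
    anomaly: str                     # no default: ask the user, refine in iteration 0
    analysis_cmd: str = "python3"
    sandbox_root: str = "./sandbox"
    log: Optional[str] = None        # derived from sandbox_root when left unset
    budget: int = 8

    def __post_init__(self) -> None:
        if self.log is None:
            self.log = str(Path(self.sandbox_root) / "investigation.md")
        if self.budget < 1:
            raise ValueError("budget must be at least 1 iteration")


# Made-up values, for illustration only.
bindings = Bindings(dataset="events.csv", anomaly="sessions spiked on one day")
print(bindings.analysis_cmd, bindings.log, bindings.budget)
```

Deriving the log path from the sandbox root keeps the two defaults from drifting apart if only one of them is overridden.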
Analysis snippets run in the user's environment via <analysis_cmd>, so they may use whatever the
user has installed. Keep helper code stdlib-first (csv, statistics): if a snippet needs
pandas/numpy, probe with try/except ImportError and degrade to a stdlib path, or offer a
consented uv pip install "pandas==<ver>" — never assume the package is installed.
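
The probe-and-degrade pattern described above might look like the following minimal sketch. The helper name `column_mean` and the sample CSV are assumptions made for illustration; only `csv`, `io`, and `statistics` are required, and pandas is used only if the import succeeds.

```python
import csv
import io
import statistics

try:
    import pandas as pd  # optional: may not be installed in the user's env
except ImportError:
    pd = None


def column_mean(csv_text: str, column: str) -> float:
    """Mean of a numeric column: pandas if available, else csv + statistics."""
    if pd is not None:
        return float(pd.read_csv(io.StringIO(csv_text))[column].mean())
    rows = csv.DictReader(io.StringIO(csv_text))
    return statistics.fmean(float(row[column]) for row in rows)


# Made-up sample data, for illustration only.
sample = "day,sessions\n1,100\n2,110\n3,90\n"
mean_sessions = column_mean(sample, "sessions")
print(mean_sessions)
```

Both paths return the same value, so a snippet written this way behaves identically whether or not the optional package is present.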
Copy this checklist and tick items off:
- [ ] Characterize the anomaly and list the live candidates in <log>.
- [ ] Each iteration: write <sandbox_root>/iter<N>/test.py, run with <analysis_cmd>, redirect to out.txt.
- [ ] Stop at one confirmed cause or after <budget> iterations.

Iteration 0: characterize. Quantify the anomaly precisely: write and run a snippet that pins down
what deviated, where/when, and how big the deviation is against the normal baseline (the same
metric on surrounding periods/segments). Then form an initial slate of candidate causes: mutually
distinguishable explanations, broad enough to contain the truth (a real change, a composition/mix
shift, a data-quality bug, a measurement change, seasonality, an outlier segment). List them in
<log> as the live candidates. Record nothing as confirmed yet.
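
A characterization snippet for iteration 0 could be as small as the sketch below, which compares one period against a baseline built from the surrounding periods. The function name `characterize`, the period keys, and the session counts are made up for illustration; the choice of percent deviation and z-score as summary numbers is an assumption, not something the skill prescribes.

```python
import statistics


def characterize(series: dict[str, float], anomalous_key: str) -> dict[str, float]:
    """Compare one period's value with the baseline formed by all other periods."""
    baseline = [v for k, v in series.items() if k != anomalous_key]
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)  # sample stdev; needs >= 2 baseline points
    observed = series[anomalous_key]
    # Assumes a non-zero baseline mean and a non-constant baseline.
    return {
        "observed": observed,
        "baseline_mean": mean,
        "pct_deviation": 100.0 * (observed - mean) / mean,
        "z_score": (observed - mean) / stdev,
    }


# Made-up daily session counts, for illustration only.
daily = {"d1": 100.0, "d2": 104.0, "d3": 96.0, "d4": 100.0, "d5": 150.0}
summary = characterize(daily, "d5")
for name, value in summary.items():
    print(f"{name}\t{value:.2f}")
```

The output is a handful of tab-separated numbers, which keeps out.txt short enough to read back without flooding context.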
Then, until stop (one confirmed cause, or budget):
1. Pick the live candidate whose test would split the field fastest.
2. Write <sandbox_root>/iter<N>/test.py that computes the thing that would refute or
   support it (slice by segment/source/time, recompute the metric, compare distributions). Run it
   with <analysis_cmd>, redirecting output to <sandbox_root>/iter<N>/out.txt (never flood your
   context).
3. If the data refutes the candidate, record it in <log> with the evidence; drop it from the live set.
4. If the candidate survives, test its positive prediction (e.g. removing the suspected cause restores normal) before marking it confirmed.

Observational equivalence. Two mechanistically different candidates can make identical predictions in the data you have (e.g. a bot flood and a pipeline double-count both look like "sessions spike, conversions flat" in daily aggregates). When that happens you cannot separate them here, so do not pick one arbitrarily. Report them as a single confirmed cause at the resolution of the available data, and name the additional data that would distinguish them (finer-grained logs, raw event records, an upstream check). Also distinguish the mechanism (how the metric moved) from the root cause (why the inputs were wrong): confirming the mechanism is progress, but it is not yet the root cause.
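
To make the positive-prediction check concrete, here is a minimal sketch of a test that asks whether removing one source restores the baseline. The rows, source names, the `restores_normal` helper, and the 5% tolerance are all assumptions made for illustration. In line with the paragraph above, a passing check confirms the mechanism at the resolution of this data only; it does not say why that source's sessions were inflated.

```python
from collections import defaultdict

# Made-up rows: (period, source, sessions). The period labels are illustrative.
ROWS = [
    ("baseline", "organic", 500), ("baseline", "paid", 300), ("baseline", "referral", 200),
    ("spike", "organic", 510), ("spike", "paid", 295), ("spike", "referral", 900),
]


def totals_by_source(rows, period):
    """Sum sessions per source for one period."""
    totals = defaultdict(int)
    for row_period, source, sessions in rows:
        if row_period == period:
            totals[source] += sessions
    return dict(totals)


def restores_normal(rows, suspect, tolerance=0.05):
    """Positive prediction: without `suspect`, the spike period matches baseline."""
    base = sum(v for s, v in totals_by_source(rows, "baseline").items() if s != suspect)
    spike = sum(v for s, v in totals_by_source(rows, "spike").items() if s != suspect)
    return abs(spike - base) / base <= tolerance


verdicts = {source: restores_normal(ROWS, source) for source in ("organic", "paid", "referral")}
for source, ok in verdicts.items():
    print(f"{source}\t{ok}")
```

Running the same check against every source, not only the suspect, doubles as a refutation attempt: if removing several different sources each restored normal, the candidate would not be distinguishable from its rivals.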
Log every iteration as one row in <sandbox_root>/ledger.tsv, tab-separated, never commas in the text. Header:
iter	candidate_tested	verdict	live_candidates
verdict ∈ {characterize, refuted, supported, confirmed}. Example:
iter	candidate_tested	verdict	live_candidates
0	characterize anomaly + slate	characterize	5
1	real drop across all segments	refuted	4
2	one segment's conversions fell	refuted	3
3	one source's sessions inflated	supported	2
4	removing that source restores normal	confirmed	1
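
One way to keep the ledger well-formed is to append rows through a small helper that validates the verdict and rejects tabs or commas in the free text. The sketch below is an assumption about how that could be done, not the skill's own tooling; `append_ledger` is a hypothetical name, and the demo writes into a temporary directory and does not touch a real sandbox.

```python
import csv
import tempfile
from pathlib import Path

HEADER = ["iter", "candidate_tested", "verdict", "live_candidates"]
VERDICTS = {"characterize", "refuted", "supported", "confirmed"}


def append_ledger(path: Path, iteration: int, candidate: str, verdict: str, live: int) -> None:
    """Append one row to the ledger, writing the header first if the file is new."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if "\t" in candidate or "," in candidate:
        raise ValueError("candidate text must not contain tabs or commas")
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(HEADER)
        writer.writerow([iteration, candidate, verdict, live])


# Demo in a throwaway directory, using rows from the example ledger.
with tempfile.TemporaryDirectory() as tmp:
    ledger = Path(tmp) / "sandbox" / "ledger.tsv"
    append_ledger(ledger, 0, "characterize anomaly + slate", "characterize", 5)
    append_ledger(ledger, 1, "real drop across all segments", "refuted", 4)
    lines = ledger.read_text(encoding="utf-8").splitlines()

print("\n".join(lines))
```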
Report the confirmed root cause with its confirming evidence, the alternatives and how each was ruled out, and — if you stop without a single confirmed cause — the remaining live candidates and the test that would separate them.
- Keep all written output in <log> and under <sandbox_root>.
- Treat <dataset> as read-only: never modify it, because it is the ground truth every test is checked against. The sandbox is self-contained (no ../ escapes).
- Stop when <budget> iterations are reached without a single confirmed cause; report the live set.

Install: npx claudepluginhub gaasher/agent-loop-skills --plugin agent-loops