From mycelium
Reviews code and analysis changes in data-science, ML, or bioinformatics projects with six specialized parallel agents, producing a prioritized Critical/Important/Minor/Nit report. Also supports grill and tripwires modes.
Install: `npx claudepluginhub arjunrajlaboratory/mycelium --plugin mycelium`

Slash command: `/mycelium:review`
# Mycelium — Review

Review code, analysis, or documentation changes for the kinds of mistakes that matter in scientific data work. Two modes plus one opt-in follow-up:

- **Default**: dispatch six specialized sub-agents in parallel, then synthesize a single prioritized report.
- **`grill`**: walk the user through every consequential analytical decision conversationally, one question at a time.
- **Tripwires (opt-in, offered after default)**: perturb inputs and verify the pipeline actually fails at named scientific boundaries. See `skills/core/references/review/deep-tripwires.md`.

The point of progressive disclosure is that the main skill stays under your context budget and only loads the per-domain checklists into the sub-agents that need them. Don't read the per-agent checklists yourself unless you are debugging the skill; pass the file paths to the sub-agents.
LLM coding agents tend to make the same kinds of mistakes when generating analysis code, and many are silent: code that runs cleanly and produces a confidently wrong number. This skill encodes the catalog of those failure modes, partitions them across six specialists so each can hold a tight prompt, and aggressively prunes false positives in synthesis. Severity calibration matters more than raw recall — a list of fifty nits buries the one critical finding.
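To make "runs cleanly and produces a confidently wrong number" concrete, here is a small hypothetical sketch of one such silent failure, a try/except that turns an unparseable value into zero. The function names and the sample data are invented for illustration and do not come from the skill's checklists.

```python
def mean_expression_buggy(raw_values):
    """Hypothetical helper with a silent failure: bad rows become 0.0."""
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(value))
        except ValueError:
            # Antipattern: the unparseable value is swallowed and counted as zero.
            cleaned.append(0.0)
    return sum(cleaned) / len(cleaned)


def mean_expression_strict(raw_values):
    """Same computation, but unparseable values are dropped and reported."""
    cleaned, dropped = [], []
    for index, value in enumerate(raw_values):
        try:
            cleaned.append(float(value))
        except ValueError:
            dropped.append((index, value))
    if not cleaned:
        raise ValueError("no parseable values")
    return sum(cleaned) / len(cleaned), dropped


raw = ["10.0", "12.0", "NA", "14.0"]
buggy = mean_expression_buggy(raw)             # 9.0, with no warning
strict, dropped = mean_expression_strict(raw)  # 12.0, plus a record of the dropped row
```

Both versions run without error; only the second makes the missing value visible, which is the kind of difference a reviewer is asked to catch.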
The skill works on any of the following review scopes. If the user did not specify, ask once which scope they want.
| Scope | How to obtain the diff |
|---|---|
| PR (`/mycelium:review <PR#>` or PR URL) | `gh pr diff <num>` plus `gh pr view <num> --json files,body,title` for context |
| Commit / commit range (`/mycelium:review <sha>` or `<sha1>..<sha2>`) | `git show <sha>` or `git diff <sha1>..<sha2>` |
| Working tree (default if no scope given on a dirty repo) | `git diff HEAD` plus `git status` |
| Branch vs main | `git diff main...HEAD` |
| Pasted diff (user dumped a diff into chat) | Use the diff verbatim |
| Whole analysis directory (`/mycelium:review analysis/<name>`) | Read the directory; treat all files as "added" |
For all scopes: also collect the relevant context (the analysis's
`UPPER_SNAKE_CASE.md` documentation file, `specification.md` if present,
`.living/decisions.md`, and any installed convention packs in
`.living/conventions/`) and pass these into each sub-agent so it can ground
findings in the project's stated intent rather than inferring from code alone.
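The scope-to-command mapping in the table can be sketched as a small lookup. This is only an illustration: the scope spellings (`working-tree`, `branch`, `pr:<num>`) and the function name `diff_commands` are hypothetical, and only the `git` and `gh` invocations are taken from the table.

```python
import re


def diff_commands(scope: str) -> list[list[str]]:
    """Map a review scope string to the commands listed in the scope table.

    The scope spellings are assumptions made for this sketch.
    """
    if scope == "working-tree":
        return [["git", "diff", "HEAD"], ["git", "status"]]
    if scope == "branch":
        return [["git", "diff", "main...HEAD"]]
    pr = re.fullmatch(r"pr:(\d+)", scope)
    if pr:
        num = pr.group(1)
        return [
            ["gh", "pr", "diff", num],
            ["gh", "pr", "view", num, "--json", "files,body,title"],
        ]
    if ".." in scope:
        # Commit range such as <sha1>..<sha2>.
        return [["git", "diff", scope]]
    # Anything else is treated as a single commit SHA.
    return [["git", "show", scope]]
```

Each returned argument list could then be handed to `subprocess.run(argv, capture_output=True, text=True, check=True)` to collect the output.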
### Step 1: Determine scope

Determine the scope from the user's invocation. If unclear, ask one short
question ("review the working tree, last commit, or a specific PR?") rather
than guessing. For PRs, fetch with `gh`. For commits/working tree, use `git`.
Capture the diff, the list of touched files, and any relevant context files
into local variables (or a scratch file in `/tmp/`) so the sub-agents share a
common substrate.
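One way to build that shared substrate is to parse the touched files out of the diff and write everything to a single scratch file. The sketch below assumes a git-style unified diff; the JSON layout, the file prefix, and the function names are hypothetical choices, not something the skill prescribes.

```python
import json
import re
import tempfile
from pathlib import Path


def touched_files(diff_text: str) -> list[str]:
    """Return paths named in 'diff --git a/<path> b/<path>' headers, in order."""
    paths = []
    for line in diff_text.splitlines():
        match = re.match(r"diff --git a/(.+?) b/(.+)$", line)
        if match and match.group(2) not in paths:
            paths.append(match.group(2))
    return paths


def write_substrate(diff_text: str, context_files: list[str]) -> Path:
    """Write the shared review inputs to one scratch JSON file and return its path."""
    payload = {
        "diff": diff_text,
        "touched_files": touched_files(diff_text),
        "context_files": context_files,
    }
    # delete=False keeps the file on disk after the handle is closed.
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".json", prefix="mycelium-review-", delete=False
    )
    with handle:
        json.dump(payload, handle)
    return Path(handle.name)
```

Passing the returned path to every sub-agent means all of them read exactly the same diff and context list.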
### Step 2: Dispatch the six sub-agents

If the `Agent` tool is available to you, send a single message with six
concurrent `Agent` tool calls (`subagent_type` `general-purpose` is fine; for
very large diffs prefer `Explore`). If the `Agent` tool is not available
(common when this skill runs from inside a sub-agent context that doesn't
expose it), execute each sub-agent's checklist in-line: read each checklist
file in turn, apply it to the diff, collect findings, then proceed to
synthesis. Either way the output contract and the synthesis steps are the
same.
Each sub-agent (or in-line pass) gets:

- The diff, the list of touched files, and the context files captured above.
- The path to its checklist file, `skills/core/references/review/<agent>.md`.
- The output contract in `skills/core/references/review/README.md`, to be
  followed exactly. The required fields per finding are `severity`
  (`major` | `minor`), `file`, `line` (or range), `category`, `summary`,
  `evidence` (a 1–5 line verbatim code snippet; the synthesis pass renders
  this directly under each finding), `why_it_matters` (one or two sentences
  specific to this analysis), `suggested_fix`, and `confidence`
  (`high` | `medium` | `low`). Anything the agent considered and decided not
  to flag goes in a separate `not_flagged` list with `file`, `line`,
  `considered`, and `reason`; synthesis uses this to dedupe across agents.

Sub-agents should also return a `decisions` list of consequential
analytical choices in their scope (per the README contract); these
roll up into the report's "Key decisions in this analysis" section.

The six sub-agents and their checklist files:
| # | Sub-agent | Checklist file | Focus |
|---|---|---|---|
| 1 | stats-causal | skills/core/references/review/stats-causal.md | Test selection, multiple comparisons, p-hacking, causal claims, effect-size reporting, study design |
| 2 | data-pipeline-leakage | skills/core/references/review/data-pipeline-leakage.md | Train/test contamination, time-series look-ahead, joins, missing values, dedup, units, batch effects, ML evaluation |
| 3 | bioinformatics | skills/core/references/review/bioinformatics.md | Gene names, reference genome, scRNA-seq pipeline, RNA-seq DE, double dipping, pseudoreplication |
| 4 | llm-failure-modes | skills/core/references/review/llm-failure-modes.md | Try/except antipatterns, hallucinated APIs, default-parameter smuggling, sycophancy/forking-paths drift, fabricated tool output |
| 5 | doc-schema-fidelity | skills/core/references/review/doc-schema-fidelity.md | Docstrings/specs/schemas/READMEs vs reality, definition drift, comment freshness, undocumented behavior |
| 6 | code-quality | skills/core/references/review/code-quality.md | Duplicate sources of truth, boolean flag pairs, misleading names, premature abstractions, secrets, import hacks, BC cruft, file organization, logging consistency |
The bioinformatics agent should self-skip with a one-line "no biology here" if the diff doesn't touch genomic data — don't spawn it if the project clearly isn't biology, but err on the side of running it for any project that has ever touched a sequence file, gene table, or single-cell object.
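The per-finding fields listed for the output contract lend themselves to a quick conformance check before synthesis. The following is a minimal sketch built only from the fields named in this document; the authoritative contract lives in `skills/core/references/review/README.md`, which is not reproduced here, and the function name `contract_problems` is hypothetical.

```python
REQUIRED_FIELDS = (
    "severity", "file", "line", "category", "summary",
    "evidence", "why_it_matters", "suggested_fix", "confidence",
)
SEVERITIES = {"major", "minor"}
CONFIDENCES = {"high", "medium", "low"}


def contract_problems(finding: dict) -> list[str]:
    """Return human-readable problems; an empty list means the finding conforms."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in finding]
    if finding.get("severity") not in SEVERITIES:
        problems.append("severity must be major or minor")
    if finding.get("confidence") not in CONFIDENCES:
        problems.append("confidence must be high, medium, or low")
    # Evidence is expected to be a 1-5 line verbatim snippet.
    evidence_lines = str(finding.get("evidence", "")).splitlines()
    if not 1 <= len(evidence_lines) <= 5:
        problems.append("evidence must be 1-5 verbatim lines")
    return problems
```

A finding that fails the check is better sent back to its sub-agent than silently repaired, since the `evidence` snippet has to stay verbatim.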
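### Step 3: Synthesize

Read `skills/core/references/review/synthesis.md` and follow it. In short:
dedupe findings across agents (using the `not_flagged` lists), prune false
positives aggressively, and calibrate severity so the critical findings are
not buried under nits.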
### Step 4: Write the report

Write the report to a file under `.living/outputs/reviews/` named
`YYYY-MM-DD-<scope-slug>.md` (e.g., `2026-04-24-pr-127.md`,
`2026-04-24-working-tree.md`). The structure:
````markdown
# Review — <scope> — YYYY-MM-DD

**Scope**: <PR / commit range / working tree / pasted diff>
**Files reviewed**: N
**Sub-agents run**: 6 (or list which were skipped and why)

## Key decisions in this analysis

- **<Decision>** — <one-line description>. <see F2 if linked, or no
  link if informational>
- ...

## Questions for the analyst

Three to five open-ended questions whose answers would change which
findings matter most (analysis goal, replicate type, downstream use,
acceptable false-positive rate, registration status, deployment
context). The diff can't answer these — only the analyst can.

- <Question>
- ...

## Findings

### Statistics & causal inference

#### Major

##### F1. <short description>

`<file>:<line>`

```python
<1-5 lines verbatim>
```

Why it matters here: ... Fix: ...

... (Major / Minor under each category)
````
Print the path of the written file at the end. The chat reply should
surface the count of Major findings per category and the "Key decisions"
list so the user sees the shape of the report without opening the file.
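The file-naming convention and the per-category count for the chat reply could look like the sketch below. The slug rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption chosen to reproduce the two example filenames given above; the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import date
from pathlib import Path


def report_path(scope: str, on: date, root: str = ".living/outputs/reviews") -> Path:
    """Build <root>/YYYY-MM-DD-<scope-slug>.md; the slug rule is an assumption."""
    slug = re.sub(r"[^a-z0-9]+", "-", scope.lower()).strip("-")
    return Path(root) / f"{on.isoformat()}-{slug}.md"


def major_counts(findings: list[dict]) -> dict[str, int]:
    """Count Major findings per category for the chat summary."""
    counts = Counter(f["category"] for f in findings if f["severity"] == "major")
    return dict(counts)
```

For example, `report_path("PR 127", date(2026, 4, 24))` yields `.living/outputs/reviews/2026-04-24-pr-127.md`.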
### Step 5 — Offer tripwires (opt-in behavioral follow-up)
Read `skills/core/references/review/deep-tripwires.md` and follow it.
Two terms before the steps. An **audit** is the *default artifact*: a
written document that walks the analysis, names each tripwire that would
apply, names the perturbation and expected outcome, links each to a static
finding, and surfaces what's missing if you wanted to execute. No code
runs. **Instrumentation** is the four observability hooks the pipeline
needs for tripwires to actually execute: checkpoint emission,
`--stop-after`, `analysis_labels.yml`, and a drop ledger. Definitions and
the full mode table are in `deep-tripwires.md` "Three operating modes".

The short form:
1. Scan the static findings for `suggested_tripwire` tags emitted by
the sub-agents (see the output contract in `review/README.md`).
2. Detect which of the four instrumentation hooks the repo has:
checkpoint logging anywhere in the analysis scripts, a
`--stop-after` / `STOP_AFTER_CHECKPOINT` mechanism, an
`analysis_labels.yml` or equivalent at the analysis root, and a drop
ledger of any name/shape. Also note any existing tripwire runner under
`tools/`. Names and formats are illustrative; accept what the
project uses. This picks the default mode: `audit` if zero or
only some hooks are present, `scaffold` if the user asks to add
them, `run` if all four are present.
3. Use `AskUserQuestion` to offer a menu. **Lead every user-facing
surface with plain English; keep internal IDs in parentheses or
in artifacts.** See `deep-tripwires.md` "Talking to the user" for
the gloss table. Examples of well-phrased options: "Describe what
we'd test (no code runs)" rather than "audit mode"; "Propose
project-specific patches that would let us actually run the
tests" rather than "scaffold mode". Always include "skip" and the
audit option. Include "scaffold" if the hooks are partial or
absent. Include "run selected tripwires" if all four hooks are
present. The list of named tripwires on the menu is shaped by
which `suggested_tripwire` tags appeared in the static findings,
plus the starter four (missing counts, missing metadata sample,
label permutation, toy contrast direction) which are always
available. When naming individual tripwires in the menu, use the
plain-English glosses ("the report-numbers-still-match check")
not the internal IDs.
4. Execute the chosen mode per `deep-tripwires.md`. Scaffold-mode
and run-mode are **agent-improvised per analysis** — the skill
ships principles + examples, and the agent reads the project's
language / layout / existing helpers and adapts. Do not stamp
templates blindly; if the project already has a logging or
filter helper, build on it. Output paths:
- audit → `.living/outputs/reviews/YYYY-MM-DD-<scope-slug>-tripwires.md`
- scaffold → `…-tripwires-scaffold.md` (proposal, not auto-applied)
- run → `…-tripwires-run.md` (pass/fail per tripwire)
Print the written path(s) at the end.
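The mode-selection and menu rules in steps 2 and 3 can be written down as a small decision function. This is a sketch of the rules as stated here, not the logic in `deep-tripwires.md`; the hook identifiers and function names are hypothetical labels.

```python
# Hypothetical identifiers for the four instrumentation hooks.
HOOKS = ("checkpoints", "stop_after", "analysis_labels", "drop_ledger")


def default_mode(hooks_present: set[str], user_wants_scaffold: bool = False) -> str:
    """Pick the default tripwire mode from the detected hooks."""
    if all(hook in hooks_present for hook in HOOKS):
        return "run"
    if user_wants_scaffold:
        return "scaffold"
    return "audit"


def menu_options(hooks_present: set[str]) -> list[str]:
    """Build the menu: skip and audit always, then scaffold or run."""
    options = ["skip", "audit"]
    if all(hook in hooks_present for hook in HOOKS):
        options.append("run")
    else:
        # Hooks are partial or absent, so offer to propose them.
        options.append("scaffold")
    return options
```

The user-facing labels would still be the plain-English glosses; the strings here stand in for the internal IDs only.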
**Reporting rule (applies to every mode).** When summarizing the
result back to the user in chat, lead with one English sentence
("Three places in your docs cite p-values that don't match the
source CSV"), then show the table / artifact as evidence. Don't
open with `Instrumentation detected: 0/4` or raw checkmark/X
output. The user wants to know what failed and why before they
need the test IDs.
Skip Step 5 entirely if the diff is a pure refactor with no data-flow
change AND no documentation change, or if the user invoked the skill
with a flag asking to suppress tripwires (e.g., `/mycelium:review
--no-tripwires`).
**Documentation-only diffs are NOT a skip case.** When report text or
analysis docs change without the pipeline changing, that's exactly
when the report-values-freshness tripwire is most useful: it catches
the silent case where someone edited the report number without
regenerating the source CSV. For doc-only diffs, restrict the offered
menu to the freshness and known-answer categories (the fault-injection
and metamorphic tripwires don't apply when the pipeline wasn't touched)
rather than skipping Step 5.
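The skip and restriction rules for Step 5 reduce to a few branches. The sketch below uses the category names that appear in this section; the exact taxonomy is defined in `deep-tripwires.md`, and the function and parameter names are hypothetical.

```python
# Category names as used in this section; the full taxonomy is an assumption.
ALL_CATEGORIES = ["fault-injection", "metamorphic", "freshness", "known-answer"]


def offered_categories(
    data_flow_changed: bool,
    docs_changed: bool,
    no_tripwires_flag: bool = False,
) -> list[str]:
    """Return the tripwire categories to offer; an empty list means skip Step 5."""
    if no_tripwires_flag:
        return []
    if not data_flow_changed and not docs_changed:
        # Pure refactor: nothing behavioral or documented changed.
        return []
    if not data_flow_changed:
        # Doc-only change: the pipeline was not touched.
        return ["freshness", "known-answer"]
    return list(ALL_CATEGORIES)
```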
### Step 6 — Post-action hook
Treat a review as a significant action: log a short entry to
`.living/learnings.md` if the review surfaced a recurring pattern (e.g., "the
analyze script is using `t-test` on count data — this is the third time this
pattern has come up, consider a convention"). Otherwise no logging is needed.
## Grill mode protocol
Triggered by the user invoking with `grill` in their request, e.g.,
`/mycelium:review grill` or "grill me on this analysis".
Read `skills/core/references/review/grill-mode.md` and follow it. The short
form:
1. Identify the consequential decisions in the analysis. These are anything
that, if changed, would meaningfully change the result: choice of
statistical test, multiple-comparison correction, sample-filtering
thresholds, normalization steps, clustering parameters, train/test split
strategy, choice of estimand, choice of reference (genome, baseline,
comparator), exclusion criteria, etc. Read `.living/decisions.md` and the
analysis script(s) to extract these.
2. **One question per turn**, conversationally phrased. Do not dump a
numbered list. The point is to feel like a thoughtful colleague over coffee,
not a checklist. Acknowledge the answer, then move to the next.
3. Track answers internally. After ~5–8 exchanges (or when the user signals
they want to wrap up), produce a short summary: which decisions had clear
justifications, which the user wasn't sure about, which deserve a
follow-up. Offer to file the unsure ones as `todo/` items.
4. If the user gives a justification that itself reveals a problem (e.g.,
"we used t-test because that's what the tutorial used"), don't lecture —
ask the next question that exposes the implication ("got it — and the
data here is counts, right? do we expect the t-test assumptions to hold
on counts?"). Trust theory of mind.
The aim of grill mode is to help the user converge on what they actually
believe about each choice without exhausting them. If at any point they say
"enough" or "let's stop," stop and write the partial summary.
## What this skill is NOT for
- Running the analysis (`/mycelium:analyze`)
- Generating a report or paper section (`/mycelium:report`)
- Repo-wide refactoring without an analysis context — use a generic code
reviewer for that
- Stylistic / linting work — let the linter handle it
- Validating raw data ingestion — `/mycelium:ingest` covers that with its
own checks
## Cross-references inside this skill
- `skills/core/references/review/synthesis.md` — synthesis & severity
calibration
- `skills/core/references/review/grill-mode.md` — grill protocol detail
- `skills/core/references/review/deep-tripwires.md` — behavioral
follow-up (Step 5)
- Per-agent checklists under `skills/core/references/review/` (six files;
loaded by sub-agents, not by you)
- The mycelium core `Post-Action Hook Protocol` from `commands/core.md`
governs what to log to `.living/` when a review surfaces a recurring
pattern