
/review

Reviews code and analysis changes in data-science, ML, or bioinformatics projects with six specialized parallel agents, producing a prioritized report of Major and Minor findings. Also supports grill and tripwires modes.

Tags: Python, SQL, data-engineering, ai-ml

npx claudepluginhub arjunrajlaboratory/mycelium --plugin mycelium


Invocation: slash command `/mycelium:review`; model invocable; no pre-commands.


Command content: 397 lines, ~5.1k tokens (exceeds the 5k compaction limit).


Stats

Language: Python

Stars: 19

Maintenance: Excellent

Last commit: Jun 8, 2026


# Mycelium — Review

Review code, analysis, or documentation changes for the kinds of mistakes that matter in scientific data work. Two modes plus one opt-in follow-up:

- **Default**: dispatch six specialized sub-agents in parallel, then synthesize
  a single prioritized report.
- **`grill`**: walk the user through every consequential analytical decision
  conversationally, one question at a time.
- **Tripwires (opt-in, offered after default)**: perturb inputs and verify
  the pipeline actually fails at named scientific boundaries. Three
  categories: fault-injection, metamorphic, and known-answer tests. Three
  modes: audit (default; write a behavioral spec describing which tripwires
  would apply, no code runs), scaffold (propose project-specific patches that
  add the instrumentation hooks), and run (execute against existing
  instrumentation). Scaffold and run are agent-improvised per analysis rather
  than template-stamped. See `skills/core/references/review/deep-tripwires.md`.
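To make "fails at a named scientific boundary" concrete, here is a minimal sketch of one fault-injection tripwire. Everything in it is hypothetical: `normalize_counts` stands in for a real pipeline step that should refuse incomplete counts, and `tripwire_missing_counts` is an illustrative check, not the runner described in `deep-tripwires.md`. The tripwire passes only when the perturbed input makes the step fail loudly.

```python
import math


def normalize_counts(counts: dict[str, list[float]]) -> dict[str, list[float]]:
    """Hypothetical pipeline step: scale each sample's counts to sum to 1.

    It refuses incomplete input instead of silently propagating NaN.
    """
    for sample, values in counts.items():
        if any(v is None or math.isnan(v) for v in values):
            raise ValueError(f"missing counts in sample {sample!r}")
    return {s: [v / sum(vals) for v in vals] for s, vals in counts.items()}


def tripwire_missing_counts(step, clean_input: dict[str, list[float]]) -> bool:
    """Blank one count and report whether `step` failed loudly.

    True means the tripwire passed (the pipeline stopped at the boundary);
    False means the step accepted broken input without complaint.
    """
    # Copy so the caller's clean data is never mutated.
    perturbed = {s: list(vals) for s, vals in clean_input.items()}
    first_sample = next(iter(perturbed))
    perturbed[first_sample][0] = float("nan")  # the injected fault
    try:
        step(perturbed)
    except ValueError:
        return True
    return False


clean = {"s1": [10.0, 30.0], "s2": [5.0, 5.0]}
print(tripwire_missing_counts(normalize_counts, clean))  # True: the guard trips
print(tripwire_missing_counts(lambda c: c, clean))       # False: a no-op step lets NaN through
```

The second call is the case tripwires exist to catch: code that runs cleanly on broken input.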

The point of progressive disclosure is that the main skill stays under your context budget and only loads the per-domain checklists into the sub-agents that need them. Don't read the per-agent checklists yourself unless you are debugging the skill — pass the file paths to the sub-agents.

## Why this skill exists

LLM coding agents tend to make the same kinds of mistakes when generating analysis code, and many are silent: code that runs cleanly and produces a confidently wrong number. This skill encodes the catalog of those failure modes, partitions them across six specialists so each can hold a tight prompt, and aggressively prunes false positives in synthesis. Severity calibration matters more than raw recall — a list of fifty nits buries the one critical finding.

## What gets reviewed

The skill works on any of the following review scopes. If the user did not specify, ask once which scope they want.

| Scope | How to obtain the diff |
|---|---|
| PR (`/mycelium:review <PR#>` or PR URL) | `gh pr diff <num>` plus `gh pr view <num> --json files,body,title` for context |
| Commit / commit range (`/mycelium:review <sha>` or `<sha1>..<sha2>`) | `git show <sha>` or `git diff <sha1>..<sha2>` |
| Working tree (default if no scope given on a dirty repo) | `git diff HEAD` plus `git status` |
| Branch vs main | `git diff main...HEAD` |
| Pasted diff (user dumped a diff into chat) | Use the diff verbatim |
| Whole analysis directory (`/mycelium:review analysis/<name>`) | Read the directory; treat all files as "added" |
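The scope table can be sketched as a small dispatcher that returns the commands to run. This is an illustration under assumptions: the scope vocabulary (`"working-tree"`, `"branch"`, a PR number or GitHub PR URL, a commit SHA, a `<sha1>..<sha2>` range) and the function name `diff_commands` are hypothetical. The pasted-diff and whole-directory rows need no command, so they are left out. The `gh` and `git` invocations are the ones listed in the table.

```python
import re

PR_PATTERN = re.compile(r"(?:https://github\.com/[^/]+/[^/]+/pull/)?#?(\d{1,6})")
SHA = r"[0-9a-f]{7,40}"


def diff_commands(scope: str) -> list[list[str]]:
    """Return argv lists that fetch the diff (and context) for a review scope.

    The scope vocabulary is an assumption made for this sketch.
    """
    if scope == "working-tree":
        return [["git", "diff", "HEAD"], ["git", "status"]]
    if scope == "branch":
        # Three dots: changes on HEAD since it diverged from main.
        return [["git", "diff", "main...HEAD"]]
    pr = PR_PATTERN.fullmatch(scope)
    if pr:
        num = pr.group(1)
        return [["gh", "pr", "diff", num],
                ["gh", "pr", "view", num, "--json", "files,body,title"]]
    if re.fullmatch(rf"{SHA}\.\.{SHA}", scope):
        return [["git", "diff", scope]]
    if re.fullmatch(SHA, scope):
        return [["git", "show", scope]]
    # Unclear scope: the protocol says to ask once rather than guess.
    raise ValueError(f"unrecognized scope {scope!r}: ask the user once")


for cmd in diff_commands("127"):
    print(" ".join(cmd))
```

Each argv list could then be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`.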

For all scopes, also collect the relevant context: the analysis's `UPPER_SNAKE_CASE.md` documentation file, `specification.md` if present, `.living/decisions.md`, and any installed convention packs in `.living/conventions/`. Pass these into each sub-agent so it can ground findings in the project's stated intent rather than inferring from code alone.
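A minimal sketch of gathering those context files, assuming `specification.md` sits in the analysis directory and that convention packs are ordinary files under `.living/conventions/` (both assumptions; `collect_context` is a hypothetical helper). Note that a name like `README.md` also matches the upper-snake-case pattern, so a real implementation may need a tighter rule.

```python
import re
from pathlib import Path

UPPER_SNAKE_MD = re.compile(r"[A-Z0-9]+(?:_[A-Z0-9]+)*\.md")


def collect_context(analysis_dir: Path, repo_root: Path) -> list[Path]:
    """Return the context files that exist, in a stable order."""
    # The analysis's UPPER_SNAKE_CASE.md documentation file(s).
    found = [p for p in sorted(analysis_dir.glob("*.md"))
             if UPPER_SNAKE_MD.fullmatch(p.name)]
    # specification.md (assumed to sit next to the analysis) and decisions.md.
    for candidate in (analysis_dir / "specification.md",
                      repo_root / ".living" / "decisions.md"):
        if candidate.is_file():
            found.append(candidate)
    # Every file inside any installed convention pack.
    conventions = repo_root / ".living" / "conventions"
    if conventions.is_dir():
        found.extend(sorted(p for p in conventions.rglob("*") if p.is_file()))
    return found
```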

## Default mode protocol

### Step 1 — Establish review scope

Determine the scope from the user's invocation. If unclear, ask one short question ("review the working tree, last commit, or a specific PR?") rather than guessing. For PRs, fetch with `gh`; for commits and the working tree, use `git`.

Capture the diff, the list of touched files, and any relevant context files into local variables (or a scratch file in `/tmp/`) so the sub-agents share a common substrate.
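One way to build that shared substrate, sketched under assumptions: the file names `diff.patch` and `manifest.json` and the helper `write_substrate` are hypothetical, and `tempfile.mkdtemp` uses the system temp directory, which is `/tmp` on most Linux systems.

```python
import json
import tempfile
from pathlib import Path


def write_substrate(diff_text: str, touched: list[str], context: list[str]) -> Path:
    """Write the shared review substrate to a fresh scratch directory."""
    root = Path(tempfile.mkdtemp(prefix="mycelium-review-"))
    (root / "diff.patch").write_text(diff_text)
    # A small manifest so every sub-agent reads the same file lists.
    (root / "manifest.json").write_text(json.dumps(
        {"touched_files": touched, "context_files": context}, indent=2))
    return root
```

Each sub-agent then receives the directory path instead of a pasted copy of a large diff.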

### Step 2 — Dispatch six sub-agents in parallel (or run their checklists in-line)

If the Agent tool is available to you, send a single message with six concurrent Agent tool calls (a `subagent_type` of `general-purpose` is fine; for very large diffs prefer `Explore`). If the Agent tool is not available (common when this skill runs from inside a sub-agent context that doesn't expose it), execute each sub-agent's checklist in-line: read each checklist file in turn, apply it to the diff, collect findings, then proceed to synthesis. Either way, the output contract and the synthesis steps are the same.

Each sub-agent (or in-line pass) gets:

- The diff (or the path to it, if large)
- The list of context files to read
- The path to its checklist reference: `skills/core/references/review/<agent>.md`
- A clear instruction to follow the output contract in
  `skills/core/references/review/README.md` exactly. The required fields per
  finding are `severity` (`major` | `minor`), `file`, `line` (or range),
  `category`, `summary`, `evidence` (a 1–5 line verbatim code snippet; the
  synthesis pass renders this directly under each finding), `why_it_matters`
  (one or two sentences specific to this analysis), `suggested_fix`, and
  `confidence` (`high` | `medium` | `low`). Anything the agent considered and
  decided not to flag goes in a separate `not_flagged` list with `file`,
  `line`, `considered`, and `reason`; synthesis uses this to dedupe across
  agents. Sub-agents should also return a `decisions` list of consequential
  analytical choices in their scope (per the README contract); these roll up
  into the report's "Key decisions in this analysis" section.
- The standing instruction to err on the side of NOT flagging when the
  evidence is weak. False positives are more costly than false negatives at
  this stage, because synthesis can ask follow-up questions but cannot
  reconstruct missing certainty.
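The output contract can be pictured as typed records. This is a sketch, not the contract itself: the field names come from the list above, while the Python types, the validation rules, and storing `line` as a string (so ranges like `"40-45"` fit) are assumptions. `skills/core/references/review/README.md` remains authoritative.

```python
from dataclasses import dataclass, field

SEVERITIES = {"major", "minor"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass
class Finding:
    severity: str
    file: str
    line: str            # a single line ("42") or a range ("40-45")
    category: str
    summary: str
    evidence: str        # 1-5 lines copied verbatim from the source
    why_it_matters: str
    suggested_fix: str
    confidence: str

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
        if self.confidence not in CONFIDENCES:
            raise ValueError(f"confidence must be one of {sorted(CONFIDENCES)}")
        n = len(self.evidence.splitlines())
        if not 1 <= n <= 5:
            raise ValueError(f"evidence must be 1-5 lines, got {n}")


@dataclass
class NotFlagged:
    file: str
    line: str
    considered: str
    reason: str


@dataclass
class AgentReport:
    findings: list[Finding] = field(default_factory=list)
    not_flagged: list[NotFlagged] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
```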

The six sub-agents and their checklist files:

| # | Sub-agent | Checklist file | Focus |
|---|---|---|---|
| 1 | stats-causal | `skills/core/references/review/stats-causal.md` | Test selection, multiple comparisons, p-hacking, causal claims, effect-size reporting, study design |
| 2 | data-pipeline-leakage | `skills/core/references/review/data-pipeline-leakage.md` | Train/test contamination, time-series look-ahead, joins, missing values, dedup, units, batch effects, ML evaluation |
| 3 | bioinformatics | `skills/core/references/review/bioinformatics.md` | Gene names, reference genome, scRNA-seq pipeline, RNA-seq DE, double dipping, pseudoreplication |
| 4 | llm-failure-modes | `skills/core/references/review/llm-failure-modes.md` | Try/except antipatterns, hallucinated APIs, default-parameter smuggling, sycophancy/forking-paths drift, fabricated tool output |
| 5 | doc-schema-fidelity | `skills/core/references/review/doc-schema-fidelity.md` | Docstrings/specs/schemas/READMEs vs reality, definition drift, comment freshness, undocumented behavior |
| 6 | code-quality | `skills/core/references/review/code-quality.md` | Duplicate sources of truth, boolean flag pairs, misleading names, premature abstractions, secrets, import hacks, BC cruft, file organization, logging consistency |
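A sketch of assembling the six per-agent payloads described above. The checklist paths come from the table; the prompt wording and the helper `build_prompts` are hypothetical, and the actual dispatch (six concurrent Agent tool calls, or in-line passes) is not shown.

```python
REVIEW_DIR = "skills/core/references/review"
AGENTS = ["stats-causal", "data-pipeline-leakage", "bioinformatics",
          "llm-failure-modes", "doc-schema-fidelity", "code-quality"]


def build_prompts(diff_path: str, context_files: list[str],
                  skip: frozenset[str] = frozenset()) -> dict[str, str]:
    """Return one prompt per sub-agent, omitting any agent listed in `skip`."""
    prompts = {}
    for agent in AGENTS:
        if agent in skip:
            continue
        prompts[agent] = "\n".join([
            f"Review the diff at {diff_path}.",
            "Read these context files first: " + ", ".join(context_files),
            f"Apply the checklist in {REVIEW_DIR}/{agent}.md.",
            f"Follow the output contract in {REVIEW_DIR}/README.md exactly.",
            "When the evidence is weak, do NOT flag it; record it under not_flagged.",
        ])
    return prompts
```

Any skipped agent should still be named in the report's "Sub-agents run" line with the reason.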

The bioinformatics agent should self-skip with a one-line "no biology here" if the diff doesn't touch genomic data — don't spawn it if the project clearly isn't biology, but err on the side of running it for any project that has ever touched a sequence file, gene table, or single-cell object.
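The "err on the side of running it" rule can be approximated with a repo-wide heuristic. The suffix and keyword lists below are illustrative assumptions, not mycelium's actual rule, and `should_run_bioinformatics` is a hypothetical name. `repo_files` is meant to be every tracked file (for example, the output of `git ls-files`), not just the diff, because the rule covers any project that has *ever* touched biological data.

```python
from pathlib import PurePosixPath

# Assumed signals for sequence files, gene tables, and single-cell objects.
BIO_SUFFIXES = {".fa", ".fasta", ".fastq", ".fq", ".bam", ".sam", ".cram",
                ".vcf", ".bed", ".gtf", ".gff", ".gff3", ".h5ad", ".loom", ".mtx"}
BIO_TOKENS = ("scanpy", "anndata", "seurat", "deseq2", "edger", "gene_id", "ensembl")


def should_run_bioinformatics(repo_files: list[str], diff_text: str) -> bool:
    """Err toward running: any biological trace anywhere counts."""
    for name in repo_files:
        # .suffixes handles compound names such as sample.fastq.gz.
        if any(s.lower() in BIO_SUFFIXES for s in PurePosixPath(name).suffixes):
            return True
    lowered = diff_text.lower()
    return any(token in lowered for token in BIO_TOKENS)
```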

### Step 3 — Synthesize

Read `skills/core/references/review/synthesis.md` and follow it. The short form:

1. Aggregate findings across sub-agents and dedupe. Two sub-agents flagging
   the same line with different framings is the common case: keep the more
   actionable framing and add a one-line "see also" if the other framing
   adds value.
2. Recalibrate severity to two levels: Major (fix this; the result is
   invalid, misleading, or insecure) and Minor (consider improving; it
   doesn't change the conclusion). Drop pure stylistic nits a linter would
   catch.
3. Identify the key analytical decisions in the work: the consequential
   choices that, if changed, would meaningfully change the result (estimand,
   sample-filtering thresholds, normalization, model or test choice,
   multiple-comparison handling, train/test strategy, reference choice,
   etc.). List them whether or not each has an associated finding.
4. Draft 3–5 questions for the analyst: meta-level questions whose answers
   change which findings matter most (e.g., "is this for wet-lab validation
   or a paper figure?", "are donors technical or biological replicates?",
   "is the deployment seeing the same customers or new ones?"). The diff
   can't answer these.
5. For each finding, include `file:line`, a 1–5 line code snippet verbatim
   from the source (so the user sees the issue without opening the file),
   why it matters for this analysis specifically, and a one-sentence fix.
6. Group findings by category (the six sub-agent areas), and within each
   category list Major findings first, then Minor.
7. Number findings F1, F2, F3, ... globally so the Key decisions and
   Questions sections can link to them where useful.
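The mechanical parts of synthesis (dedupe, grouping, global numbering) can be sketched as follows. Findings are plain dicts using the contract's field names; "more actionable" is approximated as higher severity, then higher confidence, which is an assumption standing in for the judgment call the text describes. Recalibration and dropping linter-level nits are left out because they need that judgment. `synthesize` is a hypothetical name.

```python
from collections import defaultdict

CATEGORY_ORDER = ["stats-causal", "data-pipeline-leakage", "bioinformatics",
                  "llm-failure-modes", "doc-schema-fidelity", "code-quality"]
SEVERITY_RANK = {"major": 0, "minor": 1}
CONFIDENCE_RANK = {"high": 0, "medium": 1, "low": 2}


def synthesize(findings: list[dict]) -> list[dict]:
    """Dedupe by location, group by category (Major first), number F1..Fn."""
    by_location: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for f in findings:
        by_location[(f["file"], f["line"])].append(f)

    kept = []
    for dupes in by_location.values():
        # Proxy for "more actionable": higher severity, then higher confidence.
        dupes.sort(key=lambda f: (SEVERITY_RANK[f["severity"]],
                                  CONFIDENCE_RANK[f["confidence"]]))
        best = dict(dupes[0])
        best["see_also"] = [d["category"] for d in dupes[1:]]
        kept.append(best)

    # Group by sub-agent area, Major before Minor inside each area.
    kept.sort(key=lambda f: (CATEGORY_ORDER.index(f["category"]),
                             SEVERITY_RANK[f["severity"]]))
    for i, f in enumerate(kept, start=1):
        f["id"] = f"F{i}"  # global numbering for Key decisions / Questions links
    return kept
```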

### Step 4 — Render the report

Write the report to a file under `.living/outputs/reviews/` named `YYYY-MM-DD-<scope-slug>.md` (e.g., `2026-04-24-pr-127.md`, `2026-04-24-working-tree.md`). The structure:

````markdown
# Review — <scope> — YYYY-MM-DD

**Scope**: <PR / commit range / working tree / pasted diff>
**Files reviewed**: N
**Sub-agents run**: 6 (or list which were skipped and why)

## Key decisions in this analysis

- **<Decision>** — <one-line description>. <see F2 if linked, or no
  link if informational>
- ...

## Questions for the analyst

Three to five open-ended questions whose answers would change which
findings matter most (analysis goal, replicate type, downstream use,
acceptable false-positive rate, registration status, deployment
context). The diff can't answer these; only the analyst can.

- <Question>
- ...

## Findings

### Statistics & causal inference
#### Major
##### F1. <short description>
`<file>:<line>`
```python
<1-5 lines verbatim>
```
**Why it matters here**: ... **Fix**: ...

#### Minor
##### F2. ...

### Data pipeline & leakage
... (Major / Minor under each category)

### Bioinformatics
...

### LLM coding antipatterns
...

### Documentation & schema fidelity
...

### Code quality
...

## What was checked but is fine

Statistics & causal inference: ...
...

## Notes

Cross-cutting observations: compound findings sharing one remediation path,
"did this code ever run" questions, etc.
````


Print the path of the written file at the end. The chat reply should
surface the count of Major findings per category and the "Key decisions"
list so the user sees the shape of the report without opening the file.
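A sketch of the two mechanical pieces of this step: building the report path and counting Major findings per category for the chat reply. The directory and the `YYYY-MM-DD-<scope-slug>.md` pattern come from the text above; the slug rule (lowercase, non-alphanumeric runs collapsed to `-`) and the helper names are assumptions.

```python
import re
from collections import Counter
from datetime import date


def report_path(scope: str, day: date) -> str:
    """Build .living/outputs/reviews/YYYY-MM-DD-<scope-slug>.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", scope.lower()).strip("-")
    return f".living/outputs/reviews/{day.isoformat()}-{slug}.md"


def major_counts(findings: list[dict]) -> Counter:
    """Count Major findings per category for the chat summary."""
    return Counter(f["category"] for f in findings if f["severity"] == "major")


print(report_path("PR 127", date(2026, 4, 24)))  # .living/outputs/reviews/2026-04-24-pr-127.md
```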

### Step 5 — Offer tripwires (opt-in behavioral follow-up)

Read `skills/core/references/review/deep-tripwires.md` and follow it.
Two terms first. An **audit** is the *default artifact*: a written document
that walks the analysis, names each tripwire that would apply, names the
perturbation and expected outcome, links each to a static finding, and
surfaces what would be missing before anything could execute. No code runs.
**Instrumentation** is the set of four observability hooks the pipeline
needs before tripwires can actually execute: checkpoint emission,
`--stop-after`, `analysis_labels.yml`, and a drop ledger. Definitions and
the full mode table are in the "Three operating modes" section of
`deep-tripwires.md`.

The short form:

1. Scan the static findings for `suggested_tripwire` tags emitted by
   the sub-agents (see the output contract in `review/README.md`).
2. Detect which of the four instrumentation hooks the repo already has:
   checkpoint logging anywhere in the analysis scripts, a `--stop-after` /
   `STOP_AFTER_CHECKPOINT` mechanism, an `analysis_labels.yml` or equivalent
   at the analysis root, and a drop ledger of any name or shape. Also note
   any existing tripwire runner under `tools/`. Names and formats are
   illustrative; accept what the project uses. The result picks the default
   mode: `audit` if none or only some of the hooks are present, `scaffold`
   if the user asks to add them, `run` if all four are present.
3. Use `AskUserQuestion` to offer a menu. **Lead every user-facing
   surface with plain English; keep internal IDs in parentheses or
   in artifacts.** See `deep-tripwires.md` "Talking to the user" for
   the gloss table. Examples of well-phrased options: "Describe what
   we'd test (no code runs)" rather than "audit mode"; "Propose
   project-specific patches that would let us actually run the
   tests" rather than "scaffold mode". Always include "skip" and the
   audit option. Include "scaffold" if the hooks are partial or
   absent. Include "run selected tripwires" if all four hooks are
   present. The list of named tripwires on the menu is shaped by
   which `suggested_tripwire` tags appeared in the static findings,
   plus the starter four (missing counts, missing metadata sample,
   label permutation, toy contrast direction) which are always
   available. When naming individual tripwires in the menu, use the
   plain-English glosses ("the report-numbers-still-match check")
   not the internal IDs.
4. Execute the chosen mode per `deep-tripwires.md`. Scaffold-mode
   and run-mode are **agent-improvised per analysis** — the skill
   ships principles + examples, and the agent reads the project's
   language / layout / existing helpers and adapts. Do not stamp
   templates blindly; if the project already has a logging or
   filter helper, build on it. Output paths:
   - audit → `.living/outputs/reviews/YYYY-MM-DD-<scope-slug>-tripwires.md`
   - scaffold → `…-tripwires-scaffold.md` (proposal, not auto-applied)
   - run → `…-tripwires-run.md` (pass/fail per tripwire)

Print the written path(s) at the end.
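The mode selection, menu shape, and output paths above can be sketched as plain functions. Hook detection itself is improvised per project, so the sketch takes the set of detected hooks as input; the hook keys, the helper names, and giving an explicit request to add hooks precedence over the other rules are assumptions. The starter four tripwires and the output suffixes come from the steps above.

```python
# Keys standing in for the four instrumentation hooks (hypothetical labels).
HOOKS = frozenset({"checkpoint-emission", "stop-after", "analysis-labels", "drop-ledger"})
STARTER = ["missing counts", "missing metadata sample",
           "label permutation", "toy contrast direction"]
OUTPUT_SUFFIX = {"audit": "-tripwires.md",
                 "scaffold": "-tripwires-scaffold.md",
                 "run": "-tripwires-run.md"}


def default_mode(present: set[str], user_asked_to_add_hooks: bool = False) -> str:
    if user_asked_to_add_hooks:
        return "scaffold"
    return "run" if HOOKS <= present else "audit"


def menu_options(present: set[str]) -> list[str]:
    # "Skip" and the audit option are always offered, in plain English.
    options = ["Skip", "Describe what we'd test (no code runs)"]
    if HOOKS <= present:
        options.append("Run selected tripwires")
    else:
        options.append("Propose project-specific patches that would let us actually run the tests")
    return options


def tripwire_menu(suggested: list[str]) -> list[str]:
    """Starter four plus any suggested_tripwire tags, deduplicated in order."""
    return STARTER + [t for t in dict.fromkeys(suggested) if t not in STARTER]


def output_path(review_stem: str, mode: str) -> str:
    return f"{review_stem}{OUTPUT_SUFFIX[mode]}"
```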

**Reporting rule (applies to every mode).** When summarizing the
result back to the user in chat, lead with one English sentence
("Three places in your docs cite p-values that don't match the
source CSV"), then show the table / artifact as evidence. Don't
open with `Instrumentation detected: 0/4` or raw checkmark/X
output. The user wants to know what failed and why before they
need the test IDs.

Skip Step 5 entirely if the diff is a pure refactor with no data-flow
change AND no documentation change, or if the user invoked the skill
with a flag asking to suppress tripwires (e.g., `/mycelium:review
--no-tripwires`).

**Documentation-only diffs are NOT a skip case.** When report text or
analysis docs change without the pipeline changing, that's exactly
when the report-values-freshness tripwire is most useful: it catches the
otherwise silent case where someone edited a number in the report without
regenerating the source CSV. For doc-only diffs, restrict the offered menu to the
freshness and known-answer categories (the fault-injection and
metamorphic tripwires don't apply when the pipeline wasn't touched)
rather than skipping Step 5.
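The skip and restriction rules above reduce to a small decision function. In this sketch, `data_flow_changed` and `docs_changed` are inputs that a reviewer would determine from the diff, the helper name is hypothetical, and offering every category (including freshness) when the data flow changed is an assumption.

```python
from typing import Optional


def tripwire_categories(data_flow_changed: bool, docs_changed: bool,
                        no_tripwires_flag: bool) -> Optional[list[str]]:
    """Return the tripwire categories to offer, or None to skip Step 5."""
    if no_tripwires_flag:
        return None  # e.g. /mycelium:review --no-tripwires
    if not data_flow_changed and not docs_changed:
        return None  # pure refactor: nothing for a tripwire to catch
    if not data_flow_changed:
        # Doc-only diff: the pipeline was not touched, so fault-injection
        # and metamorphic tripwires do not apply.
        return ["freshness", "known-answer"]
    return ["fault-injection", "metamorphic", "known-answer", "freshness"]
```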

### Step 6 — Post-action hook

Treat a review as a significant action: log a short entry to
`.living/learnings.md` if the review surfaced a recurring pattern (e.g., "the
analyze script is using `t-test` on count data — this is the third time this
pattern has come up, consider a convention"). Otherwise no logging is needed.
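A minimal sketch of that hook, assuming a recurrence threshold of three (taken from the "third time" example) and a one-line bullet entry format; both are assumptions, and the mycelium core Post-Action Hook Protocol governs the real format. `log_learning` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def log_learning(repo_root: Path, pattern: str, times_seen: int,
                 day: date, threshold: int = 3) -> bool:
    """Append a learnings entry only when a pattern keeps recurring."""
    if times_seen < threshold:
        return False  # a one-off finding needs no log entry
    path = repo_root / ".living" / "learnings.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {day.isoformat()} (review): {pattern} "
                 f"(seen {times_seen} times; consider a convention)\n")
    return True
```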

## Grill mode protocol

Triggered by the user invoking with `grill` in their request, e.g.,
`/mycelium:review grill` or "grill me on this analysis".

Read `skills/core/references/review/grill-mode.md` and follow it. The short
form:

1. Identify the consequential decisions in the analysis. These are anything
   that, if changed, would meaningfully change the result: choice of
   statistical test, multiple-comparison correction, sample-filtering
   thresholds, normalization steps, clustering parameters, train/test split
   strategy, choice of estimand, choice of reference (genome, baseline,
   comparator), exclusion criteria, etc. Read `.living/decisions.md` and the
   analysis script(s) to extract these.
2. **One question per turn**, conversationally phrased. Do not dump a
   numbered list. The point is to feel like a thoughtful colleague over coffee,
   not a checklist. Acknowledge the answer, then move to the next.
3. Track answers internally. After ~5–8 exchanges (or when the user signals
   they want to wrap up), produce a short summary: which decisions had clear
   justifications, which the user wasn't sure about, which deserve a
   follow-up. Offer to file the unsure ones as `todo/` items.
4. If the user gives a justification that itself reveals a problem (e.g.,
   "we used t-test because that's what the tutorial used"), don't lecture —
   ask the next question that exposes the implication ("got it — and the
   data here is counts, right? do we expect the t-test assumptions to hold
   on counts?"). Trust theory of mind.
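The internal bookkeeping for grill mode can be as small as the sketch below. The three status labels, the cap of eight exchanges (the top of the ~5–8 range), and the `GrillLog` name are assumptions; the questions themselves stay conversational and are not generated by this code.

```python
from dataclasses import dataclass, field

STATUSES = ("justified", "unsure", "follow-up")


@dataclass
class GrillLog:
    answers: dict[str, str] = field(default_factory=dict)  # decision -> status

    def record(self, decision: str, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status {status!r}")
        self.answers[decision] = status

    def should_wrap_up(self, user_said_stop: bool, max_exchanges: int = 8) -> bool:
        # Stop on request, or once enough decisions have been discussed.
        return user_said_stop or len(self.answers) >= max_exchanges

    def summary(self) -> dict[str, list[str]]:
        out: dict[str, list[str]] = {s: [] for s in STATUSES}
        for decision, status in self.answers.items():
            out[status].append(decision)
        return out

    def todo_candidates(self) -> list[str]:
        # The unsure decisions are the ones to offer as todo/ items.
        return self.summary()["unsure"]
```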

The aim of grill mode is to help the user converge on what they actually
believe about each choice without exhausting them. If at any point they say
"enough" or "let's stop," stop and write the partial summary.

## What this skill is NOT for

- Running the analysis (`/mycelium:analyze`)
- Generating a report or paper section (`/mycelium:report`)
- Repo-wide refactoring without an analysis context — use a generic code
  reviewer for that
- Stylistic / linting work — let the linter handle it
- Validating raw data ingestion — `/mycelium:ingest` covers that with its
  own checks

## Cross-references inside this skill

- `skills/core/references/review/synthesis.md` — synthesis & severity
  calibration
- `skills/core/references/review/grill-mode.md` — grill protocol detail
- `skills/core/references/review/deep-tripwires.md` — behavioral
  follow-up (Step 5)
- Per-agent checklists under `skills/core/references/review/` (six files;
  loaded by sub-agents, not by you)
- The mycelium core `Post-Action Hook Protocol` from `commands/core.md`
  governs what to log to `.living/` when a review surfaces a recurring
  pattern
