From academic-research
Runs iterative academic manuscript revision via parallel critics (evidence, method, argument, expert). Invoke with /critic-loop for autonomous editing cycles.
npx claudepluginhub mronkko/claude-academic-research --plugin academic-research
How this skill is triggered: slash command
/academic-research:critic-loop
Summary Claude sees in its skill listing (used to decide when to auto-load this skill):
**Invocation:** `/critic-loop <document-path> [--critics evidence,method,argument,expert] [--max-iter 4] [--no-test] [--render-cmd "quarto render {doc} --to gfm"]`
Revises academic drafts via a parallel critic loop (evidence, method, argument, expert) that iterates until no major issues remain.
Before any step below, verify the plugin has been configured:
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/setup/check_configured.py"
If the result is NOT CONFIGURED, stop immediately and tell the user:
The academic-research plugin has not been set up on this machine yet. Run
/setup first: the evidence critic depends on MCP citation lookups (Zotero, OpenAlex, Semantic Scholar), which require MCP servers that /setup registers.
Do not launch critics or proceed with the loop. /setup is the
required first step.
If the result is configured, proceed.
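The gate above can be sketched in Python. This is a minimal illustration, not the plugin's own code: it assumes `check_configured.py` prints its status to stdout and that the text `NOT CONFIGURED` appears only when setup is missing. The helper names `is_configured` and `run_config_check` are hypothetical.

```python
import os
import subprocess
import sys


def is_configured(check_output: str) -> bool:
    """Interpret the check script's output.

    Assumption: the script prints a status containing "NOT CONFIGURED"
    when setup is missing. Empty output is treated as not configured so
    that a crashed check never lets the loop proceed.
    """
    text = check_output.strip()
    return bool(text) and "NOT CONFIGURED" not in text


def run_config_check() -> bool:
    """Run the plugin's check script and report whether the loop may start."""
    root = os.environ["CLAUDE_PLUGIN_ROOT"]
    script = os.path.join(root, "scripts", "setup", "check_configured.py")
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    return is_configured(result.stdout)
```

If `run_config_check()` returns `False`, stop and show the /setup message instead of launching critics.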
Execute the editing loop documented in the manuscript-revision skill.
This skill is the procedure; that skill is the doctrine. Read it
first if you have not recently.
fact-check is the standalone citation / claim audit. The evidence
critic inside this loop covers the same ground during iterative
revision.
Mutual-exclusion rule. Do NOT spawn a /fact-check run while this
loop is active, and skip it immediately after a loop that finished
with no unresolved evidence-critic MAJORs — the work is duplicate and
burns MCP / Zotero quota twice. Use fact-check instead of this loop
when the user wants a focused citation-only pre-submission report
without the method / argument / expert perspectives.
Required:
- `<document-path>`: path to the authoring source (e.g. `manuscript/motivation_slr.qmd`). Relative paths are resolved against the project root.

Optional:
- `--critics <csv>`: comma-separated list of critic perspectives. Defaults to `evidence,method,argument,expert`. Unknown names are treated as custom perspectives; the main agent must write a focused prompt for each (describe the lens in 1–2 sentences when launching the Agent).
- `--max-iter <N>`: iteration cap. Default 4.
- `--no-test`: skip the test-suite gate. Only use when explicitly asked (e.g., when the repo has no test suite yet). Red flag otherwise.
- `--render-cmd <cmd>`: shell template for the render step. `{doc}` is substituted with the document path. Default: `quarto render {doc} --to gfm`, which produces a `.md` sibling of the `.qmd`. For R Markdown, pass `Rscript -e "rmarkdown::render('{doc}', output_format='md_document')"`.
- `--rendered-path <path>`: override the rendered output path if the render command writes somewhere non-obvious.

Rendering is a first-class step of the loop, not a side-effect of "let's see the output". Three reasons the critics read the rendered file, not the source:
1. A failing `s['key']` lookup or an unresolved `@citekey` crashes the renderer louder than any grep. `test_empirical_integrity.py` catches the static version of this; render catches the runtime version (stats module throws on a downstream computation, pipeline output file is stale). Render failure is a specific-class-of-integrity signal: treat it as a test failure and route the fix through Step 1, not around it.
2. The source still contains unevaluated `{python} s['screen.n_included']` placeholders. The evidence critic's "does this claim match the source?" is a different question when the claim is still an expression.
3. The critics review the snapshot `rendered.md`; the source file is too volatile.

Document-format defaults assume Quarto → gfm, with the snapshot landing
at critic-reviews/iter-{N}/rendered.md. For other formats:
- `.md`: no render needed; snapshot is a file copy.
- `.ipynb`: `jupyter nbconvert --to markdown {doc}` (add `--execute` if cells are runnable).
- `.tex`: `latexmk -pdf {doc}` yields a PDF, not markdown. Either add a pandoc post-step back to markdown, or set `--rendered-path` to the PDF and tell the critics to accept PDF.
- `.Rmd` → `html_document`: default advice is `md_document` because critics prefer markdown; `--to html` works, but if you override the render command that way you must also pass `--rendered-path` so the snapshot step still finds the output.

Whenever you override `--render-cmd` to produce something other than a
sibling `.md`, pass `--rendered-path` to match. They vary together.
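As a rough sketch of how the render template and the expected output path fit together, the hypothetical helper below substitutes `{doc}` and falls back to the sibling `.md` when no `--rendered-path` is given. The function name `resolve_render` and its return shape are assumptions for illustration; the plugin may resolve these differently.

```python
from pathlib import Path

# Default render template quoted from the options above.
DEFAULT_RENDER_CMD = "quarto render {doc} --to gfm"


def resolve_render(doc, render_cmd=None, rendered_path=None):
    """Return (shell command or None, expected rendered output path).

    Hypothetical helper: a plain .md source needs no render step, so the
    command is None and the snapshot is a copy of the source itself.
    """
    doc_path = Path(doc)
    if rendered_path is not None:
        expected = Path(rendered_path)      # explicit override wins
    elif doc_path.suffix == ".md":
        expected = doc_path                 # already markdown
    else:
        expected = doc_path.with_suffix(".md")  # sibling .md default

    if doc_path.suffix == ".md" and render_cmd is None:
        return None, expected

    template = render_cmd or DEFAULT_RENDER_CMD
    # str.replace rather than str.format, so other braces in the
    # template (for example R code) are left untouched.
    return template.replace("{doc}", str(doc)), expected
```

Note that the sibling-`.md` fallback is only right for commands that actually write markdown next to the source; for the `.tex` and html cases the caller still has to supply the rendered path, as the rule above says.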
Create the iteration working directory up front (project-local; portable):
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/setup/ensure_dir.py" critic-reviews
Then:
iter = 1
while iter <= MAX_ITER:
── Step 1: test gate ──────────────────────────────────────────────
Unless --no-test: run every `test_*.py` script the project ships in
`scripts/`, in the order the project's CLAUDE.md names (typical order:
`test_citations.py`, `test_empirical_integrity.py`,
`test_systematic_review.py`). Common invocations: `pytest`,
`npm test`, or the bare `python3 scripts/test_<name>.py` sequence.
If any file fails:
- diagnose the failure
- fix the underlying cause (do not suppress or skip tests)
- re-run
- only proceed to Step 2 when all tests pass.
Each test file maps to a skill — a failure in `test_citations.py` is
a `grounded-citations` / `fact-check` regression; a failure in
`test_empirical_integrity.py` is an `empirical-integrity` regression;
a failure in `test_systematic_review.py` is a `systematic-review`
pipeline regression. Read the relevant skill if the fix is not
obvious.
If tests cannot be made to pass without the user's input, stop the loop and
surface the failure to the user. Do NOT call critics on a broken build.
── Step 2: render ──────────────────────────────────────────────────
Run the render command. Verify the rendered output file exists and is
non-empty. If render fails, treat as a test failure (go to Step 1 fix loop).
Snapshot: copy the rendered .md to critic-reviews/iter-{N}/rendered.md.
── Step 3: launch critics IN PARALLEL ─────────────────────────────
Single message, multiple Agent tool calls — one per critic. Each Agent call
uses subagent_type="general-purpose" (model="sonnet" is a reasonable default)
and receives the generic prompt preamble below plus the perspective prompt.
Save each Agent's returned text to critic-reviews/iter-{N}/critic-<name>.md.
── Step 4: adjudicate ─────────────────────────────────────────────
For every numbered item across all critics, pick one disposition:
applied — revision will be made this iteration
deferred — log reason (needs user input / needs new data / out of scope)
rejected — log disagreement with written reason
On inter-critic disagreement (e.g. expert wants a theory added, argument
wants scope tightened): write a brief adjudication note and decide.
Extra scrutiny for expert-critic MAJOR items: the expert critic does not
back its claims with a source lookup. Before applying a "missing seminal
work" flag, verify via MCP (OpenAlex / Semantic Scholar / Zotero) that the
named work exists and says what the critic claimed. If not, reject the item
and note that the expert critic's prompt may need tuning.
Write everything to critic-reviews/iter-{N}/decisions.md (format below).
── Step 5: apply edits ────────────────────────────────────────────
Apply every "applied" item to the authoring source (not the rendered
markdown). Use Edit/Write on the .qmd / .Rmd / .md source file.
── Step 6: termination check ──────────────────────────────────────
Exit the loop if ANY of these holds:
(a) All critic verdicts are SHIP, OR all are SHIP-WITH-REVISIONS with zero
[MAJOR] items remaining across the whole set (i.e. every MAJOR item
was applied this iteration);
(b) iter == MAX_ITER;
(c) Loop-back detected: the same MAJOR item was flagged in iter N-1 by
the same critic, marked "applied", and flagged again by that critic
in iter N.
On loop-back (case c): Step 5's "applied" items for the CURRENT iter have
already been written to disk. Do NOT roll them back — they may be partial
progress the author wants to keep. Instead, exit the loop immediately,
write the final report flagging the loop-back, and surface to the user
as a human-adjudication request (the critic and the main agent disagree
on whether the previous fix actually resolved the item). Do NOT proceed
to iter N+1.
iter += 1
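The Step 6 exit conditions can be sketched as a single function. This is an illustration under stated assumptions, not the skill's implementation: each MAJOR item is assumed to carry a stable `key` so that "the same item" can be matched across iterations (the procedure does not say how items are matched), and a mix of SHIP and SHIP-WITH-REVISIONS verdicts is assumed to satisfy condition (a). The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MajorItem:
    key: str          # hypothetical stable identifier for the flagged item
    disposition: str  # "applied" | "deferred" | "rejected"


@dataclass
class CriticResult:
    verdict: str      # "BLOCK" | "SHIP-WITH-REVISIONS" | "SHIP"
    majors: List[MajorItem] = field(default_factory=list)


def exit_reason(
    iteration: int,
    max_iter: int,
    current: Dict[str, CriticResult],
    previous: Optional[Dict[str, CriticResult]] = None,
) -> Optional[str]:
    """Return "a", "b", or "c" if the loop should exit, else None."""
    # (c) loop-back: same critic re-flags a MAJOR it saw "applied" last time.
    if previous:
        for critic, result in current.items():
            before = previous.get(critic)
            applied_before = {
                m.key for m in (before.majors if before else [])
                if m.disposition == "applied"
            }
            if any(m.key in applied_before for m in result.majors):
                return "c"

    # (a) no BLOCK verdicts and no MAJOR item left unapplied.
    no_block = all(
        r.verdict in ("SHIP", "SHIP-WITH-REVISIONS") for r in current.values()
    )
    majors_applied = all(
        m.disposition == "applied" for r in current.values() for m in r.majors
    )
    if no_block and majors_applied:
        return "a"

    # (b) iteration cap reached (>= is a defensive form of ==).
    if iteration >= max_iter:
        return "b"
    return None
```

Loop-back is checked first so that it is reported even when the iteration cap is hit in the same iteration, since case (c) is the one that needs human adjudication.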
Append the perspective-specific prompt below this preamble:
You are an <perspective> critic reviewing the manuscript at
critic-reviews/iter-{N}/rendered.md.
Research context: <one-paragraph summary of the project's research question,
scope, and data — pulled from the project's CLAUDE.md>.
Scope boundaries — STRICT:
Your domain is <perspective>. The other critics (evidence, method, argument,
expert — whichever are active in this run) cover their own domains in
parallel. Do NOT duplicate their work. If you see an issue that clearly
belongs to another critic's scope, skip it — they will catch it.
The per-perspective prompt below defines your exact scope.
Anti-sycophancy — STRICT:
Each iteration you see a revised manuscript. Evaluate the current iteration
on its own merits. Do NOT soften your assessment because the author has
"clearly been working hard" or because "progress has been made since the
last iteration". If a MAJOR issue remains after revision, flag it MAJOR
again. If a new MAJOR issue has been introduced by the revision, flag it.
The loop's purpose is to exit when no MAJOR issue remains, not to exit
because you are tired of flagging.
Your role: FLAG issues, do NOT rewrite. The author will adjudicate and apply.
<perspective-specific prompt — see Perspective prompts section>
Output format — strict, no prose outside this structure:
VERDICT: BLOCK | SHIP-WITH-REVISIONS | SHIP
ISSUES:
1. [MAJOR|MINOR|NIT] <section title or short quoted passage (~20 words)>
Issue: <one or two sentences>
Suggested revision: <concrete replacement prose or specific instruction>
2. ...
Severity rules:
MAJOR = factual error, direction reversal, fabricated citation,
misrepresentation, missing critical content, specific missing
seminal work, specific contradicted finding. Blocks publication.
MINOR = weak argument, thin evidence, unclear phrasing, "consider also".
NIT = style, word choice, optional rephrasing.
Verdict rules:
BLOCK = at least one MAJOR issue.
SHIP-WITH-REVISIONS = no MAJOR, but MINOR/NIT remain.
SHIP = no issues. Rare.
Return your report as the Agent result — the main agent will save it.
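Because the output format is strict, the main agent can parse a saved critic report mechanically. The sketch below is one possible parser, assuming each `Issue:` and `Suggested revision:` value fits on a single line (multi-line values would need extra handling). The names `parse_report` and `expected_verdict` are hypothetical, and the sample text in the usage note is invented for illustration.

```python
import re

VERDICTS = {"BLOCK", "SHIP-WITH-REVISIONS", "SHIP"}
# Matches lines such as: 1. [MAJOR] <section title or quoted passage>
ITEM_RE = re.compile(r"^\s*(\d+)\.\s*\[(MAJOR|MINOR|NIT)\]\s*(.*)$")


def parse_report(text):
    """Parse a critic report into (verdict, list of issue dicts)."""
    verdict = None
    issues = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("VERDICT:"):
            verdict = stripped[len("VERDICT:"):].strip()
            continue
        match = ITEM_RE.match(line)
        if match:
            issues.append({
                "n": int(match.group(1)),
                "severity": match.group(2),
                "anchor": match.group(3).strip(),
                "issue": "",
                "revision": "",
            })
        elif issues and stripped.startswith("Issue:"):
            issues[-1]["issue"] = stripped[len("Issue:"):].strip()
        elif issues and stripped.startswith("Suggested revision:"):
            issues[-1]["revision"] = stripped[len("Suggested revision:"):].strip()
    if verdict not in VERDICTS:
        raise ValueError(f"unrecognized verdict: {verdict!r}")
    return verdict, issues


def expected_verdict(issues):
    """Verdict implied by the verdict rules, for a consistency check."""
    if any(i["severity"] == "MAJOR" for i in issues):
        return "BLOCK"
    return "SHIP-WITH-REVISIONS" if issues else "SHIP"
```

A report whose stated verdict differs from `expected_verdict(issues)` (for example SHIP-WITH-REVISIONS with a `[MAJOR]` item listed) is worth flagging during adjudication rather than trusting either side.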
These are appended to the generic preamble. Keep them focused and non-overlapping.
Your scope: verify that every factual claim in the manuscript is honestly
supported by its source — either a cited paper or an authoritative pipeline-
output file. You do NOT evaluate method rigour, writing quality, or missing-
seminal-work judgments — those belong to method / argument / expert.
REQUIRED SUB-SKILL: verifying-citations. Read it before checking any
citation. It defines Stage 0 (resolve), Stage A (abstract), Stage B
(decide), Stage C (fulltext); the four-class classification (VERIFIED /
MINOR / MAJOR / UNVERIFIABLE); the always-escalate triggers (quoted
passages, specific statistics, method details, subgroup findings); the
per-citation (not per-mention) dispatch model; and the cross-mention
consistency check.
Group every mention by unique @citekey before reasoning about it. A
paper cited five times in the rendered manuscript is one citation
with five mentions, not five citations. Apply the staged rule once
per source (fetch each PDF or abstract at most once), classify each
mention independently, then run the cross-mention check across the
set.
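The grouping step can be illustrated with a short sketch. It assumes the text being scanned still carries Pandoc-style `@citekey` markers (a rendered file may have resolved them already, in which case the source or a citation map would be needed instead). The regular expression is a simplified approximation of Pandoc's key syntax, and `group_mentions` is a hypothetical name.

```python
import re
from collections import defaultdict

# Simplified citekey pattern: letters, digits, underscores, with internal
# ':', '.', or '-' allowed only between such runs. The lookbehind skips
# e-mail addresses such as name@example.org.
CITEKEY_RE = re.compile(r"(?<![\w@])@([A-Za-z0-9_]+(?:[:.\-][A-Za-z0-9_]+)*)")


def group_mentions(text):
    """Map each unique citekey to its mentions as (line number, line text)."""
    mentions = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CITEKEY_RE.finditer(line):
            mentions[match.group(1)].append((lineno, line.strip()))
    return dict(mentions)
```

The number of keys in the result is the unique-citation count, and each key's list holds the mentions to classify independently after the source has been fetched once.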
Translate verifying-citations classifications into critic-loop's
[MAJOR|MINOR|NIT] output format:
- VERIFIED → no ISSUE entry needed.
- MINOR → [MINOR].
- MAJOR → [MAJOR].
- UNVERIFIABLE → [MAJOR] for this loop's purposes (the loop has to
resolve or defer; an author cannot ship a citation
they cannot verify). Suggested revision: "no PDF
attached for @key; resolve via enrich_pdfs.py or
replace the citation."
- Cross-mention finding → [MAJOR] (internal inconsistency in how the
manuscript uses one source is a publication-blocker).
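The translation table above is small enough to express as a lookup. The following is a minimal sketch with hypothetical names (`SEVERITY_BY_CLASS`, `to_issue`) and a hypothetical dict layout for the resulting issue; only the mapping itself and the UNVERIFIABLE revision text come from the list above.

```python
from typing import Optional

# Mapping taken from the list above; None means "no ISSUE entry needed".
SEVERITY_BY_CLASS = {
    "VERIFIED": None,
    "MINOR": "MINOR",
    "MAJOR": "MAJOR",
    "UNVERIFIABLE": "MAJOR",
}


def to_issue(citekey: str, classification: str,
             cross_mention_conflict: bool = False) -> Optional[dict]:
    """Translate one verifying-citations result into a critic-loop issue."""
    if cross_mention_conflict:
        # Internal inconsistency across mentions is always blocking.
        return {"severity": "MAJOR", "citekey": citekey,
                "issue": "cross-mention inconsistency"}
    severity = SEVERITY_BY_CLASS[classification]
    if severity is None:
        return None
    issue = {"severity": severity, "citekey": citekey, "issue": classification}
    if classification == "UNVERIFIABLE":
        issue["revision"] = (
            f"no PDF attached for @{citekey}; resolve via "
            "enrich_pdfs.py or replace the citation."
        )
    return issue
```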
For quantitative claims in prose and tables, check that numbers match the
authoritative results file (usually analysis/results/*.csv or *.json, per the
empirical-integrity skill). If a project-specific coded corpus exists (e.g.
analysis/results/coded_papers.csv for SLRs), spot-check prose synthesis
claims against coded entries.
Treat as [MAJOR]: prose number absent from or inconsistent with the
authoritative results file. (Citation-specific MAJOR criteria — missing
paper, direction reversal, fabricated quote, cross-mention inconsistency —
come from verifying-citations.)
Treat as [MINOR]: oversimplified finding from the results files; missing
caveat from a pipeline output.
Spot-check is acceptable when the unique-citation count exceeds 30;
prioritize citations whose mentions include quoted passages or specific
statistics (always-check), then high-stakes directional claims. Report
the sample size in the first ISSUES entry if spot-checking.
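One way to picture the spot-check rule is as a prioritized selection. The sketch below is an assumption-laden illustration: the prompt gives the threshold of 30 but no sample size, so `sample_size` is a hypothetical parameter, as are the `always_check` and `directional` flags on each citation record.

```python
def select_for_check(citations, sample_size, threshold=30):
    """Pick which unique citations to verify.

    citations: list of dicts with hypothetical keys
      "key" (str), "always_check" (bool: quoted passage or specific
      statistic among its mentions), "directional" (bool: high-stakes
      directional claim).
    """
    if len(citations) <= threshold:
        return list(citations)  # at or under the threshold: check everything

    always = [c for c in citations if c["always_check"]]
    directional = [
        c for c in citations if not c["always_check"] and c["directional"]
    ]
    rest = [
        c for c in citations
        if not c["always_check"] and not c["directional"]
    ]
    ordered = always + directional + rest
    # Never drop an always-check citation, even if the sample is smaller.
    return ordered[:max(sample_size, len(always))]
```

The length of the returned list is the sample size to report in the first ISSUES entry.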
Your scope: methodological scrutiny — reviewer #2 energy. This lens applies
to both empirical and review papers. You do NOT verify citations against
sources (evidence critic's job), you do NOT evaluate prose quality (argument
critic's job), and you do NOT flag missing seminal works from your training
(expert critic's job).
Treat as MAJOR:
- causal language that overreaches the design (cross-sectional data should
say "associated with", not "predicts" or "causes");
- mediation claims without proper tests; moderator claims without
interactions;
- limitations section that omits the obvious threats (single coder, LLM
bias, language restriction, sample selection, overreliance on one data
source);
- missing disclosure of tools / models / prompt versions used;
- for reviews: missing disclosure of search strategy, screening reliability,
coder agreement, or LLM prompts used in the pipeline.
Treat as MINOR: imprecise method descriptions, missing effect sizes,
over-broad generalization, under-specified sample characteristics.
Your scope: the manuscript as academic prose AND as a coherent case for its
stated research question. You do NOT verify individual citations (evidence
critic), you do NOT evaluate methodological rigour (method critic), and you
do NOT flag missing seminal works (expert critic).
Check:
- Good academic writing conventions: topic sentences, paragraph unity,
clear logical flow between paragraphs, signposting between sections,
appropriate hedging, active voice where it strengthens clarity.
- Terms defined on first substantive use; jargon introduced with a brief
gloss; acronyms expanded on first occurrence.
- Consistent terminology (e.g. don't switch between "growth intentions",
"growth aspirations", and "growth motivation" for the same construct
without explaining the distinction).
- SYNTHESIS over enumeration: for review papers, the text should analyze
*across* cited studies, not merely march through them one at a time.
Long stretches of "Smith (2019) found X. Jones (2020) found Y. Kim (2021)
found Z." are a MAJOR flag — replace with thematic synthesis that names
patterns, tensions, or cumulative findings and cites multiple papers per
claim.
- Scope coherence: does the manuscript address its stated research question
consistently from Introduction through Discussion? Does Method scope
match Introduction scope? Does Findings deliver on Introduction's
promises? Does Discussion's contribution claim match what Findings
demonstrated?
- Venue fit: framing and structure appropriate to the target journal's
conventions.
Treat as MAJOR: single-article-description prose where synthesis is required;
Introduction promising a question Findings does not deliver; core term used
without definition; scope mismatch between sections; large-scale logical
disorder.
Treat as MINOR: weak topic sentences, paragraph length problems, inconsistent
hedging, structural imbalance.
Treat as NIT: word-choice issues, minor repetition, awkward phrasing.
Your scope: evaluate the manuscript the way a senior reviewer in the target
field would — using your own domain training, NOT by re-reading the cited
papers. You do NOT verify citations the author has made (evidence critic), you
do NOT scrutinise method sections line-by-line (method critic), and you do
NOT critique prose quality (argument critic). Your job is the expert-reader
gut check: what is missing from this manuscript that a seasoned reviewer in
the field would expect to see?
Check for:
- Missing seminal works or foundational theories that any competent
reviewer would expect to see. Name specific authors and works.
- Dated theoretical framings — does the review reflect the current state
of the field or a textbook version from ten years ago? Is there a major
recent development the manuscript misses?
- Contradictions with well-known findings in the field. If the manuscript
claims X but a well-established meta-analysis or stream of work says
otherwise, flag it and name the source.
- Plausibility of claimed "research gap" — does the gap actually exist, or
has it been addressed elsewhere in the literature?
- Interpretive fit — are constructs defined/grouped the way experts in the
field actually use them? Are well-known distinctions respected?
- Omissions of prominent scholars whose work on this topic is central.
Severity discipline is critical for this perspective:
- MAJOR ONLY when you can name a SPECIFIC missing seminal work, a SPECIFIC
contradicted finding, or a SPECIFIC dated framing. "I feel something is
missing" is NOT MAJOR.
- MINOR for "consider also" suggestions from your training — useful
additions but not blocking. If the flag is speculative, mark it MINOR
and say so.
- NIT for expert-stylistic preferences.
Do NOT hallucinate citations: if you name a work, you should be genuinely
confident it exists and is relevant. The main agent will verify expert-critic
MAJOR items against OpenAlex / Semantic Scholar / Zotero before applying, and
a hallucinated citation will be rejected.
Add via --critics. Examples of reasonable additions:
Each custom perspective needs a focused prompt following the same format as the defaults above, including a scope-boundaries clause that names the active critics it must not duplicate.
# Iteration {N} decisions
## Critic: evidence — VERDICT: BLOCK
- Item 1 [MAJOR]: <short restatement> — **applied**. <reason/how>
- Item 2 [MINOR]: <short restatement> — **deferred**. <reason>
- Item 3 [NIT]: <short restatement> — **rejected**. <reason>
## Critic: method — VERDICT: ...
...
## Adjudications
- Evidence item 1 conflicts with expert item 3 — <decision + reason>
After the loop exits, write critic-reviews/final-report.md:
# Critic-loop final report
**Document:** <path>
**Iterations run:** <N> / <MAX_ITER>
**Exit reason:** <(a) all critics satisfied / (b) iteration cap hit / (c) loop-back>
## Verdict timeline
| Iter | evidence | method | argument | expert |
|---|---|---|---|---|
| 1 | BLOCK | BLOCK | BLOCK | BLOCK |
| 2 | BLOCK | SHIP-WITH-REVISIONS | SHIP-WITH-REVISIONS | SHIP-WITH-REVISIONS |
| 3 | SHIP-WITH-REVISIONS | SHIP-WITH-REVISIONS | SHIP-WITH-REVISIONS | SHIP-WITH-REVISIONS |
## Item counts (cumulative across iterations)
| critic | MAJOR | MINOR | NIT | applied | deferred | rejected |
|---|---:|---:|---:|---:|---:|---:|
| evidence | 4 | 7 | 1 | 10 | 2 | 0 |
| method | 3 | 6 | 2 | 9 | 2 | 0 |
| argument | 2 | 9 | 6 | 14 | 3 | 0 |
| expert | 2 | 7 | 1 | 6 | 3 | 1 |
## Unresolved MAJOR items (if any)
- <critic> iter <N> item <M>: <restate>. Reason unresolved: <…>
## Deferred items carried forward
- <critic> iter <N> item <M>: <restate>. Reason: <…>
If there are zero unresolved MAJOR items and no loop-back, the final
report's verdict line should read **Final status: LOOP COMPLETE — no unresolved MAJORs.**. Otherwise **Final status: LOOP COMPLETE WITH UNRESOLVED ITEMS — see below.**.
After writing the final report, send the user a concise summary message
(~100 words): exit reason, iterations used, count of applied/deferred/
rejected items, and any unresolved MAJORs. Do not paste the full critic
reports into the chat — point to the files under critic-reviews/.
empirical-integrity skill; route it through the
pipeline instead).
Do not read ~/.config/academic-research/config.toml via
cat, head, tail, grep, less, more, awk, sed, a
Python script, or any other command. NEVER read that file. It
holds API keys. The critic loop has no legitimate need for them.