manuscript-review (from armory)
Audits academic or technical manuscripts for macro-coherence, argumentative structure, citation hygiene, visual formatting, and submission readiness, delivering a prioritized section-level refactoring report.
`npx claudepluginhub mathews-tom/armory --plugin armory`

This skill uses the workspace's default tool permissions.
Execute a comprehensive, multi-pass diagnostic audit of an academic or technical manuscript, producing a structured improvement report that identifies issues across 24 audit dimensions — from macro-coherence and argumentative architecture through claims-evidence calibration, narrative flow, prose microstructure, rendered visual inspection, and cross-element coherence, down to citation hygiene and reproducibility.
The output is a prioritized, actionable improvement plan — not a line edit. The goal is to surface structural, logical, and clarity issues that authors systematically miss because they're too close to the text.
Optimized for arXiv/preprint submissions with flexible compliance standards.
Companion skill: manuscript-provenance audits whether manuscript content
(numbers, tables, figures, ordering, terminology) is computationally derived
from code and scripts. This skill audits the document as prose; that skill
audits computational grounding. Run both for complete pre-publication coverage.
| Concern | This skill (manuscript-review) | manuscript-provenance |
|---|---|---|
| Reproducibility | Does the paper describe enough to reproduce? (§6) | Does the code actually produce what the paper claims? (§1, §7) |
| Figures/Tables | Legible, accessible, well-formatted? (§12) | Generated by scripts, not manual entry? (§2, §3) |
| Rendered visuals | Readable at print scale? Floats near references? (§23) | Figure generation script produces correct format? (§3) |
| Hyperparameters | Listed in the paper with rationale? (§6) | Values trace to config files, not hardcoded? (§1, §8) |
| Code availability | Statement exists in the paper? (§17) | Repo URL valid, README accurate, pipeline works? (§11) |
| Terminology | Abbreviations consistent within document? (§14) | Terms match code identifiers? (§5) |
| Significant figures | Consistent precision within document? (§12) | Precision matches script output? (§2) |
| Figure format | Appropriate format for document quality? (§12) | Format generated by script, not manually exported? (§3) |
| Computational cost | Reported in the paper? (§7) | Values trace to benchmarking scripts? (§1) |
| Macro-prose coherence | Prose framing appropriate for injected value? (§24) | Value traced to code, macro manifest produced? (§4) |
| Cross-element consistency | Prose, captions, figures, tables mutually consistent? (§24) | All elements from same run/pipeline output? (§9) |
Rule: This skill never opens the codebase. manuscript-provenance never judges prose quality. Each reads the other's report when available.
Integration point — Macro Manifest: manuscript-provenance produces a
macro manifest as part of its §4 audit: a structured list of every
macro-injected value, its resolved numeric value, its source (script + output
file), and its location(s) in the manuscript text. This skill's Pass 13
(Cross-Element Coherence) consumes that manifest to check whether the prose
surrounding each injected value is appropriate for the actual value. If no
provenance report exists, this skill extracts macro values directly from
.tex source (less precise — no source tracing, but coherence check still
runs).
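As an illustration of both paths, here is a minimal sketch in Python. The manifest shape and the \newcommand convention are assumptions for illustration; neither format is mandated by either skill.

```python
import re

# Hypothetical manifest shape, normally produced by manuscript-provenance (§4).
# Field names here are assumptions, not a documented schema.
example_manifest = [
    {"macro": r"\meanaccuracy", "value": "94.2",
     "source": "scripts/eval.py -> results/metrics.json",
     "locations": ["abstract", "sec:results"]},
]

def extract_macros_from_tex(tex: str) -> dict[str, str]:
    """Fallback path: pull \\newcommand-style value macros straight from
    source. No source tracing, but the coherence check can still run."""
    pattern = re.compile(r"\\newcommand\{(\\\w+)\}\{([^{}]*)\}")
    return dict(pattern.findall(tex))

tex = r"\newcommand{\meanaccuracy}{94.2}  % injected by the results pipeline"
print(extract_macros_from_tex(tex))  # {'\\meanaccuracy': '94.2'}
```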
Read the uploaded manuscript. Accept PDF, DOCX, LaTeX source, or Markdown. If multiple files are uploaded (e.g., main text + supplementary), process all of them.
Identify the document format, the main file, and the target venue before auditing.
For arXiv submissions, compliance checks are advisory. Focus on technical quality, reproducibility, and clarity rather than strict formatting rules.
Read references/checklist.md — the comprehensive 24-section, ~175-checkpoint
refactoring checklist. Every audit pass is structured against this checklist.
`Read references/checklist.md`
Execute the following passes sequentially. Each pass maps to one or more checklist sections. Work systematically, recording a pass/needs-work/N/A status and a specific manuscript location for each checkpoint.
Pass 1 — Structural Integrity (Checklist §1, §4, §5, §10)
Pass 2 — Abstract & Title Calibration (Checklist §2, §3)
Pass 3 — Technical Rigor (Checklist §6, §7)
Pass 4 — Argumentation Quality (Checklist §8, §9)
Pass 5 — Citation & Reference Hygiene (Checklist §11)
Pass 6 — Visual & Tabular Quality (Checklist §12)
Pass 7 — Prose Mechanics (Checklist §13, §14, §15)
Pass 7b — AI-Pattern Detection (advisory)
Scan prose sections for residual AI-writing patterns using detection rules
from references/detection-patterns.md. Academic manuscripts
drafted or polished with AI assistants often retain detectable tells.
Focus on the patterns relevant to academic writing, and skip patterns that are also acceptable academic convention.
This pass is MEDIUM priority. Flag findings but do not over-correct — academic conventions overlap with some AI patterns. Severity: report individual instances as LOW, flag clusters of 3+ patterns in a single paragraph as MEDIUM.
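A minimal sketch of the cluster rule in Python; the patterns below are placeholder examples, and the real rules come from references/detection-patterns.md:

```python
import re

# Placeholder tells for illustration; actual rules live in
# references/detection-patterns.md.
PATTERNS = {
    "delve": re.compile(r"\bdelv(?:e|es|ing)\b", re.IGNORECASE),
    "not only / but also": re.compile(r"\bnot only\b.*\bbut also\b", re.IGNORECASE),
    "crucial role": re.compile(r"\bplays? a crucial role\b", re.IGNORECASE),
}

def cluster_severity(paragraph: str) -> str | None:
    """LOW for isolated hits; MEDIUM when 3+ distinct patterns cluster
    in a single paragraph."""
    hits = [name for name, rx in PATTERNS.items() if rx.search(paragraph)]
    if not hits:
        return None
    return "MEDIUM" if len(hits) >= 3 else "LOW"

para = ("This work not only delves into the problem but also shows that "
        "attention plays a crucial role.")
print(cluster_severity(para))  # MEDIUM: three patterns in one paragraph
```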
Pass 8 — Best Practices & Reproducibility (Checklist §16, §17, §18, §19)
Pass 9 — Claims-Evidence Calibration (Checklist §20)
This is a dedicated pass through every assertion in the manuscript.
For each claim, check that the evidence presented actually supports it at the stated strength, and that its scope does not exceed what was demonstrated.
This pass is HIGH priority. Claims-evidence mismatch is the single most common reason reviewers reject papers. An overclaim in the abstract poisons the entire reading.
Pass 10 — Narrative Flow & Coherence (Checklist §21)
Read the manuscript linearly, tracking the reader's cognitive state. At each sentence and paragraph boundary, check whether the reader can connect the new material to what precedes it without effort.
Flag any location where a domain-expert reader would need to re-read, scroll back, or pause to reconstruct the logical connection. These are flow breaks.
This pass is HIGH priority. Papers with strong results but poor narrative flow exhaust reviewers. A reader who has to fight the text stops trusting the author.
Pass 11 — Prose Microstructure (Checklist §22)
Audit sentence-level and paragraph-level patterns that compound into readability problems, such as ambiguous pronouns, density spikes, and dangling modifiers.
This pass is MEDIUM priority on individual items but compounds — a manuscript with 20 ambiguous pronouns, 10 density spikes, and 5 dangling modifiers is materially harder to read even though no single instance is fatal.
Pass 12 — Rendered Document Inspection (Checklist §23)
This pass requires the compiled PDF. If only LaTeX source is provided, ask the user for the compiled PDF or compile it.
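A minimal compile sketch, assuming latexmk is installed and main.tex is the entry point (the actual file name will vary):

```python
import subprocess
from pathlib import Path

def compile_pdf(tex_file: str = "main.tex") -> Path:
    """Compile LaTeX to PDF via latexmk, which reruns pdflatex and
    BibTeX/biber until references stabilize."""
    subprocess.run(
        ["latexmk", "-pdf", "-interaction=nonstopmode", tex_file],
        check=True,
    )
    return Path(tex_file).with_suffix(".pdf")
```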
Open the PDF and inspect every page at actual print scale, checking figure and axis-label legibility, float placement, and tables split across pages.
This pass is HIGH priority. A paper with illegible axis labels or a table split across pages signals carelessness to reviewers regardless of technical quality. These defects are invisible from source and the author often doesn't notice because they read the paper in their editor, not in the compiled output.
Pass 13 — Cross-Element Coherence (Checklist §24)
Read the manuscript as an integrated system. For each figure, table, and macro-injected value, verify that every \ref points to the element the surrounding prose describes; after figure reordering, references often point to the wrong visual.

If a manuscript-provenance report exists, load its macro manifest (the list of all traced macro values with locations and source values) and use it as the input for the macro-value coherence check. If no provenance report exists, extract macro values directly from the .tex source.
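A minimal sketch of the reference-consistency portion of the check; a full pass would also compare each \ref against the element type the surrounding sentence describes:

```python
import re

def check_refs(tex: str) -> list[str]:
    """Flag \\ref targets with no matching \\label, and labels that are
    never referenced. Both are common after figure reordering."""
    labels = set(re.findall(r"\\label\{([^}]+)\}", tex))
    refs = re.findall(r"\\(?:ref|autoref|cref)\{([^}]+)\}", tex)
    issues = [f"undefined reference: {r}" for r in refs if r not in labels]
    issues += [f"label never referenced: {l}" for l in labels - set(refs)]
    return issues

tex = r"See Figure~\ref{fig:ablation} for details. \label{fig:overview}"
print(check_refs(tex))
# ['undefined reference: fig:ablation', 'label never referenced: fig:overview']
```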
This pass is HIGH priority. Cross-element incoherence is the most insidious class of manuscript defect: each piece looks fine in isolation while the system as a whole is broken. Reviewers notice because they read the document linearly and encounter contradictions the author cannot see, since authors edit pieces independently.
Note for arXiv: Ethics statements, anonymization, page limits, and strict formatting requirements are marked N/A by default. Focus on technical quality, reproducibility, and clarity.
Produce the report as a structured document. Use references/report-template.md
as the output format.
`Read references/report-template.md`
Report structure:
Executive Summary — Overall quality assessment (Publication-ready / Recommend revisions / Needs work). Top 5 high-priority improvements.
Per-Section Diagnostics — For each manuscript section, the specific issues found, mapped to checklist checkpoint IDs. Severity tagged as HIGH (impacts clarity/credibility), MEDIUM (noticeable quality gap), or LOW (polish/optional improvement).
Cross-Cutting Issues — Problems that span multiple sections (e.g., inconsistent notation, citation patterns, clarity patterns).
Priority Queue — All issues ranked by impact first, then effort: HIGH-impact items lead, and within a tier items are ordered by estimated fix effort, lowest first (quick wins). A sketch of the ordering follows this list.
Checklist Status — The full 24-section checklist with pass/needs-work/not-applicable status per checkpoint, referencing specific locations in the manuscript.
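A minimal sketch of the queue ordering; the issue records are placeholders for whatever the report collects:

```python
# Order: HIGH impact before MEDIUM before LOW; within a tier,
# cheapest fixes first so quick wins surface at the top.
IMPACT_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

issues = [  # placeholder records
    {"id": "S12-03", "impact": "MEDIUM", "effort_hours": 0.5},
    {"id": "S20-01", "impact": "HIGH",   "effort_hours": 2.0},
    {"id": "S13-07", "impact": "MEDIUM", "effort_hours": 3.0},
]

queue = sorted(issues, key=lambda i: (IMPACT_RANK[i["impact"]], i["effort_hours"]))
print([i["id"] for i in queue])  # ['S20-01', 'S12-03', 'S13-07']
```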
After completing the full scan, categorize every issue by severity and venue relevance.
For arXiv submissions, focus HIGH priority on technical quality and reproducibility. Compliance items (ethics statements, formatting) are typically LOW priority or N/A.
Present the priority queue first, then the detailed findings.
Save the report as a Markdown file in the same directory as the manuscript,
named [manuscript-name]-review-report.md.
Present the file to the user with a concise summary of the overall assessment and the top-priority fixes.
Focus on structure and clarity. This is a structural and technical audit. Sentence-level grammar is out of scope unless it forms a systematic pattern affecting readability.
Evidence-based findings. Every issue cites the specific manuscript location (section, paragraph, figure/table number). No vague "could be better."
Balanced severity. HIGH priority for technical credibility and reproducibility issues. MEDIUM for clarity and professional quality. LOW for style preferences. ArXiv allows more flexibility than peer-reviewed venues.
Context-aware recommendations. Formatting and compliance requirements vary by venue. For arXiv, prioritize technical quality over strict formatting. For journal submissions, adjust accordingly.
Constructive framing. Frame findings as improvements to clarity, credibility, and reproducibility rather than as rejection risks. ArXiv is more forgiving; focus on making the work accessible and trustworthy.
Direct communication. Report issues as issues, with specific fixes, not as vague suggestions, while recognizing that for arXiv many "rules" are only guidelines.
Systematic coverage. Work through the checklist methodically. Mark items as pass/needs-work/N/A based on actual content. ArXiv-specific items (anonymization, page limits, strict templates) default to N/A.
Any request to review, audit, or critique a manuscript triggers this skill. Partial reviews (e.g., "just check citations") still run the full audit; the user benefits from comprehensive diagnostics even when they only asked about one aspect.