Audits figures and plots in manuscripts for rhetorical effectiveness across 8 dimensions: claim-figure alignment, chart-type fit, axis design, visual hierarchy, data density, caption clarity, perceptual accuracy, and narrative coherence across the figure set.
Install with `npx claudepluginhub mathews-tom/armory --plugin armory`. This skill uses the workspace's default tool permissions.
---
Pipeline position: Phase 1b (content audit). Runs in parallel with
manuscript-review. Requires compiled PDF. No prior dependencies.
See /manuscript-pipeline for full execution order.
Evaluate every figure in a manuscript as a rhetorical act — a visual argument that must land with the reader. Each figure exists to communicate a specific claim. This skill audits whether the figure actually achieves that communication, or whether it undermines, obscures, or contradicts the author's intent.
A figure that is technically correct but rhetorically ineffective is a wasted opportunity. Reviewers form judgments from figures before reading the methodology. A figure that fails to show what the text claims creates doubt even when the underlying data supports the claim.
| Concern | This skill (figure-rhetoric) | manuscript-review | manuscript-typography |
|---|---|---|---|
| Chart type selection | Is this the right chart for this claim? | N/A | N/A |
| Visual emphasis | Does the figure draw attention to the right thing? | N/A | N/A |
| Prose-figure alignment | Does a reader SEE what the text SAYS? | Does the text match the figure? (§24) | N/A |
| Data selection | Should different data be plotted? | N/A | N/A |
| Axis design | Do axes help or hide the story? | Axis labels present? (§12) | Font consistency |
| Figure quality | N/A | Resolution, colorblind, chartjunk (§12) | Backgrounds, framing |
| Figure rendering | N/A | Legibility at print scale (§23) | Caption formatting |
| Provenance | N/A | N/A (→ manuscript-provenance) | N/A |
Boundary: manuscript-review §24 checks "does the prose match the figure?" This skill checks "does the figure communicate what the prose needs it to?" §24 catches factual mismatches (text says 14.3%, figure shows 13.8%). This skill catches rhetorical failures (text says "dramatic improvement," figure shows bars that look identical because the y-axis starts at 0).
CRITICAL: This skill requires visual inspection. LaTeX source alone is insufficient. The entire point of this audit is what the reader sees.
Obtain the rendered figures by one of:
- the page of the compiled PDF containing each figure, or
- the `.pdf`, `.png`, or `.jpg` figure file referenced by `\includegraphics`.

For each figure:
1. **Visually inspect the figure.** Read the figure file or the PDF page containing it. Before reading any prose, record what the figure communicates at first glance — the immediate visual takeaway. What pattern, trend, comparison, or relationship does a naive reader see?
2. **Extract the claim context.** Read the 2-3 paragraphs surrounding the first `\ref{fig:X}` reference. Identify the specific claim the figure is supposed to support. Write it down as a one-sentence assertion.
3. **Read the caption.** Does the caption tell the reader what to see, or does it just describe the axes?
4. **Compare.** Does the visual takeaway (step 1) match the claimed assertion (step 2)? The figure must be evaluated through the reader's eyes, not the author's intent.
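Step 2 can be partly scripted. A minimal sketch in Python (the function name and return shape are illustrative, not part of the skill) that maps figure labels to their image files and locates each label's first `\ref` so the surrounding claim paragraphs can be read:

```python
import re

def figure_index(tex: str) -> dict:
    """Collect figure labels, the image files they include, and the
    character offset of each label's first \\ref in the LaTeX source."""
    labels = re.findall(r'\\label\{(fig:[^}]+)\}', tex)
    files = re.findall(r'\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}', tex)
    first_ref = {}
    for label in labels:
        m = re.search(r'\\ref\{' + re.escape(label) + r'\}', tex)
        first_ref[label] = m.start() if m else None  # None: never referenced
    return {"labels": labels, "files": files, "first_ref": first_ref}
```

Reading a paragraph or two of prose on either side of `first_ref` yields the claim context; an unreferenced figure (`None`) is itself worth flagging.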
For each figure, evaluate across 8 dimensions:
**Dimension 1 — Claim-Figure Alignment**
The central question: does a reader who looks at this figure see the claim the text makes?
**Dimension 2 — Chart Type Appropriateness**
Is this the right type of visualization for this claim?
| Claim type | Effective chart | Ineffective chart |
|---|---|---|
| Comparison across categories | Grouped bar, dot plot | Pie chart, stacked bar (hard to compare) |
| Trend over time/sequence | Line plot | Bar chart (obscures continuity) |
| Distribution | Histogram, violin, box plot | Bar chart of means (hides variance) |
| Correlation / relationship | Scatter plot | Table of paired values |
| Composition / proportion | Stacked area, stacked bar | Multiple pie charts |
| Difference / improvement | Difference plot, waterfall | Side-by-side bars at full scale |
| Ranking | Horizontal bar (sorted) | Vertical bar (unsorted) |
| Spatial | Heatmap, contour | Table of values |
| Part-to-whole | Treemap, stacked bar | Grouped bar |
| Flow / process | Sankey, alluvial | Static diagram |
| Confusion / classification | Confusion matrix (heatmap) | Table of numbers |
| Ablation contributions | Waterfall chart | Table or grouped bar |
Flag when a more effective chart type exists for the claim being made.
**Dimension 3 — Axis and Scale Design**
Axes can make or break the visual argument.
Y-axis origin: Starting at 0 when differences are small makes them invisible. Starting above 0 when showing absolute quantities is misleading. The choice must serve the claim: zoom to the relevant range, with a visible axis break (`//`) if needed.

Axis scale: Linear vs. logarithmic. A log scale is appropriate when the data spans orders of magnitude or when relative differences matter more than absolute ones. Flag a linear scale with data spanning 3+ orders of magnitude.
Axis range: Does the range include all relevant data? Does it extend far beyond the data (wasting space)? Is the range chosen to exaggerate or minimize visual differences?
Aspect ratio manipulation: A very wide or very narrow plot can exaggerate or flatten trends. The slope of a trend line should be perceptible but not exaggerated.
Dual axes: Almost always confusing. Two different y-axes invite incorrect visual comparisons. Prefer: two separate panels with aligned x-axes.
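The two-panel alternative can be sketched in matplotlib (the quantities and data here are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripted figure generation
import matplotlib.pyplot as plt

epochs = list(range(1, 101))
loss = [2.0 * 0.97 ** e for e in epochs]
accuracy = [0.9 - 0.8 * 0.97 ** e for e in epochs]

# Two stacked panels with a shared x-axis instead of twinx() dual axes:
# each quantity keeps its own honest scale, and no false visual
# correlation is implied by overlaying the two curves.
fig, (ax_loss, ax_acc) = plt.subplots(2, 1, sharex=True, figsize=(4, 4))
ax_loss.plot(epochs, loss, color="tab:red")
ax_loss.set_ylabel("Loss")
ax_acc.plot(epochs, accuracy, color="tab:blue")
ax_acc.set_ylabel("Accuracy")
ax_acc.set_xlabel("Epoch")
fig.tight_layout()
```

Because the x-axes are aligned, the reader can still relate the two series vertically without the axes inviting a direct magnitude comparison.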
**Dimension 4 — Visual Hierarchy and Emphasis**
Does the figure guide the reader's eye to the right element?
Primary element: The data series, region, or comparison that the text discusses should be visually dominant (thicker line, bolder color, larger markers, foreground position).
Secondary elements: Context, baselines, and reference data should be visually recessive (thinner lines, gray, smaller markers, background).
Annotations: If the text references a specific point, region, or threshold, it should be annotated in the figure (arrow, callout, shaded region, dashed reference line). The reader should not have to decode coordinates to find what the text describes.
Cognitive load: Count the number of distinct visual elements the reader must track. More than 5-7 series/groups in one plot exceeds working memory. Split into panels or highlight the comparison of interest.
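A minimal matplotlib sketch of this hierarchy (series names and data are invented): the discussed method is bold, foregrounded, and directly labeled; baselines are recessive gray.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

series = {
    "Ours": [0.60, 0.72, 0.81, 0.85],
    "Baseline A": [0.58, 0.63, 0.66, 0.67],
    "Baseline B": [0.55, 0.60, 0.62, 0.63],
}
highlight = "Ours"

fig, ax = plt.subplots()
for name, ys in series.items():
    if name == highlight:
        # Primary element: thick, saturated, drawn on top
        ax.plot(ys, color="tab:red", linewidth=2.5, zorder=3)
        label_color = "tab:red"
    else:
        # Secondary elements: thin, gray, in the background
        ax.plot(ys, color="0.7", linewidth=1.0, zorder=1)
        label_color = "0.6"
    # Direct labeling at the line's endpoint replaces a legend
    ax.annotate(name, xy=(len(ys) - 1, ys[-1]), color=label_color,
                xytext=(5, 0), textcoords="offset points", va="center")
```

Direct labeling doubles as the fix for the "Giant Legend" antipattern: the reader never has to cross-reference a key.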
**Dimension 5 — Data Density and Simplification**
Is the figure showing the right amount of data for its claim?
Overloaded: Too many series, categories, or data points for a single plot. The claim is about the relationship between 2 methods but the plot shows 12. Simplify: show the comparison of interest prominently; relegate others to supplementary material or a secondary panel.
Underloaded: The plot shows so little data that the claim isn't convincing. A single point where a trend is claimed. Two bars where a distribution is relevant. Three epochs where convergence behavior matters.
Summarized when raw matters: Showing only means when the variance is the story (or when it would undermine the story — flag both). Confidence intervals, error bars, or violin plots for stochastic results.
Raw when summary matters: Individual data points where the aggregate pattern is the claim. A scatter plot of 10,000 points where a density plot or binned heatmap would show the structure.
**Dimension 6 — Caption as Interpretation Guide**
The caption should tell the reader what to see, not just what the axes are.
Levels of caption quality:
1. Descriptive (weak): "Training loss over epochs for five methods." The reader must figure out the takeaway.
2. Directive (adequate): "Training loss over epochs. Method A (red) converges in ~50 epochs while baselines require 150+." Points the reader to the claim.
3. Interpretive (strong): "Training loss over epochs. Method A (red) converges 3x faster than the nearest baseline (blue), supporting the claim that architectural change X reduces optimization difficulty." Connects the visual to the argument.
For each caption, identify its level and recommend upgrading if Level 1. Level 2 is the minimum for effective communication. Level 3 is ideal for key figures supporting core claims.
**Dimension 7 — Perceptual Accuracy**
Does the visual encoding accurately represent the underlying data?
Area encoding for quantities: If using bar width, bubble size, or area to encode values, the mapping must be proportional to area (not radius or diameter). A value 2x larger should have 2x the area, not 2x the radius (which is 4x the area).
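In matplotlib, for example, `scatter`'s `s` argument is marker area in points², so area-proportional encoding means scaling `s` linearly with the value. A small helper sketch (function names are illustrative):

```python
import math

def bubble_sizes(values, max_area=400.0):
    """Area-proportional marker sizes for matplotlib's scatter, whose
    `s` argument is marker area in points^2: a 2x value gets 2x area."""
    vmax = max(values)
    return [max_area * v / vmax for v in values]

def bubble_radii(values, max_area=400.0):
    """Radii implied by area-proportional sizes: r grows with sqrt(value),
    which is why mapping values to radius directly over-states ratios."""
    return [math.sqrt(a / math.pi) for a in bubble_sizes(values, max_area)]
```

Note that for values `[1, 2, 4]` the areas double at each step while the radii grow only by `sqrt(2)` — the perceptually honest behavior.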
3D effects: 3D bar charts, 3D pie charts, perspective effects — these distort perception of values through foreshortening. Flag any 3D visualization of 2D data.
Color scale linearity: Sequential color maps (viridis, plasma) have perceptually uniform steps. Rainbow/jet color maps have perceptual discontinuities that create artificial boundaries in continuous data. Flag rainbow/jet for continuous data.
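The colormap rule reduces to a simple lookup. A checker sketch (the two name sets are illustrative, not exhaustive):

```python
PERCEPTUALLY_UNIFORM = {"viridis", "plasma", "inferno", "magma", "cividis"}
DISCONTINUOUS = {"jet", "rainbow", "gist_rainbow", "hsv", "nipy_spectral"}

def colormap_verdict(name: str) -> str:
    """Classify a colormap name for continuous data: flag perceptually
    discontinuous maps, accept perceptually uniform ones."""
    n = name.lower()
    if n.endswith("_r"):  # reversed variants share perceptual properties
        n = n[:-2]
    if n in DISCONTINUOUS:
        return "flag"
    if n in PERCEPTUALLY_UNIFORM:
        return "ok"
    return "unknown"
```

An "unknown" verdict means the map needs a manual perceptual check rather than an automatic pass.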
Truncated axes without indication: a y-axis that does not start at 0 and lacks a visible break (`//` notation) can mislead readers about relative magnitudes.
Aspect ratio distortion: Pie charts not circular. Bar widths inconsistent. Maps with incorrect projections (rare in ML papers but common in spatial analysis).
**Dimension 8 — Redundancy and Narrative Arc**
Do the figures as a set tell a coherent story?
Redundant figures: Two figures showing essentially the same information in different forms. Unless each adds distinct insight, merge or choose the more effective one.
Missing figures: Is there a key claim in the text that has no visual support but would benefit from one? A figure that isn't there is a missed opportunity if the claim is central.
Figure ordering: Do the figures appear in the order of the paper's argument? Architecture → training → results → analysis is a natural arc. Figures out of narrative order confuse the reader's mental model.
Visual consistency across figures: Same method → same color/marker across all figures. Same data → same axis scale when comparison is relevant. Cross-reference with manuscript-review §12 (visual language consistency).
Quick-reference of frequently occurring rhetorical failures:
| Antipattern | Description | Fix |
|---|---|---|
| The Invisible Win | Method outperforms by 0.3% but y-axis spans 0-100% | Zoom to relevant range; use difference plot |
| The Spaghetti Plot | 10+ overlapping lines, all same weight | Highlight 2-3 of interest; gray out rest |
| The Bar Chart of Means | Bars showing means without error bars/CI | Add error bars; consider violin/box plots |
| The Orphan Claim | Text discusses a specific region/point with no annotation | Add arrow, shaded region, or reference line |
| The Pie Chart | Comparing proportions across >5 categories | Horizontal bar chart, sorted |
| The Rainbow Heatmap | Jet/rainbow colormap for continuous data | Use perceptually uniform colormap (viridis) |
| The Giant Legend | Legend with 15 entries reader must cross-reference | Direct labeling on lines; or reduce series count |
| The Wrong Chart | Line chart for categorical data; bar chart for trends | Match chart type to data type and claim |
| The Dual Axis | Two y-axes implying false correlation | Two separate panels, aligned x-axis |
| The Data Dump | All results in one figure "for completeness" | Show what matters; appendix the rest |
| The Missing Baseline | Results without visual reference point | Add dashed line for baseline/random/human performance |
| The Abstract Figure | Text says "see Figure 3" but Figure 3 requires 5 minutes of study | Simplify; annotate; upgrade caption |
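Two of these fixes can be sketched numerically (function names are illustrative): a difference series for "The Invisible Win," and a zoomed y-range to replace a 0-100 span.

```python
def difference_series(ours, baseline):
    """Per-point improvement to plot directly when absolute values
    dwarf the gap (the 'Invisible Win' antipattern)."""
    return [o - b for o, b in zip(ours, baseline)]

def zoomed_ylim(values, pad_frac=0.10):
    """Y-limits tightened to the data with a small margin, instead of
    a full-scale span that renders small differences invisible."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_frac or 1.0  # unit fallback for flat data
    return lo - pad, hi + pad
```

When the zoom hides the zero origin, pair it with a visible axis break marker so the truncation is declared rather than concealed.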
For each figure, produce:
## Figure [N]: [brief description]
**Prose claim:** [one-sentence claim the figure is supposed to support]
**Visual takeaway:** [what a reader actually sees at first glance]
**Alignment:** [STRONG | ADEQUATE | WEAK | CONTRADICTORY]
**Issues:**
1. [Dimension]: [specific problem]
- **Impact:** [how this affects the reader's understanding]
- **Fix:** [specific recommendation with concrete changes]
**Caption assessment:** [Level 1/2/3] — [recommendation if upgrade needed]
**Recommended changes:** [prioritized list]
Then a summary section:
## Figure Set Assessment
**Overall narrative coherence:** [figures tell a coherent story / gaps exist / redundancies]
**Strongest figure:** Figure [N] — [why it works]
**Weakest figure:** Figure [N] — [primary issue]
**Priority fixes:**
1. [Most impactful change across all figures]
2. [Second most impactful]
3. [Third most impactful]
**Missing figures:** [claims that need visual support but lack it]
**Redundant figures:** [figures that could be merged or cut]
Save report as [manuscript-name]-figure-rhetoric-report.md.
Present the report path and a brief summary of the highest-priority fixes.
**The reader is naive.** Do not evaluate figures through the author's eyes. The author knows what the figure "should" show. The reader sees only what is visually present. Every judgment in this audit is from the reader's perspective.

**Claim first, then figure.** Read the prose claim before looking at the figure. The figure's job is to support that specific claim. If the figure is beautiful but doesn't support the claim, it fails.

**One figure, one message.** A figure trying to show three things shows none of them clearly. If the text makes three claims about one figure, the figure is overloaded or the text should point to three figures.

**Visual perception trumps data accuracy.** A figure can be numerically correct but perceptually wrong (e.g., differences invisible due to scale). The reader's visual impression IS the communication. If the impression doesn't match the claim, the figure has failed.

**Concreteness over abstraction.** Recommendations specify the exact change: "change y-axis range from 0-100 to 85-95," not "consider adjusting the axis." Include suggested chart types, axis ranges, color choices, and annotation text.

**Severity scales with claim importance.** A weak figure supporting a minor methodological point is LOW priority. A weak figure supporting the paper's core result is CRITICAL — it's the first thing a reviewer scrutinizes.