From the armory plugin.
Audits manuscript provenance by verifying that numbers, tables, and figures derive from code outputs; generates a provenance map and flags reproducibility defects.
Install: `npx claudepluginhub mathews-tom/armory --plugin armory`. This skill uses the workspace's default tool permissions.
Verify that a manuscript is a faithful rendering of computational outputs. Every number, table, figure, category label, ordering, and threshold in the document must trace to a specific script, config file, or pipeline output. Manual data entry in a manuscript is a reproducibility defect.
This skill produces a provenance map — a structured report linking each manuscript artifact to its generating code — and flags every break in the chain.
Companion skill: manuscript-review audits the document as prose (structure, argumentation, citations). This skill audits whether the document content is computationally grounded. Run both for complete pre-publication coverage.
| Concern | manuscript-review | This skill (manuscript-provenance) |
|---|---|---|
| Reproducibility | Does the paper describe enough to reproduce? (§6) | Does the code actually produce what the paper claims? (§1, §7) |
| Figures/Tables | Legible, accessible, well-formatted? (§12) | Generated by scripts, not manual entry? (§2, §3) |
| Rendered visuals | Readable at print scale? Floats near references? (§23) | Figure generation script produces correct format? (§3) |
| Hyperparameters | Listed in the paper with rationale? (§6) | Values trace to config files, not hardcoded? (§1, §8) |
| Code availability | Statement exists in the paper? (§17) | Repo URL valid, README accurate, pipeline works? (§11) |
| Terminology | Abbreviations consistent within document? (§14) | Terms match code identifiers? (§5) |
| Significant figures | Consistent precision within document? (§12) | Precision matches script output? (§2) |
| Figure format | Appropriate format for document quality? (§12) | Format generated by script, not manually exported? (§3) |
| Computational cost | Reported in the paper? (§7) | Values trace to benchmarking scripts? (§1) |
| Macro-prose coherence | Prose framing appropriate for injected value? (§24) | Value traced to code, macro manifest produced? (§4) |
| Cross-element consistency | Prose, captions, figures, tables mutually consistent? (§24) | All elements from same run/pipeline output? (§9) |
Rule: This skill never judges prose quality. manuscript-review never opens the codebase. Each reads the other's report when available.
Integration point — Macro Manifest: This skill produces a macro manifest as part of the §4 audit: a structured list of every macro-injected value with the macro name (e.g., \bestf), its resolved value (e.g., 0.847), its source chain, and every location where it is used. manuscript-review's Pass 13 (Cross-Element Coherence, §24) consumes this manifest to check whether the prose surrounding each injected value is appropriate for the actual numeric value. Provenance owns "is this value computationally grounded?" Review owns "does the text wrapping this value make sense given what the value is?"
In scope: data-carrying LaTeX macros (\newcommand, \def, \pgfmathsetmacro), plus every number, table, figure, category label, ordering, and threshold the manuscript presents as a computational result.

Out of scope: prose quality, structure, argumentation, and citations; those belong to manuscript-review.
This audit requires TWO artifacts:

- The manuscript source: .tex files (preferred), or PDF/DOCX as fallback
- The codebase: the project directory containing the scripts, configs, and pipeline outputs behind it

If the user provides only one, ask for the other. LaTeX source is strongly preferred over compiled PDF — provenance auditing requires seeing the raw markup, macros, and \input commands.
1a. Manuscript Artifact Extraction
Read all .tex files (main + included via \input/\include). Extract:
- \newcommand, \def, \pgfmathsetmacro, and custom command definitions that carry data values
- tabular/table environments — cell values, row/column ordering, headers
- \includegraphics paths, caption content, referenced data
- \input{generated/*.tex} patterns that pull from script-generated LaTeX fragments
- \label/\ref pairs for cross-referencing

Build an artifact registry — a flat list of every data-carrying element in the manuscript with its location (file, line number).
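A minimal sketch of this extraction step, assuming the manuscript lives in a directory of .tex files; the regexes are simplified placeholders, not the skill's actual parser.

```python
import re
from pathlib import Path

# Simplified patterns for three common data carriers.
NEWCOMMAND = re.compile(r"\\(?:newcommand|def)\s*\{?\\(\w+)\}?\s*\{([^{}]*)\}")
INPUT_GENERATED = re.compile(r"\\input\{(generated/[^}]+)\}")
BARE_NUMBER = re.compile(r"(?<![\w\\])\d+(?:\.\d+)?(?!\w)")

def extract_artifacts(tex_dir: str) -> list[dict]:
    """Build the artifact registry: every data-carrying element with its file and line."""
    registry = []
    for tex in sorted(Path(tex_dir).rglob("*.tex")):
        for lineno, line in enumerate(tex.read_text().splitlines(), start=1):
            for m in NEWCOMMAND.finditer(line):
                registry.append({"kind": "macro", "name": m.group(1), "value": m.group(2),
                                 "file": str(tex), "line": lineno})
            for m in INPUT_GENERATED.finditer(line):
                registry.append({"kind": "generated-input", "target": m.group(1),
                                 "file": str(tex), "line": lineno})
            for m in BARE_NUMBER.finditer(line):
                registry.append({"kind": "bare-number", "value": m.group(0),
                                 "file": str(tex), "line": lineno})
    return registry
```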
1b. Codebase Mapping
Scan the project directory. Identify:
- Makefile, snakemake, dvc.yaml, run.sh, main.py, or equivalent orchestration
- config.toml, config.yaml, .env, params.yaml, hyperparameter files
- Output directories (results/, output/, figures/, tables/, generated/)
- .tex files in output directories that scripts produce for \input inclusion

Build a source registry — a flat list of every code artifact that produces or configures manuscript content.
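A sketch of the mapping pass, assuming the project follows the directory conventions listed above (names such as Snakefile or results/ are illustrative, not required):

```python
from pathlib import Path

ENTRY_POINTS = {"Makefile", "Snakefile", "dvc.yaml", "run.sh", "main.py"}
CONFIG_NAMES = {"config.toml", "config.yaml", ".env", "params.yaml"}
OUTPUT_DIRS = {"results", "output", "figures", "tables", "generated"}

def map_codebase(root: str) -> dict[str, list[str]]:
    """Build the source registry: entry points, configs, outputs, and generated .tex fragments."""
    registry = {"entry_points": [], "configs": [], "outputs": [], "generated_tex": []}
    for path in Path(root).rglob("*"):
        if path.name in ENTRY_POINTS:
            registry["entry_points"].append(str(path))
        elif path.name in CONFIG_NAMES:
            registry["configs"].append(str(path))
        elif path.is_file() and any(part in OUTPUT_DIRS for part in path.parts):
            registry["outputs"].append(str(path))
            if path.suffix == ".tex":
                registry["generated_tex"].append(str(path))
    return registry
```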
For each entry in the artifact registry, attempt to establish a provenance chain: manuscript value → generated output → script → input data/config.
2a. Value Provenance
For every number in the manuscript, attempt to trace it to a specific script output or config value, checking that the reported precision matches what the script emits, then classify it (e.g., MACRO-TRACED or UNTRACED).
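As a rough illustration of the tracing step, assuming the pipeline writes its metrics to a flat results/metrics.json (a hypothetical path reused in the examples below), a value is traced when some metric rounds to exactly the figure printed in the manuscript:

```python
import json
from decimal import Decimal, ROUND_HALF_UP

def classify_value(manuscript_value: str, metrics_path: str = "results/metrics.json") -> str:
    """Trace one printed number to a metric, honoring the manuscript's own precision."""
    with open(metrics_path) as fh:
        metrics = json.load(fh)  # assumed: a flat mapping of metric name to number
    places = len(manuscript_value.split(".")[1]) if "." in manuscript_value else 0
    quantum = Decimal(1).scaleb(-places)
    for key, raw in metrics.items():
        rounded = Decimal(str(raw)).quantize(quantum, rounding=ROUND_HALF_UP)
        if str(rounded) == manuscript_value:
            return f"TRACED to {metrics_path}:{key}"
    return "UNTRACED: no metric rounds to this value"
```

The rounding mode is a guess; the real check should mirror whatever rounding the generation script applies.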
2b. Table Provenance
For each table, determine whether its cell values, row/column ordering, and headers come from a generation script or from manual entry, then classify it.
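One way to make the generated-not-hand-entered check concrete, assuming the pipeline also emits each table as CSV (generated/table_main.csv is a made-up name), is to compare cell values and row order directly:

```python
import csv

def tabular_rows(tex_body: str) -> list[list[str]]:
    """Split a plain tabular body into rows and cells, ignoring common rules and blank rows."""
    rows = []
    for raw in tex_body.split("\\\\"):
        for rule in ("\\hline", "\\toprule", "\\midrule", "\\bottomrule"):
            raw = raw.replace(rule, "")
        raw = raw.strip()
        if not raw:
            continue
        rows.append([cell.strip() for cell in raw.split("&")])
    return rows

def table_matches(tex_body: str, csv_path: str = "generated/table_main.csv") -> bool:
    """True only if cell values, ordering, and headers all match the generated CSV."""
    with open(csv_path, newline="") as fh:
        return tabular_rows(tex_body) == list(csv.reader(fh))
```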
2c. Figure Provenance
For each figure, determine whether the file referenced by \includegraphics is produced by a generation script, in the format the script emits rather than a manual export, then classify it.
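A crude but serviceable heuristic for this check, assuming figures are produced by Python scripts under scripts/ (both the directory name and the grep-style matching are assumptions):

```python
import re
from pathlib import Path

INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")

def figure_provenance(tex_text: str, script_dir: str = "scripts") -> dict[str, str]:
    """For each included figure, look for a script that mentions the file's stem."""
    scripts = list(Path(script_dir).rglob("*.py"))
    report = {}
    for fig_path in INCLUDEGRAPHICS.findall(tex_text):
        stem = Path(fig_path).stem
        producers = [str(s) for s in scripts if stem in s.read_text()]
        report[fig_path] = (f"TRACED via {producers[0]}" if producers
                            else "UNTRACED: no script mentions this figure")
    return report
```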
2d. Terminology Provenance
For each named mode, mechanism, category, or method label, check that the manuscript term matches the corresponding code identifier (or a documented display-name mapping), then classify it. Flag mismatches: for example, code uses greedy_search while the manuscript says "Greedy Search" in some places and "greedy approach" in others.
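A sketch of that comparison; DISPLAY_NAMES is a hypothetical project convention, and the fallback normalization from snake_case to Title Case is only a guess at how labels are usually derived:

```python
DISPLAY_NAMES = {"greedy_search": "Greedy Search"}  # code identifier -> approved manuscript label

def check_term(code_identifier: str, manuscript_labels: set[str]) -> str:
    """Flag any manuscript label for this identifier other than the approved display name."""
    expected = DISPLAY_NAMES.get(code_identifier,
                                 code_identifier.replace("_", " ").title())
    strays = manuscript_labels - {expected}
    if strays:
        return f"MISMATCH: expected '{expected}', also found {sorted(strays)}"
    return f"TRACED: '{expected}' used consistently"

# check_term("greedy_search", {"Greedy Search", "greedy approach"})
# -> "MISMATCH: expected 'Greedy Search', also found ['greedy approach']"
```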
2e. Ordering Provenance
For each ordered list, ranked comparison, or sequenced enumeration, verify that the ordering comes from code (a sort order or enum definition) rather than the author's sense of what looks right, then classify it.
3a. LaTeX Macro Hygiene
- Flag \newcommand{\someMetric}{42.7} defined directly in .tex files (bad) vs \input{generated/metrics.tex} where that file is script output (good)
- Audit every .tex file that carries numeric/data values

3b. Pipeline Completeness
3c. Config/Code Separation
3d. Stale Output Detection
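A minimal staleness check for this step; the script-to-output pairing must come from the provenance chains built in Phase 2, and the file names below are placeholders:

```python
from pathlib import Path

def is_stale(output_path: str, source_paths: list[str]) -> bool:
    """An output is stale if any script or config that produces it was modified more recently."""
    output_mtime = Path(output_path).stat().st_mtime
    return any(Path(src).stat().st_mtime > output_mtime for src in source_paths)

# is_stale("figures/accuracy.pdf", ["scripts/plot_accuracy.py", "config.yaml"])
```

Modification times are a weak signal; pipeline manifests or git history give stronger evidence when available.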
3e. Version Pinning
4a. Macro Manifest Generation
Produce the macro manifest — the primary handoff artifact to manuscript-review. For every data-carrying macro identified in Phase 1a and traced in Phase 2a:
```
Macro: \bestf
Value: 0.847
Source: results/metrics.json → scripts/generate_latex_macros.py → generated/metrics.tex
Locations:
- paper.tex:142 — "achieving an F1 score of \bestf{}"
- paper.tex:287 — "The \bestf{} result represents a substantial improvement"
- abstract.tex:8 — "...with \bestf{} F1 score"
Classification: MACRO-TRACED
```
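For contrast with the bad pattern in §3a, a sketch of the good one: a script (its name and the metrics key are hypothetical) regenerates generated/metrics.tex from the pipeline output, and the manuscript only ever \input{}s that file.

```python
import json
from pathlib import Path

def write_macros(metrics_path: str = "results/metrics.json",
                 out_path: str = "generated/metrics.tex") -> None:
    """Emit \\newcommand definitions from pipeline metrics so the manuscript never hardcodes them."""
    with open(metrics_path) as fh:
        metrics = json.load(fh)
    lines = [f"\\newcommand{{\\bestf}}{{{metrics['best_f1']:.3f}}}"]  # 'best_f1' is an assumed key
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_macros()
```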
Also include every bare number (not a macro) found in Phase 1a that carries data (metrics, counts, parameters) — these are values that SHOULD be macros but aren't:
```
Bare value: 50
Location: paper.tex:198 — "convergence after 50 epochs"
Should-be-macro: YES — this is a training parameter, should trace to config
Classification: UNTRACED (no macro, no provenance)
```
Save the manifest as [manuscript-name]-macro-manifest.json alongside the provenance report. This file is consumed by manuscript-review Pass 13 (Cross-Element Coherence) to verify prose-value appropriateness.
4b. Cross-Reference with manuscript-review
If a manuscript-review report exists for this manuscript, load it and cross-reference its findings with the provenance results.
If no manuscript-review report exists, recommend running it as a companion audit and note that the macro manifest is available for its Pass 13.
Load references/checklist.md and references/report-template.md.
Generate the provenance report following the template structure.

Save two files in the manuscript directory:

- [manuscript-name]-provenance-report.md — the full provenance report
- [manuscript-name]-macro-manifest.json — the structured macro manifest for consumption by manuscript-review Pass 13

The macro manifest JSON structure:
```json
{
  "macros": [
    {
      "name": "\\bestf",
      "value": "0.847",
      "source_chain": "results/metrics.json → scripts/gen_macros.py → generated/metrics.tex",
      "locations": [
        {
          "file": "paper.tex",
          "line": 142,
          "context": "achieving an F1 score of \\bestf{}"
        },
        {
          "file": "paper.tex",
          "line": 287,
          "context": "The \\bestf{} result represents a substantial improvement"
        }
      ],
      "classification": "MACRO-TRACED"
    }
  ],
  "bare_numbers": [
    {
      "value": "50",
      "location": {
        "file": "paper.tex",
        "line": 198,
        "context": "convergence after 50 epochs"
      },
      "section": "methodology",
      "should_be_macro": true,
      "rationale": "Training parameter — should trace to config",
      "classification": "UNTRACED"
    }
  ]
}
```
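A minimal sketch of emitting that file, assuming the macro and bare-number records already follow the field layout above:

```python
import json
from pathlib import Path

def save_manifest(macros: list[dict], bare_numbers: list[dict],
                  manuscript_name: str, out_dir: str = ".") -> Path:
    """Write [manuscript-name]-macro-manifest.json next to the provenance report."""
    manifest = {"macros": macros, "bare_numbers": bare_numbers}
    path = Path(out_dir) / f"{manuscript_name}-macro-manifest.json"
    path.write_text(json.dumps(manifest, indent=2, ensure_ascii=False))
    return path
```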
Present findings to the user, grouped by severity:
CRITICAL — Value in manuscript has no provenance chain AND is a key result (main finding, abstract metric, table headline number). This means the paper's core claims cannot be verified from code.
HIGH — Value/table/figure is untraced or stale, and appears in results or methodology sections. Reproducibility gap.
MEDIUM — Terminology mismatch, manual ordering, partial table generation, config values hardcoded in scripts. Maintenance and consistency risk.
LOW — Minor issues: display-name mapping missing but terms are close, non-critical figures without generation scripts, cosmetic post-editing of generated figures.
Binary provenance. Every artifact is either traced or not. No "partially reproducible" — partial means broken.
Code is truth. When manuscript and code disagree, the manuscript is wrong until proven otherwise. Flag the disagreement; do not assume the manuscript author "meant to" override code output.
Macros over magic numbers. Every data value in LaTeX should be a macro. Every macro should be generated. No exceptions for "obvious" values.
Pipeline as proof. If make (or equivalent) does not produce the PDF from raw data, the manuscript is not reproducible. Partial pipelines get partial credit, not a pass.
Config is not code. Hyperparameters, thresholds, model names, file paths — all belong in config files, not scattered through script bodies.
Ordering is data. The sequence of items in a table or enumeration is an assertion. It must come from code (sort order, enum definition) not from the author's sense of what "looks right."
Timestamps matter. A figure generated last month from a script modified yesterday is suspect. Stale outputs are provenance failures.
Companion, not replacement. This audit checks computational grounding. manuscript-review checks document quality. Both are needed. Neither subsumes the other.
This skill triggers when the user asks to audit manuscript provenance, verify that numbers, tables, and figures trace to code outputs, generate a provenance map, or check whether a paper is a faithful rendering of its computational outputs.