Verifies academic citations in pandoc markdown drafts against source PDFs using the Gemini File Search API. Useful for validating citation grounding before submission.
Slash command: /workflows:cite-check
Scan pandoc-flavored markdown drafts for citations, upload source PDFs to a Gemini File Search store, and verify each citation is grounded in its source. Produces a structured REVIEW-CITES.md report.
Prerequisites:

- GOOGLE_API_KEY env var set (Google AI Studio; on this machine: export GOOGLE_API_KEY="$(cat $GEMINI_API_KEY_FILE)")
- rclone with a google-drive: remote configured (used to bypass Google Drive FUSE deadlocks)
- python3 with pymupdf4llm installed (used for PDF text extraction in passage grounding)
- readwise CLI installed and authenticated (for Readwise article export in source materialization)
- .bib files with file fields mapping bibkeys to PDF paths (e.g., Paperpile's paperpile.bib)

Before running cite-check, materialize all sources locally:
cd ${CLAUDE_SKILL_DIR}
bun materialize-sources.ts \
--bib ~/Google\ Drive/My\ Drive/resources/Paperpile/paperpile.bib \
--bib ./references/sources.bib \
--refs ./references \
--drafts ./drafts \
--debug
This populates references/ with local copies of all cited sources:
- Paperpile PDFs: rclone copy from Google Drive → references/<bibkey>.pdf
- Readwise articles: exported via the readwise CLI → references/<bibkey>.md

After materialization, cite-check operates purely locally.
cd ${CLAUDE_SKILL_DIR}
bun install # first time only
# Single bib file
bun cite-check.ts --bib ~/Google\ Drive/My\ Drive/resources/Paperpile/paperpile.bib --drafts <path-to-drafts>
# Multiple bib files (Paperpile + project-local; first bib wins on duplicate keys)
bun cite-check.ts \
--bib ~/Google\ Drive/My\ Drive/resources/Paperpile/paperpile.bib \
--bib ./references/sources.bib \
--drafts <path-to-drafts>
| Flag | Required | Default | Description |
|---|---|---|---|
| `--bib <path>` | Yes* | -- | Path to .bib file (repeatable; first wins on duplicate keys) |
| `--store <id>` | No | auto-create | Use existing File Search store ID |
| `--drafts <dir>` | No | `./drafts` | Directory with markdown draft files |
| `--out <path>` | No | `<drafts>/REVIEW-CITES.md` | Output report path |
| `--limit <n>` | No | all | Check only first N citations (smoke test) |
| `--dry-run` | No | false | Print prompts without querying |
| `--sequential` | No | false | Run queries one-at-a-time instead of Batch API (default: batch) |
| `--retry-model <model>` | No | `gemini-3.1-pro-preview` | Retry UNSUPPORTED results with a stronger model |
| `--audit` | No | false | Audit source availability without querying (checks Paperpile PDFs) |
| `--debug` | No | false | Verbose logging |

*Either `--bib` or `--store` is required.
Ask a specific question about a single source:
# Does Bebchuk2019 support a specific claim?
bun cite-check.ts ask @Bebchuk2019-uq "do expense ratios fall since 2010?" --bib paperpile.bib
# What does a source say about a topic?
bun cite-check.ts ask @Brav2022-ht "what are retail turnout rates?" --bib paperpile.bib --bib sources.bib
The ask mode uploads the single source PDF via the legacy Files API (with manifest caching, 48h TTL), queries Gemini with inline file references, and prints the answer with supporting passages to stdout. No File Search store is created and no report is generated.
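The manifest caching mentioned above could work roughly as follows. This is a minimal sketch, not the tool's code: the `ManifestEntry` shape and the function names are hypothetical, and only the 48h TTL comes from the description.

```typescript
// Hypothetical manifest entry; the real manifest format is not shown here.
interface ManifestEntry {
  bibkey: string;
  fileUri: string;      // identifier returned by the upload (placeholder value)
  uploadedAtMs: number; // upload time, milliseconds since epoch
}

const TTL_MS = 48 * 60 * 60 * 1000; // 48h TTL, as stated above

// An entry is reusable only while it is younger than the TTL.
function isFresh(entry: ManifestEntry, nowMs: number): boolean {
  return nowMs - entry.uploadedAtMs < TTL_MS;
}

// Return the cached URI for a bibkey, or undefined when a re-upload is needed.
function cachedUri(
  manifest: Map<string, ManifestEntry>,
  bibkey: string,
  nowMs: number,
): string | undefined {
  const entry = manifest.get(bibkey);
  return entry !== undefined && isFresh(entry, nowMs) ? entry.fileUri : undefined;
}
```

Passing the current time in as an argument keeps the expiry check easy to test.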
When multiple --bib files are provided, file paths are resolved across all bib directories. This handles the common case where a project-local sources.bib has file = {All Papers/...} paths that are relative to the Paperpile folder rather than the project's references/ directory. The tool tries each bib directory as a fallback when the primary path doesn't exist on disk.
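The fallback described above can be sketched as a loop over candidate directories. The names `joinPath` and `resolveFilePath` are hypothetical, POSIX-style paths are assumed, and the existence check is injected so the logic runs without touching the disk.

```typescript
// Join a directory and a relative path with a single "/" (POSIX-style paths assumed).
function joinPath(dir: string, rel: string): string {
  return dir.replace(/\/+$/, "") + "/" + rel.replace(/^\/+/, "");
}

// Try the entry's own bib directory first, then every other bib directory in
// the order the --bib flags were given. Returns undefined if nothing exists.
function resolveFilePath(
  fileField: string,
  ownBibDir: string,
  allBibDirs: string[],
  exists: (p: string) => boolean,
): string | undefined {
  const candidates = [ownBibDir, ...allBibDirs.filter((d) => d !== ownBibDir)];
  for (const dir of candidates) {
    const candidate = joinPath(dir, fileField);
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```

In a real run `exists` would presumably wrap a filesystem check; here it is a parameter so the fallback order stays explicit.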
1. Extract citations from the drafts using [@bibkey] syntax
2. Resolve each bibkey to a PDF path via the bib file fields
3. Import the PDFs into a File Search store, fetching Google Drive files via rclone to avoid EDEADLK deadlocks. Stores persist across runs (no 48h TTL); if cited sources have not changed, the existing store is reused without re-uploading.
4. Query Gemini using the fileSearch tool with metadata filtering to scope each query to the relevant source documents
5. Ground each result: extract the PDF text with pymupdf4llm and run token-level LCS alignment to confirm the passage Gemini quoted actually exists in the source. Ungrounded passages are flagged [UNGROUNDED] in the report.

The --bib flag expects a .bib file where entries have a file field with a path relative to the bib file's directory. Paperpile's exported paperpile.bib follows this convention:
@article{Hu2024-bm,
author = {Edwin Hu and ...},
title = {{Custom proxy voting advice}},
file = {All Papers/H/Hu et al. 2024 - Custom proxy voting advice.pdf},
year = {2024}
}
All bib entries are parsed, but only a subset is imported into the File Search store: an entry must have a file field (~95% of Paperpile entries do) and its bibkey must actually be cited in the drafts.
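The selection rule above, combined with the first-bib-wins rule for duplicate keys, can be sketched as below. The `BibEntry` type, the function names, and the simplified key pattern are assumptions for illustration; a real pandoc key grammar allows more characters.

```typescript
interface BibEntry {
  key: string;
  file?: string; // value of the bib file field, if present
}

// Collect pandoc-style citation keys (@key) from draft text. Simplified:
// the "@" must not follow a word character, which skips e-mail addresses.
function citedKeys(markdown: string): Set<string> {
  const keys = new Set<string>();
  const pattern = /(^|[^A-Za-z0-9_])@([A-Za-z0-9_][A-Za-z0-9_:.-]*)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(markdown)) !== null) {
    keys.add((m[2] ?? "").replace(/[.:-]+$/, "")); // drop trailing punctuation
  }
  return keys;
}

// Merge entries from several bib files; the first file to define a key wins.
function mergeFirstWins(bibs: BibEntry[][]): Map<string, BibEntry> {
  const merged = new Map<string, BibEntry>();
  for (const entries of bibs) {
    for (const entry of entries) {
      if (!merged.has(entry.key)) merged.set(entry.key, entry);
    }
  }
  return merged;
}

// Keep only entries that are cited in the draft and have a file field.
function entriesToImport(bibs: BibEntry[][], markdown: string): BibEntry[] {
  const cited = citedKeys(markdown);
  const result: BibEntry[] = [];
  mergeFirstWins(bibs).forEach((entry) => {
    if (cited.has(entry.key) && entry.file !== undefined) result.push(entry);
  });
  return result;
}
```

Filtering before import keeps the store limited to the sources a draft actually relies on.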
- [@key] and in-text @key citations
- [@key, p. 42] (page locators)
- [@a; @b] (queried together)
- [^id]: footnote bodies
- Signal cites: see, cf., see also, etc. (softens verification)
- [@key] (holding that X)

The report, REVIEW-CITES.md, includes:

- [UNGROUNDED] flag on any SUPPORTED/PARTIAL result whose passage failed grounding verification

By default, all citation queries are submitted as a single Gemini Batch API job using the File Search tool with metadata filtering. Each query is scoped to the relevant source documents via bibkey metadata, so there is no cross-contamination between queries.
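One way the per-query scoping described above could be expressed is a filter string built from the cited bibkeys. This is only a sketch: the `ScopedQuery` type and both function names are hypothetical, and the `bibkey="..." OR ...` filter syntax is an assumption that should be checked against the File Search documentation.

```typescript
// Hypothetical shape for one citation query; the tool's real request type is
// not shown in this document.
interface ScopedQuery {
  bibkeys: string[];
  claim: string;
  metadataFilter: string;
}

// Build a filter matching only documents tagged with one of the bibkeys.
// ASSUMPTION: filters use key="value" terms joined by OR.
function bibkeyFilter(bibkeys: string[]): string {
  return bibkeys
    .map((k) => `bibkey="${k.replace(/(["\\])/g, "\\$1")}"`)
    .join(" OR ");
}

// Pair a claim with the filter that limits retrieval to its own sources.
function scopeQuery(bibkeys: string[], claim: string): ScopedQuery {
  return { bibkeys, claim, metadataFilter: bibkeyFilter(bibkeys) };
}
```

A multi-cite such as [@a; @b] would yield a single query whose filter admits both sources, which matches the "queried together" behaviour.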
# Default (batch)
bun cite-check.ts --bib paperpile.bib --drafts ./drafts
# Sequential (one query at a time, useful for debugging)
bun cite-check.ts --bib paperpile.bib --drafts ./drafts --sequential
The --sequential flag runs each query as an individual generateContent call instead of a batch job. This is useful for debugging or when batch jobs hit rate limits.
Run --audit before checking citations to see which sources are available and which need to be added:
bun cite-check.ts --bib paperpile.bib --bib sources.bib --drafts ./drafts --audit
The audit checks each cited bibkey for PDF availability on disk (via bib file field with cross-directory resolution). Missing sources should be added to Paperpile.
Exit code is 1 if any sources are missing, 0 if all sources are available. No Gemini store is created and no queries are sent.
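The audit's exit-code rule can be sketched as a pure function. The `AuditResult` type and the `resolvePdf` callback are hypothetical stand-ins for the tool's own lookup.

```typescript
interface AuditResult {
  available: string[];
  missing: string[];
  exitCode: 0 | 1;
}

// Split cited bibkeys by whether a PDF path can be resolved for them.
// Exit code is 1 when anything is missing, otherwise 0.
function auditSources(
  cited: string[],
  resolvePdf: (bibkey: string) => string | undefined,
): AuditResult {
  const available: string[] = [];
  const missing: string[] = [];
  for (const key of cited) {
    (resolvePdf(key) !== undefined ? available : missing).push(key);
  }
  return { available, missing, exitCode: missing.length > 0 ? 1 : 0 };
}
```

A non-zero exit code makes the audit usable as a gate in a script before any queries are sent.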
After Gemini returns a SUPPORTED/PARTIAL result with a supporting_passage, the tool verifies the passage actually exists in the source PDF text using token-level LCS alignment (ported from langextract's WordAligner). Two gates reject bad matches: a coverage threshold and a density threshold.
Signal cites (see, cf., etc.) use relaxed thresholds (0.5 coverage / 0.2 density) since they only need conceptual alignment.
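A minimal sketch of LCS-based grounding follows. It is not the langextract port; the definitions used here are assumptions: coverage is the share of passage tokens matched, and density is the number of matched tokens divided by the length of the source span they occupy. Only the signal-cite thresholds (0.5 / 0.2) come from the text above.

```typescript
interface Thresholds { coverage: number; density: number }

// Relaxed thresholds for signal cites, as stated above.
const SIGNAL_THRESHOLDS: Thresholds = { coverage: 0.5, density: 0.2 };

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Classic O(n*m) LCS table, then backtracking to recover matched source positions.
function lcsSourcePositions(passage: string[], source: string[]): number[] {
  const n = passage.length;
  const m = source.length;
  const table: number[][] = [];
  for (let i = 0; i <= n; i++) table.push(new Array<number>(m + 1).fill(0));
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      table[i][j] = passage[i - 1] === source[j - 1]
        ? table[i - 1][j - 1] + 1
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  const positions: number[] = [];
  let i = n;
  let j = m;
  while (i > 0 && j > 0) {
    if (passage[i - 1] === source[j - 1]) { positions.push(j - 1); i--; j--; }
    else if (table[i - 1][j] >= table[i][j - 1]) i--;
    else j--;
  }
  return positions.reverse();
}

// Apply both gates: enough of the passage matched, and matched tightly enough.
function groundPassage(passage: string, sourceText: string, t: Thresholds) {
  const p = tokenize(passage);
  const positions = lcsSourcePositions(p, tokenize(sourceText));
  if (p.length === 0 || positions.length === 0) {
    return { coverage: 0, density: 0, grounded: false };
  }
  const span = positions[positions.length - 1] - positions[0] + 1;
  const coverage = positions.length / p.length;
  const density = positions.length / span;
  return { coverage, density, grounded: coverage >= t.coverage && density >= t.density };
}
```

The full table is quadratic in memory, so a real implementation working on whole PDFs would likely need a windowed or banded variant.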
Grounding requires extracting text from the source PDF. This uses pymupdf4llm (via extract-pdf-text.py) which preserves document structure, footnotes, and tables as clean markdown. Extracted text is cached in <drafts>/.cite-check-text/.
PDF files stored on Google Drive Desktop's FUSE mount (~/Google Drive/My Drive/) are subject to EDEADLK deadlocks when accessed concurrently or when not locally cached. The tool detects Google Drive paths — including through symlinks (e.g., references/All Papers → ~/Google Drive/.../Paperpile/All Papers) — and uses rclone copyto to fetch them to a local cache (~/.cache/cite-check-pdfs/) before upload or text extraction. Requires rclone with a google-drive: remote configured.
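The path detection and rclone invocation described above might be sketched as below. Several things are assumed: the path passed in has already had symlinks resolved, the google-drive: remote is rooted at "My Drive", and the cached file is named after the source file. The function names are hypothetical; only `rclone copyto` and the directory names come from the text.

```typescript
const DRIVE_ROOT_SUFFIX = "/Google Drive/My Drive/";

// Return the path relative to "My Drive", or undefined if the (already
// symlink-resolved) path does not live on the Google Drive mount.
function driveRelativePath(realPath: string, home: string): string | undefined {
  const root = home.replace(/\/+$/, "") + DRIVE_ROOT_SUFFIX;
  return realPath.indexOf(root) === 0 ? realPath.slice(root.length) : undefined;
}

// Build the argument list for `rclone copyto <remote:path> <local path>`.
// ASSUMPTION: the google-drive: remote root corresponds to "My Drive".
function rcloneCopytoArgs(
  realPath: string,
  home: string,
  cacheDir: string,
): string[] | undefined {
  const rel = driveRelativePath(realPath, home);
  if (rel === undefined) return undefined; // local file, no bypass needed
  const name = rel.split("/").pop() ?? rel;
  return ["copyto", `google-drive:${rel}`, `${cacheDir.replace(/\/+$/, "")}/${name}`];
}
```

Returning an argument array rather than a shell string avoids quoting problems with paths such as "All Papers" that contain spaces.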
cite-extract.ts -- Pure citation extraction (no I/O)
gemini.ts -- Gemini API wrapper (File Search store CRUD, query, legacy upload for ask mode, rclone FUSE bypass)
grounding.ts -- Post-hoc passage grounding (tokenizer, LCS aligner)
extract-pdf-text.py -- PDF text extraction via pymupdf4llm
materialize-sources.ts -- Copy Paperpile PDFs + Readwise articles to references/
cite-check.ts -- CLI orchestrator (extract -> import -> query -> ground -> report)
npx claudepluginhub edwinhu/workflows --plugin workflows