From academic-research
Adds abstracts, attaches PDFs, enriches metadata, deduplicates, and fixes BBT citation keys in a Zotero library. For standalone library housekeeping, not full systematic reviews.
npx claudepluginhub mronkko/claude-academic-research --plugin academic-research
How this skill is triggered: by the user, by Claude, or both
Slash command
/academic-research:zotero-operations
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Glossary: unfamiliar with MCP, BBT, DOI, ISSN? See skills/_glossary.md for one-line definitions of every acronym this skill uses.
Before any step below, verify the plugin has been configured:
python3 "${CLAUDE_PLUGIN_ROOT:-.}/scripts/setup/check_configured.py"
If the result is NOT CONFIGURED, stop immediately and tell the user:
The academic-research project has not been set up on this machine yet. Run the setup skill or setup wizard first to configure API keys, MCP servers, and permission rules. Do not attempt Zotero operations before that.
Do not call MCP tools, run scripts, or proceed with the procedure. Running the setup skill/wizard is the required first step.
If the result is configured, proceed.
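A minimal sketch of the gate above. It assumes the check script prints the literal text NOT CONFIGURED on standard output when setup is missing; the exact output format is an assumption, and `is_configured` and `run_check` are hypothetical helper names, not part of the plugin.

```python
import os
import subprocess


def is_configured(check_output: str) -> bool:
    """Return False when the check script reported NOT CONFIGURED.

    Assumption: a missing setup is signalled by the literal text
    "NOT CONFIGURED" somewhere in the script's output.
    """
    return "NOT CONFIGURED" not in check_output.upper()


def run_check() -> bool:
    """Run check_configured.py and interpret its output (hypothetical wrapper)."""
    # Mirrors ${CLAUDE_PLUGIN_ROOT:-.}: fall back to "." when unset or empty.
    root = os.environ.get("CLAUDE_PLUGIN_ROOT") or "."
    script = os.path.join(root, "scripts", "setup", "check_configured.py")
    result = subprocess.run(
        ["python3", script], capture_output=True, text=True, check=False
    )
    return is_configured(result.stdout)
```

If `run_check()` returns False, the procedure stops and the user is pointed at the setup skill or wizard.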
systematic-review: who owns enrichment?
Both this skill and systematic-review list the enrichment scripts
(enrich_abstracts.py, enrich_pdfs.py, enrich_dois.py,
audit_zotero_library.py). The scripts are the same; the operational
context differs. The decision is simple:
- Use this skill for standalone library housekeeping: adding abstracts, PDFs, or tags with no screening pipeline downstream.
- Use systematic-review when enrichment is part of a PRISMA-style pipeline that will flow into abstract screening and full-text coding. Stage tags (abstract:*, fulltext:*), the screening-config round-trip, QA evaluator agents, and export to coded_papers.csv are all in scope. The audit report drives which items need enrichment before screening can start.

Signal for the harness. If the user's prompt mentions PRISMA,
systematic review, screening, inclusion criteria, coding, QA
evaluators, adjudication, or anything that implies a full-text
review pipeline — route to systematic-review. If it's
"just add abstracts / PDFs / tags to my Zotero library", stay here.
A half-SLR library that also needs housekeeping is still SR work:
delegate to systematic-review and note the housekeeping step is
a sub-task of that pipeline, not an independent operation.
Overlap is not redundancy. The same script (enrich_pdfs.py)
behaves identically whether called from SR context or ad-hoc
context — the scripts don't know which skill invoked them. What
differs is what comes next: SR context expects
abstract_screen.py to read the enriched library; ad-hoc context
stops after enrichment.
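The routing signal described above can be sketched as a keyword check. The keyword list is taken from the guidance in this section; matching by plain substring and the function name `route_skill` are illustrative assumptions, not the harness's actual logic.

```python
# Keywords taken from the routing guidance above; substring matching is
# an illustrative assumption, not how the harness actually decides.
SR_SIGNALS = (
    "prisma",
    "systematic review",
    "screening",
    "inclusion criteria",
    "coding",
    "qa evaluator",
    "adjudication",
)


def route_skill(prompt: str) -> str:
    """Pick the skill name for a user prompt (hypothetical helper)."""
    text = prompt.lower()
    if any(signal in text for signal in SR_SIGNALS):
        return "systematic-review"
    return "zotero-operations"
```

A prompt that mixes housekeeping with review vocabulary routes to systematic-review, which matches the rule that a half-SLR library is still SR work.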
Do not list the plugin's scripts/pipelines/ directory to figure
out what is available. The mapping below is authoritative; use the
exact invocation.
| User intent | Script | Invocation |
|---|---|---|
| Audit a library for items missing abstracts / PDFs / empty stubs | audit_zotero_library.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/audit_zotero_library.py --group <id> |
| Add missing abstracts to items | enrich_abstracts.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_abstracts.py --filter-keys-file .claude/audit/audit.missing_abstract.keys |
| Attach missing PDFs (fast HTTP cascade) | enrich_pdfs.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_pdfs.py --filter-keys-file .claude/audit/audit.missing_pdf.keys |
| Attach PDFs from Wiley journals (TDM token route) | enrich_pdfs.py --sources wiley | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_pdfs.py --sources wiley --filter-keys-file .claude/audit/audit.missing_pdf.keys |
| Attach PDFs from Cloudflare-gated publishers (Sage, APA, T&F, Emerald, …) | enrich_pdfs.py --sources browser | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/enrich_pdfs.py --sources browser --filter-keys-file .claude/audit/audit.missing_pdf.keys |
| Generate references.bib from a manuscript's citation keys | generate_bib.py | uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/generate_bib.py <project_dir> |
The audit script writes both a JSON report and three .keys files
(.claude/audit/audit.{missing_abstract,missing_pdf,empty_stubs}.keys)
— feed them straight to the next stage's --filter-keys-file flag.
Do not improvise a jq step to extract keys; the script wrote them
for you.
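As an illustration of handing the audit's .keys files straight to the next stage, the sketch below builds the argument list for one enrichment run. The file names and the --filter-keys-file flag come from the table above; the helper name `enrichment_command` is hypothetical, and the sketch never opens or parses the .keys file.

```python
import os
from pathlib import Path

# Location of the audit outputs, as stated above.
AUDIT_DIR = Path(".claude/audit")


def enrichment_command(script: str, keys_name: str, *extra: str) -> list[str]:
    """Build the argv for one enrichment stage (hypothetical helper)."""
    # Mirrors ${CLAUDE_PLUGIN_ROOT:-.}: fall back to "." when unset or empty.
    root = os.environ.get("CLAUDE_PLUGIN_ROOT") or "."
    script_path = Path(root) / "scripts" / "pipelines" / script
    keys_file = AUDIT_DIR / f"audit.{keys_name}.keys"
    return [
        "uv", "run", script_path.as_posix(), *extra,
        "--filter-keys-file", keys_file.as_posix(),
    ]
```

For example, `enrichment_command("enrich_pdfs.py", "missing_pdf", "--sources", "wiley")` reproduces the Wiley row of the table.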
Each script reads API keys from ~/.config/academic-research/config.toml
(the /setup wizard writes it) inside its own process via
core.config_loader. The keys never pass through your tool layer.
Some pipeline stages do things the user may find startling if unannounced. Always tell the user what is about to happen before running these stages:
- enrich_pdfs.py --sources browser: opens a visible Chromium window on their desktop; they may need to solve a Cloudflare challenge or sign in via institutional SSO. Tell them before launching: "Next step: browser-based PDF fetcher. A Chromium window will open on your desktop. For each publisher you may need to click through a Cloudflare challenge once. Ready?" and wait for acknowledgement.
- enrich_pdfs.py on a large library: can take 5–15 minutes with the default multi-source cascade. Warn if > 20 items.
- enrich_pdfs.py --sources wiley: silent HTTP via the Wiley TDM token, no warning needed.
- A uv run command may install Python dependencies first (~1–20 s). Mention it if noticeable.

1. Call mcp__zotero__zotero_list_libraries if you need to see what is available. Never guess the group ID.
2. Run audit_zotero_library.py --group <id>. Read the summary counts. The script writes .claude/audit/audit.{missing_abstract,missing_pdf,empty_stubs}.keys alongside the JSON report (project-local).
3. Run the enrichment scripts, passing the matching .keys file to --filter-keys-file. The audit script prints the exact commands in its "Next steps" output; use those verbatim.

Retracted papers in a Zotero library are a silent data-quality
problem — citing a retracted paper is a fact-check failure mode the
author almost certainly wants to catch. Scite exposes a free
retraction-watch endpoint that the Zotero MCP server wraps as
mcp__zotero__scite_check_retractions (no Scite account required).
Offer the check as a post-audit step when any of the following is true: the library is being prepared for submission, the user mentions bibliography hygiene / citation integrity, or the audit report shows a mature library (no stubs, few missing abstracts). The check queries each DOI in the collection against the retraction registry and reports matches.
Invocation (agent-mediated — the pipeline script can't call MCP tools directly):
mcp__zotero__scite_check_retractions(
    group_id=<group>,
    collection_key=<collection>,
)
Report any retracted items to the user with the matching citation
key; ask whether to tag them (retracted:flag is the convention)
and/or remove them from the collection. Flag, don't auto-remove —
the author decides. For SLR projects where retraction screening is
part of PRISMA quality assessment, the systematic-review skill
has the equivalent step inside its pipeline.
If the user's request does not clearly map to one of the rows above, ask before acting. Specifically:
- Do not run ls to see what scripts exist (they are listed here; this is authoritative).
- Do not read ~/.config/academic-research/config.toml under any circumstance; scripts read it internally.

If you truly need an operation the table above does not cover, tell the user which operation is missing and propose adding a new shipped script to the plugin. A one-off improvised script has no place here: it breaks the security model (API keys flow through your context) and sidesteps pre-approved permissions.
When you need to talk to the user's Zotero library, the access hierarchy is:
1. MCP tools (mcp__zotero__zotero_get_item_metadata, mcp__zotero__zotero_get_item_children, mcp__zotero__zotero_search_items, mcp__zotero__zotero_get_item_fulltext, …). These cover most reads: item metadata, children, attachments, items lists, fulltext, annotations.
2. scripts/pipelines/zotero_io.py and scripts/pipelines/bbt_client.py for operations the MCP doesn't cover: Better BibTeX endpoints (get_bibtex_export, bbt_json_rpc, get_bbt_keys, populate_missing_bbt_keys), bulk transactional writes (batch_update_tags, upsert_child_note, merge_duplicate_item), and any other custom Zotero operation.

Direct HTTP to http://127.0.0.1:23119/... is not a third option. It is a defect signal. If you find yourself writing urllib.request.urlopen("http://127.0.0.1:23119/...") or curl localhost:23119, that means the plugin is missing a helper. Stop, name the gap to the user, and propose adding a method to zotero_io.py (or bbt_client.py for BBT); do not work around it inline.
A direct-HTTP call by the agent bypasses retries, schema versioning, cross-project reuse, and the one-line definition-of-Zotero-shape that other consumers rely on. Inline urllib also drives the agent back into improvising pipeline code, which the standing rule forbids.
Implementation note for plugin contributors. The CI guard at
tests/unit/test_no_direct_localhost_zotero.py greps every file
under scripts/pipelines/ for 127.0.0.1:23119 or localhost:23119
and fails the build on a match outside zotero_io.py and
bbt_client.py. New code must route through those modules.
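A guard of the kind described in the note could look roughly like the sketch below. It is not the contents of test_no_direct_localhost_zotero.py; the function name `find_violations` is hypothetical, and restricting the scan to *.py files is an assumption.

```python
import re
from pathlib import Path

# The two patterns and the two exempt modules are taken from the note above.
FORBIDDEN = re.compile(r"(127\.0\.0\.1|localhost):23119")
ALLOWED = {"zotero_io.py", "bbt_client.py"}


def find_violations(pipelines_dir: Path) -> list[str]:
    """Return 'file:line' for each direct localhost reference outside ALLOWED."""
    violations = []
    # Assumption: only Python sources are scanned.
    for path in sorted(pipelines_dir.rglob("*.py")):
        if path.name in ALLOWED:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if FORBIDDEN.search(line):
                violations.append(f"{path.name}:{number}")
    return violations
```

A test would then assert that `find_violations(Path("scripts/pipelines"))` returns an empty list.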
pyzotero.zotero.Zotero(group, "group", key, local=True) reads from
localhost:23119 (Zotero must be running). Much faster than the remote
API for bulk operations — a library of a few thousand items that would
time out on api.zotero.org returns in milliseconds from the local
client.
Use the remote API (api.zotero.org) for writes: PATCH, new items,
child notes, tag updates.
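- Citation keys come from Better BibTeX (e.g. brownUsingDailyStock1985a).
- Generate references.bib with uv run ${CLAUDE_PLUGIN_ROOT:-.}/scripts/pipelines/generate_bib.py <project_dir>.
- Do not hand-write keys such as Smith2019.
- Do not use the Extra field to override or pin BBT keys.
- The BBT JSON-RPC endpoint is http://localhost:23119/better-bibtex/json-rpc.

For operations that need to classify every item's attachment state, fetch all attachments in one pass: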
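# `local` is the pyzotero client opened with local=True (see above).
attachments = local.everything(local.items(itemType="attachment"))
# Group attachments by the key of their parent item.
by_parent = {}
for a in attachments:
    parent = a["data"].get("parentItem")
    if parent:
        by_parent.setdefault(parent, []).append(a)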
Classify into real files (has md5) vs. empty stubs (no md5). Avoids
N+1 remote queries. Delete empty PDF stubs before processing — Zotero
creates these when a PDF import fails.
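The classification step can be sketched as a pure function over the attachment list fetched above. It assumes each attachment is a Zotero API item dict whose "data" carries "key", "parentItem", and, for stored files, a non-empty "md5"; the function name `classify_attachments` is hypothetical.

```python
def classify_attachments(attachments: list[dict]) -> tuple[dict, list[str]]:
    """Split attachments into real files per parent and empty-stub keys.

    Assumption: a stored file has a non-empty data["md5"]; an attachment
    without one is treated as an empty stub.
    """
    real_by_parent: dict[str, list[dict]] = {}
    stub_keys: list[str] = []
    for att in attachments:
        data = att["data"]
        parent = data.get("parentItem")
        if not parent:
            continue  # standalone attachment, not tied to a parent item
        if data.get("md5"):
            real_by_parent.setdefault(parent, []).append(att)
        else:
            stub_keys.append(data["key"])
    return real_by_parent, stub_keys
```

The returned stub keys are the candidates for deletion before enrichment; parents missing from `real_by_parent` still need a PDF.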
1. POST /items/{key}/file with md5, filename, filesize, mtime → get S3 upload authorization.
2. Upload prefix + pdf_bytes + suffix, with prefix and suffix taken from the authorization response.
3. POST /items/{key}/file with upload={uploadKey} to register.

Validate PDFs before upload: %PDF magic bytes AND parse-test. Some downloaders save an HTML page returned with HTTP 200, which fails the magic-bytes check, or a corrupted PDF that passes magic-bytes but fails to parse.
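The upload steps and the pre-upload check above can be sketched with the standard library alone. The helper names are hypothetical. The mtime unit (milliseconds) is an assumption to confirm against the Zotero Web API documentation, and the %%EOF check is a cheap stand-in for the parse-test, not a replacement for parsing with a PDF library.

```python
import hashlib


def upload_auth_params(pdf_bytes: bytes, filename: str, mtime_ms: int) -> dict:
    """Form fields for the first POST to /items/{key}/file."""
    return {
        "md5": hashlib.md5(pdf_bytes).hexdigest(),
        "filename": filename,
        "filesize": len(pdf_bytes),
        "mtime": mtime_ms,  # assumed to be milliseconds since the epoch
    }


def upload_body(prefix: bytes, pdf_bytes: bytes, suffix: bytes) -> bytes:
    """Body for the upload step: prefix + file + suffix from the authorization."""
    return prefix + pdf_bytes + suffix


def looks_like_pdf(data: bytes) -> bool:
    """Cheap pre-check: %PDF magic bytes plus an %%EOF marker near the end.

    A stand-in only: a real parse with a PDF library is still needed to
    catch corrupted files that happen to pass this check.
    """
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]
```

Running `looks_like_pdf` first rejects HTML error pages before any upload authorization is requested.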
Creating duplicates has three distinct failure modes. Any import script must handle all three:
Against the existing Zotero library. Match each input row by DOI,
falling back to normalised_title|first_author_lastname. If matched,
add to the target collection and backfill the abstract if empty.
Within the import batch itself. As the loop processes rows, keep
growing sets of batch_doi_seen and batch_title_seen. A second row
for the same paper (e.g. Scopus + WoS where only one has a DOI) must
merge into the already-queued item, not create a new one.
Post-import. Always run mcp__zotero__zotero_find_duplicates at
the end of the import. Pre-existing library items with incomplete
metadata can slip past the first two checks; the post-check is the
safety net.
Fix the data, don't work around it. If post-import surfaces duplicates, audit the upstream source first (search-API field mapping, manual entries, out-of-scope items), fix them, re-run. Only add new fallback matching after confirming the missing metadata is legitimate.
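The within-batch check described above can be sketched as follows. The row field names ("doi", "title", "first_author_lastname") and the normalisation rules are assumptions for illustration. The sketch keeps dictionaries rather than plain sets for batch_doi_seen and batch_title_seen so that a later row can be merged into the item already queued.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def title_key(title: str, first_author_lastname: str) -> str:
    """Fallback match key: normalised_title|first_author_lastname."""
    return f"{normalise_title(title)}|{first_author_lastname.strip().lower()}"


def dedupe_batch(rows: list[dict]) -> list[dict]:
    """Merge rows that describe the same paper within one import batch."""
    batch_doi_seen: dict[str, dict] = {}
    batch_title_seen: dict[str, dict] = {}
    queued: list[dict] = []
    for row in rows:
        doi = (row.get("doi") or "").strip().lower()
        tkey = title_key(row["title"], row["first_author_lastname"])
        existing = batch_doi_seen.get(doi) if doi else None
        existing = existing or batch_title_seen.get(tkey)
        if existing is None:
            existing = dict(row)
            queued.append(existing)
        else:
            # Merge: fill only the fields the queued item lacks.
            for field, value in row.items():
                if value and not existing.get(field):
                    existing[field] = value
        if doi:
            batch_doi_seen[doi] = existing
        batch_title_seen[tkey] = existing
    return queued
```

The same `title_key` can serve the first check, matching input rows against the existing library when no DOI is available.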
When a pipeline writes decisions or structured extractions back to Zotero (e.g. LLM screening decisions, coded fields), make them reviewable in Zotero itself:
- Write decisions as tags on the item (e.g. fulltext:include / fulltext:exclude).
- Write structured extractions to a named child note (e.g. SLR Coding). The local Zotero client reads item version + existing tags; the remote API writes PATCH and the child note.
- On --full-recode, delete prior named child notes before re-writing so re-runs don't accumulate stale notes.

To add items and look up citation keys:

- Use mcp__zotero__zotero_add_by_doi when a DOI exists (preferred).
- Use mcp__zotero__zotero_add_by_url only when no DOI exists.
- To get a citation key, call mcp__zotero__zotero_get_item_metadata with format="bibtex". The key is the first argument of the BibTeX entry.

Avoid:

- Using the Extra field to pin a citation key.
- Running --full-recode but not deleting prior child notes first.
- Reading ~/.config/academic-research/config.toml via
cat, head, tail, grep, less, more, awk, sed, a
Python script, or any other command. NEVER read that file. It
holds API keys. Pipeline scripts read it via Python's open()
outside your tool layer; you have no legitimate reason to inspect
it. If you feel like you need to debug by looking inside, you are
on the wrong track — ask the user to re-run /setup instead.
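For the citation-key lookup described earlier (the key is the first argument of the BibTeX entry returned with format="bibtex"), a minimal extraction sketch follows. The function name `citation_key_from_bibtex` is hypothetical, and the regular expression assumes a conventional entry of the form @type{key, ...}.

```python
import re
from typing import Optional

# Matches the entry type and its first argument, e.g. "@article{key,".
_ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def citation_key_from_bibtex(bibtex: str) -> Optional[str]:
    """Return the citation key of the first BibTeX entry, or None."""
    match = _ENTRY_KEY.search(bibtex)
    return match.group(1) if match else None
```

Returning None rather than guessing keeps the caller from inventing a key when the metadata response is not valid BibTeX.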