From research-papers
Retrieves scientific paper PDFs from arXiv URLs, DOIs, titles, or other identifiers, saving to papers/ directory. Uses direct arXiv downloads or sci-hub with Chrome for paywalled papers.
```
npx claudepluginhub ctoth/research-papers-plugin --plugin research-papers
```

This skill uses the workspace's default tool permissions.
Download a scientific paper PDF to the `papers/` directory.
Downloads academic paper PDFs given URL, DOI, title, or citation. Searches open-web sources, Sci-Hub mirrors, then arXiv using curl and grep.
Searches academic literature via arXiv, Semantic Scholar, and open-access sources. Fetches and parses PDFs for abstracts, key findings, methodology, and citations. Use for research, literature reviews, or formal citations.
Uses Unpaywall API to find free full-text open access versions of paywalled academic papers by DOI. Useful when direct DOI resolution, publisher sites, or PMC fail.
The command examples below use scripts/... paths that are relative to this skill's directory. Resolve them against the installed skill location, not the user's project root.
The argument can be:
- `https://arxiv.org/abs/XXXX.XXXXX` or `https://arxiv.org/pdf/XXXX.XXXXX`
- `10.XXXX/...` (a DOI)
- `https://aclanthology.org/...`
- `https://ojs.aaai.org/...`
- A paper title

`$ARGUMENTS` names exactly one intended paper. Preserve that identity throughout retrieval.
The goal of this skill is to obtain the intended paper's PDF. Metadata resolution and canonical naming support that goal; they are not the definition of success.
Before downloading, decide whether the input is already a strong paper identifier: an arXiv ID or URL, a DOI, an ACL Anthology URL, or a Semantic Scholar (S2) paper ID.
If the input is weak, first infer the intended paper and continue with the strongest identity-preserving input available (for example, a DOI or arXiv ID inferred from the weak input).
Do not keep retrying a weak URL mechanically when a stronger identifier is already apparent.
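The strong-versus-weak decision can be sketched with a few shell patterns. This is a heuristic illustration only, not the skill's authoritative logic, and the example identifiers are illustrative:

```shell
# Heuristic classifier for the input argument (illustrative patterns).
classify() {
  case "$1" in
    *arxiv.org/abs/*|*arxiv.org/pdf/*) echo "arxiv" ;;    # strong
    10.*/*)                            echo "doi" ;;      # strong
    *aclanthology.org/*)               echo "acl" ;;      # strong
    http*://*)                         echo "weak-url" ;; # normalize first
    *)                                 echo "title" ;;    # search first
  esac
}

classify "https://arxiv.org/abs/1706.03762"   # → arxiv
classify "10.18653/v1/P19-1001"               # → doi
classify "Attention Is All You Need"          # → title
```

Anything that falls through to `weak-url` should be normalized to one of the strong forms before Step 3.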
If the input is a paper title (not a URL or DOI), search for it first:
```
uv run scripts/search_papers.py "PAPER TITLE" --source all --max-results 5 --json
```
Review the results. If there's a clear match, extract the strongest available identifier and continue to Step 3. If ambiguous, present the top results to the user and ask which one.
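Pulling an identifier out of the JSON can be done with grep and cut. The sample below fabricates a single result line, and field names like `arxiv_id` are assumptions about `search_papers.py`'s actual output:

```shell
# Fabricated single-result sample of the --json output; real output may differ.
results='[{"title": "Attention Is All You Need", "arxiv_id": "1706.03762"}]'

# Grab the first arxiv_id value, if present.
printf '%s' "$results" | grep -o '"arxiv_id": "[^"]*"' | head -n 1 | cut -d'"' -f4
# → 1706.03762
```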
For weak URL input, use the inferred title or metadata from Step 1.5 and perform the same search/normalization before Step 3.
Use the fetch_paper.py script to download the PDF and extract metadata:
```
uv run scripts/fetch_paper.py "<identifier>" --papers-dir papers/
```
Where <identifier> is the arxiv ID/URL, DOI, ACL URL, or S2 paper ID from the input or search results.
If you had to normalize a weak input first, use the normalized identifier here rather than the original weak URL.
Use fetch_paper.py as the first download path, not as the definition of whether retrieval is possible. One metadata-resolution failure does not by itself mean the paper is unretrievable.
The script will:
- Download the PDF into a canonically named paper directory (`Author_Year_ShortTitle`)
- Write `metadata.json` alongside `paper.pdf`

Before treating Step 3 as successful, verify that the resolved metadata still matches the intended paper. If not, stop on mismatch.
If fetch_paper.py obtains the intended paper's PDF through an allowed path, Step 3 succeeded even if metadata had to be materialized afterward.
If fetch_paper.py returns "fallback_needed": true, the paper couldn't be downloaded via open-access channels. In that case it returns the planned dirname/directory plus inline metadata, but it does not create metadata.json or the paper directory yet. Fall back to browser automation for sci-hub:
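Checking that flag without extra tooling can be sketched with grep, assuming the script's JSON result has been captured from stdout (the sample value below is fabricated):

```shell
# Fabricated sample of fetch_paper.py's output on open-access failure;
# in practice, capture the script's real stdout instead.
result='{"fallback_needed": true, "dirname": "Vaswani_2017_Attention"}'

if printf '%s' "$result" | grep -q '"fallback_needed": true'; then
  echo "open-access channels failed; try the sci-hub browser fallback"
fi
```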
Try browser automation in this order:
If you have browser automation available, use it to:
1. Open `https://sci-hub.st/`
2. Extract the embedded PDF URL from the page:

```javascript
// Runs inside the page (e.g. as the body of an evaluate() callback,
// which is why top-level return is allowed).
const iframe = document.querySelector('#pdf');
if (iframe) return iframe.src;
const embed = document.querySelector('embed[type="application/pdf"]');
if (embed) return embed.src;
const links = [...document.querySelectorAll('a')].filter(a => a.href.includes('.pdf'));
return links.map(a => a.href);
```
```
mkdir -p "./papers/<dirname>" && curl -L -o "./papers/<dirname>/paper.pdf" "EXTRACTED_URL" 2>&1
```

Create `metadata.json` only after `paper.pdf` exists:

```
uv run scripts/fetch_paper.py "<identifier>" --papers-dir papers/ --output-dir "<dirname>" --metadata-only
```

If browser automation or a direct PDF URL yields the intended paper's PDF, retrieval succeeded. Finalize metadata afterward.
Report the DOI/URL and ask the user to download the PDF manually to the paper directory.
```
file "./papers/<dirname>/paper.pdf"
ls -la "./papers/<dirname>/"
```
Confirm:
- `metadata.json` exists with title, authors, year

The core success condition is that the intended paper's PDF exists at `./papers/<dirname>/paper.pdf`. `metadata.json` should also exist by the end of the step, but earlier metadata-resolution failures do not negate successful retrieval if the correct PDF and final metadata are in place.
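Beyond `file`, a cheap extra check is the `%PDF` magic bytes, since a failed fetch can leave an HTML error page saved as `paper.pdf`. A self-contained sketch (the directory name and stand-in file are fabricated for illustration):

```shell
dir="./papers/Vaswani_2017_Attention"           # illustrative dirname
mkdir -p "$dir"
printf '%%PDF-1.7 stand-in' > "$dir/paper.pdf"  # stand-in for a real download

# A genuine PDF begins with the "%PDF" magic bytes.
if [ "$(head -c 4 "$dir/paper.pdf")" = "%PDF" ]; then
  echo "looks like a PDF"
else
  echo "not a PDF - likely an HTML error page" >&2
fi
```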
When done, report:
```
Retrieved: papers/<dirname>/paper.pdf
Source: [arxiv/aclanthology/unpaywall/sci-hub]
Size: [file size]
```
```
rm -f ./papers/temp_*.pdf
```

If Edit/Write fails with "file unexpectedly modified":
`./relative`, `C:/forward/slashes`, `C:\back\slashes`

You may be running alongside other agents in parallel.
FORBIDDEN GIT COMMANDS - NEVER USE THESE:
`git stash`, `git restore`, `git checkout`, `git reset`, `git clean`