From agent-research-flow
Use when the caller needs the full text of a single academic paper as a Markdown file — given a paper identifier (URL, DOI, arXiv ID, Semantic Scholar ID, title, or local PDF path) and a destination path, this skill locates the PDF, downloads it, and converts it to Markdown. Does NOT resolve bibliographic metadata and does NOT modify any literature index — callers handle those concerns themselves.
```bash
npx claudepluginhub yunhaom94/agent-research-flow
```

This skill uses the workspace's default tool permissions.
Locate a paper's PDF, download it, and convert it to Markdown at a specified output path. Three steps — no metadata extraction, no index updates.
The caller provides:
- a paper identifier: URL, DOI, arXiv ID, Semantic Scholar ID, title, or local PDF path
- **output_md**: the path where the converted .md file should be written

The skill bundles a download helper (scripts/download_pdf.py). The S2_API_KEY environment variable is available; include it as -H "x-api-key: $S2_API_KEY" on Semantic Scholar calls.
**Step 1: Locate candidates.** Produce a list of candidate PDF URLs (or a local PDF path) to try. If the input is already a local PDF, skip to Step 3 using that path.
Try sources in this order based on the input type; collect every URL that looks viable so Step 2 has fallbacks:
- Direct PDF URL (ends in .pdf or Content-Type is application/pdf): use it as-is.
- arXiv ID or URL (2301.00001, arxiv.org/abs/...): https://arxiv.org/pdf/<arxiv_id>.pdf.
- DOI: curl -s -H "x-api-key: $S2_API_KEY" "https://api.semanticscholar.org/graph/v1/paper/DOI:<doi>?fields=openAccessPdf" → openAccessPdf.url; also curl -s "https://api.unpaywall.org/v2/<doi>?email=parse-paper@example.com" → best_oa_location.url_for_pdf (see the sketch after this list).
- Semantic Scholar ID: curl -s -H "x-api-key: $S2_API_KEY" "https://api.semanticscholar.org/graph/v1/paper/<paper_id>?fields=openAccessPdf,externalIds" → openAccessPdf.url, and if externalIds.ArXiv or externalIds.DOI are present, derive those candidates too.
- Title: curl -s -H "x-api-key: $S2_API_KEY" "https://api.semanticscholar.org/graph/v1/paper/search?query=<url_encoded_title>&limit=3&fields=title,authors,year,openAccessPdf,externalIds". Take openAccessPdf.url if present, and also derive candidates from externalIds.ArXiv (→ https://arxiv.org/pdf/<id>.pdf) and externalIds.DOI (→ run the DOI path above). Use every candidate you can derive so Step 2 has fallbacks.

Note on deriving candidates from externalIds: any response that includes externalIds (the Semantic Scholar ID path or the title-search path) should expand into arXiv/DOI candidates the same way. Multiple PDF sources for the same paper are fine; collect them all, and Step 2 picks whichever works first.
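Putting the DOI path together, here is a minimal sketch of candidate collection. The DOI value is illustrative, and using python3 for JSON parsing is an assumption (any JSON extractor works):

```bash
# Sketch: collect candidate PDF URLs for a DOI input (DOI below is illustrative).
doi="10.18653/v1/N19-1423"
candidates=()

# Semantic Scholar: open-access PDF lookup by DOI
s2=$(curl -s -H "x-api-key: $S2_API_KEY" \
  "https://api.semanticscholar.org/graph/v1/paper/DOI:${doi}?fields=openAccessPdf")
url=$(printf '%s' "$s2" | python3 -c \
  'import json,sys; d=json.load(sys.stdin); print((d.get("openAccessPdf") or {}).get("url") or "")')
[ -n "$url" ] && candidates+=("$url")

# Unpaywall fallback for the same DOI
up=$(curl -s "https://api.unpaywall.org/v2/${doi}?email=parse-paper@example.com")
url=$(printf '%s' "$up" | python3 -c \
  'import json,sys; d=json.load(sys.stdin); print((d.get("best_oa_location") or {}).get("url_for_pdf") or "")')
[ -n "$url" ] && candidates+=("$url")

printf '%s\n' "${candidates[@]}"   # every viable URL becomes a Step 2 fallback
```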
STOP condition: If no source produces any candidate PDF URL — and the input isn't a local PDF — stop and report to the caller that the paper could not be found. Do not fabricate a URL. Do not proceed to Step 3 without a PDF.
**Step 2: Download.** For each candidate URL from Step 1, attempt:
python3 "<skill_dir>/scripts/download_pdf.py" \
--url "<candidate_url>" \
--output "<project_root>/.tmp/<unique_name>.pdf"
<project_root> is the root of the project the caller is operating in (i.e. the user's current working project, not the skill directory). If the caller didn't supply one, use any scratch location such as /tmp/parse-paper/<unique_name>.pdf.
<unique_name> must be unique per invocation — use the arXiv ID, a sanitized DOI, or a short sanitized form of the title. Never hard-code paper.pdf, because parallel calls (e.g. from literature-search) will overwrite each other.
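One way to sanitize an identifier into <unique_name>; the tr call and the sample DOI are illustrative, not prescribed:

```bash
# Illustrative: turn a DOI into a filesystem-safe unique name.
doi="10.18653/v1/N19-1423"
unique_name=$(printf '%s' "$doi" | tr -c 'A-Za-z0-9' '_')   # -> 10_18653_v1_N19_1423
# An arXiv ID is already safe to use directly, e.g. unique_name="2301.00001".
```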
The script validates that the response is actually a PDF (checks %PDF- magic bytes) and retries transient network errors. A non-zero exit means that specific URL failed.
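If you ever need to spot-check a downloaded file by hand, the same magic-byte test looks roughly like this (a sketch; the script's internal logic may differ):

```bash
# Sketch: a PDF must start with the bytes "%PDF-".
head -c 5 "<project_root>/.tmp/<unique_name>.pdf" | grep -q '%PDF-' \
  && echo "looks like a PDF" \
  || echo "not a PDF (often an HTML error page)"
```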
If a candidate fails, try the next candidate. If every candidate from Step 1 has been exhausted without a successful download, go back to Step 1 and look harder using the WebFetch and WebSearch tools — try a different query formulation, try an author-hosted copy, try a mirror. Only give up and report failure to the caller after those tools genuinely turn up nothing.
Create the temp directory (mkdir -p) before calling the script. Clean up the downloaded PDF in Step 3 after conversion.
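Taken together, Step 2 amounts to a retry loop over the Step 1 candidates. A sketch, assuming $candidates, $project_root, $skill_dir, and $unique_name are set as in the earlier snippets:

```bash
# Sketch of the Step 2 loop: the first candidate that downloads cleanly wins.
mkdir -p "$project_root/.tmp"
pdf_path="$project_root/.tmp/${unique_name}.pdf"
source_url=""
for url in "${candidates[@]}"; do
  if python3 "$skill_dir/scripts/download_pdf.py" --url "$url" --output "$pdf_path"; then
    source_url="$url"   # remember the winner for the final status report
    break
  fi
done
[ -n "$source_url" ] || echo "all candidates failed; widen the search before giving up" >&2
```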
**Step 3: Convert.** Invoke the markitdown skill (via the Skill tool) to convert the downloaded (or user-provided local) PDF to Markdown, writing the result to the output_md path the caller specified. Pass the input PDF path and the output_md destination as arguments; let the markitdown skill decide the concrete invocation (CLI vs. Python API) and handle availability checks.
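For orientation only, the CLI form of a markitdown conversion typically looks like the line below; treat the exact flags as an assumption and defer to whatever invocation the markitdown skill actually selects:

```bash
# Assumed CLI form; the markitdown skill may choose its Python API instead.
markitdown "$pdf_path" -o "<output_md>"
```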
After a successful conversion:
- Clean up the downloaded PDF from the temp directory.
- Verify the .md file exists at output_md and is non-empty before returning.

Return a short status to the caller:
- **status**: ok | not_found | download_failed
- **output_md**: <path> (only when status is ok)
- **source_url**: <url> (the candidate that succeeded, or "local" for user-provided PDFs)
On not_found or download_failed, include a one-line reason so the caller knows which stage failed.
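A minimal sketch of the final verification and a successful status report, with illustrative paths (and $source_url carried over from the Step 2 loop above):

```bash
# Sketch: confirm the Markdown exists and is non-empty, then report.
output_md="papers/bert.md"   # illustrative caller-supplied path
[ -s "$output_md" ] || { echo "reason: conversion produced an empty file" >&2; exit 1; }
echo "status: ok"
echo "output_md: $output_md"
echo "source_url: $source_url"
```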