Converts arXiv papers to structured Markdown docs by fetching LaTeX source or PDF, preserving math/sections via pandoc/pdfplumber. Invoke with ID for implementation reference.
npx claudepluginhub ultimatile/arxiv-skills --plugin arxiv-skills
This skill uses the workspace's default tool permissions.
Automatically converts arXiv papers into structured Markdown documentation for implementation reference.
Files:
arxiv_doc_builder/__init__.py
arxiv_doc_builder/check_pdf_skill.py
arxiv_doc_builder/convert_latex.py
arxiv_doc_builder/convert_paper.py
arxiv_doc_builder/convert_pdf_double_column.py
arxiv_doc_builder/convert_pdf_extract.py
arxiv_doc_builder/convert_pdf_simple.py
arxiv_doc_builder/convert_pdf_split_columns.py
arxiv_doc_builder/convert_pdf_with_vision.py
arxiv_doc_builder/fetch_paper.py
arxiv_doc_builder/pdf_converter_lib.py
pyproject.toml
references/arxiv-fetch.md
references/latex-conversion.md
references/output-format.md
references/pdf-conversion.md
This skill automatically:
Fetches paper materials from arXiv
Converts to structured Markdown, preserving math as LaTeX ($...$, $$...$$)
Generates implementation-ready documentation, saved as {ARXIV_ID}/{ARXIV_ID}.md under the output directory (default: current working directory)
Invoke this skill when the user requests conversion of an arXiv paper, identified by its ID, into Markdown for implementation reference.
Use the main orchestrator script or the globally installed convert-paper command:
# Using global command (recommended)
convert-paper ARXIV_ID [--output-dir DIR]
# Using script directly
uv run arxiv_doc_builder/convert_paper.py ARXIV_ID [--output-dir DIR]
--output-dir: Directory where {ARXIV_ID}/{ARXIV_ID}.md will be created. Default: current working directory (not a papers/ subdirectory).
The orchestrator:
Runs fetch_paper.py to download materials (with automatic source→PDF fallback)
Runs the matching converter (convert_latex.py or convert_pdf_simple.py)
Writes the result to {output-dir}/{ARXIV_ID}/{ARXIV_ID}.md
All HTTP requests (curl), file extraction (tar), and directory creation (mkdir) are handled automatically.
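The converter choice and output path described above can be sketched in a few lines of Python. This is only an illustration: the function names `output_path` and `pick_converter` are hypothetical and are not taken from convert_paper.py, and the ID 2301.00001 is a placeholder.

```python
from pathlib import Path


def output_path(arxiv_id: str, output_dir: str = ".") -> Path:
    """Build {output-dir}/{ARXIV_ID}/{ARXIV_ID}.md (hypothetical helper)."""
    return Path(output_dir) / arxiv_id / f"{arxiv_id}.md"


def pick_converter(has_latex_source: bool) -> str:
    """Pick the converter script: LaTeX if source was fetched, otherwise PDF."""
    return "convert_latex.py" if has_latex_source else "convert_pdf_simple.py"


if __name__ == "__main__":
    # Placeholder ID, default output directory (current working directory).
    print(output_path("2301.00001"))
    print(pick_converter(has_latex_source=False))
```

With the default output directory the path is relative to the current working directory, which matches the --output-dir default.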
The fetcher tries LaTeX source first, then PDF:
LaTeX source: downloads the .tar.gz, extracts to {output-dir}/{ARXIV_ID}/source/, converts with pandoc
PDF: downloads to {output-dir}/{ARXIV_ID}/pdf/, extracts text with pdfplumber
No manual intervention is needed: the skill handles format detection and fallback automatically.
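The text does not say how format detection is implemented. One common approach, shown here purely as an assumption, is to inspect the leading bytes of the downloaded payload: gzip streams start with the bytes 1f 8b, and PDF files start with %PDF-. The function name `detect_format` and its return labels are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (.tar.gz)
PDF_MAGIC = b"%PDF-"      # header at the start of a PDF file


def detect_format(payload: bytes) -> str:
    """Classify a download by its leading bytes (assumed approach, not the skill's code)."""
    if payload.startswith(GZIP_MAGIC):
        return "latex-source"
    if payload.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"
```

An "unknown" result would be the point at which a fetcher falls back to the PDF download or reports an error.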
Generated Markdown includes:
Inline math, e.g. $f(x) = x^2$
Display math, e.g. $$\int_0^\infty e^{-x} dx = 1$$
Output location: {output-dir}/{ARXIV_ID}/{ARXIV_ID}.md (default output-dir is current working directory)
Three specialized scripts for direct PDF conversion:
Convert all pages as single-column layout.
uv run arxiv_doc_builder/convert_pdf_simple.py paper.pdf -o output.md
Convert all pages as double-column layout (for academic papers).
uv run arxiv_doc_builder/convert_pdf_double_column.py paper.pdf -o output.md
Extract specific pages with optional double-column processing.
# Extract specific pages
uv run arxiv_doc_builder/convert_pdf_extract.py paper.pdf --pages 1-5,10 -o output.md
# Extract with mixed column layouts
uv run arxiv_doc_builder/convert_pdf_extract.py paper.pdf --pages 1-10 --double-column-pages 3-7 -o output.md
Note: --double-column-pages must be a subset of --pages. Invalid page ranges cause an immediate error.
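The subset rule can be illustrated with a small page-spec parser. This is a sketch under assumptions: `parse_pages` and `check_subset` are hypothetical names, and the real parsing in convert_pdf_extract.py may differ.

```python
def parse_pages(spec: str) -> set[int]:
    """Parse a spec such as '1-5,10' into a set of 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty page range in {spec!r}")
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return pages


def check_subset(pages: set[int], double_column: set[int]) -> None:
    """Fail immediately if a double-column page was not selected for extraction."""
    extra = double_column - pages
    if extra:
        raise ValueError(f"--double-column-pages not in --pages: {sorted(extra)}")
```

For example, --pages 1-10 with --double-column-pages 3-7 passes the check, while --pages 1-5 with --double-column-pages 4-6 fails because page 6 was never selected.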
All three scripts share common conversion logic through pdf_converter_lib.py, ensuring consistent behavior while keeping each script focused on its specific use case.
For papers with complex mathematical formulas where text extraction fails, a vision-based approach is available as a manual fallback:
# Generate high-resolution images from PDF
python arxiv_doc_builder/convert_pdf_with_vision.py paper.pdf --dpi 300 --columns 2
This creates page images (with optional column splitting) that can be read manually with Claude's vision capabilities for maximum accuracy. This is NOT part of the automatic workflow—use it only when automatic conversion produces poor results.
See references/pdf-conversion.md for details on vision-based conversion.
Some arXiv papers (e.g., PRL with supplemental material) contain multiple .tex files, each with its own \documentclass. When this happens, the converter warns:
⚠ Found 2 files with \documentclass:
[0] main_paper.tex
[1] supplemental_material.tex
Non-interactive mode, selecting [0] main_paper.tex
If the wrong file was selected, re-run the LaTeX converter directly with --tex-file:
convert_latex.py ARXIV_ID --source-dir {output-dir}/{ARXIV_ID}/source --tex-file {output-dir}/{ARXIV_ID}/source/correct_file.tex --output {output-dir}/{ARXIV_ID}/{ARXIV_ID}.md
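To see which files are candidates before choosing a --tex-file, the source directory can be scanned for uncommented \documentclass lines. The sketch below is an assumption about how such a scan could look; `find_main_candidates` is a hypothetical helper, not part of convert_latex.py.

```python
import re
from pathlib import Path

# Match \documentclass only when no % comment marker precedes it on the line.
DOCUMENTCLASS = re.compile(r"^[^%\n]*\\documentclass", re.MULTILINE)


def find_main_candidates(source_dir: str) -> list[Path]:
    """Return the .tex files under source_dir that declare a \\documentclass."""
    candidates = []
    for tex in sorted(Path(source_dir).rglob("*.tex")):
        text = tex.read_text(encoding="utf-8", errors="replace")
        if DOCUMENTCLASS.search(text):
            candidates.append(tex)
    return candidates
```

Passing one of the returned paths to --tex-file avoids relying on the non-interactive default of index 0.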
When pandoc fails on a LaTeX source, the error may point to \end{document} with unexpected \end. This means pandoc's parser broke down due to a syntax issue elsewhere — \end{document} itself is not the cause. Do NOT attempt broad preprocessing (replacing documentclass, expanding \newcommand, removing environments, etc.) — pandoc handles revtex4/revtex4-2, custom commands, picture environments, and theorem environments correctly.
Bisect: extract the document body (\begin{document} to \end{document}), then test pandoc with increasing prefixes to find the first line that causes failure.
Look for an unmatched { or } in the LaTeX source. LaTeX's TeX engine silently tolerates these, but pandoc's structured parser does not.
A minimal fix (removing the stray {) is usually sufficient.
Example: the source (see, e.g., {\cite{makhlin}) has an unmatched {. LaTeX compiles fine but pandoc fails. Fix: remove the stray {.
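Before bisecting with pandoc, a quick brace-balance scan can narrow the search. The following is a heuristic sketch, not part of the skill: `find_unmatched_braces` is a hypothetical helper that skips backslash-escaped characters and % comments, and it does not understand verbatim environments.

```python
def find_unmatched_braces(latex: str) -> list[tuple[int, int, str]]:
    """Return (line, column, brace) for each unmatched { or }, 1-indexed."""
    open_stack: list[tuple[int, int]] = []
    unmatched: list[tuple[int, int, str]] = []
    for line_no, line in enumerate(latex.splitlines(), start=1):
        col = 0
        while col < len(line):
            ch = line[col]
            if ch == "\\":
                col += 2  # skip the backslash and the character it escapes
                continue
            if ch == "%":
                break  # the rest of the line is a comment
            if ch == "{":
                open_stack.append((line_no, col + 1))
            elif ch == "}":
                if open_stack:
                    open_stack.pop()
                else:
                    unmatched.append((line_no, col + 1, "}"))
            col += 1
    # Any opener still on the stack was never closed.
    unmatched.extend((ln, c, "{") for ln, c in open_stack)
    return sorted(unmatched)


if __name__ == "__main__":
    print(find_unmatched_braces(r"(see, e.g., {\cite{makhlin})"))
```

The reported position is the earliest opener that is never closed, which may sit some distance before the actual typo, so it is a starting point for inspection rather than a guaranteed location.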
Output is created under --output-dir (default: current working directory):
{output-dir}/
└── {ARXIV_ID}/
    ├── source/          # LaTeX source files (if available)
    ├── pdf/             # PDF file
    ├── {ARXIV_ID}.md    # Generated Markdown output
    └── figures/         # Extracted figures (if any)