By jordancoin
Extract clean text from any PDF — even the ones Chrome mangles. 7-level cascade resolves broken Unicode mappings locally via WASM. Returns plain text, basic markdown, or structured markdown with TOC and headings.
The agent interface for PDF to Text, the Chrome extension that fixes broken copy-paste from PDFs.
Chrome extension = humans. This plugin = agents. Same extraction engine.
`/plugin install pdf-to-text@JordanCoin/pdf-to-text`
The engine downloads automatically on first use (~1.2 MB WASM). No build step, no dependencies beyond Node.
Three MCP tools are available in every Claude Code session:
| Tool | What it does |
|---|---|
| `extract_pdf` | Extract text from any PDF (URL or local path). Returns plain text, basic markdown, or structured markdown with TOC, headings, and token counts. |
| `render_markdown` | Fetch and parse any `.md` file. Returns sections, headings, and a token estimate. |
| `list_recent` | Show recently extracted PDFs from the local cache. |
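The table does not say how `render_markdown` derives its sections or its token estimate. The TypeScript sketch below is a rough illustration only. It splits markdown on ATX headings (`#` through `######`) and estimates tokens with the common ballpark of about four characters per token. The function `splitSections` and the heuristic are assumptions, not the plugin's code.

```typescript
// Hypothetical illustration only: not the plugin's render_markdown implementation.
interface Section {
  heading: string; // heading text without the leading '#' characters ("" for preamble)
  level: number; // 1-6 for ATX headings, 0 for text before the first heading
  body: string; // trimmed text belonging to this section
  tokenEstimate: number; // rough token count for the body
}

// Assumption: roughly 4 characters per token, a common ballpark for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Split markdown into sections at ATX headings. For brevity this ignores
// fenced code blocks, so a '#' line inside a fence would start a new section.
function splitSections(markdown: string): Section[] {
  const sections: Section[] = [];
  let current = { heading: "", level: 0, lines: [] as string[] };

  const flush = (): void => {
    const body = current.lines.join("\n").trim();
    // Skip an empty preamble, but keep headings even if their body is empty.
    if (current.level > 0 || body.length > 0) {
      sections.push({
        heading: current.heading,
        level: current.level,
        body,
        tokenEstimate: estimateTokens(body),
      });
    }
  };

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line); // e.g. "## Usage"
    if (match) {
      flush();
      current = { heading: match[2].trim(), level: match[1].length, lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return sections;
}
```

A real tokenizer will give different counts. The four-characters ratio is only a ballpark for English prose.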
One skill that routes automatically:
When you mention a PDF or ask to extract text, Claude invokes `/pdf-to-text:extract-pdf`, which calls the MCP tools.
Once installed, just talk to Claude:
Or call the tools directly:
Use the extract_pdf tool on /path/to/document.pdf with format structured
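When Claude acts on a prompt like this, it invokes the tool through the Model Context Protocol. In MCP, a tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below builds that message in TypeScript. The argument names `path` and `format` are hypothetical. The real names come from the tool's input schema, which an MCP client can read with `tools/list`.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Rough equivalent of "Use the extract_pdf tool on /path/to/document.pdf with format structured".
// ASSUMPTION: "path" and "format" are hypothetical argument names; check the
// tool's input schema (via "tools/list") for the real ones.
const request = buildToolCall(1, "extract_pdf", {
  path: "/path/to/document.pdf",
  format: "structured",
});

console.log(JSON.stringify(request, null, 2));
```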
The extraction engine uses a 7-level fallback cascade to recover text from PDFs with broken Unicode mappings — the ones where Chrome's copy-paste gives you gibberish. Everything runs locally via WebAssembly. Your PDFs never leave your machine.
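The seven levels themselves are not documented here, but the general shape of a fallback cascade is simple. Try extraction strategies in a fixed order and keep the first result that passes a quality check. The TypeScript sketch below shows that pattern. The following are all assumptions for illustration, not the engine's actual logic:

- the strategy names;
- the `garbageRatio` heuristic, which measures the share of U+FFFD replacement characters and Private Use Area code points;
- the 5% threshold.

```typescript
// Hypothetical sketch of an ordered fallback cascade. The engine's real seven
// levels are not documented, so the strategies below are placeholders.
interface Strategy {
  name: string;
  run: (pdf: Uint8Array) => string | null; // null means this level extracted nothing
}

// Assumed quality heuristic: share of UTF-16 code units that are U+FFFD
// (replacement character) or in the BMP Private Use Area (U+E000-U+F8FF),
// both typical symptoms of broken Unicode mappings. All checked characters
// are in the BMP, so per-code-unit counting is sufficient here.
function garbageRatio(text: string): number {
  if (text.length === 0) return 1; // treat empty output as a failure
  let bad = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 0xfffd || (code >= 0xe000 && code <= 0xf8ff)) bad++;
  }
  return bad / text.length;
}

// Try each level in order and return the first result that looks clean enough.
function runCascade(
  pdf: Uint8Array,
  strategies: Strategy[],
  maxGarbage = 0.05, // assumed threshold
): { level: string; text: string } | null {
  for (const strategy of strategies) {
    const text = strategy.run(pdf);
    if (text !== null && garbageRatio(text) <= maxGarbage) {
      return { level: strategy.name, text };
    }
  }
  return null; // every level failed
}

// Demo with simulated levels: the first returns Private Use Area garbage,
// the second returns clean text, so the cascade settles on the second.
const result = runCascade(new Uint8Array(0), [
  { name: "level-1-embedded-mapping", run: () => "\uE001\uE002\uE003" },
  { name: "level-2-fallback", run: () => "Hello, world" },
]);
console.log(result?.level); // prints level-2-fallback
```

Ordering the levels from cheapest to most expensive keeps the common case fast. The costlier recovery paths run only when earlier output fails the check.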
The same engine powers the PDF to Text Chrome extension. Install both for full coverage: the extension fixes copy-paste in the browser, this plugin gives agents the same capability.
Output formats:
- Basic markdown: `# Page N` headers per page. Quick and simple.
- Structured markdown: `<!-- page N -->` markers. For RAG pipelines and LLM consumption.

The plugin checks for new versions automatically. When an update is available, Claude will ask before upgrading. The engine binary updates independently of the plugin code.
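For RAG ingestion, the structured format's `<!-- page N -->` markers make page-level chunking straightforward. Below is a minimal TypeScript sketch. It assumes that each marker precedes its page's content. The helper `splitByPageMarkers` is hypothetical and not part of the plugin.

```typescript
// Hypothetical helper for RAG chunking: split structured output on the
// "<!-- page N -->" markers. ASSUMPTION: each marker precedes its page's content.
interface PageChunk {
  page: number;
  text: string;
}

function splitByPageMarkers(structured: string): PageChunk[] {
  const marker = /<!--\s*page\s+(\d+)\s*-->/g; // fresh regex per call, so lastIndex starts at 0
  const chunks: PageChunk[] = [];
  let match: RegExpExecArray | null;
  let currentPage: number | null = null;
  let start = 0;

  while ((match = marker.exec(structured)) !== null) {
    // Close out the previous page; text before the first marker is skipped.
    if (currentPage !== null) {
      chunks.push({ page: currentPage, text: structured.slice(start, match.index).trim() });
    }
    currentPage = Number(match[1]);
    start = marker.lastIndex; // content begins right after the marker
  }
  if (currentPage !== null) {
    chunks.push({ page: currentPage, text: structured.slice(start).trim() });
  }
  return chunks;
}
```

Anything before the first marker is skipped here. If your pipeline needs that text, such as a table of contents placed ahead of the first page, keep it as its own chunk.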
All extraction happens locally. No PDF data is sent to any server. The only network request is checking for engine updates from GitHub Releases.
Plugin wrapper: MIT. Extraction engine: proprietary.
200+ iOS/Swift/Xcode skills with phase-aware routing. Full lifecycle: ideation, design, develop, test, deploy, iterate.
Navigate markdown, PDF, and YAML documentation structure without reading full files. Bundles a /docmap skill that teaches Claude when and how to use the docmap CLI.
Combined code and documentation mapping for efficient LLM context
`npx claudepluginhub jordancoin/pdf-to-text --plugin pdf-to-text`
Parse PDF / Office / image files into clean Markdown via MinerU: zero-dependency, AI-Native, auto-routing between the free Agent API and the token-gated Standard API, with 15 content-tool delivery sinks.
Convert any file, URL, or media to clean Markdown — PDF, EPUB, HTML, images, YouTube, audio, video, and more
PDF data extraction with multi-backend support (markitdown, docling, pdfplumber, etc.)
Transform academic PDFs into structured literature notes and critical-thinking canvases for Obsidian
Anthropic's production PDF skill — extract text and tables, fill forms and generate PDFs. Essential for financial statements, contracts and scanned invoices.
Extract text as structured, semantic Markdown from a PDF.