Skill

pdf-ocr

Extract text from PDF files using Apple Vision OCR, optimized for Apple Silicon

npx claudepluginhub varunr89/claude-marketplace --plugin ocr-toolkit

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ocr-toolkit:pdf-ocr

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Converts PDF pages to images using PyMuPDF, then runs Apple Vision OCR on each page in parallel. Produces Markdown, plain text, or JSONL output.

SKILL.md

59 lines · ~539 tokens

Stats

Parent stars: 0

Maintenance: Good

Last commit: Mar 7, 2026

PDF OCR

Converts PDF pages to images using PyMuPDF, then runs Apple Vision OCR on each page in parallel. Produces Markdown, plain text, or JSONL output.

When to use

Use this skill when the user wants to extract text from a PDF file -- especially scanned PDFs, image-based PDFs, or PDFs where copy-paste produces garbled text.

Usage

python3 ${CLAUDE_PLUGIN_ROOT}/scripts/pdf_ocr.py <input.pdf> \
  [-o output_file] \
  [-f markdown|text|jsonl] \
  [--dpi 200] \
  [--workers N] \
  [--languages en-US] \
  [--fast] \
  [--keep-images] [--images-dir <dir>] \
  [--stdout]
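
When the script is driven from other Python code rather than typed by hand, the command line above can be assembled as an argument list. The following is a minimal sketch; `build_ocr_command` is a hypothetical helper, not part of the plugin, and it only uses flags listed in the usage line. Falling back to the current directory when `CLAUDE_PLUGIN_ROOT` is unset is an assumption made for illustration.

```python
import os
import shlex


def build_ocr_command(pdf_path, output=None, fmt="markdown", dpi=200, to_stdout=False):
    """Build the argv list for pdf_ocr.py (hypothetical helper, for illustration)."""
    # Assumption: fall back to "." when CLAUDE_PLUGIN_ROOT is not set.
    root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    cmd = [
        "python3",
        os.path.join(root, "scripts", "pdf_ocr.py"),
        pdf_path,
        "-f", fmt,
        "--dpi", str(dpi),
    ]
    if output:
        cmd += ["-o", output]
    if to_stdout:
        cmd.append("--stdout")
    return cmd


cmd = build_ocr_command("scan.pdf", fmt="jsonl", dpi=300, to_stdout=True)
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, so paths containing spaces need no quoting.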

Key arguments

Argument	Default	Description
`pdf` (positional)	required	Input PDF file path
`-o, --output`	`<pdf_name>.md`	Output file path
`-f, --format`	`markdown`	Output format: markdown, text, or jsonl
`--dpi`	200	Resolution for rendering PDF pages (higher = better quality, slower)
`--workers`	CPU count	Number of parallel OCR workers
`--languages`	`en-US`	Comma-separated recognition languages
`--fast`	false	Use faster, less accurate recognition
`--keep-images`	false	Keep the extracted page images after OCR
`--images-dir`	temp dir	Directory to save page images (requires --keep-images)
`--stdout`	false	Print extracted text to stdout instead of writing to file
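
The table above maps naturally onto an `argparse` definition. The sketch below is an illustration of that mapping, not the plugin's actual source: the function names `build_parser` and `parse_args` are hypothetical, and three behaviors are assumptions (rejecting `--images-dir` without `--keep-images` with an error, splitting `--languages` on commas with whitespace trimmed, and deriving the default output path by swapping the PDF extension for `.md`).

```python
import argparse
import os


def build_parser():
    """Argument parser mirroring the documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="OCR a PDF with Apple Vision (sketch)")
    parser.add_argument("pdf", help="Input PDF file path")
    parser.add_argument("-o", "--output", default=None, help="Output file path")
    parser.add_argument("-f", "--format", choices=["markdown", "text", "jsonl"],
                        default="markdown", help="Output format")
    parser.add_argument("--dpi", type=int, default=200,
                        help="Resolution for rendering PDF pages")
    parser.add_argument("--workers", type=int, default=os.cpu_count(),
                        help="Number of parallel OCR workers")
    parser.add_argument("--languages", default="en-US",
                        help="Comma-separated recognition languages")
    parser.add_argument("--fast", action="store_true",
                        help="Use faster, less accurate recognition")
    parser.add_argument("--keep-images", action="store_true",
                        help="Keep the extracted page images after OCR")
    parser.add_argument("--images-dir", default=None,
                        help="Directory to save page images")
    parser.add_argument("--stdout", action="store_true",
                        help="Print extracted text to stdout")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: the documented "requires --keep-images" is enforced as an error.
    if args.images_dir and not args.keep_images:
        parser.error("--images-dir requires --keep-images")
    # Assumption: languages are split on commas and stripped of whitespace.
    args.languages = [lang.strip() for lang in args.languages.split(",") if lang.strip()]
    # Assumption: the default output is the PDF path with its extension replaced by .md.
    if args.output is None:
        args.output = os.path.splitext(args.pdf)[0] + ".md"
    return args


args = parse_args(["scan.pdf", "--languages", "en-US, fr-FR", "--dpi", "300"])
print(args.languages, args.dpi, args.output)
```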

Pipeline

1. PDF pages are rendered to PNG images at the specified DPI using PyMuPDF (fitz)
2. Apple Vision OCR runs in parallel across pages using ThreadPoolExecutor
3. Results are assembled in page order and written to the chosen output format
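
The three steps above can be sketched as follows. To keep the example runnable on any platform, rendering and OCR are replaced by stand-ins: `render_pages` and `ocr_page` are hypothetical placeholders for the PyMuPDF and Apple Vision calls, and `run_pipeline` is an illustrative name rather than the script's real entry point. What the sketch does show accurately is how `ThreadPoolExecutor.map` returns results in input order even when workers finish out of order, which is one straightforward way to keep pages in sequence.

```python
from concurrent.futures import ThreadPoolExecutor


def render_pages(page_count):
    """Stand-in for PyMuPDF rendering: returns one fake PNG path per page."""
    return [f"page_{n:04d}.png" for n in range(1, page_count + 1)]


def ocr_page(image_path):
    """Stand-in for running Apple Vision OCR on a single page image."""
    return f"text from {image_path}"


def run_pipeline(page_count, workers=4):
    # Step 1: render every page to an image.
    images = render_pages(page_count)
    # Step 2: OCR pages in parallel; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(ocr_page, images))
    # Step 3: assemble results with 1-based page numbers.
    return [{"page": n, "text": text} for n, text in enumerate(texts, start=1)]


results = run_pipeline(3)
for item in results:
    print(item["page"], item["text"])
```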

Output formats

markdown -- Each page becomes a ## Page N section with the OCR text
text -- Plain text with === Page N === separators
jsonl -- One JSON object per line with page, text, and backend fields
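
As an illustration of the three formats, the following sketch serializes a list of page results. The function names are hypothetical, the exact blank-line spacing around headings and separators is an assumption, and the `backend` value `"apple-vision"` is a placeholder, since the source names the field but not its contents.

```python
import json


def to_markdown(pages):
    """Each page becomes a '## Page N' section followed by its text."""
    return "\n\n".join(f"## Page {p['page']}\n\n{p['text']}" for p in pages) + "\n"


def to_text(pages):
    """Plain text with '=== Page N ===' separators."""
    return "\n\n".join(f"=== Page {p['page']} ===\n{p['text']}" for p in pages) + "\n"


def to_jsonl(pages, backend="apple-vision"):
    """One JSON object per line; the backend value here is a placeholder."""
    lines = (
        json.dumps({"page": p["page"], "text": p["text"], "backend": backend},
                   ensure_ascii=False)
        for p in pages
    )
    return "".join(line + "\n" for line in lines)


pages = [{"page": 1, "text": "Hello"}, {"page": 2, "text": "World"}]
print(to_markdown(pages))
print(to_text(pages))
print(to_jsonl(pages))
```

JSONL is the most convenient of the three for downstream tooling, because each line can be parsed independently with `json.loads`.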

Dependencies

PyMuPDF (pip install pymupdf) -- PDF to image conversion
macOS + PyObjC (pip install pyobjc-core pyobjc-framework-Vision pyobjc-framework-Cocoa) -- Vision OCR
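
A quick preflight check can report missing pieces before any OCR is attempted. This is a minimal sketch and not part of the plugin: `missing_dependencies` is a hypothetical helper, and it assumes the usual import names for these packages (`fitz` for PyMuPDF, `objc` for pyobjc-core, `Vision` and `Cocoa` for the two framework bindings).

```python
import importlib.util
import platform

# Assumed import name -> pip package that provides it.
REQUIRED_MODULES = {
    "fitz": "pymupdf",
    "objc": "pyobjc-core",
    "Vision": "pyobjc-framework-Vision",
    "Cocoa": "pyobjc-framework-Cocoa",
}


def missing_dependencies():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if platform.system() != "Darwin":
        problems.append("macOS is required for Apple Vision OCR")
    for module, package in REQUIRED_MODULES.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing module '{module}' (pip install {package})")
    return problems


problems = missing_dependencies()
for problem in problems:
    print(problem)
```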
