Skill

image-ocr

Extract text from a directory of JPG/JPEG images into a single Markdown file using Apple Vision, tesseract, or MLX OCR.

npx claudepluginhub varunr89/claude-marketplace --plugin ocr-toolkit

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ocr-toolkit:image-ocr

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Batch-processes a directory of JPG/JPEG images and produces a combined Markdown file containing the OCR text from each image.

SKILL.md

59 lines · ~616 tokens

Similar Skills

document-processing

This skill should be used when the user says "process documents", "extract text from PDF", "OCR this document", "convert PDF to markdown", "extract emails from documents", "parse document", "document conversion", "batch OCR", "extract structured data from PDF", "read PDF", "extract tables from PDF", "convert Word document", "convert docx to markdown", or wants to extract, convert, or process documents and scanned images.

2 files

project

mm-cli-skill

Uses mm CLI to index multimodal directories, explore contents, find files by type/size, extract PDF text/images/video keyframes, search across files, count tokens, view trees.

vlmrun-skills

pdf-extractor

Extracts text and structured data from single or batch PDFs using 9 backends with automatic fallback, OCR for scanned docs, and markdown output via CLI.

1 file

pdf-extractor

Stats

Parent stars: 0

Maintenance: Good

Last Commit: Mar 7, 2026

Image OCR

Batch-processes a directory of JPG/JPEG images and produces a combined Markdown file containing the OCR text from each image.

When to use

Use this skill when the user wants to extract text from one or more JPG/JPEG images -- for example scanned documents, photos of whiteboards, or screenshots.

Backends

vision (default on macOS/Apple Silicon) -- Uses Apple's Vision framework via PyObjC. Fast, on-device, GPU-accelerated. Requires pyobjc-core and pyobjc-framework-Vision.
tesseract -- Uses the tesseract CLI binary. Cross-platform. Optional preprocessing (grayscale, autocontrast, upscale) via Pillow.
mlx_ocr -- Uses MLX-based OCR on Apple Silicon. Runs in a subprocess to isolate potential Metal crashes. Requires Python 3.10+ and mlx-ocr.
auto -- Tries vision first, falls back to tesseract.
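
The fallback order described for auto can be sketched as a small resolver. This is only an illustration, not the script's actual code: the function name `choose_backend` and the idea of passing availability checks in as callables are assumptions made for the example, and the real script may instead fall back only after a runtime failure.

```python
from typing import Callable, Dict


def choose_backend(requested: str, available: Dict[str, Callable[[], bool]]) -> str:
    """Resolve 'auto' to a concrete backend; pass explicit choices through.

    `available` maps a backend name to a zero-argument check that returns
    True when that backend can be used (hypothetical interface).
    """
    if requested != "auto":
        # An explicit backend (vision, tesseract, mlx_ocr) is used as given.
        return requested
    # Documented order for auto: vision first, then tesseract.
    for name in ("vision", "tesseract"):
        check = available.get(name)
        if check is not None and check():
            return name
    raise RuntimeError("no OCR backend available for 'auto'")
```

Keeping the checks injectable means the ordering can be exercised on any platform, including machines without macOS or tesseract installed.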

Usage

python3 ${CLAUDE_PLUGIN_ROOT}/scripts/ocr_jpgs_to_markdown.py \
  --input <directory-of-jpgs> \
  --output <output.md> \
  [--backend auto|vision|tesseract|mlx_ocr] \
  [--start N] [--limit N] \
  [--languages en-US] \
  [--fast] \
  [--workers N] \
  [--tesseract-lang eng] [--tesseract-psm 6] [--tesseract-oem 1] \
  [--preprocess] [--preprocess-scale 2.0] \
  [--det-lang eng] [--rec-lang lat] \
  [--mlx-worker-python /path/to/python3.11]
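
When the script is driven from Python rather than a shell, the same command line can be assembled as an argument list. The sketch below uses only flags shown in the usage above; the helper name `build_ocr_command` and its `plugin_root` parameter are hypothetical, and falling back to the current directory when `CLAUDE_PLUGIN_ROOT` is unset is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import List, Optional


def build_ocr_command(
    input_dir: str,
    output_md: str,
    backend: str = "auto",
    limit: Optional[int] = None,
    plugin_root: Optional[str] = None,
) -> List[str]:
    """Build the argv list for ocr_jpgs_to_markdown.py (hypothetical helper)."""
    # Assumption: fall back to "." when CLAUDE_PLUGIN_ROOT is not set.
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = Path(root) / "scripts" / "ocr_jpgs_to_markdown.py"
    cmd = [
        sys.executable, str(script),
        "--input", input_dir,
        "--output", output_md,
        "--backend", backend,
    ]
    if limit is not None:
        # --limit caps the number of files processed.
        cmd += ["--limit", str(limit)]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would then run the script. Using a list instead of a shell string avoids quoting problems with directory names that contain spaces.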

Key arguments

Argument	Default	Description
`--input`	`ezgif-...`	Directory containing .jpg/.jpeg files
`--output`	`training_plan_ocr.md`	Output Markdown file path
`--backend`	`auto`	OCR backend: auto, vision, tesseract, mlx_ocr
`--start`	0	Start index (0-based) within sorted file list
`--limit`	None	Max files to process
`--languages`	`en-US`	Comma-separated recognition languages (Vision)
`--fast`	false	Use faster, less accurate recognition level (Vision)
`--workers`	CPU count	Number of parallel workers
`--preprocess`	false	Grayscale/autocontrast/upscale before tesseract
`--preprocess-scale`	2.0	Upscale factor for preprocessing
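
How --start and --limit interact with the sorted file list can be illustrated with a short sketch. The function name `select_images` is hypothetical, and two details are assumptions rather than documented behaviour: that sorting is plain lexicographic, and that the extension match is case-insensitive.

```python
from typing import Iterable, List, Optional


def select_images(
    names: Iterable[str], start: int = 0, limit: Optional[int] = None
) -> List[str]:
    """Pick the slice of .jpg/.jpeg names that --start/--limit would cover."""
    # Assumption: extensions are matched case-insensitively.
    images = sorted(n for n in names if n.lower().endswith((".jpg", ".jpeg")))
    # --start is a 0-based index into the sorted list; --limit caps the count.
    end = None if limit is None else start + limit
    return images[start:end]


names = ["b.jpg", "a.jpeg", "notes.txt", "c.JPG", "d.jpg"]
print(select_images(names, start=1, limit=2))
```

Under these assumptions, a long batch can be resumed by rerunning with --start set to the number of files already processed.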

Dependencies

macOS + PyObjC for Vision backend (pip install pyobjc-core pyobjc-framework-Vision)
tesseract binary for tesseract backend (brew install tesseract)
mlx-ocr + Python 3.10+ for MLX backend
Pillow for tesseract preprocessing and for the MLX backend
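
Before choosing a backend, it can help to check which of these dependencies are present. The following is a minimal sketch using only the standard library; the function name `probe_dependencies` and the report keys are hypothetical. The import name of mlx-ocr is not stated in this document, so the sketch checks only the Python version requirement for that backend.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def probe_dependencies() -> Dict[str, bool]:
    """Report which OCR dependencies appear to be installed."""
    return {
        # pyobjc-framework-Vision exposes the `Vision` module, macOS only.
        "vision": sys.platform == "darwin"
        and importlib.util.find_spec("Vision") is not None,
        # The tesseract backend needs the CLI binary on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        # Pillow installs under the import name `PIL`.
        "pillow": importlib.util.find_spec("PIL") is not None,
        # The MLX backend requires Python 3.10 or newer.
        "python_3_10_plus": sys.version_info >= (3, 10),
    }


for name, ok in probe_dependencies().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

The probe only locates modules and binaries without importing or running them, so it is safe to call on machines where a backend would fail to load.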
