By dnvriend
A CLI tool that loads images, sends them with a prompt to Ollama to invoke DeepSeek-OCR, and instructs the model to return each image as markdown.
```bash
npx claudepluginhub dnvriend/ollama-deepseek-ocr-tool --plugin ollama-deepseek-ocr-tool
```
A CLI tool for batch OCR processing of document images using DeepSeek-OCR via Ollama.
Convert sequences of textbook pages, lecture slides, or scanned documents into a single, coherent markdown file suitable for note-taking applications like Obsidian.
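Under the hood, each page amounts to one request to the local Ollama server. The following is a minimal sketch, not the tool's actual implementation. It assumes Ollama's default endpoint `http://localhost:11434/api/generate` and its non-streaming JSON API, which accepts base64-encoded images in an `images` list. The prompt text and the names `build_payload` and `ocr_image` are hypothetical.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-ocr"
# Hypothetical prompt; the tool's real prompt wording is not documented here.
PROMPT = "Convert this document image to markdown."


def build_payload(image_bytes: bytes, prompt: str = PROMPT, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request body carrying one base64 image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def ocr_image(path: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send one image file to Ollama and return the model's markdown text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With stream=False, the generated text is returned in the "response" field.
    return body["response"]
```

Calling `ocr_image("IMG_4170.png")` requires `ollama serve` to be running and the model to be pulled. The 300-second timeout is an arbitrary allowance for slow, CPU-only inference.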
Key Features:
```bash
# 1. Install Ollama
brew install ollama
# 2. Start Ollama service
ollama serve
# 3. Pull DeepSeek-OCR model (~6GB download)
ollama pull deepseek-ocr
```
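After pulling the model, `ollama list` should show `deepseek-ocr`. To check the same thing programmatically, a small sketch can query Ollama's `/api/tags` endpoint (assumed at the default `http://localhost:11434`), which lists pulled models as `name:tag`. The helper names `model_available` and `fetch_tags` are hypothetical and not part of the tool.

```python
import json
import urllib.request

# Ollama's default endpoint listing locally pulled models.
TAGS_URL = "http://localhost:11434/api/tags"


def model_available(tags: dict, model: str = "deepseek-ocr") -> bool:
    """Return True if `model` appears in an /api/tags response, with or without a tag."""
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        # Pulled models are listed as "name:tag", e.g. "deepseek-ocr:latest".
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def fetch_tags(url: str = TAGS_URL, timeout: float = 5.0) -> dict:
    """Query a running Ollama server; raises URLError if `ollama serve` is not up."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

`model_available(fetch_tags())` returns `True` once the pull has finished. If `ollama serve` is not running, `fetch_tags` raises `urllib.error.URLError` instead.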
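```bash
cd ollama-deepseek-ocr-tool
# Sync project dependencies into a local virtual environment
uv sync
# Install the CLI so ollama-deepseek-ocr-tool is on your PATH
uv tool install .
```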
```bash
# Basic: Process all PNG files in current directory
ollama-deepseek-ocr-tool "*.png" output.md
# Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md
# Process lecture slides from subdirectory
ollama-deepseek-ocr-tool "lectures/week-5/*.jpg" week-5-summary.md
# Process numbered scans in order
ollama-deepseek-ocr-tool "scan-00*.png" document.md
```
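The patterns are passed in quotes, which keeps the shell from expanding them and suggests the tool expands them itself. Page order matters for a coherent document: a plain lexicographic sort puts `IMG_10.png` before `IMG_9.png`. The sketch below shows one way to expand a pattern with natural (numeric-aware) ordering. Whether the tool sorts this way or lexicographically is an assumption, and `natural_key` and `expand_pattern` are hypothetical names.

```python
import glob
import re


def natural_key(path: str) -> list:
    """Split runs of digits out of a path so they compare as integers.

    re.split with a capturing group alternates text and digit runs, so text
    sits at even indices and digits at odd ones; comparisons never mix types.
    """
    parts = re.split(r"(\d+)", path)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def expand_pattern(pattern: str) -> list[str]:
    """Expand a glob pattern in Python and return matches in natural order."""
    return sorted(glob.glob(pattern), key=natural_key)
```

For zero-padded names such as `scan-001.png`, plain `sorted()` gives the same order, so padding page numbers is a simple way to avoid surprises whichever sort the tool uses.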
```bash
# INFO level - High-level operations
ollama-deepseek-ocr-tool "*.png" output.md -v
# DEBUG level - Detailed processing info (file sizes, word counts)
ollama-deepseek-ocr-tool "*.png" output.md -vv
# TRACE level - Full HTTP request/response logs
ollama-deepseek-ocr-tool "*.png" output.md -vvv
```
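The repeated `-v` flag turns a count into a log level. A minimal sketch of that mapping follows, assuming the tool uses Python's standard `logging` and `argparse` (the README does not say which CLI library it uses) and that no `-v` means `WARNING`. Python's `logging` has no built-in TRACE level, so the sketch registers one at 5, below DEBUG. The parser and `level_for_verbosity` are hypothetical.

```python
import argparse
import logging

# Python's logging module has no TRACE level; register one below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def level_for_verbosity(count: int) -> int:
    """Map the number of -v flags to a logging level (assumed mapping)."""
    if count <= 0:
        return logging.WARNING  # assumed default when no -v is given
    if count == 1:
        return logging.INFO
    if count == 2:
        return logging.DEBUG
    return TRACE


# Hypothetical parser: action="count" turns -v, -vv, -vvv into 1, 2, 3.
parser = argparse.ArgumentParser(prog="ollama-deepseek-ocr-tool")
parser.add_argument("-v", action="count", default=0)
```

With this sketch, `logging.basicConfig(level=level_for_verbosity(parser.parse_args().v))` would configure logging once at startup.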
```bash
# Show full help with examples and troubleshooting
ollama-deepseek-ocr-tool --help
```
Output format:

```markdown
<!-- Source: IMG_4170.png -->
[extracted text from page 1]
---
<!-- Source: IMG_4171.png -->
[extracted text from page 2]
```
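The sample output shows each page's text preceded by an HTML comment naming its source image, with pages separated by `---`. A minimal sketch of that assembly step follows, assuming the per-page markdown is already available. The names `format_page` and `combine_pages` are hypothetical, and the exact whitespace the tool emits is an assumption.

```python
import os


def format_page(source_path: str, markdown: str) -> str:
    """Prefix one page's markdown with an HTML comment naming its source image."""
    name = os.path.basename(source_path)
    return f"<!-- Source: {name} -->\n{markdown.strip()}\n"


def combine_pages(pages: list[tuple[str, str]]) -> str:
    """Join (path, markdown) pairs with horizontal rules between pages.

    The blank line before each '---' matters: directly under a text line,
    '---' would turn that line into a setext heading instead of a rule.
    """
    return "\n---\n".join(format_page(path, md) for path, md in pages)
```

Keeping only the base name in the comment means the output does not leak local directory paths into the notes.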
Development:

```bash
# Install dependencies
make install
# Run linting
make lint
# Format code
make format
# Type check
make typecheck
# Security checks
make security
# Full pipeline
make pipeline
```
See ARCHITECTURE.md for detailed documentation on:
License: MIT
Built with assistance from AI coding tools and reviewed by humans.