ollama-deepseek-ocr-tool

A CLI tool for batch OCR processing of document images using DeepSeek-OCR via Ollama.

Overview

Convert sequences of textbook pages, lecture slides, or scanned documents into a single, coherent markdown file suitable for note-taking applications like Obsidian.

Key Features:

⚡ Fast - ~3s per image on an M4 MacBook (can be faster than cloud OCR services)
🔒 Private - Runs entirely on your machine via Ollama
💰 Free - No API keys, rate limits, or costs
📝 Clean Output - Markdown tables, headings, and lists
🔄 Sequential Processing - Natural sorting maintains document order
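
Natural sorting matters because plain string order puts IMG_10.png before IMG_2.png. A minimal Python sketch of the idea, assuming a hypothetical helper named natural_key (the tool's actual implementation may differ):

```python
import re


def natural_key(name: str) -> list:
    """Split a filename into text and integer chunks so numbers compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


files = ["IMG_10.png", "IMG_2.png", "IMG_1.png"]
print(sorted(files))                   # ['IMG_1.png', 'IMG_10.png', 'IMG_2.png']
print(sorted(files, key=natural_key))  # ['IMG_1.png', 'IMG_2.png', 'IMG_10.png']
```

With this key, pages photographed in sequence stay in sequence even when the filenames are not zero-padded.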

Installation

Prerequisites

# 1. Install Ollama
brew install ollama

# 2. Start the Ollama service (runs in the foreground; use a second terminal for step 3)
ollama serve

# 3. Pull DeepSeek-OCR model (~6GB download)
ollama pull deepseek-ocr
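
With the model pulled, a single OCR request against the local Ollama server could look roughly like the sketch below. It uses Ollama's /api/generate endpoint on the default port 11434. The prompt text and the helper names build_payload and ocr_image are assumptions for illustration, not the tool's actual code.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(image_bytes: bytes,
                  prompt: str = "Convert the document to markdown.") -> dict:
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": "deepseek-ocr",
        "prompt": prompt,  # assumed prompt; the tool's real prompt may differ
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_image(path: str) -> str:
    """Send one image to the local Ollama server and return the model's text."""
    with open(path, "rb") as fh:
        payload = build_payload(fh.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling ocr_image("IMG_4170.png") requires `ollama serve` to be running. Nothing leaves the machine, because the request goes to localhost.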

Install Tool

cd ollama-deepseek-ocr-tool
uv sync
uv tool install .

Usage

Quick Start

# Basic: Process all PNG files in current directory (keep the pattern quoted so the shell passes it through unexpanded)
ollama-deepseek-ocr-tool "*.png" output.md

Common Use Cases

# Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md

# Process lecture slides from subdirectory
ollama-deepseek-ocr-tool "lectures/week-5/*.jpg" week-5-summary.md

# Process numbered scans in order
ollama-deepseek-ocr-tool "scan-00*.png" document.md

Verbose Logging

# INFO level - High-level operations
ollama-deepseek-ocr-tool "*.png" output.md -v

# DEBUG level - Detailed processing info (file sizes, word counts)
ollama-deepseek-ocr-tool "*.png" output.md -vv

# TRACE level - Full HTTP request/response logs
ollama-deepseek-ocr-tool "*.png" output.md -vvv
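
One way a -v count could map to log levels, sketched in Python. The TRACE level, its numeric value, the WARNING default, and the level_for helper are all assumptions; the standard logging module has no built-in TRACE level.

```python
import logging

TRACE = 5  # hypothetical custom level below DEBUG (which is 10)
logging.addLevelName(TRACE, "TRACE")


def level_for(verbosity: int) -> int:
    """Map a -v count to a logging level, clamping anything above -vvv to TRACE."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG, TRACE]
    return levels[min(max(verbosity, 0), len(levels) - 1)]


# Equivalent of running with -vv: DEBUG messages become visible.
logging.basicConfig(level=level_for(2))
logging.getLogger(__name__).debug("file sizes and word counts would be logged here")
```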

Get Help

# Show full help with examples and troubleshooting
ollama-deepseek-ocr-tool --help

What It Can Do

Text & Formatting

✅ Body text with markdown formatting
✅ Headings (H1, H2, H3)
✅ Lists (bulleted, numbered)
✅ Multi-column layouts

Tables

✅ Converts to clean markdown tables
✅ Preserves headers and structure
✅ Handles merged cells

Diagrams & Figures

✅ Extracts text labels from diagrams
✅ Captures figure captions
❌ Does not describe visual content
❌ Does not capture flow/arrows

Output Format

<!-- Source: IMG_4170.png -->

[extracted text from page 1]

---

<!-- Source: IMG_4171.png -->

[extracted text from page 2]
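
The layout above can be produced by joining per-page results with a source comment and a horizontal rule. A minimal sketch, assuming a hypothetical assemble helper that receives (filename, text) pairs already in page order:

```python
def assemble(pages: list[tuple[str, str]]) -> str:
    """Join (filename, text) pairs: a source comment per page, a rule between pages."""
    sections = [f"<!-- Source: {name} -->\n\n{text.strip()}" for name, text in pages]
    return "\n\n---\n\n".join(sections) + "\n"


print(assemble([
    ("IMG_4170.png", "Page one text"),
    ("IMG_4171.png", "Page two text"),
]))
```

The HTML comments stay invisible in rendered markdown, so each passage can still be traced to its source image without cluttering the notes.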

Performance

Speed: ~3 seconds per image (M4 MacBook)
Memory: ~6GB (DeepSeek-OCR model)
Throughput: ~20 images per minute
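
The throughput figure follows from the per-image speed: at about 3 seconds per image, 60 / 3 = 20 images per minute. A small sketch for estimating batch time, assuming the ~3 s figure holds on your hardware (estimate_minutes is a hypothetical helper):

```python
def estimate_minutes(image_count: int, seconds_per_image: float = 3.0) -> float:
    """Rough wall-clock estimate for a batch processed one image at a time."""
    return image_count * seconds_per_image / 60


print(estimate_minutes(20))   # 1.0
print(estimate_minutes(300))  # 15.0
```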

Development

# Install dependencies
make install

# Run linting
make lint

# Format code
make format

# Type check
make typecheck

# Security checks
make security

# Full pipeline
make pipeline

Architecture

See ARCHITECTURE.md for detailed documentation on:

System components and module structure
Ollama integration details
DeepSeek-OCR capabilities and limitations
Performance benchmarks and design decisions

License

MIT

Credits

Built with assistance from AI coding tools and reviewed by humans.
