From paideia
Transcribes hand-written or scanned answer PDFs to markdown. Supports three OCR tiers: Claude native vision, local Ollama Qwen3-VL, and pytesseract fallback. Engine selectable via .course-meta or per-call override.
Install: `npx claudepluginhub taewooopark/paideia --plugin paideia`
Trigger: slash command `/paideia:vision-ocr`
When to use:

- `/grade` needs to convert `answers/*.pdf` → `answers/converted/*.md`
- The answer-processing skill's step-2 conversion

Engine selection: `.course-meta` holds a single line `OCR_ENGINE: <engine>`, written by `/paideia:init-course`. The grade command reads it and dispatches to the matching engine. Users can override it per call with `/paideia:grade --ocr=<engine> [path]`.
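The dispatch order (per-call `--ocr` override first, then `.course-meta`, then the default) can be sketched as below. This is a minimal sketch, assuming `.course-meta` stores one `KEY: value` pair per line (the document mentions both `OCR_ENGINE` and `INTERFACE_LANG`); `read_course_meta`, `resolve_engine` and the `ValueError` on unknown names are hypothetical, not the plugin's actual code.

```python
from pathlib import Path

VALID_ENGINES = {"claude", "ollama", "tesseract"}
DEFAULT_ENGINE = "claude"  # the table marks claude as the default


def read_course_meta(path):
    """Parse 'KEY: value' lines from a .course-meta file into a dict (assumed format)."""
    meta = {}
    p = Path(path)
    if not p.exists():
        return meta
    for line in p.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            meta[key.strip()] = value.strip()
    return meta


def resolve_engine(meta, override=None):
    """--ocr override wins, then OCR_ENGINE from .course-meta, then the default."""
    engine = (override or meta.get("OCR_ENGINE") or DEFAULT_ENGINE).strip().lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown OCR engine: {engine!r}")
    return engine
```

For example, `resolve_engine(read_course_meta(".course-meta"), override="tesseract")` returns `"tesseract"` regardless of what the file says.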
| Engine | Default? | How it runs | When to pick it |
|---|---|---|---|
| `claude` | Yes | `pdftoppm` → Claude reads each PNG via the Read tool → synthesizes markdown inline. No external model, no subprocess. | The out-of-the-box path; nothing to install. Highest fidelity on messy handwriting, because Claude vision handles mixed-script (English/Korean) prose with LaTeX well. |
| `ollama` | opt-in | `python3 ${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py --engine=ollama <pdf> <md>`: local Qwen3-VL 8B, with an automatic tesseract fallback if ollama is unreachable. Reads `INTERFACE_LANG` from `.course-meta` to set the prose-language rule. | You want the PDF never to leave the machine and don't want to spend Claude tokens on OCR. Requires a one-time `ollama pull qwen3-vl:8b` (~6 GB). |
| `tesseract` | opt-in | `python3 ${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py --engine=tesseract <pdf> <md>`: pytesseract (`eng` for `en`, `eng+kor` for `ko`, derived from `.course-meta`). | Zero cloud and no GPU/VRAM budget. Lowest fidelity on handwriting; fine for typed scans. |
All three emit `answers/converted/<stem>.md` with a `<!-- SOURCE: ... -->` / `<!-- TIER: ... -->` header comment that lets `/grade` caveat its confidence.
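On the `/grade` side, reading those header comments back could look like the sketch below. It assumes each comment sits on its own line in the `<!-- KEY: value -->` form shown in the pipelines; `read_ocr_header` and the "caveat anything from the tesseract tier" policy in `needs_caveat` are hypothetical illustrations, not the plugin's grading logic.

```python
import re

# Matches header lines such as <!-- SOURCE: hw1.pdf, ... --> and <!-- TIER: tesseract fallback -->.
HEADER_RE = re.compile(r"<!--\s*(SOURCE|TIER):\s*(.*?)\s*-->")


def read_ocr_header(markdown):
    """Return {'SOURCE': ..., 'TIER': ...} for whichever header comments are present."""
    return {key: value for key, value in HEADER_RE.findall(markdown)}


def needs_caveat(header):
    """Hypothetical policy: flag output produced by the tesseract tier (fallback or explicit)."""
    return "tesseract" in header.get("TIER", "")
```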
Tier 0 (claude), the default:

Pipeline (driven by the `/grade` command, not by `vision_ocr.py`):

```
answers/<stem>.pdf
  ↓ pdftoppm -r 200 -png <pdf> <tmpdir>/page      # rasterize to one PNG per page
  ↓ Claude reads <tmpdir>/page-1.png, page-2.png, ... via the Read tool
  ↓ Claude synthesizes clean MD following the prompt contract below
answers/converted/<stem>.md
  └── header: <!-- SOURCE: <stem>.pdf, claude-vision (native), N pages -->
```
The grade command handles the orchestration — rasterize, Read each page, synthesize into one markdown file in a single pass. No standalone driver script is required.
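The rasterize step and the page ordering Claude should follow can be sketched in Python. This is a sketch, assuming pdftoppm's `<prefix>-<N>.png` naming; pdftoppm may zero-pad `N` in longer documents (`page-01.png`), so the helper sorts on the integer, which is correct either way. `rasterize_cmd`, `page_images` and `rasterize` are hypothetical helper names.

```python
import re
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, outdir, dpi=200):
    """argv for poppler's pdftoppm: one PNG per page, written as <outdir>/page-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(Path(outdir) / "page")]


def page_images(outdir):
    """Return rendered pages in page order.

    A plain string sort of unpadded names would put page-10 before page-2,
    so sort on the parsed page number instead.
    """
    numbered = []
    for path in Path(outdir).glob("page-*.png"):
        match = re.fullmatch(r"page-(\d+)\.png", path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def rasterize(pdf, outdir, dpi=200):
    """Run pdftoppm (raises CalledProcessError on failure) and list the pages."""
    subprocess.run(rasterize_cmd(pdf, outdir, dpi), check=True)
    return page_images(outdir)
```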
Tier 1 (ollama):

```
answers/<stem>.pdf
  ↓ pdf2image @ 300dpi
  ↓ resize to ≤1200px wide (VLMs dislike huge inputs)
  ↓ base64 JPEG per page
  ↓ [Tier 1a] ollama qwen3-vl:8b
  ↓     (on timeout / ollama down)
  ↓ [Tier 1b] pytesseract (eng or eng+kor, from .course-meta INTERFACE_LANG)   ← auto-fallback inside the same script
answers/converted/<stem>.md
  └── header: <!-- SOURCE: <stem>.pdf, qwen3-vl:8b @ 300dpi, N pages -->
              <!-- TIER: tesseract fallback -->   (only when 1a fails)
```
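Two pieces of that diagram, the width cap and the fallback-plus-header logic, can be sketched with the OCR calls passed in as plain functions. This is a sketch under assumptions: it treats the fallback as whole-document (a per-page fallback would be equally plausible), and `fit_width`, `transcribe`, `ollama_ocr` and `tesseract_ocr` are hypothetical names, not `vision_ocr.py`'s API.

```python
def fit_width(width, height, max_width=1200):
    """Scale (width, height) down so width <= max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def transcribe(pdf_name, pages, ollama_ocr, tesseract_ocr, dpi=300):
    """Try Tier 1a on every page; if ollama is down or times out, redo the file with Tier 1b."""
    try:
        body = [ollama_ocr(page) for page in pages]
        tier_line = None
    except OSError:  # URLError, ConnectionError and TimeoutError are all OSError subclasses
        body = [tesseract_ocr(page) for page in pages]
        tier_line = "<!-- TIER: tesseract fallback -->"
    header = [f"<!-- SOURCE: {pdf_name}, qwen3-vl:8b @ {dpi}dpi, {len(pages)} pages -->"]
    if tier_line:
        header.append(tier_line)
    return "\n".join(header) + "\n\n" + "\n\n".join(body) + "\n"
```

A US-letter page rendered at 300dpi (2550×3300) comes out as 1200×1553 under `fit_width`.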
Entrypoint:

```bash
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py" --engine=ollama <input.pdf> <output.md>
```
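A minimal sketch of how a script like this could call ollama with only the standard library, sending JSON in a POST body and keeping the model resident with `keep_alive`. It assumes ollama's default address `localhost:11434` and its documented `/api/generate` fields (`model`, `prompt`, `images`, `stream`, `keep_alive`); the timeouts and helper names are hypothetical, not taken from `vision_ocr.py`.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local address
MODEL = "qwen3-vl:8b"


def generate_request(prompt, images_b64=(), model=MODEL, keep_alive="15m"):
    """Build a non-streaming POST request for ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False, "keep_alive": keep_alive}
    if images_b64:
        payload["images"] = list(images_b64)  # base64-encoded page images
    # The JSON travels in the request body, so large pages never hit ARG_MAX.
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warmup(timeout=300):
    """Load the model before page 1; ollama loads a model when given an empty prompt."""
    with urllib.request.urlopen(generate_request(""), timeout=timeout) as resp:
        resp.read()


def ocr_page(jpeg_bytes, prompt, timeout=180):
    """Send one page and return the generated text from the "response" field."""
    image_b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    with urllib.request.urlopen(generate_request(prompt, [image_b64]), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```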
`${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py` is the single source of truth. It:

- Sends a warmup request to `/api/generate` so the first real page isn't stalled by model load.
- Sets the prose-language rule in the prompt (from `.course-meta` `INTERFACE_LANG`).
- Passes `keep_alive: "15m"` so the model stays in memory across pages within a session.

Tier 2 (tesseract), explicit:

```bash
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py" --engine=tesseract <input.pdf> <output.md>
```

Skips ollama entirely. Header: `<!-- TIER: tesseract (explicit) -->`.
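The explicit tesseract path reduces to a language mapping plus one pytesseract call per page. This is a sketch, assuming `INTERFACE_LANG` holds `en` or `ko`; falling back to `eng` for any other value is a hypothetical choice, and this version writes only the `TIER` line. `convert_from_path` (pdf2image) and `image_to_string` (pytesseract) are the libraries' real entry points, imported lazily so the mapping works without the OCR stack installed.

```python
def tesseract_lang(interface_lang):
    """Map INTERFACE_LANG to tesseract's language string: en -> eng, ko -> eng+kor."""
    key = (interface_lang or "en").strip().lower()
    return {"en": "eng", "ko": "eng+kor"}.get(key, "eng")  # unknown -> eng (assumption)


def tesseract_markdown(pdf_path, interface_lang="en", dpi=300):
    """OCR every page with pytesseract and prepend the explicit-tier header."""
    from pdf2image import convert_from_path  # needs poppler's pdftoppm on PATH
    import pytesseract  # needs the tesseract binary (+ kor traineddata for Korean)

    lang = tesseract_lang(interface_lang)
    pages = convert_from_path(pdf_path, dpi=dpi)
    text = [pytesseract.image_to_string(page, lang=lang).strip() for page in pages]
    return "<!-- TIER: tesseract (explicit) -->\n\n" + "\n\n".join(text) + "\n"
```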
Prompt contract. Whether synthesized by Claude inline (Tier 0) or by Qwen3-VL through this script (Tier 1), the transcription prompt must:

- …
- Write math as `$...$` / `$$...$$`
- Mark ambiguous glyphs as `[?]` instead of guessing
- Return only markdown: no `<think>`, no commentary

If you edit the prompt, keep these six clauses; they are what separates useful transcription from hallucination.
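Because a `<think>` leak is listed as a Tier 1 symptom, a defensive post-processing step can enforce the "markdown only" clause even when the model ignores it. This is a hypothetical sketch (`clean_transcription` and `looks_empty` are illustrative names), assuming a leaked reasoning block is delimited by `<think>` / `</think>`; it does not touch `[?]` markers or LaTeX.

```python
def clean_transcription(text):
    """Return only the markdown the model produced, minus any leaked reasoning."""
    if "</think>" in text:
        # Keep what follows the last closing tag (also covers a stray </think> with no opener).
        text = text.rsplit("</think>", 1)[1]
    if "<think>" in text:
        # An unclosed <think> most likely means output was cut off mid-reasoning; drop the rest.
        text = text.split("<think>", 1)[0]
    return text.strip()


def looks_empty(text):
    """True when nothing usable is left, i.e. the 'empty response' symptom."""
    return clean_transcription(text) == ""
```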
All engines need:

- poppler binaries (`pdftoppm`, used by `pdf2image`): `brew install poppler` / `apt-get install poppler-utils`.

Tier 0 (claude) extras: nothing beyond Claude Code itself.

Tier 1 (ollama) extras:

- `ollama` CLI + model `qwen3-vl:8b` (~6.1 GB): `brew install ollama`, then `ollama serve &`, then `ollama pull qwen3-vl:8b`.
- Python: `pdf2image`, `pytesseract`, `pillow`.

Tier 2 (tesseract) extras:

- `tesseract` + `tesseract-lang` (or `tesseract-ocr-kor` on Debian).
- Python: `pdf2image`, `pytesseract`, `pillow`.

Troubleshooting:

| Symptom | Cause | Fix |
|---|---|---|
| Tier 0 produces garbage | Scan too dim / skewed / low-res | Re-scan at 300dpi with the page flat, re-run |
| Tier 1 times out on page 1 | First-load stall on a cold ollama | Re-run; the warmup + `keep_alive` should help on the second try |
| Tier 1 returns an empty response / `<think>...` leaks | Prompt contract violated | Re-check the prompt; add "Return ONLY markdown, no `<think>`" |
| Tier 1 base64 error / 413 | Image too large | Drop `MAX_IMG_WIDTH` from 1200 → 1000 |
| Tier 1 ollama 404 | `qwen3-vl:8b` not pulled | `ollama pull qwen3-vl:8b` |
| Tier 1 tesseract fallback keeps firing | ollama server not running | `ollama serve &` |
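The last two rows can be told apart before a run by asking ollama which models are pulled. This is a sketch, assuming the default address and ollama's `/api/tags` endpoint, whose JSON lists local models under `models[].name`; `fetch_tags` and `diagnose` are hypothetical helpers, not part of the plugin.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # ollama's list-local-models endpoint


def fetch_tags(timeout=5):
    """Return the parsed /api/tags payload, or None if the server can't be reached."""
    try:
        with urllib.request.urlopen(TAGS_URL, timeout=timeout) as resp:
            return json.loads(resp.read())
    except (OSError, ValueError):  # URLError/timeouts are OSErrors; ValueError covers bad JSON
        return None


def diagnose(tags, model="qwen3-vl:8b"):
    """Map the /api/tags result onto the matching fix from the table."""
    if tags is None:
        return "ollama server not running: ollama serve &"
    pulled = {m.get("name") for m in tags.get("models", [])}
    if model not in pulled:
        return f"model not pulled: ollama pull {model}"
    return "ok"
```

Usage: `print(diagnose(fetch_tags()))` before invoking `--engine=ollama`.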
- Avoid `curl -d <arg>` for page payloads: it overflows ARG_MAX. Use stdlib `urllib` with a POST body.
- Don't grade during OCR: grading is `/grade`'s job; OCR must stay pure transcription.
- … `/grade` to caveat its verdict.
- Called by `/grade` via the answer-processing skill, step 2.
- `/ingest` (future), for hand-written lecture notes, if any appear in `materials/`.
- Writes to `answers/converted/` only; does not modify the originals in `answers/`.