From open-science-skills
Guides setup and execution of VLM-based OCR pipelines for scanned historical and multilingual documents: model selection, image and DPI handling, prompt design, batch processing on HPC/SLURM, and accuracy evaluation with CER/WER.
Install: `npx claudepluginhub scdenney/open-science-skills --plugin open-science-skills`. This skill uses the workspace's default tool permissions.
- **Start from OCR benchmarks, not general VLM leaderboards.** OCRBench (Liu et al. 2024) tests across 29 document OCR dimensions; OCRBench v2 (Fu et al. 2025) extends to multilingual scripts and multi-page documents. General vision-language benchmarks (MMMU, VQAv2) do not predict OCR accuracy.
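The CER/WER metrics used for accuracy evaluation can be sketched with a plain edit-distance routine, no external libraries needed: CER is character-level Levenshtein distance divided by reference length, WER the same at word level.

```python
def levenshtein(a, b):
    """Edit distance between two sequences via classic dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits per reference character."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edits per reference word."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

For large corpora, a C-backed library such as `python-Levenshtein` or `jiwer` computes the same quantities faster, but the definitions are exactly these.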
Guides post-OCR text cleanup for research corpora using LLM-based correction, rule-based fixes, quality diagnostics, and multilingual handling. Covers strategy selection, prompt design, evaluation, and quality assurance.
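A minimal sketch of the rule-based side of that cleanup: an ordered list of regex substitutions applied in sequence. The specific rules here (long-s normalization for historical prints, rejoining line-break hyphenation, collapsing whitespace) are illustrative, not a complete rule set.

```python
import re

# Illustrative rules only; a real pipeline would load a corpus-specific list.
RULES = [
    (re.compile(r"ſ"), "s"),                 # long s in pre-1800 typography
    (re.compile(r"(\w)-\n(\w)"), r"\1\2"),   # rejoin words hyphenated at line breaks
    (re.compile(r"[ \t]{2,}"), " "),         # collapse runs of spaces/tabs
]

def clean_ocr(text: str) -> str:
    """Apply each substitution rule in order; order matters for chained fixes."""
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text
```

Keeping rules as data (rather than hard-coded branches) makes it easy to diff, version, and A/B-test them against a held-out ground-truth sample.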
Performs OCR on images, PDFs, and documents using DeepSeek-OCR vision model via vLLM or HuggingFace Transformers, supporting markdown output and optical compression.
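When DeepSeek-OCR is served through vLLM, requests can go over vLLM's OpenAI-compatible chat endpoint with the page image embedded as a base64 data URL. A hedged sketch of the request payload follows; the model id and prompt text are assumptions (check the served model name, e.g. via `GET /v1/models`, before use).

```python
import base64
from pathlib import Path

def build_ocr_request(image_path: str,
                      model: str = "deepseek-ai/DeepSeek-OCR") -> dict:
    """Build an OpenAI-style chat payload carrying one page image.

    Assumes a vLLM server with an OpenAI-compatible /v1/chat/completions
    endpoint; model id and prompt wording are placeholders.
    """
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page to Markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The dict can be POSTed with any HTTP client, or passed to the `openai` SDK pointed at the local server's base URL.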
Batch processes multiple PDFs with OCRmyPDF via shell loops, parallel processing, Docker setups, and CI/CD pipelines including GitHub Actions and GitLab CI.
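The shell-loop pattern for OCRmyPDF can equally be driven from Python, which makes the command construction testable. A sketch under the assumption that `ocrmypdf` is on `PATH`; `--skip-text` (leave pages that already contain a text layer untouched) is a real OCRmyPDF flag.

```python
import subprocess
from pathlib import Path

def build_cmd(pdf: Path, outdir: Path) -> list[str]:
    """Assemble one ocrmypdf invocation; output keeps the input filename."""
    return ["ocrmypdf", "--skip-text", str(pdf), str(outdir / pdf.name)]

def ocr_batch(indir: str, outdir: str) -> None:
    """OCR every *.pdf in indir sequentially; raises on the first failure."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(indir).glob("*.pdf")):
        subprocess.run(build_cmd(pdf, out), check=True)
```

For parallel runs, the same `build_cmd` list feeds `concurrent.futures` workers or a GNU `parallel` invocation unchanged.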
- **Checkpoint intermediate results.** Write one output file (e.g., `results_raw.json`) per document so partial runs can resume without re-processing.