From oma
Convert PDF files to Markdown using opendataloader-pdf. Extracts text, tables, headings, lists, and images with correct reading order. Use for PDF parsing, PDF to Markdown conversion, document extraction, and AI-ready data preparation.
npx claudepluginhub first-fluke/oh-my-agent --plugin oma

This skill uses the workspace's default tool permissions.
Convert PDF files into structured Markdown or another requested extraction format while preserving readable document structure for LLM context, RAG, or downstream review.
- input_path: PDF file or folder path
- output_dir: optional target directory
- format: optional output format, default markdown
- ocr_languages: optional OCR language list for scanned or image-based PDFs
- extraction_options: optional flags for tagged structure, image extraction, or hybrid conversion

- uvx opendataloader-pdf for standard conversion
- uvx opendataloader-pdf-hybrid for OCR or hybrid conversion
- uvx mdformat for Markdown normalization

- uvx
- output_dir and the expected output filename.
- mdformat for Markdown output and inspect the result for readable structure.
- --use-struct-tree.

| Failure | Recovery |
|---|---|
| uvx unavailable | Ask user to install uv before conversion |
| Password-protected PDF | Ask for password or unlocked PDF |
| Garbled output | Retry with tagged structure or hybrid mode |
| Missing tables | Retry with hybrid mode for complex or borderless tables |
| OCR language mismatch | Retry with explicit OCR languages, for example ko,en |
| Large file or memory pressure | Split into page ranges or batch smaller inputs |
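
The recovery table above can be read as a lookup from an observed failure to the next step. The following is a minimal sketch in Python, not part of the skill itself: the failure keys, the `Recovery` type, the `recovery_for` helper, and the fallback for unknown failures are all hypothetical names and assumptions introduced for illustration. Only the recovery texts come from the table.

```python
from typing import NamedTuple


class Recovery(NamedTuple):
    needs_user: bool  # True when the agent has to ask the user before continuing
    action: str       # next step, copied from the Failure/Recovery table


# Hypothetical failure keys; the action strings mirror the table rows.
RECOVERIES = {
    "uvx_unavailable": Recovery(True, "Ask user to install uv before conversion"),
    "password_protected": Recovery(True, "Ask for password or unlocked PDF"),
    "garbled_output": Recovery(False, "Retry with tagged structure or hybrid mode"),
    "missing_tables": Recovery(False, "Retry with hybrid mode for complex or borderless tables"),
    "ocr_language_mismatch": Recovery(False, "Retry with explicit OCR languages, for example ko,en"),
    "memory_pressure": Recovery(False, "Split into page ranges or batch smaller inputs"),
}

# Assumption: failures not listed in the table are escalated to the user.
UNKNOWN = Recovery(True, "Report the error and ask the user how to proceed")


def recovery_for(failure: str) -> Recovery:
    """Return the recovery step for a failure key, escalating unknown failures."""
    return RECOVERIES.get(failure, UNKNOWN)
```

Separating "ask the user" rows from "retry" rows makes it explicit which failures an agent may handle on its own and which ones block on user input.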

| Action | SSL primitive | Evidence |
|---|---|---|
| Validate path and options | VALIDATE | Input preflight in execution protocol |
| Probe text layer | READ | Text preview extraction |
| Choose conversion strategy | SELECT | Standard, tagged, or hybrid mode decision |
| Run converter | CALL_TOOL | uvx opendataloader-pdf |
| Start OCR server | CALL_TOOL | uvx opendataloader-pdf-hybrid |
| Write output artifact | WRITE | Markdown, text, JSON, or HTML output |
| Normalize Markdown | CALL_TOOL | uvx mdformat |
| Inspect extraction quality | VALIDATE | Structure/readability verification |
| Report result | NOTIFY | Final user-facing summary |
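
The first three rows of the table (validate, probe, select) could be sketched roughly as below. This is an illustration under stated assumptions, not the skill's actual protocol: `validate_input`, `choose_strategy`, the `min_chars` threshold of 50, and the way `text_preview` and `is_tagged` are obtained are all hypothetical. The source only says that a text preview is probed and that the choice is between standard, tagged, and hybrid mode.

```python
from pathlib import Path


def validate_input(input_path: str) -> list[Path]:
    """Return the PDF files to convert, raising ValueError if the path is unusable."""
    path = Path(input_path)
    if path.is_dir():
        pdfs = sorted(p for p in path.iterdir() if p.suffix.lower() == ".pdf")
        if not pdfs:
            raise ValueError(f"no PDF files found in {path}")
        return pdfs
    if path.is_file() and path.suffix.lower() == ".pdf":
        return [path]
    raise ValueError(f"not a PDF file or folder: {path}")


def choose_strategy(text_preview: str, is_tagged: bool, min_chars: int = 50) -> str:
    """Pick 'hybrid', 'tagged', or 'standard' from a text-layer probe.

    Assumption: a preview shorter than min_chars means the PDF is scanned
    or image-based, so OCR through hybrid mode is needed.
    """
    if len(text_preview.strip()) < min_chars:
        return "hybrid"
    if is_tagged:
        return "tagged"
    return "standard"
```

For example, `choose_strategy("", is_tagged=False)` selects hybrid mode because the probe found no text layer.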

- opendataloader-pdf: primary PDF extraction CLI
- opendataloader-pdf-hybrid: hybrid OCR and complex extraction path
- mdformat: Markdown normalization
- file, wc, or pdfinfo may be used for preflight when available

uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}"
uvx mdformat "{output_path}"
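
When these two commands are driven from a script rather than typed by hand, passing them as argument lists avoids shell-quoting problems with paths that contain spaces. A minimal sketch, assuming hypothetical helper names (`standard_commands`, `run_standard`) and assuming the caller already knows `output_path`, since the source does not say how the output filename is derived:

```python
import shutil
import subprocess


def standard_commands(input_path: str, output_dir: str, output_path: str) -> list[list[str]]:
    """Build the conversion and normalization commands as argv lists."""
    return [
        ["uvx", "opendataloader-pdf", input_path,
         "--format", "markdown", "--output-dir", output_dir],
        ["uvx", "mdformat", output_path],
    ]


def run_standard(input_path: str, output_dir: str, output_path: str) -> None:
    """Run conversion, then Markdown normalization, stopping at the first failure."""
    if shutil.which("uvx") is None:
        # Matches the "uvx unavailable" recovery: the user has to install uv first.
        raise RuntimeError("uvx unavailable: install uv before conversion")
    for argv in standard_commands(input_path, output_dir, output_path):
        subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit
```

Because `check=True` raises on a non-zero exit code, `mdformat` only runs when the conversion step succeeded.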
For scanned/image-based PDFs, start OCR first and then convert through hybrid mode:
uvx opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "{languages}"
uvx opendataloader-pdf --hybrid docling-fast "{input_path}" --format markdown --output-dir "{output_dir}"
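
The hybrid path has an ordering constraint: the OCR server has to be running before the converter is invoked, and it should be stopped afterwards. A rough sketch of that lifecycle follows. The helper names and the fixed `startup_wait` delay are assumptions made for illustration; the source does not describe a readiness check, nor how the converter is pointed at a port other than 5002, so the port is left at the value shown above.

```python
import subprocess
import time

HYBRID_PORT = 5002  # port used in the commands above


def hybrid_server_command(languages: str) -> list[str]:
    """argv for the OCR server; languages is a comma-separated list such as 'ko,en'."""
    return ["uvx", "opendataloader-pdf-hybrid", "--port", str(HYBRID_PORT),
            "--force-ocr", "--ocr-lang", languages]


def hybrid_convert_command(input_path: str, output_dir: str) -> list[str]:
    """argv for converting through hybrid mode."""
    return ["uvx", "opendataloader-pdf", "--hybrid", "docling-fast", input_path,
            "--format", "markdown", "--output-dir", output_dir]


def run_hybrid(input_path: str, output_dir: str, languages: str,
               startup_wait: float = 10.0) -> None:
    """Start the OCR server, convert once, and always shut the server down."""
    server = subprocess.Popen(hybrid_server_command(languages))
    try:
        # Assumption: a fixed wait stands in for a real readiness check.
        time.sleep(startup_wait)
        subprocess.run(hybrid_convert_command(input_path, output_dir), check=True)
    finally:
        server.terminate()
        server.wait()
```

The `try`/`finally` ensures the server process is terminated even when the conversion fails, so a retry does not find the port already occupied.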
| Scope | Resource target |
|---|---|
| LOCAL_FS | Input PDFs and generated output files |
| PROCESS | uvx subprocesses and optional hybrid server |
| MEMORY | Extracted previews and validation notes |
| OTHER | OCR model/runtime behavior inside hybrid mode |

- uvx.
- resources/execution-protocol.md rather than duplicating every variant here.

- resources/execution-protocol.md
- config/pdf-config.yaml
- ../_shared/core/context-loading.md
- ../_shared/core/quality-principles.md