Marker Document Converter

Convert PDF, EPUB, PPTX, DOCX, XLSX, HTML, and image files to clean Markdown, JSON, or HTML using the marker-pdf tool, optionally enhanced by a multimodal LLM.

Prerequisites

# Install marker-pdf with full document support
uv tool install "marker-pdf[full]"

Requires Python 3.10+ and PyTorch.

Basic Usage

marker_single "<file_path>" \
  --output_format markdown \
  --output_dir "<output_directory>" \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_model_name claude-haiku-4-5 \
  --claude_api_key "$ANTHROPIC_API_KEY" \
  --disable_image_extraction

Note: --disable_image_extraction skips image extraction, so the output contains text only. Remove this flag if images need to be preserved.
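
Before running the command above, a few cheap checks can catch the most common failures: the CLI is not installed, the input path is wrong, or the API key is missing. The sketch below is a convenience under stated assumptions, not part of marker; the helper names require_cmd, require_file, and require_env are hypothetical.

```shell
# Hypothetical pre-flight helpers (not part of marker). Written for POSIX sh.

# require_cmd NAME: succeed only if NAME is a command available on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}

# require_file PATH: succeed only if PATH is an existing regular file.
require_file() {
  [ -f "$1" ] || { echo "input file not found: $1" >&2; return 1; }
}

# require_env NAME: succeed only if the variable NAME is set and non-empty.
require_env() {
  eval "_val=\${$1:-}"
  [ -n "$_val" ] || { echo "environment variable not set: $1" >&2; return 1; }
}
```

One possible use is to chain the checks ahead of the conversion: require_cmd marker_single && require_file "<file_path>" && require_env ANTHROPIC_API_KEY.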

Output Formats

Format	Description	Use Case
`markdown`	Formatted text with tables, LaTeX equations ($$-fenced), code blocks, image links	General document conversion
`html`	Semantic HTML with `<img>`, `<math>`, `<pre>` tags	Web display
`json`	Hierarchical structure with block types, bounding boxes, section hierarchy	Programmatic processing
`chunks`	Flattened JSON optimized for RAG	Vector database ingestion

CLI Options

Core Options

--output_format: markdown (default), html, json, chunks
--output_dir: Directory for output files
--page_range: Specific pages, e.g., "0,5-10,20"
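
To make the --page_range syntax concrete, the sketch below validates a spec and expands it into the pages it names. It assumes that ranges are inclusive at both ends and that no whitespace is allowed inside the spec; both function names are hypothetical and marker does its own parsing.

```shell
# valid_page_range SPEC: succeed if SPEC looks like "0,5-10,20".
# Assumption: no whitespace is allowed inside the spec.
valid_page_range() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}

# expand_page_range SPEC: print the page numbers the spec covers, space-separated.
# Assumption: ranges are inclusive at both ends.
expand_page_range() {
  valid_page_range "$1" || { echo "invalid page range: $1" >&2; return 1; }
  _old_ifs=$IFS
  IFS=,
  set -- $1            # split the spec on commas into positional parameters
  IFS=$_old_ifs
  _out=
  for _part in "$@"; do
    case $_part in
      *-*)
        _i=${_part%-*}       # text before the dash: first page
        _end=${_part#*-}     # text after the dash: last page
        while [ "$_i" -le "$_end" ]; do
          _out="$_out $_i"
          _i=$((_i + 1))
        done
        ;;
      *) _out="$_out $_part" ;;
    esac
  done
  echo "${_out# }"
}
```

Under those assumptions, expand_page_range "0,5-10,20" prints 0 5 6 7 8 9 10 20.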

LLM Enhancement

--use_llm: Enable LLM for improved accuracy (tables, forms, math, handwriting)
--llm_service: LLM service class (see LLM Services below)
--block_correction_prompt: Custom prompt for output refinement

OCR & Processing

--force_ocr: Force OCR on the entire document; also converts inline math to LaTeX
--strip_existing_ocr: Remove existing OCR and re-process
--redo_inline_math: Highest quality inline math conversion (use with --use_llm)

Image & Output Control

--disable_image_extraction: Skip image extraction (plain text only)
--paginate_output: Add page separators to output
--extract_images: Enable image extraction (default: true)

Advanced

--config_json: Load configuration from JSON file
--debug: Enable diagnostic logging
--force_layout_block: Force layout type, e.g., Table
--converter_cls: Custom converter class

LLM Services

Claude (Recommended)

marker_single document.pdf \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_api_key "$ANTHROPIC_API_KEY" \
  --claude_model_name claude-haiku-4-5

OpenAI

marker_single document.pdf \
  --use_llm \
  --llm_service marker.services.openai.OpenAIService \
  --openai_api_key "$OPENAI_API_KEY" \
  --openai_model gpt-4o

Ollama (Local)

marker_single document.pdf \
  --use_llm \
  --llm_service marker.services.ollama.OllamaService \
  --ollama_base_url "http://localhost:11434" \
  --ollama_model llama3.2-vision

Google Gemini (Default if no service specified)

export GOOGLE_API_KEY="your-api-key"
marker_single document.pdf --use_llm

Examples

Convert PDF to Markdown (Plain Text)

marker_single "./docs/report.pdf" \
  --output_format markdown \
  --output_dir "./docs/" \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_model_name claude-haiku-4-5 \
  --claude_api_key "$ANTHROPIC_API_KEY" \
  --disable_image_extraction

Convert with Images Preserved

marker_single "./docs/report.pdf" \
  --output_format markdown \
  --output_dir "./docs/" \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_model_name claude-haiku-4-5 \
  --claude_api_key "$ANTHROPIC_API_KEY"

Extract Tables Only

marker_single "./docs/spreadsheet.pdf" \
  --use_llm \
  --force_layout_block Table \
  --converter_cls marker.converters.table.TableConverter \
  --output_format json

Batch Convert Multiple Files

marker /path/to/input/folder --workers 4
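
Before a batch run it can help to preview which files in the folder have an extension matching the formats named at the top of this guide. The sketch below is only a rough filter: the image extensions are an assumption, both function names are hypothetical, and how the marker command itself selects files is not covered here.

```shell
# is_supported PATH: succeed if the file extension matches a format named in this guide.
# Assumption: png, jpg, jpeg, gif, bmp, and tiff stand in for "image files".
is_supported() {
  _name=${1##*/}                      # strip any leading directories
  case $_name in
    *.*) ;;                           # has an extension, keep going
    *) return 1 ;;                    # no extension at all
  esac
  _ext=$(printf '%s' "${_name##*.}" | tr '[:upper:]' '[:lower:]')
  case $_ext in
    pdf|epub|pptx|docx|xlsx|html|png|jpg|jpeg|gif|bmp|tiff) return 0 ;;
    *) return 1 ;;
  esac
}

# list_supported DIR: print each supported file directly inside DIR, one per line.
list_supported() {
  for _f in "$1"/*; do
    [ -f "$_f" ] || continue
    if is_supported "$_f"; then
      printf '%s\n' "$_f"
    fi
  done
}
```

Running list_supported /path/to/input/folder before the batch command gives a quick picture of what is likely to be picked up.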

Using JSON Config File

cat > config.json << EOF
{
  "force_ocr": true,
  "use_llm": true,
  "output_format": "markdown",
  "disable_image_extraction": true,
  "strip_existing_ocr": true,
  "redo_inline_math": true
}
EOF

marker_single document.pdf --config_json config.json
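
A malformed config file is easy to produce by hand (a trailing comma is enough), so it may be worth checking that the file parses before passing it to --config_json. The sketch below assumes python3 is on PATH, which the Python 3.10+ prerequisite suggests but does not guarantee; valid_json is a hypothetical helper name.

```shell
# valid_json FILE: succeed if FILE parses as JSON.
# Assumption: python3 is available on PATH.
valid_json() {
  python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$1" 2>/dev/null
}
```

For example: valid_json config.json && marker_single document.pdf --config_json config.json. This only checks syntax; it does not check that the keys are options marker recognizes.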

Output Structure

Markdown Output

Image links: ![](image_name.png)
Tables: Formatted as markdown tables
Equations: Fenced with $$...$$
Code: Fenced with ```language
Headings: # for sections

JSON Output

{
  "pages": [
    {
      "id": "page_0",
      "polygon": [[x1,y1], [x2,y2], ...],
      "children": [
        {
          "id": "block_0",
          "block_type": "Text|Table|Image|...",
          "html": "<p>content</p>",
          "polygon": [...],
          "section_hierarchy": {...}
        }
      ]
    }
  ],
  "metadata": {
    "table_of_contents": [...],
    "page_stats": [...]
  }
}
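
As an illustration of programmatic processing, the sketch below pulls the HTML of every Table block out of a JSON output file. It assumes jq is installed and that blocks sit at pages[].children[] exactly as in the sample above; more deeply nested children, if present, are not visited. table_html is a hypothetical helper name.

```shell
# table_html FILE: print the "html" field of every Table block in a JSON output file.
# Assumptions: jq is installed; blocks live at .pages[].children[] as in the sample above.
table_html() {
  jq -r '.pages[].children[]? | select(.block_type == "Table") | .html' "$1"
}
```

Changing "Table" to another block type selects those blocks instead.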

Instructions

Confirm that the input file path exists
Determine the output directory (default: the same directory as the input file)
Use the AskUserQuestion tool to ask for the user's preferences (ask both questions together):

Question 1 - Image Extraction:
- Header: "Images"
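- Question: "Do you need to extract the images in the document?"
- Options:
  - "No (Recommended)": Extract text only and generate a plain Markdown file
  - "Yes": Extract and save the images, with image links included in the Markdown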
Question 2 - LLM Service:
- Header: "LLM"
- Question: "Which LLM should be used to recognize image and table content?"
- Options:
  - "Claude Haiku (Recommended)": Fast and economical; requires ANTHROPIC_API_KEY
  - "Claude Sonnet": Higher quality; requires ANTHROPIC_API_KEY
  - "GPT-4o": OpenAI model; requires OPENAI_API_KEY
  - "Ollama (Local)": Runs locally; no API key required
Based on the user's answers, construct the command:
- If "No" for images: add --disable_image_extraction
- Set LLM service parameters according to selection:
  - Claude Haiku: --llm_service marker.services.claude.ClaudeService --claude_model_name claude-haiku-4-5 --claude_api_key $ANTHROPIC_API_KEY
  - Claude Sonnet: --llm_service marker.services.claude.ClaudeService --claude_model_name claude-sonnet-4-20250514 --claude_api_key $ANTHROPIC_API_KEY
  - GPT-4o: --llm_service marker.services.openai.OpenAIService --openai_api_key $OPENAI_API_KEY --openai_model gpt-4o
  - Ollama: --llm_service marker.services.ollama.OllamaService --ollama_base_url "http://localhost:11434" --ollama_model llama3.2-vision
Run the marker_single command with the chosen options
Report the output file location and any extraction notes
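
The mapping from answers to flags described above can be sketched as one shell function. This is an illustration under assumptions, not a fixed implementation: the short choice names (haiku, sonnet, gpt4o, ollama), the function name run_marker, and the MARKER_BIN override for dry runs are all hypothetical, and the output directory is taken to be the input file's directory.

```shell
# run_marker FILE IMAGES LLM: build and run the command from the two answers.
#   IMAGES: yes | no
#   LLM:    haiku | sonnet | gpt4o | ollama   (hypothetical short names)
# MARKER_BIN is a hypothetical override for dry runs, e.g. MARKER_BIN=echo.
run_marker() {
  _file=$1
  _images=$2
  _llm=$3
  # Start the argument list; output goes next to the input file.
  set -- "$_file" --output_format markdown --output_dir "$(dirname "$_file")" --use_llm
  if [ "$_images" = no ]; then
    set -- "$@" --disable_image_extraction
  fi
  case $_llm in
    haiku)
      set -- "$@" --llm_service marker.services.claude.ClaudeService \
        --claude_model_name claude-haiku-4-5 --claude_api_key "${ANTHROPIC_API_KEY:-}" ;;
    sonnet)
      set -- "$@" --llm_service marker.services.claude.ClaudeService \
        --claude_model_name claude-sonnet-4-20250514 --claude_api_key "${ANTHROPIC_API_KEY:-}" ;;
    gpt4o)
      set -- "$@" --llm_service marker.services.openai.OpenAIService \
        --openai_api_key "${OPENAI_API_KEY:-}" --openai_model gpt-4o ;;
    ollama)
      set -- "$@" --llm_service marker.services.ollama.OllamaService \
        --ollama_base_url "http://localhost:11434" --ollama_model llama3.2-vision ;;
    *)
      echo "unknown LLM choice: $_llm" >&2
      return 64 ;;
  esac
  "${MARKER_BIN:-marker_single}" "$@"
}
```

Setting MARKER_BIN=echo prints the assembled command instead of running it, which is a cheap way to review the flags before spending API calls.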
