CLI for converting files to Markdown using Microsoft's markitdown. Use when converting PDF, DOCX, PPTX, XLSX, HTML, images, audio, or other file formats to Markdown for text analysis or LLM processing. Triggered by requests involving file-to-markdown conversion, document text extraction, or preparing files for LLM input.
npx claudepluginhub fprochazka/claude-code-plugins --plugin markitdown
Slash command: /markitdown:markitdown
Command-line tool for converting various file formats to Markdown. Built by Microsoft for LLM and text analysis pipelines.
markitdown document.pdf # Convert, output to stdout
markitdown document.pdf -o output.md # Convert, write to file
cat document.pdf | markitdown # Convert from stdin
markitdown document.pdf > output.md # Redirect stdout to file
markitdown < document.pdf # Redirect stdin
Supported without any extras: HTML, CSV, JSON, XML, plain text, ZIP (iterates over the archive contents), EPUB, Jupyter notebooks (.ipynb), RSS feeds, Wikipedia URLs, Bing search result pages.
With extras installed: PDF, DOCX, PPTX, XLSX, XLS, Outlook MSG, images (EXIF metadata + optional LLM captioning), audio (EXIF metadata + optional transcription), YouTube URLs (transcript extraction).
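The extras above are optional dependency groups of the markitdown Python package. As a rough sketch, the helper below maps a file name to the group it most likely needs. The function name `extra_for` is hypothetical, and the group names (`pdf`, `docx`, `pptx`, `xlsx`, `xls`, `outlook`) are assumed to match the installed release, so check them against the package metadata before relying on them.

```shell
# Hypothetical helper: print the optional dependency group a file is likely
# to need, or an empty line when no extra appears to be required.
# Assumption: group names match those published by the markitdown package.
extra_for() {
  case "$1" in
    *.pdf)  echo pdf ;;
    *.docx) echo docx ;;
    *.pptx) echo pptx ;;
    *.xlsx) echo xlsx ;;
    *.xls)  echo xls ;;
    *.msg)  echo outlook ;;
    *)      echo "" ;;   # built-in converter, or a format not covered here
  esac
}
```

With that in place, `pip install "markitdown[$(extra_for report.pdf)]"` would install only the PDF dependencies, while `pip install 'markitdown[all]'` pulls in every group at once. The match is case-sensitive, so `REPORT.PDF` falls through to the empty case.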
markitdown -v # Show version
markitdown -o FILE # Write output to file instead of stdout
markitdown -x .pdf # Hint file extension (useful with stdin)
markitdown -m application/pdf # Hint MIME type
markitdown -c UTF-8 # Hint charset
markitdown --keep-data-uris # Keep base64-encoded data URIs in output (truncated by default)
For higher-quality extraction, use Azure Document Intelligence (cloud OCR): pass -d to enable it and -e to supply the service endpoint:
markitdown -d -e "https://your-endpoint.cognitiveservices.azure.com" document.pdf
Third-party plugins extend format support by registering under the markitdown.plugin entry point group. Plugins are disabled unless -p is passed:
markitdown --list-plugins # List installed plugins
markitdown -p document.xyz # Enable plugins for conversion
markitdown report.pdf | wc -w # Word count
markitdown report.pdf | head -50 # Preview first 50 lines
for f in *.pdf; do markitdown "$f" -o "${f%.pdf}.md"; done
curl -sL "https://example.com/doc.pdf" | markitdown -x .pdf
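The single-directory loop above can be extended to several formats and nested folders. The following is a minimal sketch, not a definitive script: `md_target` and `convert_tree` are hypothetical names, the extension list is an arbitrary choice, and file names containing newlines are not handled.

```shell
# Derive the Markdown output path for an input file: docs/report.pdf -> docs/report.md
md_target() {
  printf '%s.md\n' "${1%.*}"
}

# Convert every PDF, DOCX, and PPTX under a directory (default: current one),
# skipping inputs whose Markdown output already exists and reporting failures.
convert_tree() {
  find "${1:-.}" -type f \( -name '*.pdf' -o -name '*.docx' -o -name '*.pptx' \) |
  while IFS= read -r src; do
    out=$(md_target "$src")
    [ -e "$out" ] && continue
    markitdown "$src" -o "$out" || echo "failed: $src" >&2
  done
}
```

Running `convert_tree ./docs` writes each `.md` file next to its source. Skipping existing outputs makes the command safe to re-run after an interruption; delete a stale `.md` file to force reconversion.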
When piping content, provide an extension hint so markitdown can select the right converter:
cat unknown_file | markitdown -x .docx
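When the piped content comes from a URL, the hint can often be derived from the URL itself. The helper below is an illustrative sketch: `ext_hint` is a hypothetical name, and it assumes the last path segment carries a meaningful extension, which is not true of every download link.

```shell
# Hypothetical helper: derive an extension hint (e.g. ".pdf") from a URL or
# path, ignoring any query string or fragment. Prints nothing when the last
# path segment has no extension.
ext_hint() {
  name=${1%%[?#]*}   # drop ?query and #fragment
  name=${name##*/}   # keep only the last path segment
  case "$name" in
    *.*) printf '.%s\n' "${name##*.}" ;;
  esac
}
```

A typical use would be `url="https://example.com/doc.pdf?dl=1"; curl -sL "$url" | markitdown -x "$(ext_hint "$url")"`. If `ext_hint` prints nothing, fall back to an explicit `-x` or `-m` value. A bare host such as `https://example.com` would yield `.com`, so the result is only a guess.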