Skill

markitdown

CLI for converting files to Markdown using Microsoft's markitdown. Use when converting PDF, DOCX, PPTX, XLSX, HTML, images, audio, or other file formats to Markdown for text analysis or LLM processing. Triggered by requests involving file-to-markdown conversion, document text extraction, or preparing files for LLM input.

From markitdown
Install

Run in your terminal:

npx claudepluginhub fprochazka/claude-code-plugins --plugin markitdown

Tool Access

This skill is limited to using the following tools:

Bash(markitdown:*)

Skill Content

markitdown

Command-line tool for converting various file formats to Markdown. Built by Microsoft for LLM and text analysis pipelines.

Basic Usage

markitdown document.pdf                    # Convert, output to stdout
markitdown document.pdf -o output.md       # Convert, write to file
cat document.pdf | markitdown              # Convert from stdin
markitdown document.pdf > output.md        # Redirect stdout to file
markitdown < document.pdf                  # Redirect stdin

Supported Formats

Converted without any extras: HTML, CSV, JSON, XML, plain text, ZIP (iterates over contents), EPUB, Jupyter notebooks (.ipynb), RSS feeds, Wikipedia URLs, Bing search result pages.

With extras installed: PDF, DOCX, PPTX, XLSX, XLS, Outlook MSG, images (EXIF metadata + optional LLM captioning), audio (EXIF metadata + optional transcription), YouTube URLs (transcript extraction).
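
As a pre-flight check before converting, the two lists above can be turned into a small lookup. This is a minimal sketch, not part of markitdown: the function name format_tier is hypothetical, the mapping only covers file extensions named in the lists above (URLs, images, and audio are left out), and anything else is reported as unknown.

```shell
# Hypothetical helper: classify a file name by the two format lists above.
# Prints "builtin", "extras", or "unknown".
format_tier() {
    local ext
    # Take the text after the last dot and lower-case it.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        html|htm|csv|json|xml|txt|zip|epub|ipynb) echo builtin ;;
        pdf|docx|pptx|xlsx|xls|msg)               echo extras ;;
        *)                                        echo unknown ;;
    esac
}
```

For example, format_tier report.pdf prints extras, which signals that the conversion depends on the optional extras being installed.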

Options

markitdown -v                              # Show version
markitdown -o FILE                         # Write output to file instead of stdout
markitdown -x .pdf                         # Hint file extension (useful with stdin)
markitdown -m application/pdf              # Hint MIME type
markitdown -c UTF-8                        # Hint charset
markitdown --keep-data-uris                # Keep base64-encoded data URIs in output (truncated by default)

Azure Document Intelligence

For higher-quality extraction using cloud OCR, pass -d to enable Document Intelligence and -e to supply the endpoint:

markitdown -d -e "https://your-endpoint.cognitiveservices.azure.com" document.pdf

Plugins

Third-party plugins extend format support via the markitdown.plugin entry point:

markitdown --list-plugins                  # List installed plugins
markitdown -p document.xyz                 # Enable plugins for conversion

Common Patterns

Convert a file and pipe to another tool

markitdown report.pdf | wc -w                       # Word count
markitdown report.pdf | head -50                     # Preview first 50 lines

Batch convert multiple files

for f in *.pdf; do markitdown "$f" -o "${f%.pdf}.md"; done
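
The one-line loop above stops short of two cases: a directory with no PDFs (the unexpanded pattern *.pdf is passed to markitdown) and individual files that fail to convert. A minimal sketch that handles both, assuming a hypothetical function name convert_pdfs; it uses only the markitdown invocation shown above.

```shell
# Hypothetical helper: convert every PDF in a directory, keep going on
# failures, and return non-zero if any file could not be converted.
convert_pdfs() {
    local dir f failed
    dir=${1:-.}
    failed=0
    for f in "$dir"/*.pdf; do
        # If the glob matched nothing, "$f" is the literal pattern: skip it.
        [ -e "$f" ] || continue
        if ! markitdown "$f" -o "${f%.pdf}.md"; then
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Calling convert_pdfs ./reports writes a .md file next to each PDF in ./reports and lists any failures on stderr.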

Convert from URL (via curl)

curl -sL "https://example.com/doc.pdf" | markitdown -x .pdf
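
The pipe above feeds whatever curl receives into markitdown, including an HTTP error page. A minimal sketch of a more defensive variant, assuming a hypothetical function name convert_url: it downloads to a temporary file with curl's -f flag (fail on HTTP errors) and converts only when the download succeeds.

```shell
# Hypothetical helper: download a URL, then convert it with an extension hint.
# Usage: convert_url URL EXTENSION   (e.g. convert_url "$url" .pdf)
convert_url() {
    local url ext tmp rc
    url=$1
    ext=$2
    rc=0
    tmp=$(mktemp) || return 1
    if curl -fsSL "$url" -o "$tmp"; then
        # Feed the downloaded bytes on stdin, with the extension as a hint.
        markitdown -x "$ext" < "$tmp" || rc=$?
    else
        rc=$?
        echo "download failed: $url" >&2
    fi
    rm -f "$tmp"
    return "$rc"
}
```

For example, convert_url "https://example.com/doc.pdf" .pdf > doc.md writes the Markdown to doc.md, or returns curl's exit status if the download fails.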

Convert from stdin with an extension hint

When piping content, provide an extension hint so markitdown can select the right converter:

cat unknown_file | markitdown -x .docx
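
When the original file name or URL is known, the hint can be derived from it instead of being typed by hand. A minimal sketch, assuming a hypothetical function name ext_hint. It is deliberately naive: it looks only at the text after the last dot, so it cannot help with names that carry no extension.

```shell
# Hypothetical helper: print an extension hint (e.g. ".pdf") for a file
# name or URL, or return non-zero when no extension is present.
ext_hint() {
    local name
    name=${1%%\?*}      # drop a URL query string, if any
    name=${name##*/}    # drop the directory or URL path
    case "$name" in
        ?*.?*) printf '.%s\n' "${name##*.}" ;;
        *)     return 1 ;;
    esac
}
```

A possible use: ext=$(ext_hint "$name") && markitdown -x "$ext" < "$name"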

Stats
Parent Repo Stars: 3
Parent Repo Forks: 1
Last Commit: Feb 15, 2026