Batch convert documents (DOCX, PDF, XLSX, TXT, PPTX, MSG, DOC) to markdown, preserving tracked changes and comments.
Taxonomy note: This skill provides domain expertise (HOW) for batch document conversion to markdown. See [[TAXONOMY.md]] for the skill/workflow distinction.
Batch convert documents to markdown format, preserving tracked changes, comments, and other markup.
/convert-to-md [directory]
| Format | Method | Notes |
|---|---|---|
| DOCX | pandoc --track-changes=all | Preserves comments & tracked changes |
| PDF | PyMuPDF | Text extraction |
| XLSX | pandas | Converts to markdown tables |
| TXT | rename | Direct rename to .md |
| PPTX | pandoc | Slide content to markdown |
| MSG | extract-msg | Email metadata + body |
| DOC | textutil | macOS native (fallback) |
| DOTX | pandoc | Word templates |
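The table above amounts to a lookup from file extension to conversion method. A minimal sketch of that dispatch follows; the names `CONVERTERS`, `pick_method`, and `plan` are hypothetical and are not part of the skill itself.

```python
from pathlib import Path

# Hypothetical lookup table mirroring the formats table above.
CONVERTERS = {
    ".docx": "pandoc --track-changes=all",
    ".pdf": "PyMuPDF",
    ".xlsx": "pandas",
    ".txt": "rename",
    ".pptx": "pandoc",
    ".msg": "extract-msg",
    ".doc": "textutil",
    ".dotx": "pandoc",
}


def pick_method(path):
    """Return the conversion method for a file, or None if unsupported."""
    return CONVERTERS.get(Path(path).suffix.lower())


def plan(directory):
    """Group the files in a directory by conversion method (non-recursive)."""
    groups = {}
    for p in sorted(Path(directory).iterdir()):
        method = pick_method(p)
        if p.is_file() and method is not None:
            groups.setdefault(method, []).append(p.name)
    return groups
```

Calling `plan(".")` before converting gives a quick preview of which files would be handled and which would be left untouched.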
Install dependencies (if needed):
uv add pymupdf pandas openpyxl tabulate extract-msg
Convert DOCX (preserves comments/edits):
for f in *.docx; do
    pandoc --track-changes=all -f docx -t markdown -o "${f%.docx}.md" "$f" && rm "$f"
done
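The same pandoc call can be driven from Python, which makes deleting the source an explicit choice. This is a sketch rather than the skill's own implementation: `pandoc_command` and `convert_docx` are hypothetical helpers, and the flags are copied from the shell loop above.

```python
import subprocess
from pathlib import Path


def pandoc_command(src):
    """Build the pandoc argument list for one DOCX file."""
    src = Path(src)
    return [
        "pandoc", "--track-changes=all",
        "-f", "docx", "-t", "markdown",
        "-o", str(src.with_suffix(".md")), str(src),
    ]


def convert_docx(src, keep_source=True):
    """Run pandoc; delete the source only if asked and pandoc succeeded."""
    src = Path(src)
    # check=True raises CalledProcessError if pandoc exits non-zero,
    # so the unlink below is never reached after a failed conversion.
    subprocess.run(pandoc_command(src), check=True)
    if not keep_source:
        src.unlink()
    return src.with_suffix(".md")
```

Passing `keep_source=False` reproduces the `&& rm "$f"` behaviour of the shell loop; the default keeps the original DOCX.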
Convert PDF:
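import fitz  # PyMuPDF
from pathlib import Path

for pdf in Path(".").glob("*.pdf"):
    doc = fitz.open(pdf)
    # Join the text of every page, separated by blank lines
    text = "\n\n".join(page.get_text() for page in doc)
    pdf.with_suffix(".md").write_text(text.strip())
    pdf.unlink()  # remove the source PDF after conversion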
Convert XLSX to tables:
import pandas as pd
from pathlib import Path

for xlsx in Path(".").glob("*.xlsx"):
    xls = pd.ExcelFile(xlsx)
    content = f"# {xlsx.stem}\n\n"
    # One section per worksheet, each rendered as a markdown table
    for sheet in xls.sheet_names:
        df = pd.read_excel(xlsx, sheet_name=sheet)
        content += f"## {sheet}\n\n{df.to_markdown(index=False)}\n\n"
    xlsx.with_suffix(".md").write_text(content)
    xlsx.unlink()  # remove the source workbook after conversion
Convert TXT:
for f in *.txt; do mv "$f" "${f%.txt}.md"; done
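Note that `mv` silently overwrites an existing `.md` file with the same stem. If that matters, the rename can be done in Python with a guard. The helper name `rename_txt` is hypothetical, and skipping on conflict is one possible policy, not something the skill prescribes.

```python
from pathlib import Path


def rename_txt(directory):
    """Rename *.txt to *.md, skipping any file whose target already exists."""
    renamed, skipped = [], []
    for txt in sorted(Path(directory).glob("*.txt")):
        target = txt.with_suffix(".md")
        if target.exists():
            # Leave the .txt in place rather than clobber an existing .md
            skipped.append(txt.name)
            continue
        txt.rename(target)
        renamed.append(target.name)
    return renamed, skipped
```

The function returns both lists so the caller can report which files were left alone.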
Convert MSG:
import extract_msg
from pathlib import Path

msg = extract_msg.Message("file.msg")
content = f"# {msg.subject}\n\n**From:** {msg.sender}\n**Date:** {msg.date}\n\n{msg.body}"
Path("file.md").write_text(content)  # save the email as markdown
Clean up: Remove *:Zone.Identifier files (Windows metadata)
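The cleanup step can be sketched in Python as below. This assumes a POSIX-style filesystem (for example WSL or a Linux mount) where the `:Zone.Identifier` streams show up as ordinary files with a colon in the name; `remove_zone_identifiers` is a hypothetical helper name.

```python
from pathlib import Path


def remove_zone_identifiers(directory):
    """Delete '*:Zone.Identifier' files in a directory and return their names."""
    removed = []
    for p in sorted(Path(directory).glob("*:Zone.Identifier")):
        if p.is_file():
            p.unlink()
            removed.append(p.name)
    return removed
```

Only files whose names end in `:Zone.Identifier` are matched, so the documents themselves are not touched.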
- pandoc (system): DOCX, PPTX, DOTX conversion
- textutil (macOS): DOC fallback
- pymupdf (Python): PDF text extraction
- pandas, openpyxl, tabulate (Python): XLSX tables
- extract-msg (Python): Outlook MSG files