From html-to-markdown
Converts HTML to Markdown, Djot, or plain text with metadata, table, and image extraction via CLI or 12 language SDKs.
Slash command: /html-to-markdown:html-to-markdown
html-to-markdown is a high-performance HTML→Markdown converter with a Rust core and 12 native language bindings. It converts HTML to CommonMark Markdown, Djot, or plain text in a single pass, optionally extracting metadata, tables, inline images, and a structured document tree.
Use this skill when writing code that:
| Capability | CLI | SDKs |
|---|---|---|
| HTML→Markdown / Djot / plain text | html-to-markdown FILE | convert(html, options) |
| Read HTML from stdin / file / URL | cat f \| …, FILE, --url URL | convert(htmlString, …) |
| ~30 config options (headings, code blocks, lists, escaping, wrapping…) | flags | ConversionOptions |
| Metadata extraction | --json (default-extracted) | result.metadata |
| Table extraction | --json → tables[] | result.tables |
| Document structure tree | --json --include-structure | include_document_structure=true |
| Inline image extraction | --json --extract-inline-images | extract_images=true |
| HTML preprocessing | --preprocess [--preset …] | PreprocessingOptions |
| Extraction-only (no Markdown body) | --json --no-content | options + read fields |
# (Homebrew 6.0+ requires explicit trust for third-party taps)
brew trust xberg-io/tap
brew install xberg-io/tap/html-to-markdown
# or run without a persistent install (the CLI proxy package self-installs the binary):
npx @kreuzberg/html-to-markdown-cli --help
uvx --from html-to-markdown-cli html-to-markdown --help
# or download a prebuilt binary from the latest GitHub release:
# https://github.com/xberg-io/html-to-markdown/releases/latest
# or build from source:
cargo install --git https://github.com/xberg-io/html-to-markdown html-to-markdown-cli
pip install html-to-markdown # Python
npm install @kreuzberg/html-to-markdown # TypeScript / Node.js
cargo add html-to-markdown-rs # Rust (features: metadata default; full = all)
gem install html-to-markdown # Ruby
composer require xberg-io/html-to-markdown # PHP
go get github.com/xberg-io/html-to-markdown/packages/go/v3 # Go
dotnet add package KreuzbergDev.HtmlToMarkdown # C#
npm install @kreuzberg/html-to-markdown-wasm # WASM
dev.kreuzberg:html-to-markdown # Java (Maven coordinates)
{:html_to_markdown, "~> 3.6"} in mix.exs # Elixir
install.packages("htmltomarkdown", repos = "https://xberg-io.r-universe.dev") # R
.so / .dll / .dylib from GitHub releases # C (shared libraries)
CLI: fits shell pipelines (--json | jq). Flags only, no subcommands. FILE is positional; omit it or use - for stdin.
MCP server: html-to-markdown mcp exposes convert/extract as agent tools, so an MCP client can convert HTML directly with no shell-out. This plugin auto-registers it; see the using-the-mcp-server skill.
Both share the same ConversionResult shape, so output is interchangeable.
Rule of thumb: single HTML in → Markdown out = html-to-markdown. Many URLs / a site = kreuzcrawl. Non-HTML documents = kreuzberg.
# Convert a file to stdout
html-to-markdown input.html
# Convert and save
html-to-markdown input.html -o output.md
# Read from stdin
cat page.html | html-to-markdown
# Fetch and convert a URL
html-to-markdown --url https://example.com > out.md
# Full ConversionResult as JSON (content, tables, metadata, images, warnings)
html-to-markdown --json input.html
# JSON with document structure tree
html-to-markdown --json --include-structure input.html
# Extraction-only (no Markdown body)
html-to-markdown --json --no-content input.html
# Aggressive web-page cleanup
html-to-markdown input.html --preprocess --preset aggressive
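When the CLI is driven from a script, the flag dependencies this document mentions are easy to get wrong: --include-structure and --no-content need --json, and --preset needs --preprocess. The following is a minimal sketch of a helper that assembles the argument list and rejects invalid combinations before any process is started. The function name build_args and its parameters are hypothetical; only the flags themselves come from this document.

```python
from typing import List, Optional


def build_args(
    source: str = "-",
    *,
    json_output: bool = False,
    include_structure: bool = False,
    no_content: bool = False,
    preprocess: bool = False,
    preset: Optional[str] = None,
) -> List[str]:
    """Assemble an html-to-markdown argv list (hypothetical helper)."""
    # These two flags only make sense together with --json.
    if (include_structure or no_content) and not json_output:
        raise ValueError("--include-structure and --no-content require --json")
    # --preset only makes sense together with --preprocess.
    if preset is not None and not preprocess:
        raise ValueError("--preset requires --preprocess")

    args = ["html-to-markdown"]
    if json_output:
        args.append("--json")
    if include_structure:
        args.append("--include-structure")
    if no_content:
        args.append("--no-content")
    if preprocess:
        args.append("--preprocess")
    if preset is not None:
        args += ["--preset", preset]
    # FILE is positional; "-" stands for stdin.
    args.append(source)
    return args


print(build_args("input.html", json_output=True, include_structure=True))
```

The returned list can be handed to subprocess.run(args, capture_output=True, text=True), assuming the html-to-markdown binary is on PATH.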
use html_to_markdown_rs::convert;
let result = convert("<h1>Hello World</h1><p>A paragraph.</p>", None)?;
println!("{}", result.content.unwrap_or_default());
from html_to_markdown import convert
result = convert("<h1>Hello World</h1><p>A paragraph.</p>")
print(result.content) # # Hello World\n\nA paragraph.
print(result.metadata) # title, links, headers, …
import { convert } from "@kreuzberg/html-to-markdown";
// Node's convert() returns a JSON string — always JSON.parse() it.
const result = JSON.parse(convert("<h1>Hello World</h1><p>A paragraph.</p>"));
console.log(result.content);
All languages return the same structure (dict, object, or struct).
| Field | Description |
|---|---|
| content | Converted text (Markdown/Djot/plain). null only in extraction-only mode. |
| metadata | Title, OG, headers, links, images, structured data. |
| tables | Tables with grid (structured cells) and markdown fields. |
| images | Extracted inline images (requires inline-image extraction). |
| document | Structured document tree when structure extraction is enabled. |
| warnings | Non-fatal processing warnings (message, kind). |
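Because content can be null in extraction-only mode, code that consumes the result should not assume it is a string. The sketch below reads a ConversionResult-shaped JSON string, such as the one the CLI prints with --json, using only the standard library. The sample payload is hand-written for illustration and is not real tool output; the summarize function is hypothetical, and only the top-level field names and the warning keys (message, kind) come from the table above.

```python
import json

# Hand-written sample shaped like the fields in the table above.
# It is NOT real tool output; fields not needed here are left out.
sample = json.dumps({
    "content": None,  # null in extraction-only mode
    "tables": [],
    "warnings": [{"message": "example warning", "kind": "example"}],
})


def summarize(raw: str) -> dict:
    """Return a small summary of a ConversionResult-shaped JSON string."""
    result = json.loads(raw)
    # content may be null, so fall back to an empty string before testing it.
    content = result.get("content") or ""
    return {
        "has_content": bool(content),
        "table_count": len(result.get("tables") or []),
        "warnings": [w.get("message", "") for w in result.get("warnings") or []],
    }


print(summarize(sample))
```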
All languages expose the same ~30 options. See references/configuration.md for the complete table. Common ones:
| Option | Values | Default |
|---|---|---|
| heading_style | atx, underlined, atx-closed | atx |
| code_block_style | backticks, indented, tildes | backticks |
| output_format | markdown, djot | markdown |
| wrap / wrap_width | bool / 20–500 | off / 80 |
| autolinks (SDK) / --no-autolinks (CLI) | bool / flag | true (on); disable in CLI with --no-autolinks |
| preprocessing | minimal / standard / aggressive | off |
use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle, OutputFormat};
let options = ConversionOptions::builder()
.heading_style(HeadingStyle::Atx)
.output_format(OutputFormat::Markdown)
.wrap(true)
.wrap_width(100)
.build();
let result = convert(html, Some(options))?;
from html_to_markdown import convert, ConversionOptions, PreprocessingOptions
result = convert(
html,
ConversionOptions(heading_style="atx", wrap=True, wrap_width=100),
PreprocessingOptions(enabled=True, preset="aggressive"),
)
Metadata is extracted by default; in the CLI it appears under metadata in --json output. Fields include document (title, description, language, charset, open_graph), headers, links (with link_type), images, and structured_data (JSON-LD/Microdata/RDFa). See the extracting-metadata skill for details.
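As a rough illustration of working with these fields, the sketch below reads the document title and counts links per link_type. It assumes the metadata has already been loaded as a plain dict, for example from the metadata key of the CLI's --json output. The sample data is hand-written; the "href" key and the link_type values "internal" and "external" are assumptions, not confirmed by this document, and link_type_counts is a hypothetical helper.

```python
from collections import Counter

# Hand-written sample; the "href" key and the link_type values are assumptions.
metadata = {
    "document": {"title": "Example page", "language": "en"},
    "links": [
        {"href": "/about", "link_type": "internal"},
        {"href": "https://example.com", "link_type": "external"},
        {"href": "/contact", "link_type": "internal"},
    ],
}


def link_type_counts(meta: dict) -> Counter:
    """Count links per link_type, tolerating a missing or null links list."""
    links = meta.get("links") or []
    return Counter(link.get("link_type", "unknown") for link in links)


# The document block may be absent, so default to an empty dict.
title = (metadata.get("document") or {}).get("title", "")
counts = link_type_counts(metadata)
print(title, dict(counts))
```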
Tables appear in result.tables, each with a pre-rendered markdown string and a structured cell grid. Markdown tables also appear inline in content. See the extracting-tables skill.
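Since every table carries a pre-rendered markdown string, a report of just the tables can be built without touching the cell grid. The sketch below is one way to do that, assuming the tables have been loaded as a list of plain dicts (for example from the tables array of --json output). The sample data is hand-written, render_tables is a hypothetical helper, and only the "markdown" key is taken from this document.

```python
from typing import List

# Hand-written sample; only the "markdown" key comes from this document.
tables = [
    {"markdown": "| a | b |\n|---|---|\n| 1 | 2 |"},
    {"markdown": "| x |\n|---|\n| 9 |"},
]


def render_tables(items: List[dict]) -> str:
    """Join each table's pre-rendered Markdown under a numbered label."""
    parts = []
    for index, table in enumerate(items, start=1):
        parts.append(f"Table {index}\n\n{table['markdown']}")
    return "\n\n".join(parts)


print(render_tables(tables))
```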
Enable structure extraction (--include-structure on the CLI, include_document_structure=true in SDKs) to get a semantic node tree under document. Node types include heading, paragraph, list, list_item, table, image, code, quote, group, metadata_block.
Python: convert() returns a result object, not a string. Access .content for the Markdown text.
Node.js: convert() returns a JSON string. Always JSON.parse(convert(html)); NAPI-RS serializes the result for performance.
--json outputs JSON, not Markdown. Omit --json for plain Markdown.
--include-structure, --extract-inline-images, and --no-content require --json.
FILE is positional.
--preset, --keep-navigation, and --keep-forms require --preprocess.
Install this plugin: npx claudepluginhub xberg-io/plugins --plugin html-to-markdown