From claude-superskills
Extracts clean Markdown from web pages by stripping navigation, ads, sidebars, footers, and boilerplate using Defuddle. Use for URLs to documentation, articles, blog posts, research papers, release notes.
`npx claudepluginhub ericgandrade/claude-superskills --plugin claude-superskills`

This skill uses the workspace's default tool permissions.
This skill extracts clean, readable Markdown from any web page URL by stripping navigation menus, advertisements, sidebars, footers, cookie banners, and all other non-content elements. It produces token-efficient output that focuses exclusively on the meaningful content of the page.
When agents fetch web pages using standard tools, the raw output often includes hundreds of navigation links, promotional widgets, and boilerplate text that consume tokens without adding value. This skill eliminates that noise using Defuddle, a purpose-built content extraction tool, to isolate the article body, documentation text, or main content and return only what matters.
The skill gracefully degrades when Defuddle is not installed: it offers to install it automatically or falls back to standard web fetching with a clear note about the trade-off in output quality and token usage.
This is a universal skill — it works in any project, any terminal context, and does not require Obsidian or any specific project structure.
Invoke this skill when:

- The user provides a URL to documentation, an article, a blog post, a research paper, or release notes
- The goal is token-efficient Markdown of a page's main content rather than raw fetched output
Do NOT use this skill when:

- The URL ends in `.md` — those files are already Markdown; use WebFetch directly instead
- The URL points to a document file — use docling-converter instead
- The task is a multi-source research session — use deep-research instead
- The URL is a YouTube video — use youtube-summarizer instead

Before fetching, inspect the URL provided by the user:
- `.md` files → use WebFetch directly (already Markdown, no extraction needed)
- `.pdf`, `.docx`, `.pptx`, `.xlsx` files → redirect to docling-converter
- `.mp3`, `.mp4`, `.wav` files → redirect to audio-transcriber
- `.png`, `.jpg`, `.gif`, `.svg` files → inform the user that image extraction is not supported

Run a quick availability check before attempting extraction:
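The extension routing above can be sketched as a small shell helper. This is an illustrative sketch — the function name and echoed handler names are hypothetical, and a real implementation would also strip query strings and fragments before matching:

```shell
# Illustrative router: pick a handler based on the URL's file extension.
# Anything without a special extension falls through to Defuddle.
route_url() {
  case "$1" in
    *.md)                       echo "webfetch" ;;
    *.pdf|*.docx|*.pptx|*.xlsx) echo "docling-converter" ;;
    *.mp3|*.mp4|*.wav)          echo "audio-transcriber" ;;
    *.png|*.jpg|*.gif|*.svg)    echo "unsupported-image" ;;
    *)                          echo "defuddle" ;;
  esac
}

route_url "https://example.com/README.md"       # webfetch
route_url "https://example.com/paper.pdf"       # docling-converter
route_url "https://example.com/blog/post.html"  # defuddle
```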
```shell
defuddle --version
```
If Defuddle is available, proceed to Step 3a.
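The availability check can be written defensively so a missing binary never aborts the step. A minimal sketch — `check_tool` is an illustrative name, not part of any real API:

```shell
# Report whether a CLI tool is on PATH, without failing the script.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "available"
  else
    echo "missing"
  fi
}

check_tool sh        # "available" on any POSIX system
check_tool defuddle  # "available" or "missing", depending on the machine
```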
If Defuddle is not installed, present this offer to the user:

"Defuddle is not installed. It extracts clean Markdown from web pages with significantly better quality than standard fetching. Install it now with `npm install -g defuddle`? It takes about 10 seconds. (yes/no)"

If the user agrees, run `npm install -g defuddle`, then proceed to Step 3a.

Use Defuddle to extract the page content:
```shell
# Extract and display as Markdown
defuddle parse <url> --md

# Extract and save to a file
defuddle parse <url> --md -o output-filename.md

# Extract specific metadata only
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p author
defuddle parse <url> -p domain
```
Output format flags:
| Flag | Output | When to use |
|---|---|---|
| `--md` | Markdown | Default choice for all content |
| `--json` | JSON with HTML and Markdown | When structured metadata is needed |
| (none) | HTML | Avoid — use `--md` instead |
| `-p <name>` | Single metadata property | When only title, author, or description is needed |
Always prefer --md unless the user explicitly requests another format.
If Defuddle is not available and the user declined installation, use the standard WebFetch capability with this note prepended to the output:
Note: Fetching without Defuddle — output may contain navigation elements and non-content text that increases token usage. Install Defuddle with `npm install -g defuddle` for cleaner results.
Apply best-effort cleanup to the WebFetch output:
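A minimal sketch of such cleanup, assuming the fetched page is piped in as Markdown. The boilerplate patterns below are illustrative, not exhaustive:

```shell
# Strip lines that are pure navigation links or cookie-banner text.
cleanup() {
  grep -v -E '^[[:space:]]*(\[(Home|Menu|Skip to content)\]|Accept (all )?cookies)[[:space:]]*$'
}

# Prints "Title" and "Body text" only; the nav link and banner are dropped.
printf 'Title\n[Home]\nAccept all cookies\nBody text\n' | cleanup
```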
If the page returns an error or blocks the request, respond with clear guidance:
| Situation | Response |
|---|---|
| 403 Forbidden | "This site blocks automated access. Try opening it manually in a browser." |
| 404 Not Found | "Page not found. Please verify the URL is correct." |
| Timeout | Retry once automatically; if it fails again, report the timeout |
| Login required | "This page requires authentication. Log in first and share the content manually." |
| Paywall detected | "This content is behind a paywall and cannot be extracted automatically." |
| Empty extraction | Fall back to WebFetch and note the reduced quality |
| Invalid SSL | Report the SSL error and ask if the user wants to proceed anyway |
Never fabricate or invent page content when extraction fails. Always report failures honestly.
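The retry-once behavior for timeouts can be sketched as a generic wrapper, assuming the extraction command is passed as arguments:

```shell
# Run a command; on failure, retry exactly once before giving up.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying once..." >&2
  "$@"
}

# Example usage (assumes defuddle is installed):
#   retry_once defuddle parse "$url" --md
```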
Present the extracted content in a consistent format:
```markdown
# [Page Title]

**Source:** [full URL]
**Domain:** [domain.com]

---

[Extracted Markdown content]
```
If saving to a file, confirm the save:

"Content saved to [filepath] — [word count] words extracted from [domain.com]."
If the user requested metadata only (title, author, description), return just the requested fields without the full body content.
For multiple URLs, separate each result with a clear divider and label each section with its source URL.
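The multi-URL labeling can be sketched as follows; the heading and divider format shown are illustrative choices, not a fixed convention:

```shell
# Emit one labeled section per URL, separated by a horizontal rule.
label_sections() {
  for url in "$@"; do
    printf '## Source: %s\n\n[extracted content for %s]\n\n---\n\n' "$url" "$url"
  done
}

label_sections "https://example.com/a" "https://example.com/b"
```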
Defuddle supports several output modes. Choose the correct one based on what the user needs:
| Mode | Command flag | Output type | Best for |
|---|---|---|---|
| Markdown | --md | Clean .md text | Reading articles, documentation, blog posts |
| JSON | --json | JSON with html and markdown fields | When structured metadata and body are both needed |
| HTML | (no flag) | Raw cleaned HTML | Avoid — use --md for readability |
| Metadata | -p <name> | Single string value | When only title, author, description, or domain is needed |
Available metadata properties via `-p`:

- `title` — page title
- `description` — meta description
- `author` — article author
- `domain` — domain name only
- `url` — canonical URL

Always default to `--md` unless the user requests a different format explicitly.
This skill works well in combination with others:

- deep-research — use webpage-reader to pre-fetch and clean individual pages that will be cited in a research synthesis, or to fetch individual sources identified during a research session for deeper reading
- obsidian-note-builder — extract a URL first, then pass the clean Markdown to obsidian-note-builder to create a linked vault note
- youtube-summarizer — if the user provides a YouTube URL, route to youtube-summarizer instead

When chaining skills, always inform the user which skill is handling which step so they understand the workflow.
NEVER:

- Use this skill for URLs ending in `.md` — WebFetch is correct for Markdown files
- Use this skill for document files — redirect to docling-converter
- Return raw HTML — always use the `--md` flag

ALWAYS:

- Report extraction failures honestly; never fabricate page content
- Note the source URL at the top of extracted content
User: Read this page for me: https://docs.anthropic.com/en/docs/about-claude/models
Action: Check Defuddle availability → run defuddle parse https://docs.anthropic.com/en/docs/about-claude/models --md → return clean Markdown of the Claude models page, with source URL noted at the top.
Result: Clean Markdown of the documentation without navigation links, sidebar content, or promotional elements.
User: Extract this article and save it to notes/microservices.md:
https://martinfowler.com/articles/microservices.html
Action: Run defuddle parse <url> --md -o notes/microservices.md → confirm save.
Result: "Content saved to notes/microservices.md — 4,200 words extracted from martinfowler.com."
User: What is the title and author of this page?
https://kentcdodds.com/blog/javascript-to-know-for-react
Action: Run defuddle parse <url> -p title and defuddle parse <url> -p author → return only the two metadata values.
Result: Title and author presented cleanly, without fetching the full body content.
User: I am researching LLM evaluation frameworks. Read these three pages:
- https://docs.ragas.io/en/latest/
- https://www.trulens.org/
- https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
Action: Process each URL sequentially with Defuddle → return three labeled Markdown sections.
Result: Three clean extractions, each starting with its source URL, ready for comparison or summarization.
User: Fetch https://research.google/blog/advances-in-research/ and summarize it
Action: Check defuddle --version → not found → present install offer.
Response: "Defuddle is not installed. Install it now with npm install -g defuddle for clean Markdown output? (yes/no)"
If the user says yes: install and extract. If no: fall back to WebFetch with a quality note, then proceed with extraction and summarization.