# docs-to-skill
Ingests documentation site URLs, discovers pages via sitemap or nav crawl, extracts markdown, and generates Claude Code skill packages with SKILL.md indexes and references.
```
npx claudepluginhub emdashcodes/claude-code-plugins --plugin docs-to-skill
```
Ingest a documentation site and generate a ready-to-use Claude Code skill. Point this at any docs URL and it will discover pages, fetch content, and produce a skill with a SKILL.md index and `references/` directory.
This skill orchestrates a multi-phase workflow: gather inputs, discover pages, fetch content, and generate the SKILL.md index.
The bundled Playwright MCP server (headless) gives you a real browser for navigating pages, taking screenshots, and extracting content as clean markdown via Turndown.js injection.
Collect three things from the user:
1. **Docs URL.** The root URL of the documentation site. Examples:
   - `https://docs.example.com`
   - `https://example.com/docs/getting-started`

   If the user provides multiple URLs, treat each as a separate docs section to include.
2. **Skill name.** Either provided by the user or derived from the site: `react-docs`, `stripe-api`, `shadcn-ui`.
3. **Output location.** Ask the user where to place the generated skill:
   - `~/.claude/skills/<skill-name>/`
   - `.claude/skills/<skill-name>/`
   - `<marketplace>/plugins/<plugin-name>/skills/<skill-name>/`

Find all documentation pages to fetch. Try strategies in order until one succeeds.
Use WebFetch to retrieve <domain>/sitemap.xml:
```
WebFetch url="https://docs.example.com/sitemap.xml" prompt="Extract all URLs from this sitemap XML. Return them as a plain list, one URL per line. Only include URLs that appear to be documentation pages (not blog posts, marketing pages, etc.)."
```
If the sitemap exists, filter URLs to the docs path prefix. For example, if the user gave https://example.com/docs/, only keep URLs starting with that path.
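The prefix filter can be sketched in plain JavaScript (a minimal helper; the function name and normalization choices are illustrative, not part of the skill):

```javascript
// Keep only URLs under the user's docs root; strip fragments/queries so
// near-duplicate URLs collapse to one entry.
function filterToDocsPrefix(urls, docsRoot) {
  const root = docsRoot.endsWith('/') ? docsRoot : docsRoot + '/';
  const kept = new Set();
  for (const raw of urls) {
    const u = new URL(raw);
    u.hash = '';
    u.search = '';
    const clean = u.toString();
    if (clean === docsRoot || clean.startsWith(root)) kept.add(clean);
  }
  return Array.from(kept).sort();
}
```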
If no sitemap exists, use Playwright to crawl the navigation:
Use `browser_navigate` to load the docs root, then run `browser_evaluate` with:

```javascript
// Common doc site nav patterns — adapt to what you find on the page
const navSelectors = [
  'nav a[href]',
  '[role="navigation"] a[href]',
  '.sidebar a[href]',
  '.docs-sidebar a[href]',
  '.toc a[href]',
  'aside a[href]'
];
const links = new Set();
for (const selector of navSelectors) {
  document.querySelectorAll(selector).forEach(a => {
    const href = a.href;
    if (href && href.startsWith(window.location.origin)) {
      links.add(href);
    }
  });
}
// Deduplicate and sort
Array.from(links).sort();
```
If automated discovery fails, ask the user to provide a list of URLs.
After discovery, present the URL list to the user:
This is the most context-intensive phase. Use subagents to keep the main context clean.
Navigate to a representative page with Playwright and figure out the right CSS selector for the content area.
```
browser_navigate url="<docs-url>"
browser_take_screenshot name="docs-layout"
```
```javascript
const candidates = ['article', '[role="main"]', 'main', '.docs-content', '.markdown-body', '.content', '#content'];
const found = candidates.map(sel => {
  const el = document.querySelector(sel);
  return el ? { selector: sel, textLength: el.innerText.trim().length } : null;
}).filter(Boolean);
JSON.stringify(found);
```
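To pick among the candidates, one simple heuristic is to take the selector holding the most text (a sketch; the helper name and `body` fallback are illustrative, and it assumes the `found` array shape returned by the probe above):

```javascript
// Pick the candidate element holding the most text; fall back to 'body'.
function bestSelector(found) {
  const best = found.reduce(
    (acc, c) => (c.textLength > (acc?.textLength ?? -1) ? c : acc),
    null
  );
  return best?.selector ?? 'body';
}
```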
Then test extraction on the best candidate with Turndown:

```javascript
new Promise((resolve) => {
  const s = document.createElement('script');
  s.src = 'https://unpkg.com/turndown/dist/turndown.js';
  s.onload = () => {
    const td = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
    const el = document.querySelector('<best-selector>');
    const clone = el.cloneNode(true);
    clone.querySelectorAll('button, [aria-label="Copy"]').forEach(e => e.remove());
    // Strip heading anchor link wrappers (common in doc sites)
    clone.querySelectorAll('h1 > div, h2 > div, h3 > div, h4 > div').forEach(e => e.remove());
    resolve(td.turndown(clone.innerHTML));
  };
  document.head.appendChild(s);
});
```
Note the chosen content selector (e.g. `#content`, `main`, `article`) and the elements to strip (e.g. `button`, `.sidebar`, `nav`).

Split the remaining URLs into batches of 3-5 pages per subagent. Each page requires ~4-5 tool calls (navigate, load Turndown, extract, get title, write file), so keep batches small to avoid hitting turn limits. For each batch, spawn a general-purpose subagent using the Task tool:
Task:
subagent_type: general-purpose
prompt: |
You are fetching documentation pages and saving them as markdown reference files.
## Output Directory
<absolute-path-to-output>/references/
## URLs to Fetch
1. <url-1>
2. <url-2>
...
## Extraction
For each URL, use Playwright with Turndown to get full markdown content:
1. Navigate: `browser_navigate` to the URL
2. Load Turndown (only needed once — it persists across navigations on the same origin):
```javascript
new Promise((resolve) => {
if (window.TurndownService) { resolve('already loaded'); return; }
const s = document.createElement('script');
s.src = 'https://unpkg.com/turndown/dist/turndown.js';
s.onload = () => resolve('loaded');
document.head.appendChild(s);
});
```
3. Extract content:
```javascript
const td = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
const el = document.querySelector('<content-selector>');
const clone = el.cloneNode(true);
clone.querySelectorAll('<elements-to-strip>').forEach(e => e.remove());
// Strip heading anchor link wrappers (common in doc sites)
clone.querySelectorAll('h1 > div, h2 > div, h3 > div, h4 > div').forEach(e => e.remove());
td.turndown(clone.innerHTML);
```
4. Get the page title: `document.querySelector('h1')?.textContent || document.title`
If Playwright fails on a page, fall back to WebFetch with prompt:
"Extract the main documentation content as clean markdown."
## For Each URL
1. Extract the page content using the method above
2. Determine the filename:
- Use the URL path to derive a descriptive filename
- Replace slashes with underscores, remove leading/trailing slashes
- Prefix with the site/project name for namespacing
- Example: `https://docs.example.com/guides/auth` → `example_guides_auth.md`
3. Write the reference file with this format:
```
---
title: <Page Title>
source_url: <original-url>
---
<clean markdown content>
```
## Quality Checks
- Skip pages that yield less than 100 characters of content (likely empty/redirect)
- Preserve all code blocks with their language annotations
- Keep relative links as-is (don't try to resolve them)
- Tables should be converted to markdown table format where possible
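The filename rule described in the prompt can be sketched as (a hypothetical helper; `sitePrefix` and the `index` fallback are assumptions, not part of the skill):

```javascript
// URL path -> namespaced reference filename, per the rules above.
function referenceFilename(url, sitePrefix) {
  const path = new URL(url).pathname
    .replace(/^\/+|\/+$/g, '')  // remove leading/trailing slashes
    .replace(/\//g, '_');       // slashes -> underscores
  return `${sitePrefix}_${path || 'index'}.md`;
}
```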
After subagents complete, verify that each URL produced a file in the `references/` directory.

Generate the SKILL.md index file by reading all reference files and organizing them.
Read all reference files to build a map of:
Group references into logical topic sections. Common groupings:
Use the documentation site's own navigation structure as a guide for grouping.
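As a rough stand-in for the nav structure, references can be grouped by the first URL path segment (a sketch only; the `general` bucket and grouping heuristic are illustrative assumptions):

```javascript
// Group source URLs by their first path segment, e.g. /guides/auth -> "guides".
function groupBySection(urls) {
  const groups = {};
  for (const url of urls) {
    const segs = new URL(url).pathname.split('/').filter(Boolean);
    const section = segs.length > 1 ? segs[0] : 'general';
    (groups[section] ||= []).push(url);
  }
  return groups;
}
```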
Use the template below. Fill in all sections based on the reference inventory.
---
name: <skill-name>
description: <When Claude should use this skill — be specific about the technology/framework and what kinds of tasks trigger it>
allowed-tools: Read
---
# <Skill Display Name>
## Overview
<2-3 sentence description of what this skill covers and what it enables.>
## When to Use This Skill
Activate this skill when:
- <Specific trigger 1 — e.g., "Building components with shadcn/ui">
- <Specific trigger 2 — e.g., "Configuring shadcn/ui theming or styling">
- <Specific trigger 3>
- <Specific trigger 4>
## Quick Reference
<Essential patterns that come up constantly. Keep this to 10-15 lines max. Include:
- The primary install/setup command
- The most common import pattern
- The 1-2 most frequently needed code snippets
- Key conventions (naming, file structure, etc.)
This section saves a round-trip to reference files for the basics that Claude needs on almost every interaction.>
## <Topic Section 1 — e.g., "Getting Started">
### <Subtopic — e.g., "Installation">
<Brief description of what this reference covers.>
**Reference**: `references/<filename>.md`
**When to reference**: <Specific scenario — e.g., "When setting up a new project with shadcn/ui or adding it to an existing project.">
### <Subtopic 2>
...
## <Topic Section 2 — e.g., "Components">
### <Subtopic>
...
## Reference Documentation Index
All documentation is available in the `references/` directory:
**<Category>**:
- `<filename>.md` - <Brief description>
- `<filename>.md` - <Brief description>
**<Category>**:
- `<filename>.md` - <Brief description>
If the skill is part of a plugin, also register it in the plugin's `marketplace.json`.

The bundled Playwright MCP server (headless) gives the agent a real browser. Core tools:
- `browser_navigate` — Navigate to a URL. Params: `url`
- `browser_take_screenshot` — Screenshot the page (PNG)
- `browser_click` — Click an element. Params: `element` (description), `ref` (element reference)
- `browser_evaluate` — Run JavaScript in the page context. Params: `function` (JS code as `() => { ... }`)

Visual reconnaissance (always do this first):
```
browser_navigate url="https://docs.example.com/getting-started"
browser_take_screenshot name="docs-layout"
```
Look at the screenshot to understand: Where is the content? Where is the nav? Are there overlays to dismiss?
Convert page content to markdown with Turndown:
```javascript
new Promise((resolve) => {
  const s = document.createElement('script');
  s.src = 'https://unpkg.com/turndown/dist/turndown.js';
  s.onload = () => {
    const td = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
    const el = document.querySelector('main');
    const clone = el.cloneNode(true);
    clone.querySelectorAll('button').forEach(e => e.remove());
    clone.querySelectorAll('h1 > div, h2 > div, h3 > div, h4 > div').forEach(e => e.remove());
    resolve(td.turndown(clone.innerHTML));
  };
  document.head.appendChild(s);
});
```
Turndown persists in the page after loading — subsequent navigations to the same origin won't need to reload it.
Dismiss cookie banners or overlays:
```
browser_click element="Accept cookies button" ref="<ref-from-snapshot>"
browser_take_screenshot name="after-dismiss"
```
Extract all links from navigation:
```
browser_evaluate function="() => Array.from(document.querySelectorAll('nav a[href]')).map(a => ({text: a.textContent.trim(), href: a.href}))"
```
Some sites load content dynamically after initial page load. If Turndown returns empty/short content:
Wait with `new Promise(r => setTimeout(r, 2000))`, then retry the extraction.

If the site returns 429 errors, slow down: add a delay between page fetches and reduce how many subagents run in parallel.
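A wait-and-retry loop with exponential backoff can be sketched as (assuming a hypothetical extraction callback `fn` that throws on 429s or empty content; names and defaults are illustrative):

```javascript
// Retry fn up to `retries` times, doubling the delay on each attempt.
async function withBackoff(fn, retries = 3, baseMs = 2000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise(r => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```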
This skill works with publicly accessible documentation. For authenticated docs:
For large sites: