Fetch, index, and search static documentation from any URL or library alias (e.g., react, nextjs, fastapi) directly in Claude Code sessions. Conditional-GET caching lets live docs reload quickly without re-downloading unchanged pages, regex grep provides context-aware search, caches older than 7 days are refreshed, sources can be managed, and everything is exposed as local MCP tools. No browser, API keys, or external services are needed.
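Conditional-GET caching means a reload sends back the validators saved from the previous response (`ETag` as `If-None-Match`, `Last-Modified` as `If-Modified-Since`), and the server answers `304 Not Modified` with no body when the page is unchanged. The sketch below shows that bookkeeping together with a 7-day staleness check. `CacheEntry` and the helper functions are hypothetical names, not the plugin's internals, and the HTTP request itself is left out so the logic runs offline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

STALE_AFTER = timedelta(days=7)  # the 7-day cache window described above


@dataclass
class CacheEntry:
    """Hypothetical cached copy of one URL plus its HTTP validators."""
    body: str
    fetched_at: datetime
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def is_stale(entry: CacheEntry, now: Optional[datetime] = None) -> bool:
    """True once the entry is older than the 7-day window."""
    now = now or datetime.now(timezone.utc)
    return now - entry.fetched_at > STALE_AFTER


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Request headers that let the server answer 304 Not Modified."""
    headers: Dict[str, str] = {}
    if entry is not None:
        if entry.etag:
            headers["If-None-Match"] = entry.etag
        if entry.last_modified:
            headers["If-Modified-Since"] = entry.last_modified
    return headers


def apply_response(
    entry: Optional[CacheEntry],
    status: int,
    headers: Dict[str, str],  # plain dict with exact-case keys (an assumption)
    body: str,
    now: Optional[datetime] = None,
) -> CacheEntry:
    """Return the cache entry to keep after a conditional GET."""
    now = now or datetime.now(timezone.utc)
    if status == 304 and entry is not None:
        # Unchanged upstream: keep the cached body and only reset its age.
        return CacheEntry(entry.body, now, entry.etag, entry.last_modified)
    # New or changed content: store the fresh body and its validators.
    return CacheEntry(body, now, headers.get("ETag"), headers.get("Last-Modified"))
```

In use, a fetcher would attach `conditional_headers(entry)` to its GET request and hand the status, headers, and body to `apply_response`; `is_stale` decides whether a cached entry needs re-validating at all.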
```bash
npx claudepluginhub raintree-technology/docpull --plugin docpull
```

- Fetch documentation for a library and make it searchable in this session. Accepts a built-in alias (e.g. "react"), an HTTPS URL, or "name url" to register a custom alias.
- List documentation libraries currently cached locally, with last-fetched age.
- Re-fetch a cached library, ignoring the 7-day cache. Use when docs have been updated upstream.
- Remove a user-defined source alias from `sources.yaml`, optionally deleting its cached docs.
- Search fetched docs by regex and pull surrounding context for the best hits. Optionally restrict to one library.
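Regex search with surrounding context amounts to scanning each cached page line by line and returning a window of lines around every match, optionally limited to one library. The sketch below illustrates that over an in-memory `{library: {page: markdown}}` mapping. `grep_with_context` and its parameters are hypothetical, not the plugin's actual search command, and unlike the "best hits" selection described above it does no ranking: hits come back in document order.

```python
import re
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, str, int, str]  # (library, page, 1-based line number, snippet)


def grep_with_context(
    libraries: Dict[str, Dict[str, str]],
    pattern: str,
    context: int = 2,
    library: Optional[str] = None,
) -> List[Hit]:
    """Case-insensitive regex search returning each match with nearby lines."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits: List[Hit] = []
    for lib, pages in libraries.items():
        if library is not None and lib != library:
            continue  # optional restriction to a single library
        for page, text in pages.items():
            lines = text.splitlines()
            for i, line in enumerate(lines):
                if regex.search(line):
                    start = max(0, i - context)
                    end = min(len(lines), i + context + 1)
                    hits.append((lib, page, i + 1, "\n".join(lines[start:end])))
    return hits
```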
Pull server-rendered web content from any URL into Claude Code. Indexes sites in seconds with conditional-GET caching, then exposes them as MCP tools (fetch_url, ensure_docs, list_sources, list_indexed, grep_docs, read_doc, add_source, remove_source). Local, browser-free, no API keys.
Security-hardened, browser-free crawler that turns static documentation sites into clean, AI-ready Markdown — fast.
docpull uses async HTTP (not Playwright) to fetch server-rendered pages, extracts main content, and writes clean Markdown with source-URL frontmatter — in seconds, with a small install footprint. It won't render JavaScript, but for the large class of docs that don't need it (API references, Python/Go stdlib, most dev-tool docs, OpenAPI specs, Next.js and Docusaurus builds), it is a fast, auditable, sandbox-friendly way to pipe documentation into an LLM context, a RAG index, or an offline archive. SSRF, XXE, DNS-rebinding, and CRLF-injection protections are on by default — a necessity when an AI agent is choosing the URLs.
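A common building block for SSRF protection is refusing any URL whose host resolves to a private, loopback, link-local, or otherwise non-public address, since those are the targets an attacker-chosen URL usually aims for (internal services, cloud metadata endpoints). The sketch below shows that check with the standard `ipaddress` and `socket` modules. `is_public_url` is a hypothetical helper, not docpull's code; a complete defense also re-checks the address actually connected to (to counter DNS rebinding) and validates every redirect target.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_url(url: str, allowed_schemes=("https",)) -> bool:
    """Reject disallowed schemes and hosts that resolve to non-public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in allowed_schemes or not parts.hostname:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError):
        return False  # unresolvable host or invalid port: treat as unsafe
    for info in infos:
        # sockaddr[0] is the IP string; drop any IPv6 zone suffix like "%eth0".
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if not addr.is_global:
            return False  # private, loopback, link-local, reserved, ...
    return True
```

Every resolved address must be public, not just the first, because a hostname can return a mix of public and internal records.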
```bash
pip install docpull

# Optional extras
pip install 'docpull[llm]'          # tiktoken for token-accurate chunking
pip install 'docpull[trafilatura]'  # alternative extractor for noisy pages
pip install 'docpull[mcp]'          # run as an MCP server for AI agents
pip install 'docpull[all]'          # everything above
```
```bash
# Crawl and save Markdown
docpull https://docs.example.com

# One page, no crawl: the fast path for agents
docpull https://docs.example.com/guide --single

# LLM-ready NDJSON with 4k-token chunks streamed to stdout
docpull https://docs.example.com --profile llm --stream | jq .

# Mirror a site for offline use
docpull https://docs.example.com --profile mirror --cache
```
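Because `--stream` writes one JSON record per line, a consumer can handle each record as soon as it is flushed instead of waiting for the crawl to finish. The minimal reader below assumes nothing about the record schema beyond each line being a JSON object; `read_ndjson` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank line of NDJSON."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the offending line so a broken stream is easy to locate.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        yield record
```

Piped from the CLI (for example `docpull https://docs.example.com --profile llm --stream | python reader.py`, where `reader.py` is a hypothetical script looping over `read_ndjson(sys.stdin)`), each page is processed as it arrives. Inspect a few records with `jq .` first to learn which keys are actually present.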
docpull inspects each page before running the generic extractor and can pull content directly from framework data feeds:
| Framework | Strategy |
|---|---|
| Next.js | Parses `__NEXT_DATA__` JSON |
| Mintlify | `__NEXT_DATA__` with Mintlify tagging |
| OpenAPI | Renders `openapi.json` / `swagger.json` into Markdown |
| Docusaurus | Detected and tagged; generic extractor produces Markdown |
| Sphinx | Detected and tagged; generic extractor produces Markdown |
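Server-rendered Next.js pages (a Pages Router convention) embed their page data as JSON in a `<script id="__NEXT_DATA__" type="application/json">` element, which is why it can be read without running JavaScript. The sketch below extracts that payload with the standard-library `html.parser`. `extract_next_data` is a hypothetical helper, not docpull's parser, and which keys under `props` hold the documentation content varies from site to site.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class _NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">, if present."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies arrive as raw text, so the JSON is not mangled.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ payload, or None when the page has none."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    return json.loads("".join(parser.chunks))
```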
JS-only SPAs with no server-rendered content are detected and skipped with a
clear reason (or, with --strict-js-required, reported as an error so agents
can route elsewhere).
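One simple way to recognize a JS-only shell is to strip scripts, styles, and tags and see whether any meaningful text is left. The heuristic below is a hypothetical illustration, not docpull's detector, and its 200-character threshold is an arbitrary assumption; a real detector would likely also look for framework data feeds such as `__NEXT_DATA__` before giving up on a page.

```python
import re

# Whole <script>, <style>, and <noscript> elements, including their contents.
_NON_CONTENT = re.compile(
    r"<(script|style|noscript)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
_TAG = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Rough visible text: remove non-content elements, then every remaining tag."""
    text = _NON_CONTENT.sub(" ", html)
    text = _TAG.sub(" ", text)
    return " ".join(text.split())


def looks_js_only(html: str, min_chars: int = 200) -> bool:
    """Heuristic: the page ships scripts but almost no server-rendered text."""
    has_scripts = re.search(r"<script\b", html, re.IGNORECASE) is not None
    return has_scripts and len(visible_text(html)) < min_chars
```

A caller could skip pages flagged this way with a logged reason or, in the spirit of `--strict-js-required`, raise an error so an agent can route the URL to a browser-based tool.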
- `--single`: fetch a single URL without discovery. Designed for tool loops.
- `--stream`: NDJSON, one record per line, flushed after every page, pipeable.
- `--max-tokens-per-file N`: split each page into token-bounded chunks on heading boundaries (exact counts with tiktoken, an estimate without it).
- `--emit-chunks`: write one file or record per chunk instead of per page.
- `--strict-js-required`: hard-fail on JS-only pages instead of skipping them.
- `--extractor trafilatura`: swap in trafilatura for sites where the default heuristics struggle.

Synchronous single-page fetch:

```python
from docpull import fetch_one

ctx = fetch_one("https://docs.python.org/3/library/asyncio.html")
print(ctx.title, ctx.source_type)
# Guard against a missing body (e.g. when the fetch set ctx.error).
print((ctx.markdown or "")[:500])
```
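The `--max-tokens-per-file` option splits pages on heading boundaries, counting tokens exactly with tiktoken when it is installed and estimating otherwise. The sketch below illustrates that general approach by greedily packing heading-delimited Markdown sections into chunks under a token budget. The function names, the 4-characters-per-token estimate, and the `cl100k_base` encoding are assumptions for illustration, not docpull's implementation; a single section larger than the budget is kept whole here rather than split further.

```python
import re
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude fallback: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def make_token_counter() -> Callable[[str], int]:
    """Use tiktoken when available, otherwise fall back to the estimate."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
    except Exception:  # not installed, or encoding data unavailable offline
        return estimate_tokens
    return lambda text: len(enc.encode(text))


def chunk_markdown(md: str, max_tokens: int, count: Callable[[str], int]) -> List[str]:
    """Greedily pack heading-delimited sections into chunks of at most max_tokens."""
    # Split just before every line that starts with 1-6 '#' characters and a space.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", md) if s.strip()]
    chunks: List[str] = []
    current = ""
    for section in sections:
        candidate = current + section
        if current and count(candidate) > max_tokens:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at headings keeps each chunk a coherent section, at the cost of uneven chunk sizes.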
Async streaming:
```python
import asyncio
from docpull import Fetcher, DocpullConfig, ProfileName, EventType


async def main():
    cfg = DocpullConfig(
        url="https://docs.example.com",
        profile=ProfileName.LLM,  # chunked NDJSON output
    )
    async with Fetcher(cfg) as fetcher:
        async for event in fetcher.run():
            if event.type == EventType.FETCH_PROGRESS:
                print(f"{event.current}/{event.total}: {event.url}")
        print(f"Done: {fetcher.stats.pages_fetched} pages")


asyncio.run(main())
```
Single-page from an agent tool:
```python
from docpull import Fetcher, DocpullConfig


async def tool_call(url: str) -> str:
    async with Fetcher(DocpullConfig(url=url)) as f:
        ctx = await f.fetch_one(url, save=False)
        return ctx.markdown or ctx.error or ""
```
```bash
docpull https://site.com --profile rag     # Default. Dedup, rich metadata.
docpull https://site.com --profile llm     # NDJSON + chunks + metadata.
docpull https://site.com --profile mirror  # Full archive, polite, cached.
docpull https://site.com --profile quick   # Sampling: 50 pages, depth 2.
```
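Deduplication, as in the `rag` profile, can be pictured as hashing each page's normalized content and dropping any page whose hash was already seen, which catches the same page served under several URLs. This sketch assumes whitespace-collapsing plus SHA-256 is an adequate notion of "same content"; docpull's actual dedup criteria are not described here, and `dedup_pages` is a hypothetical name.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def content_key(markdown: str) -> str:
    """SHA-256 of the text with whitespace collapsed, so reflowed copies match."""
    normalized = " ".join(markdown.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup_pages(pages: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first (url, markdown) pair for each distinct content hash."""
    seen: Set[str] = set()
    kept: List[Tuple[str, str]] = []
    for url, markdown in pages:
        key = content_key(markdown)
        if key in seen:
            continue  # same content already kept under an earlier URL
        seen.add(key)
        kept.append((url, markdown))
    return kept
```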
docpull ships an MCP (Model Context Protocol) server so AI agents can call it directly over stdio: