docpull

Name: docpull
Author: raintree-technology

Fetch and index documentation from any URL into Claude Code, with conditional-GET caching and searchable access via MCP tools. Supports fast-moving libraries like Next.js, FastAPI, LangChain, and React, enabling grounded answers with source citations.

Publisher marketplace: docpull@docpull · marketplace and plugin share one repository (raintree-technology/docpull)

npx claudepluginhub raintree-technology/docpull --plugin docpull

Popularity

Stars: Top 25% (Med: 0 · Avg: 825)

Copy clicks: Med: 0 · Avg: 2

What's Inside

Slash Commands (5)

Add docs to this session

/docs-add

Fetch documentation for a library and make it searchable in this session. Accepts a built-in alias (e.g. "react"), an HTTPS URL, or "name url" to register a custom alias.

List cached docs

/docs-list

List documentation libraries currently cached locally, with last-fetched age.

Refresh cached docs

/docs-refresh

Re-fetch a cached library, ignoring the 7-day cache. Use when docs have been updated upstream.
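
To picture how a 7-day cache and conditional-GET revalidation can fit together, here is a minimal sketch. It is not docpull's code: `CacheEntry`, `is_stale`, and `conditional_headers` are hypothetical names, and only the `If-None-Match` and `If-Modified-Since` header names come from the HTTP standard. A refresh command would presumably skip the age check and go straight to the request.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumption: the freshness window is the 7 days quoted above.
MAX_AGE = timedelta(days=7)


@dataclass
class CacheEntry:
    """Hypothetical record for one cached page (not docpull's real schema)."""
    fetched_at: datetime
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def is_stale(entry: CacheEntry, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True when the entry is older than the freshness window."""
    return now - entry.fetched_at > max_age


def conditional_headers(entry: CacheEntry) -> Dict[str, str]:
    """Build validators so the server can answer 304 Not Modified."""
    headers: Dict[str, str] = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers
```

With validators like these, an unchanged page costs one small 304 response instead of a full download.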

Remove a docs source

/docs-remove

Remove a user-defined source alias from sources.yaml, optionally deleting its cached docs.

Search fetched docs

/docs-search

Search fetched docs by regex and pull surrounding context for the best hits. Optionally restrict to one library.
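
A regex search that returns surrounding context can be sketched in a few lines. This is an illustration of the idea only, not how the plugin implements `/docs-search`; the function name `grep_with_context` and its `context` parameter are assumptions.

```python
import re
from typing import List, Tuple


def grep_with_context(text: str, pattern: str, context: int = 2) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, surrounding lines) for each matching line."""
    regex = re.compile(pattern, re.IGNORECASE)
    lines = text.splitlines()
    hits: List[Tuple[int, List[str]]] = []
    for index, line in enumerate(lines):
        if regex.search(line):
            # Clamp the window so it never runs past either end of the file.
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[start:end]))
    return hits
```

Restricting a search to one library would then amount to choosing which cached files are passed in.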

Skills (1)

docpull-research

/docpull-research

Use the docpull MCP tools (list_indexed, ensure_docs, grep_docs, read_doc, fetch_url) to ground answers in real documentation when the user asks about a specific library, framework, or API — especially for fast-moving libraries (Next.js, FastAPI, LangChain, Pydantic, React, Tailwind, Drizzle, Prisma, Anthropic SDK, etc.) where training data is likely stale or incomplete. Activate on questions like "how do I X in [library]", "what's the API for [framework].[method]", "show me how [library] handles Y", or when a user pastes a docs URL.

MCP Servers (1)

docpull

Stats

Version: 0.2.0

Released: Apr 24, 2026

Language: Python

Stars: 21

Forks: 2

Maintenance: Good

License: MIT

Last Commit: Apr 26, 2026

Added: Apr 26, 2026

README

docpull

Security-hardened, browser-free crawler that turns static documentation sites into clean, AI-ready Markdown — fast.

docpull uses async HTTP (not Playwright) to fetch server-rendered pages, extracts main content, and writes clean Markdown with source-URL frontmatter — in seconds, with a small install footprint. It won't render JavaScript, but for the large class of docs that don't need it (API references, Python/Go stdlib, most dev-tool docs, OpenAPI specs, Next.js and Docusaurus builds), it is a fast, auditable, sandbox-friendly way to pipe documentation into an LLM context, a RAG index, or an offline archive. SSRF, XXE, DNS-rebinding, and CRLF-injection protections are on by default — a necessity when an AI agent is choosing the URLs.
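
To show why URL screening matters when an agent picks the targets, here is a minimal sketch of an SSRF-style pre-flight check using only the standard library. It is not docpull's implementation: `check_url` and its three return labels are hypothetical, and the HTTPS-only rule is an assumption made for the example.

```python
import ipaddress
from urllib.parse import urlsplit


def check_url(url: str) -> str:
    """Classify a URL as "ok", "blocked", or "needs-dns-check" (hypothetical labels)."""
    # Reject raw CR/LF before parsing, since they enable header injection.
    if "\r" in url or "\n" in url:
        return "blocked"
    parts = urlsplit(url)
    # Assumption for this sketch: only HTTPS URLs with a host are accepted.
    if parts.scheme != "https" or not parts.hostname:
        return "blocked"
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # A hostname, not a literal IP. A real check would resolve it, validate
        # every resolved address, and connect to that exact address so a second
        # DNS answer cannot redirect the request (DNS rebinding).
        return "needs-dns-check"
    # Loopback, private, link-local, and similar ranges are not global.
    if ip.is_global and not ip.is_multicast:
        return "ok"
    return "blocked"
```

For instance, `check_url("https://169.254.169.254/latest/meta-data")` is rejected because the cloud metadata address is link-local.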

Install

pip install docpull

# Optional extras
pip install 'docpull[llm]'           # tiktoken for token-accurate chunking
pip install 'docpull[trafilatura]'   # alternative extractor for noisy pages
pip install 'docpull[mcp]'           # run as an MCP server for AI agents
pip install 'docpull[all]'           # everything above

Quick start

# Crawl and save Markdown
docpull https://docs.example.com

# One page, no crawl — the fast path for agents
docpull https://docs.example.com/guide --single

# LLM-ready NDJSON with 4k-token chunks streamed to stdout
docpull https://docs.example.com --profile llm --stream | jq .

# Mirror a site for offline use
docpull https://docs.example.com --profile mirror --cache
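
If you would rather consume the `--stream` output from Python than from `jq`, a reader along these lines should work for any NDJSON source. This is a sketch: `read_ndjson` is a hypothetical helper, and the record fields are not specified here, so the code treats each record as an opaque dictionary.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)
```

Passing `sys.stdin` as `lines` lets a script process each record as soon as it is flushed, without waiting for the crawl to finish.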

Framework-aware extraction

docpull inspects each page before running the generic extractor and can pull content directly from framework data feeds:

Framework	Strategy
Next.js	Parses `__NEXT_DATA__` JSON
Mintlify	`__NEXT_DATA__` with Mintlify tagging
OpenAPI	Renders `openapi.json` / `swagger.json` into Markdown
Docusaurus	Detected and tagged; generic extractor produces Markdown
Sphinx	Detected and tagged; generic extractor produces Markdown

JS-only SPAs with no server-rendered content are detected and skipped with a clear reason (or, with --strict-js-required, reported as an error so agents can route elsewhere).
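
The `__NEXT_DATA__` strategy in the table can be sketched with the standard library alone. Next.js pages-router builds embed their page props as JSON in a `<script id="__NEXT_DATA__">` tag, so the data can be read without rendering JavaScript. The names `extract_next_data` and `_NextDataParser` are hypothetical, and this is not docpull's extractor.

```python
import json
from html.parser import HTMLParser
from typing import Any, List, Optional


class _NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._capturing = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.chunks.append(data)


def extract_next_data(html: str) -> Optional[Any]:
    """Return the parsed __NEXT_DATA__ payload, or None if the tag is absent."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None
```

A `None` result is the cue to fall back to a generic extractor, or to flag the page as JS-only when no server-rendered content is found either.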

Agent-friendly features

--single — fetch a single URL without discovery. Designed for tool loops.
--stream — NDJSON one-record-per-line, flushed on every page, pipeable.
--max-tokens-per-file N — split each page into token-bounded chunks on heading boundaries (exact counts with tiktoken, estimate without).
--emit-chunks — write one file or record per chunk instead of per page.
--strict-js-required — hard-fail on JS-only pages instead of silently skipping.
--extractor trafilatura — swap in trafilatura for sites where the default heuristics struggle.
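
Token-bounded chunking on heading boundaries, as described for `--max-tokens-per-file`, can be approximated as follows. This is a sketch under stated assumptions, not docpull's algorithm: the four-characters-per-token estimate stands in for tiktoken, and `estimate_tokens`, `split_on_headings`, and `chunk_markdown` are hypothetical names.

```python
import re
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate (about 4 characters per token); an assumption for this sketch."""
    return max(1, len(text) // 4)


def split_on_headings(markdown: str) -> List[str]:
    """Split Markdown into sections, each starting at an ATX heading."""
    sections: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines(keepends=True):
        # Limitation: a '#' line inside a fenced code block also starts a section.
        if re.match(r"#{1,6} ", line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def chunk_markdown(markdown: str, max_tokens: int) -> List[str]:
    """Greedily pack whole sections into chunks of at most max_tokens."""
    chunks: List[str] = []
    buffer: List[str] = []
    used = 0
    for section in split_on_headings(markdown):
        cost = estimate_tokens(section)
        if buffer and used + cost > max_tokens:
            chunks.append("".join(buffer))
            buffer, used = [], 0
        # A single section larger than max_tokens is kept whole in this sketch.
        buffer.append(section)
        used += cost
    if buffer:
        chunks.append("".join(buffer))
    return chunks
```

Splitting only at headings keeps each chunk self-describing, at the cost of occasionally exceeding the budget when one section is very long.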

Python API

from docpull import fetch_one

ctx = fetch_one("https://docs.python.org/3/library/asyncio.html")
print(ctx.title, ctx.source_type)
print(ctx.markdown[:500])

Async streaming:

import asyncio
from docpull import Fetcher, DocpullConfig, ProfileName, EventType

async def main():
    cfg = DocpullConfig(
        url="https://docs.example.com",
        profile=ProfileName.LLM,  # chunked NDJSON output
    )
    async with Fetcher(cfg) as fetcher:
        async for event in fetcher.run():
            if event.type == EventType.FETCH_PROGRESS:
                print(f"{event.current}/{event.total}: {event.url}")
        print(f"Done: {fetcher.stats.pages_fetched} pages")

asyncio.run(main())

Single-page from an agent tool:

from docpull import Fetcher, DocpullConfig

async def tool_call(url: str) -> str:
    async with Fetcher(DocpullConfig(url=url)) as f:
        ctx = await f.fetch_one(url, save=False)
        return ctx.markdown or ctx.error or ""

Profiles

docpull https://site.com --profile rag      # Default. Dedup, rich metadata.
docpull https://site.com --profile llm      # NDJSON + chunks + metadata.
docpull https://site.com --profile mirror   # Full archive, polite, cached.
docpull https://site.com --profile quick    # Sampling: 50 pages, depth 2.

MCP server

docpull ships an MCP (Model Context Protocol) server so AI agents can call it directly over stdio.
