Skill

llms-txt-crawler

Parse and crawl llms.txt documentation indexes. Use this when you need to crawl docs, parse llms.txt, index documentation, fetch doc URLs, scrape docs, build a knowledge index, find what pages are in the docs, or work with llms.txt files from any site.

npx claudepluginhub jadecli/claude-knowledge-sdk-typescript

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-knowledge-sdk:llms-txt-crawler

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Parses llms.txt files (the LLM-friendly documentation standard from llmstxt.org) and crawls

Supporting Files

evals/README.md
evals/evals.json
references/scrapy-config.md
scripts/generate-spider.py
scripts/parse-llms-txt.ts

SKILL.md

111 lines · ~941 tokens


Stats

Language: TypeScript

Stars: 0

Maintenance: Excellent

Last Commit: Mar 26, 2026


llms.txt Documentation Crawler

Parses llms.txt files (the LLM-friendly documentation standard from llmstxt.org) and crawls the discovered documentation URLs to build a local knowledge index.

Known llms.txt Endpoints

https://code.claude.com/docs/llms.txt — Claude Code docs (updates daily)
https://platform.claude.com/llms.txt — Platform/API docs
Any site following the llms.txt standard at {domain}/llms.txt

Process

Step 1: Fetch the llms.txt file

Use WebFetch to retrieve the llms.txt URL.
Example: WebFetch https://code.claude.com/docs/llms.txt

Step 2: Parse the llms.txt content

The llms.txt format is markdown with:

# Site Name — top-level heading
> Description — site description
## Section — doc sections
- [Title](url): Description — doc page links

Parse using the bundled parser:

npx tsx skills/llms-txt-crawler/scripts/parse-llms-txt.ts https://code.claude.com/docs/llms.txt

This outputs structured JSON with all sections and links.
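The format rules above can be sketched in a few lines of Python. This is an illustration of the parsing logic only, not the bundled parse-llms-txt.ts implementation; the names `parse_llms_txt`, `LlmsTxt`, `Section`, and `Link` are hypothetical, and the bundled parser's JSON output may be shaped differently.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches "- [Title](url): Description"; the ": Description" part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class Section:
    name: str
    links: List[Link] = field(default_factory=list)


@dataclass
class LlmsTxt:
    title: str = ""
    description: str = ""
    sections: List[Section] = field(default_factory=list)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse llms.txt markdown into a title, description, and sections of links."""
    doc = LlmsTxt()
    current = None  # section that link lines are currently attached to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = Section(name=line[3:].strip())
            doc.sections.append(current)
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and not doc.description:
            doc.description = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                desc = (match["desc"] or "").strip()
                current.links.append(Link(match["title"], match["url"], desc))
    return doc
```

Links that appear before the first `##` heading are ignored in this sketch; a fuller parser would likely collect them into a default section.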

Step 3: Filter by section or priority

Select which sections and URLs to crawl based on the user's request. For targeted crawling, pick specific sections; for full indexing, crawl every section.
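A minimal sketch of the filtering step, assuming the parsed output is a list of sections shaped like `{"name": ..., "links": [{"url": ...}]}`. That shape and the function name `select_urls` are assumptions for illustration; adapt the keys to whatever the parser actually emits.

```python
def select_urls(sections, wanted=None):
    """Return de-duplicated URLs from the chosen sections, in document order.

    sections: list of {"name": str, "links": [{"url": str, ...}]} (assumed shape)
    wanted:   iterable of section names to keep, or None to keep every section
    """
    wanted_lower = None if wanted is None else {name.lower() for name in wanted}
    seen = set()
    urls = []
    for section in sections:
        # Section names are compared case-insensitively.
        if wanted_lower is not None and section["name"].lower() not in wanted_lower:
            continue
        for link in section["links"]:
            if link["url"] not in seen:
                seen.add(link["url"])
                urls.append(link["url"])
    return urls
```

Passing `wanted=None` corresponds to full indexing; passing one or two section names corresponds to a targeted crawl.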

Step 4: Crawl URLs

For quick single-page fetches (inside agent loop): Use WebFetch on each URL. Good for up to ~20 pages.

For bulk multi-page crawling (outside agent loop): Generate a Scrapy spider project:

python3 skills/llms-txt-crawler/scripts/generate-spider.py ./scrapy-output urls.json

The spider uses the ClaudeBot user-agent, respects robots.txt, and waits at least 2 seconds between requests.
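For reference, these constraints map onto standard Scrapy settings roughly as follows. This is a sketch of what such a configuration could look like, not the generator's actual output; the dictionary name `SPIDER_SETTINGS` is hypothetical, and the concurrency limit is taken from the Important Notes section of this document.

```python
# Hypothetical settings for the generated spider; the real generator output
# may differ. The keys are standard Scrapy setting names.
CLAUDEBOT_UA = "ClaudeBot/1.0 (+https://claude.ai/bot; Anthropic)"

SPIDER_SETTINGS = {
    "USER_AGENT": CLAUDEBOT_UA,
    # Let Scrapy's robots.txt middleware skip disallowed URLs.
    "ROBOTSTXT_OBEY": True,
    # Seconds to wait between requests.
    "DOWNLOAD_DELAY": 2,
    # By default Scrapy randomizes the delay between 0.5x and 1.5x of
    # DOWNLOAD_DELAY, which could drop below 2s; disable that so 2s is a floor.
    "RANDOMIZE_DOWNLOAD_DELAY": False,
    # Upper bound on simultaneous requests during bulk crawling.
    "CONCURRENT_REQUESTS": 4,
}
```

A dictionary like this can be assigned to a spider's `custom_settings` class attribute, or the same keys can be set as module-level names in the project's settings.py.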

Step 5: Store in knowledge index

Save crawled content to ~/.claude/knowledge/ using the SDK's knowledge index format. The ck fetch-docs CLI command handles this automatically for known Anthropic doc sources.

Scrapy Spider Generation

For deeper crawling beyond what WebFetch handles, generate a full Scrapy project:

Parse llms.txt to get URLs (Step 2)
Run the spider generator (Step 4)
Install and run: cd scrapy-output && uv pip install -e . && scrapy crawl docs
Results are written to data/crawled.jsonl
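Once the crawl finishes, the JSONL output can be loaded one record per line. The sketch below makes no assumption about which fields the spider writes, since the record schema is not documented here; `read_jsonl` is a hypothetical helper name.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSONL lines into a list, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# Typical use (path taken from the steps above):
#   with open("data/crawled.jsonl", encoding="utf-8") as fh:
#       records = read_jsonl(fh)
```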

See references/scrapy-config.md for ClaudeBot web scraping best practices.

Important Notes

Always check robots.txt before bulk crawling
Use ClaudeBot user-agent: ClaudeBot/1.0 (+https://claude.ai/bot; Anthropic)
Rate limit: minimum 2 second delay between requests
Maximum 4 concurrent requests for bulk crawling
WebFetch is preferred for small-scale fetching (up to about 20 pages)
Scrapy is preferred for bulk crawling (more than about 20 pages)
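The robots.txt check can be done with Python's standard library before any URL is queued. The sketch below assumes the robots.txt body has already been fetched as text; `allowed_urls` is a hypothetical helper name, and the user-agent string is the one listed above.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "ClaudeBot/1.0 (+https://claude.ai/bot; Anthropic)"


def allowed_urls(robots_txt, urls, user_agent=USER_AGENT):
    """Return only the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

Scrapy performs the same kind of check itself when `ROBOTSTXT_OBEY` is enabled, so a pre-filter like this is mainly useful for the WebFetch path.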

Evaluation

This skill includes an evaluation suite in evals/evals.json following the agentskills.io format.

Running Evals

Spawn a clean subagent per test case (no shared context between runs)
Run each prompt twice, once with the skill loaded and once without, for comparison
Grade assertions against output — require concrete evidence for PASS
Aggregate results into benchmark.json with pass_rate and token deltas
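The aggregation step can be sketched as below. The input shape is an assumption: each run is reduced to `passed` and `total` assertion counts (from grading.json, whose schema is not specified here) plus `total_tokens` (from timing.json). The output keys other than `pass_rate` are likewise illustrative, not the actual benchmark.json schema.

```python
def pass_rate(runs):
    """Fraction of assertions that passed across all runs (0.0 if none ran)."""
    passed = sum(run["passed"] for run in runs)
    total = sum(run["total"] for run in runs)
    return passed / total if total else 0.0


def build_benchmark(with_skill, without_skill):
    """Compare two lists of runs and return a summary dictionary.

    Each run is {"passed": int, "total": int, "total_tokens": int} (assumed shape).
    """
    tokens_with = sum(run["total_tokens"] for run in with_skill)
    tokens_without = sum(run["total_tokens"] for run in without_skill)
    rate_with = pass_rate(with_skill)
    rate_without = pass_rate(without_skill)
    return {
        "with_skill": {"pass_rate": rate_with, "total_tokens": tokens_with},
        "without_skill": {"pass_rate": rate_without, "total_tokens": tokens_without},
        # Positive values mean the skill run passed more / used more tokens.
        "pass_rate_delta": rate_with - rate_without,
        "token_delta": tokens_with - tokens_without,
    }
```

The result can be written out with `json.dump` to produce a benchmark.json file for the iteration.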

Workspace Structure

llms-txt-crawler-workspace/iteration-N/
  eval-name/{with_skill,without_skill}/
    outputs/      — files produced by the run
    timing.json   — {total_tokens, duration_ms}
    grading.json  — assertion results
  benchmark.json  — aggregated comparison

Use the skill-creator skill to automate evaluation runs. See evals/README.md for details.
