Crawls websites to extract content from multiple pages via Tavily CLI. Saves pages as local markdown files with depth/breadth limits, path filtering, and semantic instructions. Use for bulk doc downloads or site content collection.
npx claudepluginhub tavily-ai/skills --plugin tavily
Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.
If tvly is not found on PATH, install it first:
```bash
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
```
Do not skip this step or fall back to other tools.
See tavily-cli for alternative install methods and auth options.
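A minimal pre-flight guard, assuming a POSIX shell and using only the install command above:
```bash
# Install and authenticate the Tavily CLI only if it is missing from PATH
if ! command -v tvly >/dev/null 2>&1; then
  curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
fi
```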
```bash
# Basic crawl
tvly crawl "https://docs.example.com" --json

# Save each page as a markdown file
tvly crawl "https://docs.example.com" --output-dir ./docs/

# Deeper crawl with limits
tvly crawl "https://docs.example.com" --max-depth 2 --limit 50 --json

# Filter to specific paths
tvly crawl "https://example.com" --select-paths "/api/.*,/guides/.*" --exclude-paths "/blog/.*" --json

# Semantic focus (returns relevant chunks, not full pages)
tvly crawl "https://docs.example.com" --instructions "Find authentication docs" --chunks-per-source 3 --json
```
| Option | Description |
|---|---|
| `--max-depth` | Levels deep (1-5, default: 1) |
| `--max-breadth` | Links per page (default: 20) |
| `--limit` | Total pages cap (default: 50) |
| `--instructions` | Natural-language guidance for semantic focus |
| `--chunks-per-source` | Chunks per page (1-5, requires `--instructions`) |
| `--extract-depth` | `basic` (default) or `advanced` |
| `--format` | `markdown` (default) or `text` |
| `--select-paths` | Comma-separated regex patterns to include |
| `--exclude-paths` | Comma-separated regex patterns to exclude |
| `--select-domains` | Comma-separated regex for domains to include |
| `--exclude-domains` | Comma-separated regex for domains to exclude |
| `--allow-external` / `--no-external` | Include external links (default: allow) |
| `--include-images` | Include images |
| `--timeout` | Max wait (10-150 seconds) |
| `-o, --output` | Save JSON output to file |
| `--output-dir` | Save each page as a `.md` file in the directory |
| `--json` | Structured JSON output |
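These options compose. For instance, a conservative docs-only crawl saved to disk (URL and path pattern are placeholders) might look like:
```bash
# Two levels deep, capped at 30 pages, /api section only, one .md file per page
tvly crawl "https://docs.example.com" \
  --max-depth 2 --limit 30 \
  --select-paths "/api/.*" \
  --output-dir ./api-docs/
```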
For agentic use (feeding results to an LLM):
Always use `--instructions` + `--chunks-per-source`. This returns only the relevant chunks instead of full pages, which prevents context explosion.
```bash
tvly crawl "https://docs.example.com" --instructions "API authentication" --chunks-per-source 3 --json
```
For data collection (saving to files):
Use `--output-dir` without `--chunks-per-source` to get full pages as markdown files.
```bash
tvly crawl "https://docs.example.com" --max-depth 2 --output-dir ./docs/
```
Tips:
- Start small (`--max-depth 1`, `--limit 20`) and scale up.
- Use `--select-paths` to focus on the section you need.
- Set `--limit` to prevent runaway crawls.
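Putting the tips together, a sensible first run might look like the sketch below (placeholder URL; only flags from the option table above):
```bash
# Start small: one level deep, 20-page cap, focused on /docs
tvly crawl "https://docs.example.com" --max-depth 1 --limit 20 --select-paths "/docs/.*" --json
```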