From toolbox
Extracts URLs from XML sitemaps using direct URL, path/root /sitemap.xml probes, or robots.txt discovery, with optional regex filtering.
npx claudepluginhub leejuoh/claude-code-zero --plugin toolbox
How this skill is triggered (by the user, by Claude, or both):
Slash command
/toolbox:fetch-sitemap
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Extract URLs from an XML sitemap with optional regex filtering.
$0: URL (required, must start with http:// or https://)
If the URL ends with .xml, use it directly as the sitemap URL (backward compatible)
$1: an extended regex pattern for filtering (optional)
If $0 is empty, display the usage below and stop:
Usage: /fetch-sitemap <url> [pattern]
Examples:
/fetch-sitemap https://kotlinlang.org/docs
/fetch-sitemap https://example.com/sitemap.xml
/fetch-sitemap https://example.com docs
/fetch-sitemap https://example.com/sitemap.xml 'skills|hooks'
If $0 does not start with http:// or https://, inform the user that a valid URL is required and stop.
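The two argument checks above can be sketched as a small shell function. This is only an illustration of the rule, not the skill's actual implementation; the function name `check_url` and its return codes are hypothetical.

```shell
# Hypothetical helper: validate the first argument the way the text describes.
# Returns 0 for a usable URL, 1 when the argument is empty, 2 when the scheme is wrong.
check_url() {
  case "$1" in
    "")
      # Empty argument: show the usage line and stop.
      echo "Usage: /fetch-sitemap <url> [pattern]"
      return 1
      ;;
    http://*|https://*)
      return 0
      ;;
    *)
      # Anything else is rejected before any network request is made.
      echo "A valid URL starting with http:// or https:// is required."
      return 2
      ;;
  esac
}
```

Calling `check_url "$url" || exit` before any probe keeps invalid input from ever reaching curl.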
When the URL does not end with .xml, automatically discover the sitemap by probing the following locations one at a time, stopping as soon as one produces output (do NOT run probes in parallel):
Probes 1–2 — fetch and extract in a single curl:
{url}/sitemap.xml: path-specific (e.g., https://kotlinlang.org/docs/sitemap.xml)
{origin}/sitemap.xml: site root (e.g., https://kotlinlang.org/sitemap.xml), where {origin} is the scheme + host of the URL
curl -sfL --compressed --connect-timeout 5 --max-time 10 <probe-url> | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//'
If the output is non-empty, the sitemap is found and the URLs are already extracted — skip the Extraction section entirely and go straight to Output. If empty, try the next probe.
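One way to picture probes 1 and 2 is as a loop that tries each candidate in order and stops at the first non-empty result. The sketch below assumes POSIX sh; the function names `probe_candidates` and `discover_sitemap` are hypothetical, and skipping the root probe when the URL is already the site root is an added assumption rather than something the skill specifies.

```shell
# Hypothetical helpers illustrating probes 1-2; names are not from the skill itself.

# Print the candidate sitemap URLs for a page URL, in probe order.
probe_candidates() {
  url=${1%/}   # drop one trailing slash, if any
  # {origin} is the scheme + host part of the URL.
  origin=$(printf '%s\n' "$url" | sed -E 's#^(https?://[^/]+).*#\1#')
  printf '%s\n' "$url/sitemap.xml"
  # Assumption: when the URL is already the site root, both probes are identical,
  # so the second one is skipped.
  [ "$origin" = "$url" ] || printf '%s\n' "$origin/sitemap.xml"
}

# Try each candidate in turn (never in parallel) and stop at the first one
# that yields URLs. URLs contain no whitespace, so word splitting is safe here.
discover_sitemap() {
  for probe in $(probe_candidates "$1"); do
    urls=$(curl -sfL --compressed --connect-timeout 5 --max-time 10 "$probe" \
      | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//')
    if [ -n "$urls" ]; then
      echo "Sitemap found: $probe"
      printf '%s\n' "$urls"
      return 0
    fi
  done
  return 1   # nothing found: the caller falls through to the robots.txt probe
}
```

Keeping the extracted URLs in the `urls` variable is what makes it possible to filter later without fetching the sitemap a second time.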
Probe 3 — robots.txt (different format, two-step):
{origin}/robots.txt: fetch and parse for Sitemap: lines, use the first match
curl -sfL --compressed --connect-timeout 5 --max-time 10 <origin>/robots.txt
If a Sitemap: line is found, use that URL and proceed to the Extraction section.
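Parsing robots.txt for its first Sitemap: line could look like the sketch below. The function name `first_sitemap` is hypothetical, and matching the directive case-insensitively and trimming carriage returns are assumptions about real-world robots.txt files, not behavior the skill states.

```shell
# Hypothetical helper: read robots.txt on stdin and print the first Sitemap URL.
# Prints nothing when the file has no Sitemap: line.
first_sitemap() {
  awk 'tolower($0) ~ /^[ \t]*sitemap:/ {
         sub(/^[^:]*:[ \t]*/, "")   # drop the directive name and the colon after it
         sub(/[ \t\r]+$/, "")       # drop trailing whitespace and CR from CRLF files
         print
         exit                       # only the first match is used
       }'
}
```

It would be fed from the robots.txt fetch, for example `curl -sfL --compressed --connect-timeout 5 --max-time 10 "$origin/robots.txt" | first_sitemap`, where `$origin` is a placeholder for the scheme + host.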
If none of the probes succeed, report an error to the user and stop:
Could not auto-discover a sitemap for <url>. Try providing the direct sitemap XML URL instead.
When a sitemap is discovered (not passed directly), print which URL was found before proceeding:
Sitemap found: <discovered-url>
If URLs were already extracted during auto-discovery, skip this entire section. If a filter pattern ($1) is provided, apply it to the already-extracted URLs in memory — do not re-fetch.
Run the following bash command to extract URLs from the sitemap:
curl -sfL --compressed --connect-timeout 10 --max-time 30 <sitemap-url> | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//'
If a pattern is provided, pipe the result through grep -E '<pattern>' to filter:
curl -sfL --compressed --connect-timeout 10 --max-time 30 <sitemap-url> | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//' | grep -E '<pattern>'
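The pipeline above can also be split into an extraction step and a filter step, so that the filter runs on URLs that were already extracted. This is a sketch under that assumption; `extract_locs` and `filter_urls` are hypothetical names.

```shell
# Hypothetical helpers that split the pipeline into two reusable steps.

# Read sitemap XML on stdin and print one URL per line.
extract_locs() {
  grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//'
}

# Keep only URLs matching an extended regex.
# With an empty pattern, everything passes through unchanged.
filter_urls() {
  if [ -n "$1" ]; then
    grep -E -- "$1"
  else
    cat
  fi
}
```

A direct fetch would then read `curl -sfL --compressed --connect-timeout 10 --max-time 30 "$sitemap_url" | extract_locs | filter_urls "$pattern"`, while URLs already held in a variable can be filtered with `printf '%s\n' "$urls" | filter_urls "$pattern"` and no second request.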
curl flags explained:
-s: silent mode (no progress bar)
-f: fail on HTTP errors (4xx/5xx) instead of returning the error page as content
-L: follow redirects
--compressed: request a compressed (e.g., gzip) response and decompress it automatically
--connect-timeout 10: connection timeout of 10 seconds
--max-time 30: total operation timeout of 30 seconds
If the curl command fails (non-zero exit code), report the error clearly to the user (e.g., "Failed to fetch sitemap: connection timed out" or "Failed to fetch sitemap: HTTP 404").
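To report failures clearly, curl's exit code can be mapped to a message. The sketch below covers a few documented curl exit codes; the function name `describe_curl_failure` and the message wording are hypothetical. Note that a pipeline reports the exit status of its last command, so the sketch assumes curl's output is captured before grep and sed run.

```shell
# Hypothetical helper: turn a curl exit code into a readable message.
describe_curl_failure() {
  case "$1" in
    6)  echo "Failed to fetch sitemap: could not resolve host" ;;
    7)  echo "Failed to fetch sitemap: could not connect to server" ;;
    22) echo "Failed to fetch sitemap: server returned an HTTP error (4xx/5xx)" ;;  # produced by -f
    28) echo "Failed to fetch sitemap: operation timed out" ;;
    *)  echo "Failed to fetch sitemap: curl exited with code $1" ;;
  esac
}

# Illustrative usage (commented out so that nothing is fetched here);
# $sitemap_url is a placeholder:
#   xml=$(curl -sfL --compressed --connect-timeout 10 --max-time 30 "$sitemap_url") \
#     || { describe_curl_failure "$?"; exit 1; }
```

With `-f` alone, curl reports only that an HTTP error occurred, so naming the exact status (such as 404) would need additional handling that this sketch leaves out.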
Never re-fetch: All URLs have already been fetched. If the user later asks to save the results to a file, use the Write tool with the already-displayed output. Never run curl again for the same sitemap.
/fetch-sitemap https://kotlinlang.org/docs - auto-discover sitemap and list all URLs
/fetch-sitemap https://example.com/sitemap.xml - use direct sitemap URL
/fetch-sitemap https://example.com docs - auto-discover and filter URLs containing "docs"
/fetch-sitemap https://example.com/sitemap.xml 'skills|hooks' - URLs matching "skills" or "hooks"