Extracts URLs from XML sitemaps using direct URL, path/root /sitemap.xml probes, or robots.txt discovery, with optional regex filtering.
Extract URLs from an XML sitemap with optional regex filtering.
- $0: the URL (required, must start with http:// or https://). If it ends with .xml, use it directly as the sitemap URL (backward compatible).
- $1: an extended regex pattern for filtering (optional).

If $0 is empty, display the usage below and stop:
```
Usage: /fetch-sitemap <url> [pattern]

Examples:
  /fetch-sitemap https://kotlinlang.org/docs
  /fetch-sitemap https://example.com/sitemap.xml
  /fetch-sitemap https://example.com docs
  /fetch-sitemap https://example.com/sitemap.xml 'skills|hooks'
```
If $0 does not start with http:// or https://, inform the user that a valid URL is required and stop.
When the URL does not end with .xml, automatically discover the sitemap by probing the following locations one at a time, stopping as soon as one produces output (do NOT run probes in parallel):
Probes 1–2 — fetch and extract in a single curl:
1. {url}/sitemap.xml — path-specific (e.g., https://kotlinlang.org/docs/sitemap.xml)
2. {origin}/sitemap.xml — site root (e.g., https://kotlinlang.org/sitemap.xml), where {origin} is the scheme + host of the URL

```
curl -sfL --compressed --connect-timeout 5 --max-time 10 <probe-url> | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//'
```
If the output is non-empty, the sitemap is found and the URLs are already extracted — skip the Extraction section entirely and go straight to Output. If empty, try the next probe.
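Deriving the two probe URLs needs no external tools; a minimal sketch using POSIX parameter expansion (the function name `probe_urls` is illustrative, not part of the command spec):

```shell
#!/bin/sh
# Print the sitemap probe URLs for a given input URL:
#   probe 1: <url>/sitemap.xml    (path-specific)
#   probe 2: <origin>/sitemap.xml (site root, skipped if identical)
probe_urls() {
  url="${1%/}"              # drop a trailing slash, if any
  scheme="${url%%://*}"     # e.g. https
  rest="${url#*://}"        # host[/path]
  host="${rest%%/*}"        # e.g. kotlinlang.org
  origin="${scheme}://${host}"
  printf '%s/sitemap.xml\n' "$url"
  if [ "$origin" != "$url" ]; then
    printf '%s/sitemap.xml\n' "$origin"
  fi
}

probe_urls "https://kotlinlang.org/docs"
```

For a bare origin such as https://example.com, both candidates collapse to the same URL and only one probe is emitted.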
Probe 3 — robots.txt (different format, two-step):
3. {origin}/robots.txt — fetch and parse for Sitemap: lines; use the first match.

```
curl -sfL --compressed --connect-timeout 5 --max-time 10 <origin>/robots.txt
```
If a Sitemap: line is found, use that URL and proceed to the Extraction section.
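Parsing the Sitemap: line can be done with grep/sed alone; a sketch, assuming robots.txt text arrives on stdin (the helper name `first_sitemap` and the sample content are illustrative):

```shell
#!/bin/sh
# Print the first Sitemap: URL from robots.txt text on stdin.
# The directive name is matched case-insensitively; if no
# directive is present, output is empty.
first_sitemap() {
  tr -d '\r' \
    | grep -i '^[[:space:]]*sitemap:' \
    | head -n 1 \
    | sed 's/^[^:]*:[[:space:]]*//'
}

# Example robots.txt content:
printf 'User-agent: *\nSitemap: https://example.com/sitemap.xml\n' | first_sitemap
```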
If none of the probes succeed, report an error to the user and stop:
Could not auto-discover a sitemap for <url>. Try providing the direct sitemap XML URL instead.
When a sitemap is discovered (not passed directly), print which URL was found before proceeding:
Sitemap found: <discovered-url>
If URLs were already extracted during auto-discovery, skip this entire section. If a filter pattern ($1) is provided, apply it to the already-extracted URLs in memory — do not re-fetch.
Run the following bash command to extract URLs from the sitemap:
```
curl -sfL --compressed --connect-timeout 10 --max-time 30 <sitemap-url> | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//'
```
If a pattern is provided, pipe the result through grep -E '<pattern>' to filter:
```
curl -sfL --compressed --connect-timeout 10 --max-time 30 <sitemap-url> | grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//' | grep -E '<pattern>'
```
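The grep/sed extraction and the optional filter can be verified against a local sitemap fragment with no network access; a sketch (the function name `extract_locs` and the sample XML are illustrative):

```shell
#!/bin/sh
# Extract <loc> values from sitemap XML on stdin, mirroring the
# grep/sed pipeline above; $1 (optional) is an extended-regex filter.
extract_locs() {
  if [ -n "${1:-}" ]; then
    grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//' | grep -E "$1"
  else
    grep -oE '<loc>[^<]+</loc>' | sed 's/<loc>//;s/<\/loc>//'
  fi
}

# Check against a local fragment, filtering for "docs":
printf '<urlset><url><loc>https://example.com/docs/a</loc></url><url><loc>https://example.com/blog/b</loc></url></urlset>\n' \
  | extract_locs docs
```

Note that `grep -oE` emits each match on its own line even when the whole sitemap arrives as a single line, which is why the pipeline works on minified XML.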
curl flags explained:
- -s: silent mode (no progress bar)
- -f: fail on HTTP errors (4xx/5xx) instead of returning the error page as content
- -L: follow redirects
- --compressed: handle gzip-compressed sitemaps
- --connect-timeout 10: connection timeout of 10 seconds
- --max-time 30: total operation timeout of 30 seconds

If the curl command fails (non-zero exit code), report the error clearly to the user (e.g., "Failed to fetch sitemap: connection timed out" or "Failed to fetch sitemap: HTTP 404").
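Mapping curl's exit code to a user-facing message can be a simple case statement; a sketch using curl's documented exit codes (the function name and message wording are illustrative):

```shell
#!/bin/sh
# Translate a curl exit code into a user-facing error message.
# Exit codes per curl's documentation: 6 = could not resolve host,
# 22 = HTTP error returned with -f, 28 = operation timed out.
curl_error_message() {
  case "$1" in
    0)  echo "OK" ;;
    6)  echo "Failed to fetch sitemap: could not resolve host" ;;
    22) echo "Failed to fetch sitemap: HTTP error (4xx/5xx)" ;;
    28) echo "Failed to fetch sitemap: connection timed out" ;;
    *)  echo "Failed to fetch sitemap: curl exit code $1" ;;
  esac
}

curl_error_message 28
```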
Never re-fetch: all URLs have already been fetched. If the user later asks to save the results to a file, use the Write tool with the already-displayed output. Never run curl again for the same sitemap.
- /fetch-sitemap https://kotlinlang.org/docs — auto-discover sitemap and list all URLs
- /fetch-sitemap https://example.com/sitemap.xml — use direct sitemap URL
- /fetch-sitemap https://example.com docs — auto-discover and filter URLs containing "docs"
- /fetch-sitemap https://example.com/sitemap.xml 'skills|hooks' — URLs matching "skills" or "hooks"