From claude-seo
Audits technical SEO across 9 categories: crawlability, indexability, security, URL structure, mobile, Core Web Vitals, structured data, JS rendering, IndexNow. Provide site URL.
npx claudepluginhub agricidaniel/claude-seo --plugin claude-seoThis skill uses the workspace's default tool permissions.
- robots.txt: exists, valid, not blocking important resources
Guides strict Test-Driven Development (TDD): write failing tests first for features, bugfixes, refactors before any production code. Enforces red-green-refactor cycle.
Guides systematic root cause investigation for bugs, test failures, unexpected behavior, performance issues, and build failures before proposing fixes.
Guides A/B test setup with mandatory gates for hypothesis validation, metrics definition, sample size calculation, and execution readiness checks.
As of 2025-2026, AI companies actively crawl the web to train models and power AI search. Managing these crawlers via robots.txt is a critical technical SEO consideration.
Known AI crawlers:
| Crawler | Company | robots.txt token | Purpose |
|---|---|---|---|
| GPTBot | OpenAI | GPTBot | Model training |
| ChatGPT-User | OpenAI | ChatGPT-User | Real-time browsing |
| ClaudeBot | Anthropic | ClaudeBot | Model training |
| PerplexityBot | Perplexity | PerplexityBot | Search index + training |
| Bytespider | ByteDance | Bytespider | Model training |
| Google-Extended | Google-Extended | Gemini training (NOT search) | |
| CCBot | Common Crawl | CCBot | Open dataset |
Key distinctions:
Google-Extended prevents Gemini training use but does NOT affect Google Search indexing or AI Overviews (those use Googlebot)GPTBot prevents OpenAI training but does NOT prevent ChatGPT from citing your content via browsing (ChatGPT-User)Example, selective AI crawler blocking:
# Allow search indexing, block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
# Allow all other crawlers (including Googlebot for search)
User-agent: *
Allow: /
Recommendation: Consider your AI visibility strategy before blocking. Being cited by AI systems drives brand awareness and referral traffic. Cross-reference the seo-geo skill for full AI visibility optimization.
Google updated its JavaScript SEO documentation in December 2025 with critical clarifications:
<meta name="robots" content="noindex"> but JavaScript removes it, Google MAY still honor the noindex from raw HTML. Serve correct robots directives in the initial HTML response.Best practice: Serve critical SEO elements (canonical, meta robots, structured data, title, meta description) in the initial server-rendered HTML rather than relying on JavaScript injection.
| Category | Status | Score |
|---|---|---|
| Crawlability | pass/warn/fail | XX/100 |
| Indexability | pass/warn/fail | XX/100 |
| Security | pass/warn/fail | XX/100 |
| URL Structure | pass/warn/fail | XX/100 |
| Mobile | pass/warn/fail | XX/100 |
| Core Web Vitals | pass/warn/fail | XX/100 |
| Structured Data | pass/warn/fail | XX/100 |
| JS Rendering | pass/warn/fail | XX/100 |
| IndexNow | pass/warn/fail | XX/100 |
If DataForSEO MCP tools are available, use on_page_instant_pages for real page analysis (status codes, page timing, broken links, on-page checks), on_page_lighthouse for Lighthouse audits (performance, accessibility, SEO scores), and domain_analytics_technologies_domain_technologies for technology stack detection.
If Google API credentials are configured, use python scripts/pagespeed_check.py <url> --json for real PSI + CrUX field data (replaces lab-only CWV estimates), python scripts/crux_history.py <url> --json for 25-week CWV trends, and python scripts/gsc_inspect.py <url> --json for real indexation status per URL.
| Scenario | Action |
|---|---|
| URL unreachable | Report connection error with status code. Suggest verifying URL, checking DNS resolution, and confirming the site is publicly accessible. |
| robots.txt not found | Note that no robots.txt was detected at the root domain. Recommend creating one with appropriate directives. Continue audit on remaining categories. |
| HTTPS not configured | Flag as a critical issue. Report whether HTTP is served without redirect, mixed content exists, or SSL certificate is missing/expired. |
| Core Web Vitals data unavailable | Note that CrUX data is not available (common for low-traffic sites). Suggest using Lighthouse lab data as a proxy and recommend increasing traffic before re-testing. |