From claude-ecosystem
Manages Claude documentation: local index search/discovery/resolution by keywords/tags/natural language, sitemap scraping, metadata/alias handling, drift detection.
npx claudepluginhub melodic-software/claude-code-plugins --plugin claude-ecosystem
STOP - Before using this skill for ANY Claude Code documentation query:
IF YOU ARE THE MAIN AGENT, you MUST invoke BOTH sources in the same message:
- This skill (docs-management) - local cache, token-efficient
- claude-code-guide subagent - live web search
[Skill tool: docs-management]
"Find documentation about {topic}"
[Task tool: claude-code-guide] (SAME MESSAGE - USE THIS EXACT PROMPT)
"First WebFetch https://code.claude.com/docs/en/claude_code_docs_map.md to find relevant doc pages about {topic}. Then WebFetch those specific pages. Use WebSearch only if needed for additional context. Do NOT use Skill tool (not available). Return key findings with source URLs."
⚠️ CRITICAL: claude-code-guide does NOT have Skill tool access. Always prompt it to use WebSearch/WebFetch explicitly. If you see a "No such tool: Skill" error, you prompted it wrong.
This is AUTOMATIC. Do NOT wait for the user to ask for it.
IF YOU ARE A SUBAGENT: Note in your response that the main agent should also query claude-code-guide.
ABSOLUTE PROHIBITION: NEVER use cd with && in PowerShell when running scripts from this skill.
The Problem: If your current working directory is already inside the skill directory, using relative paths causes PowerShell to resolve paths relative to the current directory instead of the repository root, resulting in path doubling.
REQUIRED Solutions (choose one):
- Always run scripts from the repository root using relative paths
- Use helper scripts that handle path resolution automatically
NEVER DO THIS:
- cd with &&: cd <relative-path> && python <script> causes path doubling
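The path doubling described above can be illustrated with plain path arithmetic. This is a simplified model, not PowerShell's actual resolution logic, and the repository root /repo is a made-up location used only for the example.

```python
from pathlib import PurePosixPath

# Hypothetical repository root, used only for illustration.
REPO_ROOT = PurePosixPath("/repo")
SKILL_DIR = REPO_ROOT / ".claude/skills/docs-management"

# Script path written relative to the repository root, as in this document.
SCRIPT = ".claude/skills/docs-management/scripts/core/find_docs.py"


def resolve(cwd: PurePosixPath, relative: str) -> str:
    """Resolve a relative path against the current working directory."""
    return str(cwd / relative)


# Running from the repository root gives the intended path.
from_root = resolve(REPO_ROOT, SCRIPT)

# Running from inside the skill directory repeats the skill prefix.
from_skill_dir = resolve(SKILL_DIR, SCRIPT)

print(from_root)
print(from_skill_dir)
```

The second path contains the skill directory twice, which is why the same relative path only works when the working directory is the repository root.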
The index.yaml file exceeds 25,000 tokens, so reading it directly will ALWAYS fail. You MUST use scripts.
✅ REQUIRED: ALWAYS use manage_index.py scripts for ANY index.yaml access:
python scripts/management/manage_index.py count
python scripts/management/manage_index.py list
python scripts/management/manage_index.py get <doc_id>
python scripts/management/manage_index.py verify
All scripts automatically handle large files via index_manager.py.
Use the consolidated docs-ops skill for common workflows:
- /claude-ecosystem:docs-ops scrape - Scrape all configured Claude documentation sources, then refresh index and validate
- /claude-ecosystem:docs-ops refresh - Refresh the local index and metadata without scraping from remote sources
- /claude-ecosystem:docs-ops validate - Validate the index and references for consistency and drift without scraping
- /claude-ecosystem:docs-ops rebuild-index - Force rebuild the search index from filesystem
- /claude-ecosystem:docs-ops clear-cache - Clear the documentation search cache
This skill provides automation tooling for documentation management.
Core value: Prevents link rot, enables offline access, optimizes token costs, automates maintenance, and provides resilient doc_id-based references.
| I want to... | Command | Example |
|---|---|---|
| Find docs by keywords | search | search skills progressive |
| Find docs by natural language | query | query "how to create skills" |
| Get full document content | content | content code-claude-com-docs-en-skills |
| Get specific section | content --section | content <doc_id> --section "metadata" |
| Resolve doc_id to path | resolve | resolve code-claude-com-docs-en-skills |
| Find related docs | related | related code-claude-com-docs-en-skills |
| List docs by category | category | category api |
| List docs by tag | tag | tag skills |
Key Workflow: search by keywords → Get doc_id from results → content by doc_id
Important distinctions:
- search and query accept flexible keywords (no full doc_id needed)
- content, resolve, and related require the full doc_id (e.g., code-claude-com-docs-en-skills)
CRITICAL: This section is the authoritative source for Claude Code documentation access patterns.
Skills cannot spawn subagents. Only the main conversation thread can use the Task tool.
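This means:
- The main agent must spawn claude-code-guide in parallel with this skill
- A subagent can only recommend that the main agent query claude-code-guide for live coverage
If you are the main agent:
- Invoke this skill and spawn the claude-code-guide subagent in the SAME message
If you are a subagent:
- Note in your response that the main agent should also query claude-code-guide; you cannot spawn claude-code-guide yourself (architectural constraint)
⚠️ MANDATORY DEFAULT BEHAVIOR - NOT OPTIONAL: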
When ANY Claude Code documentation query is detected, the main agent MUST automatically:
1. Invoke the docs-management skill (local cache)
2. Spawn the claude-code-guide subagent (live web) in the same message
This is automatic. The user does NOT need to ask for it.
Use both sources automatically whenever the user asks about any Claude Code topic:
# Main agent sends BOTH in a single message (AUTOMATIC):
[Skill tool: docs-management]
"Find documentation about {topic}"
[Task tool: claude-code-guide] (same message = parallel execution)
"First WebFetch https://code.claude.com/docs/en/claude_code_docs_map.md to find
relevant doc pages about {topic}. Then WebFetch those specific pages. Use WebSearch
only if needed for additional context. Do NOT use Skill tool (not available).
Return key findings with source URLs."
IMPORTANT: claude-code-guide is a built-in subagent with tools: Glob, Grep, Read, WebFetch, WebSearch.
It does NOT have the Skill tool - it's designed for web search, not local skill invocation.
Always prompt it to use WebSearch/WebFetch explicitly.
After both complete, synthesize the results from the two sources.
| Source | Invoke Via | Strengths |
|---|---|---|
| docs-management (this skill) | Skill tool | Fast, token-efficient (60-90% savings), hierarchical categories, offline |
| claude-code-guide | Task tool | Always current, web search, fetches live URLs |

| User Query | What Happens (Automatic) |
|---|---|
| "How do hooks work?" | Both sources invoked automatically |
| "What's the CLAUDE.md syntax?" | Both sources invoked automatically |
| "Help me set up MCP" | Both sources invoked automatically |
| Any Claude Code topic | Both sources invoked automatically |
There is no manual trigger. This is default behavior for Claude Code documentation queries.
When troubleshooting errors, bugs, or unexpected behavior, use three sources in parallel:
| Source | Agent/Skill | Purpose |
|---|---|---|
| Official Docs | docs-management skill | Correct usage, configuration |
| GitHub Issues | claude-code-issue-researcher agent | Known bugs, workarounds |
| Live Web | claude-code-guide subagent | Current discussions |
Troubleshooting triggers (automatically detected by hook):
Example prompt for claude-code-issue-researcher:
Search GitHub issues in anthropics/claude-code for: [ERROR/PROBLEM DESCRIPTION].
Check both open and closed issues. Report issue numbers, status, and any workarounds.
When spawning docs-researcher (or any subagent that uses this skill), the main agent should ALSO spawn claude-code-guide in the same message for comprehensive coverage:
# Main agent spawns BOTH in a single message:
[Task tool: docs-researcher subagent]
"Research Claude Code memory/CLAUDE.md files"
[Task tool: claude-code-guide subagent] (same message = parallel)
"Use WebSearch to find current Claude Code documentation about memory and
CLAUDE.md files on code.claude.com. Return key findings with URLs."
# After both complete, synthesize results
Why both? The docs-researcher uses local cache (fast, token-efficient), while claude-code-guide searches live web (always current). Together they provide comprehensive coverage.
The index includes category hierarchy from the official Claude Code docs map:
| Category | Topics |
|---|---|
| Getting started | overview, quickstart, common-workflows |
| Build with Claude Code | sub-agents, plugins, skills, hooks, mcp, output-styles |
| Deployment | amazon-bedrock, google-vertex-ai, sandboxing |
| Administration | setup, iam, security, costs |
| Configuration | settings, vs-code, jetbrains, memory |
| Reference | cli-reference, slash-commands, hooks |
| Resources | troubleshooting, legal-and-compliance |
Categories are stored in doc_map_category field. Query by category:
resolver.get_by_category("Build with Claude Code") # Returns all docs in category
resolver.list_categories() # Returns all categories with counts
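As a rough illustration of what category queries involve, the sketch below groups index entries by their doc_map_category field. The entry layout, the helper names, and the hooks and quickstart doc_ids are assumptions made for this example; the skill's actual resolver may store and query the index differently.

```python
from collections import Counter

# Hypothetical index entries; only the doc_map_category field and the
# skills doc_id are taken from this document.
INDEX = {
    "code-claude-com-docs-en-skills": {"doc_map_category": "Build with Claude Code"},
    "code-claude-com-docs-en-hooks": {"doc_map_category": "Build with Claude Code"},
    "code-claude-com-docs-en-quickstart": {"doc_map_category": "Getting started"},
}


def docs_in_category(index: dict, category: str) -> list:
    """Return the sorted doc_ids whose doc_map_category equals the given name."""
    return sorted(
        doc_id
        for doc_id, meta in index.items()
        if meta.get("doc_map_category") == category
    )


def category_counts(index: dict) -> dict:
    """Return a mapping of category name to number of documents."""
    return dict(Counter(meta.get("doc_map_category") for meta in index.values()))


print(docs_in_category(INDEX, "Build with Claude Code"))
print(category_counts(INDEX))
```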
After scraping, update categories from the official docs map:
python scripts/core/enrich_categories.py # Update categories
python scripts/core/enrich_categories.py --dry-run # Preview changes
CRITICAL: This section defines HOW to execute operations in this skill.
For ALL scraping, validation, and index operations, delegate execution to a general-purpose Task agent.
How to invoke:
Use the Task tool with:
- subagent_type: "general-purpose"
- description: Short 3-5 word description
- prompt: Full task description with execution instructions
Scripts run in FOREGROUND by default. Do NOT background them.
When Task agents execute scripts:
- Run the script directly, for example: python .claude/skills/docs-management/scripts/core/scrape_all_sources.py --parallel --skip-existing
- Do NOT use run_in_background=true: Scripts are designed for foreground execution
- No &, no nohup, no background process management
Red flags indicating incorrect execution:
🚩 Using run_in_background=true in Bash tool
🚩 Repeated BashOutput calls in a loop
🚩 Checking process status with ps or pgrep
🚩 Manual polling of script output
🚩 Background job management (&, nohup, jobs)
🚩 Using BashOutput AFTER Task agent completes ← CRITICAL RED FLAG
If you recognize these patterns, STOP and correct immediately.
CRITICAL: When the Task agent reports "Done", READ its report and summarize to the user. DO NOT use BashOutput.
Correct workflow: wait for the Task agent to finish, read its report, and summarize it to the user.
CRITICAL: When executing scripts via Task agents, report ALL errors, warnings, and issues - never suppress or ignore them.
Red flags that indicate issues:
🚩 Non-zero exit code
🚩 Lines containing "ERROR", "FAILED", "Exception", "Traceback"
🚩 "WARNING" or "WARN" messages
🚩 "404 Not Found", "500 Internal Server Error"
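One way to apply this checklist mechanically is to scan a script's exit code and output for the markers listed above. The function below is a minimal sketch under that assumption; find_red_flags is a hypothetical helper, not part of the skill's scripts.

```python
import re

# Markers copied from the red-flag list above. "WARN" also matches "WARNING".
RED_FLAG_PATTERN = re.compile(
    r"ERROR|FAILED|Exception|Traceback|WARN|404 Not Found|500 Internal Server Error"
)


def find_red_flags(exit_code: int, output: str) -> list:
    """Collect everything in a script run that should be reported to the user."""
    flags = []
    if exit_code != 0:
        flags.append(f"non-zero exit code: {exit_code}")
    for line in output.splitlines():
        if RED_FLAG_PATTERN.search(line):
            flags.append(line.strip())
    return flags


sample_output = "Scraped 10 pages\nWARNING: slow response\nERROR: 404 Not Found for /x\n"
print(find_red_flags(0, sample_output))
```

An empty list means none of the listed markers appeared; it does not prove the run was correct, so the agent's report should still be read in full.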
CRITICAL: When reporting scraping results, distinguish behavior by domain.
Domain-Specific .md URL Behavior:
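- Claude documentation domains (e.g., docs.claude.com): direct .md URLs are fetched successfully
- anthropic.com: pages are fetched as HTML and converted (try_markdown: false)
Accurate Reporting: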
✅ Good (Domain-Specific): "docs.claude.com: 97 URLs using direct .md (97 skipped/unchanged). anthropic.com: 164 URLs using HTML conversion (158 skipped/unchanged)."
❌ Bad (Misleading): "All .md URL attempts returned 404 (expected - these are HTML pages)" ← This is misleading because Claude domains successfully use .md URLs
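A per-domain summary like the good example above can be assembled from simple counters. The sketch below assumes a hypothetical DomainStats record; the numbers are the ones quoted in the example, not real scrape results.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    """Hypothetical per-domain scrape counters."""
    domain: str
    method: str   # e.g. "direct .md" or "HTML conversion"
    total: int    # URLs processed for this domain
    skipped: int  # URLs skipped because they were unchanged


def format_report(stats: list) -> str:
    """Build one sentence per domain so fetch behavior is never mixed together."""
    return " ".join(
        f"{s.domain}: {s.total} URLs using {s.method} ({s.skipped} skipped/unchanged)."
        for s in stats
    )


report = format_report([
    DomainStats("docs.claude.com", "direct .md", 97, 97),
    DomainStats("anthropic.com", "HTML conversion", 164, 158),
])
print(report)
```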
Use this when you want to rebuild and validate the local index/metadata without scraping:
⚠️ IMPORTANT: Use Python 3.13 for validation - spaCy/Pydantic have compatibility issues with Python 3.14+
# Use Python 3.13 for full compatibility with spaCy
py -3.13 .claude/skills/docs-management/scripts/management/refresh_index.py
Optional flags:
# Check for missing files before rebuilding
py -3.13 .claude/skills/docs-management/scripts/management/refresh_index.py --check-missing-files
# Detect drift (404s, missing files) after rebuilding
py -3.13 .claude/skills/docs-management/scripts/management/refresh_index.py --check-drift
# Detect and automatically cleanup drift
py -3.13 .claude/skills/docs-management/scripts/management/refresh_index.py --check-drift --cleanup-drift
This script runs the full pipeline:
Expected runtime: ~20-30 seconds for ~500 documents
Use this workflow (/scrape-official-docs) when the user explicitly wants to hit the network and scrape docs:
# Step 1: Scrape documentation (Python 3.14+ works)
python .claude/skills/docs-management/scripts/core/scrape_all_sources.py \
--parallel \
--skip-existing
# Step 2: IMMEDIATELY run validation after scraping completes
# ⚠️ Use Python 3.13 for validation (spaCy compatibility)
py -3.13 .claude/skills/docs-management/scripts/management/refresh_index.py
# Step 3: Clean up aged-out Anthropic articles (reads max_age from sources.json)
python .claude/skills/docs-management/scripts/maintenance/cleanup_old_anthropic_docs.py --execute
Because --auto-validate now defaults to False (for speed), you MUST run validation and cleanup separately, immediately after scraping.
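The three steps above can also be driven from a small foreground runner. This is a sketch, not part of the skill: the commands are copied from the steps above, while run_pipeline, run_foreground, and the choice to stop at the first failing step are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

SCRIPTS = ".claude/skills/docs-management/scripts"

# Commands copied from the steps above; "py -3.13" is the Windows Python launcher.
PIPELINE: List[List[str]] = [
    ["python", f"{SCRIPTS}/core/scrape_all_sources.py", "--parallel", "--skip-existing"],
    ["py", "-3.13", f"{SCRIPTS}/management/refresh_index.py"],
    ["python", f"{SCRIPTS}/maintenance/cleanup_old_anthropic_docs.py", "--execute"],
]


def run_foreground(cmd: Sequence[str]) -> int:
    """Run one command in the foreground and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = run_foreground,
) -> List[int]:
    """Run steps in order and stop at the first non-zero exit code (an assumption)."""
    codes = []
    for cmd in steps:
        code = run(cmd)
        codes.append(code)
        if code != 0:
            break
    return codes
```

Calling run_pipeline(PIPELINE) from the repository root would execute the real scripts, so the runner takes an injectable run callable that makes it possible to dry-run the sequence first.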
Optional: Detect and cleanup drift after scraping:
# Auto-cleanup workflow (detect and cleanup in one flag)
python .claude/skills/docs-management/scripts/core/scrape_all_sources.py \
--parallel \
--skip-existing \
--auto-cleanup
# Then validate (use Python 3.13)
py -3.13 .claude/skills/docs-management/scripts/management/refresh_index.py
⚠️ CRITICAL: Global flags MUST come BEFORE the subcommand:
# ✅ CORRECT - global flags before subcommand
python find_docs.py --json --limit 10 search skills frontmatter
# ❌ WRONG - global flags after subcommand (will fail with "unrecognized arguments")
python find_docs.py search skills frontmatter --json
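The "unrecognized arguments" message is what argparse reports when a flag defined on the top-level parser appears after a subcommand. The toy parser below reproduces that behavior; it assumes find_docs.py is built on argparse subparsers, which this document does not confirm, and it defines only two of the global flags.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    """Toy parser: global flags on the top-level parser, one 'search' subcommand."""
    parser = argparse.ArgumentParser(prog="find_docs.py")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--limit", type=int, default=25)
    subparsers = parser.add_subparsers(dest="command", required=True)
    search = subparsers.add_parser("search")
    search.add_argument("keywords", nargs="+")
    return parser


def parses_ok(argv: list) -> bool:
    """Return True if argv parses, False if argparse rejects it."""
    try:
        # argparse prints its error to stderr before exiting; silence it here.
        with contextlib.redirect_stderr(io.StringIO()):
            build_parser().parse_args(argv)
        return True
    except SystemExit:
        return False


print(parses_ok(["--json", "--limit", "10", "search", "skills", "frontmatter"]))
print(parses_ok(["search", "skills", "frontmatter", "--json"]))
```

In the second call the subcommand receives --json, does not recognize it, and the top-level parser then rejects the leftover argument.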
Examples:
# Resolve doc_id to file path
python .claude/skills/docs-management/scripts/core/find_docs.py resolve <doc_id>
# Search by keywords (default: 25 results)
python .claude/skills/docs-management/scripts/core/find_docs.py search skills progressive-disclosure
# Search with custom limit (global options come before subcommand)
python .claude/skills/docs-management/scripts/core/find_docs.py --limit 10 search skills
# Search without limit (returns all matching results)
python .claude/skills/docs-management/scripts/core/find_docs.py --no-limit search skills
# Search with minimum score threshold (filters low-relevance results)
python .claude/skills/docs-management/scripts/core/find_docs.py --min-score 20 search skills
# Natural language search
python .claude/skills/docs-management/scripts/core/find_docs.py query "how to create skills"
# List by category
python .claude/skills/docs-management/scripts/core/find_docs.py category api
# List by tag
python .claude/skills/docs-management/scripts/core/find_docs.py tag skills
Search Accepts Flexible Keywords (no full doc_ids needed):
- search skills
- search skills progressive disclosure
- search skill matches "skills"
Search Options:
| Option | Default | Description |
|---|---|---|
| --limit N | 25 | Maximum number of results to return |
| --no-limit | - | Return all matching results (no limit) |
| --min-score N | - | Only return results with relevance score >= N |
| --fast | - | Index-only search (skip content grep) |
| --separate | - | Show index matches and content matches separately |
| --no-context | - | Hide grep context lines in content matches |
| --clear-cache | - | Rebuild search cache before operation |
| --category | - | Filter results by category |
| --tags | - | Filter results by tags |
| --json | - | Output results as JSON |
| --verbose | - | Show relevance scores for debugging |
When results are truncated, output shows "showing X of Y total" to indicate more results are available.
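The interaction between --min-score, --limit, and the truncation notice might look like the sketch below. The function name, the (doc_id, score) tuple layout, and the decision to count "Y total" after score filtering are assumptions; only the default limit of 25 and the notice wording come from this document.

```python
from typing import List, Optional, Tuple

Result = Tuple[str, float]  # hypothetical (doc_id, relevance score) pair


def select_results(
    scored: List[Result],
    limit: Optional[int] = 25,
    min_score: Optional[float] = None,
) -> Tuple[List[Result], str]:
    """Filter by score, sort best-first, apply the limit, and describe truncation."""
    matches = [r for r in scored if min_score is None or r[1] >= min_score]
    matches.sort(key=lambda r: r[1], reverse=True)
    shown = matches if limit is None else matches[:limit]
    note = ""
    if len(shown) < len(matches):
        note = f"showing {len(shown)} of {len(matches)} total"
    return shown, note


results = [("a", 30), ("b", 10), ("c", 25)]
print(select_results(results, limit=1, min_score=20))
print(select_results(results, limit=None))
```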
By default, search performs both index and content search.
Result markers:
- [SUBSECTION] - Match found in a specific document section (click to extract)
- [CONTENT] - Match found in file content (not in metadata index)
Performance options:
- Use --fast for index-only search (faster, but may miss content-only matches)
- Use --separate to display index and content matches in separate sections
Results are ranked by a multi-factor scoring system.
Use --verbose to see scores for debugging relevance issues.
| Command | Purpose | Subsection Support |
|---|---|---|
| resolve <doc_id> | Resolve doc_id to file path | ❌ No |
| content <doc_id> | Get document content | ✅ Yes (--section) |
| search <keywords> | Keyword search | ❌ No |
| query "<text>" | Natural language search | ❌ No |
| category <name> | List docs in category | ❌ No |
| tag <name> | List docs with tag | ❌ No |
| related <doc_id> | Find related documents | ❌ No |
Related Documents Example:
# Find documents related to skills documentation
python .claude/skills/docs-management/scripts/core/find_docs.py related code-claude-com-docs-en-skills
# Limit results
python .claude/skills/docs-management/scripts/core/find_docs.py --limit 5 related code-claude-com-docs-en-skills
Related documents are scored by shared tags (3x weight) and shared keywords (2x weight).
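As a minimal sketch of that weighting, assuming the two factors are simply added together (the document gives the weights but not the exact formula), and using made-up tag and keyword lists:

```python
TAG_WEIGHT = 3      # weight for each shared tag, as stated above
KEYWORD_WEIGHT = 2  # weight for each shared keyword, as stated above


def related_score(doc_a: dict, doc_b: dict) -> int:
    """Score two documents by their shared tags and shared keywords."""
    shared_tags = set(doc_a["tags"]) & set(doc_b["tags"])
    shared_keywords = set(doc_a["keywords"]) & set(doc_b["keywords"])
    return TAG_WEIGHT * len(shared_tags) + KEYWORD_WEIGHT * len(shared_keywords)


# Hypothetical metadata for two documents.
skills_doc = {"tags": ["skills", "plugins"], "keywords": ["frontmatter", "metadata", "tools"]}
hooks_doc = {"tags": ["hooks", "plugins"], "keywords": ["metadata", "events"]}

# One shared tag (3) plus one shared keyword (2).
print(related_score(skills_doc, hooks_doc))
```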
To extract a specific section from a document (60-90% token savings):
Option 1: Using find_docs.py content command:
python scripts/core/find_docs.py content code-claude-com-docs-en-skills --section "Available metadata fields"
Option 2: Using get_subsection_content.py (dedicated script):
python scripts/core/get_subsection_content.py code-claude-com-docs-en-skills \
--section "Available metadata fields"
Discover Available Sections (--list-sections):
# List all sections in a document
python scripts/core/get_subsection_content.py code-claude-com-docs-en-skills --list-sections
# Output shows hierarchical structure with heading levels
# # Agent Skills
# ## Create your first Skill
# ## How Skills work
# ### Where Skills live
Fuzzy Section Matching:
Section names support fuzzy matching - partial or word-overlap matches work:
# Exact: "Available metadata fields"
# Fuzzy: "metadata fields" -> matches "Available metadata fields"
# Fuzzy: "tool access" -> matches "Restrict tool access with allowed-tools"
python scripts/core/get_subsection_content.py code-claude-com-docs-en-skills --section "metadata fields"
# Output: Fuzzy match: 'metadata fields' -> 'Available metadata fields'
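One plausible way to implement this kind of matching is to try an exact match first, then a substring match, then the heading with the largest word overlap. The sketch below follows that order; it is an assumption about how such matching could work, not the logic of get_subsection_content.py.

```python
from typing import List, Optional


def match_section(query: str, headings: List[str]) -> Optional[str]:
    """Return the best-matching heading, or None if nothing overlaps."""
    q = query.lower()
    # 1. Exact match, ignoring case.
    for heading in headings:
        if heading.lower() == q:
            return heading
    # 2. Partial match: the query appears inside the heading.
    for heading in headings:
        if q in heading.lower():
            return heading
    # 3. Word overlap: pick the heading that shares the most words with the query.
    q_words = set(q.split())
    best, best_overlap = None, 0
    for heading in headings:
        overlap = len(q_words & set(heading.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = heading, overlap
    return best


HEADINGS = [
    "Available metadata fields",
    "Restrict tool access with allowed-tools",
    "How Skills work",
]
print(match_section("metadata fields", HEADINGS))
print(match_section("tool access", HEADINGS))
```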
Note: The resolve command ONLY returns file paths. Use content to get actual document content with optional section extraction.
The docs-management skill uses a unified configuration system with a single source of truth.
Configuration Files:
- config/defaults.yaml - Central configuration file with all default values
- config/config_registry.py - Canonical configuration system with environment variable support
- references/sources.json - Documentation sources configuration (required for scraping)
Path Configuration:
All paths configured in config/defaults.yaml under the paths section.
Environment Variable Overrides:
All configuration values can be overridden using environment variables: CLAUDE_DOCS_<SECTION>_<KEY>
Full details: references/technical-details.md#configuration
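The override pattern can be sketched as a lookup that checks the environment before falling back to the defaults. The search/limit setting, the upper-casing of section and key, and the get_setting helper are assumptions for illustration; config_registry.py may behave differently, for example by converting value types.

```python
import os
from typing import Any, Mapping, Optional

PREFIX = "CLAUDE_DOCS"

# Hypothetical defaults, standing in for values loaded from config/defaults.yaml.
DEFAULTS = {"search": {"limit": 25}}


def get_setting(
    defaults: Mapping[str, Mapping[str, Any]],
    section: str,
    key: str,
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the environment override if one is set, else the default value."""
    env = os.environ if env is None else env
    name = f"{PREFIX}_{section}_{key}".upper()
    if name in env:
        # Environment values are strings; no type conversion is attempted here.
        return env[name]
    return defaults[section][key]


print(get_setting(DEFAULTS, "search", "limit", env={}))
print(get_setting(DEFAULTS, "search", "limit", env={"CLAUDE_DOCS_SEARCH_LIMIT": "10"}))
```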
Required: pyyaml, requests, beautifulsoup4, markdownify
Optional (recommended): spacy, yake (for enhanced keyword extraction)
Quick setup:
python .claude/skills/docs-management/scripts/setup/setup_dependencies.py --install-required
Auto-installation: The extract-keywords command automatically installs optional dependencies if missing.
Full details: references/technical-details.md#dependencies
Fetch documentation from official sources and store in canonical storage. Features: sitemap/docs map parsing, HTML→Markdown conversion, direct .md URL fetching (30-40% token savings), automatic metadata tracking, domain-based folder organization.
Guide: references/capabilities/scraping-guide.md
Extract specific markdown sections for internal skill operations. Features: ATX-style heading structure parsing, section boundaries detection, provenance frontmatter, token economics (60-90% savings typical).
Guide: references/capabilities/extraction-guide.md
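Heading-based extraction of this kind can be sketched as: find the requested ATX heading, then collect lines until the next heading at the same or a higher level. The function below is a simplified illustration that matches titles exactly and does not skip fenced code blocks; it is not the skill's extraction script.

```python
import re
from typing import Optional

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def extract_section(markdown: str, title: str) -> Optional[str]:
    """Return the section under the given heading, including nested subsections."""
    lines = markdown.splitlines()
    start = None
    level = 0
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        if start is None:
            if match.group(2) == title:
                start, level = i, len(match.group(1))
        elif len(match.group(1)) <= level:
            # A heading at the same or a higher level ends the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return None
    return "\n".join(lines[start:]).strip()


DOC = (
    "# Agent Skills\nIntro\n"
    "## How Skills work\nBody\n"
    "### Where Skills live\nPaths\n"
    "## Create your first Skill\nSteps\n"
)
print(extract_section(DOC, "How Skills work"))
```

Returning only the requested section instead of the whole file is where the token savings come from.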
Detect new and removed documentation pages from sitemaps, and detect content changes via hash comparison. Features: new/removed page detection, content hash comparison, automatic stale marking, change reporting and audit logs.
Guide: references/capabilities/change-detection-guide.md
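The detection logic described above reduces to two set differences and one hash comparison. The sketch below assumes SHA-256 and a simple URL-to-hash mapping; the skill's real hash algorithm and metadata layout are not stated in this document.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    """Hash page content (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(stored: Dict[str, str], fetched: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare stored hashes with freshly fetched content.

    stored maps URL -> previously recorded hash; fetched maps URL -> current content.
    """
    new = sorted(set(fetched) - set(stored))
    removed = sorted(set(stored) - set(fetched))
    changed = sorted(
        url
        for url in set(stored) & set(fetched)
        if content_hash(fetched[url]) != stored[url]
    )
    return {"new": new, "removed": removed, "changed": changed}


stored = {"/a": content_hash("one"), "/b": content_hash("two"), "/c": content_hash("three")}
fetched = {"/a": "one", "/b": "two v2", "/d": "four"}
print(detect_changes(stored, fetched))
```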
Discover and resolve documentation references using doc_id, keywords, or natural language queries. Features: doc_id resolution, keyword-based search, natural language query processing, subsection discovery and extraction, category and tag filtering, alias resolution.
Maintain index metadata, keywords, tags, and rebuild index from filesystem. Scripts: manage_index.py, rebuild_index.py, generate_report.py, verify_index.py.
Remove documentation that has aged out based on published_at dates. Anthropic sources (engineering, news, research) have a max_age_days threshold configured in references/sources.json. Articles older than this threshold are skipped during scraping; this cleanup removes any previously-scraped articles that have since aged out.
# Clean up aged-out Anthropic articles (dry-run by default)
# Reads max_age_days from sources.json automatically
python scripts/maintenance/cleanup_old_anthropic_docs.py
# Execute cleanup (actually delete files)
python scripts/maintenance/cleanup_old_anthropic_docs.py --execute
# Override with custom age threshold if needed
python scripts/maintenance/cleanup_old_anthropic_docs.py --max-age 90 --execute
This should be run after scraping and validation to ensure the canonical directory stays clean.
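The age check itself is a date comparison against a cutoff. The sketch below assumes published_at is available as a date and that documents strictly older than the cutoff are removed; the doc_ids and the helper name are made up for the example.

```python
from datetime import date, timedelta
from typing import Dict, List


def aged_out(docs: Dict[str, date], max_age_days: int, today: date) -> List[str]:
    """Return doc_ids whose published_at date is older than the age threshold."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(doc_id for doc_id, published_at in docs.items() if published_at < cutoff)


# Hypothetical doc_ids mapped to their published_at dates.
docs = {"old-article": date(2025, 1, 1), "recent-article": date(2025, 12, 1)}
print(aged_out(docs, max_age_days=90, today=date(2025, 12, 29)))
```

A dry run corresponds to printing this list; the --execute flag is what actually deletes files.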
Common maintenance and operational workflows for documentation management:
Detailed Workflows: references/workflows.md
Lightweight audit:
py -3.13 .claude/skills/docs-management/scripts/validation/validate_index_vs_docs.py --summary-only
Tag configuration audit:
py -3.13 .claude/skills/docs-management/scripts/validation/audit_tag_config.py --summary-only
Full details: references/workflows.md#metadata--keyword-audit
On Windows, you MUST use PowerShell (recommended) or prefix Git Bash commands with MSYS_NO_PATHCONV=1.
Git Bash on Windows converts Unix-style paths to Windows paths, which breaks filter patterns.
Problem: spaCy installation fails with Python 3.14+.
Solution: The script automatically detects and uses Python 3.13 if available. No manual intervention needed!
If Python 3.13 is not available, install it:
- Windows: winget install --id Python.Python.3.13 -e --source winget
- macOS: brew install python@3.13
- Linux (Debian/Ubuntu): sudo apt install python3.13
Full troubleshooting: references/troubleshooting.md
The docs-management skill provides a clean public API for external tools:
from official_docs_api import (
find_document,
resolve_doc_id,
get_docs_by_tag,
get_docs_by_category,
search_by_keywords,
detect_drift,
cleanup_drift,
refresh_index
)
Full API documentation: See Public API section in original SKILL.md
For plugin-specific maintenance workflows (versioning, publishing updates, changelog):
See: references/plugin-maintenance.md
Quick reference:
- Run manage_index.py verify and test search before pushing
When developing this plugin locally, you may want changes to go to your dev repo instead of the installed plugin location. This skill supports explicit dev/prod mode separation via an environment variable.
By default, scripts write to wherever the plugin is installed (typically ~/.claude/plugins/marketplaces/...). When OFFICIAL_DOCS_DEV_ROOT is set to a valid skill directory, all paths resolve to that location instead.
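The mode switch can be pictured as a single lookup at startup. In this sketch, "valid skill directory" is approximated by checking that the path is an existing directory, and resolve_skill_dir is a hypothetical helper; the skill's own validation may be stricter.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple

ENV_VAR = "OFFICIAL_DOCS_DEV_ROOT"


def resolve_skill_dir(
    installed_dir: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Path]:
    """Return ("dev", dev_root) when the variable points at a directory, else ("prod", installed_dir)."""
    env = os.environ if env is None else env
    dev_root = env.get(ENV_VAR)
    if dev_root and Path(dev_root).is_dir():
        return "dev", Path(dev_root)
    return "prod", installed_dir


# Hypothetical installed location, used only for illustration.
INSTALLED = Path("/installed/docs-management")
print(resolve_skill_dir(INSTALLED, env={}))
```

Falling back to the installed location when the variable is unset or invalid keeps prod mode as the safe default.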
One-time setup:
# Navigate to your dev repo skill directory
cd /path/to/your/claude-code-plugins/plugins/claude-ecosystem/skills/docs-management
# Generate shell commands for your shell
python scripts/setup/enable_dev_mode.py
PowerShell:
$env:OFFICIAL_DOCS_DEV_ROOT = "D:\repos\gh\claude-code-plugins\plugins\claude-ecosystem\skills\docs-management"
Bash/Zsh:
export OFFICIAL_DOCS_DEV_ROOT="/path/to/claude-code-plugins/plugins/claude-ecosystem/skills/docs-management"
When you run any major script (scrape, refresh, rebuild), a mode banner will display:
Dev mode:
[DEV MODE] Using development skill directory:
D:\repos\gh\claude-code-plugins\plugins\claude-ecosystem\skills\docs-management
Set via: OFFICIAL_DOCS_DEV_ROOT
Canonical dir: D:\...\canonical
Prod mode:
[PROD MODE] Using installed skill directory
(Set OFFICIAL_DOCS_DEV_ROOT to enable dev mode)
- Set OFFICIAL_DOCS_DEV_ROOT in your terminal
- Review the resulting changes with git diff canonical/
To return to prod mode, unset the variable.
PowerShell:
Remove-Item Env:OFFICIAL_DOCS_DEV_ROOT
Bash/Zsh:
unset OFFICIAL_DOCS_DEV_ROOT
Full history: See original SKILL.md
Date: 2025-12-29 Model: claude-opus-4-5-20251101
Audit Result: ✅ EXCEPTIONAL PASS (A+) - Score: 50/50 (100%)
Audit Type: Type B (Meta-Skill - Delegation Pattern Compliance)
Status: Production-ready. Serves as the canonical reference implementation for Type B meta-skills.