Extracts distinct article URLs from Red Hat documentation TOC pages by parsing nav elements. Filters duplicates, sorts results, outputs JSON or list. Useful for crawling multi-page docs sections.
npx claudepluginhub redhat-documentation/redhat-docs-agent-tools --plugin docs-tools
This skill extracts distinct article URLs from Red Hat documentation table of contents (TOC) pages. It parses the navigation element to find all separate documentation articles, making it easy to process or download entire documentation sections.
A companion script in this plugin, article_extractor.py, downloads HTML from a URL and extracts the article content from <article> elements, stripping navigation and other bloat. It outputs clean HTML, Markdown, or plain text for documentation sites.
The skill uses a Python script that downloads a Red Hat docs page and parses its <nav id="toc"> element.
Extract article URLs from a docs index:
python3 scripts/toc_extractor.py --url "https://docs.redhat.com/en/documentation/product/version/html/guide/index"
Save to file:
python3 scripts/toc_extractor.py --url "https://docs.redhat.com/..." --output articles.json
List format output:
python3 scripts/toc_extractor.py --url "https://docs.redhat.com/..." --format list
--url URL: The Red Hat docs URL to extract the TOC from (required)
--output FILE: Save output to a file instead of stdout
--format {json,list}: Output format (default: json)

JSON (default):
{
"source_url": "https://docs.redhat.com/...",
"article_count": 3,
"articles": [
"https://docs.redhat.com/.../article1",
"https://docs.redhat.com/.../article2",
"https://docs.redhat.com/.../article3"
]
}
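Downstream tooling can parse this JSON directly. A minimal sketch in Python (the payload below is illustrative, mirroring the format above; in practice you would load it from the file passed to --output):

```python
import json

# Illustrative payload in the extractor's JSON output format;
# in practice, read this from the --output file instead.
payload = '''{
  "source_url": "https://docs.redhat.com/.../index",
  "article_count": 2,
  "articles": [
    "https://docs.redhat.com/.../article1",
    "https://docs.redhat.com/.../article2"
  ]
}'''

data = json.loads(payload)

# article_count should match the length of the articles list
assert data["article_count"] == len(data["articles"])

for url in data["articles"]:
    print(url)
```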
List:
https://docs.redhat.com/.../article1
https://docs.redhat.com/.../article2
https://docs.redhat.com/.../article3
# Extract all articles from the Configure guide
python3 scripts/toc_extractor.py \
--url "https://docs.redhat.com/en/documentation/red_hat_openshift_lightspeed/1.0/html/configure/index"
Output:
{
"source_url": "https://docs.redhat.com/en/documentation/red_hat_openshift_lightspeed/1.0/html/configure/index",
"article_count": 3,
"articles": [
"https://docs.redhat.com/en/documentation/red_hat_openshift_lightspeed/1.0/html/configure/legal-notice",
"https://docs.redhat.com/en/documentation/red_hat_openshift_lightspeed/1.0/html/configure/ols-configuring-openshift-lightspeed",
"https://docs.redhat.com/en/documentation/red_hat_openshift_lightspeed/1.0/html/configure/olsconfig-api"
]
}
Extract TOC URLs, then download each article:
# Step 1: Extract article URLs
python3 scripts/toc_extractor.py \
--url "https://docs.redhat.com/.../configure/index" \
--format list > /tmp/articles.txt
# Step 2: Download each article
while read -r url; do
filename=$(basename "$url").md
python3 scripts/article_extractor.py \
--url "$url" \
--format markdown \
--output "$filename"
done < /tmp/articles.txt
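The same workflow can be driven from Python instead of the shell loop. A sketch assuming the article_extractor.py flags shown above (`md_filename` and `download_all` are illustrative helper names, not part of the skill):

```python
import subprocess
from pathlib import Path

def md_filename(url: str) -> str:
    # basename of the URL plus .md, like `basename "$url"`.md in the shell loop
    return url.rstrip("/").rsplit("/", 1)[-1] + ".md"

def download_all(list_file: str = "/tmp/articles.txt") -> None:
    # Download each URL from the list file as a Markdown file,
    # invoking article_extractor.py with the flags shown above.
    for url in Path(list_file).read_text().splitlines():
        if url.strip():
            subprocess.run(
                ["python3", "scripts/article_extractor.py",
                 "--url", url, "--format", "markdown",
                 "--output", md_filename(url)],
                check=True,
            )

if __name__ == "__main__" and Path("/tmp/articles.txt").exists():
    download_all()
```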
This skill requires the following Python packages:
requests: For downloading HTML content
beautifulsoup4: For parsing HTML

Install dependencies:
python3 -m pip install requests beautifulsoup4
The script:
- Locates the <nav id="toc" class="table-of-contents"> element
- Collects all <a> tags within the TOC
- Skips fragment-only links (#section)
- Strips fragments from page links (page#section → page)

Red Hat documentation uses a consistent <nav id="toc"> structure:
<nav id="toc" class="table-of-contents" aria-label="Table of contents">
<ol id="toc-list">
<li class="item chapter">
<a class="link" href="/path/to/article">Article Title</a>
</li>
<li class="item chapter">
<details>
<summary>Chapter Title</summary>
<ol class="sub-nav">
<li class="item sub-chapter">
<a class="link" href="/path/to/subarticle">Subarticle</a>
</li>
</ol>
</details>
</li>
</ol>
</nav>
The script extracts all href attributes from links and filters for distinct article pages.
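The extraction and filtering can be sketched with BeautifulSoup (a simplified illustration under stated assumptions, not the actual toc_extractor.py source; the sample HTML mirrors the structure above):

```python
from urllib.parse import urljoin, urldefrag
from bs4 import BeautifulSoup

# Sample markup mirroring the Red Hat TOC structure shown above.
SAMPLE_HTML = """
<nav id="toc" class="table-of-contents" aria-label="Table of contents">
  <ol id="toc-list">
    <li class="item chapter">
      <a class="link" href="/path/to/article">Article Title</a>
    </li>
    <li class="item chapter">
      <a class="link" href="/path/to/article#section">Same page, with fragment</a>
    </li>
  </ol>
</nav>
"""

def extract_articles(html, base_url):
    """Return sorted, distinct article URLs from a Red Hat TOC page."""
    soup = BeautifulSoup(html, "html.parser")
    nav = soup.find("nav", id="toc")
    urls = set()
    for a in nav.find_all("a", href=True):
        # Strip any #fragment so page#section collapses to page
        url, _ = urldefrag(urljoin(base_url, a["href"]))
        if url and url != base_url:
            urls.add(url)
    return sorted(urls)

print(extract_articles(
    SAMPLE_HTML,
    "https://docs.redhat.com/en/documentation/product/1.0/html/guide/index",
))
```

Note how the two links, one with a #section fragment and one without, collapse to a single distinct article URL.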