Skill

doc-site-analysis

Systematic approach for analyzing API documentation site structure, discovering HTML patterns, framework signatures, and extraction strategies to inform scraper code generation.

npx claudepluginhub grailautomation/claude-plugins --plugin scraper-generator

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/scraper-generator:doc-site-analysis

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

When analyzing an API documentation page to understand its structure and extraction patterns, follow this systematic approach.

Supporting Files

examples/workato-analysis.md
references/extraction-strategies.md
references/framework-signatures.md
references/html-patterns.md

SKILL.md

170 lines · ~1.5k tokens


Stats

Language: TypeScript

Parent stars: 0

Maintenance: Good

Last Commit: Mar 3, 2026


Doc Site Analysis Skill

When analyzing an API documentation page to understand its structure and extraction patterns, follow this systematic approach.

Core Objective

Your goal is to discover how a documentation site organizes its API reference content so you can later write code to extract that content deterministically. You're not extracting data now—you're discovering the patterns that will inform scraper code generation.

Phase 1: Initial Reconnaissance

Fetch the target URL and observe the raw HTML structure. Look for:

Page framework signatures - Most doc sites use static site generators with distinctive patterns
Content organization - How is the page divided into sections?
Navigation structure - Sidebar, breadcrumbs, table of contents
Endpoint listing patterns - Tables, lists, or inline definitions

Document your observations before proceeding. The goal is to understand the "shape" of the documentation.
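One quick way to check for a framework signature is to read the page's generator meta tag, which static site generators such as VuePress and Docusaurus commonly emit. The sketch below uses only the Python standard library. The function name `detect_generator` and the sample HTML are illustrative assumptions, and a missing tag does not prove the site is custom-built.

```python
from html.parser import HTMLParser


class GeneratorMetaParser(HTMLParser):
    """Records the content of <meta name="generator" ...> if present."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "generator":
            self.generator = attr.get("content")


def detect_generator(html):
    """Return the generator string, or "unknown" when no tag is found."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    return parser.generator or "unknown"


# Hypothetical sample page head, used only for illustration.
sample = '<html><head><meta name="generator" content="VuePress 1.9.7"></head></html>'
print(detect_generator(sample))
```

Treat the result as one observation among several: class names, script bundles, and sidebar markup are worth recording alongside it.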

Phase 2: Identify the Index Pattern

API documentation typically provides an index of available endpoints. Common patterns:

Quick Reference Tables

Look for tables with columns like Method/Resource/Description or similar. These are the most valuable index source because they give you a complete list of endpoints in structured form.

<table>
  <tr><th>Type</th><th>Resource</th><th>Description</th></tr>
  <tr>
    <td>GET</td>
    <td><a href="#get-connection">connections/:connection_id</a></td>
    <td>Get connection details</td>
  </tr>
</table>

Key signals:

Table has HTTP method column (GET, POST, PUT, DELETE, PATCH)
Links contain anchor fragments (#heading-id) pointing to detail sections
Resource column contains URL paths with parameters
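The key signals above can be tested mechanically. The following is a minimal sketch, assuming the method sits in the first column and the linked resource in the second, as in the sample table. `IndexTableParser` and `extract_index` are hypothetical names, and only the standard library is used.

```python
from html.parser import HTMLParser

HTTP_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}


class IndexTableParser(HTMLParser):
    """Collects table rows as lists of (cell_text, first_href) pairs."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None  # [text_parts, href] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = [[], None]
        elif tag == "a" and self._cell is not None and self._cell[1] is None:
            self._cell[1] = dict(attrs).get("href")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(("".join(self._cell[0]).strip(), self._cell[1]))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def extract_index(html):
    """Return one entry per row whose first cell is an HTTP method."""
    parser = IndexTableParser()
    parser.feed(html)
    entries = []
    for row in parser.rows:
        if len(row) < 2 or row[0][0].upper() not in HTTP_METHODS:
            continue  # header row or non-endpoint row
        (method, _), (path, href) = row[0], row[1]
        anchor = href[1:] if href and href.startswith("#") else None
        entries.append({"method": method.upper(), "path": path, "anchor": anchor})
    return entries


sample = """
<table>
  <tr><th>Type</th><th>Resource</th><th>Description</th></tr>
  <tr>
    <td>GET</td>
    <td><a href="#get-connection">connections/:connection_id</a></td>
    <td>Get connection details</td>
  </tr>
</table>
"""
print(extract_index(sample))
```

If a site puts the method in a different column, the column positions here would need to come from the observed header row instead.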

Sidebar Navigation

Some sites list endpoints in the sidebar. Look for:

Nested lists under section headings
Links to anchors or separate pages
HTTP method badges or prefixes

Section Headings as Index

When no explicit index exists, the headings themselves form the index:

H2/H3 elements containing method + path patterns
Pattern: "GET /api/users" or "List Users (GET)"
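Both heading formats mentioned above can be recognized with two regular expressions. This is a sketch under the assumption that headings follow exactly one of those two shapes; `parse_heading` is a hypothetical helper, and real sites may need additional patterns.

```python
import re

METHODS = r"GET|POST|PUT|DELETE|PATCH"
# "GET /api/users"
METHOD_FIRST = re.compile(rf"^(?P<method>{METHODS})\s+(?P<path>/\S*)$")
# "List Users (GET)"
METHOD_LAST = re.compile(rf"^(?P<title>.+?)\s*\((?P<method>{METHODS})\)$")


def parse_heading(text):
    """Return method/path/title for an endpoint heading, or None otherwise."""
    text = text.strip()
    m = METHOD_FIRST.match(text)
    if m:
        return {"method": m["method"], "path": m["path"], "title": None}
    m = METHOD_LAST.match(text)
    if m:
        return {"method": m["method"], "path": None, "title": m["title"]}
    return None


for heading in ("GET /api/users", "List Users (GET)", "Overview"):
    print(heading, "->", parse_heading(heading))
```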

Phase 3: Map Section Boundaries

Once you know where endpoints are listed, understand how detail sections are organized:

Heading-based sections - Each endpoint gets an H2/H3, content until next heading belongs to it
Container-based sections - Each endpoint wrapped in a div with class or ID
Flat structure - All content flows sequentially, headings are the only markers

For heading-based sections (most common), note:

The heading level used (H2, H3)
Whether headings have IDs (essential for anchor linking)
The pattern in heading text (method first? path first? description?)
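The three notes above (level, IDs, text pattern) can be gathered in one pass over the page. The sketch below is illustrative only: `HeadingParser` and `summarize_headings` are assumed names, and it inspects H2/H3 elements by default.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collects the level, id, and text of each heading at the given levels."""

    def __init__(self, levels=("h2", "h3")):
        super().__init__()
        self.levels = levels
        self.headings = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.levels:
            self._current = {"level": tag, "id": dict(attrs).get("id"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current["level"]:
            self._current["text"] = self._current["text"].strip()
            self.headings.append(self._current)
            self._current = None


def summarize_headings(html):
    """Report which levels are used and which headings lack an explicit id."""
    parser = HeadingParser()
    parser.feed(html)
    levels = sorted({h["level"] for h in parser.headings})
    missing = [h["text"] for h in parser.headings if not h["id"]]
    return {"levels": levels, "count": len(parser.headings), "missing_ids": missing}


# Hypothetical page fragment for illustration.
sample = (
    '<h2 id="get-connection">Get connection details</h2><p>Returns one.</p>'
    "<h2>List connections</h2><p>Returns all.</p>"
)
print(summarize_headings(sample))
```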

Phase 4: Locate Content Elements

Within each endpoint section, identify where to find:

Parameters

Usually in tables with columns: Name, Type, Required, Description

<table>
  <tr><th>Name</th><th>Type</th><th>Required</th><th>Description</th></tr>
  <tr><td>id</td><td>integer</td><td>yes</td><td>User ID</td></tr>
</table>

Look for section subheadings like "Request Parameters", "Query Parameters", "Body Parameters"
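A parameter table like the one above can be turned into records keyed by its header row. This is a rough sketch: `TableParser`, `normalize_header`, and `parse_parameters` are hypothetical names, and the set of strings treated as "required" is an assumption that should be checked against the target site.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._text = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            self._row.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._text = None, None


def normalize_header(name):
    """Map variants such as 'Required?' and 'Required' to the same key."""
    return name.strip().rstrip("?").lower()


def parse_parameters(html):
    """Return one dict per body row, keyed by normalized header names."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    headers = [normalize_header(h) for h in parser.rows[0]]
    params = []
    for row in parser.rows[1:]:
        record = dict(zip(headers, row))
        if "required" in record:
            # Assumed truthy spellings; adjust to what the site actually uses.
            record["required"] = record["required"].lower() in ("yes", "true", "required")
        params.append(record)
    return params


sample = """
<table>
  <tr><th>Name</th><th>Type</th><th>Required?</th><th>Description</th></tr>
  <tr><td>id</td><td>integer</td><td>yes</td><td>User ID</td></tr>
</table>
"""
print(parse_parameters(sample))
```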

Request/Response Examples

Code blocks with language hints:

<pre><code class="language-json">{"name": "example"}</code></pre>
<pre><code class="language-curl">curl -X GET ...</code></pre>

Or syntax-highlighted divs:

<div class="highlight-json"><pre>...</pre></div>
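Both container styles shown above can be handled by one parser that reads the language from either the code element or a wrapping div. The following sketch assumes class names of the form `language-*` or `highlight-*`; `CodeBlockParser` and `extract_code_blocks` are illustrative names, not part of any existing tool.

```python
import re
from html.parser import HTMLParser

LANG_CLASS = re.compile(r"(?:language|highlight)-([\w+-]+)")


class CodeBlockParser(HTMLParser):
    """Collects the text of each <pre> block with its detected language."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._pending_lang = None  # language seen on a wrapping <div>
        self._in_pre = False
        self._lang = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        match = LANG_CLASS.search(dict(attrs).get("class") or "")
        lang = match.group(1) if match else None
        if tag == "div" and lang:
            self._pending_lang = lang
        elif tag == "pre":
            self._in_pre, self._text = True, []
            self._lang = lang or self._pending_lang
        elif tag == "code" and self._in_pre and lang:
            self._lang = lang

    def handle_data(self, data):
        if self._in_pre:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_pre:
            self.blocks.append({"language": self._lang, "code": "".join(self._text)})
            self._in_pre, self._lang, self._pending_lang = False, None, None


def extract_code_blocks(html):
    parser = CodeBlockParser()
    parser.feed(html)
    return parser.blocks


sample = """
<pre><code class="language-json">{"name": "example"}</code></pre>
<div class="highlight-json"><pre>{"ok": true}</pre></div>
"""
print(extract_code_blocks(sample))
```

Blocks with no recognizable class come back with a language of None, which is itself worth recording as an edge case.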

Descriptions

Prose paragraphs between the heading and first table/code block. May contain important context about authentication, rate limits, or special behavior.

Phase 5: Detect Edge Cases

Real documentation has inconsistencies. Look for:

Broken anchor links - Index links that don't match heading IDs
Multiple table formats - Different pages may use different table structures
Nested content - Examples inside collapsible sections or tabs
Generated IDs - Headings without explicit IDs in the source, where the site generator or a client-side script derives them from the heading text

Document any patterns that differ from the main structure.
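Broken anchors and generated IDs can be checked together by comparing the index links against the set of heading IDs, falling back to a slug when a heading has no explicit ID. The slug rule in this sketch (lowercase, strip punctuation, hyphenate whitespace) is an assumption; each framework has its own rule, so confirm it against the rendered page. `slugify` and `find_broken_anchors` are hypothetical helpers.

```python
import re


def slugify(text):
    """Assumed slug rule: lowercase, drop punctuation, hyphenate whitespace."""
    text = re.sub(r"[^\w\s-]", "", text.strip().lower())
    return re.sub(r"[\s_]+", "-", text).strip("-")


def find_broken_anchors(index_anchors, headings):
    """Return index anchors with no matching heading.

    headings is a list of (id_or_None, text) tuples.
    """
    known = {hid or slugify(text) for hid, text in headings}
    return [anchor for anchor in index_anchors if anchor not in known]


# Illustrative data only.
headings = [("get-connection", "Get connection details"), (None, "List connections")]
anchors = ["get-connection", "list-connections", "delete-connection"]
print(find_broken_anchors(anchors, headings))
```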

Output: Site Analysis Document

After analysis, produce a structured document containing:

site:
  name: "Workato API Documentation"
  base_url: "https://docs.workato.com"
  framework: "VuePress"  # or Docusaurus, ReadMe, custom, unknown

index_pattern:
  type: "quick_reference_table"  # or sidebar, headings, list
  location: "top of page"
  columns: ["Type", "Resource", "Description"]
  link_column: "Resource"
  anchor_format: "#heading-id"

section_pattern:
  type: "heading_based"
  heading_level: "h2"
  id_source: "explicit"  # or generated
  text_format: "{description}"  # e.g., "Get connection details"

content_elements:
  parameters:
    type: "table"
    columns: ["Name", "Type", "Required", "Description"]

  request_examples:
    type: "code_block"
    languages: ["curl", "json"]
    container: "pre > code"

  response_examples:
    type: "code_block"
    languages: ["json"]
    container: "div.highlight-json pre"

edge_cases:
  - "First endpoint in some pages has broken anchor link"
  - "Some tables use 'Required?' instead of 'Required'"

This structured output becomes the input for scraper code generation.
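Before handing the analysis document to code generation, it may help to confirm that the expected sections are present. The sketch below works on the document once it has been loaded into a Python dict (loading the YAML is left out). Which fields count as required is an assumption made here for illustration, as are the names `REQUIRED_KEYS` and `missing_fields`.

```python
# Assumed minimum fields per section; not defined by the skill itself.
REQUIRED_KEYS = {
    "site": {"name", "base_url", "framework"},
    "index_pattern": {"type"},
    "section_pattern": {"type", "heading_level", "id_source"},
    "content_elements": set(),
    "edge_cases": set(),
}


def missing_fields(analysis):
    """List absent sections and absent keys within present sections."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in analysis:
            problems.append(section)
            continue
        value = analysis[section]
        if isinstance(value, dict):
            problems.extend(f"{section}.{key}" for key in sorted(keys - value.keys()))
    return problems


# Deliberately incomplete example document.
draft = {
    "site": {"name": "Workato API Documentation", "base_url": "https://docs.workato.com"},
    "index_pattern": {"type": "quick_reference_table"},
}
print(missing_fields(draft))
```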

Reference Materials

For detailed information on specific topics:

references/html-patterns.md - Common HTML structures for tables, code blocks, navigation
references/framework-signatures.md - How to identify VuePress, Docusaurus, ReadMe, etc.
references/extraction-strategies.md - Parsing approaches for different patterns

For a worked example:

examples/workato-analysis.md - Complete analysis of Workato API documentation
