Skill

scrape

npx claudepluginhub roxabi/roxabi-plugins --plugin web-intel

Tool Access

This skill is limited to using the following tools:

BashRead

Preview

Extract structured content from a URL → JSON with content, metadata, and platform-specific fields.

SKILL.md

Similar Skills

webpage-reader

Extracts clean Markdown from web pages by stripping navigation, ads, sidebars, footers, and boilerplate using Defuddle. Use for reading docs, articles, blog posts, research papers, or release notes.

3 files

claude-superskills

tooyoung:ink-reader

Reads public URLs into clean Markdown with platform-aware fallback strategies for WeChat, Zhihu, Bilibili, X/Twitter, and generic sites using Jina Reader, WebFetch, Playwright.

oh-my-daily-skills

citedy-content-ingestion

Ingests URLs to extract structured content from YouTube videos (via Gemini Video API), web articles, PDFs, and audio files, providing transcripts, summaries, and metadata for LLM pipelines.

1 file

citedy-seo-agent

Stats

Parent Repo Stars6

Parent Repo Forks1

Last CommitApr 2, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Scrape

Extract structured content from a URL → JSON with content, metadata, and platform-specific fields.

Entry

/scrape https://example.com

¬U → → DP(B)to get one.

Step 1 — Locate Plugin

PLUGIN_ROOT=$(find ~/projects -maxdepth 4 -path "*/web-intel/pyproject.toml" -print -quit 2>/dev/null | xargs dirname) if [ -z "$PLUGIN_ROOT" ]; then echo "ERROR: web-intel plugin not found. Install: claude plugin install web-intel" exit 1 fi

First Use

First invocation in session only:

cd "$PLUGIN_ROOT" && uv run python scripts/doctor.py

exit 1 → show output, stop, guide install. Optional warnings → inform user, continue.

Skip on subsequent invocations.

Step 2 — Run Scraper

cd "$PLUGIN_ROOT" && SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt uv run python scripts/scraper.py "$URL"

Step 3 — Present Results

Parse JSON → clean summary: content type (twitter|github|youtube|reddit|webpage), title, author (if ∃), content preview (first 500 chars), platform-specific metadata (stars, score, view_count, like_count, upload_date, tags, chapters, transcript, etc.). Show full JSON in <details> block.

Error Handling

success: false → show error, suggest alternatives (WebFetch, manual input)

Scraper ¬installed → cd $PLUGIN_ROOT && uv sync --extra all

$ARGUMENTS

Scrape

Extract structured content from a URL → JSON with content, metadata, and platform-specific fields.

Entry

/scrape https://example.com

¬U → → DP(B)to get one.

Step 1 — Locate Plugin

PLUGIN_ROOT=$(find ~/projects -maxdepth 4 -path "*/web-intel/pyproject.toml" -print -quit 2>/dev/null | xargs dirname)
if [ -z "$PLUGIN_ROOT" ]; then
  echo "ERROR: web-intel plugin not found. Install: claude plugin install web-intel"
  exit 1
fi

First Use

First invocation in session only:

cd "$PLUGIN_ROOT" && uv run python scripts/doctor.py

exit 1 → show output, stop, guide install. Optional warnings → inform user, continue.
Skip on subsequent invocations.

Step 2 — Run Scraper

cd "$PLUGIN_ROOT" && SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt uv run python scripts/scraper.py "$URL"

Step 3 — Present Results

Error Handling

success: false → show error, suggest alternatives (WebFetch, manual input)
Scraper ¬installed → cd $PLUGIN_ROOT && uv sync --extra all

$ARGUMENTS