From brightdata-plugin
Scrapes web content as clean markdown, HTML, JSON, or screenshots using the Bright Data CLI (bdata scrape). Handles single pages, URL lists, and paginated listings; verifies CLI setup and hands off to specialized skills where appropriate.
```shell
npx claudepluginhub brightdata/skills --plugin brightdata-plugin
```

This skill uses the workspace's default tool permissions.
Get clean content (markdown, HTML, JSON, screenshot) from one or more URLs via the Bright Data CLI. This skill owns the "fetch raw or lightly-structured content" job. For platform-specific structured data (Amazon, LinkedIn, TikTok, etc.), **stop and use `data-feeds` instead** — you'll get clean JSON without selector logic.
Before any scrape, verify the CLI is installed and authenticated:
```shell
if ! command -v bdata >/dev/null 2>&1; then
  echo "bdata CLI not installed — see bright-data-best-practices/references/cli-setup.md"
elif ! bdata zones >/dev/null 2>&1; then
  echo "bdata not authenticated — run: bdata login (or: bdata login --device for SSH)"
fi
```
If either check fails, halt and route the user to skills/bright-data-best-practices/references/cli-setup.md. Do not attempt the legacy curl fallback silently — ask the user first.
| Situation | Action |
|---|---|
| Single URL | `bdata scrape <url> -f markdown` |
| Small list (≤ ~20 URLs) | shell loop, one at a time (see references/patterns.md) |
| Larger list (dozens+) | `xargs -P 4` with a parallelism cap (see references/patterns.md) |
| Paginated listing | scrape page 1 → extract next-page URL → append → repeat (see references/examples.md) |
| JS-heavy / login-gated / interaction-required | escalate to `bdata browser` (see brightdata-cli skill) |
| Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, … | stop — hand off to `data-feeds` |
| No URL yet, just a topic | hand off to `search` |
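For the larger-list row above, the capped-parallelism pattern can be sketched roughly as follows (a sketch only: the exact recipe lives in references/patterns.md, and this version assumes a urls.txt file with one URL per line; the cksum-based filename derivation is just one illustrative choice to avoid collisions):

```shell
# Scrape every URL in urls.txt, at most 4 in flight at a time,
# writing one markdown file per URL. Skips cleanly if bdata is absent.
if command -v bdata >/dev/null 2>&1 && [ -s urls.txt ]; then
  xargs -P 4 -n 1 sh -c '
    url="$1"
    # Derive a stable output filename from a checksum of the URL.
    out="$(printf "%s" "$url" | cksum | cut -d" " -f1).md"
    bdata scrape "$url" -f markdown -o "$out"
  ' _ < urls.txt
fi
```

Passing the URL as a positional argument to `sh -c` (rather than splicing `{}` into the command string) keeps URLs with shell metacharacters from being interpreted.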
Core commands:
```shell
# Clean markdown (default)
bdata scrape "https://example.com/article" -f markdown -o article.md

# Raw HTML (when you need the DOM)
bdata scrape "https://example.com" -f html -o page.html

# Structured JSON (when the Unlocker returns parsed fields)
bdata scrape "https://example.com" -f json --pretty -o page.json

# Visual snapshot (saves PNG)
bdata scrape "https://example.com" -f screenshot -o page.png

# Geo-targeted (override the exit country)
bdata scrape "https://example.com" --country de -f markdown
```
Full flag reference: references/flags.md.
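The paginated-listing flow from the routing table can be sketched as a loop (a sketch under assumptions: extract_next_url is a hypothetical helper, real next-page extraction depends on the site's markup, and references/examples.md has the full recipe):

```shell
# Follow a paginated listing: scrape a page, extract the next-page URL,
# repeat. extract_next_url is a placeholder; adapt it per site.
extract_next_url() {
  # Assumption: the saved markdown contains a link labelled "Next".
  grep -oE '\[Next\]\([^)]+\)' "$1" | head -n 1 | sed -E 's/.*\((.*)\)/\1/'
}

if command -v bdata >/dev/null 2>&1; then
  url="https://example.com/listing?page=1"
  page=1
  while [ -n "$url" ] && [ "$page" -le 10 ]; do   # hard cap: never loop forever
    bdata scrape "$url" -f markdown -o "page-$page.md"
    url="$(extract_next_url "page-$page.md")"
    page=$((page + 1))
  done
fi
```

The hard page cap is a deliberate safety net: a listing whose "Next" link loops back on itself would otherwise scrape forever.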
Before reporting success, verify the output:

- The file exists and is non-empty: `test -s "$out_path"` — or, for stdout, at least 200 bytes of content.
- The body is not a block page. Red flags: "Access Denied", "Just a moment", "Attention Required", "Checking your browser", "captcha", "cf-browser-verification", "cloudflare" (with < 2KB total body).
- The content has the expected shape (e.g., a price should match `\$\d`; an article should contain at least one `<h1>` or `#` heading).

If a check fails, retry with a different `--country` (e.g., `--country de` if the origin site is US), then escalate to `bdata browser` for full JS rendering (hand off to the brightdata-cli skill). Do not report success until all checks above pass.
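Those output checks can be sketched as a reusable shell helper (assumptions: the scrape was saved to a file, and the marker list and size floors are taken from the checklist above):

```shell
# Return 0 only if "$1" looks like a genuine scrape result.
verify_scrape() {
  out_path="$1"
  # 1. Output exists and clears a minimal size floor (~200 bytes).
  test -s "$out_path" || return 1
  [ "$(wc -c < "$out_path")" -ge 200 ] || return 1
  # 2. No block-page markers anywhere in the body.
  if grep -qiE 'Access Denied|Just a moment|Attention Required|Checking your browser|captcha|cf-browser-verification' "$out_path"; then
    return 1
  fi
  # 3. "cloudflare" only counts as a block marker when the body is tiny (< 2KB).
  if [ "$(wc -c < "$out_path")" -lt 2048 ] && grep -qi 'cloudflare' "$out_path"; then
    return 1
  fi
  # 4. Content-shape check: an article should carry at least one heading.
  grep -qE '^#|<h1' "$out_path" || return 1
}
```

Call it after each scrape and only retry (different `--country`, then `bdata browser`) when it returns non-zero.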
Common mistakes to avoid:

- Suppressing stderr with `2>/dev/null` — you'll miss auth failures and rate-limit errors.
- Running `bdata scrape` on Amazon/LinkedIn/TikTok/Instagram/YouTube/Reddit URLs — these are supported by `data-feeds` and return structured data directly. Scraping loses the structure.
- Running `bdata scrape` sequentially for large lists instead of using `xargs -P 4` (or similar) with a parallelism cap.
- Calling `curl` against api.brightdata.com directly — legacy path; use it only when the CLI isn't available.

References:

- references/flags.md — every flag with when-to-use notes.
- references/patterns.md — shell-loop batching, xargs parallelism, pagination recipe, retry/backoff, block-page recovery chain, legacy curl fallback.
- references/examples.md — (1) single page → markdown, (2) batch a list of URLs with a parallelism cap, (3) paginated listing, (4) block-page recovery.