Web scraping guide for sub-agents. Covers Firecrawl CLI fallback scraping when WebFetch fails (JS-heavy sites, anti-bot walls, 403 errors, empty content) and advanced capabilities like structured data extraction with Zod schemas, multi-page crawls, and search-plus-scrape. Use when WebFetch returns garbage or empty pages, when you need typed data from a page (prices, features, specs), or when you need to ingest multiple pages from a site.
Install: npx claudepluginhub nathanvale/side-quest-plugins --plugin newsroom

This skill uses the workspace's default tool permissions.
**Required tools for consuming agents**: WebFetch, Bash(bunx firecrawl-cli *), Read
Integration: Any newsroom sub-agent should consult this skill when WebFetch fails or when structured/multi-page scraping is needed.
| Need | Tool | Details |
|---|---|---|
| Page content as markdown | WebFetch first, then Firecrawl CLI | See below |
| Structured data from a page (prices, features, specs) | Firecrawl extract | Read references/structured-extraction.md |
| Multiple pages from one site | Firecrawl crawl | Read references/crawling.md |
| Search the web + scrape results | Firecrawl search | Read references/crawling.md |
WebFetch is free, fast, and already available. Use it by default.
Works for: blogs, news articles, documentation, static pages, most forum threads.
Switch to Firecrawl CLI when WebFetch returns:

- 403s or other anti-bot errors
- Empty or near-empty content
- Garbage output from JS-heavy pages

Do NOT retry WebFetch on the same URL -- it will fail again.
Requires: firecrawl-cli (install: npm install -g firecrawl-cli or use via bunx firecrawl-cli). Authenticates via FIRECRAWL_API_KEY env var or firecrawl auth --api-key <key>.
If firecrawl-cli is not installed or FIRECRAWL_API_KEY is unset, skip to Step 4 (Report Gaps). Do not retry or attempt workarounds.
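The availability check above can be sketched as a small shell guard. This is a hypothetical helper (the `scrape_or_gap` name and the gap message are illustrative, not part of the CLI):

```shell
# Hypothetical guard: scrape only when bunx and the API key are available;
# otherwise report the gap and stop -- no retries or workarounds.
scrape_or_gap() {
  url="$1"
  out="$2"
  if ! command -v bunx >/dev/null 2>&1 || [ -z "${FIRECRAWL_API_KEY:-}" ]; then
    echo "GAP: firecrawl unavailable for $url"
    return 1
  fi
  bunx firecrawl-cli scrape "$url" -o "$out"
}
```

With `FIRECRAWL_API_KEY` unset (or `bunx` missing), the helper prints the gap line and returns nonzero instead of invoking the CLI.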
Output to stdout (default -- pipe or capture as needed):
bunx firecrawl-cli scrape "<url>"
Output to file (more token-efficient -- read from disk instead of context):
bunx firecrawl-cli scrape "<url>" -o /tmp/scrape-output.md
Then use the Read tool on /tmp/scrape-output.md to pull only what you need into context.
Handles: JS rendering, dynamic content, basic anti-bot bypass, clean Markdown output (strips nav, headers, footers with --only-main-content).
Does NOT handle: login-gated content, CAPTCHAs, form filling, aggressive Cloudflare Turnstile.
For multiple URLs, scrape each separately to different files:
bunx firecrawl-cli scrape "<url1>" -o /tmp/scrape-1.md
bunx firecrawl-cli scrape "<url2>" -o /tmp/scrape-2.md
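The per-URL pattern above generalizes to a loop. A minimal dry-run sketch with placeholder URLs -- the `echo` prints each command instead of running it; drop it to actually scrape:

```shell
# Dry run: build one scrape command per URL, writing to numbered files.
i=1
for url in "https://example.com/a" "https://example.com/b"; do
  echo bunx firecrawl-cli scrape "$url" -o "/tmp/scrape-$i.md"
  i=$((i + 1))
done
```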
The CLI is beta (released Jan 2026) -- expect quirks and flag changes. Run bunx firecrawl-cli scrape --help for current options.
If both WebFetch and Firecrawl fail: