From firecrawl-pack
Migrates web scraping from Puppeteer, Playwright, Cheerio to Firecrawl API using code examples, comparisons, and lib detection.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin firecrawl-packThis skill is limited to using the following tools:
!`npm list puppeteer playwright cheerio 2>/dev/null | grep -E "puppeteer|playwright|cheerio" || echo 'No scraping libs found'`
Identifies Firecrawl pitfalls like unbounded crawls, missing formats, JS waits, wrong imports, and polling errors in code reviews, audits, and integrations.
Builds production-ready web scrapers for any website using Bright Data APIs including Web Unlocker, Browser, and SERP. Guides site analysis, selector extraction, pagination handling, and code implementation in Python or Node.js.
Scrapes single pages or crawls sites using Firecrawl v2.5 API to LLM-ready markdown and structured data. Handles JS rendering, bot bypass, browser automation for dynamic content extraction.
Share bugs, ideas, or general feedback.
!npm list puppeteer playwright cheerio 2>/dev/null | grep -E "puppeteer|playwright|cheerio" || echo 'No scraping libs found'
Migrate from custom scraping (Puppeteer, Playwright, Cheerio) or competing APIs to Firecrawl. Firecrawl eliminates browser management, anti-bot handling, and JS rendering infrastructure. This skill shows equivalent code for common scraping patterns.
| Feature | Puppeteer/Playwright | Cheerio | Firecrawl |
|---|---|---|---|
| JS rendering | Manual browser | No | Automatic |
| Anti-bot bypass | DIY (stealth plugin) | No | Built-in |
| Output format | Raw HTML | Parsed HTML | Markdown/JSON/HTML |
| Infrastructure | Browser instances | None | API call |
| Concurrent scraping | Manage browser pool | Simple | Managed by Firecrawl |
| Cost model | Compute (CPU/RAM) | Free | Credits per page |
// BEFORE: Puppeteer (20+ lines, browser management)
import puppeteer from "puppeteer";
async function scrapePuppeteer(url: string) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: "networkidle2" });
const html = await page.content();
const title = await page.title();
await browser.close();
return { html, title };
}
// AFTER: Firecrawl (5 lines, no browser needed)
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
async function scrapeFirecrawl(url: string) {
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true,
waitFor: 2000,
});
return { markdown: result.markdown, title: result.metadata?.title };
}
// BEFORE: fetch + cheerio (manual parsing)
import * as cheerio from "cheerio";
async function scrapeCheerio(url: string) {
const html = await fetch(url).then(r => r.text());
const $ = cheerio.load(html);
return {
title: $("h1").first().text(),
content: $("main").text(),
links: $("a").map((_, el) => $(el).attr("href")).get(),
};
}
// AFTER: Firecrawl with extract (LLM-powered, no CSS selectors)
async function extractFirecrawl(url: string) {
const result = await firecrawl.scrapeUrl(url, {
formats: ["extract", "links"],
extract: {
schema: {
type: "object",
properties: {
title: { type: "string" },
content: { type: "string" },
},
},
},
});
return {
title: result.extract?.title,
content: result.extract?.content,
links: result.links,
};
}
// BEFORE: Playwright crawler (100+ lines, queue, browser pool)
// - launch browser pool
// - manage visited URLs set
// - extract links, enqueue
// - handle errors per page
// - close browsers on exit
// AFTER: Firecrawl crawl (10 lines)
async function crawlSite(baseUrl: string) {
const result = await firecrawl.crawlUrl(baseUrl, {
limit: 100,
maxDepth: 3,
includePaths: ["/docs/*", "/api/*"],
excludePaths: ["/blog/*"],
scrapeOptions: {
formats: ["markdown"],
onlyMainContent: true,
},
});
return result.data?.map(page => ({
url: page.metadata?.sourceURL,
title: page.metadata?.title,
content: page.markdown,
}));
}
// Adapter interface for gradual migration
interface ScrapeAdapter {
scrape(url: string): Promise<{ title: string; content: string }>;
crawl(url: string, maxPages: number): Promise<Array<{ url: string; content: string }>>;
}
class FirecrawlAdapter implements ScrapeAdapter {
private client: FirecrawlApp;
constructor() {
this.client = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
}
async scrape(url: string) {
const result = await this.client.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true,
});
return {
title: result.metadata?.title || "",
content: result.markdown || "",
};
}
async crawl(url: string, maxPages: number) {
const result = await this.client.crawlUrl(url, {
limit: maxPages,
scrapeOptions: { formats: ["markdown"], onlyMainContent: true },
});
return (result.data || []).map(page => ({
url: page.metadata?.sourceURL || url,
content: page.markdown || "",
}));
}
}
// Feature flag controlled migration
function getScrapeAdapter(): ScrapeAdapter {
if (process.env.USE_FIRECRAWL === "true") {
return new FirecrawlAdapter();
}
return new LegacyPuppeteerAdapter();
}
set -euo pipefail
# After migration is complete and verified
npm uninstall puppeteer puppeteer-core
npm uninstall playwright @playwright/test
npm uninstall cheerio
# Remove browser downloads
npx playwright uninstall --all 2>/dev/null || true
# Verify no lingering references
grep -r "puppeteer\|playwright\|cheerio" src/ --include="*.ts" || echo "Clean!"
@mendable/firecrawl-jsscrapeUrlcrawlUrlextract or markdown| Issue | Cause | Solution |
|---|---|---|
| Different output format | Puppeteer returns HTML, Firecrawl markdown | Adjust downstream consumers |
| Missing CSS selector data | Firecrawl doesn't use selectors | Use extract with JSON schema |
| Higher latency for single pages | API call vs local browser | Acceptable trade-off for zero infra |
| Content differences | Different JS wait timing | Tune waitFor parameter |
For advanced troubleshooting, see firecrawl-advanced-troubleshooting.