From firecrawl-pack
Optimizes Firecrawl scraping performance with caching, batch endpoints, markdown formats, waitFor tuning, and onlyMainContent. Use for slow scrapes, credit efficiency, or high-throughput pipelines.
```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin firecrawl-pack
```
Optimize Firecrawl API performance by choosing efficient scraping modes, caching results, using batch endpoints, and minimizing unnecessary rendering. Key levers: format selection (markdown vs HTML vs screenshot), `waitFor` tuning, `onlyMainContent`, and batch vs individual scraping.
| Operation | Typical latency | With JS wait | With screenshot |
|---|---|---|---|
| scrapeUrl (markdown) | 2-5s | 5-10s | 8-15s |
| scrapeUrl (extract) | 3-8s | 8-15s | N/A |
| crawlUrl (10 pages) | 20-40s | 40-80s | N/A |
| mapUrl | 1-3s | N/A | N/A |
| batchScrapeUrls (10) | 10-20s | 20-40s | N/A |
```typescript
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});
```
```typescript
// SLOW: requesting everything
const slow = await firecrawl.scrapeUrl(url, {
  formats: ["markdown", "html", "links", "screenshot"],
  // screenshot + full HTML = 3-5x slower
});

// FAST: request only what you need
const fast = await firecrawl.scrapeUrl(url, {
  formats: ["markdown"],  // markdown only = fastest
  onlyMainContent: true,  // skip nav/footer/sidebar
});
```
```typescript
// Default: no JS wait (fastest, works for static sites)
const staticResult = await firecrawl.scrapeUrl("https://docs.example.com", {
  formats: ["markdown"],
  // No waitFor needed — content is in initial HTML
});

// SPA/dynamic pages: add minimal wait
const spaResult = await firecrawl.scrapeUrl("https://app.example.com", {
  formats: ["markdown"],
  waitFor: 3000, // 3s — enough for most SPAs
  onlyMainContent: true,
});

// Heavy interactive page: use actions instead of long wait
const heavyResult = await firecrawl.scrapeUrl("https://dashboard.example.com", {
  formats: ["markdown"],
  actions: [
    { type: "wait", selector: ".data-table" }, // wait for specific element
    { type: "scroll", direction: "down" },     // trigger lazy loading
  ],
});
```
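When you don't know in advance whether a page needs JS rendering, an escalating retry keeps the common case fast: scrape with no wait first, and only re-scrape with a longer `waitFor` when the result comes back empty. This is a sketch, not part of the Firecrawl SDK — `scrapeWithEscalatingWait`, the wait ladder, and the `minChars` threshold are all assumptions to tune; the scrape call is passed in as a parameter so the helper stays decoupled from the client.

```typescript
// Minimal shape of the scrape call this sketch depends on.
type ScrapeFn = (
  url: string,
  opts: { formats: string[]; onlyMainContent?: boolean; waitFor?: number }
) => Promise<{ success: boolean; markdown?: string }>;

// Hypothetical helper: try the cheapest scrape first, and escalate waitFor
// only when the page comes back (near-)empty.
async function scrapeWithEscalatingWait(
  scrape: ScrapeFn,        // e.g. (u, o) => firecrawl.scrapeUrl(u, o)
  url: string,
  waits = [0, 3000, 8000], // no wait → typical SPA → heavy page
  minChars = 200           // below this, treat the result as empty
) {
  let last: { success: boolean; markdown?: string } | undefined;
  for (const waitFor of waits) {
    last = await scrape(url, {
      formats: ["markdown"],
      onlyMainContent: true,
      ...(waitFor > 0 ? { waitFor } : {}),
    });
    if (last.success && (last.markdown?.length ?? 0) >= minChars) {
      return last; // good content — stop escalating
    }
  }
  return last; // best effort after the longest wait
}
```

Static pages exit on the first attempt at full speed; only genuinely dynamic pages pay for the longer waits.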
```typescript
import { LRUCache } from "lru-cache";
import { createHash } from "crypto";

const scrapeCache = new LRUCache<string, any>({
  max: 500,     // max 500 cached pages
  ttl: 3600000, // 1 hour TTL
});

async function cachedScrape(url: string) {
  const key = createHash("md5").update(url).digest("hex");
  const cached = scrapeCache.get(key);
  if (cached) {
    console.log(`Cache hit: ${url}`);
    return cached;
  }
  const result = await firecrawl.scrapeUrl(url, {
    formats: ["markdown"],
    onlyMainContent: true,
  });
  if (result.success) {
    scrapeCache.set(key, result);
  }
  return result;
}

// Typical savings: 50-80% credit reduction for repeated scrapes
```
```typescript
// SLOW: sequential individual scrapes
const urls = ["https://a.com", "https://b.com", "https://c.com"];
for (const url of urls) {
  await firecrawl.scrapeUrl(url, { formats: ["markdown"] }); // 3 API calls
}

// FAST: single batch scrape call
const batchResult = await firecrawl.batchScrapeUrls(urls, {
  formats: ["markdown"],
  onlyMainContent: true,
});
// 1 API call, internally parallelized
```
```typescript
// EXPENSIVE: crawl everything, filter later
await firecrawl.crawlUrl("https://docs.example.com", { limit: 1000 });

// CHEAPER: map first (1 credit), then scrape only what you need
const map = await firecrawl.mapUrl("https://docs.example.com");
const apiDocs = (map.links || []).filter(url =>
  url.includes("/api/") || url.includes("/reference/")
);
console.log(`Map: ${map.links?.length} total, ${apiDocs.length} relevant`);

// Batch scrape only relevant URLs
const result = await firecrawl.batchScrapeUrls(apiDocs.slice(0, 50), {
  formats: ["markdown"],
});
```
```typescript
async function timedScrape(url: string) {
  const start = Date.now();
  const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
  const duration = Date.now() - start;
  console.log({
    url,
    durationMs: duration,
    contentLength: result.markdown?.length || 0,
    success: result.success,
    charsPerSecond: Math.round((result.markdown?.length || 0) / (duration / 1000)),
  });
  return result;
}
```
| Issue | Cause | Solution |
|---|---|---|
| Scrape > 10s | Screenshot or full HTML requested | Use `formats: ["markdown"]` only |
| Empty content | `waitFor` too short for SPA | Increase it or use `actions` with a selector |
| High credit burn | Scraping same URLs repeatedly | Implement URL-based caching |
| Batch timeout | Too many URLs in one batch | Split into chunks of 50 |
| Stale cached data | TTL too long | Reduce TTL or add cache invalidation |
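For the batch-timeout row above, a small chunking wrapper keeps every batch call under a safe size. This is a sketch, not an SDK feature — `chunk`, `batchScrapeInChunks`, and the default size of 50 are illustrative, and the batch call is passed in as a parameter (e.g. a closure over `firecrawl.batchScrapeUrls`):

```typescript
// Split a list into fixed-size chunks (last chunk may be smaller).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Minimal shape of the batch call this sketch depends on.
type BatchScrapeFn = (
  urls: string[],
  opts: { formats: string[]; onlyMainContent?: boolean }
) => Promise<unknown>;

// Hypothetical wrapper: scrape at most chunkSize URLs per batch call
// to stay clear of batch timeouts.
async function batchScrapeInChunks(
  batchScrape: BatchScrapeFn, // e.g. (u, o) => firecrawl.batchScrapeUrls(u, o)
  urls: string[],
  chunkSize = 50
) {
  const results: unknown[] = [];
  for (const group of chunk(urls, chunkSize)) {
    results.push(await batchScrape(group, {
      formats: ["markdown"],
      onlyMainContent: true,
    }));
  }
  return results;
}
```

Sequential chunks also act as a crude rate limiter; run chunks concurrently only if your plan's rate limits allow it.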
```typescript
const url = "https://docs.firecrawl.dev";

// Compare different format configurations
for (const formats of [
  ["markdown"],
  ["markdown", "html"],
  ["markdown", "html", "screenshot"],
]) {
  const start = Date.now();
  await firecrawl.scrapeUrl(url, { formats: formats as any, onlyMainContent: true });
  console.log(`${formats.join(",")}: ${Date.now() - start}ms`);
}
```
For cost optimization, see the `firecrawl-cost-tuning` skill.