Identify and avoid FireCrawl anti-patterns and common integration mistakes. Use when reviewing FireCrawl code for issues, onboarding new developers, or auditing existing FireCrawl integrations for best practices violations. Trigger with phrases like "firecrawl mistakes", "firecrawl anti-patterns", "firecrawl pitfalls", "firecrawl what not to do", "firecrawl code review".
Real gotchas when using Firecrawl for web scraping and crawling. Firecrawl handles JavaScript rendering and anti-bot bypassing, but its async crawl model and credit-based pricing create specific failure modes.
Crawling is an asynchronous job: kicking one off gives you a job ID, not page results. Polling too aggressively wastes credits and may trigger rate limits.
import FirecrawlApp from '@mendable/firecrawl-js';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// BAD: assuming synchronous results
const result = await firecrawl.crawlUrl('https://example.com');
console.log(result.data); // This is a job object, not page data!
// GOOD: use the async crawl with proper polling
const crawl = await firecrawl.asyncCrawlUrl('https://example.com', {
limit: 50,
scrapeOptions: { formats: ['markdown'] }
});
// Poll with backoff
let status;
do {
await new Promise(r => setTimeout(r, 5000)); // wait 5 seconds between polls
status = await firecrawl.checkCrawlStatus(crawl.id);
} while (status.status === 'scraping');
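The loop above exits as soon as the job leaves the scraping state; below is a minimal sketch of handling the terminal states, assuming the status object exposes status and data fields as in recent versions of the JS SDK.

// Sketch: handle terminal states after the polling loop (field names assume the v1 JS SDK)
if (status.status === 'completed') {
  for (const page of status.data) {
    // each entry is one scraped page; sourceURL lives under metadata in the v1 response
    console.log(page.metadata?.sourceURL, (page.markdown || '').length);
  }
} else {
  // 'failed' or 'cancelled' -- don't treat a dead job as success
  throw new Error(`Crawl ${crawl.id} ended with status ${status.status}`);
}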
Firecrawl charges per page. Crawling without limits on large sites burns credits fast.
// BAD: no limit on a site with 100K pages
await firecrawl.crawlUrl('https://docs.large-project.org'); // burns entire quota
// GOOD: set explicit limits and use URL filters
await firecrawl.crawlUrl('https://docs.large-project.org', {
limit: 100,
includePaths: ['/api/*', '/guides/*'],
excludePaths: ['/changelog/*', '/blog/*'],
maxDepth: 3
});
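If you are unsure how big a site is, one option is to map it first and only crawl when the count looks reasonable. This is a sketch that assumes the SDK's mapUrl method and a response containing a links array.

// Sketch: size up a site before spending crawl credits
// (assumes mapUrl returns an object with a links array)
const map = await firecrawl.mapUrl('https://docs.large-project.org');
const discovered = map.links?.length ?? 0;
console.log(`~${discovered} URLs discovered`);

if (discovered <= 500) {
  await firecrawl.crawlUrl('https://docs.large-project.org', {
    limit: 100,
    scrapeOptions: { formats: ['markdown'] }
  });
}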
Firecrawl can return HTML, markdown, links, or screenshots. Relying on the default format can leave you with raw HTML when you wanted clean text, so request formats explicitly.
// BAD: getting HTML when you wanted clean text
const result = await firecrawl.scrapeUrl('https://example.com');
// result.html exists but result.markdown may be absent
// GOOD: specify output format explicitly
const result = await firecrawl.scrapeUrl('https://example.com', {
formats: ['markdown', 'links'],
onlyMainContent: true // strips nav, footer, sidebars
});
console.log(result.markdown);
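Because 'links' is also requested above, result.links should contain the URLs discovered on the page, which is useful for building a scrape list without paying for a full crawl.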
Some SPAs need extra wait time for content to render. Default timeouts may capture loading states.
// BAD: scraping an SPA with default settings
const result = await firecrawl.scrapeUrl('https://app.example.com/dashboard');
// Gets "Loading..." instead of actual content
// GOOD: configure wait time for JS rendering
const result = await firecrawl.scrapeUrl('https://app.example.com/dashboard', {
waitFor: 5000, // wait 5s for JS to render
formats: ['markdown'],
onlyMainContent: true
});
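When a wait alone is not enough (content hidden behind a tab or a "load more" button), Firecrawl's actions parameter can interact with the page before capture. Treat the following as a sketch: the selector is a placeholder, and actions support depends on your plan and SDK version.

// Sketch: interact before capture; '#load-more' is a hypothetical selector
const withActions = await firecrawl.scrapeUrl('https://app.example.com/dashboard', {
  formats: ['markdown'],
  actions: [
    { type: 'wait', milliseconds: 2000 },
    { type: 'click', selector: '#load-more' },
    { type: 'wait', milliseconds: 2000 }
  ]
});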
Firecrawl honors robots.txt by default. Disabling that, or crawling more aggressively than a site can tolerate, risks IP bans and legal issues.
// BAD: aggressive crawling that ignores site limits
await firecrawl.crawlUrl('https://example.com', {
limit: 10000, // huge crawl in one shot
// No delay between requests = potential IP ban
});
// GOOD: respect site constraints
await firecrawl.crawlUrl('https://example.com', {
limit: 200,
maxDepth: 3,
// Firecrawl handles rate limiting internally
});
| Issue | Cause | Solution |
|---|---|---|
| Empty markdown | JS not rendered | Increase waitFor timeout |
| Credit depletion | No crawl limit set | Always set limit parameter |
| 402 Payment Required | Out of credits | Check balance before large crawls |
| Partial crawl results | Site blocks crawler | Use scrapeUrl for individual pages |
| Stale job status | Polling stopped early | Poll until completed or failed |
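When you already know the exact URLs, a batch scrape avoids crawl overhead entirely; the call below uses the batch scrape method available in recent versions of the JS SDK with the same format options as a single scrape.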
const urls = ['https://a.com', 'https://b.com', 'https://c.com'];
const results = await firecrawl.batchScrapeUrls(urls, {
formats: ['markdown'],
onlyMainContent: true
});
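As a defensive pattern around the troubleshooting table above, it helps to wrap large jobs so that credit exhaustion (402) and failed jobs are reported rather than silently swallowed. This is a sketch; the error shape is an assumption, not a documented SDK type.

// Sketch: defensive wrapper around a large crawl (error fields are assumptions)
async function safeCrawl(url, options) {
  try {
    const job = await firecrawl.asyncCrawlUrl(url, options);
    let status;
    do {
      await new Promise(r => setTimeout(r, 5000)); // back off between polls
      status = await firecrawl.checkCrawlStatus(job.id);
    } while (status.status === 'scraping');
    if (status.status !== 'completed') {
      throw new Error(`Crawl ended with status ${status.status}`);
    }
    return status.data;
  } catch (err) {
    // A 402 usually means the account is out of credits -- check the balance before retrying
    console.error('Crawl failed:', err.message);
    throw err;
  }
}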