From firecrawl-pack
Executes Firecrawl workflows for LLM-powered structured data extraction from web pages using JSON schemas or prompts, batch scraping multiple URLs, and site mapping.
Install:

```
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin firecrawl-pack
```
Secondary workflow complementing the scrape/crawl workflow. Covers LLM-powered structured data extraction with JSON schemas, batch scraping of multiple known URLs, and rapid site-map discovery. Use this when you need typed data rather than raw markdown.

Other workflows in this pack:

- Automate web scraping of single pages, site crawling, structured data extraction, and batch URL processing using Firecrawl via the Composio integration. Ideal for gathering web data in terminal workflows.
- Minimal TypeScript examples for Firecrawl: scrape pages to markdown, crawl sites, map URLs, and extract structured data via LLM. Use for quick starts or API testing.
- Scrapes single pages or crawls sites with the Firecrawl v2.5 API, producing LLM-ready markdown and structured data. Handles JS rendering, bot bypass, and browser automation for dynamic-content extraction.
Prerequisites:

- `@mendable/firecrawl-js` installed
- `FIRECRAWL_API_KEY` environment variable set

```typescript
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});
```
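For orientation, a plain single-page scrape looks like the sketch below; the primary scrape/crawl workflow covers this in depth. It reuses the client above, and the URL is illustrative:

```typescript
// Minimal single-page scrape to LLM-ready markdown
const page = await firecrawl.scrapeUrl("https://firecrawl.dev", {
  formats: ["markdown"],
  onlyMainContent: true, // strip nav, footers, and other page chrome
});
console.log(page.markdown?.slice(0, 500));
```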
```typescript
// Extract structured data using an LLM + JSON schema
const result = await firecrawl.scrapeUrl("https://firecrawl.dev/pricing", {
  formats: ["extract"],
  extract: {
    schema: {
      type: "object",
      properties: {
        plans: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              price: { type: "string" },
              credits_per_month: { type: "number" },
              features: { type: "array", items: { type: "string" } },
            },
            required: ["name", "price"],
          },
        },
      },
    },
  },
});

console.log("Extracted plans:", JSON.stringify(result.extract, null, 2));
```
```typescript
// Use a natural-language prompt instead of a rigid schema
const result = await firecrawl.scrapeUrl("https://news.ycombinator.com", {
  formats: ["extract"],
  extract: {
    prompt: "Extract the top 5 stories with their title, URL, points, and comment count",
  },
});

console.log(result.extract);
```
```typescript
// Scrape multiple specific URLs at once, more efficient than individual calls
const batchResult = await firecrawl.batchScrapeUrls(
  [
    "https://docs.firecrawl.dev/features/scrape",
    "https://docs.firecrawl.dev/features/crawl",
    "https://docs.firecrawl.dev/features/extract",
    "https://docs.firecrawl.dev/features/map",
  ],
  {
    formats: ["markdown"],
    onlyMainContent: true,
  }
);

for (const page of batchResult.data || []) {
  console.log(`${page.metadata?.title}: ${page.markdown?.length} chars`);
}
```
```typescript
// Start an async batch scrape for many URLs; returns a job ID
const job = await firecrawl.asyncBatchScrapeUrls(
  urls, // array of 100+ URLs
  { formats: ["markdown"] }
);

// Poll for completion; also stop on a terminal "failed" status
// so the loop cannot spin forever
let status = await firecrawl.checkBatchScrapeStatus(job.id);
while (status.status !== "completed" && status.status !== "failed") {
  await new Promise(r => setTimeout(r, 5000));
  status = await firecrawl.checkBatchScrapeStatus(job.id);
}

console.log(`Batch ${status.status}: ${status.data?.length ?? 0} pages`);
```
```typescript
// Discover all URLs on a site in ~2-3 seconds
// Uses sitemap.xml + SERP + cached crawl data
const mapResult = await firecrawl.mapUrl("https://docs.firecrawl.dev");
const urls = mapResult.links || [];
console.log(`Discovered ${urls.length} URLs`);

// Categorize by section (each URL lands in exactly one bucket)
const sections = {
  docs: urls.filter(u => u.includes("/docs/")),
  api: urls.filter(u => u.includes("/api-reference/")),
  features: urls.filter(u => u.includes("/features/")),
  other: urls.filter(
    u =>
      !u.includes("/docs/") &&
      !u.includes("/api-reference/") &&
      !u.includes("/features/")
  ),
};

Object.entries(sections).forEach(([name, list]) => {
  console.log(`  ${name}: ${list.length} URLs`);
});
```
```typescript
// 1. Map to discover URLs, 2. Filter, 3. Batch scrape the relevant ones
async function intelligentScrape(siteUrl: string, pathFilter: string) {
  const map = await firecrawl.mapUrl(siteUrl);
  const relevant = (map.links || []).filter(url => url.includes(pathFilter));
  console.log(`Map found ${map.links?.length} URLs, ${relevant.length} match filter`);

  if (relevant.length === 0) return [];

  // Small sets: scrape synchronously and return the pages
  if (relevant.length <= 10) {
    const res = await firecrawl.batchScrapeUrls(relevant, { formats: ["markdown"] });
    return res.data || [];
  }

  // For large sets, use async batch
  const job = await firecrawl.asyncBatchScrapeUrls(relevant.slice(0, 100), {
    formats: ["markdown"],
  });
  // ...poll for completion as shown above
  return job;
}

await intelligentScrape("https://docs.firecrawl.dev", "/features/");
```
| Error | Cause | Solution |
|---|---|---|
| Empty extract | Page content too complex for the LLM | Simplify the schema, shorten the prompt |
| Inconsistent extraction | Prompt too long | Keep prompts short and focused |
| Batch scrape timeout | Too many URLs | Use async batch with polling |
| Map returns few URLs | Site has no sitemap.xml | Use `crawlUrl` for thorough discovery (see the sketch below) |
| 402 Payment Required | Credits exhausted | Reduce batch size, check credit balance |
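When `mapUrl` comes up short because a site has no sitemap, a crawl discovers pages by following links instead. Below is a minimal sketch, assuming the same client as above and the SDK's `crawlUrl(url, params)` form polling to completion like the batch helpers; the `limit` value is illustrative:

```typescript
// Thorough-discovery fallback: crawling follows links rather than
// relying on sitemap.xml, trading speed and credits for coverage
const crawl = await firecrawl.crawlUrl("https://docs.firecrawl.dev", {
  limit: 50, // illustrative cap to keep credit usage in check
  scrapeOptions: { formats: ["markdown"] },
});
for (const page of crawl.data || []) {
  console.log(page.metadata?.sourceURL);
}
```

A fuller real-world schema extraction, pulling a typed product catalog from a store page: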
```typescript
// Extract a product catalog as typed objects
const products = await firecrawl.scrapeUrl("https://store.example.com/products", {
  formats: ["extract"],
  extract: {
    schema: {
      type: "object",
      properties: {
        products: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              price: { type: "number" },
              availability: { type: "string" },
            },
            required: ["name", "price"],
          },
        },
      },
    },
  },
});
```
For common errors, see firecrawl-common-errors.