From firecrawl-pack
Scrape single pages or crawl sites into LLM-ready markdown via the Firecrawl JS library. Handles sync/async jobs, depth limits, path filters, and JS rendering.
```sh
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin firecrawl-pack
```

This skill is limited to using the following tools:
Primary workflow for Firecrawl: convert websites into clean LLM-ready markdown. Covers single-page scraping with `scrapeUrl`, multi-page crawling with `crawlUrl`, async crawl jobs with polling, and content processing pipelines.
Scrapes single pages or crawls sites into LLM-ready markdown and structured data using the Firecrawl v2.5 API. Handles JS rendering, bot bypass, and browser automation for dynamic content extraction.
Scrapes URLs to markdown/HTML/JSON, crawls websites with BFS, searches the web, maps URLs, and extracts structured data via the Firecrawl MCP for AI agents.
Provides minimal TypeScript examples for Firecrawl: scrape pages to markdown, crawl sites, map URLs, and extract structured data via LLM. Use for quick starts or API testing.
Prerequisites: `@mendable/firecrawl-js` installed, `FIRECRAWL_API_KEY` environment variable set.

```ts
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});

// Scrape a single page to clean markdown
const result = await firecrawl.scrapeUrl("https://docs.example.com/api", {
  formats: ["markdown"],
  onlyMainContent: true, // strips nav, footer, sidebars
  waitFor: 2000, // wait 2s for JS to render
});

if (result.success) {
  console.log("Title:", result.metadata?.title);
  console.log("Source:", result.metadata?.sourceURL);
  console.log("Markdown:", result.markdown?.substring(0, 200));
}
```
```ts
// Crawl a site: Firecrawl follows links, renders JS, and returns all pages
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
  limit: 50, // max pages to crawl
  maxDepth: 3, // link depth from start URL
  includePaths: ["/docs/*", "/api/*"], // only these paths
  excludePaths: ["/blog/*", "/changelog/*"],
  allowBackwardLinks: false, // only crawl child paths
  scrapeOptions: {
    formats: ["markdown"],
    onlyMainContent: true,
  },
});

console.log(`Crawled ${crawlResult.data?.length} pages`);
for (const page of crawlResult.data || []) {
  console.log(`  ${page.metadata?.sourceURL}: ${page.markdown?.length} chars`);
}
```
```ts
// Start an async crawl job: returns immediately with a job ID
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 500,
  scrapeOptions: { formats: ["markdown"] },
});
console.log(`Crawl started: ${job.id}`);

// Poll for completion with exponential backoff
let pollInterval = 2000;
let status = await firecrawl.checkCrawlStatus(job.id);
while (status.status === "scraping") {
  console.log(`Progress: ${status.completed}/${status.total} pages`);
  await new Promise((r) => setTimeout(r, pollInterval));
  pollInterval = Math.min(pollInterval * 1.5, 30000);
  status = await firecrawl.checkCrawlStatus(job.id);
}

if (status.status === "completed") {
  console.log(`Done: ${status.data?.length} pages scraped`);
} else {
  console.error("Crawl failed:", status.error);
}
```
```ts
import { writeFileSync, mkdirSync } from "fs";

// Write each crawled page to disk as markdown and build a manifest
function processResults(pages: any[], outputDir: string) {
  mkdirSync(outputDir, { recursive: true });
  const manifest = pages.map((page, i) => {
    const url = page.metadata?.sourceURL;
    const slug = url
      ? new URL(url).pathname.replace(/\//g, "_").replace(/^_|_$/g, "") || "index"
      : `page-${i}`;
    const filename = `${slug}.md`;
    // Clean markdown: collapse whitespace, remove javascript: links
    const content = (page.markdown || "")
      .replace(/\n{3,}/g, "\n\n")
      .replace(/\[.*?\]\(javascript:.*?\)/g, "")
      .trim();
    writeFileSync(`${outputDir}/${filename}`, content);
    return { url: url || filename, filename, chars: content.length };
  });
  writeFileSync(`${outputDir}/manifest.json`, JSON.stringify(manifest, null, 2));
  return manifest;
}
```
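The files written by the pipeline can be split further before embedding or prompting. A sketch of one approach; `splitByHeadings` is a hypothetical helper of ours, not a Firecrawl API:

```ts
// Split markdown into chunks at top- and second-level headings, so each
// chunk stays a self-contained section; oversized sections fall back to
// paragraph-level splitting.
function splitByHeadings(markdown: string, maxChars = 4000): string[] {
  const sections = markdown.split(/^(?=##? )/m);
  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      if (section.trim()) chunks.push(section.trim());
      continue;
    }
    let current = "";
    for (const para of section.split(/\n\n+/)) {
      if (current.length + para.length + 2 > maxChars && current) {
        chunks.push(current.trim());
        current = "";
      }
      current += para + "\n\n";
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}
```

Chunking at heading boundaries keeps each piece coherent for retrieval, at the cost of uneven chunk sizes.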
Writes `manifest.json` with the URL-to-file mapping.

| Error | Cause | Solution |
|---|---|---|
| Empty markdown | JS content not rendered | Increase `waitFor` to 5000ms |
| 429 Too Many Requests | Rate limit hit | Back off, reduce concurrency |
| Crawl returns few pages | URL filters too strict | Widen `includePaths` patterns |
| 402 Payment Required | Credits exhausted | Check balance, reduce `limit` |
| Partial crawl results | Site blocks bot on some pages | Use `scrapeUrl` for failed URLs individually |
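The 429 case can be wrapped in a generic retry helper. A sketch under one assumption: how a rate-limit error surfaces (status code vs. message text) varies by SDK version, so adapt the detection check to what your version actually throws:

```ts
// Retry an async call with exponential backoff when it fails with a
// rate-limit error (HTTP 429). Other errors are rethrown immediately.
async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Assumed error shape; verify against your SDK version
      const isRateLimit =
        err?.statusCode === 429 || /429/.test(String(err?.message));
      if (!isRateLimit || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Usage: `const page = await withRateLimitRetry(() => firecrawl.scrapeUrl(url, { formats: ["markdown"] }));`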
```ts
// Request multiple formats in a single scrape
const result = await firecrawl.scrapeUrl("https://example.com", {
  formats: ["markdown", "html", "links"],
  onlyMainContent: true,
});
console.log("Markdown:", result.markdown?.length);
console.log("HTML:", result.html?.length);
console.log("Links:", result.links?.length);
```
```ts
// Register a webhook so Firecrawl notifies your app instead of polling
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 100,
  scrapeOptions: { formats: ["markdown"] },
  webhook: {
    url: "https://api.yourapp.com/webhooks/firecrawl",
    events: ["completed", "page"],
  },
});
console.log(`Crawl ${job.id} started; the webhook fires on completion`);
```
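On the receiving end, a handler can dispatch on the event type. The payload shape below (`type`, `id`, `data`) is an assumption based on common webhook conventions, not a confirmed Firecrawl schema; verify it against the Firecrawl webhook documentation before relying on it:

```ts
// Hypothetical webhook payload shape; confirm against Firecrawl's docs
interface CrawlWebhookEvent {
  type: "page" | "completed" | "failed";
  id: string; // crawl job ID
  data?: Array<{ markdown?: string; metadata?: { sourceURL?: string } }>;
}

// Pure dispatcher: returns a log line so it is easy to unit-test.
// A real server would write files or enqueue work here instead.
function handleCrawlEvent(event: CrawlWebhookEvent): string {
  switch (event.type) {
    case "page":
      return `job ${event.id}: received ${event.data?.length ?? 0} page(s)`;
    case "completed":
      return `job ${event.id}: crawl finished`;
    case "failed":
      return `job ${event.id}: crawl failed`;
  }
}
```

Keeping the dispatcher pure (no I/O) makes the webhook endpoint itself a thin wrapper that is trivial to test.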
For structured data extraction, see firecrawl-core-workflow-b.