From firecrawl-pack
Execute Firecrawl primary workflow: scrape and crawl websites into LLM-ready markdown. Use when scraping single pages, crawling entire sites, or building content ingestion pipelines with Firecrawl's scrapeUrl and crawlUrl methods. Trigger with phrases like "firecrawl scrape", "firecrawl crawl site", "scrape page to markdown", "crawl documentation".
```shell
npx claudepluginhub flight505/skill-forge --plugin firecrawl-pack
```
Primary workflow for Firecrawl: convert websites into clean LLM-ready markdown. Covers single-page scraping with `scrapeUrl`, multi-page crawling with `crawlUrl`, async crawl jobs with polling, and content processing pipelines.
Requires `@mendable/firecrawl-js` installed and the `FIRECRAWL_API_KEY` environment variable set.

```typescript
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});

// Scrape a single page to clean markdown
const result = await firecrawl.scrapeUrl("https://docs.example.com/api", {
  formats: ["markdown"],
  onlyMainContent: true, // strips nav, footer, sidebars
  waitFor: 2000,         // wait 2s for JS to render
});

if (result.success) {
  console.log("Title:", result.metadata?.title);
  console.log("Source:", result.metadata?.sourceURL);
  console.log("Markdown:", result.markdown?.substring(0, 200));
}
```
```typescript
// Crawl a site: Firecrawl follows links, renders JS, returns all pages
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
  limit: 50,    // max pages to crawl
  maxDepth: 3,  // link depth from start URL
  includePaths: ["/docs/*", "/api/*"], // only these paths
  excludePaths: ["/blog/*", "/changelog/*"],
  allowBackwardLinks: false, // only crawl child paths
  scrapeOptions: {
    formats: ["markdown"],
    onlyMainContent: true,
  },
});

console.log(`Crawled ${crawlResult.data?.length} pages`);
for (const page of crawlResult.data || []) {
  console.log(`  ${page.metadata?.sourceURL}: ${page.markdown?.length} chars`);
}
```
```typescript
// Start an async crawl job: returns immediately with a job ID
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 500,
  scrapeOptions: { formats: ["markdown"] },
});
console.log(`Crawl started: ${job.id}`);

// Poll for completion with backoff
let pollInterval = 2000;
let status = await firecrawl.checkCrawlStatus(job.id);
while (status.status === "scraping") {
  console.log(`Progress: ${status.completed}/${status.total} pages`);
  await new Promise(r => setTimeout(r, pollInterval));
  pollInterval = Math.min(pollInterval * 1.5, 30000);
  status = await firecrawl.checkCrawlStatus(job.id);
}

if (status.status === "completed") {
  console.log(`Done: ${status.data?.length} pages scraped`);
} else {
  console.error("Crawl failed:", status.error);
}
```
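The polling loop above generalizes into a reusable helper. A minimal sketch with the status check injected so the loop can be reused across jobs (`waitForCrawl` and the `CrawlStatus` shape are illustrative, not part of the SDK):

```typescript
type CrawlStatus = { status: string; completed?: number; total?: number };

// Poll until the job leaves the "scraping" state, growing the interval
// 1.5x per check up to maxIntervalMs (same backoff as the loop above).
async function waitForCrawl(
  check: () => Promise<CrawlStatus>,
  initialIntervalMs = 2000,
  maxIntervalMs = 30000,
): Promise<CrawlStatus> {
  let interval = initialIntervalMs;
  let status = await check();
  while (status.status === "scraping") {
    await new Promise(r => setTimeout(r, interval));
    interval = Math.min(interval * 1.5, maxIntervalMs);
    status = await check();
  }
  return status;
}
```

Pass `() => firecrawl.checkCrawlStatus(job.id)` as the checker to reproduce the loop above.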
```typescript
import { writeFileSync, mkdirSync } from "fs";

function processResults(pages: any[], outputDir: string) {
  mkdirSync(outputDir, { recursive: true });
  const manifest = pages.map((page, i) => {
    const url = page.metadata?.sourceURL || `page-${i}`;
    // Fall back to an index-based name when the URL is missing or unparsable
    let slug: string;
    try {
      slug = new URL(url).pathname
        .replace(/\//g, "_")
        .replace(/^_|_$/g, "") || "index";
    } catch {
      slug = `page-${i}`;
    }
    const filename = `${slug}.md`;
    // Clean markdown: collapse whitespace, remove JS links
    const content = (page.markdown || "")
      .replace(/\n{3,}/g, "\n\n")
      .replace(/\[.*?\]\(javascript:.*?\)/g, "")
      .trim();
    writeFileSync(`${outputDir}/${filename}`, content);
    return { url, filename, chars: content.length };
  });
  writeFileSync(`${outputDir}/manifest.json`, JSON.stringify(manifest, null, 2));
  return manifest;
}
```
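For ingestion pipelines, the cleaned markdown usually needs to be split into bounded chunks before embedding or prompting. A minimal sketch; the 2000-character default and the paragraph-boundary strategy are assumptions, not Firecrawl behavior:

```typescript
// Split markdown into chunks of at most maxChars, breaking on paragraph
// boundaries so chunks stay coherent. A single paragraph longer than
// maxChars is emitted as one oversized chunk rather than split mid-sentence.
function chunkMarkdown(markdown: string, maxChars = 2000): string[] {
  const paragraphs = markdown.split(/\n{2,}/);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if (current && current.length + para.length + 2 > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? `${current}\n\n${para}` : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Feed each page's `content` from `processResults` through this before indexing.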
Output is one `.md` file per page plus `manifest.json` with the URL-to-file mapping.

| Error | Cause | Solution |
|---|---|---|
| Empty markdown | JS content not rendered | Increase `waitFor` to 5000ms |
| 429 Too Many Requests | Rate limit hit | Back off, reduce concurrency |
| Crawl returns few pages | URL filters too strict | Widen `includePaths` patterns |
| 402 Payment Required | Credits exhausted | Check balance, reduce `limit` |
| Partial crawl results | Site blocks bot on some pages | Use `scrapeUrl` for failed URLs individually |
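For the partial-crawl row above, failed pages can be re-fetched one at a time with exponential backoff. A sketch with the scrape call injected so the retry logic stands alone (`rescrapeFailed` is an illustrative helper, not an SDK method):

```typescript
// Re-scrape URLs sequentially, retrying each with exponential backoff
// (baseDelayMs, 2x baseDelayMs, 4x baseDelayMs, ...) before giving up.
async function rescrapeFailed<T>(
  urls: string[],
  scrape: (url: string) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<Map<string, T>> {
  const results = new Map<string, T>();
  for (const url of urls) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        results.set(url, await scrape(url));
        break; // success, move to the next URL
      } catch {
        // Back off before retrying this URL
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  return results;
}
```

Pass `(url) => firecrawl.scrapeUrl(url, { formats: ["markdown"] })` as the scrape function; URLs that fail every attempt are simply absent from the returned map.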
```typescript
// Request multiple formats in a single scrape
const result = await firecrawl.scrapeUrl("https://example.com", {
  formats: ["markdown", "html", "links"],
  onlyMainContent: true,
});
console.log("Markdown:", result.markdown?.length);
console.log("HTML:", result.html?.length);
console.log("Links:", result.links?.length);
```
```typescript
// Register a webhook so results are pushed instead of polled
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 100,
  scrapeOptions: { formats: ["markdown"] },
  webhook: {
    url: "https://api.yourapp.com/webhooks/firecrawl",
    events: ["completed", "page"],
  },
});
console.log(`Crawl ${job.id} started; webhook will fire on completion`);
```
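On the receiving side, each webhook payload can be reduced to the page URLs it delivered. The payload shape below is an assumption (check Firecrawl's webhook documentation for the exact schema), and `collectEventUrls` is an illustrative helper, not part of the SDK:

```typescript
// Assumed payload shape: an event type mirroring the subscribed events,
// the crawl job id, and an array of scraped pages.
type FirecrawlWebhookEvent = {
  type: string; // e.g. "page" or "completed", matching the events above
  id?: string;  // crawl job id
  data?: Array<{ markdown?: string; metadata?: { sourceURL?: string } }>;
};

// Extract the source URLs delivered in a page/completed event so the
// caller can persist or deduplicate them.
function collectEventUrls(event: FirecrawlWebhookEvent): string[] {
  if (event.type !== "page" && event.type !== "completed") return [];
  return (event.data ?? [])
    .map(p => p.metadata?.sourceURL)
    .filter((u): u is string => typeof u === "string");
}
```

Wire this into whatever HTTP handler serves the webhook URL configured above.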
For structured data extraction, see firecrawl-core-workflow-b.