From craft-workspace-webconsulting-skills

Search, scrape, crawl, and interact with websites using the Firecrawl CLI. Scrapes URLs to markdown, HTML, or JSON, searches the web, maps and crawls sites (e.g., documentation), extracts structured data, and handles dynamic pages with clicks or logins. Returns clean markdown optimized for LLM context windows.

Install: npx claudepluginhub dirnbauer/webconsulting-skills

This skill uses the workspace's default tool permissions.
Run firecrawl --help or firecrawl <command> --help for full option details.
If the task is to integrate Firecrawl into an application, add FIRECRAWL_API_KEY to a project, or choose endpoint usage in product code, use the firecrawl-build skills. They are already installed alongside this CLI skill when you run firecrawl init.
Must be installed and authenticated. Check with firecrawl --status.
🔥 firecrawl cli v1.8.0
✓ Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
If not ready, see rules/install.md. For output handling guidelines, see rules/security.md.
Before doing real work, verify the setup with one small request:
mkdir -p .firecrawl
firecrawl scrape "https://firecrawl.dev" -o .firecrawl/install-check.md
firecrawl search "query" --scrape --limit 3
Follow this escalation pattern: use map --search to find the right URL, then scrape it.

| Need | Command | When |
|---|---|---|
| Find pages on a topic | search | No specific URL yet |
| Get a page's content | scrape | Have a URL, page is static or JS-rendered |
| Find URLs within a site | map | Need to locate a specific subpage |
| Bulk extract a site section | crawl | Need many pages (e.g., all /docs/) |
| AI-powered data extraction | agent | Need structured data from complex sites |
| Interact with a page | scrape + interact | Content requires clicks, form fills, pagination, or login |
| Download a site to files | download | Save an entire site as local files |
| Parse a local file | parse | File on disk (PDF, DOCX, XLSX, etc.), not a URL |
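As a sketch of the escalation pattern, the two-step map-then-scrape flow might look like this (the site and subpage URL are illustrative placeholders; run firecrawl map --help for exact flags):

```shell
# Step 1: no exact URL yet - map the site, filtering for a likely subpage
firecrawl map "https://firecrawl.dev" --search "changelog" -o .firecrawl/firecrawl-map.json

# Step 2: scrape the URL found in step 1 (shown here as a placeholder)
firecrawl scrape "https://firecrawl.dev/changelog" -o .firecrawl/firecrawl-changelog.md
```

Both steps write to .firecrawl/ so later steps can reuse the results instead of re-fetching.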
For detailed command reference, run firecrawl <command> --help.
Scrape vs interact:
- Use scrape first. It handles static pages and JS-rendered SPAs.
- Use scrape + interact when you need to interact with a page: clicking buttons, filling out forms, navigating a complex site, handling infinite scroll, or when scrape fails to grab all the content you need.
- If you don't have a URL yet, use search instead.

Avoid redundant fetches:
- search --scrape already fetches full page content. Don't re-scrape those URLs.
- Check .firecrawl/ for existing data before fetching again.

For build tasks (integrating Firecrawl into an app, adding FIRECRAWL_API_KEY to .env, or choosing endpoint usage in product code), use the firecrawl-build skills (already installed alongside this CLI skill).

Unless the user asks for results in context, write results to .firecrawl/ with -o. Add .firecrawl/ to .gitignore. Always quote URLs; the shell interprets ? and & as special characters.
firecrawl search "react hooks" -o .firecrawl/search-react-hooks.json --json
firecrawl scrape "<url>" -o .firecrawl/page.md
Naming conventions:
.firecrawl/search-{query}.json
.firecrawl/search-{query}-scraped.json
.firecrawl/{site}-{path}.md
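The conventions above can be applied mechanically. As a sketch, this hypothetical slug helper (not part of the Firecrawl CLI) turns a free-text query into the search-{query}.json name:

```shell
# Hypothetical helper, not part of the Firecrawl CLI: lowercase the query,
# turn spaces into hyphens, and drop any other characters.
slug() { printf '%s' "$1" | tr 'A-Z ' 'a-z-' | tr -cd 'a-z0-9-'; }

echo ".firecrawl/search-$(slug "React Hooks").json"
# prints .firecrawl/search-react-hooks.json
```

Consistent names make it easy to spot existing data in .firecrawl/ before fetching again.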
Never read entire output files at once. Use grep, head, or incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
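Beyond grep and head, awk can pull a single section out of a large scraped markdown file. A generic pattern, shown here against a made-up sample file rather than real scrape output:

```shell
# Tiny sample standing in for a scraped page (hypothetical content)
printf '# Intro\nwelcome\n# Install\nrun the installer\n# Usage\nrun it\n' > /tmp/page.md

# Print only the body of the "# Install" section, stopping at the next heading
awk '/^# Install$/{f=1;next} /^# /{f=0} f' /tmp/page.md
# prints: run the installer
```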
Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
These patterns are useful when working with file-based output (-o flag) for complex tasks:
# Extract URLs from search
jq -r '.data.web[].url' .firecrawl/search.json
# Get titles and URLs
jq -r '.data.web[] | "\(.title): \(.url)"' .firecrawl/search.json
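If no saved search output is at hand, the JSON shape these filters assume can be exercised against a hand-made sample (the titles and URLs below are placeholders, not real search results):

```shell
# Hand-made sample mimicking the search JSON shape used above (placeholder data)
cat > /tmp/search-sample.json <<'EOF'
{"data":{"web":[
  {"title":"Firecrawl Docs","url":"https://docs.firecrawl.dev"},
  {"title":"Firecrawl CLI","url":"https://github.com/firecrawl/cli"}
]}}
EOF

# Same title/URL filter as above, run on the sample
jq -r '.data.web[] | "\(.title): \(.url)"' /tmp/search-sample.json
# prints:
# Firecrawl Docs: https://docs.firecrawl.dev
# Firecrawl CLI: https://github.com/firecrawl/cli
```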
Run independent operations in parallel. Check firecrawl --status for concurrency limit:
firecrawl scrape "<url-1>" -o .firecrawl/1.md &
firecrawl scrape "<url-2>" -o .firecrawl/2.md &
firecrawl scrape "<url-3>" -o .firecrawl/3.md &
wait
For interact, scrape multiple pages and interact with each independently using their scrape IDs.
firecrawl credit-usage
firecrawl credit-usage --json --pretty -o .firecrawl/credits.json
This skill is based on the excellent work by Firecrawl.
Original repository: https://github.com/firecrawl/cli
Special thanks to Firecrawl for their generous open-source contributions, which helped shape this skill collection. Adapted by webconsulting.at.