Name: firecrawl
Author: firecrawl

SKILL.md

$2c

Concurrency: Max parallel jobs. Run parallel operations up to this limit.

Credits: Remaining API credits. Each operation consumes credits.

If not ready, see rules/install.md. For output handling guidelines, see rules/security.md.

Before doing real work, verify the setup with one small request:

mkdir -p .firecrawl

firecrawl scrape "https://firecrawl.dev" -o .firecrawl/install-check.md

firecrawl search "query" --scrape --limit 3

Workflow

Follow this escalation pattern:

Search - No specific URL yet. Find pages, answer questions, discover sources.

Scrape - Have a URL. Extract its content directly.

Map + Scrape - Large site or need a specific subpage. Use map --search to find the right URL, then scrape it.

Crawl - Need bulk content from an entire site section (e.g., all /docs/).

Interact - Scrape first, then interact with the page (pagination, modals, form submissions, multi-step navigation).

Need

Command

When

Find pages on a topic

search

No specific URL yet

Get a page's content

scrape

Have a URL, page is static or JS-rendered

Find URLs within a site

map

Need to locate a specific subpage

Bulk extract a site section

crawl

Need many pages (e.g., all /docs/)

AI-powered data extraction

agent

Need structured data from complex sites

Interact with a page

scrape + interact

Content requires clicks, form fills, pagination, or login

Download a site to files

download

Save an entire site as local files

Parse a local file

parse

File on disk (PDF, DOCX, XLSX, etc.) — not a URL

Watch pages for changes

monitor

Schedule recurring scrapes/crawls, diff against snapshots

For detailed command reference, run firecrawl <command> --help.

Scrape vs interact:

Use scrape first. It handles static pages and JS-rendered SPAs.

Use scrape + interact when you need to interact with a page, such as clicking buttons, filling out forms, navigating through a complex site, infinite scroll, or when scrape fails to grab all the content you need.

Never use interact for web searches - use search instead.

Monitor: Schedule recurring scrapes or crawls and diff each result against the last retained snapshot. Use for product pages, docs, blogs, changelogs, competitor sites — any page where changes matter. Each check labels pages as same, new, changed, removed, or error, with webhook and email notification options.

# create from flags

firecrawl monitor create --name "Blog" --schedule "every 30 minutes" \

  --scrape-urls https://example.com/blog --email alerts@example.com

# or from JSON (positional file, or piped stdin)

firecrawl monitor create monitor.json

cat monitor.json | firecrawl monitor create

firecrawl monitor list --limit 20

firecrawl monitor run <monitorId>             # trigger a check now

firecrawl monitor checks <monitorId>          # list checks

firecrawl monitor check <monitorId> <checkId> --page-status changed

firecrawl monitor update <monitorId> --state paused

firecrawl monitor delete <monitorId>

Schedules accept cron (--cron "*/30 * * * *") or natural language (--schedule "every 30 minutes"). Minimum interval is 15 minutes. Targets are either --scrape-urls a,b,c (scrape) or --crawl-url <url> (crawl whole site each check). Note: --state (not --status) sets active/paused; --page-status (not --status) filters page results on check — avoids collision with the global --status flag. Monitoring is not available for zero-data-retention teams.

JSON-mode change tracking: By default monitors diff each page's markdown and you get a unified text diff back. When you care about specific structured fields (price, headline, in-stock flag, items in a list) instead of the whole page, add a changeTracking format with modes: ["json"] and a JSON schema to the target's scrapeOptions.formats. The flag-based form doesn't cover this — pass a JSON body via file or stdin:

cat > pricing-monitor.json <<'EOF'

{

  "name": "Pricing watch",

  "schedule": { "text": "hourly", "timezone": "UTC" },

  "targets": [{

    "type": "scrape",

    "urls": ["https://example.com/pricing"],

    "scrapeOptions": {

      "formats": [{

        "type": "changeTracking",

        "modes": ["json"],

        "prompt": "Extract pricing tiers and headline features for each plan.",

        "schema": {

          "type": "object",

          "properties": {

            "plans": {

              "type": "array",

              "items": {

                "type": "object",

                "properties": {

                  "name":     { "type": "string" },

                  "price":    { "type": "string" },

                  "features": { "type": "array", "items": { "type": "string" } }

                }

              }

            }

          }

        }

      }]

    }

  }]

}

EOF

firecrawl monitor create pricing-monitor.json

The check response then carries a per-field diff (paths like plans[0].price) and the full extraction at this run, instead of (or in addition to) a markdown diff. Each changed page in pages[] looks like:

{

  "url": "https://example.com/pricing",

  "status": "changed",

  "diff": {

    "json": {

      "plans[0].price": { "previous": "$19/mo", "current": "$24/mo" },

      "plans[1].features[2]": {

        "previous": "10 GB storage",

        "current": "25 GB storage"

      }

    }

  },

  "snapshot": {

    "json": {

      "plans": [

        /* current full extraction */

      ]

    }

  }

}

Use modes: ["json", "git-diff"] for mixed mode: you get both diff.json (per-field) and diff.text (markdown sidecar), and the page is marked changed whenever either surface changed. For markdown-only monitors, diff.text holds the unified diff and diff.json is a parse-diff AST ({ files: [...] }); there is no snapshot.

Avoid redundant fetches:

search --scrape already fetches full page content. Don't re-scrape those URLs.

Check .firecrawl/ for existing data before fetching again.

When to Load References

Searching the web or finding sources first -> firecrawl-search

Scraping a known URL -> firecrawl-scrape

Finding URLs on a known site -> firecrawl-map

Bulk extraction from a docs section or site -> firecrawl-crawl

AI-powered structured extraction from complex sites -> firecrawl-agent

Clicks, forms, login, pagination, or post-scrape browser actions -> firecrawl-interact

Downloading a site to local files -> firecrawl-download

Parsing a local file (PDF, DOCX, XLSX, HTML, etc.) -> firecrawl-parse

Install, auth, or setup problems -> rules/install.md

Output handling and safe file-reading patterns -> rules/security.md

**Integrating Firecrawl into an app, adding FIRECRAWL_API_KEY to .env, or choosing endpoint usage in product code** -> use the firecrawl-build skills (already installed alongside this CLI skill)

Producing Firecrawl-powered deliverables such as research briefs, SEO audits, QA reports, lead lists, knowledge bases, or design-system extraction -> use the firecrawl-workflows skills (already installed alongside this CLI skill). These skills infer from context first and ask only short blocking questions when needed.

Output & Organization

Unless the user specifies to return in context, write results to .firecrawl/ with -o. Add .firecrawl/ to .gitignore. Always quote URLs - shell interprets ? and & as special characters.

firecrawl search "react hooks" -o .firecrawl/search-react-hooks.json --json

firecrawl scrape "<url>" -o .firecrawl/page.md

Naming conventions:

.firecrawl/search-{query}.json

.firecrawl/search-{query}-scraped.json

.firecrawl/{site}-{path}.md

Never read entire output files at once. Use grep, head, or incremental reads:

wc -l .firecrawl/file.md &#x26;&#x26; head -50 .firecrawl/file.md

grep -n "keyword" .firecrawl/file.md

Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.

Working with Results

These patterns are useful when working with file-based output (-o flag) for complex tasks:

# Extract URLs from search

jq -r '.data.web[].url' .firecrawl/search.json

# Get titles and URLs

jq -r '.data.web[] | "\(.title): \(.url)"' .firecrawl/search.json

After search: send feedback (refunds 1 credit)

Search costs 2 credits per call. After you finish using a search result, send structured feedback in the background. The first feedback per search id refunds 1 credit and feeds search-quality improvements.

SEARCH_ID=$(jq -r '.id' .firecrawl/search-react-hooks.json)

firecrawl search-feedback "$SEARCH_ID" \

  --rating good \

  --valuable-sources '[{"url":"https://react.dev/reference/react/hooks","reason":"Authoritative"}]' \

  --missing-content '[{"topic":"useDeferredValue example"},{"topic":"Server Components hooks"}]' \

  --query-suggestions "Boost react.dev for react-hooks queries" \

  --silent &#x26;

The most useful field is --missing-content: an array of specific pieces of content you expected to find but didn't. Use one entry per missing topic. Bad/partial feedback with detailed --missing-content is just as valuable as good feedback.

Opt out: export FIRECRAWL_NO_SEARCH_FEEDBACK=1 makes the CLI skip every feedback call silently. Respect that flag — do not try to work around it. See firecrawl-search for the full pattern.

Parallelization

Run independent operations in parallel. Check firecrawl --status for concurrency limit:

firecrawl scrape "<url-1>" -o .firecrawl/1.md &#x26;

firecrawl scrape "<url-2>" -o .firecrawl/2.md &#x26;

firecrawl scrape "<url-3>" -o .firecrawl/3.md &#x26;

wait

For interact, scrape multiple pages and interact with each independently using their scrape IDs.

Credit Usage

firecrawl credit-usage

firecrawl credit-usage --json --pretty -o .firecrawl/credits.json

firecrawl

SKILL.md

Workflow

When to Load References

Output &#x26; Organization

Working with Results

After search: send feedback (refunds 1 credit)

Parallelization

Credit Usage

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers

Output & Organization