SKILL.md
$2c
- Concurrency: Max parallel jobs. Run parallel operations up to this limit.
- Credits: Remaining API credits. Each operation consumes credits.
If not ready, see rules/install.md. For output handling guidelines, see rules/security.md.
Before doing real work, verify the setup with one small request:
mkdir -p .firecrawl
firecrawl scrape "https://firecrawl.dev" -o .firecrawl/install-check.md
firecrawl search "query" --scrape --limit 3
Workflow
Follow this escalation pattern:
- Search - No specific URL yet. Find pages, answer questions, discover sources.
- Scrape - Have a URL. Extract its content directly.
- Map + Scrape - Large site or need a specific subpage. Use
map --searchto find the right URL, then scrape it.
- Crawl - Need bulk content from an entire site section (e.g., all /docs/).
- Interact - Scrape first, then interact with the page (pagination, modals, form submissions, multi-step navigation).
Need
Command
When
Find pages on a topic
search
No specific URL yet
Get a page's content
scrape
Have a URL, page is static or JS-rendered
Find URLs within a site
map
Need to locate a specific subpage
Bulk extract a site section
crawl
Need many pages (e.g., all /docs/)
AI-powered data extraction
agent
Need structured data from complex sites
Interact with a page
scrape + interact
Content requires clicks, form fills, pagination, or login
Download a site to files
download
Save an entire site as local files
Parse a local file
parse
File on disk (PDF, DOCX, XLSX, etc.) — not a URL
Watch pages for changes
monitor
Schedule recurring scrapes/crawls, diff against snapshots
For detailed command reference, run firecrawl <command> --help.
Scrape vs interact:
- Use
scrapefirst. It handles static pages and JS-rendered SPAs.
- Use
scrape+interactwhen you need to interact with a page, such as clicking buttons, filling out forms, navigating through a complex site, infinite scroll, or when scrape fails to grab all the content you need.
- Never use interact for web searches - use
searchinstead.
Monitor: Schedule recurring scrapes or crawls and diff each result against the last retained snapshot. Use for product pages, docs, blogs, changelogs, competitor sites — any page where changes matter. Each check labels pages as same, new, changed, removed, or error, with webhook and email notification options.
Subcommands: create | list | get | update | delete | run | checks | check.
# create from flags
firecrawl monitor create --name "Blog" --schedule "every 30 minutes" \
--scrape-urls https://example.com/blog --email alerts@example.com
# or from JSON (positional file, or piped stdin)
firecrawl monitor create monitor.json
cat monitor.json | firecrawl monitor create
firecrawl monitor list --limit 20
firecrawl monitor run <monitorId> # trigger a check now
firecrawl monitor checks <monitorId> # list checks
firecrawl monitor check <monitorId> <checkId> --page-status changed
firecrawl monitor update <monitorId> --state paused
firecrawl monitor delete <monitorId>
Schedules accept cron (--cron "*/30 * * * *") or natural language (--schedule "every 30 minutes"). Minimum interval is 15 minutes. Targets are either --scrape-urls a,b,c (scrape) or --crawl-url <url> (crawl whole site each check). Note: --state (not --status) sets active/paused; --page-status (not --status) filters page results on check — avoids collision with the global --status flag. Monitoring is not available for zero-data-retention teams.
JSON-mode change tracking: By default monitors diff each page's markdown and you get a unified text diff back. When you care about specific structured fields (price, headline, in-stock flag, items in a list) instead of the whole page, add a changeTracking format with modes: ["json"] and a JSON schema to the target's scrapeOptions.formats. The flag-based form doesn't cover this — pass a JSON body via file or stdin:
cat > pricing-monitor.json <<'EOF'
{
"name": "Pricing watch",
"schedule": { "text": "hourly", "timezone": "UTC" },
"targets": [{
"type": "scrape",
"urls": ["https://example.com/pricing"],
"scrapeOptions": {
"formats": [{
"type": "changeTracking",
"modes": ["json"],
"prompt": "Extract pricing tiers and headline features for each plan.",
"schema": {
"type": "object",
"properties": {
"plans": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "string" },
"features": { "type": "array", "items": { "type": "string" } }
}
}
}
}
}
}]
}
}]
}
EOF
firecrawl monitor create pricing-monitor.json
The check response then carries a per-field diff (paths like plans[0].price) and the full extraction at this run, instead of (or in addition to) a markdown diff. Each changed page in pages[] looks like:
{
"url": "https://example.com/pricing",
"status": "changed",
"diff": {
"json": {
"plans[0].price": { "previous": "$19/mo", "current": "$24/mo" },
"plans[1].features[2]": {
"previous": "10 GB storage",
"current": "25 GB storage"
}
}
},
"snapshot": {
"json": {
"plans": [
/* current full extraction */
]
}
}
}
Use modes: ["json", "git-diff"] for mixed mode: you get both diff.json (per-field) and diff.text (markdown sidecar), and the page is marked changed whenever either surface changed. For markdown-only monitors, diff.text holds the unified diff and diff.json is a parse-diff AST ({ files: [...] }); there is no snapshot.
Avoid redundant fetches:
search --scrapealready fetches full page content. Don't re-scrape those URLs.
- Check
.firecrawl/for existing data before fetching again.
When to Load References
- Searching the web or finding sources first -> firecrawl-search
- Scraping a known URL -> firecrawl-scrape
- Finding URLs on a known site -> firecrawl-map
- Bulk extraction from a docs section or site -> firecrawl-crawl
- AI-powered structured extraction from complex sites -> firecrawl-agent
- Clicks, forms, login, pagination, or post-scrape browser actions -> firecrawl-interact
- Downloading a site to local files -> firecrawl-download
- Parsing a local file (PDF, DOCX, XLSX, HTML, etc.) -> firecrawl-parse
- Install, auth, or setup problems -> rules/install.md
- Output handling and safe file-reading patterns -> rules/security.md
- **Integrating Firecrawl into an app, adding
FIRECRAWL_API_KEYto.env, or choosing endpoint usage in product code** -> use thefirecrawl-buildskills (already installed alongside this CLI skill)
- Producing Firecrawl-powered deliverables such as research briefs, SEO audits, QA reports, lead lists, knowledge bases, or design-system extraction -> use the
firecrawl-workflowsskills (already installed alongside this CLI skill). These skills infer from context first and ask only short blocking questions when needed.
Output & Organization
Unless the user specifies to return in context, write results to .firecrawl/ with -o. Add .firecrawl/ to .gitignore. Always quote URLs - shell interprets ? and & as special characters.
firecrawl search "react hooks" -o .firecrawl/search-react-hooks.json --json
firecrawl scrape "<url>" -o .firecrawl/page.md
Naming conventions:
.firecrawl/search-{query}.json
.firecrawl/search-{query}-scraped.json
.firecrawl/{site}-{path}.md
Never read entire output files at once. Use grep, head, or incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
Working with Results
These patterns are useful when working with file-based output (-o flag) for complex tasks:
# Extract URLs from search
jq -r '.data.web[].url' .firecrawl/search.json
# Get titles and URLs
jq -r '.data.web[] | "\(.title): \(.url)"' .firecrawl/search.json
After search: send feedback (refunds 1 credit)
Search costs 2 credits per call. After you finish using a search result, send structured feedback in the background. The first feedback per search id refunds 1 credit and feeds search-quality improvements.
SEARCH_ID=$(jq -r '.id' .firecrawl/search-react-hooks.json)
firecrawl search-feedback "$SEARCH_ID" \
--rating good \
--valuable-sources '[{"url":"https://react.dev/reference/react/hooks","reason":"Authoritative"}]' \
--missing-content '[{"topic":"useDeferredValue example"},{"topic":"Server Components hooks"}]' \
--query-suggestions "Boost react.dev for react-hooks queries" \
--silent &
The most useful field is --missing-content: an array of specific pieces of content you expected to find but didn't. Use one entry per missing topic. Bad/partial feedback with detailed --missing-content is just as valuable as good feedback.
Opt out: export FIRECRAWL_NO_SEARCH_FEEDBACK=1 makes the CLI skip every feedback call silently. Respect that flag — do not try to work around it. See firecrawl-search for the full pattern.
Parallelization
Run independent operations in parallel. Check firecrawl --status for concurrency limit:
firecrawl scrape "<url-1>" -o .firecrawl/1.md &
firecrawl scrape "<url-2>" -o .firecrawl/2.md &
firecrawl scrape "<url-3>" -o .firecrawl/3.md &
wait
For interact, scrape multiple pages and interact with each independently using their scrape IDs.
Credit Usage
firecrawl credit-usage
firecrawl credit-usage --json --pretty -o .firecrawl/credits.json