firecrawl-scraper

Web scraping and content extraction via the Firecrawl API, with format conversion, page interaction, and batch processing.

  • Five endpoint modes: scrape single pages, crawl entire websites, map URLs, batch-scrape multiple URLs, and check crawl job status
  • Multiple output formats, including markdown, HTML, JSON with schema extraction, screenshots, and PDF parsing
  • Browser automation actions (click, scroll, wait, write, JavaScript execution) for interacting with a page before content extraction
  • Content filtering via CSS selectors, regex path patterns, and depth/limit controls for targeted crawling
  • Requires the FIRECRAWL_API_KEY environment variable or a .env file

INSTALLATION
npx skills add https://github.com/benedictking/firecrawl-scraper --skill firecrawl-scraper
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Execution Method

Use the Task tool to invoke the firecrawl-fetcher sub-skill, passing the command as an argument and the JSON payload on stdin:

Task parameters:

- subagent_type: Bash

- description: "Call Firecrawl API"

- prompt: cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs <scrape|crawl|map|batch-scrape|crawl-status> [--wait]

  { ...payload... }

  JSON
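For callers that drive the script from Node instead of the Task tool, the same stdin contract can be sketched roughly as follows. This is an illustrative sketch, not part of the skill: `buildArgs` and `runFirecrawl` are hypothetical helper names, and the code assumes the script prints a single JSON document on stdout and exits non-zero on failure.

```javascript
const { spawnSync } = require("node:child_process");

// Path to the skill's CLI script, as used throughout this document.
const SCRIPT = ".claude/skills/firecrawl-scraper/firecrawl-api.cjs";
// Commands that read their JSON payload from stdin.
const STDIN_COMMANDS = ["scrape", "crawl", "map", "batch-scrape"];

// Hypothetical helper: build the argv for `node <script> <command> [--wait]`.
function buildArgs(command, { wait = false } = {}) {
  if (!STDIN_COMMANDS.includes(command)) {
    throw new Error(`Unknown command: ${command}`);
  }
  return wait ? [SCRIPT, command, "--wait"] : [SCRIPT, command];
}

// Hypothetical helper: send the payload on stdin, mirroring the heredoc form.
// Assumption: the script writes one JSON document to stdout.
function runFirecrawl(command, payload, options) {
  const result = spawnSync("node", buildArgs(command, options), {
    input: JSON.stringify(payload),
    encoding: "utf8",
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`firecrawl-api.cjs exited with ${result.status}: ${result.stderr}`);
  }
  return JSON.parse(result.stdout);
}
```

A call such as `runFirecrawl("scrape", { url: "https://example.com", formats: ["markdown"] })` would then be equivalent to the heredoc invocation shown above.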

Payload Examples

1) Scrape Single Page

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown", "links"],
  "onlyMainContent": true,
  "includeTags": [],
  "excludeTags": ["nav", "footer"],
  "waitFor": 0,
  "timeout": 30000
}
JSON

Available formats:

  • "markdown", "html", "rawHtml", "links", "images", "summary"
  • {"type": "json", "prompt": "Extract product info", "schema": {...}}
  • {"type": "screenshot", "fullPage": true, "quality": 85}

2) Scrape with Actions (Page Interaction)

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "actions": [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#load-more"},
    {"type": "wait", "milliseconds": 1000},
    {"type": "scroll", "direction": "down", "amount": 500}
  ]
}
JSON

Available actions:

  • wait, click, write, press, scroll, screenshot, scrape, executeJavascript
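When a page needs the same interaction repeated (for example, clicking a "load more" button several times), the `actions` array can be generated instead of written by hand. The sketch below reuses only the `click` and `wait` shapes shown above; `loadMoreActions` is a hypothetical helper name, and the pause length is an arbitrary illustrative default.

```javascript
// Hypothetical helper: repeat "click, then wait" a fixed number of times.
// Action shapes follow the payload example above.
function loadMoreActions(selector, clicks, pauseMs = 1000) {
  const actions = [];
  for (let i = 0; i < clicks; i++) {
    actions.push({ type: "click", selector });
    actions.push({ type: "wait", milliseconds: pauseMs });
  }
  return actions;
}

// Payload that clicks "#load-more" three times before extracting markdown.
const payload = {
  url: "https://example.com",
  formats: ["markdown"],
  actions: loadMoreActions("#load-more", 3),
};
```

`JSON.stringify(payload)` produces a document that can be piped to the `scrape` command in the same way as the heredoc examples.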

3) Parse PDF

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/document.pdf",
  "formats": ["markdown"],
  "parsers": ["pdf"]
}
JSON

4) Extract Structured JSON

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/product",
  "formats": [
    {
      "type": "json",
      "prompt": "Extract product information",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"},
          "description": {"type": "string"}
        },
        "required": ["name", "price"]
      }
    }
  ]
}
JSON
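Because the extraction is model-driven, it can be worth re-checking the returned object against the same schema on the caller's side. The following is a minimal sketch under stated assumptions: `validateExtraction` is a hypothetical helper, it covers only the `required` list and the primitive `string`/`number` types used above, and it is not a full JSON Schema validator.

```javascript
// Schema copied from the payload above.
const productSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    description: { type: "string" },
  },
  required: ["name", "price"],
};

// Hypothetical helper: return a list of human-readable problems (empty if none).
function validateExtraction(schema, value) {
  const problems = [];
  for (const key of schema.required || []) {
    if (value[key] === undefined || value[key] === null) {
      problems.push(`missing required field: ${key}`);
    }
  }
  for (const [key, spec] of Object.entries(schema.properties || {})) {
    const present = value[key] !== undefined && value[key] !== null;
    // Only the primitive types used in the schema above are checked.
    if (present && ["string", "number"].includes(spec.type) && typeof value[key] !== spec.type) {
      problems.push(`wrong type for ${key}: expected ${spec.type}`);
    }
  }
  return problems;
}
```

An empty result means the object satisfies the checks covered here; a price returned as a string, for instance, would be reported as a type problem.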

5) Crawl Entire Website

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "includePaths": ["^/docs/.*"],
  "excludePaths": ["^/blog/.*"],
  "maxDiscoveryDepth": 3,
  "limit": 100,
  "allowExternalLinks": false,
  "allowSubdomains": false
}
JSON
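To preview which URLs a pair of `includePaths`/`excludePaths` patterns would admit, the filtering can be approximated locally before spending crawl credits. This is a sketch, not Firecrawl's implementation: `isPathAllowed` is a hypothetical helper, and it assumes that patterns are tested against the URL pathname and that an exclude match overrides an include match.

```javascript
// Hypothetical helper approximating the path filters.
// Assumptions: patterns match the URL pathname; exclude wins over include.
function isPathAllowed(url, { includePaths = [], excludePaths = [] } = {}) {
  const { pathname } = new URL(url);
  const matches = (patterns) => patterns.some((p) => new RegExp(p).test(pathname));
  if (matches(excludePaths)) return false;
  // With no include patterns, everything not excluded is allowed.
  return includePaths.length === 0 || matches(includePaths);
}

// Filters from the crawl payload above.
const filters = { includePaths: ["^/docs/.*"], excludePaths: ["^/blog/.*"] };

const candidates = [
  "https://docs.example.com/docs/intro",
  "https://docs.example.com/blog/launch",
  "https://docs.example.com/pricing",
];
const allowed = candidates.filter((url) => isPathAllowed(url, filters));
```

Under these assumptions only the `/docs/intro` URL passes: the blog page is excluded explicitly, and the pricing page matches no include pattern.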

5.1) Crawl + Wait for Completion

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl --wait
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "limit": 100
}
JSON

6) Map Website URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs map
{
  "url": "https://example.com",
  "search": "documentation",
  "limit": 5000
}
JSON

7) Batch Scrape Multiple URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs batch-scrape
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "formats": ["markdown"]
}
JSON

8) Check Crawl Status

node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id>

Wait for completion:

node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id> --wait
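How `--wait` is implemented inside the script is not described here, but conceptually it amounts to polling the job until it reaches a terminal state. The sketch below shows one way a caller could do that by hand. `waitForCrawl` is a hypothetical helper, the status check is injected by the caller, and the status strings (`completed`, `failed`, `cancelled`) are assumptions about the crawl status response rather than something this document specifies.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling loop. `checkStatus` is any async function that returns
// the current job object, for example by running the crawl-status command.
// Assumption: the job object carries a `status` string.
async function waitForCrawl(
  checkStatus,
  { intervalMs = 2000, maxAttempts = 150, sleep = defaultSleep } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await checkStatus();
    if (job.status === "completed") return job;
    if (job.status === "failed" || job.status === "cancelled") {
      throw new Error(`Crawl ended with status: ${job.status}`);
    }
    // Any other status is treated as "still running".
    await sleep(intervalMs);
  }
  throw new Error(`Crawl still running after ${maxAttempts} checks`);
}
```

Bounding the number of attempts keeps a stuck job from blocking the caller indefinitely; the interval and attempt count shown are illustrative defaults, not recommended values.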

Key Features

Formats

  • markdown: Clean markdown content
  • html: Parsed HTML
  • rawHtml: Original HTML
  • links: All links on page
  • images: All images on page
  • summary: AI-generated summary
  • json: Structured data extraction with schema
  • screenshot: Page screenshot (PNG)

Content Control

  • onlyMainContent: Extract only main content (default: true)
  • includeTags: CSS selectors to include
  • excludeTags: CSS selectors to exclude
  • waitFor: Wait time before scraping (ms)
  • maxAge: Cache duration (default: 48 hours)

Actions (Browser Automation)

  • wait: Wait for specified time
  • click: Click element by selector
  • write: Input text into field
  • press: Press keyboard key
  • scroll: Scroll page
  • executeJavascript: Run custom JS

Crawl Options

  • includePaths: Regex patterns to include
  • excludePaths: Regex patterns to exclude
  • maxDiscoveryDepth: Maximum crawl depth
  • limit: Maximum pages to crawl
  • allowExternalLinks: Follow external links
  • allowSubdomains: Follow subdomains

Environment Variables & API Key

There are two ways to configure the API key; the environment variable takes priority over the .env file:

  • Environment variable: set FIRECRAWL_API_KEY
  • .env file: place it at .claude/skills/firecrawl-scraper/.env (you can copy .env.example as a starting point)
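The priority rule can be illustrated with a small lookup that consults the environment first and falls back to the .env contents. This is a sketch of the stated behaviour, not the script's actual code: `parseDotenv` and `resolveApiKey` are hypothetical helpers, and the parser handles only plain `KEY=VALUE` lines, comments, and optional surrounding quotes.

```javascript
// Hypothetical helper: parse a minimal subset of .env syntax.
function parseDotenv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = trimmed.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    vars[key] = value;
  }
  return vars;
}

// Hypothetical helper: environment variable first, then the .env contents.
function resolveApiKey(env, dotenvText = "") {
  const key = env.FIRECRAWL_API_KEY || parseDotenv(dotenvText).FIRECRAWL_API_KEY;
  if (!key) throw new Error("FIRECRAWL_API_KEY is not set");
  return key;
}
```

In practice the arguments would be `process.env` and the text of `.claude/skills/firecrawl-scraper/.env`, read only if that file exists.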

Response Format

All endpoints return JSON with:

  • success: Boolean indicating success
  • data: Extracted content (format depends on endpoint)
  • For crawl: returns a job ID; use crawl-status (or GET /v2/crawl/{id}) to check the job's status
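A caller would typically check `success` before touching `data`. The sketch below shows one way to do that. `unwrapResponse` is a hypothetical helper; `success` and `data` come from the list above, while the `id` field for the crawl job ID and the `error` field on failures are assumptions about the response shape.

```javascript
// Hypothetical helper: normalize a parsed response from the script.
// Assumptions: failures may carry an `error` message; a crawl started
// without --wait returns its job ID in `id` and no `data`.
function unwrapResponse(command, response) {
  if (!response || response.success !== true) {
    const reason = response && response.error ? response.error : "unknown error";
    throw new Error(`${command} failed: ${reason}`);
  }
  if (command === "crawl" && response.data === undefined) {
    return { jobId: response.id };
  }
  return { data: response.data };
}
```

For a crawl, the returned `jobId` is the value to pass to `crawl-status`; for the other commands, `data` holds the extracted content in whatever shape the endpoint produces.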