firecrawl-knowledge-base

Name: firecrawl-knowledge-base
Author: firecrawl

Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic…

INSTALLATION

npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-knowledge-base

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Firecrawl Knowledge Base

Use this to turn URLs or topics into organized LLM-ready content.

Onboarding Interview

Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.

Ask at most 1-3 concise questions only if blocked, such as the source URL/topic, whether the output is reference/RAG/training/docs, or training format if training is requested.

Firecrawl Collection Plan

Use Firecrawl map for documentation sites, search for topic-based corpora, scrape pages into markdown, and preserve code examples and tables.

For files, follow the Firecrawl download-style convention:

.firecrawl/

  <hostname>/

    <path>/

      index.md

Parallel Work

If appropriate, use sub-agents or equivalent parallel task runners:

one docs section per researcher

official docs, tutorials, community discussions, and references by source type

source scraping vs chunk generation vs manifest generation

Output Modes

Reference: markdown files, index.md, and sources.json.

RAG: markdown files plus chunk files and manifest.json.

Training: scraped source files plus training-data.jsonl and training-metadata.json.

Docs mirror: complete markdown mirror with a table of contents.

Final Deliverable

# Knowledge Base: [Source]

## Summary

[What was collected and why]

## Output Structure

[Files/directories created]

## Coverage

[Sections, source types, counts]

## Usage Notes

[How to use in RAG, docs, training, or agent context]

## Sources

[URLs collected]

## Rerun Inputs

workflow: firecrawl-knowledge-base

source: [url/topic]

goal: [reference/rag/train/docs]

depth: [quick/thorough/exhaustive]

output_dir: [.firecrawl/]

Quality Bar

Preserve code examples and formatting.

Remove boilerplate navigation where possible.

Include source URLs in frontmatter or metadata.