web-scraper

Extract structured data from websites. Use when: collecting competitor pricing; scraping product listings; extracting contact information; gathering research…

INSTALLATION
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill web-scraper
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Web Scraper

Extract structured data from websites using BeautifulSoup and requests, turning a page's HTML into usable data.

When to Use This Skill

  • Competitor research - Scrape pricing, features, positioning
  • Lead generation - Extract contact info from directories
  • Content audit - Pull headings, links, metadata
  • Price monitoring - Track competitor pricing changes
  • Data collection - Gather research data from multiple sources

What Claude Does vs What You Decide

Claude Does                        | You Decide
Structures analysis frameworks     | Strategic priorities
Synthesizes market data            | Competitive positioning
Identifies opportunities           | Resource allocation
Creates strategic options          | Final strategy selection
Suggests implementation approaches | Execution decisions

Dependencies

pip install beautifulsoup4 requests pandas click lxml

Commands

Scrape Elements

python scripts/main.py scrape https://example.com --selector "h1,h2,p"

python scripts/main.py scrape https://example.com --selector ".product-price"

Extract Links

python scripts/main.py links https://example.com

python scripts/main.py links https://example.com --internal-only
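
As a rough illustration of what internal-only link filtering can involve, the sketch below collects `href` values with the Python standard library and keeps the links whose host matches the page's host. The names `LinkCollector` and `extract_links` and the sample HTML are hypothetical; the implementation in `scripts/main.py` is not shown here and may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, internal_only=False):
    """Return absolute http(s) links in html, optionally same-host only."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if internal_only and parsed.netloc != base_host:
            continue  # different host, so treated as external
        if absolute not in links:
            links.append(absolute)
    return links


# Hypothetical sample page, inlined so the sketch runs offline.
sample_html = """
<a href="/pricing">Pricing</a>
<a href="https://example.com/blog">Blog</a>
<a href="https://other.example.org/">Partner</a>
<a href="mailto:hi@example.com">Mail</a>
"""
print(extract_links(sample_html, "https://example.com", internal_only=True))
```

Comparing hosts exactly means subdomains such as `blog.example.com` count as external in this sketch; a real tool may choose a looser rule.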

Extract Emails

python scripts/main.py emails https://example.com

python scripts/main.py emails https://example.com --depth 2
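
The extraction step itself can be sketched with a regular expression over text that has already been fetched. The pattern and the `find_emails` helper below are assumptions for illustration, not the skill's actual code, and how `--depth` follows links is not covered.

```python
import re

# Deliberately simple pattern: it will miss some valid addresses and may
# accept a few invalid ones.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def find_emails(text):
    """Return unique, lowercased email-like strings in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen


# Hypothetical sample text.
sample = (
    "Contact sales@example.com or Support@Example.com. "
    "Press: press@news.example.org, sales@example.com"
)
print(find_emails(sample))
```

Lowercasing before de-duplicating treats `Support@Example.com` and `support@example.com` as the same address, which is usually what a contact list wants.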

Extract Structured Data

python scripts/main.py structured https://example.com/article --schema article

python scripts/main.py structured https://example.com/product --schema product

Examples

Example 1: Scrape Competitor Pricing

python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"

# Output:
# Extracted 6 elements
# 1. Starter - $29/mo
# 2. Pro - $99/mo
# 3. Enterprise - Contact us

Example 2: Extract Article Content

python scripts/main.py structured https://blog.example.com/post --schema article

# Output: article_data.json
# {
#   "title": "How to Scale Your Startup",
#   "author": "Jane Doe",
#   "date": "2024-01-15",
#   "content": "...",
#   "word_count": 1523
# }

CSS Selector Reference

Selector       | Description    | Example
tag            | Element type   | h1, p, div
.class         | Class name     | .price, .title
#id            | Element ID     | #main-content
tag.class      | Tag with class | div.product
tag[attr]      | Has attribute  | a[href]
parent > child | Direct child   | ul > li
tag1, tag2     | Multiple       | h1, h2, h3
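
To see how these selector forms behave, here is a minimal sketch that runs them through BeautifulSoup's `select()` on an inline HTML snippet. The sample markup and the `texts` helper are made up for illustration, and the built-in `html.parser` backend is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, inlined so the sketch runs without network access.
sample_html = """
<div id="main-content">
  <h1>Plans</h1>
  <ul>
    <li><div class="product"><span class="title">Starter</span> <span class="price">$29/mo</span></div></li>
    <li><div class="product"><span class="title">Pro</span> <span class="price">$99/mo</span></div></li>
  </ul>
  <a href="/contact">Contact</a> <a>No link</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")


def texts(selector):
    """Return the stripped text of every element matching a CSS selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


print(texts("h1"))                      # tag
print(texts(".price"))                  # class
print(len(soup.select("div.product")))  # tag with class
print(texts("a[href]"))                 # only anchors that have an href
print(len(soup.select("ul > li")))      # direct children of <ul>
print(texts(".title, .price"))          # multiple selectors, document order
```

With a comma-separated selector the matches come back in document order, so plan names and prices are interleaved rather than grouped by selector.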

Ethical Scraping Guidelines

  • Check robots.txt - Respect the site's crawling rules
  • Rate limit - Don't overload servers (1-2 req/sec)
  • Identify yourself - Use descriptive User-Agent
  • Cache requests - Don't re-scrape unchanged pages
  • Terms of Service - Check if scraping is allowed
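
The first three guidelines can be sketched with the standard library alone. The user-agent string, the inlined robots.txt rules, and the `RateLimiter` class are hypothetical; in real use the rules would be loaded from the site with `set_url()` and `read()`. Caching and Terms of Service checks are not covered here.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical identifier; use one that names your project and a contact.
USER_AGENT = "example-research-bot/0.1 (+mailto:you@example.com)"

# Rules are inlined so the sketch runs offline. In real use:
#   parser.set_url("https://example.com/robots.txt"); parser.read()
robots_lines = """
User-agent: *
Disallow: /private/
Crawl-delay: 1
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(robots_lines)


def allowed(url):
    """Return True if robots.txt permits USER_AGENT to fetch url."""
    return parser.can_fetch(USER_AGENT, url)


class RateLimiter:
    """Enforce a minimum interval, in seconds, between calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


limiter = RateLimiter(min_interval=0.5)  # at most about 2 requests/second
for url in ["https://example.com/pricing", "https://example.com/private/data"]:
    if not allowed(url):
        print("skip (disallowed):", url)
        continue
    limiter.wait()
    # A real fetch would go here, sending USER_AGENT in the request headers.
    print("would fetch:", url)
```

If the site declares a `Crawl-delay`, `parser.crawl_delay(USER_AGENT)` returns it, and using the larger of that value and your own interval is a reasonable default.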

Skill Boundaries

What This Skill Does Well

  • Structuring strategic analysis
  • Identifying market opportunities
  • Creating strategic frameworks
  • Synthesizing competitive data

What This Skill Cannot Do

  • Replace market research
  • Guarantee strategic success
  • Know proprietary competitor info
  • Make executive decisions

Skill Metadata

  • mode: centaur
  • category: automation
  • subcategory: data-extraction
  • dependencies: [beautifulsoup4, requests, pandas]
  • difficulty: intermediate
  • time_saved: 5+ hours/week