Name: scrape
Author: brightdata

SKILL.md

$27

Situation

Action

Single URL

bdata scrape <url> -f markdown

Small list (≤ ~20 URLs)

shell loop, 1 at a time (see references/patterns.md)

Larger list (dozens+)

xargs -P 4 with parallelism cap (see references/patterns.md)

Paginated listing

scrape page 1 → extract next-page URL → append → repeat (see references/examples.md)

JS-heavy / login-gated / interaction-required

escalate to bdata browser (see brightdata-cli skill)

Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, …

**stop — hand off to data-feeds**

No URL yet, just a topic

**hand off to search**

Action

Core commands:

# Clean markdown (default)

bdata scrape "https://example.com/article" -f markdown -o article.md

# Raw HTML (when you need the DOM)

bdata scrape "https://example.com" -f html -o page.html

# Structured JSON (when the Unlocker returns parsed fields)

bdata scrape "https://example.com" -f json --pretty -o page.json

# Visual snapshot (saves PNG)

bdata scrape "https://example.com" -f screenshot -o page.png

# Geo-targeted (override the exit country)

bdata scrape "https://example.com" --country de -f markdown

Full flag reference: references/flags.md.

Verification gate (run before claiming success)

Non-empty output: test -s "$out_path" — or, for stdout, at least 200 bytes of content.

Not a block page — grep the output for any of these signatures (case-insensitive):

Access Denied

Just a moment

Attention Required

Checking your browser

captcha

cf-browser-verification

cloudflare (with < 2KB total body)

Expected markers present for the task: e.g., a product page should contain a price pattern (\$\d); an article should contain at least one <h1> or # heading.

On failure, escalation ladder:

Retry with a different --country (e.g., --country de if the origin site is US)

Escalate to bdata browser for full JS rendering (hand off to brightdata-cli skill)

Do not report success until all checks above pass.

Red flags

Claiming success without inspecting the output.

Silencing errors with 2>/dev/null — you'll miss auth failures and rate-limit errors.

Running bdata scrape on Amazon/LinkedIn/TikTok/Instagram/YouTube/Reddit URLs — these are supported by data-feeds and return structured data directly. Scraping loses the structure.

Scraping the same URL repeatedly in the same task — cache the first result.

Looping bdata scrape sequentially for large lists instead of using xargs -P 4 (or similar) with a parallelism cap.

Using curl against api.brightdata.com directly — legacy path; only when the CLI isn't available.

References

references/flags.md — every flag with when-to-use notes.

references/patterns.md — shell-loop batching, xargs parallelism, pagination recipe, retry/backoff, block-page recovery chain, legacy curl fallback.

references/examples.md — (1) single page → markdown, (2) batch a list of URLs with parallelism cap, (3) paginated listing, (4) block-page recovery.

scrape

SKILL.md

Action

Verification gate (run before claiming success)

Red flags

References

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers