SKILL.md
| Situation | Action |
| --- | --- |
| Single URL | `bdata scrape <url> -f markdown` |
| Small list (≤ ~20 URLs) | shell loop, 1 at a time (see references/patterns.md) |
| Larger list (dozens+) | `xargs -P 4` with parallelism cap (see references/patterns.md) |
| Paginated listing | scrape page 1 → extract next-page URL → append → repeat (see references/examples.md) |
| JS-heavy / login-gated / interaction-required | escalate to `bdata browser` (see brightdata-cli skill) |
| Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, … | **stop: hand off to data-feeds** |
| No URL yet, just a topic | **hand off to search** |
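
The hand-off row for platform URLs can be checked mechanically before any scrape runs. A minimal sketch, assuming a hypothetical helper named `route_url` (not part of the `bdata` CLI) and a host list limited to the six platforms named above; the "…" in the table means the real list is longer and is not filled in here:

```shell
# route_url URL
# Hypothetical helper: prints "data-feeds" for hosts the table hands off,
# and "scrape" for everything else. The body runs in a subshell so its
# variables do not leak into the caller.
route_url() (
  url=$1
  # Drop the scheme, then everything from the first "/", "?", "#" or ":".
  host=${url#*://}
  host=${host%%[/?#:]*}
  # Hostnames are case-insensitive, so normalise before matching.
  host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')

  case "$host" in
    amazon.*|*.amazon.*|\
    linkedin.com|*.linkedin.com|\
    tiktok.com|*.tiktok.com|\
    instagram.com|*.instagram.com|\
    youtube.com|*.youtube.com|\
    reddit.com|*.reddit.com)
      echo "data-feeds" ;;
    *)
      echo "scrape" ;;
  esac
)
```

The glob patterns are deliberately loose (for example, `amazon.*` also covers country-specific domains), so treat a `data-feeds` result as a prompt to stop and check rather than as a strict classifier.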
Action
Core commands:
# Clean markdown (default)
bdata scrape "https://example.com/article" -f markdown -o article.md
# Raw HTML (when you need the DOM)
bdata scrape "https://example.com" -f html -o page.html
# Structured JSON (when the Unlocker returns parsed fields)
bdata scrape "https://example.com" -f json --pretty -o page.json
# Visual snapshot (saves PNG)
bdata scrape "https://example.com" -f screenshot -o page.png
# Geo-targeted (override the exit country)
bdata scrape "https://example.com" --country de -f markdown
Full flag reference: references/flags.md.
Verification gate (run before claiming success)
- Non-empty output: `test -s "$out_path"`, or, for stdout, at least 200 bytes of content.
- Not a block page: grep the output for any of these signatures (case-insensitive):
  - `Access Denied`
  - `Just a moment`
  - `Attention Required`
  - `Checking your browser`
  - `captcha`
  - `cf-browser-verification`
  - `cloudflare` (with < 2KB total body)
- Expected markers present for the task: e.g., a product page should contain a price pattern (`\$\d`); an article should contain at least one `<h1>` or `#` heading.
- On failure, escalation ladder:
  - Retry with a different `--country` (e.g., `--country de` if the origin site is US)
  - Escalate to `bdata browser` for full JS rendering (hand off to `brightdata-cli` skill)
Do not report success until all checks above pass.
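
The gate above can be sketched as two small shell functions. Both `verify_scrape` and `scrape_with_ladder` are hypothetical helpers, not part of the `bdata` CLI; the only `bdata` flags used are the ones shown under Core commands, and the fallback country `de` is just the example value from the list above. The sketch covers output written to a file; the 200-byte rule for stdout is not handled.

```shell
# verify_scrape FILE [MARKER_REGEX]
# Hypothetical helper: returns 0 only if every check in the gate passes.
verify_scrape() (
  out_path=$1
  marker=${2:-}

  # 1. Non-empty output.
  test -s "$out_path" || { echo "FAIL: empty output" >&2; exit 1; }

  # 2. Block-page signatures (fixed strings, case-insensitive).
  if grep -qiF \
      -e 'Access Denied' -e 'Just a moment' -e 'Attention Required' \
      -e 'Checking your browser' -e 'captcha' -e 'cf-browser-verification' \
      "$out_path"; then
    echo "FAIL: block-page signature found" >&2
    exit 1
  fi

  # "cloudflare" only counts as a block signal when the body is under 2 KB.
  size=$(( $(wc -c < "$out_path") ))
  if [ "$size" -lt 2048 ] && grep -qi 'cloudflare' "$out_path"; then
    echo "FAIL: short body mentioning cloudflare" >&2
    exit 1
  fi

  # 3. Task-specific marker, given as an extended regex (optional).
  if [ -n "$marker" ] && ! grep -qE -- "$marker" "$out_path"; then
    echo "FAIL: expected marker not found" >&2
    exit 1
  fi
  exit 0
)

# scrape_with_ladder URL OUT [MARKER_REGEX]
# Hypothetical helper: scrape, verify, then walk the escalation ladder.
scrape_with_ladder() (
  url=$1
  out=$2
  marker=${3:-}

  # First attempt with the default exit country.
  if bdata scrape "$url" -f markdown -o "$out" && verify_scrape "$out" "$marker"; then
    exit 0
  fi
  # Rung 1: retry from a different country ("de" is only an example).
  if bdata scrape "$url" --country de -f markdown -o "$out" && verify_scrape "$out" "$marker"; then
    exit 0
  fi
  # Rung 2: this sketch does not drive the browser; it only reports the hand-off.
  echo "ESCALATE: hand off to bdata browser for $url" >&2
  exit 2
)
```

For example, `verify_scrape article.md '^# '` checks a markdown article for a top-level heading. POSIX extended regular expressions have no `\d`, so with `grep -E` the price marker is written as `'\$[0-9]'`.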
Red flags
- Claiming success without inspecting the output.
- Silencing errors with `2>/dev/null`: you'll miss auth failures and rate-limit errors.
- Running `bdata scrape` on Amazon/LinkedIn/TikTok/Instagram/YouTube/Reddit URLs: these are supported by `data-feeds`, which returns structured data directly. Scraping loses the structure.
- Scraping the same URL repeatedly in the same task: cache the first result.
- Looping `bdata scrape` sequentially for large lists instead of using `xargs -P 4` (or similar) with a parallelism cap.
- Using `curl` against `api.brightdata.com` directly: this is the legacy path, for use only when the CLI isn't available.
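
Two of the red flags above (repeat scrapes and uncapped sequential loops) can be avoided with one small wrapper. A minimal sketch, assuming a hypothetical helper `scrape_list`, a URL file with one URL per line, and an output directory that doubles as a cache. Naming each file after a checksum of its URL is an illustrative choice, not something the CLI prescribes; references/patterns.md remains the authoritative recipe.

```shell
# scrape_list URL_FILE OUT_DIR
# Hypothetical helper: scrape every URL in URL_FILE into OUT_DIR,
# at most 4 at a time, skipping URLs that already have cached output.
scrape_list() (
  url_file=$1
  out_dir=$2
  mkdir -p "$out_dir" || exit 1

  # -P 4 caps parallelism; -n 1 hands each worker exactly one URL.
  # The URL reaches the worker as $1, so it is never re-parsed as shell code.
  # Note: xargs treats quotes and backslashes in its input specially, so
  # URLs containing them would need to be percent-encoded first.
  OUT_DIR=$out_dir xargs -P 4 -n 1 sh -c '
    url=$1
    name=$(printf "%s" "$url" | cksum | cut -d" " -f1)
    out="$OUT_DIR/$name.md"
    # Cache: a non-empty file from an earlier run means no second scrape.
    if [ -s "$out" ]; then
      exit 0
    fi
    bdata scrape "$url" -f markdown -o "$out"
  ' _ < "$url_file"
)
```

Errors are left on stderr rather than silenced, and `xargs` returns a non-zero status if any worker fails, so the caller can still run the verification gate on each output file afterwards.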
References
- references/flags.md: every flag with when-to-use notes.
- references/patterns.md: shell-loop batching, `xargs` parallelism, pagination recipe, retry/backoff, block-page recovery chain, legacy `curl` fallback.
- references/examples.md: (1) single page → markdown, (2) batch a list of URLs with parallelism cap, (3) paginated listing, (4) block-page recovery.