SKILL.md
CLI Setup Reference
Installation, authentication, and troubleshooting for the Bright Data CLI (bdata) are documented in a single canonical place: references/cli-setup.md.
Consult it before any task that shells out to bdata.
Bright Data APIs
Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.
Choosing the Right API
| Use Case | API | Why |
|---|---|---|
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |
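The selection rule in the table above can be sketched as a small decision function. This is only an illustration: the function name, its flags, and the precedence order (browser interaction first, then search engines, then pre-built scrapers) are assumptions drawn from the table, not part of any Bright Data SDK.

```python
def choose_api(*, needs_browser: bool = False,
               is_search_engine: bool = False,
               has_prebuilt_scraper: bool = False) -> str:
    """Return the most specific API name for a task (hypothetical helper)."""
    if needs_browser:            # clicks, scrolling, forms, JS, XHR interception
        return "Browser API"
    if is_search_engine:         # Google / Bing / Yandex result pages
        return "SERP API"
    if has_prebuilt_scraper:     # Amazon, LinkedIn, Instagram, TikTok, etc.
        return "Web Scraper API"
    return "Web Unlocker"        # plain fetch of a page by URL


print(choose_api(is_search_engine=True))
```

Falling through to Web Unlocker mirrors the table's note that it is the cheapest option when nothing more specific applies.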
Authentication Pattern (All APIs)
All APIs share the same authentication model. The env vars below apply to direct REST API integrations — if you are using the bdata CLI, bdata login handles all of these automatically (see references/cli-setup.md).
export BRIGHTDATA_API_KEY="your-api-key" # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name" # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name" # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD" # Browser API credentials
REST API authentication header for Web Unlocker and SERP API:
Authorization: Bearer YOUR_API_KEY
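A minimal sketch of building that header from the environment variable listed above. The helper name auth_headers is hypothetical; only the BRIGHTDATA_API_KEY variable and the Bearer scheme come from this document.

```python
import os
from typing import Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Authorization header from BRIGHTDATA_API_KEY (hypothetical helper)."""
    api_key = env.get("BRIGHTDATA_API_KEY")
    if not api_key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError("BRIGHTDATA_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.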
Web Unlocker API
HTTP-based scraping proxy. Best for simple page fetches without browser interaction.
Endpoint: POST https://api.brightdata.com/request
import requests
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={
"zone": "YOUR_ZONE_NAME",
"url": "https://example.com/product/123",
"format": "raw"
}
)
html = response.text
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| zone | string | Zone name (required) |
| url | string | Target URL with http:// or https:// (required) |
| format | string | "raw" (HTML) or "json" (structured wrapper) (required) |
| method | string | HTTP verb, default "GET" |
| country | string | 2-letter ISO for geo-targeting (e.g., "us", "de") |
| data_format | string | Transform: "markdown" or "screenshot" |
| async | boolean | true for async mode |
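The parameter table above can be turned into a small payload builder that checks the required fields before a request is sent. This is a sketch under stated assumptions: the function name unlocker_payload and its keyword names are hypothetical, and the validation only reflects the constraints listed in the table.

```python
from typing import Optional


def unlocker_payload(zone: str, url: str, *, fmt: str = "raw",
                     country: Optional[str] = None,
                     data_format: Optional[str] = None,
                     use_async: bool = False) -> dict:
    """Build the JSON body for POST /request (hypothetical helper)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if fmt not in ("raw", "json"):
        raise ValueError('format must be "raw" or "json"')
    if data_format not in (None, "markdown", "screenshot"):
        raise ValueError('data_format must be "markdown" or "screenshot"')

    payload = {"zone": zone, "url": url, "format": fmt}
    if country:
        payload["country"] = country.lower()  # 2-letter ISO code
    if data_format:
        payload["data_format"] = data_format
    if use_async:
        payload["async"] = True
    return payload
```

The returned dict is meant to be passed as the json= argument of requests.post, as in the example above.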
Quick Patterns
# Get markdown (best for LLM input)
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)
# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}
# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}
# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}
Critical rule: Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.
See references/web-unlocker.md for complete reference including proxy interface, special headers, async flow, features, and billing.
SERP API
Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.
Endpoint: POST https://api.brightdata.com/request (same as Web Unlocker)
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={
"zone": "YOUR_SERP_ZONE",
"url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
"format": "raw"
}
)
data = response.json()
for result in data.get("organic", []):
    print(result["rank"], result["title"], result["link"])
Essential Google URL Parameters
| Parameter | Description | Example |
|---|---|---|
| q | Search query | q=python+web+scraping |
| brd_json | Parsed JSON output | brd_json=1 (always use for data pipelines) |
| gl | Country for search | gl=us |
| hl | Language | hl=en |
| start | Pagination offset | start=10 (page 2), start=20 (page 3) |
| tbm | Search type | tbm=nws (news), tbm=isch (images), tbm=vid (videos) |
| brd_mobile | Device | brd_mobile=1 (mobile), brd_mobile=ios |
| brd_browser | Browser | brd_browser=chrome |
| brd_ai_overview | Trigger AI Overview | brd_ai_overview=2 |
| uule | Encoded geo location | for precise location targeting |
Note: num parameter is deprecated as of September 2025. Use start for pagination.
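As a worked example of the pagination arithmetic, the sketch below builds a Google search URL where page N maps to start = (N - 1) * 10. The helper name google_serp_url is hypothetical; the query parameters are the ones listed above.

```python
from urllib.parse import urlencode


def google_serp_url(query: str, page: int = 1, gl: str = "us", hl: str = "en") -> str:
    """Build a Google search URL with parsed JSON output enabled (hypothetical helper)."""
    if page < 1:
        raise ValueError("page starts at 1")
    params = {"q": query, "brd_json": 1, "gl": gl, "hl": hl}
    if page > 1:
        params["start"] = (page - 1) * 10  # page 2 -> 10, page 3 -> 20
    return "https://www.google.com/search?" + urlencode(params)


print(google_serp_url("python web scraping"))
# https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en
```

urlencode handles the space-to-plus encoding, so queries can be passed as plain strings.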
Parsed JSON Response Structure
{
"organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
"paid": [],
"people_also_ask": [],
"knowledge_graph": {},
"related_searches": [],
"general": {"results_cnt": 1240000000, "query": "..."}
}
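A short sketch of reading that structure defensively. The sample dict is made-up illustrative data shaped like the response above, and top_organic is a hypothetical helper name.

```python
def top_organic(data: dict, limit: int = 5) -> list:
    """Return (rank, title, link) tuples for the first `limit` organic results."""
    results = sorted(data.get("organic", []), key=lambda r: r.get("rank", 0))
    return [(r.get("rank"), r.get("title"), r.get("link")) for r in results[:limit]]


# Illustrative sample only, shaped like the parsed JSON structure above.
sample = {
    "organic": [
        {"rank": 2, "title": "Second", "link": "https://example.com/2"},
        {"rank": 1, "title": "First", "link": "https://example.com/1"},
    ],
    "general": {"query": "example"},
}
print(top_organic(sample, limit=1))
```

Using .get with defaults avoids a KeyError when a section such as "organic" is absent from a particular response.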
Bing Key Parameters
| Parameter | Description |
|---|---|
| q | Search query |
| setLang | Language (prefer 4-letter: en-US) |
| cc | Country code |
| first | Pagination (increment by 10: 1, 11, 21...) |
| safesearch | off, moderate, strict |
| brd_mobile | Device type |
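Bing's first parameter is 1-based, unlike Google's start, so page N maps to first = (N - 1) * 10 + 1. The sketch below illustrates that offset; the helper name bing_serp_url is hypothetical, and https://www.bing.com/search is assumed as the target search URL.

```python
from urllib.parse import urlencode


def bing_serp_url(query: str, page: int = 1, lang: str = "en-US", cc: str = "us") -> str:
    """Build a Bing search URL using the parameters listed above (hypothetical helper)."""
    if page < 1:
        raise ValueError("page starts at 1")
    params = {
        "q": query,
        "setLang": lang,
        "cc": cc,
        "first": (page - 1) * 10 + 1,  # 1, 11, 21, ...
    }
    return "https://www.bing.com/search?" + urlencode(params)


print(bing_serp_url("web scraping", page=2))
# https://www.bing.com/search?q=web+scraping&setLang=en-US&cc=us&first=11
```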
Async for Bulk SERP
# Submit
response = requests.post(
"https://api.brightdata.com/request",
params={"async": "1"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")
# Retrieve (retrieve calls are NOT billed)
result = requests.get(
"https://api.brightdata.com/serp/get_result",
params={"response_id": response_id},
headers={"Authorization": f"Bearer {API_KEY}"}
)
Billing: Pay per 1,000 successful requests only. Async retrieve calls are not billed.
See references/serp-api.md for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.
Web Scraper API
Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.
Sync Endpoint: POST https://api.brightdata.com/datasets/v3/scrape
Async Endpoint: POST https://api.brightdata.com/datasets/v3/trigger
# Sync (up to 20 URLs, returns immediately)
response = requests.post(
"https://api.brightdata.com/datasets/v3/scrape",
params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)
if response.status_code == 200:
    data = response.json()  # Results ready
elif response.status_code == 202:
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion
Parameters
| Parameter | Type | Description |
|---|---|---|
| dataset_id | string | Scraper identifier from the Scraper Library (required) |
| format | string | json (default), ndjson, jsonl, csv |
| custom_output_fields | string | Pipe-separated fields: url\|title\|price |
| include_errors | boolean | Include error info in results |
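The query parameters above can be assembled with a small helper that joins the output fields with pipes. This is a sketch: scraper_params is a hypothetical name, and serializing include_errors as the string "true" is an assumption about how the boolean is expected on the query string.

```python
from typing import Iterable, Optional


def scraper_params(dataset_id: str, *, fmt: str = "json",
                   fields: Optional[Iterable[str]] = None,
                   include_errors: bool = False) -> dict:
    """Build query parameters for the Web Scraper API (hypothetical helper)."""
    if fmt not in {"json", "ndjson", "jsonl", "csv"}:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"dataset_id": dataset_id, "format": fmt}
    if fields:
        params["custom_output_fields"] = "|".join(fields)  # url|title|price
    if include_errors:
        params["include_errors"] = "true"  # assumption: lowercase string form
    return params


print(scraper_params("YOUR_DATASET_ID", fields=["url", "title", "price"]))
```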
Request Body
{
"input": [
{ "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
{ "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
]
}
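Since the sync example above notes a limit of 20 URLs per call, a longer URL list has to be split into several request bodies. A minimal sketch, assuming the body shape shown above; batch_inputs is a hypothetical helper name.

```python
from typing import Iterator, List


def batch_inputs(urls: List[str], batch_size: int = 20) -> Iterator[dict]:
    """Yield request bodies of at most `batch_size` URLs each (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(urls), batch_size):
        yield {"input": [{"url": u} for u in urls[i:i + batch_size]]}


urls = [f"https://www.amazon.com/dp/EXAMPLE{i}" for i in range(45)]  # placeholder URLs
bodies = list(batch_inputs(urls))
print([len(b["input"]) for b in bodies])
# [20, 20, 5]
```

For larger jobs the async trigger endpoint is likely the better fit than many sync calls.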
Poll for Async Results
import time
# Trigger
snapshot_id = requests.post(
"https://api.brightdata.com/datasets/v3/trigger",
params={"dataset_id": DATASET_ID, "format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]
# Poll
while True:
    status = requests.get(
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()["status"]
    if status == "ready": break
    if status == "failed": raise Exception("Job failed")
    time.sleep(10)
# Download
data = requests.get(
f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
params={"format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"}
).json()
Progress status values: starting → running → ready | failed
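The polling loop above runs until the job finishes, so a stuck job would block forever. One possible variation adds an overall deadline. In this sketch the status lookup is passed in as a callable so the logic can be exercised without network access; wait_until_ready and its parameters are hypothetical names, and the 600-second default is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str], *,
                     timeout: float = 600.0, interval: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> None:
    """Poll get_status() until it returns "ready"; raise on "failed" or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()  # expected: starting, running, ready, or failed
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("Job failed")
        if clock() >= deadline:
            raise TimeoutError(f"snapshot not ready after {timeout} seconds")
        sleep(interval)
```

In real use, get_status would wrap the GET request to the progress endpoint shown above and return the "status" field of its JSON body.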
Data retention: 30 days.
Billing: Per delivered record. Invalid input URLs that fail are still billable.
See references/web-scraper-api.md for complete reference including scraper types, output formats, delivery options, and billing details.
Browser API (Scraping Browser)
Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.
Connection:
- Playwright/Puppeteer: wss://${AUTH}@brd.superproxy.io:9222
- Selenium: https://${AUTH}@brd.superproxy.io:9515
const { chromium } = require("playwright-core");
const AUTH = process.env.BROWSER_AUTH;
const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
const page = await browser.newPage();
page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
const html = await page.content();
await browser.close();
import asyncio
import os
from playwright.async_api import async_playwright

AUTH = os.environ["BROWSER_AUTH"]  # brd-customer-ID-zone-NAME:PASSWORD

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
        page = await browser.new_page()
        page.set_default_navigation_timeout(120000)  # Always set to 2 minutes
        await page.goto("https://example.com", wait_until="domcontentloaded")
        html = await page.content()
        await browser.close()

asyncio.run(main())
Custom CDP Functions
| Function | Purpose |
|---|---|
| Captcha.solve | Manually trigger CAPTCHA solving |
| Captcha.setAutoSolve | Enable/disable auto CAPTCHA solving |
| Proxy.setLocation | Set precise geo location (call BEFORE goto) |
| Proxy.useSession | Maintain same IP across sessions |
| Emulation.setDevice | Apply device profile (iPhone 14, etc.) |
| Emulation.getSupportedDevices | List available device profiles |
| Unblocker.enableAdBlock | Block ads to save bandwidth |
| Unblocker.disableAdBlock | Re-enable ads |
| Input.type | Fast text input for bulk form filling |
| Browser.addCertificate | Install client SSL cert for session |
| Page.inspect | Get DevTools debug URL for live session |
// CDP session pattern for custom functions (Puppeteer API shown;
// with Playwright use: await page.context().newCDPSession(page))
const client = await page.target().createCDPSession();
// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });
// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
latitude: 37.7749,
longitude: -122.4194,
distance: 10,
strict: true
});
// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });
// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });
Session Rules
- One initial navigation per session — new URL = new session
- Idle timeout: 5 minutes
- Max duration: 30 minutes
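These limits can be tracked client-side so a worker opens a fresh session before the service closes the old one. The sketch below is illustrative only: SessionClock is a hypothetical class, timestamps are plain seconds supplied by the caller, and the thresholds are the 5-minute idle and 30-minute maximum values listed above.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 5 * 60    # idle timeout: 5 minutes
MAX_DURATION_S = 30 * 60   # max session duration: 30 minutes


@dataclass
class SessionClock:
    started_at: float
    last_activity: float
    navigated: bool = False  # one initial navigation per session

    def touch(self, now: float) -> None:
        """Record activity, resetting the idle timer."""
        self.last_activity = now

    def expired(self, now: float) -> bool:
        """True once either the idle or the total-duration limit is reached."""
        return (now - self.last_activity >= IDLE_TIMEOUT_S
                or now - self.started_at >= MAX_DURATION_S)

    def needs_new_session(self, now: float) -> bool:
        """A new URL needs a new session if this one already navigated or expired."""
        return self.navigated or self.expired(now)
```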
Geolocation
- Country-level: append -country-us to the credentials username
- EU-wide: append -country-eu (routes through 29+ European countries)
- Precise: use the Proxy.setLocation CDP command (before navigation)
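Country-level targeting only changes the username part of the credentials, which can be sketched as a string transformation on BROWSER_AUTH. The helper names with_country and cdp_endpoint are hypothetical; the credential layout and the endpoint are the ones shown earlier in this section.

```python
def with_country(auth: str, country: str) -> str:
    """Append -country-XX to the username part of USER:PASSWORD credentials."""
    username, sep, password = auth.partition(":")  # split on the first colon only
    if not sep or not username:
        raise ValueError("auth must look like USERNAME:PASSWORD")
    return f"{username}-country-{country.lower()}:{password}"


def cdp_endpoint(auth: str) -> str:
    """Build the Playwright/Puppeteer connection URL for the given credentials."""
    return f"wss://{auth}@brd.superproxy.io:9222"


print(cdp_endpoint(with_country("brd-customer-ID-zone-NAME:PASSWORD", "us")))
# wss://brd-customer-ID-zone-NAME-country-us:PASSWORD@brd.superproxy.io:9222
```

Passing "eu" as the country produces the EU-wide variant described above.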
Error Codes
| Code | Issue | Fix |
|---|---|---|
| 407 | Wrong port | Playwright/Puppeteer → 9222, Selenium → 9515 |
| 403 | Bad auth | Check credentials format and zone type |
| 503 | Service scaling | Wait 1 minute, reconnect |
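The error table above can be encoded so that connection code knows which failures are worth retrying. A minimal sketch with hypothetical names: only 503 is treated as retryable, after the one-minute wait the table suggests, while 407 and 403 are configuration problems that a retry would not fix.

```python
from typing import Optional

REMEDIATION = {
    407: "Wrong port: use 9222 for Playwright/Puppeteer, 9515 for Selenium",
    403: "Bad auth: check credentials format and zone type",
    503: "Service scaling: wait 1 minute, then reconnect",
}


def retry_delay_seconds(code: int) -> Optional[float]:
    """Return how long to wait before reconnecting, or None if retrying will not help."""
    return 60.0 if code == 503 else None


def explain(code: int) -> str:
    """Return the suggested fix for a known error code."""
    return REMEDIATION.get(code, f"Unlisted error code {code}")
```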
Billing: Traffic-based only. Block images/CSS/fonts to reduce costs.
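One way to block those resource types with Playwright for Python is a route handler that aborts matching requests. The sketch below keeps the decision in a plain function so it can be checked without a browser; should_block and block_heavy_resources are hypothetical names, and the handler is assumed to be registered with page.route("**/*", block_heavy_resources).

```python
BLOCKED_TYPES = frozenset({"image", "stylesheet", "font"})


def should_block(resource_type: str) -> bool:
    """True for resource types that add traffic but are rarely needed for scraping."""
    return resource_type in BLOCKED_TYPES


async def block_heavy_resources(route) -> None:
    """Playwright route handler: abort images/CSS/fonts, let everything else through."""
    if should_block(route.request.resource_type):
        await route.abort()
    else:
        await route.continue_()
```

Blocking stylesheets can change how some pages render, so the set of blocked types is worth adjusting per target site.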
See references/browser-api.md for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.
Detailed References
- references/web-unlocker.md — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
- references/serp-api.md — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
- references/web-scraper-api.md — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
- references/browser-api.md — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes