bright-data-best-practices

Build production-ready Bright Data integrations with best practices baked in. Reference documentation for developers using coding assistants (Claude Code,…

INSTALLATION
npx skills add https://github.com/brightdata/skills --skill bright-data-best-practices
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

CLI Setup Reference

Install, authentication, and troubleshooting for the Bright Data CLI (bdata) are documented in a single canonical place:

references/cli-setup.md

Consult it before any task that shells out to bdata.

Bright Data APIs

Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.

Choosing the Right API

| Use Case | API | Why |
| --- | --- | --- |
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |

Authentication Pattern (All APIs)

All APIs share the same authentication model. The env vars below apply to direct REST API integrations — if you are using the bdata CLI, bdata login handles all of these automatically (see references/cli-setup.md).

export BRIGHTDATA_API_KEY="your-api-key"                  # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name"               # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name"              # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD"  # Browser API credentials

REST API authentication header for Web Unlocker and SERP API:

Authorization: Bearer YOUR_API_KEY
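
A minimal sketch of building that header from the environment variable defined above. The helper name `auth_headers` is illustrative and not part of any Bright Data SDK; only the header format and the variable name come from this document.

```python
import os


def auth_headers(env=os.environ):
    """Build the Bearer header used by Web Unlocker and SERP API requests.

    `auth_headers` is a hypothetical helper; `env` is injectable so the
    function can be exercised without touching the real environment.
    """
    api_key = env.get("BRIGHTDATA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("BRIGHTDATA_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early on a missing key tends to give a clearer error than an authentication failure from the API later on.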

Web Unlocker API

HTTP-based scraping proxy. Best for simple page fetches without browser interaction.

Endpoint: POST https://api.brightdata.com/request

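import os
import requests

API_KEY = os.environ["BRIGHTDATA_API_KEY"]  # see Authentication Pattern

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_ZONE_NAME",
        "url": "https://example.com/product/123",
        "format": "raw",
    },
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
html = response.text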

Key Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| zone | string | Zone name (required) |
| url | string | Target URL with http:// or https:// (required) |
| format | string | "raw" (HTML) or "json" (structured wrapper) (required) |
| method | string | HTTP verb, default "GET" |
| country | string | 2-letter ISO for geo-targeting (e.g., "us", "de") |
| data_format | string | Transform: "markdown" or "screenshot" |
| async | boolean | true for async mode |

Quick Patterns

# Get markdown (best for LLM input)
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)

# The variants below change only the json= argument of the same call.

# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}

# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}

# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}

Critical rule: Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.

See references/web-unlocker.md for complete reference including proxy interface, special headers, async flow, features, and billing.
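
As a rough sketch, the request body for the patterns above could be assembled and sanity-checked in one place before it is sent. `build_unlocker_payload` is a hypothetical helper; the field names come from the Key Parameters table, while the validation rules are assumptions made for this example.

```python
from urllib.parse import urlparse


def build_unlocker_payload(zone, url, *, fmt="raw", country=None,
                           data_format=None, use_async=False):
    """Assemble a JSON body for POST https://api.brightdata.com/request.

    Hypothetical helper: checks only what the parameter table states.
    """
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must start with http:// or https://")
    if fmt not in ("raw", "json"):
        raise ValueError('format must be "raw" or "json"')
    if data_format not in (None, "markdown", "screenshot"):
        raise ValueError('data_format must be "markdown" or "screenshot"')

    payload = {"zone": zone, "url": url, "format": fmt}
    if country:
        payload["country"] = country.lower()  # 2-letter ISO code, e.g. "de"
    if data_format:
        payload["data_format"] = data_format
    if use_async:
        payload["async"] = True
    return payload
```

The result can be passed directly as the `json=` argument of `requests.post`.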

SERP API

Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.

Endpoint: POST https://api.brightdata.com/request (same as Web Unlocker)

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_SERP_ZONE",
        "url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
        "format": "raw"
    }
)
data = response.json()

for result in data.get("organic", []):
    print(result["rank"], result["title"], result["link"])

Essential Google URL Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| q | Search query | q=python+web+scraping |
| brd_json | Parsed JSON output | brd_json=1 (always use for data pipelines) |
| gl | Country for search | gl=us |
| hl | Language | hl=en |
| start | Pagination offset | start=10 (page 2), start=20 (page 3) |
| tbm | Search type | tbm=nws (news), tbm=isch (images), tbm=vid (videos) |
| brd_mobile | Device | brd_mobile=1 (mobile), brd_mobile=ios |
| brd_browser | Browser | brd_browser=chrome |
| brd_ai_overview | Trigger AI Overview | brd_ai_overview=2 |
| uule | Encoded geo location | for precise location targeting |

Note: num parameter is deprecated as of September 2025. Use start for pagination.
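
One possible way to assemble these query parameters, including the `start` offset for pagination, is sketched below. `google_serp_url` is a hypothetical helper; the parameter names follow the table above, and the defaults are assumptions.

```python
from urllib.parse import urlencode


def google_serp_url(query, *, page=1, gl="us", hl="en", tbm=None):
    """Build a Google search URL for the SERP API "url" field.

    Hypothetical helper. Page 1 omits start, page 2 uses start=10,
    page 3 uses start=20, matching the table above.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {"q": query, "brd_json": 1, "gl": gl, "hl": hl}
    if page > 1:
        params["start"] = (page - 1) * 10
    if tbm:
        params["tbm"] = tbm  # e.g. "nws" for news results
    return "https://www.google.com/search?" + urlencode(params)
```

Using `urlencode` rather than string concatenation keeps spaces and special characters in the query correctly escaped.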

Parsed JSON Response Structure

{
  "organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
  "paid": [],
  "people_also_ask": [],
  "knowledge_graph": {},
  "related_searches": [],
  "general": {"results_cnt": 1240000000, "query": "..."}
}
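
A small sketch of reading that structure defensively, assuming only the key names shown above. `top_organic` is a hypothetical helper, and skipping entries without a link is a choice made for this example.

```python
def top_organic(data, limit=5):
    """Return (rank, title, link) tuples from a parsed SERP response.

    Hypothetical helper; tolerates a missing "organic" key and
    entries that lack a link.
    """
    rows = []
    for item in data.get("organic", []):
        if not item.get("link"):
            continue
        rows.append((item.get("rank"), item.get("title", ""), item["link"]))
    # Sort by rank, placing entries without a rank last.
    rows.sort(key=lambda row: (row[0] is None, row[0] or 0))
    return rows[:limit]
```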

Bing Key Parameters

| Parameter | Description |
| --- | --- |
| q | Search query |
| setLang | Language (prefer 4-letter: en-US) |
| cc | Country code |
| first | Pagination (increment by 10: 1, 11, 21...) |
| safesearch | off, moderate, strict |
| brd_mobile | Device type |
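
Bing's `first` parameter is a 1-based result index rather than a 0-based offset, which is easy to get wrong. The sketch below shows the arithmetic; `bing_serp_url` is a hypothetical helper and its defaults are assumptions.

```python
from urllib.parse import urlencode


def bing_serp_url(query, *, page=1, cc="us", set_lang="en-US", safesearch=None):
    """Build a Bing search URL using the parameters listed above.

    Hypothetical helper: page 1 -> first=1, page 2 -> first=11,
    page 3 -> first=21.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {
        "q": query,
        "cc": cc,
        "setLang": set_lang,
        "first": (page - 1) * 10 + 1,
    }
    if safesearch:
        params["safesearch"] = safesearch  # off, moderate, or strict
    return "https://www.bing.com/search?" + urlencode(params)
```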

Async for Bulk SERP

# Submit
response = requests.post(
    "https://api.brightdata.com/request",
    params={"async": "1"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")

# Retrieve (retrieve calls are NOT billed)
result = requests.get(
    "https://api.brightdata.com/serp/get_result",
    params={"response_id": response_id},
    headers={"Authorization": f"Bearer {API_KEY}"}
)

Billing: Pay per 1,000 successful requests only. Async retrieve calls are not billed.

See references/serp-api.md for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.

Web Scraper API

Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.

Sync Endpoint: POST https://api.brightdata.com/datasets/v3/scrape

Async Endpoint: POST https://api.brightdata.com/datasets/v3/trigger

# Sync (up to 20 URLs, returns immediately)
response = requests.post(
    "https://api.brightdata.com/datasets/v3/scrape",
    params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)

if response.status_code == 200:
    data = response.json()  # Results ready
elif response.status_code == 202:
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | string | Scraper identifier from the Scraper Library (required) |
| format | string | json (default), ndjson, jsonl, csv |
| custom_output_fields | string | Pipe-separated fields: url\|title\|price |
| include_errors | boolean | Include error info in results |
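
The query parameters above could be assembled along these lines. `scraper_params` is a hypothetical helper; the parameter names follow the table, but serializing the boolean as the string "true" is an assumption about what the API accepts.

```python
def scraper_params(dataset_id, *, fmt="json", fields=None, include_errors=False):
    """Build query-string parameters for the Web Scraper API endpoints.

    Hypothetical helper; pass the result as params= to requests.post.
    """
    if fmt not in ("json", "ndjson", "jsonl", "csv"):
        raise ValueError(f"unsupported format: {fmt}")
    params = {"dataset_id": dataset_id, "format": fmt}
    if fields:
        # custom_output_fields is pipe-separated, e.g. url|title|price
        params["custom_output_fields"] = "|".join(fields)
    if include_errors:
        params["include_errors"] = "true"  # assumed serialization
    return params
```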

Request Body

{
  "input": [
    { "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
    { "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
  ]
}
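
Because the sync example notes a limit of up to 20 URLs per call, a longer URL list needs to be split into several request bodies. A minimal sketch follows; `batch_inputs` is a hypothetical helper, and dropping duplicate URLs is a choice made here rather than documented API behavior.

```python
def batch_inputs(urls, batch_size=20):
    """Yield request bodies of the form {"input": [{"url": ...}, ...]}.

    Hypothetical helper: removes duplicate URLs (keeping first-seen
    order) and emits at most batch_size inputs per body.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    unique = list(dict.fromkeys(urls))
    for i in range(0, len(unique), batch_size):
        yield {"input": [{"url": u} for u in unique[i:i + batch_size]]}
```

Each yielded body can be sent as the `json=` argument of a separate request. For large jobs the async trigger endpoint may be the better fit.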

Poll for Async Results

import time

# Trigger
snapshot_id = requests.post(
    "https://api.brightdata.com/datasets/v3/trigger",
    params={"dataset_id": DATASET_ID, "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]

# Poll every 10 seconds until the snapshot is ready or has failed
while True:
    status = requests.get(
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()["status"]
    if status == "ready":
        break
    if status == "failed":
        raise RuntimeError(f"Snapshot {snapshot_id} failed")
    time.sleep(10)

# Download
data = requests.get(
    f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
    params={"format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"}
).json()

Progress status values: starting → running → ready | failed

Data retention: 30 days.

Billing: Per delivered record. Invalid input URLs that fail are still billable.

See references/web-scraper-api.md for complete reference including scraper types, output formats, delivery options, and billing details.
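
The polling loop shown earlier runs until the job is ready or failed, so a stuck job would block forever. One way to bound it is sketched below. `wait_until_ready` is a hypothetical helper, and the timeout and interval defaults are assumptions, not documented limits.

```python
import time


def wait_until_ready(get_status, *, timeout=600.0, interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "ready", with an overall deadline.

    Hypothetical helper: get_status is any zero-argument callable that
    returns the progress status string, for example a wrapper around the
    /datasets/v3/progress/{snapshot_id} request.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError("snapshot failed")
        if clock() >= deadline:
            raise TimeoutError(f"snapshot still {status!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the function testable without real waiting.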

Browser API (Scraping Browser)

Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.

Connection:

  • Playwright/Puppeteer: wss://${AUTH}@brd.superproxy.io:9222
  • Selenium: https://${AUTH}@brd.superproxy.io:9515

Node.js (Playwright):

const { chromium } = require("playwright-core");

const AUTH = process.env.BROWSER_AUTH;

// CommonJS has no top-level await, so run inside an async function.
(async () => {
  const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
  try {
    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes
    await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
    const html = await page.content();
  } finally {
    await browser.close();
  }
})();

Python (Playwright):

import asyncio
import os

from playwright.async_api import async_playwright

AUTH = os.environ["BROWSER_AUTH"]

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
        page = await browser.new_page()
        page.set_default_navigation_timeout(120000)  # Always set to 2 minutes
        await page.goto("https://example.com", wait_until="domcontentloaded")
        html = await page.content()
        await browser.close()
        return html

asyncio.run(main())

Custom CDP Functions

| Function | Purpose |
| --- | --- |
| Captcha.solve | Manually trigger CAPTCHA solving |
| Captcha.setAutoSolve | Enable/disable auto CAPTCHA solving |
| Proxy.setLocation | Set precise geo location (call BEFORE goto) |
| Proxy.useSession | Maintain same IP across sessions |
| Emulation.setDevice | Apply device profile (iPhone 14, etc.) |
| Emulation.getSupportedDevices | List available device profiles |
| Unblocker.enableAdBlock | Block ads to save bandwidth |
| Unblocker.disableAdBlock | Re-enable ads |
| Input.type | Fast text input for bulk form filling |
| Browser.addCertificate | Install client SSL cert for session |
| Page.inspect | Get DevTools debug URL for live session |

// CDP session pattern for custom functions (Puppeteer syntax).
// With Playwright, use: const client = await page.context().newCDPSession(page);
const client = await page.target().createCDPSession();

// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });

// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
  latitude: 37.7749,
  longitude: -122.4194,
  distance: 10,
  strict: true
});

// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });

// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });

Session Rules

  • One initial navigation per session — new URL = new session
  • Idle timeout: 5 minutes
  • Max duration: 30 minutes

Geolocation

  • Country-level: append -country-us to credentials username
  • EU-wide: append -country-eu (routes through 29+ European countries)
  • Precise: use Proxy.setLocation CDP command (before navigation)
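
Country-level targeting changes only the username half of the credentials. The sketch below shows one way to apply the suffix, assuming the `USER:PASSWORD` format of `BROWSER_AUTH` shown earlier. Both function names are hypothetical.

```python
def with_country(auth, country):
    """Append -country-<code> to the username part of USER:PASSWORD.

    Hypothetical helper; assumes no country suffix is present yet.
    Pass "eu" for the EU-wide option.
    """
    username, sep, password = auth.partition(":")
    if not sep or not username or not password:
        raise ValueError("expected credentials in USER:PASSWORD form")
    if "-country-" in username:
        raise ValueError("credentials already carry a country suffix")
    return f"{username}-country-{country.lower()}:{password}"


def cdp_endpoint(auth):
    """Playwright/Puppeteer connection string on port 9222."""
    return f"wss://{auth}@brd.superproxy.io:9222"
```

For example, `cdp_endpoint(with_country(AUTH, "us"))` yields the connection string to pass to `connect_over_cdp`.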

Error Codes

| Code | Issue | Fix |
| --- | --- | --- |
| 407 | Wrong port | Playwright/Puppeteer → 9222, Selenium → 9515 |
| 403 | Bad auth | Check credentials format and zone type |
| 503 | Service scaling | Wait 1 minute, reconnect |

Billing: Traffic-based only. Block images/CSS/fonts to reduce costs.
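
With Playwright for Python, one way to block those resource types is a route handler, sketched below. The handler name and the set of blocked types are choices made for this example, and it assumes request interception behaves over the remote CDP connection as it does locally.

```python
BLOCKED_RESOURCE_TYPES = {"image", "stylesheet", "font"}


async def block_heavy_resources(route):
    """Playwright route handler: abort images, CSS, and fonts; pass the rest.

    Hypothetical handler name. Note that blocking stylesheets can change
    how some pages render.
    """
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        await route.abort()
    else:
        await route.continue_()
```

Register it before navigating, for example `await page.route("**/*", block_heavy_resources)`.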

See references/browser-api.md for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.

Detailed References

  • references/web-unlocker.md — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
  • references/serp-api.md — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
  • references/browser-api.md — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes