SKILL.md
Browser Automation
Automate browser interactions using the browse CLI with Claude.
Setup check
Before running any browser commands, verify the CLI is available:
which browse || npm install -g browse
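Since which is not available in every shell, the same check can be sketched with the POSIX built-in command -v. The helper name ensure_cli and its two parameters are hypothetical; the package name is simply whatever the setup line above installs.

```shell
# Sketch: install a CLI only when it is missing from PATH.
# ensure_cli is a hypothetical helper, not part of the browse CLI.
ensure_cli() {
  cli_name="$1"       # command to look for, e.g. browse
  npm_package="$2"    # npm package to install if the command is missing
  if command -v "$cli_name" >/dev/null 2>&1; then
    return 0          # already installed, nothing to do
  fi
  npm install -g "$npm_package"
}

# Usage (mirrors the setup line above):
#   ensure_cli browse browse
```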
Environment Selection (Local vs Remote)
The CLI supports explicit per-command environment flags. If you do nothing, the next session defaults to Browserbase when BROWSERBASE_API_KEY is set and to local otherwise.
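As a rough illustration of the default rule, the sketch below predicts which mode a new session would resolve to when no flag is passed. The function predict_default_mode is a hypothetical helper, and it assumes an empty BROWSERBASE_API_KEY counts as unset; the CLI itself may treat that case differently.

```shell
# Sketch: mirror the documented default (Browserbase when the key is set,
# local otherwise). Assumption: an empty value is treated as "not set".
predict_default_mode() {
  if [ -n "${BROWSERBASE_API_KEY:-}" ]; then
    echo "remote"
  else
    echo "local"
  fi
}

# Usage:
#   echo "next session would default to: $(predict_default_mode)"
```

While a daemon is running, browse status remains the authoritative way to see the resolved mode.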
Local mode
- browse open <url> --local starts a clean, isolated local browser
- browse open <url> --auto-connect attaches to an already-running debuggable Chrome; use --local when no debuggable Chrome is available
- browse open <url> --cdp <port|url> attaches to a specific CDP target
- Best for: development, localhost, trusted sites, and reproducible runs
Remote mode (Browserbase)
- browse open <url> --remote starts a Browserbase session
- Without a local flag, Browserbase is also the default when BROWSERBASE_API_KEY is set
- Provides: Browserbase Identity, Verified browsers, automatic CAPTCHA solving, residential proxies, session persistence
- Use remote mode when: the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
- Get credentials at https://browserbase.com/settings
When to choose which
- Repeatable local testing / clean state: browse open <url> --local
- Reuse your local login/cookies: browse open <url> --auto-connect
- Simple browsing (docs, wikis, public APIs): local mode is fine
- Protected sites (login walls, CAPTCHAs, anti-scraping): use remote mode
- If local mode fails with bot detection or access denied: switch to remote mode
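The choices above can be captured as a small lookup from what you need to the flag to pass. This is only a sketch: pick_mode_flag and the need names clean, cookies, and protected are hypothetical labels for the cases in the list, not CLI options.

```shell
# Sketch: map a need to the environment flag described above.
# The need names are illustrative labels, not part of the browse CLI.
pick_mode_flag() {
  case "$1" in
    clean)     echo "--local" ;;         # repeatable runs, clean state
    cookies)   echo "--auto-connect" ;;  # reuse local login/cookies
    protected) echo "--remote" ;;        # bot detection, CAPTCHAs, 403s
    *)         echo "unknown need: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   browse open https://example.com "$(pick_mode_flag clean)"
```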
Commands
Most driver commands work across local, remote, and CDP sessions after the daemon starts.
Navigation
browse open <url> # Go to URL
browse open <url> --local # Go to URL in a clean local browser
browse open <url> --remote # Go to URL in a Browserbase session
browse reload # Reload current page
browse back # Go back in history
browse forward # Go forward in history
Page state (prefer snapshot over screenshot)
browse snapshot # Get accessibility tree with element refs (fast, structured)
browse screenshot --path <path> # Take visual screenshot (slow, uses vision tokens)
browse get url # Get current URL
browse get title # Get page title
browse get text <selector> # Get text content (use "body" for all text)
browse get html <selector> # Get HTML content of element
browse get value <selector> # Get form field value
Use browse snapshot as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use browse screenshot when you need visual context (layout, images, debugging).
Interaction
browse click <ref> # Click element by ref from snapshot (e.g., @0-5)
browse type <text> # Type text into focused element
browse fill <selector> <value> # Fill input; add --press-enter if Enter is needed
browse select <selector> <values...> # Select dropdown option(s)
browse press <key> # Press key (Enter, Tab, Escape, Cmd+A, etc.)
browse mouse drag <fromX> <fromY> <toX> <toY> # Drag from one point to another
browse mouse scroll <x> <y> <deltaX> <deltaY> # Scroll at coordinates
browse highlight <selector> # Highlight element on page
browse is visible <selector> # Check if element is visible
browse is checked <selector> # Check if element is checked
browse wait <type> [arg] # Wait for: load, selector, timeout
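A common combination of these commands is filling a field, submitting it, and waiting for the next page. The sketch below chains fill, wait, and get url; submit_field is a hypothetical wrapper, and it assumes that browse wait load takes no further argument.

```shell
# Sketch: fill an input, press Enter, wait for the page to load,
# then print the resulting URL. submit_field is a hypothetical helper.
submit_field() {
  field_selector="$1"   # selector of the input to fill
  field_value="$2"      # value to enter
  browse fill "$field_selector" "$field_value" --press-enter || return 1
  browse wait load || return 1   # assumption: "load" needs no argument
  browse get url
}

# Usage (selector and value are placeholders):
#   submit_field "#search" "accessibility tree"
```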
Session management
browse stop # Stop the browser daemon
browse status # Check daemon status and resolved mode
browse tab list # List all open tabs
browse tab switch <index-or-target-id> # Switch to tab by index or target ID
browse tab close [index-or-target-id] # Close tab
Typical workflow
If the environment matters, put --local, --remote, --auto-connect, or --cdp <port|url> on the first browser command.
1. browse open <url> --local or browse open <url> --remote: navigate to the page
2. browse snapshot: read the accessibility tree to understand page structure and get element refs
3. browse click <ref> / browse type <text> / browse fill <selector> <value>: interact using refs from the snapshot
4. browse snapshot: confirm the action worked
5. Repeat 3-4 as needed
6. browse stop: close the browser when done
Quick Example
browse open https://example.com
browse snapshot # see page structure + element refs
browse click @0-5 # click element with ref 0-5
browse get title
browse stop
Mode Comparison
| Feature | Local | Browserbase |
| --- | --- | --- |
| Speed | Faster | Slightly slower |
| Setup | Chrome required | API key required |
| Reuse existing local cookies | With browse open <url> --auto-connect | N/A |
| Verified browser | No | Yes (Browserbase Verified browser via Identity) |
| CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) |
| Residential proxies | No | Yes (201 countries, geo-targeting) |
| Session persistence | No | Yes (cookies/auth persist via contexts) |
| Best for | Development/simple pages | Protected sites, Browserbase Identity + Verified access, production scraping |
Best Practices
- Choose the environment deliberately: use browse open <url> --local for clean state, browse open <url> --auto-connect for existing local credentials, and browse open <url> --remote for protected sites
- **Always browse open first** before interacting
- **Use browse snapshot** to check page state: it's fast and gives you element refs
- Only screenshot when visual context is needed (layout checks, images, debugging)
- Use refs from snapshot to click/interact, e.g., browse click @0-5
- **browse stop** when done to clean up the browser session and clear the env override
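One way to make sure browse stop always runs, even when a step in the middle fails, is to wrap the session in a helper. The sketch below is a minimal version under that assumption; with_browser_session and the command passed to it are hypothetical, and only browse open and browse stop come from the command list.

```shell
# Sketch: open a page, run the caller's command, and always stop the daemon.
# with_browser_session is a hypothetical wrapper, not part of the browse CLI.
with_browser_session() {
  session_url="$1"
  shift
  if ! browse open "$session_url"; then
    browse stop       # clean up a possibly half-started session
    return 1
  fi
  "$@"                # run whatever the caller passed after the URL
  session_rc=$?
  browse stop         # clean up even if the command above failed
  return "$session_rc"
}

# Usage:
#   with_browser_session https://example.com browse get title
```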
Troubleshooting
- "No active page": Run browse stop, then check browse status. If it still says running, kill the zombie daemon with pkill -f "browse.*daemon", then retry browse open
- Chrome not found: Install Chrome, use browse open <url> --auto-connect if you already have a debuggable Chrome running, or switch to browse open <url> --remote
- Action fails: Run browse snapshot to see available elements and their refs
- Browserbase fails: Verify that BROWSERBASE_API_KEY is set
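The "No active page" recovery steps can be sketched as a single helper. The exact output of browse status is not shown in this document, so the check below assumes it prints the word "running" while a daemon is alive and "not running" otherwise; recover_daemon is a hypothetical name.

```shell
# Sketch: stop the daemon and kill it if status still reports it as running.
# Assumption: `browse status` output contains "running" for a live daemon
# and "not running" for a stopped one; adjust the grep to the real output.
recover_daemon() {
  browse stop >/dev/null 2>&1
  if browse status 2>/dev/null | grep -i "running" | grep -qvi "not running"; then
    pkill -f "browse.*daemon"   # zombie daemon: kill it as described above
  fi
  return 0
}

# Usage:
#   recover_daemon && browse open https://example.com
```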
Switching to Remote Mode
Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.
Don't switch for simple sites (docs, wikis, public APIs, localhost).
browse open <url> --local # clean isolated local browser
browse open <url> --auto-connect # attach to existing debuggable Chrome
browse open <url> --remote # Browserbase session
Mode flags are applied when a session starts. After browse stop, the next start falls back to env-var-based auto detection. Use browse status to inspect the resolved mode and target while the daemon is running.
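The switching signals listed in this section can be approximated with a text check. In the sketch below, looks_blocked and retry_remote_if_blocked are hypothetical helpers, the grep patterns are rough heuristics rather than anything the CLI provides, and the HTTP status is passed in by the caller because the command list does not show a way to read it.

```shell
# Sketch: return 0 (true) when a status code or page text matches the
# signals above. Pass an empty string when the HTTP status is unknown.
looks_blocked() {
  http_status="$1"
  page_text="$2"
  case "$http_status" in
    403|429) return 0 ;;
  esac
  if [ -z "$page_text" ]; then
    return 0   # empty page on a site that should have content
  fi
  if printf '%s' "$page_text" | grep -qiE 'captcha|turnstile|checking your browser'; then
    return 0
  fi
  return 1
}

# Because mode flags apply at session start, stop first, then reopen remotely.
retry_remote_if_blocked() {
  target_url="$1"
  body_text="$(browse get text body)"
  if looks_blocked "" "$body_text"; then
    browse stop
    browse open "$target_url" --remote
  fi
}
```

Treating an empty page as blocked is deliberately aggressive; for sites that can legitimately render no text, drop that branch.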
For detailed examples, see EXAMPLES.md.
For API reference, see REFERENCE.md.