agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking…

INSTALLATION
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill agent-browser
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$29

agent-browser fill @e1 "user@example.com"

agent-browser fill @e2 "password123"

agent-browser click @e3

agent-browser wait --load networkidle

agent-browser snapshot -i # Check result

## Command Chaining

Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

Chain open + wait + snapshot in one call

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

Chain multiple interactions

agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

Navigate and capture

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png


**When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).

## Handling Authentication

When automating a site that requires login, choose the approach that fits:

**Option 1: Import auth from the user's browser (fastest for one-off tasks)**

Connect to the user's running Chrome (they're already logged in)

agent-browser --auto-connect state save ./auth.json

Use that auth state

agent-browser --state ./auth.json open https://app.example.com/dashboard


State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.

**Option 2: Persistent profile (simplest for recurring tasks)**

First run: login manually or via automation

agent-browser --profile ~/.myapp open https://app.example.com/login

... fill credentials, submit ...

All future runs: already authenticated

agent-browser --profile ~/.myapp open https://app.example.com/dashboard


**Option 3: Session name (auto-save/restore cookies + localStorage)**

agent-browser --session-name myapp open https://app.example.com/login

... login flow ...

agent-browser close # State auto-saved

Next time: state auto-restored

agent-browser --session-name myapp open https://app.example.com/dashboard


**Option 4: Auth vault (credentials stored encrypted, login by name)**

echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin

agent-browser auth login myapp


`auth login` navigates with `load` and then waits for login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.

**Option 5: State file (manual save/load)**

After logging in:

agent-browser state save ./auth.json

In a future session:

agent-browser state load ./auth.json

agent-browser open https://app.example.com/dashboard


See `references/authentication.md` for OAuth, 2FA, cookie-based auth, and token refresh patterns.

## Essential Commands

Navigation

agent-browser open <url> # Navigate (aliases: goto, navigate)

agent-browser close # Close browser

Snapshot

agent-browser snapshot -i # Interactive elements with refs (recommended)

agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)

agent-browser snapshot -s "#selector" # Scope to CSS selector

Interaction (use @refs from snapshot)

agent-browser click @e1 # Click element

agent-browser click @e1 --new-tab # Click and open in new tab

agent-browser fill @e2 "text" # Clear and type text

agent-browser type @e2 "text" # Type without clearing

agent-browser select @e1 "option" # Select dropdown option

agent-browser check @e1 # Check checkbox

agent-browser press Enter # Press key

agent-browser keyboard type "text" # Type at current focus (no selector)

agent-browser keyboard inserttext "text" # Insert without key events

agent-browser scroll down 500 # Scroll page

agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container

Get information

agent-browser get text @e1 # Get element text

agent-browser get url # Get current URL

agent-browser get title # Get page title

agent-browser get cdp-url # Get CDP WebSocket URL

Wait

agent-browser wait @e1 # Wait for element

agent-browser wait --load networkidle # Wait for network idle

agent-browser wait --url "**/page" # Wait for URL pattern

agent-browser wait 2000 # Wait milliseconds

agent-browser wait --text "Welcome" # Wait for text to appear (substring match)

agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear

agent-browser wait "#spinner" --state hidden # Wait for element to disappear

Downloads

agent-browser download @e1 ./file.pdf # Click element to trigger download

agent-browser wait --download ./output.zip # Wait for any download to complete

agent-browser --download-path ./downloads open <url> # Set default download directory

Network

agent-browser network requests # Inspect tracked requests

agent-browser network route "*/api/" --abort # Block matching requests

agent-browser network har start # Start HAR recording

agent-browser network har stop ./capture.har # Stop and save HAR file

Viewport &#x26; Device Emulation

agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)

agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)

agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)

Capture

agent-browser screenshot # Screenshot to temp dir

agent-browser screenshot --full # Full page screenshot

agent-browser screenshot --annotate # Annotated screenshot with numbered element labels

agent-browser screenshot --screenshot-dir ./shots # Save to custom directory

agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80

agent-browser pdf output.pdf # Save as PDF

Clipboard

agent-browser clipboard read # Read text from clipboard

agent-browser clipboard write "Hello, World!" # Write text to clipboard

agent-browser clipboard copy # Copy current selection

agent-browser clipboard paste # Paste from clipboard

Diff (compare page states)

agent-browser diff snapshot # Compare current vs last snapshot

agent-browser diff snapshot --baseline before.txt # Compare current vs saved file

agent-browser diff screenshot --baseline before.png # Visual pixel diff

agent-browser diff url <url1> <url2> # Compare two pages

agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy

agent-browser diff url <url1> <url2> --selector "#main" # Scope to element


## Batch Execution

Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.

echo '[

["open", "https://example.com"],

["snapshot", "-i"],

["click", "@e1"],

["screenshot", "result.png"]

]' | agent-browser batch --json

Stop on first error

agent-browser batch --bail < commands.json


Use `batch` when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or `&#x26;&#x26;` chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).

## Common Patterns

### Form Submission

agent-browser open https://example.com/signup

agent-browser snapshot -i

agent-browser fill @e1 "Jane Doe"

agent-browser fill @e2 "jane@example.com"

agent-browser select @e3 "California"

agent-browser check @e4

agent-browser click @e5

agent-browser wait --load networkidle


### Authentication with Auth Vault (Recommended)

Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)

Recommended: pipe password via stdin to avoid shell history exposure

echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin

Login using saved profile (LLM never sees password)

agent-browser auth login github

List/show/delete profiles

agent-browser auth list

agent-browser auth show github

agent-browser auth delete github


`auth login` waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.

### Authentication with State Persistence

Login once and save state

agent-browser open https://app.example.com/login

agent-browser snapshot -i

agent-browser fill @e1 "$USERNAME"

agent-browser fill @e2 "$PASSWORD"

agent-browser click @e3

agent-browser wait --url "**/dashboard"

agent-browser state save auth.json

Reuse in future sessions

agent-browser state load auth.json

agent-browser open https://app.example.com/dashboard


### Session Persistence

Auto-save/restore cookies and localStorage across browser restarts

agent-browser --session-name myapp open https://app.example.com/login

... login flow ...

agent-browser close # State auto-saved to ~/.agent-browser/sessions/

Next time, state is auto-loaded

agent-browser --session-name myapp open https://app.example.com/dashboard

Encrypt state at rest

export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)

agent-browser --session-name secure open https://app.example.com

Manage saved states

agent-browser state list

agent-browser state show myapp-default.json

agent-browser state clear myapp

agent-browser state clean --older-than 7


### Working with Iframes

Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.

agent-browser open https://example.com/checkout

agent-browser snapshot -i

@e1 [heading] "Checkout"

@e2 [Iframe] "payment-frame"

@e3 [input] "Card number"

@e4 [input] "Expiry"

@e5 [button] "Pay"

Interact directly — no frame switch needed

agent-browser fill @e3 "4111111111111111"

agent-browser fill @e4 "12/28"

agent-browser click @e5

To scope a snapshot to one iframe:

agent-browser frame @e2

agent-browser snapshot -i # Only iframe content

agent-browser frame main # Return to main frame


### Data Extraction

agent-browser open https://example.com/products

agent-browser snapshot -i

agent-browser get text @e5 # Get specific element text

agent-browser get text body > page.txt # Get all page text

JSON output for parsing

agent-browser snapshot -i --json

agent-browser get text @e1 --json


### Parallel Sessions

agent-browser --session site1 open https://site-a.com

agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i

agent-browser --session site2 snapshot -i

agent-browser session list


### Connect to Existing Chrome

Auto-discover running Chrome with remote debugging enabled

agent-browser --auto-connect open https://example.com

agent-browser --auto-connect snapshot

Or with explicit CDP port

agent-browser --cdp 9222 snapshot


Auto-connect discovers Chrome via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.

### Color Scheme (Dark Mode)

Persistent dark mode via flag (applies to all pages and new tabs)

agent-browser --color-scheme dark open https://example.com

Or via environment variable

AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com

Or set during session (persists for subsequent commands)

agent-browser set media dark


### Viewport &#x26; Responsive Testing

Set a custom viewport size (default is 1280x720)

agent-browser set viewport 1920 1080

agent-browser screenshot desktop.png

Test mobile-width layout

agent-browser set viewport 375 812

agent-browser screenshot mobile.png

Retina/HiDPI: same CSS layout at 2x pixel density

Screenshots stay at logical viewport size, but content renders at higher DPI

agent-browser set viewport 1920 1080 2

agent-browser screenshot retina.png

Device emulation (sets viewport + user agent in one step)

agent-browser set device "iPhone 14"

agent-browser screenshot device.png


The `scale` parameter (3rd argument) sets `window.devicePixelRatio` without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.

### Visual Browser (Debugging)

agent-browser --headed open https://example.com

agent-browser highlight @e1 # Highlight element

agent-browser inspect # Open Chrome DevTools for the active page

agent-browser record start demo.webm # Record session

agent-browser profiler start # Start Chrome DevTools profiling

agent-browser profiler stop trace.json # Stop and save profile (path optional)


Use `AGENT_BROWSER_HEADED=1` to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.

### Local Files (PDFs, HTML)

Open local files with file:// URLs

agent-browser --allow-file-access open file:///path/to/document.pdf

agent-browser --allow-file-access open file:///path/to/page.html

agent-browser screenshot output.png


### iOS Simulator (Mobile Safari)

List available iOS simulators

agent-browser device list

Launch Safari on a specific device

agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

Same workflow as desktop - snapshot, interact, re-snapshot

agent-browser -p ios snapshot -i

agent-browser -p ios tap @e1 # Tap (alias for click)

agent-browser -p ios fill @e2 "text"

agent-browser -p ios swipe up # Mobile-specific gesture

Take screenshot

agent-browser -p ios screenshot mobile.png

Close session (shuts down simulator)

agent-browser -p ios close


**Requirements:** macOS with Xcode, Appium (`npm install -g appium &#x26;&#x26; appium driver install xcuitest`)

**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.

## Security

All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.

### Content Boundaries (Recommended for AI Agents)

Enable `--content-boundaries` to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:

export AGENT_BROWSER_CONTENT_BOUNDARIES=1

agent-browser snapshot

Output:

--- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---

[accessibility tree]

--- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---


### Domain Allowlist

Restrict navigation to trusted domains. Wildcards like `*.example.com` also match the bare domain `example.com`. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:

export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"

agent-browser open https://example.com # OK

agent-browser open https://malicious.com # Blocked


### Action Policy

Use a policy file to gate destructive actions:

export AGENT_BROWSER_ACTION_POLICY=./policy.json


Example `policy.json`:

{ "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }


Auth vault operations (`auth login`, etc.) bypass action policy but domain allowlist still applies.

### Output Limits

Prevent context flooding from large pages:

export AGENT_BROWSER_MAX_OUTPUT=50000


## Diffing (Verifying Changes)

Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.

Typical workflow: snapshot -> action -> diff

agent-browser snapshot -i # Take baseline snapshot

agent-browser click @e2 # Perform action

agent-browser diff snapshot # See what changed (auto-compares to last snapshot)


For visual regression testing or monitoring:

Save a baseline screenshot, then compare later

agent-browser screenshot baseline.png

... time passes or changes are made ...

agent-browser diff screenshot --baseline baseline.png

Compare staging vs production

agent-browser diff url https://staging.example.com https://prod.example.com --screenshot


`diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.

## Timeouts and Slow Pages

The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:

Wait for network activity to settle (best for slow pages)

agent-browser wait --load networkidle

Wait for a specific element to appear

agent-browser wait "#content"

agent-browser wait @e1

Wait for a specific URL pattern (useful after redirects)

agent-browser wait --url "**/dashboard"

Wait for a JavaScript condition

agent-browser wait --fn "document.readyState === 'complete'"

Wait a fixed duration (milliseconds) as a last resort

agent-browser wait 5000


When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.

## Session Management and Cleanup

When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:

Each agent gets its own isolated session

agent-browser --session agent1 open site-a.com

agent-browser --session agent2 open site-b.com

Check active sessions

agent-browser session list


Always close your browser session when done to avoid leaked processes:

agent-browser close # Close default session

agent-browser --session agent1 close # Close specific session


If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.

To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):

AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com


## Ref Lifecycle (Important)

Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:

- Clicking links or buttons that navigate

- Form submissions

- Dynamic content loading (dropdowns, modals)

agent-browser click @e5 # Navigates to new page

agent-browser snapshot -i # MUST re-snapshot

agent-browser click @e1 # Use new refs


## Annotated Screenshots (Vision Mode)

Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.

agent-browser screenshot --annotate

Output includes the image path and a legend:

[1] @e1 button "Submit"

[2] @e2 link "Home"

[3] @e3 textbox "Email"

agent-browser click @e2 # Click using ref from annotated screenshot


Use annotated screenshots when:

- The page has unlabeled icon buttons or visual-only elements

- You need to verify visual layout or styling

- Canvas or chart elements are present (invisible to text snapshots)

- You need spatial reasoning about element positions

## Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

agent-browser find text "Sign In" click

agent-browser find label "Email" fill "user@test.com"

agent-browser find role button click --name "Submit"

agent-browser find placeholder "Search" type "query"

agent-browser find testid "submit-btn" click


## JavaScript Evaluation (eval)

Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.

Simple expressions work with regular quoting

agent-browser eval 'document.title'

agent-browser eval 'document.querySelectorAll("img").length'

Complex JS: use --stdin with heredoc (RECOMMENDED)

agent-browser eval --stdin <<'EVALEOF'

JSON.stringify(

Array.from(document.querySelectorAll("img"))

.filter(i => !i.alt)

.map(i => ({ src: i.src.split("/").pop(), width: i.width }))

)

EVALEOF

Alternative: base64 encoding (avoids all shell escaping issues)

agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"


**Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.

**Rules of thumb:**

- Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine

- Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`

- Programmatic/generated scripts -> use `eval -b` with base64

## Configuration File

Create `agent-browser.json` in the project root for persistent settings:

{

"headed": true,

"proxy": "http://localhost:8080",

"profile": "./browser-data"

}


Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.

## Deep-Dive Documentation

Reference
When to Use

`references/commands.md`
Full command reference with all options

`references/snapshot-refs.md`
Ref lifecycle, invalidation rules, troubleshooting

`references/session-management.md`
Parallel sessions, state persistence, concurrent scraping

`references/authentication.md`
Login flows, OAuth, 2FA handling, state reuse

`references/video-recording.md`
Recording workflows for debugging and documentation

`references/profiling.md`
Chrome DevTools profiling for performance analysis

`references/proxy-support.md`
Proxy configuration, geo-testing, rotating proxies

## Browser Engine Selection

Use `--engine` to choose a local browser engine. The default is `chrome`.

Use Lightpanda (fast headless browser, requires separate install)

agent-browser --engine lightpanda open example.com

Via environment variable

export AGENT_BROWSER_ENGINE=lightpanda

agent-browser open example.com

With custom binary path

agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com


Supported engines:

- `chrome` (default) -- Chrome/Chromium via CDP

- `lightpanda` -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)

Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-file-access`. Install Lightpanda from [https://lightpanda.io/docs/open-source/installation](https://lightpanda.io/docs/open-source/installation).

## Ready-to-Use Templates

Template
Description

`templates/form-automation.sh`
Form filling with validation

`templates/authenticated-session.sh`
Login once, reuse state

`templates/capture-workflow.sh`
Content extraction with screenshots

./templates/form-automation.sh https://example.com/form

./templates/authenticated-session.sh https://app.example.com/login

./templates/capture-workflow.sh https://example.com ./output

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card