SKILL.md

$2c

Core Workflow

Every browser automation follows this pattern:

Open - Navigate to URL, get @e refs for elements

Interact - Use refs to click, fill, drag, etc.

Re-snapshot - After navigation/changes, get fresh refs

Close - End session (returns video if recording)

# 1. Start session

RESULT=$(belt app run agent-browser --function open --session new --input '{

  "url": "https://example.com/login"

}')

SESSION_ID=$(echo $RESULT | jq -r '.session_id')

# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit

belt app run agent-browser --function interact --session $SESSION_ID --input '{

  "action": "fill", "ref": "@e1", "text": "user@example.com"

}'

belt app run agent-browser --function interact --session $SESSION_ID --input '{

  "action": "fill", "ref": "@e2", "text": "password123"

}'

belt app run agent-browser --function interact --session $SESSION_ID --input '{

  "action": "click", "ref": "@e3"

}'

# 3. Re-snapshot after navigation

belt app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done

belt app run agent-browser --function close --session $SESSION_ID --input '{}'

Functions

Function

Description

open

Navigate to URL, configure browser (viewport, proxy, video recording)

snapshot

Re-fetch page state with @e refs after DOM changes

interact

Perform actions using @e refs (click, fill, drag, upload, etc.)

screenshot

Take page screenshot (viewport or full page)

execute

Run JavaScript code on the page

close

Close session, returns video if recording was enabled

Interact Actions

Action

Description

Required Fields

click

Click element

ref

dblclick

Double-click element

ref

fill

Clear and type text

ref, text

type

Type text (no clear)

text

press

Press key (Enter, Tab, etc.)

text

select

Select dropdown option

ref, text

hover

Hover over element

ref

check

Check checkbox

ref

uncheck

Uncheck checkbox

ref

drag

Drag and drop

ref, target_ref

upload

Upload file(s)

ref, file_paths

scroll

Scroll page

direction (up/down/left/right), scroll_amount

back

Go back in history

wait

Wait milliseconds

wait_ms

goto

Navigate to URL

url

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"

@e2 [input type="text"] placeholder="Search"

@e3 [button] "Submit"

@e4 [select] "Choose option"

@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

Clicking links/buttons that navigate

Form submissions

Dynamic content loading

Features

Video Recording

Record browser sessions for debugging or documentation:

# Start with recording enabled (optionally show cursor indicator)

SESSION=$(belt app run agent-browser --function open --session new --input '{

  "url": "https://example.com",

  "record_video": true,

  "show_cursor": true

}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file

belt app run agent-browser --function close --session $SESSION --input '{}'

# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

belt app run agent-browser --function open --session new --input '{

  "url": "https://example.com",

  "show_cursor": true,

  "record_video": true

}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

belt app run agent-browser --function open --session new --input '{

  "url": "https://example.com",

  "proxy_url": "http://proxy.example.com:8080",

  "proxy_username": "user",

  "proxy_password": "pass"

}'

File Upload

Upload files to file inputs:

belt app run agent-browser --function interact --session $SESSION --input '{

  "action": "upload",

  "ref": "@e5",

  "file_paths": ["/path/to/file.pdf"]

}'

Drag and Drop

Drag elements to targets:

belt app run agent-browser --function interact --session $SESSION --input '{

  "action": "drag",

  "ref": "@e1",

  "target_ref": "@e2"

}'

JavaScript Execution

Run custom JavaScript:

belt app run agent-browser --function execute --session $SESSION --input '{

  "code": "document.querySelectorAll(\"h2\").length"

}'

# Returns: {"result": "5", "screenshot": <File>}

Deep-Dive Documentation

Reference

Description

references/commands.md

Full function reference with all options

references/snapshot-refs.md

Ref lifecycle, invalidation rules, troubleshooting

references/session-management.md

Session persistence, parallel sessions

references/authentication.md

references/video-recording.md

Recording workflows for debugging

references/proxy-support.md

Proxy configuration, geo-testing

Ready-to-Use Templates

Template

Description

templates/form-automation.sh

Form filling with validation

templates/authenticated-session.sh

templates/capture-workflow.sh

Content extraction with screenshots

Examples

Form Submission

SESSION=$(belt app run agent-browser --function open --session new --input '{

  "url": "https://example.com/contact"

}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'

belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'

belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'

belt app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

belt app run agent-browser --function snapshot --session $SESSION --input '{}'

belt app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

SESSION=$(belt app run agent-browser --function open --session new --input '{

  "url": "https://google.com"

}' | jq -r '.session_id')

belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'

belt app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'

belt app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

belt app run agent-browser --function snapshot --session $SESSION --input '{}'

belt app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

SESSION=$(belt app run agent-browser --function open --session new --input '{

  "url": "https://example.com",

  "record_video": true

}' | jq -r '.session_id')

# Take full page screenshot

belt app run agent-browser --function screenshot --session $SESSION --input '{

  "full_page": true

}'

# Close and get video

RESULT=$(belt app run agent-browser --function close --session $SESSION --input '{}')

echo $RESULT | jq '.video'

Sessions

Browser state persists within a session. Always:

Start with --session new on first call

Use returned session_id for subsequent calls

Close session when done

Related Skills

# Web search (for research + browse)

npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)

npx skills add inference-sh/skills@llm-models

Documentation

inference.sh Sessions - Session management

Multi-function Apps - How functions work

agent-browser

SKILL.md

Core Workflow

Functions

Interact Actions

Element Refs

Features

Video Recording

Cursor Indicator

Proxy Support

File Upload

Drag and Drop

JavaScript Execution

Deep-Dive Documentation

Ready-to-Use Templates

Examples

Form Submission

Search and Extract

Screenshot with Video

Sessions

Related Skills

Documentation

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers