SKILL.md
$2c
Core Workflow
Every browser automation follows this pattern:
- Open - Navigate to URL, get
@erefs for elements
- Interact - Use refs to click, fill, drag, etc.
- Re-snapshot - After navigation/changes, get fresh refs
- Close - End session (returns video if recording)
# 1. Start session
RESULT=$(belt app run agent-browser --function open --session new --input '{
"url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"
# 2. Fill and submit
belt app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e1", "text": "user@example.com"
}'
belt app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e2", "text": "password123"
}'
belt app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "click", "ref": "@e3"
}'
# 3. Re-snapshot after navigation
belt app run agent-browser --function snapshot --session $SESSION_ID --input '{}'
# 4. Close when done
belt app run agent-browser --function close --session $SESSION_ID --input '{}'
Functions
Function
Description
open
Navigate to URL, configure browser (viewport, proxy, video recording)
snapshot
Re-fetch page state with @e refs after DOM changes
interact
Perform actions using @e refs (click, fill, drag, upload, etc.)
screenshot
Take page screenshot (viewport or full page)
execute
Run JavaScript code on the page
close
Close session, returns video if recording was enabled
Interact Actions
Action
Description
Required Fields
click
Click element
ref
dblclick
Double-click element
ref
fill
Clear and type text
ref, text
type
Type text (no clear)
text
press
Press key (Enter, Tab, etc.)
text
select
Select dropdown option
ref, text
hover
Hover over element
ref
check
Check checkbox
ref
uncheck
Uncheck checkbox
ref
drag
Drag and drop
ref, target_ref
upload
Upload file(s)
ref, file_paths
scroll
Scroll page
direction (up/down/left/right), scroll_amount
back
Go back in history
-
wait
Wait milliseconds
wait_ms
goto
Navigate to URL
url
Element Refs
Elements are returned with @e refs:
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
Important: Refs are invalidated after navigation. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading
Features
Video Recording
Record browser sessions for debugging or documentation:
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(belt app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"record_video": true,
"show_cursor": true
}' | jq -r '.session_id')
# ... perform actions ...
# Close to get the video file
belt app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}
Cursor Indicator
Show a visible cursor in screenshots and video (useful for demos):
belt app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"show_cursor": true,
"record_video": true
}'
The cursor appears as a red dot that follows mouse movements and shows click feedback.
Proxy Support
Route traffic through a proxy server:
belt app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"proxy_url": "http://proxy.example.com:8080",
"proxy_username": "user",
"proxy_password": "pass"
}'
File Upload
Upload files to file inputs:
belt app run agent-browser --function interact --session $SESSION --input '{
"action": "upload",
"ref": "@e5",
"file_paths": ["/path/to/file.pdf"]
}'
Drag and Drop
Drag elements to targets:
belt app run agent-browser --function interact --session $SESSION --input '{
"action": "drag",
"ref": "@e1",
"target_ref": "@e2"
}'
JavaScript Execution
Run custom JavaScript:
belt app run agent-browser --function execute --session $SESSION --input '{
"code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
Deep-Dive Documentation
Reference
Description
Full function reference with all options
Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md
Session persistence, parallel sessions
Login flows, OAuth, 2FA handling
Recording workflows for debugging
Proxy configuration, geo-testing
Ready-to-Use Templates
Template
Description
Form filling with validation
templates/authenticated-session.sh
Login once, reuse session
Content extraction with screenshots
Examples
Form Submission
SESSION=$(belt app run agent-browser --function open --session new --input '{
"url": "https://example.com/contact"
}' | jq -r '.session_id')
# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"
belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
belt app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'
belt app run agent-browser --function snapshot --session $SESSION --input '{}'
belt app run agent-browser --function close --session $SESSION --input '{}'
Search and Extract
SESSION=$(belt app run agent-browser --function open --session new --input '{
"url": "https://google.com"
}' | jq -r '.session_id')
belt app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
belt app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
belt app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'
belt app run agent-browser --function snapshot --session $SESSION --input '{}'
belt app run agent-browser --function close --session $SESSION --input '{}'
Screenshot with Video
SESSION=$(belt app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"record_video": true
}' | jq -r '.session_id')
# Take full page screenshot
belt app run agent-browser --function screenshot --session $SESSION --input '{
"full_page": true
}'
# Close and get video
RESULT=$(belt app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'
Sessions
Browser state persists within a session. Always:
- Start with
--session newon first call
- Use returned
session_idfor subsequent calls
- Close session when done
Related Skills
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search
# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
Documentation
- inference.sh Sessions - Session management
- Multi-function Apps - How functions work