SKILL.md
For local development against this worktree:
./config/scripts/orca-dev status --json
Core Workflow
Use a snapshot-act-snapshot loop:
- Discover apps:
orca computer list-apps --json
- Get a fresh state for the target app:
orca computer get-app-state --app com.spotify.client --json
- Choose an element from that state.
- Perform one action:
orca computer click --app com.spotify.client --element-index 42 --json
- Inspect the action result before deciding whether to act again. Actions return a fresh state:
orca computer click --app com.spotify.client --element-index 42 --json
Element indexes are scoped to the current app state. They can go stale after navigation, focus changes, scrolling, window changes, or app re-rendering. Never carry indexes across unrelated steps without refreshing state.
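The act step of the loop above can be sketched as a small shell helper. This is a minimal sketch, not part of orca: the function name click_and_capture and the output file are hypothetical, and it assumes only what the text states, namely that click prints a fresh state as JSON on stdout. The JSON schema is not described here, so the helper saves the output without parsing it.

```shell
# Hypothetical helper (not part of orca): perform ONE click and save the
# returned state so it can be inspected before any further action.
click_and_capture() {
  app="$1"      # app selector, e.g. a bundle ID from list-apps
  index="$2"    # element index taken from the latest state for that app
  out="$3"      # file that receives the JSON printed by the action
  orca computer click --app "$app" --element-index "$index" --json > "$out"
}

# Usage (the index must come from the most recent state, never an older one):
#   click_and_capture com.spotify.client 42 after-click.json
```

Saving each result to a file keeps the "inspect before acting again" step explicit: the next index should be read from that file, not from an earlier snapshot.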
App Selectors
Prefer bundle IDs returned by list-apps:
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer get-app-state --app com.spotify.client --json
Names are acceptable when unambiguous:
orca computer get-app-state --app Spotify --json
Use pid:<number> only when bundle ID or name matching is ambiguous:
orca computer get-app-state --app pid:12345 --json
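The selector preference above (bundle ID, then name, then pid) could be encoded as follows. This is an illustrative sketch: app_selector is a hypothetical helper, not an orca command, and it expects the caller to pass an empty string for any selector that is unknown or ambiguous.

```shell
# Hypothetical helper: print the most preferred selector that is usable.
# Arguments: bundle ID, app name, pid. Pass "" for anything that is
# unknown or ambiguous so the next option is tried.
app_selector() {
  bundle_id="$1"; name="$2"; pid="$3"
  if [ -n "$bundle_id" ]; then
    printf '%s\n' "$bundle_id"
  elif [ -n "$name" ]; then
    printf '%s\n' "$name"
  elif [ -n "$pid" ]; then
    printf 'pid:%s\n' "$pid"
  else
    echo "no selector available" >&2
    return 1
  fi
}

# Usage:
#   orca computer get-app-state --app "$(app_selector com.spotify.client Spotify "")" --json
```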
Commands
orca computer permissions --json
orca computer capabilities --json
orca computer list-apps --json
orca computer list-windows --app <app> --json
orca computer get-app-state --app <app> --json
orca computer click --app <app> --element-index <index> --json
orca computer perform-secondary-action --app <app> --element-index <index> --action <name> --json
orca computer set-value --app <app> --element-index <index> --value "text" --json
orca computer type-text --app <app> --text "text" --json
orca computer press-key --app <app> --key Return --json
orca computer hotkey --app <app> --key CmdOrCtrl+A --json
orca computer paste-text --app <app> --text "text" --json
orca computer scroll --app <app> (--element-index <index> | --x <x> --y <y>) --direction down --json
orca computer drag --app <app> --from-x 100 --from-y 100 --to-x 300 --to-y 300 --json
Use --no-screenshot only when pixels are not needed. Screenshots are often the only useful signal for Electron, WebView, or canvas-heavy apps with shallow accessibility trees.
Coordinates are window-local. Use coordinates from the latest screenshot/state for the same target window.
Use --text-stdin or --value-stdin for sensitive text so payloads do not land in shell history.
On Linux and Windows, action payloads still pass through a short-lived local operation file.
printf '%s' "$TEXT" | orca computer set-value --app <app> --element-index <index> --value-stdin --json
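A variant of the same idea reads the text from a file, so it appears neither in shell history nor in an environment variable. This is a sketch under two assumptions: type_text_from_file is a hypothetical helper name, and type-text is assumed to accept --text-stdin in the same way set-value accepts --value-stdin.

```shell
# Hypothetical helper: send the contents of a file to type-text via stdin.
# Assumes type-text supports --text-stdin (the text above names the flag
# but does not list which commands accept it).
type_text_from_file() {
  app="$1"     # app selector
  file="$2"    # file holding the text; a trailing newline in the file
               # is passed through as-is, so write it without one if needed
  orca computer type-text --app "$app" --text-stdin --json < "$file"
}

# Usage:
#   type_text_from_file com.spotify.client ./query.txt
```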
Choosing Actions
Prefer semantic actions over raw keyboard input:
- Use set-value for known editable fields.
- Use click for buttons, tabs, menu items, checkboxes, and other direct controls.
- Use perform-secondary-action only when the state lists a concrete action name and the user intent matches it.
- Use type-text after focusing a field and confirming the app has a focused text receiver.
- Use press-key for navigation keys, Return, Escape, shortcuts, or submitting a field after the state confirms the right target is active.
Why: keyboard input is process-targeted on macOS, but it still depends on the target app having a valid focused receiver. set-value targets the accessibility element directly and is more reliable when supported.
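As a rough illustration, the preferences above can be written as a lookup. The target labels on the left are invented for this sketch; they are not roles reported by orca, and preferred_action is a hypothetical helper.

```shell
# Hypothetical mapping from a coarse description of the target to the
# orca subcommand preferred by the guidance above. The labels are made up
# for this sketch and do not come from the accessibility tree.
preferred_action() {
  case "$1" in
    editable-field)                        echo set-value ;;
    button|tab|menu-item|checkbox)         echo click ;;
    listed-secondary-action)               echo perform-secondary-action ;;
    focused-text-receiver)                 echo type-text ;;
    navigation-key|return|escape|shortcut) echo press-key ;;
    *) echo "unknown target kind: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   action="$(preferred_action editable-field)"   # prints set-value
```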
Foreground And Background
Some actions work while the app is in the background. Treat this as app-dependent:
- set-value can work in the background when the app exposes a writable accessibility value.
- click and accessibility actions may work in the background for some native controls.
- type-text and press-key are targeted to the app process on macOS, but the app may ignore them unless it owns focus or already has an active text receiver.
If an action returns success but the UI did not change, do not repeat the same action blindly. Run get-app-state again, inspect the screenshot and tree, then switch to a more semantic action or, if needed, bring the target app to the foreground and focus the target element.
Screenshots
get-app-state returns an accessibility tree and, by default, a screenshot. Use both:
- Trust the tree for element indexes, names, roles, values, and actions.
- Trust the screenshot for visual confirmation, especially in Electron and WebView apps.
- If the tree is shallow, use screenshot evidence before deciding whether any action is safe.
- If screenshot capture fails or returns no image, the app may be hidden, minimized, off-screen, or have no visible window.
Use --restore-window only when appropriate for the task:
orca computer get-app-state --app <app> --restore-window --json
App-Specific Notes
Browsers
For Edge, Chrome, and similar browsers, prefer setting the address/search field directly:
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer set-value --app com.microsoft.edgemac --element-index <addressBarIndex> --value "test123" --json
orca computer press-key --app com.microsoft.edgemac --key Return --json
orca computer get-app-state --app com.microsoft.edgemac --json
Do not assume raw typing went to the address bar. Confirm the field or page changed after pressing Return.
Spotify
Spotify state can update asynchronously after playback or network-backed search. After a playback click, run get-app-state before clicking again.
For search, prefer set-value on the search combobox, usually named something like "What do you want to play?". type-text may only work when Spotify owns focus and that field is already focused.
Slack
Slack may expose a shallow accessibility tree while the screenshot contains the useful information. Reading visible Slack UI is acceptable when requested, but do not send messages or trigger workflows unless explicitly asked.
Error Handling
- app_not_found: run list-apps and retry with the bundle ID.
- element_not_found: the index is stale; run get-app-state again.
- action_failed: inspect the element role/actions and try a more semantic action.
- Empty tree or no screenshot: the app may have no visible window, be minimized, or be blocked by permissions.
- Permission errors: the user needs to grant Accessibility or Screen Recording to Orca Computer Use. Run orca computer permissions --json, use the setup UI, then retry orca computer get-app-state --app <bundle> --json.
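The named error codes above map to recovery steps roughly as follows. This is a sketch only: next_step_for_error is a hypothetical helper, and how the code is extracted from orca's JSON output is left out because the output schema is not described here.

```shell
# Hypothetical helper: print the recovery step for an error code named
# above. Extracting the code from orca's JSON output is out of scope.
next_step_for_error() {
  code="$1"   # error code, e.g. element_not_found
  app="$2"    # app selector used in the failed command
  case "$code" in
    app_not_found)
      echo "orca computer list-apps --json" ;;
    element_not_found)
      echo "orca computer get-app-state --app $app --json" ;;
    action_failed)
      echo "inspect the element role/actions, then try a more semantic action" ;;
    *)
      echo "unrecognized error: $code" >&2
      return 1 ;;
  esac
}

# Usage:
#   next_step_for_error element_not_found com.spotify.client
```

Unrecognized codes fail loudly instead of guessing at a retry, in line with the advice not to repeat actions blindly.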
Safety Checks
Before acting, classify the action:
- Safe: read state, list apps, inspect screenshot, focus a search box, scroll, open a harmless tab.
- Needs care: typing into a focused field, pressing Return, clicking a primary button.
- Requires explicit user permission: sending messages, posting, purchasing, deleting, submitting forms, changing settings, signing in, or exposing secrets.
When uncertain, stop after get-app-state and report what is visible instead of acting.
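A coarse first pass at the classification above can key off the subcommand alone. This is a deliberately conservative sketch: risk_floor is a hypothetical helper, and it cannot distinguish "needs care" from "requires explicit user permission", because that depends on what the action does (for example, whether a click sends a message), not on which command runs.

```shell
# Hypothetical helper: classify an orca computer subcommand as "safe" or
# "review". "review" means: decide by intent whether the action merely
# needs care or requires explicit user permission before running it.
risk_floor() {
  case "$1" in
    list-apps|list-windows|get-app-state|scroll) echo safe ;;
    *) echo review ;;
  esac
}

# Usage:
#   [ "$(risk_floor click)" = "safe" ] || echo "check intent before acting"
```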