gemini-computer-use

Gemini 2.5 Computer Use browser automation with Playwright-based agent loops and safety confirmations.

  • Implements a screenshot-to-action cycle: capture the screen, send it to Gemini, parse function calls, execute them in Playwright, and return results until the task completes or the turn limit is reached.
  • Supports multiple browser options: bundled Chromium (default), Chrome/Edge channels via COMPUTER_USE_BROWSER_CHANNEL, or custom executables such as Brave.
  • Includes a safety confirmation workflow that prompts the user before executing risky UI actions flagged by the model.
  • Provides action exclusion via the --exclude flag and recommends sandboxed profiles or containers for safe operation.

INSTALLATION
npx skills add https://github.com/am-will/codex-skills --skill gemini-computer-use
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Gemini Computer Use

Quick start

1. Copy the env template, set your API key in it, and source the file:

    cp env.example env.sh
    $EDITOR env.sh
    source env.sh

2. Create a virtual environment and install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install google-genai playwright
    playwright install chromium

3. Run the agent script with a prompt:

    python scripts/computer_use_agent.py \
      --prompt "Find the latest blog post title on example.com" \
      --start-url "https://example.com" \
      --turn-limit 6

Browser selection

  • Default: Playwright's bundled Chromium (no env vars required).
  • Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
  • Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
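
The precedence rule above can be sketched as a small helper that turns the two environment variables into keyword arguments for Playwright's chromium.launch(). This is a minimal sketch, not the code in scripts/computer_use_agent.py: the function name resolve_launch_kwargs is hypothetical, and treating an empty value as "unset" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_launch_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for Playwright's chromium.launch().

    Hypothetical helper: the real script may resolve these differently.
    """
    env = os.environ if env is None else env
    # Assumption: blank or whitespace-only values count as "not set".
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()

    if executable:
        # A custom executable (e.g. Brave) wins when both variables are set.
        return {"executable_path": executable}
    if channel:
        # Playwright channel name such as "chrome" or "msedge".
        return {"channel": channel}
    # Neither set: fall back to Playwright's bundled Chromium.
    return {}
```

The returned dictionary can be unpacked into the launch call, for example playwright.chromium.launch(**resolve_launch_kwargs()), since executable_path and channel are both accepted launch parameters in Playwright for Python.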

Core workflow (agent loop)

  • Capture a screenshot and send the user goal + screenshot to the model.
  • Parse function_call actions in the response.
  • Execute each action in Playwright.
  • If a safety_decision is require_confirmation, prompt the user before executing.
  • Send function_response objects containing the latest URL + screenshot.
  • Repeat until the model returns only text (no actions) or you hit the turn limit.
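
The control flow of the loop above can be sketched independently of any SDK. In this rough illustration the model call, the Playwright execution, the screenshot capture, and the user prompt are all passed in as callables. Every name here (Action, ModelReply, run_agent_loop, and the four callables) is hypothetical, the reply shape is simplified compared with the real function_call and function_response objects, and stopping the run when the user declines a confirmation is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One function_call from the model (hypothetical, simplified shape)."""
    name: str
    args: dict = field(default_factory=dict)
    # True when the model's safety_decision is require_confirmation.
    needs_confirmation: bool = False


@dataclass
class ModelReply:
    """Either final text or a list of actions to execute."""
    text: Optional[str] = None
    actions: List[Action] = field(default_factory=list)


def run_agent_loop(
    ask_model: Callable[[dict], ModelReply],
    execute: Callable[[Action], None],
    observe: Callable[[], dict],
    confirm: Callable[[Action], bool],
    turn_limit: int = 6,
) -> Optional[str]:
    """Run the screenshot-to-action cycle; return the final text, or None."""
    observation = observe()  # latest URL + screenshot
    for _ in range(turn_limit):
        reply = ask_model(observation)
        if not reply.actions:
            # Text only, no actions: the model considers the task finished.
            return reply.text
        for action in reply.actions:
            if action.needs_confirmation and not confirm(action):
                # Assumption: a declined confirmation ends the run.
                return None
            execute(action)
        # Fresh state to send back with the function_response objects.
        observation = observe()
    return None  # turn limit reached without a final answer
```

Injecting the callables keeps the loop testable without a browser or an API key; in a real run, ask_model would wrap the Gemini request and execute would dispatch to Playwright page actions.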

Operational guidance

  • Run in a sandboxed browser profile or container.
  • Use --exclude to block risky actions you do not want the model to take.
  • Keep the viewport at 1440x900 unless you have a reason to change it.
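
As a rough illustration of the last two points, the sketch below defines the recommended viewport as a dictionary and adds a client-side guard that separates excluded action names from allowed ones. How the script actually implements --exclude is not shown in this document, so split_excluded is a hypothetical helper and the action names in the usage note are illustrative only.

```python
from typing import Iterable, List, Tuple

# Recommended viewport from the guidance above; the width/height keys match
# the viewport option of Playwright's browser.new_context().
VIEWPORT = {"width": 1440, "height": 900}


def split_excluded(
    action_names: Iterable[str],
    excluded: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Partition action names into (allowed, blocked), preserving order.

    Hypothetical guard: a defensive check before executing anything,
    not a description of how the --exclude flag is implemented.
    """
    blocked_set = {name.strip() for name in excluded if name.strip()}
    allowed: List[str] = []
    blocked: List[str] = []
    for name in action_names:
        (blocked if name in blocked_set else allowed).append(name)
    return allowed, blocked
```

For example, split_excluded(["click_at", "drag_and_drop"], ["drag_and_drop"]) returns (["click_at"], ["drag_and_drop"]), so the caller can skip or report the blocked action instead of running it.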

Resources

  • Script: scripts/computer_use_agent.py
  • Reference notes: references/google-computer-use.md
  • Env template: env.example