gemma-gem-browser-ai

Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.

INSTALLATION
npx skills add https://github.com/aradotso/trending-skills --skill gemma-gem-browser-ai
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$27

  • Offscreen document (offscreen/): Loads the ONNX model via @huggingface/transformers, runs the agent loop, streams tokens.
  • Service worker (background/): Routes messages, handles take_screenshot and run_javascript.
  • Content script (content/): Injects shadow DOM chat UI, executes DOM tools.
  • **agent/**: Zero-dependency module defining ModelBackend and ToolExecutor interfaces — extractable as a standalone library.

Install & Build

# Prerequisites: Node.js 18+, pnpm

pnpm install

# Development build (logging active, source maps)

pnpm build

# Production build (errors only, minified)

pnpm build:prod

Load the extension:

  • Open chrome://extensions
  • Enable Developer mode
  • Click Load unpacked → select .output/chrome-mv3-dev/

Model download happens automatically on first chat open:

  • onnx-community/gemma-4-E2B-it-ONNX — ~500 MB (default)
  • onnx-community/gemma-4-E4B-it-ONNX — ~1.5 GB

Models are cached in the browser's cache storage after the first run.

Key Interfaces ( agent/ )

ModelBackend

// agent/types.ts

export interface ModelBackend {

  generate(

    messages: ChatMessage[],

    tools: ToolDefinition[],

    options: GenerateOptions

  ): AsyncGenerator<StreamChunk>;

}

export interface ToolDefinition {

  name: string;

  description: string;

  parameters: JSONSchema;

}

export interface GenerateOptions {

  maxNewTokens?: number;

  thinking?: boolean;

}

ToolExecutor

// agent/types.ts

export interface ToolExecutor {

  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;

}

Agent Loop

// agent/loop.ts — simplified illustration

export async function* runAgentLoop(

  userMessage: string,

  history: ChatMessage[],

  model: ModelBackend,

  tools: ToolExecutor,

  toolDefs: ToolDefinition[],

  maxIterations: number

): AsyncGenerator<AgentEvent> {

  const messages = [...history, { role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {

    for await (const chunk of model.generate(messages, toolDefs, {})) {

      if (chunk.type === "token") yield { type: "token", token: chunk.token };

      if (chunk.type === "tool_call") {

        yield { type: "tool_start", name: chunk.name };

        const result = await tools.execute(chunk.name, chunk.args);

        yield { type: "tool_result", name: chunk.name, result };

        messages.push({ role: "tool", name: chunk.name, content: String(result) });

      }

      if (chunk.type === "done") return;

    }

  }

}

Built-in Tools

Tool

Location

Description

read_page_content

Content script

Read page text/HTML or a CSS selector

take_screenshot

Service worker

Capture visible tab as PNG

click_element

Content script

Click by CSS selector

type_text

Content script

Type into input by CSS selector

scroll_page

Content script

Scroll by pixel amount

run_javascript

Service worker

Execute JS in page context

Adding a New Tool

Tools live in two places: the definition (in the offscreen agent) and the executor (in content script or service worker).

Step 1 — Define the tool schema

// offscreen/tools/definitions.ts

export const MY_TOOL_DEFINITION: ToolDefinition = {

  name: "get_page_title",

  description: "Returns the document title of the current page.",

  parameters: {

    type: "object",

    properties: {},

    required: [],

  },

};

Step 2 — Register in the tool list

// offscreen/tools/index.ts

import { MY_TOOL_DEFINITION } from "./definitions";

export const ALL_TOOLS: ToolDefinition[] = [

  // ...existing tools

  MY_TOOL_DEFINITION,

];

Step 3 — Implement execution in the content script

// content/tools/executor.ts

export async function executeContentTool(

  name: string,

  args: Record<string, unknown>

): Promise<unknown> {

  switch (name) {

    case "get_page_title":

      return document.title;

    case "read_page_content": {

      const selector = args.selector as string | undefined;

      if (selector) {

        return document.querySelector(selector)?.textContent ?? "Not found";

      }

      return document.body.innerText;

    }

    case "click_element": {

      const el = document.querySelector(args.selector as string) as HTMLElement;

      if (!el) throw new Error(`Element not found: ${args.selector}`);

      el.click();

      return "clicked";

    }

    case "type_text": {

      const input = document.querySelector(args.selector as string) as HTMLInputElement;

      if (!input) throw new Error(`Input not found: ${args.selector}`);

      input.focus();

      input.value = args.text as string;

      input.dispatchEvent(new Event("input", { bubbles: true }));

      input.dispatchEvent(new Event("change", { bubbles: true }));

      return "typed";

    }

    default:

      throw new Error(`Unknown content tool: ${name}`);

  }

}

Step 4 — Handle service-worker-side tools

// background/tools.ts

export async function executeSwTool(

  name: string,

  args: Record<string, unknown>,

  tabId: number

): Promise<unknown> {

  switch (name) {

    case "take_screenshot": {

      const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });

      return dataUrl;

    }

    case "run_javascript": {

      const results = await chrome.scripting.executeScript({

        target: { tabId },

        func: new Function(args.code as string) as () => unknown,

      });

      return results[0]?.result ?? null;

    }

    default:

      return null; // not a SW tool — forward to content script

  }

}

Message Routing Pattern

The service worker acts as a message bus. All communication uses chrome.runtime.sendMessage.

// Message types (shared/messages.ts)

export type ExtMessage =

  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }

  | { type: "TOOL_RESULT"; name: string; result: unknown }

  | { type: "TOKEN"; token: string }

  | { type: "AGENT_DONE" }

  | { type: "AGENT_ERROR"; error: string };

// Offscreen → SW

chrome.runtime.sendMessage<ExtMessage>({

  type: "TOOL_CALL",

  name: "click_element",

  args: { selector: "#submit-btn" },

  tabId: currentTabId,

});

// SW → Content script

chrome.tabs.sendMessage<ExtMessage>(tabId, {

  type: "TOOL_CALL",

  name: "click_element",

  args: { selector: "#submit-btn" },

  tabId,

});

Model Configuration

// offscreen/model.ts — loading with transformers.js

import { pipeline, TextGenerationPipeline } from "@huggingface/transformers";

const MODEL_IDS = {

  E2B: "onnx-community/gemma-4-E2B-it-ONNX",

  E4B: "onnx-community/gemma-4-E4B-it-ONNX",

} as const;

export type ModelSize = keyof typeof MODEL_IDS;

export async function loadModel(

  size: ModelSize,

  onProgress: (progress: number) => void

): Promise<TextGenerationPipeline> {

  return pipeline("text-generation", MODEL_IDS[size], {

    dtype: "q4f16",

    device: "webgpu",

    progress_callback: (p: { progress: number }) => onProgress(p.progress),

  });

}

Settings &#x26; Persistence

Settings are stored via chrome.storage.sync:

export interface GemmaGemSettings {

  modelSize: "E2B" | "E4B";

  thinking: boolean;

  maxIterations: number;

  disabledHosts: string[];

}

const DEFAULT_SETTINGS: GemmaGemSettings = {

  modelSize: "E2B",

  thinking: false,

  maxIterations: 10,

  disabledHosts: [],

};

export async function getSettings(): Promise<GemmaGemSettings> {

  const stored = await chrome.storage.sync.get("settings");

  return { ...DEFAULT_SETTINGS, ...(stored.settings ?? {}) };

}

export async function saveSettings(patch: Partial<GemmaGemSettings>): Promise<void> {

  const current = await getSettings();

  await chrome.storage.sync.set({ settings: { ...current, ...patch } });

}

// Disable extension on current host

async function disableOnCurrentSite() {

  const host = new URL(location.href).hostname;

  const settings = await getSettings();

  if (!settings.disabledHosts.includes(host)) {

    await saveSettings({ disabledHosts: [...settings.disabledHosts, host] });

  }

}

Shadow DOM Chat UI Pattern

The content script injects a shadow DOM to isolate styles:

// content/ui.ts

export function injectChatOverlay(): ShadowRoot {

  const host = document.createElement("div");

  host.id = "gemma-gem-host";

  // Prevent page styles from leaking in

  const shadow = host.attachShadow({ mode: "closed" });

  // Inject styles

  const style = document.createElement("style");

  style.textContent = CHAT_STYLES; // imported CSS string

  shadow.appendChild(style);

  // Inject chat container

  const container = document.createElement("div");

  container.id = "gemma-gem-container";

  shadow.appendChild(container);

  document.body.appendChild(host);

  return shadow;

}

Debugging

All logs use [Gemma Gem] prefix. Development builds log info/debug/warn; production only logs errors.

# Service worker logs

chrome://extensions → Gemma Gem → "Inspect views: service worker"

# Offscreen document (most useful: model loading, prompts, tool calls)

chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"

# Content script logs

DevTools on any page → Console (filter: [Gemma Gem])

# All extension contexts

chrome://inspect#other

Key things to check in offscreen document logs:

  • Model download progress
  • Full prompt construction
  • Token counts per turn
  • Raw model output (before tool call parsing)
  • Tool execution results

Common Patterns &#x26; Gotchas

WebGPU availability check:

if (!navigator.gpu) {

  throw new Error("WebGPU not supported. Use Chrome 113+ with hardware acceleration enabled.");

}

const adapter = await navigator.gpu.requestAdapter();

if (!adapter) throw new Error("No WebGPU adapter found.");

Offscreen document lifecycle — Chrome may suspend the offscreen document. Ping it before sending messages:

async function ensureOffscreen() {

  const existing = await chrome.offscreen.hasDocument();

  if (!existing) {

    await chrome.offscreen.createDocument({

      url: "offscreen.html",

      reasons: [chrome.offscreen.Reason.WORKERS],

      justification: "Run Gemma 4 inference via WebGPU",

    });

  }

}

Context window management — Gemma 4 supports 128K tokens but inference slows with long contexts. Clear history per-page with clear_context or limit stored turns:

const MAX_HISTORY_TURNS = 20;

function trimHistory(messages: ChatMessage[]): ChatMessage[] {

  if (messages.length <= MAX_HISTORY_TURNS * 2) return messages;

  return messages.slice(-MAX_HISTORY_TURNS * 2);

}

Tool call parsing — Gemma 4 emits tool calls in a structured format. If adding custom parsing, guard against partial/streamed JSON:

function safeParseToolCall(raw: string): { name: string; args: Record<string, unknown> } | null {

  try {

    return JSON.parse(raw);

  } catch {

    return null; // still streaming

  }

}

CSS selector safety for DOM tools:

function safeQuerySelector(selector: string): Element | null {

  try {

    return document.querySelector(selector);

  } catch {

    return null; // invalid selector from model

  }

}
BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card