gemini-interactions-api

Unified interface for Gemini models and agents with server-side state, streaming, and tool orchestration.

  • Supports multiple current models (gemini-3-flash-preview, gemini-3-pro-preview, gemini-2.5-flash/pro) and the Deep Research agent; automatically substitutes deprecated model IDs with current alternatives.
  • Offloads conversation history to the server via previous_interaction_id for stateful multi-turn interactions without manual history management.
  • Built-in tool orchestration, including function calling, Google Search, code execution, URL context, file search, and remote MCP servers.
  • Streaming via Server-Sent Events, background execution for long-running tasks, and configurable reasoning depth with thought summaries.
  • Available in Python (google-genai >= 2.0.0) and JavaScript/TypeScript (@google/genai >= 2.0.0).

INSTALLATION
npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-interactions-api
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


[!WARNING]

Models like gemini-2.0-*, gemini-1.5-* are legacy and deprecated. Never use them.

**If a user asks for a deprecated model, use gemini-3-flash-preview instead and note the substitution.**
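The substitution rule above can be sketched as a small helper. This is a minimal illustration, not part of either SDK: the function name `resolve_model` and the constants are hypothetical, and the prefix list covers only the two deprecated families named in the warning.

```python
from typing import Optional, Tuple

# Deprecated model families named in the warning above.
DEPRECATED_PREFIXES = ("gemini-2.0-", "gemini-1.5-")
# Replacement the skill asks for when a deprecated model is requested.
FALLBACK_MODEL = "gemini-3-flash-preview"


def resolve_model(requested: str) -> Tuple[str, Optional[str]]:
    """Return (model_to_use, note). The note is None when nothing was substituted."""
    if requested.startswith(DEPRECATED_PREFIXES):
        note = f"'{requested}' is deprecated; using '{FALLBACK_MODEL}' instead."
        return FALLBACK_MODEL, note
    return requested, None
```

The returned note is meant to be surfaced to the user, so the substitution is never silent.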

Current Agents

  • deep-research-preview-04-2026: Deep Research — fast, interactive
  • deep-research-max-preview-04-2026: Deep Research Max — maximum exhaustiveness

Current SDKs

  • Python: google-genai >= 2.0.0 (pip install -U google-genai)
  • JavaScript/TypeScript: @google/genai >= 2.0.0 (npm install @google/genai)

[!NOTE]

SDK versions ≥ 2.0.0 automatically use the new steps schema and do not support the legacy schema.

Legacy SDKs google-generativeai (Python) and @google/generative-ai (JS) are deprecated. Never use them.

[!CAUTION]

Breaking changes (May 2026): Responses now use a steps array instead of outputs, and a polymorphic response_format replaces response_mime_type. Legacy schema removed June 8, 2026. All code below uses the new schema.

Important Additional Notes

  • Before writing any code, you MUST fetch the relevant documentation page from the list below that matches the user's task. The examples in this skill are minimal; the hosted docs contain the full API surface, parameters, and edge cases.
  • Interactions are stored by default (store=true). The paid tier retains them for 55 days, the free tier for 1 day.
  • Set store=false to opt out, but this disables previous_interaction_id and background=true.
  • tools, system_instruction, and generation_config are interaction-scoped, so re-specify them on each turn.
  • **Migrating from generateContent**: Read references/migration.md for scoping guidance, a checklist, and before/after code examples. Always confirm scope with the user before editing.
  • Model upgrades: These are drop-in; swap the model string. Deprecated models (gemini-2.0-*, gemini-1.5-*) must be replaced; see references/migration.md.
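The storage and scoping notes above can be captured in a helper that assembles the keyword arguments for each turn. This is a sketch under stated assumptions: `build_turn_kwargs` is a hypothetical name, the parameter names are the ones mentioned in the notes, and the up-front check only mirrors the documented restriction. The API may report the conflict differently.

```python
from typing import Any, Dict, List, Optional


def build_turn_kwargs(
    model: str,
    user_input: str,
    *,
    previous_interaction_id: Optional[str] = None,
    tools: Optional[List[Any]] = None,
    system_instruction: Optional[str] = None,
    generation_config: Optional[Dict[str, Any]] = None,
    store: bool = True,
    background: bool = False,
) -> Dict[str, Any]:
    """Assemble arguments for one turn, re-specifying interaction-scoped settings."""
    # store=False disables server-side history and background execution.
    if not store and (previous_interaction_id or background):
        raise ValueError(
            "store=False cannot be combined with previous_interaction_id or background=True"
        )
    kwargs: Dict[str, Any] = {"model": model, "input": user_input}
    if not store:
        kwargs["store"] = False
    if background:
        kwargs["background"] = True
    if previous_interaction_id is not None:
        kwargs["previous_interaction_id"] = previous_interaction_id
    # These three are not inherited from earlier turns, so pass them every time.
    if tools is not None:
        kwargs["tools"] = tools
    if system_instruction is not None:
        kwargs["system_instruction"] = system_instruction
    if generation_config is not None:
        kwargs["generation_config"] = generation_config
    return kwargs
```

A call would then look like client.interactions.create(**build_turn_kwargs(...)), with the same tools and system_instruction supplied on every turn.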

Quick Start

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Tell me a short joke about programming."
)

print(interaction.steps[-1].content[0].text)

JavaScript/TypeScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: "Tell me a short joke about programming.",
});

console.log(interaction.steps.at(-1).content[0].text);

Stateful Conversation

Python

interaction1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Hi, my name is Phil."
)

# Second turn: the server remembers context
interaction2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my name?",
    previous_interaction_id=interaction1.id
)

print(interaction2.steps[-1].content[0].text)

JavaScript/TypeScript

const interaction1 = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: "Hi, my name is Phil.",
});

const interaction2 = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: "What is my name?",
    previous_interaction_id: interaction1.id,
});

console.log(interaction2.steps.at(-1).content[0].text);
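For conversations longer than two turns, the chaining shown above can be wrapped in a small class that remembers the last interaction ID. This is a sketch: the `Conversation` class is hypothetical, and it assumes only the client.interactions.create call and the id attribute used in the examples above.

```python
class Conversation:
    """Chains turns by passing the previous interaction's id on each call."""

    def __init__(self, client, model, **turn_options):
        self.client = client
        self.model = model
        # Interaction-scoped options (tools, system_instruction, ...) are
        # stored here so they are re-sent on every turn.
        self.turn_options = turn_options
        self.last_id = None

    def send(self, text):
        kwargs = dict(model=self.model, input=text, **self.turn_options)
        if self.last_id is not None:
            kwargs["previous_interaction_id"] = self.last_id
        interaction = self.client.interactions.create(**kwargs)
        self.last_id = interaction.id
        return interaction
```

Usage would be chat = Conversation(client, "gemini-3-flash-preview"), followed by repeated chat.send(...) calls. Because the history lives on the server, this requires the default store=true.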

Deep Research Agent

Use deep-research-preview-04-2026 for fast research or deep-research-max-preview-04-2026 for maximum exhaustiveness. Agents require background=True.

Python

import time

# Start background research
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research the history of Google TPUs.",
    background=True
)

# Poll until the interaction reaches a terminal status
while True:
    interaction = client.interactions.get(interaction.id)
    if interaction.status == "completed":
        print(interaction.steps[-1].content[0].text)
        break
    elif interaction.status in ("failed", "cancelled"):
        # Stop on cancellation too, as the JavaScript example does
        print(f"{interaction.status}: {getattr(interaction, 'error', None)}")
        break
    time.sleep(10)

JavaScript/TypeScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

// Start background research
const initialInteraction = await client.interactions.create({
    agent: "deep-research-preview-04-2026",
    input: "Research the history of Google TPUs.",
    background: true,
});

// Poll for results
while (true) {
    const interaction = await client.interactions.get(initialInteraction.id);
    if (interaction.status === "completed") {
        console.log(interaction.steps.at(-1).content[0].text);
        break;
    } else if (["failed", "cancelled"].includes(interaction.status)) {
        console.log(`Failed: ${interaction.status}`);
        break;
    }
    await new Promise(resolve => setTimeout(resolve, 10000));
}
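The polling loops above run until the interaction completes or fails. A reusable variant with a timeout might look like the sketch below. The helper name `wait_for_interaction`, its defaults, and the choice to treat only completed, failed, and cancelled as terminal are assumptions. A requires_action status is not handled specially and would keep polling until the timeout.

```python
import time

# Statuses after which polling stops (assumption based on the status list in this skill).
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_interaction(client, interaction_id, poll_seconds=10.0,
                         timeout_seconds=3600.0, sleep=time.sleep):
    """Poll client.interactions.get until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        interaction = client.interactions.get(interaction_id)
        if interaction.status in TERMINAL_STATUSES:
            return interaction
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Interaction {interaction_id} still '{interaction.status}' "
                f"after {timeout_seconds} seconds"
            )
        # The sleep function is injectable so the loop can be tested without waiting.
        sleep(poll_seconds)
```

The caller still inspects the returned status, since a failed or cancelled interaction is returned rather than raised.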

Advanced features: collaborative planning, native visualization, MCP integration, file search, multimodal inputs. See Deep Research docs.

Streaming

Python

for event in client.interactions.create(
    model="gemini-3-flash-preview",
    input="Explain quantum entanglement in simple terms.",
    stream=True,
):
    if event.type == "step.delta":
        if event.delta.type == "text":
            print(event.delta.text, end="", flush=True)
        elif event.delta.type == "thought_summary":
            # Summary text may arrive under delta.content or delta.text
            if hasattr(event.delta, "content"):
                summary_text = event.delta.content.get("text", "")
            else:
                summary_text = getattr(event.delta, "text", "")
            print(summary_text, end="", flush=True)
    elif event.type == "interaction.complete":
        print(f"\n\nTotal Tokens: {event.interaction.usage.total_tokens}")

JavaScript/TypeScript

const stream = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: "Explain quantum entanglement in simple terms.",
    stream: true,
});

for await (const event of stream) {
    if (event.type === 'step.delta') {
        if (event.delta.type === 'text') {
            process.stdout.write(event.delta.text);
        } else if (event.delta.type === 'thought_summary') {
            const text = event.delta.content?.text || "";
            process.stdout.write(text);
        }
    } else if (event.type === 'interaction.complete') {
        console.log(`\n\nTotal Tokens: ${event.interaction.usage.total_tokens}`);
    }
}
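When the streamed output is needed as a value rather than printed, the same event handling can accumulate into a result. This is a sketch: `collect_stream` and `_summary_text` are hypothetical helpers, and the event and delta attribute names are the ones used in the streaming examples above. The shape of a thought_summary delta is handled defensively because the examples themselves do not commit to one.

```python
def _summary_text(delta):
    """Read summary text from delta.content (dict or object) or delta.text."""
    content = getattr(delta, "content", None)
    if isinstance(content, dict):
        return content.get("text", "")
    if content is not None:
        return getattr(content, "text", "") or ""
    return getattr(delta, "text", "") or ""


def collect_stream(events):
    """Accumulate text, thought summaries, and token usage from a stream."""
    text_parts = []
    thought_parts = []
    total_tokens = None
    for event in events:
        if event.type == "step.delta":
            if event.delta.type == "text":
                text_parts.append(event.delta.text)
            elif event.delta.type == "thought_summary":
                thought_parts.append(_summary_text(event.delta))
        elif event.type == "interaction.complete":
            total_tokens = event.interaction.usage.total_tokens
    return {
        "text": "".join(text_parts),
        "thought_summary": "".join(thought_parts),
        "total_tokens": total_tokens,
    }
```

The events argument would be the iterator returned by client.interactions.create(..., stream=True).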

Documentation Pages

You MUST fetch the matching page below before writing code. These hosted docs are the source of truth for parameters, types, and edge cases — do not rely solely on the examples above.

Core Documentation:

Tools & Function Calling:

Generation & Output:

Multimodal Understanding:

Files & Context:

Advanced Features:

API Reference:

Data Model

An Interaction response contains steps, an array of typed step objects representing a structured timeline of the interaction turn.

Step Types

User steps:

  • user_input: User input (text, audio, multimodal). Contains content array.

Model/server steps:

  • model_output: Final model generation. Contains content array with text, image, audio, etc.
  • thought: Model reasoning/Chain of Thought. Has signature field (required) and optional summary.
  • function_call: Tool call request (id, name, arguments).
  • function_result: Tool result you send back (call_id, name, result).
  • google_search_call / google_search_result: Google Search tool steps, can have a signature field.
  • code_execution_call / code_execution_result: Code execution tool steps, can have a signature field.
  • url_context_call / url_context_result: URL context tool steps, can have a signature field.
  • mcp_server_tool_call / mcp_server_tool_result: Remote MCP tool steps.
  • file_search_call / file_search_result: File search tool steps, can have a signature field.
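The function_call and function_result steps listed above suggest a simple dispatch loop on the client side. The sketch below is an assumption-heavy illustration: `run_function_calls` is a hypothetical helper, the step attributes (type, id, name, arguments) follow the field names in the list above, and the dictionary shape of each result (including the "type" key) is a guess at the input format that should be checked against the hosted function calling docs.

```python
def run_function_calls(steps, handlers):
    """Run local handlers for every function_call step and build result payloads."""
    results = []
    for step in steps:
        if step.type != "function_call":
            continue
        if step.name not in handlers:
            raise KeyError(f"No handler registered for function '{step.name}'")
        # Assumes step.arguments is a dict of keyword arguments.
        output = handlers[step.name](**step.arguments)
        results.append({
            "type": "function_result",  # assumed payload shape
            "call_id": step.id,         # links the result to its function_call
            "name": step.name,
            "result": output,
        })
    return results
```

The resulting list would presumably be sent back as the input of a follow-up interaction with previous_interaction_id set to the interaction that requested the calls.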

Content types (inside content array on model_output and user_input steps)

  • text: Text content (text field)
  • image / audio / document / video: Content with data, mime_type, or uri
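Because steps is a timeline of mixed step types, indexing steps[-1].content[0] works for simple text replies but can break when the last step is not a model_output or its first content item is not text. A more defensive reader might look like the sketch below. `final_text` is a hypothetical helper, and it assumes each step and each content item exposes a type attribute matching the names listed above.

```python
def final_text(steps):
    """Return the joined text of the last model_output step, or '' if there is none."""
    for step in reversed(steps):
        if step.type == "model_output":
            # Skip non-text content such as images or audio.
            return "".join(part.text for part in step.content if part.type == "text")
    return ""
```

It would be called as final_text(interaction.steps) in place of the direct indexing used in the examples.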

Streaming Event Types

| Event | Description |
| --- | --- |
| interaction.created | Interaction created; includes metadata. |
| interaction.status_update | Interaction-level status change. |
| step.start | A new step begins. Contains step type and initial metadata. |
| step.delta | Incremental data for the current step. Contains a typed delta object. |
| step.stop | The step is complete. Contains index. |
| interaction.complete | Interaction finished. Contains final usage. |

Delta Types

| Delta Type | Parent Step | Description |
| --- | --- | --- |
| text | model_output | Incremental text token. |
| audio | model_output | Audio chunk (base64). |
| image | model_output | Image chunk (base64). |
| thought_summary | thought | Thinking summary text. |
| thought_signature | thought | Opaque signature for thought verification. |

Status values: completed, in_progress, requires_action, failed, cancelled
