nano-banana-2

INSTALLATION
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill nano-banana-2
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Nano Banana 2 — Pro Pack on RunComfy

runcomfy.com · Model page · GitHub

Google Nano Banana 2 — the flash-tier text-to-image model in the Gemini family — hosted on the RunComfy Model API. Optimized for ideation, social-thumbnail batches, and rapid drafts with strong in-image typography.

npx skills add agentspace-so/runcomfy-skills --skill nano-banana-2 -g

When to pick this model (vs siblings)

Nano Banana 2 is the flash-tier of the Google image-gen line. Pick it when iteration speed and predictable framing matter more than maximum detail.

| You want | Use |
| --- | --- |
| Rapid drafts, social thumbnails, batch variants | Nano Banana 2 |
| In-image typography with predictable rendering | Nano Banana 2 |
| Web-grounded image (current events / real entities) | Nano Banana 2 + enable_web_search |
| Image edit (preserve subject, swap background) | Nano Banana Edit (sibling skill) |
| Heavy stylization, painterly look | Flux 2 |
| Maximum prompt adherence + multilingual text | GPT Image 2 |
| 2K–4K hero shots, max realism | Seedream 5 |
| Hyperrealistic portrait | Nano Banana Pro |

If the user explicitly said "Nano Banana 2" / "nano-banana-2" / "Gemini image", route here regardless. If they said "Nano Banana" without specifying 2 vs Pro, default to Pro for portraits and 2 for everything else.
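One reading of the rule above, in which a bare "Nano Banana" mention is disambiguated by task, can be sketched as a shell function. The function name `route_nano_banana`, the `portrait` task keyword, and the `nano-banana-pro` skill identifier are hypothetical labels for illustration; only `nano-banana-2` is named in this document.

```bash
#!/usr/bin/env bash
# Hypothetical routing helper (not part of the RunComfy CLI).
# $1 = how the user named the model, $2 = task keyword (e.g. "portrait").
# Prints the skill to use; returns 1 when the mention is not a Nano Banana one.
route_nano_banana() {
  local mention=$1 task=${2:-}
  case "$mention" in
    "Nano Banana 2"|"nano-banana-2"|"Gemini image")
      echo "nano-banana-2" ;;            # explicit mention: always this skill
    "Nano Banana Pro")
      echo "nano-banana-pro" ;;          # assumed identifier for the Pro sibling
    "Nano Banana")
      # Ambiguous mention: Pro for portraits, 2 for everything else.
      if [ "$task" = "portrait" ]; then
        echo "nano-banana-pro"
      else
        echo "nano-banana-2"
      fi ;;
    *)
      return 1 ;;
  esac
}
```

For example, `route_nano_banana "Nano Banana" portrait` prints `nano-banana-pro`, while `route_nano_banana "Nano Banana" thumbnail` prints `nano-banana-2`.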

Prerequisites

  • RunComfy CLI: npm i -g @runcomfy/cli
  • RunComfy account: runcomfy login opens a browser device-code flow.
  • CI / containers: set RUNCOMFY_TOKEN=<token> instead of runcomfy login.
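Before invoking the CLI from a script, it can help to check which credential source is present. The sketch below is an assumption-laden helper, not a RunComfy command: the function name `runcomfy_auth_source` is hypothetical, the token path is the one given under Security & Privacy, and it assumes the CLI does not relocate that file (for example via XDG_CONFIG_HOME).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the RunComfy CLI).
# Prints "env" when RUNCOMFY_TOKEN is set, "file" when a token file from
# `runcomfy login` exists, and "none" (with exit status 1) otherwise.
runcomfy_auth_source() {
  # Path taken from the Security & Privacy section of this document.
  local token_file="${HOME}/.config/runcomfy/token.json"
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo "env"       # the env var bypasses the token file entirely
  elif [ -f "$token_file" ]; then
    echo "file"
  else
    echo "none"
    return 1
  fi
}
```

A CI job could call `runcomfy_auth_source >/dev/null || { echo "set RUNCOMFY_TOKEN" >&2; exit 77; }` to fail early with the same code the CLI uses for auth problems.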

Endpoints + input schema

google/nano-banana-2/text-to-image

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | Subject-first description. |
| num_images | int | no | 1 | 1–4. Use 4 for ideation rounds. |
| seed | int | no | 0 | Reuse for reproducibility. |
| aspect_ratio | enum | no | auto | auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16. |
| resolution | enum | no | 1K | 0.5K (drafts), 1K (default), 2K (final), 4K (max). |
| output_format | enum | no | png | png, jpeg, webp. |
| safety_tolerance | int | no | 4 | 1 (strict) to 6 (permissive). |
| limit_generations | bool | no | true | Limit each prompt round to one generation. |
| enable_web_search | bool | no | false | Adds web grounding (extra cost + latency). |

For image edit (preserve subject + apply changes), see the sibling nano-banana-edit skill.
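The enum and range constraints in the schema can be checked locally before a request is sent. The following is a minimal sketch: the `valid_*` function names are hypothetical, and the accepted values are copied from the schema table, so they would need updating if the endpoint changes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight validators mirroring the schema table.
# Each returns 0 for an accepted value and 1 otherwise.

valid_aspect_ratio() {
  case "$1" in
    auto|21:9|16:9|3:2|4:3|5:4|1:1|4:5|3:4|2:3|9:16) return 0 ;;
    *) return 1 ;;
  esac
}

valid_resolution() {
  case "$1" in
    0.5K|1K|2K|4K) return 0 ;;
    *) return 1 ;;
  esac
}

valid_output_format() {
  case "$1" in
    png|jpeg|webp) return 0 ;;
    *) return 1 ;;
  esac
}

valid_num_images() {
  case "$1" in
    1|2|3|4) return 0 ;;
    *) return 1 ;;
  esac
}

valid_safety_tolerance() {
  case "$1" in
    [1-6]) return 0 ;;   # single digit 1 through 6 only
    *) return 1 ;;
  esac
}
```

For instance, `valid_resolution "3K" || echo "unsupported resolution" >&2` reports the problem without spending a request.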

How to invoke

Default draft (schema defaults: 1K, auto aspect ratio, png):

runcomfy run google/nano-banana-2/text-to-image \
  --input '{"prompt": "<user prompt>"}' \
  --output-dir <absolute/path>

Vertical 4-up batch for ideation:

runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "<user prompt>",
    "num_images": 4,
    "aspect_ratio": "9:16",
    "resolution": "0.5K"
  }' \
  --output-dir <absolute/path>

Final at 2K with seed lock:

runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "<user prompt>",
    "resolution": "2K",
    "aspect_ratio": "16:9",
    "seed": 42
  }' \
  --output-dir <absolute/path>

Web-grounded (current event / real entity):

runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "<prompt referencing a real-world event from this week>",
    "enable_web_search": true
  }' \
  --output-dir <absolute/path>

Prompting — what actually works

Subject-first declarative grammar. "A cinematic close-up portrait of an American woman standing under neon lights in rainy Tokyo, shallow depth of field, reflective wet streets, ultra-detailed, realistic skin texture" — primary subject, then action, environment, style, camera. Front-load subject; trail with directives.

Exact text quoting for in-image typography. "The label reads 'AURA' in clean bold sans-serif, centered, white on black" — quote the literal characters. Specify placement and font style. Don't say "with the brand name on it" and hope.

Consistent seeds for refinement. Lock seed when iterating a single prompt across small variants — keeps composition stable.

Web-grounding, sparingly. Turn on enable_web_search only when the prompt names current events / real entities. Adds latency + cost; off by default.

Don't conflict styles. Stacking "minimalist + ornate + retro + cyberpunk" makes the styles cancel each other out. Pick 1–2 anchors.

Anti-patterns:

  • Trying to verbally describe a stable subject identity — use the edit endpoint with image refs instead.
  • Asking for resolutions outside the 4 tiers → 422.
  • Aspect ratios outside the 11 supported values → 422.
  • Non-quoted in-image text → unpredictable rendering.

Where it shines

| Use case | Why Nano Banana 2 |
| --- | --- |
| Marketing draft thumbnails (batch of 4) | Fast iteration at 0.5K, then promote winner to 2K |
| Social-platform-native | Wide aspect ratio support including 9:16, 4:5, 21:9 |
| In-image typography for posters / cards | Predictable text rendering when characters are quoted |
| Web-grounded current-event imagery | enable_web_search integrates fresh info |
| Reproducible variant testing | Strong seed + consistent framing |

Sample prompts (verified to produce strong results)

Cinematic portrait (page example):

A cinematic close-up portrait of an American woman standing under neon
lights in rainy Tokyo, shallow depth of field, reflective wet streets,
ultra-detailed, realistic skin texture

Brand-asset card with quoted text:

A minimalist 16:9 product card: a matte black ceramic mug centered on a
soft warm-grey paper background, rim highlight from upper-left, the
headline "Brewed Quietly" in clean bold sans-serif top-right, balanced
negative space below, e-commerce ready, clean studio lighting

Vertical platform-native:

A 9:16 vertical hero for a wellness brand: a single ceramic teacup on a
linen runner, soft morning side-light, the words "Slow Down" in
hand-drawn serif large at the top, gentle steam rising, neutral color
palette, uncluttered

Limitations

  • Still images only. No video on this endpoint.
  • Max 4 outputs per request.
  • Web search adds latency + cost — only enable on demand.
  • 2K / 4K cost more — default to 1K unless user asked for higher.
  • For image edit, use the /edit endpoint, not this one.

Exit codes

| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
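The exit codes above lend themselves to a small wrapper that retries only the retryable case. This is a sketch under assumptions: `exit_code_action` and `retry_on_75` are hypothetical helper names, and the attempt count and doubling delay are illustrative choices, not CLI defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helpers built on the exit-code table (not part of the CLI).

# Map an exit code to a coarse next step.
exit_code_action() {
  case "$1" in
    0)  echo "done" ;;
    64) echo "fix-cli-args" ;;
    65) echo "fix-input-json" ;;
    69) echo "upstream-error" ;;
    75) echo "retry" ;;
    77) echo "reauthenticate" ;;
    *)  echo "unknown" ;;
  esac
}

# Usage: retry_on_75 <max_attempts> <initial_delay_seconds> <command> [args...]
# Re-runs the command only when it exits with 75 (timeout / 429), doubling
# the delay between attempts. Any other status is returned immediately.
retry_on_75() {
  local max_attempts=$1 delay=$2 attempt=1 status
  shift 2
  while :; do
    status=0
    "$@" || status=$?
    if [ "$status" -ne 75 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry_on_75 3 5 runcomfy run google/nano-banana-2/text-to-image --input '{"prompt": "<user prompt>"}' --output-dir <absolute/path>` would try up to three times, waiting 5 and then 10 seconds between attempts. Codes 64, 65, and 77 are returned on the first attempt because retrying cannot fix them.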

How it works

The skill invokes runcomfy run google/nano-banana-2/text-to-image with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/google/nano-banana-2/text-to-image, polls the request, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before the CLI exits.
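The download restriction described above amounts to a host-suffix check. The sketch below illustrates that idea only; it is not the CLI's implementation. The function name `is_allowed_download_url` and the hostnames in the examples are hypothetical, and requiring https is an assumption. Host matching here is case-sensitive for simplicity.

```bash
#!/usr/bin/env bash
# Illustrative host-suffix allowlist check (not the CLI's actual code).
# Returns 0 when the URL is https and its host ends in .runcomfy.net or
# .runcomfy.com, and 1 otherwise.
is_allowed_download_url() {
  local url=$1 rest host
  case "$url" in
    https://*) rest=${url#https://} ;;
    *) return 1 ;;                 # assumption: only https is accepted
  esac
  host=${rest%%[/?#]*}             # cut at the first path, query, or fragment
  host=${host##*@}                 # drop any userinfo before the host
  host=${host%%:*}                 # drop an explicit port
  case "$host" in
    *.runcomfy.net|*.runcomfy.com) return 0 ;;
    *) return 1 ;;
  esac
}
```

Matching on the parsed host rather than on the raw URL string is what rejects look-alikes such as `https://runcomfy.net.evil.example/out.png`.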

Security & Privacy

  • Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
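The input-boundary point covers the CLI side; the calling shell still has to produce valid JSON when a prompt contains quotes or line breaks. Below is a minimal sketch, assuming two hypothetical helpers, `json_escape` and `build_prompt_input`. It escapes only backslash, double quote, newline, carriage return, and tab, so other control characters are not handled; a production script would more likely delegate to a JSON tool or library.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for passing a user prompt safely via --input.

# Escape a string for use inside a JSON string literal.
# Limitation: only \ " newline, carriage return, and tab are handled.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}        # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}        # double quote
  s=${s//$'\n'/\\n}      # newline
  s=${s//$'\r'/\\r}      # carriage return
  s=${s//$'\t'/\\t}      # tab
  printf '%s' "$s"
}

# Build the smallest valid input body: {"prompt": "..."}.
build_prompt_input() {
  printf '{"prompt": "%s"}' "$(json_escape "$1")"
}
```

With these in place, a prompt held in a variable can be passed as `runcomfy run google/nano-banana-2/text-to-image --input "$(build_prompt_input "$user_prompt")" --output-dir <absolute/path>`. Keeping the prompt in a quoted variable means the shell never re-parses its contents.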