SKILL.md


3. Generate

Name: ai-image-generation
Author: agentspace-so

```bash
runcomfy run <vendor>/<model>/<endpoint> \
  --input '{"prompt": "..."}' \
  --output-dir ./out
```

CLI docs: [Install](https://docs.runcomfy.com/cli/install?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Quickstart](https://docs.runcomfy.com/cli/quickstart?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Commands](https://docs.runcomfy.com/cli/commands?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Auth](https://docs.runcomfy.com/cli/auth?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Troubleshooting](https://docs.runcomfy.com/cli/troubleshooting?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

## Install this skill

```bash
npx skills add agentspace-so/runcomfy-agent-skills --skill ai-image-generation -g
```


## Pick the right model for the user's intent

### Text-to-image (t2i) — newest first

**FLUX 2 Klein 9B** — `blackforestlabs/flux-2-klein/9b/text-to-image` (default)

Step-distilled, 4–25 steps, native multi-reference conditioning, strong photoreal + illustration all-rounder.
Pick for: intent unclear, fast iteration, multi-ref styling, general-purpose.
Avoid for: in-image text — use **GPT Image 2**.

**FLUX 2 Klein 4B** — `blackforestlabs/flux-2-klein/4b/text-to-image`

Sub-second variant of Klein 9B, same field set.
Pick for: storyboard, moodboard, batch concepting at speed.
Avoid for: final delivery — slight quality drop vs 9B.

**FLUX 2 Pro / Dev / Flash / Turbo / Max** — `blackforestlabs/flux-2/max`, [flux-2-dev](https://www.runcomfy.com/models/blackforestlabs/flux-2-dev/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation), [flux-2-flash](https://www.runcomfy.com/models/blackforestlabs/flux-2-flash?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation), [flux-2-turbo](https://www.runcomfy.com/models/blackforestlabs/flux-2-turbo?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Higher-fidelity tiers of the FLUX 2 base. Cinematic + brand work, hero shots.
Pick for: production polish, brand campaigns.
Avoid for: sub-second speed — use **Klein 4B**.

**Nano Banana Pro** — [google/nano-banana-pro/text-to-image](https://www.runcomfy.com/models/google/nano-banana-pro/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Highest-quality Nano Banana tier. Gemini-grounded, optional web search for real-world references (products, landmarks).
Pick for: NB-style instruction-following at higher fidelity.
Avoid for: cost-sensitive iteration — drop to **Nano Banana 2**.

**Nano Banana 2** — `google/nano-banana-2/text-to-image`

Flash-tier latency, predictable framing, `enable_web_search` flag for real-product / real-person grounding.
Pick for: speed iteration, 4-up batch, real-world grounded prompts.
Avoid for: long compositional instructions — use **GPT Image 2**.

**GPT Image 2** — `openai/gpt-image-2/text-to-image`

Best-in-class in-image text rendering (Japanese kana, Cyrillic, Arabic). Layout-precise instruction following.
Pick for: posters, ads, multi-line copy, multilingual creatives, exact-text headlines.
Avoid for: photoreal portraits — **Seedream 5** wins on skin tones and lighting.

**Seedream 5 Lite** — [bytedance/seedream-5/lite/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-5/lite/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Latest ByteDance Seedream tier. Photoreal skin tones, natural lighting, strong East Asian aesthetic.
Pick for: photoreal portraits, product shots, fashion / lifestyle.
Avoid for: typography precision — use **GPT Image 2**.

**Seedream 4-5** — [bytedance/seedream-4-5/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-4-5/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Previous Seedream flagship, still strong on photoreal.
Pick for: identity-stable batches between Seedream-5 generations; cheaper Seedream tier.
Avoid for: new work — prefer **Seedream 5 Lite**.

**Dreamina 4-0** — [bytedance/dreamina-4-0/text-to-image](https://www.runcomfy.com/models/bytedance/dreamina-4-0/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

ByteDance illustration / concept-art lean, stylized characters.
Pick for: concept art, illustrated heroes, painterly assets.
Avoid for: photoreal — use **Seedream**.

**Qwen Image 2512** — [qwen/qwen-image/qwen-image-2512](https://www.runcomfy.com/models/qwen/qwen-image/qwen-image-2512?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Latest Alibaba Qwen image model; open weights, LoRA-compatible (`/lora` variant).
Pick for: open-weights workflow, Qwen-aligned LoRA chains.
Avoid for: closed-weights polish — use **FLUX 2** or **GPT Image 2**.

**Wan 2-7** — [wan-ai/wan-2-7/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation), [wan-ai/wan-2-7/pro/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/pro/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Open-weights, pairs natively with Wan 2-7 video models for unified-stack workflows.
Pick for: Wan-stack pipelines (image + video same brand), open-weights requirement.
Avoid for: top-tier image-only quality.

**Z-Image Turbo** — [tongyi-mai/z-image/turbo](https://www.runcomfy.com/models/tongyi-mai/z-image/turbo?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Sub-second open-weights, native LoRA `/lora` variant.
Pick for: LoRA-customized open-weights workflow at speed.
Avoid for: closed-weights polish.

### Image-to-image / edit (i2i) — newest first

**Nano Banana Pro Edit** — [google/nano-banana-pro/edit](https://www.runcomfy.com/models/google/nano-banana-pro/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Highest-quality Nano Banana edit tier. Identity-preserving, multi-ref.
Pick for: premium NB edit work, identity-locked variants.
Avoid for: cost-sensitive iteration — drop to **Nano Banana 2 Edit**.

**Nano Banana 2 Edit** — `google/nano-banana-2/edit` (default i2i)

1–20 input images per call, identity-preserving by default, spatial-language honored ("upper-right", "the left object").
Pick for: default i2i, batch identity-preserving, background swap, directional object remove/add.
Avoid for: precise mask region — use the [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill (Z-Image Inpaint).

**GPT Image 2 Edit** — `openai/gpt-image-2/edit`

Up to 10 reference images, multilingual in-image text rewrite, layout-precise repositioning.
Pick for: multilingual headline swap, multi-ref composition, layout repositioning, brand-locked identity across translations.
Avoid for: mask-driven inpainting — use the [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill.

**Seedream 5 Lite Edit** — [bytedance/seedream-5/lite/edit](https://www.runcomfy.com/models/bytedance/seedream-5/lite/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Latest Seedream edit tier, photoreal preservation.
Pick for: photoreal edits that started from a Seedream t2i (identity holds across the pair).
Avoid for: multilingual text rewrite.

**Seedream 4-5 Edit** — [bytedance/seedream-4-5/edit](https://www.runcomfy.com/models/bytedance/seedream-4-5/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Previous Seedream edit.
Pick for: identity-stable batches between 4-5 generations.
Avoid for: new work — prefer **Seedream 5 Lite Edit**.

**Dreamina 4-0 Edit** — [bytedance/dreamina-4-0/edit](https://www.runcomfy.com/models/bytedance/dreamina-4-0/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

ByteDance illustration edit.
Pick for: editing a Dreamina-generated illustration.
Avoid for: photoreal subjects.

**Qwen Image Edit 2511** — [qwen/qwen-image/qwen-image-edit-2511](https://www.runcomfy.com/models/qwen/qwen-image/qwen-image-edit-2511?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Alibaba open-weights edit.
Pick for: open-weights edit pipeline.
Avoid for: closed-weights polish.

**Wan 2.6 i2i** — [wan-ai/wan-v2.6/image-to-image](https://www.runcomfy.com/models/wan-ai/wan-v2.6/image-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

Wan ecosystem image-to-image.
Pick for: Wan-stack pipeline integration.
Avoid for: new work — older generation; prefer **Nano Banana 2 Edit** or **GPT Image 2 Edit**.

**FLUX Kontext Pro** — `blackforestlabs/flux-1-kontext/pro/edit`

Single reference image, single instruction; highest preservation fidelity ("keep everything except X").
Pick for: single-image precise local edit ("change only her umbrella to orange").
Avoid for: batch work, multi-ref composition, mask-driven inpainting.

**Need mask-driven inpainting, controlled outpainting, or the full edit treatment?** → use the [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill.
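The routing guidance above can be condensed into a lookup an agent consults before calling the CLI. This is a minimal sketch: `pick_model` and its intent keywords are hypothetical (they are not part of the `runcomfy` CLI); only the endpoint IDs come from the model list above. Anything unmatched falls back to the FLUX 2 Klein 9B default.

```bash
# Hypothetical helper: map a coarse user intent to a RunComfy endpoint ID.
# The intent keywords are illustrative; the endpoint IDs come from the model list.
pick_model() {
  case "$1" in
    text|poster|typography)      echo "openai/gpt-image-2/text-to-image" ;;
    photoreal|portrait|product)  echo "bytedance/seedream-5/lite/text-to-image" ;;
    fast|concept|storyboard)     echo "blackforestlabs/flux-2-klein/4b/text-to-image" ;;
    grounded|real-world)         echo "google/nano-banana-2/text-to-image" ;;
    illustration|concept-art)    echo "bytedance/dreamina-4-0/text-to-image" ;;
    edit)                        echo "google/nano-banana-2/edit" ;;
    edit-text|edit-multilingual) echo "openai/gpt-image-2/edit" ;;
    edit-precise)                echo "blackforestlabs/flux-1-kontext/pro/edit" ;;
    *)                           echo "blackforestlabs/flux-2-klein/9b/text-to-image" ;;  # default t2i
  esac
}

# Usage (not executed here):
#   runcomfy run "$(pick_model poster)" --input '{"prompt": "..."}' --output-dir ./out
```

A `case` table keeps the mapping in one place, so updating a route when a newer model ships is a one-line change.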

## t2i Route 1: FLUX 2 Klein — default

**Models**: `blackforestlabs/flux-2-klein/9b/text-to-image` (default), `blackforestlabs/flux-2-klein/4b/text-to-image` (sub-second)
**Catalog**: [9B](https://www.runcomfy.com/models/blackforestlabs/flux-2-klein/9b/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [4B](https://www.runcomfy.com/models/blackforestlabs/flux-2-klein/4b/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

### Schema (both variants)

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Up to ~512 tokens; longer degrades. Subject-first declarative |
| `steps` | int | no | 25 (9B) / 4 (4B) | Step-distilled; 4–8 enough for ideation, ~25 for polish, >25 buys little |
| `width` | int | no | 1024 | 512–1536 typical, max ~2K total. Aspect cap 16:9 |
| `height` | int | no | 1024 | Match width's aspect intent |

The same endpoint accepts up to **4 reference images** for style transfer or guided composition; the field name is documented on the [model page](https://www.runcomfy.com/models/blackforestlabs/flux-2-klein/9b/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation).

### Invoke

**Polish / final (9B):**

```bash
runcomfy run blackforestlabs/flux-2-klein/9b/text-to-image \
  --input '{
    "prompt": "A small purple cat sitting on a moss-covered stone, golden hour rim light, shallow depth of field, photoreal",
    "steps": 25,
    "width": 1536,
    "height": 864
  }' \
  --output-dir ./out
```


**Sub-second concepting (4B):**

```bash
runcomfy run blackforestlabs/flux-2-klein/4b/text-to-image \
  --input '{"prompt": "A small purple cat at sunset, photoreal"}' \
  --output-dir ./out
```


### Prompting tips

- **Subject first, scene second, modifiers last.** "A small purple cat … on a moss stone … golden hour, shallow DoF."

- **Step strategy**: 4–8 for ideation, ~25 for polish. Don't crank past 28 — diminishing returns.

- **9B vs 4B**: default 9B; drop to 4B only when you need sub-second batch concepting.

- **Multi-ref**: 1–4 reference URLs; describe roles in prompt (`"subject from ref 1, palette from ref 2"`).
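The step and variant advice above amounts to a draft-then-polish loop: concept on 4B at a few steps, then re-render the keeper on 9B at ~25. A minimal sketch, assuming a POSIX shell with `sed`: `klein_input` and `klein_run` are hypothetical helpers, and `RUNCOMFY` is an override hook of this sketch (not a CLI feature) so the commands can be dry-run with `RUNCOMFY=echo`. Reference images are left out because their field name is only documented on the model page.

```bash
# Override hook for this sketch only: defaults to the real CLI.
RUNCOMFY="${RUNCOMFY:-runcomfy}"

# Escape backslashes and double quotes so text is valid inside a JSON string.
# Literal newlines are not handled; keep prompts on one line.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# klein_input PROMPT STEPS WIDTH HEIGHT -> prints the --input JSON
klein_input() {
  printf '{"prompt": "%s", "steps": %d, "width": %d, "height": %d}' \
    "$(json_escape "$1")" "$2" "$3" "$4"
}

# klein_run VARIANT PROMPT STEPS WIDTH HEIGHT OUTDIR
# VARIANT must be 4b (fast concepting) or 9b (polish).
klein_run() {
  case "$1" in
    4b|9b) ;;
    *) echo "variant must be 4b or 9b, got: $1" >&2; return 1 ;;
  esac
  "$RUNCOMFY" run "blackforestlabs/flux-2-klein/$1/text-to-image" \
    --input "$(klein_input "$2" "$3" "$4" "$5")" \
    --output-dir "$6"
}

# Usage (not executed here):
#   klein_run 4b "A small purple cat at sunset, photoreal" 4 1024 1024 ./out/concepts
#   klein_run 9b "A small purple cat at sunset, photoreal" 25 1536 864 ./out/final
```

Keeping the prompt identical across both stages isolates the variant and step count as the only things that change between draft and final.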

## t2i Route 2: GPT Image 2 — typography & in-image text

**Model**: `openai/gpt-image-2/text-to-image`
**Catalog**: [runcomfy.com/models/openai/gpt-image-2](https://www.runcomfy.com/models/openai/gpt-image-2/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

### Schema

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Quote in-image text exactly with `"…"` |
| `size` | enum | no | `1024_1024` | `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape) — **only these three** |

### Invoke

**Logo / poster with exact headline:**

```bash
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{
    "prompt": "Minimal product poster. Centered bold headline reads exactly \"AURORA — Spring 2026\" in clean white sans-serif on a deep navy background. Below the headline a small line in monospace reads \"runs on water\". 3:2 layout.",
    "size": "1536_1024"
  }' \
  --output-dir ./out
```


**Multilingual:**

```bash
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{
    "prompt": "Japanese magazine cover. Vertical headline reads exactly \"今日のおすすめ\" in bold Japanese kana, right-edge alignment, photoreal portrait of a woman in a kimono.",
    "size": "1024_1536"
  }' \
  --output-dir ./out
```


### Prompting tips

- **Quote in-image text exactly.** `"the sign reads exactly 'CLOSED'"` — without the literal quote the model paraphrases.

- **Name the script for non-Latin text**: `"Japanese kana"`, `"Cyrillic"`, `"Arabic right-to-left"`. Without this it falls back to romanization.

- **Layout language honored**: `"top-left"`, `"centered"`, `"two-line stacked"`, `"baseline aligned"`.

- **Only 3 sizes.** Don't pass arbitrary widths.
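Because exact in-image text must be wrapped in `"…"` inside a JSON string, hand-escaping quotes is error-prone, and a stray size silently wastes a call. A minimal sketch, assuming a POSIX shell with `sed`: the hypothetical `gpt_image2_input` helper escapes the prompt and rejects any `size` other than the three accepted values before anything is sent.

```bash
# Escape backslashes and double quotes for embedding in a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# gpt_image2_input PROMPT [SIZE] -> prints the --input JSON, or fails on a bad size.
gpt_image2_input() {
  size=${2:-1024_1024}   # endpoint default
  case "$size" in
    1024_1024|1024_1536|1536_1024) ;;
    *) echo "unsupported size: $size (use 1024_1024, 1024_1536 or 1536_1024)" >&2
       return 1 ;;
  esac
  printf '{"prompt": "%s", "size": "%s"}' "$(json_escape "$1")" "$size"
}

# Usage (not executed here):
#   runcomfy run openai/gpt-image-2/text-to-image \
#     --input "$(gpt_image2_input 'Poster headline reads exactly "AURORA"' 1536_1024)" \
#     --output-dir ./out
```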

## t2i Route 3: Nano Banana 2 — speed iteration

**Model**: `google/nano-banana-2/text-to-image`
**Catalog**: [runcomfy.com/models/google/nano-banana-2](https://www.runcomfy.com/models/google/nano-banana-2?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [nano-banana collection](https://www.runcomfy.com/models/collections/nano-banana?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

### Schema

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Subject-first description |
| `num_images` | int | no | 1 | 1–4. Use 4 for ideation rounds |
| `seed` | int | no | 0 | Reuse for reproducibility |
| `aspect_ratio` | enum | no | `auto` | `auto`, `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` |
| `resolution` | enum | no | `1K` | `0.5K` (drafts), `1K` (default), `2K` (final), `4K` (max) |
| `output_format` | enum | no | `png` | `png`, `jpeg`, `webp` |
| `safety_tolerance` | int | no | 4 | 1 (strict) – 6 (permissive) |
| `enable_web_search` | bool | no | false | Adds web grounding (extra cost + latency) |

### Invoke

**Default draft:**

```bash
runcomfy run google/nano-banana-2/text-to-image \
  --input '{"prompt": "A coffee mug on marble counter, top-down warm morning light"}' \
  --output-dir ./out
```


**4-up batch for ideation:**

```bash
# num_images returns four separate candidates; describe a single shot in the prompt.
runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "Product photo of a ceramic coffee mug on a marble counter, warm morning light, top-down angle, minimal styling",
    "num_images": 4,
    "aspect_ratio": "1:1",
    "resolution": "0.5K"
  }' \
  --output-dir ./out
```


### Prompting tips

- **Subject-first declarative.** "A coffee mug on marble" beats "Generate a creative shot of a mug".

- **`enable_web_search: true`** when the prompt names a real product, place, or person whose appearance must match reality (logos, landmarks).

- **Drop to `0.5K` for ideation, jump to `2K`+ only for finals** — `4K` ~16× the cost of `0.5K`.
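The draft-then-final advice can be scripted so the enum fields are checked locally before spending a call. A minimal sketch, assuming a POSIX shell with `sed`: `nb2_input` is a hypothetical helper that validates `num_images`, `aspect_ratio`, and `resolution` against the schema table and pins a `seed`. Whether the same seed reproduces the same composition at a different resolution is not guaranteed by the schema; treat it as a best effort.

```bash
# Escape backslashes and double quotes for embedding in a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# nb2_input PROMPT NUM_IMAGES ASPECT_RATIO RESOLUTION SEED -> prints the --input JSON
nb2_input() {
  case "$2" in 1|2|3|4) ;; *) echo "num_images must be 1-4" >&2; return 1 ;; esac
  case "$3" in
    auto|21:9|16:9|3:2|4:3|5:4|1:1|4:5|3:4|2:3|9:16) ;;
    *) echo "unsupported aspect_ratio: $3" >&2; return 1 ;;
  esac
  case "$4" in 0.5K|1K|2K|4K) ;; *) echo "unsupported resolution: $4" >&2; return 1 ;; esac
  printf '{"prompt": "%s", "num_images": %d, "aspect_ratio": "%s", "resolution": "%s", "seed": %d}' \
    "$(json_escape "$1")" "$2" "$3" "$4" "$5"
}

# Usage (not executed here): cheap 4-up drafts, then one final with the same seed.
#   P="A ceramic coffee mug on a marble counter, warm morning light, top-down"
#   runcomfy run google/nano-banana-2/text-to-image \
#     --input "$(nb2_input "$P" 4 1:1 0.5K 42)" --output-dir ./out/drafts
#   runcomfy run google/nano-banana-2/text-to-image \
#     --input "$(nb2_input "$P" 1 1:1 2K 42)" --output-dir ./out/final
```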

## t2i Route 4: Seedream 5 / 4-5 — photoreal flagship

**Models**: [bytedance/seedream-5/lite/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-5/lite/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [bytedance/seedream-4-5/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-4-5/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
**Collection**: [seedream](https://www.runcomfy.com/models/collections/seedream?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)

### Invoke

```bash
runcomfy run bytedance/seedream-5/lite/text-to-image \
  --input '{"prompt": "85mm portrait of a woman by a window, soft natural light, shallow depth of field, photoreal"}' \
  --output-dir ./out
```


The full field schema is on each model page; pass those fields through `--input` verbatim.

### When to pick Seedream

- **Photoreal portraits / product** — realistic skin tones and natural lighting

- **East Asian aesthetic / fashion** — strong on these subject categories

- **Cinematic frames** — picks up lens and lighting language well

- **vs FLUX 2**: Seedream skews more photoreal; FLUX skews more design/illustration

## t2i Route 5: Open-weights & specialty models

For workflows that want open-weights / LoRA support, or alternative aesthetics:

| Model | Endpoint | When |
|---|---|---|
| [wan-ai/wan-2-7/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `wan-ai/wan-2-7/text-to-image` | Wan ecosystem; pair with Wan 2-7 video models |
| [wan-ai/wan-2-7/pro/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/pro/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `wan-ai/wan-2-7/pro/text-to-image` | Wan Pro tier |
| [tongyi-mai/z-image/turbo](https://www.runcomfy.com/models/tongyi-mai/z-image/turbo?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `tongyi-mai/z-image/turbo` | Sub-second, supports LoRA via `/lora` endpoint |
| [qwen/qwen-image/qwen-image-2512](https://www.runcomfy.com/models/qwen/qwen-image/qwen-image-2512?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `qwen/qwen-image/qwen-image-2512` | Qwen Image, open-weights, also has `/lora` variant |
| [bytedance/dreamina-4-0/text-to-image](https://www.runcomfy.com/models/bytedance/dreamina-4-0/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `bytedance/dreamina-4-0/text-to-image` | Illustration / concept art lean |

Schemas live on each model page — pass field set through the CLI verbatim.

## i2i — image-to-image / edit (compact)

For one-shot edits, this skill ships three core routes; for the full edit treatment (mask-driven inpainting, batch-edit, all the side schemas), use the dedicated [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill.

### i2i Route A: Nano Banana 2 Edit — default

```bash
runcomfy run google/nano-banana-2/edit \
  --input '{
    "prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",
    "image_urls": ["https://.../portrait.jpg"]
  }' \
  --output-dir ./out
```


Schema: `prompt`, `image_urls` (1–20), `number_of_images` (1–4), `aspect_ratio` (`auto` default), `resolution`, `output_format`, `seed`, `enable_web_search`. Lead the prompt with preservation goals, end with the change.
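When the number of input images varies per call, building the `image_urls` array by hand gets fiddly. A minimal sketch, assuming a POSIX shell with `sed`: the hypothetical `url_array` helper turns its arguments into a JSON array and enforces the 1–20 limit, and `nb2_edit_input` wraps it with a preservation-first prompt. Other schema fields (`number_of_images`, `aspect_ratio`, and so on) are omitted for brevity.

```bash
# Escape backslashes and double quotes for embedding in a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# url_array URL... -> prints a JSON array of strings; fails outside 1-20 URLs.
url_array() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 20 ]; then
    echo "need 1-20 image URLs, got $#" >&2
    return 1
  fi
  out=""
  for u in "$@"; do
    # ${out:+, } adds a separator only after the first element.
    out="$out${out:+, }\"$(json_escape "$u")\""
  done
  printf '[%s]' "$out"
}

# nb2_edit_input PROMPT URL... -> prints the --input JSON for google/nano-banana-2/edit
nb2_edit_input() {
  prompt=$1
  shift
  urls=$(url_array "$@") || return 1
  printf '{"prompt": "%s", "image_urls": %s}' "$(json_escape "$prompt")" "$urls"
}

# Usage (not executed here):
#   runcomfy run google/nano-banana-2/edit \
#     --input "$(nb2_edit_input 'Keep identity and pose unchanged. Swap the background to a beach.' \
#                "https://.../portrait.jpg")" \
#     --output-dir ./out
```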

### i2i Route B: GPT Image 2 Edit — multilingual + multi-ref

```bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Keep the photo and layout exactly as in the input. Replace only the headline with \"今日のおすすめ\" in bold Japanese kana.",
    "images": ["https://.../poster-en.jpg"],
    "size": "auto"
  }' \
  --output-dir ./out
```


Schema: `prompt`, `images` (up to 10 HTTPS refs; image 1 is primary), `size` (`auto` / `1024_1024` / `1024_1536` / `1536_1024`). `size: "auto"` preserves input ratio.

### i2i Route C: FLUX Kontext Pro — single-shot precise

```bash
runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
  --input '{
    "prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
    "image": "https://.../portrait.jpg"
  }' \
  --output-dir ./out
```
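The `'\''` sequence above is needed because the JSON sits inside a single-quoted shell string. An alternative is to keep the request in a file written with a quoted heredoc, where apostrophes need no escaping. A minimal sketch: `run_with_file` is a hypothetical helper, and `RUNCOMFY` is an override hook of this sketch (not a CLI feature) for dry runs with `RUNCOMFY=echo`; it only relies on `--input` accepting the JSON as a string, as in the examples above.

```bash
# Override hook for this sketch only: defaults to the real CLI.
RUNCOMFY="${RUNCOMFY:-runcomfy}"

# run_with_file MODEL JSON_FILE OUTDIR: pass the file's contents as --input.
run_with_file() {
  if [ ! -r "$2" ]; then
    echo "cannot read input file: $2" >&2
    return 1
  fi
  "$RUNCOMFY" run "$1" --input "$(cat "$2")" --output-dir "$3"
}

# Usage (not executed here). The quoted 'EOF' disables shell expansion,
# so the apostrophe in "person's" is written as-is:
#   cat > kontext.json <<'EOF'
#   {
#     "prompt": "Keep the person's face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
#     "image": "https://.../portrait.jpg"
#   }
#   EOF
#   run_with_file blackforestlabs/flux-1-kontext/pro/edit kontext.json ./out
```

Keeping requests in files also makes them easy to diff and re-run when iterating on a single edit.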
