SKILL.md
3. Generate
```bash
runcomfy run // \
  --input '{"prompt": "..."}' \
  --output-dir ./out
```
CLI docs: [Install](https://docs.runcomfy.com/cli/install?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Quickstart](https://docs.runcomfy.com/cli/quickstart?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Commands](https://docs.runcomfy.com/cli/commands?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Auth](https://docs.runcomfy.com/cli/auth?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [Troubleshooting](https://docs.runcomfy.com/cli/troubleshooting?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
## Install this skill
```bash
npx skills add agentspace-so/runcomfy-agent-skills --skill ai-image-generation -g
```
## Pick the right model for the user's intent
### Text-to-image (t2i) — newest first
**FLUX 2 Klein 9B** — `blackforestlabs/flux-2-klein/9b/text-to-image` (default)
Step-distilled, 4–25 steps, native multi-reference conditioning, strong photoreal + illustration all-rounder.
Pick for: intent unclear, fast iteration, multi-ref styling, general-purpose.
Avoid for: in-image text — use **GPT Image 2**.
**FLUX 2 Klein 4B** — `blackforestlabs/flux-2-klein/4b/text-to-image`
Sub-second variant of Klein 9B, same field set.
Pick for: storyboard, moodboard, batch concepting at speed.
Avoid for: final delivery — slight quality drop vs 9B.
**FLUX 2 Pro / Dev / Flash / Turbo / Max** — `blackforestlabs/flux-2/max`, [flux-2-dev](https://www.runcomfy.com/models/blackforestlabs/flux-2-dev/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation), [flux-2-flash](https://www.runcomfy.com/models/blackforestlabs/flux-2-flash?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation), [flux-2-turbo](https://www.runcomfy.com/models/blackforestlabs/flux-2-turbo?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Higher-fidelity tiers of the FLUX 2 base. Cinematic + brand work, hero shots.
Pick for: production polish, brand campaigns.
Avoid for: sub-second speed — use **Klein 4B**.
**Nano Banana Pro** — [google/nano-banana-pro/text-to-image](https://www.runcomfy.com/models/google/nano-banana-pro/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Highest-quality Nano Banana tier. Gemini-grounded, optional web search for real-world references (products, landmarks).
Pick for: NB-style instruction-following at higher fidelity.
Avoid for: cost-sensitive iteration — drop to **Nano Banana 2**.
**Nano Banana 2** — `google/nano-banana-2/text-to-image`
Flash-tier latency, predictable framing, `enable_web_search` flag for real-product / real-person grounding.
Pick for: speed iteration, 4-up batch, real-world grounded prompts.
Avoid for: long compositional instructions — use **GPT Image 2**.
**GPT Image 2** — `openai/gpt-image-2/text-to-image`
Best-in-class in-image text rendering (Japanese kana, Cyrillic, Arabic). Layout-precise instruction following.
Pick for: posters, ads, multi-line copy, multilingual creatives, exact-text headlines.
Avoid for: photoreal portraits — **Seedream 5** wins on skin tones and lighting.
**Seedream 5 Lite** — [bytedance/seedream-5/lite/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-5/lite/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Latest ByteDance Seedream tier. Photoreal skin tones, natural lighting, strong East Asian aesthetic.
Pick for: photoreal portraits, product shots, fashion / lifestyle.
Avoid for: typography precision — use **GPT Image 2**.
**Seedream 4-5** — [bytedance/seedream-4-5/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-4-5/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Previous Seedream flagship, still strong on photoreal.
Pick for: identity-stable batches between Seedream-5 generations; cheaper Seedream tier.
Avoid for: new work — prefer **Seedream 5 Lite**.
**Dreamina 4-0** — [bytedance/dreamina-4-0/text-to-image](https://www.runcomfy.com/models/bytedance/dreamina-4-0/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
ByteDance illustration / concept-art lean, stylized characters.
Pick for: concept art, illustrated heroes, painterly assets.
Avoid for: photoreal — use **Seedream**.
**Qwen Image 2512** — [qwen/qwen-image/qwen-image-2512](https://www.runcomfy.com/models/qwen/qwen-image/qwen-image-2512?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Alibaba Qwen latest, open-weights, LoRA-compatible (`/lora` variant).
Pick for: open-weights workflow, Qwen-aligned LoRA chains.
Avoid for: closed-weights polish — use **FLUX 2** or **GPT Image 2**.
**Wan 2-7** — [wan-ai/wan-2-7/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation), [wan-ai/wan-2-7/pro/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/pro/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Open-weights, pairs natively with Wan 2-7 video models for unified-stack workflows.
Pick for: Wan-stack pipelines (image + video same brand), open-weights requirement.
Avoid for: top-tier image-only quality.
**Z-Image Turbo** — [tongyi-mai/z-image/turbo](https://www.runcomfy.com/models/tongyi-mai/z-image/turbo?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Sub-second open-weights, native LoRA `/lora` variant.
Pick for: LoRA-customized open-weights workflow at speed.
Avoid for: closed-weights polish.
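
The t2i guidance above can be read as a lookup from intent to model id. A minimal sketch, assuming a hypothetical `pick_t2i_model` helper and illustrative intent keywords (neither is part of the RunComfy CLI); the model ids are the ones listed in this document:

```bash
# Hypothetical helper: map a coarse intent keyword to a t2i model id.
# The keywords are illustrative assumptions; only the model ids come from this skill.
pick_t2i_model() {
  case "$1" in
    text|typography|poster)      echo "openai/gpt-image-2/text-to-image" ;;
    portrait|product|photoreal)  echo "bytedance/seedream-5/lite/text-to-image" ;;
    storyboard|moodboard|batch)  echo "blackforestlabs/flux-2-klein/4b/text-to-image" ;;
    grounded|iteration)          echo "google/nano-banana-2/text-to-image" ;;
    illustration|concept-art)    echo "bytedance/dreamina-4-0/text-to-image" ;;
    # Intent unclear: fall back to the documented default.
    *)                           echo "blackforestlabs/flux-2-klein/9b/text-to-image" ;;
  esac
}

# Possible usage (not executed here):
#   runcomfy run "$(pick_t2i_model poster)" --input '{"prompt": "..."}' --output-dir ./out
```
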
### Image-to-image / edit (i2i) — newest first
**Nano Banana Pro Edit** — [google/nano-banana-pro/edit](https://www.runcomfy.com/models/google/nano-banana-pro/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Highest-quality Nano Banana edit tier. Identity-preserving, multi-ref.
Pick for: premium NB edit work, identity-locked variants.
Avoid for: cost-sensitive iteration — drop to **Nano Banana 2 Edit**.
**Nano Banana 2 Edit** — `google/nano-banana-2/edit` (default i2i)
1–20 input images per call, identity-preserving by default, spatial-language honored ("upper-right", "the left object").
Pick for: default i2i, batch identity-preserving, background swap, directional object remove/add.
Avoid for: precise mask region — use the [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill (Z-Image Inpaint).
**GPT Image 2 Edit** — `openai/gpt-image-2/edit`
Up to 10 reference images, multilingual in-image text rewrite, layout-precise repositioning.
Pick for: multilingual headline swap, multi-ref composition, layout repositioning, brand-locked identity across translations.
Avoid for: mask-driven inpainting — use [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill.
**Seedream 5 Lite Edit** — [bytedance/seedream-5/lite/edit](https://www.runcomfy.com/models/bytedance/seedream-5/lite/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Latest Seedream edit tier, photoreal preservation.
Pick for: photoreal edits that started from a Seedream t2i (identity holds across the pair).
Avoid for: multilingual text rewrite.
**Seedream 4-5 Edit** — [bytedance/seedream-4-5/edit](https://www.runcomfy.com/models/bytedance/seedream-4-5/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Previous Seedream edit.
Pick for: identity-stable batches between 4-5 generations.
Avoid for: new work — prefer **Seedream 5 Lite Edit**.
**Dreamina 4-0 Edit** — [bytedance/dreamina-4-0/edit](https://www.runcomfy.com/models/bytedance/dreamina-4-0/edit?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
ByteDance illustration edit.
Pick for: editing a Dreamina-generated illustration.
Avoid for: photoreal subjects.
**Qwen Image Edit 2511** — [qwen/qwen-image/qwen-image-edit-2511](https://www.runcomfy.com/models/qwen/qwen-image/qwen-image-edit-2511?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Alibaba open-weights edit.
Pick for: open-weights edit pipeline.
Avoid for: closed-weights polish.
**Wan 2.6 i2i** — [wan-ai/wan-v2.6/image-to-image](https://www.runcomfy.com/models/wan-ai/wan-v2.6/image-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
Wan ecosystem image-to-image.
Pick for: Wan-stack pipeline integration.
Avoid for: new work — older generation; prefer NB or GPT Image 2.
**FLUX Kontext Pro** — `blackforestlabs/flux-1-kontext/pro/edit`
Single-ref single-instruction, highest preservation fidelity ("keep everything except X").
Pick for: single-image precise local edit ("change only her umbrella to orange").
Avoid for: batch work, multi-ref composition, mask-driven inpainting.
**Need mask-driven inpainting, controlled outpainting, or the full edit treatment?** → use the [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill.
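
The i2i choices reduce to a similar lookup, with mask-driven work handed off to the image-edit skill. A rough sketch, assuming a hypothetical `pick_i2i_model` helper and made-up intent keywords:

```bash
# Hypothetical helper: map an edit intent to an i2i model id from this skill.
# Mask-driven work is out of scope here, so the helper fails and says so on stderr.
pick_i2i_model() {
  case "$1" in
    multilingual|text-swap|layout) echo "openai/gpt-image-2/edit" ;;
    single-local)                  echo "blackforestlabs/flux-1-kontext/pro/edit" ;;
    mask|inpaint|outpaint)
      echo "mask-driven edit: use the image-edit skill instead" >&2
      return 1 ;;
    # Documented default i2i route.
    *)                             echo "google/nano-banana-2/edit" ;;
  esac
}
```
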
## t2i Route 1: FLUX 2 Klein — default
**Models**: `blackforestlabs/flux-2-klein/9b/text-to-image` (default), `blackforestlabs/flux-2-klein/4b/text-to-image` (sub-second)
**Catalog**: [9B](https://www.runcomfy.com/models/blackforestlabs/flux-2-klein/9b/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [4B](https://www.runcomfy.com/models/blackforestlabs/flux-2-klein/4b/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
### Schema (both variants)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Up to ~512 tokens; longer degrades. Subject-first declarative |
| `steps` | int | no | 25 (9B) / 4 (4B) | Step-distilled; 4–8 enough for ideation, ~25 for polish, >25 buys little |
| `width` | int | no | 1024 | 512–1536 typical, max ~2K total. Aspect cap 16:9 |
| `height` | int | no | 1024 | Match width's aspect intent |

Up to **4 reference images** supported on the same endpoint for style transfer / guided composition. Field name documented on the [model page](https://www.runcomfy.com/models/blackforestlabs/flux-2-klein/9b/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation).
### Invoke
**Polish / final (9B):**
```bash
runcomfy run blackforestlabs/flux-2-klein/9b/text-to-image \
  --input '{
    "prompt": "A small purple cat sitting on a moss-covered stone, golden hour rim light, shallow depth of field, photoreal",
    "steps": 25,
    "width": 1536,
    "height": 864
  }' \
  --output-dir ./out
```
**Sub-second concepting (4B):**
```bash
runcomfy run blackforestlabs/flux-2-klein/4b/text-to-image \
  --input '{"prompt": "A small purple cat at sunset, photoreal"}' \
  --output-dir ./out
```
### Prompting tips
- **Subject first, scene second, modifiers last.** "A small purple cat … on a moss stone … golden hour, shallow DoF."
- **Step strategy**: 4–8 for ideation, ~25 for polish. Going past 25 gives diminishing returns.
- **9B vs 4B**: default 9B; drop to 4B only when you need sub-second batch concepting.
- **Multi-ref**: 1–4 reference URLs; describe roles in prompt (`"subject from ref 1, palette from ref 2"`).
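
The step strategy and size limits can be checked before a call is made. A minimal sketch with two hypothetical helpers. It assumes that "aspect cap 16:9" means neither side may exceed 16/9 of the other, and it treats the "typical" 512–1536 range as a hard limit; the schema does not confirm either reading.

```bash
# Hypothetical helper: choose a step count for a stage, following the tips above.
klein_steps() {
  case "$1" in
    ideation) echo 4 ;;
    polish)   echo 25 ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Hypothetical pre-flight check on width/height (both readings are assumptions):
#   - each side within 512-1536
#   - aspect ratio no more extreme than 16:9 in either orientation
klein_check_size() {
  w=$1; h=$2
  [ "$w" -ge 512 ] && [ "$w" -le 1536 ] || return 1
  [ "$h" -ge 512 ] && [ "$h" -le 1536 ] || return 1
  # Integer form of w/h <= 16/9 and h/w <= 16/9, avoiding floating point.
  [ $((w * 9)) -le $((h * 16)) ] && [ $((h * 9)) -le $((w * 16)) ]
}
```

With these assumptions, `klein_check_size 1536 864` passes (exactly 16:9) while `klein_check_size 1536 512` fails (3:1).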
## t2i Route 2: GPT Image 2 — typography & in-image text
**Model**: `openai/gpt-image-2/text-to-image`
**Catalog**: [runcomfy.com/models/openai/gpt-image-2](https://www.runcomfy.com/models/openai/gpt-image-2/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
### Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Quote in-image text exactly with `"…"` |
| `size` | enum | no | `1024_1024` | `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape) — **only these three** |

### Invoke
**Logo / poster with exact headline:**
```bash
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{
    "prompt": "Minimal product poster. Centered bold headline reads exactly \"AURORA — Spring 2026\" in clean white sans-serif on a deep navy background. Below the headline a small line in monospace reads \"runs on water\". 3:2 layout.",
    "size": "1536_1024"
  }' \
  --output-dir ./out
```
**Multilingual:**
```bash
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{
    "prompt": "Japanese magazine cover. Vertical headline reads exactly \"今日のおすすめ\" in bold Japanese kana, right-edge alignment, photoreal portrait of a woman in a kimono.",
    "size": "1024_1536"
  }' \
  --output-dir ./out
```
### Prompting tips
- **Quote in-image text exactly.** `"the sign reads exactly 'CLOSED'"` — without the literal quote the model paraphrases.
- **Name the script for non-Latin text**: `"Japanese kana"`, `"Cyrillic"`, `"Arabic right-to-left"`. Without this it falls back to romanization.
- **Layout language honored**: `"top-left"`, `"centered"`, `"two-line stacked"`, `"baseline aligned"`.
- **Only 3 sizes.** Don't pass arbitrary widths.
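
Since only three `size` values are accepted, a caller holding arbitrary target dimensions needs to snap them to one of the three. A small sketch, assuming a hypothetical `gpt_image_size` helper that chooses by orientation only:

```bash
# Hypothetical helper: snap a desired width/height to one of the three accepted sizes.
# It compares orientation only; it does not try to find the closest aspect ratio.
gpt_image_size() {
  w=$1; h=$2
  if [ "$w" -eq "$h" ]; then
    echo "1024_1024"   # square
  elif [ "$w" -gt "$h" ]; then
    echo "1536_1024"   # landscape (3:2)
  else
    echo "1024_1536"   # portrait (2:3)
  fi
}
```

For example, a 1920×1080 request maps to `1536_1024`, so the output is 3:2 instead of 16:9 and may need cropping afterwards.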
## t2i Route 3: Nano Banana 2 — speed iteration
**Model**: `google/nano-banana-2/text-to-image`
**Catalog**: [runcomfy.com/models/google/nano-banana-2](https://www.runcomfy.com/models/google/nano-banana-2?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [nano-banana collection](https://www.runcomfy.com/models/collections/nano-banana?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
### Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Subject-first description |
| `num_images` | int | no | 1 | 1–4. Use 4 for ideation rounds |
| `seed` | int | no | 0 | Reuse for reproducibility |
| `aspect_ratio` | enum | no | `auto` | `auto`, `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` |
| `resolution` | enum | no | `1K` | `0.5K` (drafts), `1K` (default), `2K` (final), `4K` (max) |
| `output_format` | enum | no | `png` | `png`, `jpeg`, `webp` |
| `safety_tolerance` | int | no | 4 | 1 (strict) – 6 (permissive) |
| `enable_web_search` | bool | no | false | Adds web grounding (extra cost + latency) |

### Invoke
**Default draft:**
```bash
runcomfy run google/nano-banana-2/text-to-image \
  --input '{"prompt": "A coffee mug on marble counter, top-down warm morning light"}' \
  --output-dir ./out
```
**4-up batch for ideation:**
```bash
runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "Product photo of a ceramic coffee mug on a marble counter, warm morning light, top-down angle, minimal styling",
    "num_images": 4,
    "aspect_ratio": "1:1",
    "resolution": "0.5K"
  }' \
  --output-dir ./out
```
### Prompting tips
- **Subject-first declarative.** "A coffee mug on marble" beats "Generate a creative shot of a mug".
- **`enable_web_search: true`** when the prompt names a real product, place, or person whose appearance must match reality (logos, landmarks).
- **Drop to `0.5K` for ideation, jump to `2K`+ only for finals.** `4K` costs roughly 16× as much as `0.5K`.
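
The draft-versus-final advice can be folded into a small payload builder. A minimal sketch, assuming hypothetical `json_escape` and `nb2_payload` helpers; the field names and values come from the schema above. The escaping covers only backslashes and double quotes, so prompts containing newlines or control characters need a real JSON encoder.

```bash
# Hypothetical helper: escape backslashes and double quotes for use in a JSON string.
json_escape() { printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'; }

# Hypothetical helper: build a Nano Banana 2 payload for a given stage.
#   ideation -> 4 images at 0.5K (cheap 4-up round)
#   final    -> 1 image at 2K
nb2_payload() {
  stage=$1
  prompt=$(json_escape "$2")
  case "$stage" in
    ideation) n=4; res="0.5K" ;;
    final)    n=1; res="2K" ;;
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
  printf '{"prompt": "%s", "num_images": %s, "resolution": "%s"}\n' "$prompt" "$n" "$res"
}

# Possible usage (not executed here):
#   runcomfy run google/nano-banana-2/text-to-image \
#     --input "$(nb2_payload ideation 'A coffee mug on marble')" --output-dir ./out
```
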
## t2i Route 4: Seedream 5 / 4-5 — photoreal flagship
**Models**: [bytedance/seedream-5/lite/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-5/lite/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) · [bytedance/seedream-4-5/text-to-image](https://www.runcomfy.com/models/bytedance/seedream-4-5/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
**Collection**: [seedream](https://www.runcomfy.com/models/collections/seedream?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation)
### Invoke
```bash
runcomfy run bytedance/seedream-5/lite/text-to-image \
  --input '{"prompt": "85mm portrait of a woman by a window, soft natural light, shallow depth of field, photoreal"}' \
  --output-dir ./out
```
The field schema is on the model page; pass those fields to `--input` verbatim.
### When to pick Seedream
- **Photoreal portraits / product** — realistic skin tones and natural lighting
- **East Asian aesthetic / fashion** — strong on these subject categories
- **Cinematic frames** — picks up lens and lighting language well
- **vs FLUX 2**: Seedream skews more photoreal; FLUX skews more design/illustration
## t2i Route 5: Open-weights & specialty models
For workflows that want open-weights / LoRA support, or alternative aesthetics:
| Model | Endpoint | When |
|---|---|---|
| [wan-ai/wan-2-7/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `wan-ai/wan-2-7/text-to-image` | Wan ecosystem; pair with Wan 2-7 video models |
| [wan-ai/wan-2-7/pro/text-to-image](https://www.runcomfy.com/models/wan-ai/wan-2-7/pro/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `wan-ai/wan-2-7/pro/text-to-image` | Wan Pro tier |
| [tongyi-mai/z-image/turbo](https://www.runcomfy.com/models/tongyi-mai/z-image/turbo?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `tongyi-mai/z-image/turbo` | Sub-second, supports LoRA via `/lora` endpoint |
| [qwen/qwen-image/qwen-image-2512](https://www.runcomfy.com/models/qwen/qwen-image/qwen-image-2512?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `qwen/qwen-image/qwen-image-2512` | Qwen Image, open-weights, also has `/lora` variant |
| [bytedance/dreamina-4-0/text-to-image](https://www.runcomfy.com/models/bytedance/dreamina-4-0/text-to-image?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-image-generation) | `bytedance/dreamina-4-0/text-to-image` | Illustration / concept art lean |

Schemas live on each model page; pass each model's fields to `--input` verbatim.
## i2i — image-to-image / edit (compact)
For one-shot edits, this skill ships three core routes; for the full edit treatment (mask-driven inpainting, batch-edit, all the side schemas), use the dedicated [image-edit](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/image-edit) skill.
### i2i Route A: Nano Banana 2 Edit — default
```bash
runcomfy run google/nano-banana-2/edit \
  --input '{
    "prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",
    "image_urls": ["https://.../portrait.jpg"]
  }' \
  --output-dir ./out
```
Schema: `prompt`, `image_urls` (1–20), `number_of_images` (1–4), `aspect_ratio` (`auto` default), `resolution`, `output_format`, `seed`, `enable_web_search`. Lead the prompt with preservation goals, end with the change.
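
The 1–20 limit on `image_urls` can be enforced while the payload is assembled. A rough sketch, assuming hypothetical `json_escape` and `nb2_edit_payload` helpers; the `example.com` URLs in the usage comment are placeholders:

```bash
# Hypothetical helper: escape backslashes and double quotes for use in a JSON string.
json_escape() { printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'; }

# Hypothetical helper: first argument is the prompt, the rest are image URLs.
# Fails when the URL count falls outside the documented 1-20 range.
nb2_edit_payload() {
  prompt=$(json_escape "$1"); shift
  if [ "$#" -lt 1 ] || [ "$#" -gt 20 ]; then
    echo "image_urls needs 1-20 entries, got $#" >&2
    return 1
  fi
  urls=""
  for u in "$@"; do
    # Join entries with ", " and wrap each in double quotes.
    urls="${urls:+$urls, }\"$(json_escape "$u")\""
  done
  printf '{"prompt": "%s", "image_urls": [%s]}\n' "$prompt" "$urls"
}

# Possible usage (not executed here):
#   runcomfy run google/nano-banana-2/edit \
#     --input "$(nb2_edit_payload 'Keep the subject unchanged. Swap the background.' \
#                https://example.com/a.jpg)" --output-dir ./out
```
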
### i2i Route B: GPT Image 2 Edit — multilingual + multi-ref
```bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Keep the photo and layout exactly as in the input. Replace only the headline with \"今日のおすすめ\" in bold Japanese kana.",
    "images": ["https://.../poster-en.jpg"],
    "size": "auto"
  }' \
  --output-dir ./out
```
Schema: `prompt`, `images` (up to 10 HTTPS refs; image 1 is primary), `size` (`auto` / `1024_1024` / `1024_1536` / `1536_1024`). `size: "auto"` preserves input ratio.
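
The constraints on `images` (at most 10 references, HTTPS only, image 1 is primary) can be checked up front. A minimal sketch, assuming a hypothetical `gpt_edit_check_refs` helper that prints the primary reference on success:

```bash
# Hypothetical pre-flight check for GPT Image 2 Edit references.
# Succeeds only for 1-10 URLs that all start with https://, then prints the
# first URL, which the schema treats as the primary image.
gpt_edit_check_refs() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 10 ]; then
    echo "images needs 1-10 entries, got $#" >&2
    return 1
  fi
  for u in "$@"; do
    case "$u" in
      https://*) ;;
      *) echo "not an HTTPS URL: $u" >&2; return 1 ;;
    esac
  done
  echo "$1"
}
```

Because image 1 is primary, put the image to be edited first and the style or identity references after it.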
### i2i Route C: FLUX Kontext Pro — single-shot precise
```bash
runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
  --input '{
    "prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
    "image": "https://.../portrait.jpg"
  }' \
  --output-dir ./out
```