image-edit

>

INSTALLATION
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill image-edit
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Image Edit — Pro Pack on RunComfy

runcomfy.com · Nano Banana Edit · GPT Image 2 Edit · Flux Kontext · Z-Image Inpaint · GitHub

Image edit, intent-routed. This skill doesn't lock you to one model — it picks the right edit model in the RunComfy catalog based on what the user actually wants: batch identity-preservation, multilingual text rewrite, single-shot precise edit, or mask-driven region replacement.

npx skills add agentspace-so/runcomfy-skills --skill image-edit -g

Pick the right model for the user's intent

User intentModelWhy
Batch edit 1–20 images consistently (SKU gallery, A/B variants)Nano Banana EditUp to 20 input images per call; locked aspect/resolution for series
Swap background, preserve subject identityNano Banana EditStrong identity preservation under "keep X unchanged" prompts
Localized object removal / addition with spatial language ("the left object", "upper-right corner")Nano Banana EditHonors directional spatial scope
Multilingual / non-Latin in-image text rewrite (Japanese kana, Cyrillic, Arabic)GPT Image 2 EditStrongest in class for multilingual typography
Multi-reference composition (subject from img1, scene from img2, palette from img3)GPT Image 2 EditNumbered refs route cues correctly
Layout-precise repositioning ("move headline from top-right to bottom-center")GPT Image 2 EditDirectional language honored at layout level
Identity preservation across translated headline variantsGPT Image 2 EditSame source asset → many language variants, identity stable
Single-shot precise local edit ("she's now holding an orange umbrella")Flux Kontext ProSingle-ref single-instruction, high-fidelity preservation
Mask-driven object removal (cables, watermarks, distractions)Z-Image Turbo InpaintMask-required, strength-tunable, edge-consistent
Mask-driven region replacement (full background swap with mask)Z-Image Turbo InpaintHigh strength + clean mask = clean replacement
Default if unspecifiedNano Banana EditMost flexible, supports both single and batch

The agent reads this table, classifies the user's intent, and picks the matching subsection below.

Prerequisites

  • RunComfy CLInpm i -g @runcomfy/cli
  • RunComfy accountruncomfy login.
  • CI / containers — set RUNCOMFY_TOKEN=<token>.

Route 1: Nano Banana Edit — default for general edit + batch

Model: google/nano-banana-2/edit

Schema

FieldTypeRequiredDefaultNotes
promptstringyesLead with preservation goals, end with the change.
image_urlsarrayyes1–20 publicly-fetchable HTTPS URLs.
number_of_imagesintno11–4 outputs per call.
aspect_ratioenumnoautoauto follows input; lock for batch consistency.
resolutionenumno1K0.5K / 1K / 2K / 4K.
output_formatenumnopngpng / jpeg / webp.
seedintnoReproducibility.
enable_web_searchboolnofalseWeb-grounded edits (extra latency).

Invoke

runcomfy run google/nano-banana-2/edit \

  --input '{

    "prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",

    "image_urls": ["https://.../portrait.jpg"]

  }' \

  --output-dir <absolute/path>

Batch (lock aspect + resolution):

runcomfy run google/nano-banana-2/edit \

  --input '{

    "prompt": "Replace the watermark in the bottom-right with the text \"AURA\" in clean white sans-serif. Keep everything else exactly as in the input.",

    "image_urls": ["https://.../sku-1.jpg", "https://.../sku-2.jpg", "https://.../sku-3.jpg"],

    "aspect_ratio": "1:1",

    "resolution": "1K"

  }' \

  --output-dir <absolute/path>

Prompting tips

  • Preservation first: "Keep [identity / pose / brand / framing] unchanged." Then state the change.
  • Spatial scope: "background only", "the left object", "upper-right quadrant" — concrete locations honored.
  • Batch consistency: lock aspect_ratio and resolution across the batch.
  • Iterate small: split compound edits into multiple shorter passes.

Route 2: GPT Image 2 Edit — multilingual text + multi-ref composition

Model: openai/gpt-image-2/edit

Schema

FieldTypeRequiredDefaultNotes
promptstringyesEdit instruction; lead with preservation.
imagesstring[]yesUp to 10 HTTPS URLs. First is primary; rest are auxiliary.
sizeenumnoautoauto, 1024_1024, 1024_1536, 1536_1024. Only these.

Invoke

Multilingual text rewrite:

runcomfy run openai/gpt-image-2/edit \

  --input '{

    "prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight.",

    "images": ["https://.../poster-en.jpg"]

  }' \

  --output-dir <absolute/path>

Multi-ref composition:

runcomfy run openai/gpt-image-2/edit \

  --input '{

    "prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity unchanged.",

    "images": ["https://.../subject.jpg", "https://.../room.jpg"]

  }' \

  --output-dir <absolute/path>

Prompting tips

  • Quote in-image text exactly. Name the script for non-Latin: "Japanese kana", "Cyrillic", "Arabic right-to-left".
  • Number multi-refs: "subject from image 1, lighting from image 2".
  • Directional layout language: "move the headline from top-right to bottom-center", "replace the watermark in the bottom-right".
  • **size: "auto"** preserves input ratio — recommended unless the edit changes framing.

Route 3: Flux Kontext Pro — single-shot precise local edit

Model: blackforestlabs/flux-1-kontext/pro/edit

Schema (minimal)

FieldTypeRequiredNotes
promptstringyesOne declarative edit instruction.
imagestringyesSingle source image URL.
aspect_ratioenumnoPick from supported W:H values.
seedintnoReproducibility.

Single image only — no array. For multi-image flows, use Route 1 (Nano Banana Edit).

Invoke

runcomfy run blackforestlabs/flux-1-kontext/pro/edit \

  --input '{

    "prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",

    "image": "https://.../portrait.jpg"

  }' \

  --output-dir <absolute/path>

Prompting tips

  • One declarative instruction. "She is now holding an orange umbrella and smiling" — imperative, single change.
  • Preservation first. Lead with "Keep [unchanged elements]" then state the change.
  • Iterate small. Compound edits drift on a single pass; split into sequential passes.

Route 4: Z-Image Turbo Inpaint — mask-driven precise region edit

Model: tongyi-mai/z-image/turbo/inpainting

Schema

FieldTypeRequiredNotes
promptstringyesWhat to fill / replace; preservation constraints for the unmasked surround.
imagestringyesSource image URL.
mask_imagestringyesGrayscale mask URL (white = inpaint, black = preserve).
strengthfloatno0.3–0.6 retouching, 0.7–1.0 full replacement.
control_scalefloatno0.6–0.9 typical.
aspect_ratioenumnoW:H output ratio.
seedintnoReproducibility.

Invoke

Object removal (low strength):

runcomfy run tongyi-mai/z-image/turbo/inpainting \

  --input '{

    "prompt": "Remove overhead cables; preserve rooflines and sky gradient; thin clean sky.",

    "image": "https://.../street.jpg",

    "mask_image": "https://.../cables-mask.png",

    "strength": 0.5,

    "control_scale": 0.8

  }' \

  --output-dir <absolute/path>

Region replacement (high strength):

runcomfy run tongyi-mai/z-image/turbo/inpainting \

  --input '{

    "prompt": "Replace busy backdrop with smooth light gray studio paper; mask background only.",

    "image": "https://.../product.jpg",

    "mask_image": "https://.../bg-mask.png",

    "strength": 0.9

  }' \

  --output-dir <absolute/path>

Prompting tips

  • A mask URL is required — grayscale, white = inpaint region, black = preserve. Slight blur on mask edges (1–3px) blends better than sharp binary.
  • Strength by intent: 0.3–0.5 for retouching / cleanup, 0.6–0.7 for object replacement with style match, 0.8–1.0 for full-region replacement.
  • Name what stays outside the mask in the prompt: "preserve rooflines and sky gradient", "match brick pattern and mortar tone".
  • Spatial labels still help even though the mask defines the region: "the left shelf", "upper-right quadrant".

Limitations

  • Each route inherits its model's limits. Nano Banana: 1–20 inputs, 1–4 outputs. GPT Image 2 Edit: up to 10 refs, 4 fixed sizes. Flux Kontext: single ref. Z-Image Inpaint: mask required.
  • No multi-route blending. This skill picks one model per call.
  • Brand-specific overrides — if the user named a specific model, route to the corresponding brand skill (gpt-image-edit, flux-kontext, nano-banana-edit) for fuller treatment.

Exit codes

codemeaning
0success
64bad CLI args
65bad input JSON / schema mismatch
69upstream 5xx
75retryable: timeout / 429
77not signed in or token rejected

Full reference: docs.runcomfy.com/cli/troubleshooting.

How it works

The skill picks one of Nano Banana Edit / GPT Image 2 Edit / Flux Kontext Pro / Z-Image Turbo Inpaint based on user intent and invokes runcomfy run <model_id> with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

  • Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card