SKILL.md

🎨 VideoAgent Image Studio

Use when: User asks to generate, draw, create, or make any kind of image, photo, illustration, icon, logo, or artwork.

Generate images with 8 state-of-the-art AI models. This skill picks the model best suited to each request and handles the complexity, including Midjourney's async polling, so you can focus on the conversation.

Quick Reference

| User Intent | Model | Speed |
| --- | --- | --- |
| Artistic, cinematic, painterly | midjourney | ~15s |
| Photorealistic, portrait, product | flux-pro | ~8s |
| General purpose, balanced | flux-dev | ~10s |
| Quick draft, fast iteration | flux-schnell | ~2s |
| Image with text, logo, poster | ideogram | ~10s |
| Vector art, icon, flat design | recraft | ~8s |
| Anime, stylized illustration | sdxl | ~5s |
| Gemini-powered, consistent style | nano-banana | ~12s |
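The table above can be read as a simple lookup from request wording to model ID. The following is a minimal sketch of that idea, not the skill's actual selection logic: the function name `pickModel` and the keyword lists are assumptions made for illustration, and only the model IDs come from the table.

```javascript
// Hypothetical helper: maps words in a user request to a model ID.
// The keyword lists are illustrative assumptions; only the model IDs
// are taken from the Quick Reference table.
const MODEL_RULES = [
  { model: "ideogram", keywords: ["text", "logo", "poster"] },
  { model: "recraft", keywords: ["vector", "icon", "flat"] },
  { model: "sdxl", keywords: ["anime", "stylized"] },
  { model: "flux-schnell", keywords: ["draft", "quick", "fast"] },
  { model: "flux-pro", keywords: ["photorealistic", "portrait", "product"] },
  { model: "midjourney", keywords: ["artistic", "cinematic", "painterly"] },
];

function pickModel(request) {
  // Match whole words so that "context" does not trigger the "text" rule.
  const words = new Set(request.toLowerCase().split(/[^a-z]+/));
  for (const { model, keywords } of MODEL_RULES) {
    if (keywords.some((keyword) => words.has(keyword))) {
      return model;
    }
  }
  // No rule matched: fall back to the general-purpose model.
  return "flux-dev";
}
```

Rule order decides ties, so a request that mentions both "product" and "poster" resolves to `ideogram` here. A model the user names explicitly should take priority over any keyword match, and `nano-banana` is left out because the table gives no intent keywords for it beyond style consistency.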

How to Generate an Image

Step 1 — Enhance the prompt

Before calling the script, expand the user's prompt with style, lighting, and quality descriptors appropriate for the chosen model.

Midjourney: Add cinematic lighting, ultra detailed, --v 7, --style raw

Flux: Add masterpiece, highly detailed, sharp focus, professional photography

Ideogram: Be explicit about text content, font style, and layout

Recraft: Specify vector illustration, flat design, icon style
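The per-model hints above could be applied mechanically before the script is called. This sketch assumes a hypothetical helper named `enhancePrompt`; treating every model ID that starts with `flux` as part of the Flux family is also an assumption. Ideogram is left unchanged because its guidance is about wording the text content explicitly, which a fixed suffix cannot do.

```javascript
// Hypothetical helper: appends the descriptors listed in Step 1.
// The function name and the "flux" prefix check are assumptions.
function enhancePrompt(model, prompt) {
  const base = prompt.trim();
  if (model === "midjourney") {
    // Midjourney flags go at the end, after the descriptive text.
    return `${base}, cinematic lighting, ultra detailed --v 7 --style raw`;
  }
  if (model.startsWith("flux")) {
    return `${base}, masterpiece, highly detailed, sharp focus, professional photography`;
  }
  if (model === "recraft") {
    return `${base}, vector illustration, flat design, icon style`;
  }
  // Ideogram and other models: return the prompt as written.
  return base;
}
```

In practice the descriptors should be chosen to fit the request rather than appended blindly; the sketch only shows where each family's hints would go.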

Step 2 — Run the script

node {baseDir}/tools/generate.js \
  --model <model_id> \
  --prompt "<enhanced prompt>" \
  --aspect-ratio <ratio>

All parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| --model | flux-dev | Model ID from the table above |
| --prompt | (required) | The image generation prompt |
| --aspect-ratio | 1:1 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 21:9 |
| --num-images | 1 | Number of images (1–4; Midjourney always returns 4) |
| --negative-prompt | (none) | Things to avoid (not supported by Midjourney) |
| --seed | (none) | Seed for reproducibility |
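The parameter table can be turned into a small validation step that runs before the script is invoked. The sketch below assumes a hypothetical helper, `buildGenerateArgs`, that returns an argument array; the flags, defaults, and allowed values are the ones listed above, while the helper and its option names are illustrative.

```javascript
// Hypothetical helper: validates options against the parameter table
// and returns the CLI arguments for tools/generate.js.
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "21:9"];

function buildGenerateArgs({
  model = "flux-dev",
  prompt,
  aspectRatio = "1:1",
  numImages = 1,
  negativePrompt,
  seed,
}) {
  if (!prompt) {
    throw new Error("--prompt is required");
  }
  if (!ASPECT_RATIOS.includes(aspectRatio)) {
    throw new Error(`Unsupported aspect ratio: ${aspectRatio}`);
  }
  if (!Number.isInteger(numImages) || numImages < 1 || numImages > 4) {
    throw new Error("--num-images must be an integer from 1 to 4");
  }
  const args = [
    "--model", model,
    "--prompt", prompt,
    "--aspect-ratio", aspectRatio,
    "--num-images", String(numImages),
  ];
  // Assumption: silently drop the negative prompt for Midjourney,
  // since the table says it is not supported there.
  if (negativePrompt && model !== "midjourney") {
    args.push("--negative-prompt", negativePrompt);
  }
  if (seed !== undefined) {
    args.push("--seed", String(seed));
  }
  return args;
}
```

The returned array can be passed to Node's `child_process.execFile` as `["{baseDir}/tools/generate.js", ...args]`, which avoids shell quoting problems in prompts that contain quotes or dollar signs.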

Step 3 — Return the result

The script always waits and returns the final image URL(s). No polling required.

{
  "success": true,
  "model": "flux-pro",
  "imageUrl": "https://...",
  "images": ["https://..."]
}

Send the imageUrl to the user.
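Reading the result is a matter of parsing the JSON shown above. The following sketch assumes the script prints that object to stdout and that a failed run sets `success` to `false`; the failure shape is not documented here, so that branch is a guess. The helper name `extractImageUrls` is hypothetical.

```javascript
// Hypothetical helper: pulls the image URL(s) out of the script's JSON output.
// Assumes the { success, imageUrl, images } shape and that failures
// are reported as success: false.
function extractImageUrls(stdout) {
  const result = JSON.parse(stdout);
  if (!result.success) {
    throw new Error("Image generation failed");
  }
  // Prefer the full list; fall back to the single imageUrl field.
  const urls =
    Array.isArray(result.images) && result.images.length > 0
      ? result.images
      : [result.imageUrl];
  return urls.filter((url) => typeof url === "string" && url.length > 0);
}
```

For a Midjourney grid, `images` would be expected to hold all four URLs, so returning the whole list keeps that case and the single-image case on one code path.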

Midjourney Actions

After generating a 4-image grid with Midjourney, offer the user these options:

# Upscale image #2 (subtle, preserves details)
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action upscale \
  --index 2 \
  --job-id <job_id>

# Create a strong variation of image #3
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action variation \
  --index 3 \
  --job-id <job_id> \
  --variation-type 1

# Regenerate with same prompt
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action reroll \
  --job-id <job_id>

Upscale types: 0 = Subtle (default, best for photos), 1 = Creative (best for illustrations)

Variation types: 0 = Subtle (default), 1 = Strong (dramatic changes)
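The three follow-up commands share most of their flags, so they can be assembled by one function. This is a sketch under assumptions: `buildMidjourneyActionArgs` is a hypothetical helper, and it only uses flags that appear in the commands above. No flag is shown for the upscale type, so the sketch does not set one.

```javascript
// Hypothetical helper: builds the arguments for a Midjourney follow-up action.
// Only flags shown in the commands above are used.
function buildMidjourneyActionArgs({ action, jobId, index, variationType }) {
  if (!["upscale", "variation", "reroll"].includes(action)) {
    throw new Error(`Unknown action: ${action}`);
  }
  if (!jobId) {
    throw new Error("--job-id is required");
  }
  const args = ["--model", "midjourney", "--action", action];
  // Reroll regenerates the whole grid, so it takes no image index.
  if (action !== "reroll") {
    if (!Number.isInteger(index) || index < 1 || index > 4) {
      throw new Error("--index must be an integer from 1 to 4");
    }
    args.push("--index", String(index));
  }
  args.push("--job-id", jobId);
  if (action === "variation" && variationType !== undefined) {
    if (variationType !== 0 && variationType !== 1) {
      throw new Error("--variation-type must be 0 (subtle) or 1 (strong)");
    }
    args.push("--variation-type", String(variationType));
  }
  return args;
}
```

Validating the index and variation type up front gives the user a clear error before a request is sent, instead of a failure from the proxy.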

Example Conversations

User: "Draw a snow leopard on a snowy mountain with cinematic lighting"

# Choose midjourney for artistic quality
node {baseDir}/tools/generate.js \
  --model midjourney \
  --prompt "a majestic snow leopard on a snowy mountain peak, cinematic lighting, dramatic atmosphere, ultra detailed --ar 16:9 --v 7" \
  --aspect-ratio 16:9

🎨 Done! Which one to upscale? (U1-U4) Or create a variant? (V1-V4)
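A reply such as "U2" or "V3" to the prompt above has to be translated into an action and an index. The sketch below shows one way to do that; the helper name `parseGridChoice` is hypothetical, and the U/V shorthand is taken from the reply line above.

```javascript
// Hypothetical helper: turns a reply like "U2" or "v3" into an action and index.
// Returns null when the reply is not a valid grid choice.
function parseGridChoice(reply) {
  const match = /^\s*([uv])\s*([1-4])\s*$/i.exec(reply);
  if (!match) {
    return null;
  }
  return {
    action: match[1].toLowerCase() === "u" ? "upscale" : "variation",
    index: Number(match[2]),
  };
}
```

The result maps directly onto the `--action` and `--index` flags; any reply the pattern does not match can be handled as ordinary conversation.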

User: "Use Flux to generate a perfume product poster, white background"

# Choose flux-pro for photorealistic product shots
node {baseDir}/tools/generate.js \
  --model flux-pro \
  --prompt "a luxury perfume bottle on a clean white background, professional product photography, soft shadows, 8k, highly detailed" \
  --aspect-ratio 3:4

User: "Show me a quick draft"

# flux-schnell for instant previews
node {baseDir}/tools/generate.js \
  --model flux-schnell \
  --prompt "..." \
  --aspect-ratio 1:1

User: "Make me an App icon, flat style, blue theme"

# recraft for vector/icon style
node {baseDir}/tools/generate.js \
  --model recraft \
  --prompt "a minimal flat design app icon, blue color scheme, simple geometric shapes, vector style, white background"

Setup

Zero API keys needed! All requests go through a hosted proxy that handles authentication server-side.

The skill works out of the box — just install and use.

Advanced: Custom proxy or token

If you want to use your own proxy or a persistent token, set these environment variables:

{
  "skills": {
    "entries": {
      "videoagent-image-studio": {
        "enabled": true,
        "env": {
          "IMAGE_STUDIO_PROXY_URL": "https://your-proxy.vercel.app",
          "IMAGE_STUDIO_TOKEN": "your_token_here"
        }
      }
    }
  }
}

| Variable | Required | Description |
| --- | --- | --- |
| IMAGE_STUDIO_PROXY_URL | No | Custom proxy base URL (default: https://image-gen-proxy.vercel.app) |
| IMAGE_STUDIO_TOKEN | No | Persistent token (auto-obtained if not set, 100 free uses per token) |
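The two variables above follow the usual pattern of an override with a fallback. The sketch below illustrates that pattern only; how `generate.js` actually reads its environment is not shown in this document, and the helper name `resolveProxyConfig` is hypothetical.

```javascript
// Hypothetical helper: resolves the proxy settings from environment variables.
// The default URL is the one documented in the table above.
const DEFAULT_PROXY_URL = "https://image-gen-proxy.vercel.app";

function resolveProxyConfig(env = process.env) {
  // Strip trailing slashes so request paths can be appended consistently.
  const proxyUrl = (env.IMAGE_STUDIO_PROXY_URL || DEFAULT_PROXY_URL).replace(/\/+$/, "");
  // A null token stands for "not set"; per the table, one is then auto-obtained.
  const token = env.IMAGE_STUDIO_TOKEN || null;
  return { proxyUrl, token };
}
```

Passing the environment in as a parameter keeps the function easy to test without touching the real `process.env`.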

To deploy your own proxy, see the videoagent-audio-studio proxy as a reference implementation. You'll need FAL_KEY and LEGNEXT_KEY as Vercel environment variables.

Changelog

v2.0.0

Simplified async: The script now blocks until Midjourney completes. No more --async / --poll flags needed in SKILL.md instructions.

Unified output format: All models return the same { success, imageUrl, images } shape.

Reference images for Nano Banana: Pass --reference-images "url1,url2" for character/style consistency across generations.

v1.3.0

Added non-blocking async mode for Midjourney (--async + --poll).

v1.2.0

Midjourney turbo mode enabled by default (~10-20s).

v1.1.0

Switched Midjourney provider from TTAPI to Legnext.ai for better stability.

v1.0.0

Initial release with Midjourney, Flux, SDXL, Nano Banana, Ideogram, Recraft.
