SKILL.md

$2c

3. Generate

Name: ai-video-generation
Author: agentspace-so

runcomfy run //

--input '{"prompt": "..."}'

--output-dir ./out

CLI deep dive: [`runcomfy-cli`](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/runcomfy-cli) skill.

## Install this skill

npx skills add agentspace-so/runcomfy-agent-skills --skill ai-video-generation -g


## Pick the right model for the user's intent

### Text-to-video (t2v) — newest first

**HappyHorse 1.0** — `happyhorse/happyhorse-1-0/text-to-video` (default)

Currently #1 on Artificial Analysis Video Arena. Native synchronized audio generated in-pass (no separate Foley step). Native 1080p, up to ~15s, strong multi-shot character consistency.
Pick for: general-purpose t2v, ad creative with audio, social-media clips, multi-shot narratives.
Avoid for: audio-driven lip-sync to a specific voiceover MP3 — use **Wan 2-7**.

**Kling 3.0 4K** — [kling/kling-3.0/4k/text-to-video](https://www.runcomfy.com/models/kling/kling-3.0/4k/text-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Kling's latest, 4K output, strong multi-shot character identity, premium camera language.
Pick for: hero shots, final-delivery 4K cuts, multi-shot character narratives.
Avoid for: cost-sensitive iteration — drop to **Kling 2-6 Pro** or **Standard** i2v.

**Seedance v2 Pro** — `bytedance/seedance-v2/pro`

ByteDance flagship — multi-modal (up to 9 reference images, 3 reference videos, 3 reference audio), in-pass synchronized audio, cinematic motion refinement, lens language honored.
Pick for: cinematic ad frames, multi-reference composition (subject + scene + audio refs), 21:9 anamorphic looks.
Avoid for: simple "single prompt → clip" jobs — overpowered, slower.

**Seedance v2 Fast** — [bytedance/seedance-v2/fast](https://www.runcomfy.com/models/bytedance/seedance-v2/fast?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Faster variant of Seedance v2 Pro, same multi-modal capabilities.
Pick for: iteration on Seedance v2 compositions before locking a final on Pro.
Avoid for: hero-shot final delivery.

**Wan 2-7** — `wan-ai/wan-2-7/text-to-video`

Open-weights flagship, `audio_url` field for audio-driven lip-sync, pairs natively with Wan image models.
Pick for: dialog scenes where mouth must sync to a specific voiceover file; open-weights pipeline requirement.
Avoid for: in-pass audio generation (no MP3 input) — use **HappyHorse 1.0**.

**Kling 2-6 Pro** — [kling/kling-2-6/pro/text-to-video](https://www.runcomfy.com/models/kling/kling-2-6/pro/text-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Previous Kling tier — still strong quality at much lower cost than 3.0 4K.
Pick for: production at scale where 3.0 4K is too expensive.
Avoid for: top-tier hero shots — use **Kling 3.0 4K**.

**Seedance 1-5 Pro** — [bytedance/seedance-1-5/pro/text-to-video](https://www.runcomfy.com/models/bytedance/seedance-1-5/pro/text-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Previous Seedance generation, cheaper.
Pick for: identity-stable batches between 1-5 generations; cost-sensitive baseline.
Avoid for: new work — prefer **Seedance v2 Pro** or **Fast**.

### Image-to-video (i2v) — newest first

**HappyHorse 1.0 I2V** — `happyhorse/happyhorse-1-0/image-to-video` (default)

Animate any still with in-pass audio described in prompt, strong identity preservation.
Pick for: animating a generated portrait or product still, vertical social clips, voiceover-described audio.
Avoid for: physics-accurate object motion — use **Veo 3-1**.

**Veo 3-1** — [google-deepmind/veo-3-1/image-to-video](https://www.runcomfy.com/models/google-deepmind/veo-3-1/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Google's flagship — physics-respecting motion, strong object permanence ("rotates 180 degrees" = 180°), pairs with `extend-video` for longer clips.
Pick for: product spins, physics-accurate motion, scenes where "no other motion" must hold.
Avoid for: audio-driven dialog — use **Wan 2-7** or **HappyHorse**.

**Veo 3-1 Fast** — [google-deepmind/veo-3-1/fast/image-to-video](https://www.runcomfy.com/models/google-deepmind/veo-3-1/fast/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Faster Veo 3-1 variant.
Pick for: iteration on Veo compositions.
Avoid for: hero delivery — use full **Veo 3-1**.

**Kling 3.0 4K I2V** — [kling/kling-3.0/4k/image-to-video](https://www.runcomfy.com/models/kling/kling-3.0/4k/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Multi-shot character identity, 4K output from a still.
Pick for: 4K hero shots, character-narrative cuts.
Avoid for: cost iteration — drop to Pro or Standard.

**Kling 3.0 Pro I2V** — [kling/kling-3.0/pro/image-to-video](https://www.runcomfy.com/models/kling/kling-3.0/pro/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Default Kling 3.0 quality tier.
Pick for: high-quality i2v at moderate cost.
Avoid for: 4K final delivery.

**Kling 3.0 Standard I2V** — [kling/kling-3.0/standard/image-to-video](https://www.runcomfy.com/models/kling/kling-3.0/standard/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Cheapest 3.0 i2v tier.
Pick for: concepting / drafts on Kling 3.0.
Avoid for: final delivery.

**Hailuo 2-3 Pro** — [minimax/hailuo-2-3/pro/image-to-video](https://www.runcomfy.com/models/minimax/hailuo-2-3/pro/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

MiniMax Hailuo latest — natural motion, strong on real-world subjects.
Pick for: lifelike motion of real-people / real-product subjects.
Avoid for: stylized characters — use Kling or Dreamina.

**Dreamina 3-0 Pro** — [bytedance/dreamina-3-0/pro/image-to-video](https://www.runcomfy.com/models/bytedance/dreamina-3-0/pro/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

ByteDance Dreamina i2v — illustration / stylized character lean.
Pick for: animating illustrated heroes, painterly stills.
Avoid for: photoreal motion.

**Seedance 1-0 Pro Fast** — [bytedance/seedance-1-0/pro/fast/image-to-video](https://www.runcomfy.com/models/bytedance/seedance-1-0/pro/fast/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Older Seedance i2v generation, cheap.
Pick for: cost-sensitive batch i2v on Seedance.
Avoid for: new work — Seedance v2 Pro is more capable (t2v + i2v + multi-modal).

### Extend an existing video — newest first

**Veo 3-1 Extend** — [google-deepmind/veo-3-1/extend-video](https://www.runcomfy.com/models/google-deepmind/veo-3-1/extend-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Continue an existing Veo clip with consistent motion / lighting / identity.
Pick for: extending a video past Veo's per-call duration cap; chained narrative shots.

**Veo 3-1 Fast Extend** — [google-deepmind/veo-3-1/fast/extend-video](https://www.runcomfy.com/models/google-deepmind/veo-3-1/fast/extend-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Faster Veo extend variant.
Pick for: extending Veo Fast clips at matching latency tier.

For dedicated treatment of extend (input video preparation, frame-anchor strategy, chained extends), see the [video-extend](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/video-extend) skill.

## t2v Route 1: HappyHorse 1.0 — default

**Model**: `happyhorse/happyhorse-1-0/text-to-video`
**Catalog**: [happyhorse-1-0](https://www.runcomfy.com/models/happyhorse/happyhorse-1-0/text-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Currently #1 on the [Artificial Analysis Video Arena](https://artificialanalysis.ai/text-to-video) — RunComfy's recommended default for general-purpose t2v. Native synchronized audio is generated in-pass (no separate Foley step).

### Schema

Field
Type
Required
Default
Notes

`prompt`
string
yes
—
Subject-first, describe motion + scene + audio in one declarative

`duration`
int
no
5
Seconds. Up to ~15s

`aspect_ratio`
enum
no
`16:9`
`16:9`, `9:16`, `1:1` typical

`resolution`
enum
no
`1080p`
`720p`, `1080p`

`seed`
int
no
—
Reproducibility

### Invoke

runcomfy run happyhorse/happyhorse-1-0/text-to-video \

--input '{

"prompt": "A red kite tumbles across a windy beach at golden hour, kids chasing it laughing, surf in the background. Audio: wind, gulls, distant laughter.",

"duration": 8,

"aspect_ratio": "16:9",

"resolution": "1080p"

}' \

--output-dir ./out


### Prompting tips

- **Lead with subject and one main action.** "A red kite tumbles across a beach" — verb-driven, not adjective-stacked.

- **Describe audio inline** — `"Audio: wind, gulls, distant laughter."` HappyHorse generates audio in-pass.

- **Motion language matters more than visual nouns** — "tumbles", "drifts", "snaps into focus" > "looks beautiful".

- **Multi-shot:** describe transitions explicitly — "Then the camera cuts to …" — Arena-leading multi-shot consistency.

## t2v Route 2: Wan 2-7 — open weights + audio-driven lip-sync

**Model**: `wan-ai/wan-2-7/text-to-video`
**Catalog**: [wan-2-7](https://www.runcomfy.com/models/wan-ai/wan-2-7?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation) · [wan-models collection](https://www.runcomfy.com/models/collections/wan-models?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Pick Wan 2-7 when you have a specific voiceover / dialog audio file and want the on-screen subject's mouth to sync to it. The `audio_url` field drives the lip motion.

### Invoke

**With audio-driven lip-sync:**

runcomfy run wan-ai/wan-2-7/text-to-video \

--input '{

"prompt": "Studio portrait of a woman in her 30s speaking confidently to camera, soft window light.",

"audio_url": "https://your-cdn.example/voiceover.mp3",

"duration": 6

}' \

--output-dir ./out


**Plain t2v (no audio):**

runcomfy run wan-ai/wan-2-7/text-to-video \

--input '{"prompt": "Drone shot over forest canopy at sunrise, soft fog drifting between trees"}' \

--output-dir ./out


### Prompting tips

- **For lip-sync**, the prompt describes the **scene + speaker**; the audio file drives the mouth. Don't transcribe the audio into the prompt — it'll fight the audio track.

- **Open-weights advantage**: pair with Wan ecosystem (LoRA-finetuned variants) when available.

## t2v Route 3: Seedance v2 — multi-modal cinematic

**Model**: `bytedance/seedance-v2/pro` (or `/fast`)
**Catalog**: [seedance-v2 Pro](https://www.runcomfy.com/models/bytedance/seedance-v2/pro?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation) · [seedance collection](https://www.runcomfy.com/models/collections/seedance?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Pick Seedance v2 Pro when the user needs **multi-modal conditioning** — up to **9 reference images, 3 reference videos, 3 reference audio tracks** synthesized in-pass with cinematic motion refinement.

### Invoke

runcomfy run bytedance/seedance-v2/pro \

--input '{

"prompt": "Anamorphic 35mm shot — a vintage car drives down a coastal road at dusk, lens flares from oncoming headlights, cinematic color grade.",

"duration": 10,

"aspect_ratio": "21:9"

}' \

--output-dir ./out


### Prompting tips

- **Lens / film language is honored** — "35mm anamorphic", "shallow DoF", "soft halation", "Kodak 5219" all land.

- **Multi-ref:** describe roles explicitly — `"subject from ref image 1, mood from ref video 2, score from ref audio 1"`.

- **Cinematic motion verbs:** "tracking shot", "push in", "dolly out", "rack focus".

## i2v Route A: HappyHorse 1.0 I2V — default

**Model**: `happyhorse/happyhorse-1-0/image-to-video`
**Catalog**: [happyhorse-1-0 i2v](https://www.runcomfy.com/models/happyhorse/happyhorse-1-0/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

### Invoke

runcomfy run happyhorse/happyhorse-1-0/image-to-video \

--input '{

"image_url": "https://your-cdn.example/portrait.jpg",

"prompt": "She turns her head slowly to look at the camera and smiles. Wind through her hair. Audio: gentle breeze.",

"duration": 6,

"aspect_ratio": "9:16"

}' \

--output-dir ./out


### Prompting tips

- **Describe motion**, not the scene the image already shows. The image is your scene; the prompt is your direction.

- **Anchor the camera explicitly** — "Camera stays still" prevents drift; "slow push in" gives intent.

- **Audio in the same prompt** as t2v Route 1.

## i2v Route B: Veo 3-1 — Google's flagship

**Model**: `google-deepmind/veo-3-1/image-to-video` (or `/fast/image-to-video`)
**Catalog**: [veo-3-1 i2v](https://www.runcomfy.com/models/google-deepmind/veo-3-1/image-to-video?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation) · [veo-3 collection](https://www.runcomfy.com/models/collections/veo-3?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Pick Veo when physics / realism / object permanence matters most. Veo 3-1 supports both 8s clips and longer with the **extend-video** companion endpoint.

### Invoke

runcomfy run google-deepmind/veo-3-1/image-to-video \

--input '{

"image_url": "https://your-cdn.example/product.jpg",

"prompt": "The bottle slowly rotates 180 degrees on a marble surface, soft daylight, no other motion."

}' \

--output-dir ./out


### Prompting tips

- **Veo respects physics** — "the bottle rotates 180 degrees" gets exactly 180°.

- **Object permanence is strong** — say "no other motion" and other elements stay locked.

- For audio-enabled i2v, see Route A (HappyHorse) instead — Veo's audio path lives elsewhere in the catalog.

## i2v Route C: Kling 3.0 — multi-shot identity, 4K

**Model**: `kling/kling-3.0/{4k,pro,standard}/image-to-video`
**Catalog**: [kling collection](https://www.runcomfy.com/models/collections/kling?utm_source=skills.sh&#x26;utm_medium=skill&#x26;utm_campaign=ai-video-generation)

Three tiers — pick by quality / cost trade-off:

Tier
Endpoint
When

4K
`kling/kling-3.0/4k/image-to-video`
Hero shots, final delivery at 4K

Pro
`kling/kling-3.0/pro/image-to-video`
Default — high quality at lower cost

Standard
`kling/kling-3.0/standard/image-to-video`
Concepting, drafts

### Invoke

runcomfy run kling/kling-3.0/pro/image-to-video \

--input '{

"image_url": "https://your-cdn.example/character.jpg",

"prompt": "The character walks toward the camera, soft handheld feel, end on a medium close-up."

}' \

--output-dir ./out

ai-video-generation

SKILL.md

3. Generate

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers