SKILL.md

Image-to-Video — Pro Pack on RunComfy

runcomfy.com · HappyHorse I2V · Wan 2.7 · Seedance 2.0 Pro · GitHub

Image-to-video, intent-routed. This skill doesn't lock you to one model — it picks the right i2v model in the RunComfy catalog based on what the user actually wants: portrait animation, custom-voiceover lip-sync, or multi-modal composition.

npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g

Pick the right model for the user's intent

User intent	Model	Why
Animate a portrait — keep identity stable	HappyHorse 1.0 I2V	#1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity
Product reveal / 360 / macro motion	HappyHorse 1.0 I2V	Geometry preservation + smooth camera moves
Native synchronized ambient audio in one pass	HappyHorse 1.0 I2V	In-pass audio synthesis
Animate and lip-sync to a custom voiceover track	Wan 2.7 + `audio_url`	Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it
Multi-language dub variants (same image, different audio per call)	Wan 2.7 + `audio_url`	Same shot, swap `audio_url` per language
Multi-modal — image + reference video + reference audio together	Seedance 2.0 Pro	Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs
Brand-consistent narrative with character ref + scene ref + voice ref	Seedance 2.0 Pro	Image holds identity, video holds scene, audio holds voice
Default if unspecified	HappyHorse 1.0 I2V	Best all-round quality + native audio

The agent reads this table, classifies the user's intent, and picks the matching subsection below.
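The routing step can be sketched as a small keyword classifier. This is a hypothetical illustration, not the skill's actual logic. `route_intent`, its keywords, and its flags are assumptions. Only the three model IDs come from the routes in this document. A user-supplied voiceover (`has_voiceover`) goes to Wan 2.7. Reference video or reference audio used for voice tone (`ref_videos`, `ref_audios`) goes to Seedance.

```python
# Hypothetical keyword router mirroring the intent table; the keywords and
# route_intent() are illustrative assumptions, not part of the skill itself.
HAPPYHORSE = "happyhorse/happyhorse-1-0/image-to-video"
WAN_27_T2V = "wan-ai/wan-2-7/text-to-video"
SEEDANCE_PRO = "bytedance/seedance-v2/pro"


def route_intent(request: str, has_voiceover: bool = False,
                 ref_videos: int = 0, ref_audios: int = 0) -> str:
    """Return the model ID for a request; HappyHorse is the default."""
    text = request.lower()
    # Multi-modal composition: reference video/audio alongside the image.
    if ref_videos or ref_audios or "reference video" in text:
        return SEEDANCE_PRO
    # A user-supplied voiceover or dub track goes to Wan 2.7 + audio_url.
    if has_voiceover or "lip-sync" in text or "voiceover" in text or "dub" in text:
        return WAN_27_T2V
    # Portrait, product, ambient audio, or unspecified intent.
    return HAPPYHORSE
```

In practice the agent classifies intent with its own judgment. The point of the sketch is the precedence: multi-modal first, then custom audio, then the HappyHorse default.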

Prerequisites

RunComfy CLI — npm i -g @runcomfy/cli

RunComfy account — runcomfy login opens a browser device-code flow.

CI / containers — set RUNCOMFY_TOKEN=<token>.

A source image URL: JPEG/PNG/WebP, min 300px, ≤10MB, aspect 1:2.5 to 2.5:1. These are HappyHorse's limits; each route's schema below lists its own constraints.
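A local pre-flight check can catch HappyHorse image rejections before a request is submitted. This is a sketch under stated assumptions. `check_source_image` is a hypothetical helper. "Min 300px" is read as applying to both width and height, and 10MB is read as 10 × 1024 × 1024 bytes. The caller supplies the image's dimensions and byte size, for example from an image library or an HTTP HEAD request.

```python
# Pre-flight check against the HappyHorse source-image limits.
# Assumptions: "min 300px" applies to both sides; 10MB = 10 * 1024 * 1024 bytes.
ALLOWED_FORMATS = {"jpeg", "jpg", "png", "webp"}
MAX_BYTES = 10 * 1024 * 1024
MIN_SIDE = 300
MIN_ASPECT, MAX_ASPECT = 1 / 2.5, 2.5  # width / height


def check_source_image(width: int, height: int, size_bytes: int, fmt: str) -> list:
    """Return a list of human-readable problems; empty means the image looks OK."""
    if width <= 0 or height <= 0:
        return ["invalid dimensions"]
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if min(width, height) < MIN_SIDE:
        problems.append(f"smallest side {min(width, height)}px is below {MIN_SIDE}px")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the 10MB limit")
    aspect = width / height
    if not MIN_ASPECT <= aspect <= MAX_ASPECT:
        problems.append(f"aspect ratio {aspect:.2f} is outside 1:2.5 to 2.5:1")
    return problems
```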

Route 1: HappyHorse 1.0 I2V — default for portrait / product / general animation

Model: happyhorse/happyhorse-1-0/image-to-video · Arena rank: #1 (Elo 1392)

Schema

Field	Type	Required	Default	Notes
`image_url`	string	yes	—	JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB.
`prompt`	string	yes	—	≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description.
`resolution`	enum	no	`1080P`	`720P` or `1080P`.
`duration`	int	no	5	3–15 seconds.
`seed`	int	no	0	Reuse for variant comparisons.
`watermark`	bool	no	true	Provider watermark toggle.

Output aspect = input aspect. No independent reframing.
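The schema above can be enforced client-side when building the `--input` body. This is a minimal sketch, and `happyhorse_payload` is a hypothetical helper. The CJK detection is a rough assumption based on Unicode ranges for kana, CJK ideographs, and Hangul. It only decides whether the 5000-character or the 2500-character prompt limit applies. Optional fields are left out unless set, so the documented defaults apply on the server.

```python
import json
import re
from typing import Optional

# Rough CJK detector (assumption): Hiragana/Katakana, CJK ideographs, Hangul.
CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def happyhorse_payload(image_url: str, prompt: str, resolution: str = "1080P",
                       duration: int = 5, seed: Optional[int] = None,
                       watermark: Optional[bool] = None) -> str:
    """Validate against the HappyHorse I2V schema and return the --input JSON."""
    limit = 2500 if CJK_RE.search(prompt) else 5000
    if len(prompt) > limit:
        raise ValueError(f"prompt has {len(prompt)} chars; limit is {limit}")
    if resolution not in ("720P", "1080P"):
        raise ValueError("resolution must be '720P' or '1080P'")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be 3-15 seconds")
    body = {"image_url": image_url, "prompt": prompt,
            "resolution": resolution, "duration": duration}
    # Optional fields are sent only when set, so server defaults apply otherwise.
    if seed is not None:
        body["seed"] = seed
    if watermark is not None:
        body["watermark"] = watermark
    return json.dumps(body)
```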

Invoke

runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input '{
    "image_url": "https://.../portrait.jpg",
    "prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
  }' \
  --output-dir <absolute/path>
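When an agent or script calls the CLI programmatically, passing an argument list to `subprocess` avoids the `'\''` apostrophe escaping that the shell form needs. `json.dumps` produces the `--input` string directly. This is a sketch, and `build_run_argv` and `run` are hypothetical helpers. Only the `runcomfy run <model_id>`, `--input`, and `--output-dir` forms shown above are used.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def build_run_argv(model_id: str, payload: dict, output_dir: str) -> List[str]:
    """Build argv for `runcomfy run`; no shell is involved, so no quoting needed."""
    out = Path(output_dir).expanduser().resolve()  # --output-dir takes an absolute path
    return ["runcomfy", "run", model_id,
            "--input", json.dumps(payload),
            "--output-dir", str(out)]


def run(model_id: str, payload: dict, output_dir: str) -> int:
    """Invoke the CLI and return its exit code (see the exit-code table)."""
    return subprocess.run(build_run_argv(model_id, payload, output_dir)).returncode
```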

Prompting tips

Lead with motion verbs: "drift", "dolly in", "orbit", "tilt up", "reveal", "blink", "breathe". Front-load what's MOVING.

Don't restate the image — the model sees it. Focus tokens on what changes.

Make preservation goals explicit: "identity-stable features", "packaging unchanged", "background geometry stable".

Lighting evolution: "rim light intensifying", "shadows shortening as camera rises".

One beat per clip — single primary motion (orbit OR dolly OR tilt OR character action).
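The "one beat per clip" tip can be checked mechanically for camera moves. This is a rough lint, not a rule the model enforces. The move list (orbit, dolly, tilt) is taken from the tip above, and `camera_moves` and `is_single_beat` are hypothetical helpers. Character actions are not detected, so a prompt that passes can still contain two beats.

```python
import re
from typing import List

# Camera moves named in the "one beat per clip" tip; inflections like
# "orbiting" or "tilted" also match because of the trailing \w*.
MOVE_RE = re.compile(r"\b(orbit|dolly|tilt)\w*", re.IGNORECASE)


def camera_moves(prompt: str) -> List[str]:
    """Distinct camera moves mentioned, in order of first appearance."""
    seen: List[str] = []
    for match in MOVE_RE.finditer(prompt):
        move = match.group(1).lower()
        if move not in seen:
            seen.append(move)
    return seen


def is_single_beat(prompt: str) -> bool:
    """True when the prompt names at most one primary camera move."""
    return len(camera_moves(prompt)) <= 1
```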

Route 2: Wan 2.7 + audio_url — when the user has a custom voiceover

Model: wan-ai/wan-2-7/text-to-video (NOT /image-to-video — Wan 2.7's t2v endpoint accepts an audio_url that drives lip-sync)

Note on i2v with Wan 2.7: this skill does not call a dedicated Wan 2.7 image-to-video endpoint, and the t2v schema below lists no image field, so the clip is generated from the prompt and the audio. For pure i2v (an image animated by a motion prompt only), prefer HappyHorse I2V. Use Wan 2.7 specifically when the user has a custom audio track to lip-sync to a generated talking-head clip.

Schema (Wan 2.7 t2v with audio)

Field	Type	Required	Default	Notes
`prompt`	string	yes	—	Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion.
`audio_url`	string	yes (for lip-sync)	—	WAV/MP3, 3–30s, ≤15MB. Drives lip-sync.
`aspect_ratio`	enum	no	`16:9`	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`.
`resolution`	enum	no	`1080p`	`720p` or `1080p`.
`duration`	enum	no	`5`	2–15 (whole seconds). Match your audio length.
`negative_prompt`	string	no	—	Concrete issues to avoid (e.g. "no subtitles, no flicker").
`seed`	int	no	—	Reproducibility.
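To match `duration` to a voiceover, you can measure the audio and round up to whole seconds. This is a sketch under stated assumptions. It uses the stdlib `wave` module, so it only handles WAV; MP3 would need a third-party decoder. Rounding up and reading 15MB as 15 × 1024 × 1024 bytes are both assumptions. The helper names are hypothetical. Because `duration` caps at 15s while `audio_url` accepts up to 30s, the function also reports whether the clip covers the whole track.

```python
import math
import os
import wave
from typing import Tuple

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # "≤15MB", read as MiB (assumption)


def wav_seconds(path: str) -> float:
    """Length of a WAV file in seconds (stdlib wave handles WAV only, not MP3)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def duration_for_audio(seconds: float) -> Tuple[int, bool]:
    """Return (duration, covers_full_audio) for Wan 2.7's 2-15s duration field."""
    if not 3 <= seconds <= 30:
        raise ValueError(f"audio is {seconds:.1f}s; audio_url must be 3-30s")
    wanted = math.ceil(seconds)   # round up so the clip doesn't cut the audio short
    duration = min(wanted, 15)    # duration caps at 15s
    return duration, wanted <= 15


def duration_for_wav(path: str) -> Tuple[int, bool]:
    """Check the file-size limit, then size the clip to the WAV's length."""
    if os.path.getsize(path) > MAX_AUDIO_BYTES:
        raise ValueError("audio file exceeds 15MB")
    return duration_for_audio(wav_seconds(path))
```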

Invoke

runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
    "audio_url": "https://.../voiceover-en.mp3",
    "duration": 12,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>

Prompting tips

Describe the talking-head shot — framing, lighting, lens feel. The audio drives the lip-sync; the prompt builds the visual frame around it.

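**Match duration to audio length**: a clip longer than the audio runs silent after the audio ends. Because `duration` caps at 15s, a voiceover longer than 15s (the audio limit is 30s) cannot be covered by a single clip.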

**Use negative_prompt for concrete artifacts to avoid**: "no subtitles, no flicker, no distorted hands".

For multi-language dubs — same prompt, swap audio_url per call. Lock seed for visual consistency across languages.
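The dub workflow reduces to one request body per language. The prompt, aspect ratio, and seed stay fixed, and only `audio_url` and `duration` change. This is a sketch: `dub_payloads` is a hypothetical helper, and the language codes, URLs, and seed in the usage below are placeholders. Each body would be passed to `runcomfy run wan-ai/wan-2-7/text-to-video --input ...` as a separate call.

```python
import json
from typing import Dict


def dub_payloads(prompt: str, audio_by_lang: Dict[str, str],
                 duration_by_lang: Dict[str, int], seed: int,
                 aspect_ratio: str = "9:16") -> Dict[str, str]:
    """One Wan 2.7 --input body per language; only audio and duration vary."""
    bodies = {}
    for lang, audio_url in audio_by_lang.items():
        bodies[lang] = json.dumps({
            "prompt": prompt,                    # identical across languages
            "audio_url": audio_url,              # the per-language voiceover
            "duration": duration_by_lang[lang],  # match this track's length
            "aspect_ratio": aspect_ratio,
            "seed": seed,                        # locked for visual consistency
        })
    return bodies
```

For example, `dub_payloads(prompt, {"en": "https://example.com/vo-en.mp3", "es": "https://example.com/vo-es.mp3"}, {"en": 12, "es": 13}, seed=42)` yields two bodies that differ only in audio and duration.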

Route 3: Seedance 2.0 Pro — multi-modal animation (image + ref video + ref audio)

Model: bytedance/seedance-v2/pro

Use when the user wants a single clip that combines: a subject image + scene from a reference video + voice tone from a reference audio.

Schema (Seedance 2.0 Pro, i2v-relevant fields)

Field	Type	Required	Default	Notes
`prompt`	string	yes	—	CN ≤500 chars OR EN ≤1000 words.
`image_url`	array	yes (for i2v)	`[]`	0–9 images; i2v needs at least one. The first is the primary subject.
`video_url`	array	no	`[]`	0–3 reference clips (MP4/MOV), 2–15s each.
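A Seedance request body can be assembled and checked from the fields above. This is a sketch with several assumptions. `seedance_payload` is a hypothetical helper. A prompt is treated as "CN" if it contains any Han character, which is a rough heuristic. The caller supplies each reference clip's length, since probing video duration is out of scope. The reference-audio field is not listed in this schema, so the sketch leaves it out rather than guessing its name.

```python
import json
import re
from typing import Sequence, Tuple

HAN_RE = re.compile(r"[\u4e00-\u9fff]")  # rough "CN prompt" detector (assumption)


def seedance_payload(prompt: str, images: Sequence[str],
                     videos: Sequence[Tuple[str, float]] = ()) -> str:
    """videos: (url, length_seconds) pairs; lengths come from the caller."""
    if HAN_RE.search(prompt):
        if len(prompt) > 500:
            raise ValueError("Chinese prompt exceeds 500 characters")
    elif len(prompt.split()) > 1000:
        raise ValueError("English prompt exceeds 1000 words")
    if not 1 <= len(images) <= 9:
        raise ValueError("i2v needs 1-9 image URLs; the first is the primary subject")
    if len(videos) > 3:
        raise ValueError("at most 3 reference videos")
    for url, seconds in videos:
        if not 2 <= seconds <= 15:
            raise ValueError(f"{url} is {seconds}s; reference clips must be 2-15s")
    return json.dumps({"prompt": prompt,
                       "image_url": list(images),
                       "video_url": [url for url, _ in videos]})
```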

Limitations

Each route inherits its model's limits. HappyHorse: 15s cap, output aspect = input aspect. Wan 2.7: 15s cap, audio 3–30s/15MB. Seedance: 720p ceiling on this template, 15s cap.

No multi-route blending. This skill picks one model per call. If the user wants HappyHorse animation + Wan-style lip-sync in the same clip, that's two calls + a stitch (out of scope here).

Brand-specific overrides — if the user named a specific model variant not listed (e.g. Wan 2.6, Seedance 1.5), route to the corresponding brand skill (wan-2-7, seedance-v2) instead of forcing it through here.

Exit codes

code	meaning
0	success
64	bad CLI args
65	bad input JSON / schema mismatch
69	upstream 5xx
75	retryable: timeout / 429
77	not signed in or token rejected
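A caller can act on these exit codes with a small retry policy. This is a sketch, and `run_with_retry` is a hypothetical helper. Only code 75 is documented as retryable, so only 75 is retried here; adding 69 (upstream 5xx) would be your own judgment call. The runner and sleep function are injected so the policy can be exercised without the CLI.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

EXIT_MEANINGS = {
    0: "success", 64: "bad CLI args", 65: "bad input JSON / schema mismatch",
    69: "upstream 5xx", 75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}  # add 69 only if you decide upstream 5xx is worth retrying


def run_with_retry(argv: Sequence[str],
                   runner: Optional[Callable[[Sequence[str]], int]] = None,
                   attempts: int = 4, base_delay: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run the CLI, retrying retryable exit codes with exponential backoff."""
    if runner is None:
        runner = lambda a: subprocess.run(list(a)).returncode
    code = -1
    for attempt in range(attempts):
        code = runner(argv)
        if code not in RETRYABLE:
            return code  # success or a non-retryable failure (64, 65, 69, 77)
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    return code
```

Each retry re-runs `runcomfy run`, which submits a new request. Keep `attempts` small for expensive generations. Codes 64, 65, and 77 need a fix (arguments, JSON, or login) rather than a retry.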

Full reference: docs.runcomfy.com/cli/troubleshooting.

How it works

The skill picks one of HappyHorse 1.0 I2V / Wan 2.7 t2v+audio / Seedance 2.0 Pro based on user intent and invokes runcomfy run <model_id> with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any output URL on *.runcomfy.net or *.runcomfy.com into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
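A pre-flight check can report where credentials would come from, following the precedence described above: `RUNCOMFY_TOKEN` first, otherwise `~/.config/runcomfy/token.json`. This is a sketch, and `token_source` is a hypothetical helper. The token file's JSON layout is not documented here, so the check never reads the token. It only confirms the file exists and that its mode is still owner-only (0600).

```python
import os
import stat
from pathlib import Path
from typing import Mapping, Optional

TOKEN_FILE = Path("~/.config/runcomfy/token.json").expanduser()


def token_source(env: Optional[Mapping[str, str]] = None,
                 token_file: Path = TOKEN_FILE) -> Optional[str]:
    """Return "env", "file", or None; raise if the token file is too permissive."""
    env = os.environ if env is None else env
    if env.get("RUNCOMFY_TOKEN"):
        return "env"  # the env var bypasses the file entirely
    if token_file.is_file():
        mode = stat.S_IMODE(token_file.stat().st_mode)
        if mode & 0o077:  # any group/other permission bits set
            raise PermissionError(f"{token_file} has mode {oct(mode)}; expected 0o600")
        return "file"
    return None
```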

Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.

Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.

Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
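The download allowlist can be expressed as a host check on a dot boundary, so that lookalike hosts such as `evilruncomfy.net` do not pass. This illustrates the technique and is not the CLI's code. `is_allowed_download` is a hypothetical helper. It assumes the wildcard means subdomains only and that downloads are HTTPS.

```python
from urllib.parse import urlsplit

ALLOWED_SUFFIXES = ("runcomfy.net", "runcomfy.com")


def is_allowed_download(url: str) -> bool:
    """True for https URLs on a subdomain of runcomfy.net or runcomfy.com."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if parts.scheme != "https":
        return False
    # Match on a dot boundary: "cdn.runcomfy.net" passes, "evilruncomfy.net" and
    # "runcomfy.net.evil.com" do not.
    return any(host.endswith("." + suffix) for suffix in ALLOWED_SUFFIXES)
```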

Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
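A size cap like this is typically enforced by streaming in chunks and aborting once the running total passes the limit, then deleting the partial file. The sketch below illustrates that technique and is not the CLI's implementation. `copy_capped` and `DownloadTooLarge` are hypothetical names. `src` is any binary file-like object with `.read(n)`, such as the response from `urllib.request.urlopen`.

```python
import os
from typing import BinaryIO

MAX_DOWNLOAD = 2 * 1024 ** 3  # 2 GiB


class DownloadTooLarge(Exception):
    """Raised when a stream exceeds the configured size cap."""


def copy_capped(src: BinaryIO, dest_path: str, cap: int = MAX_DOWNLOAD,
                chunk_size: int = 1 << 20) -> int:
    """Stream src into dest_path; abort and delete the partial file past cap."""
    total = 0
    try:
        with open(dest_path, "wb") as out:
            while True:
                block = src.read(chunk_size)
                if not block:
                    break
                total += len(block)
                if total > cap:
                    raise DownloadTooLarge(f"output exceeds {cap} bytes")
                out.write(block)
    except Exception:
        if os.path.exists(dest_path):
            os.remove(dest_path)  # don't leave a truncated file behind
        raise
    return total
```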
