lipsync

INSTALLATION
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill lipsync
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

3. Lipsync

    runcomfy run <vendor>/<model> \
      --input '{"video_url": "...", "audio_url": "..."}' \
      --output-dir ./out

CLI deep dive: [`runcomfy-cli`](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/runcomfy-cli) skill.

## Consent

Driving a real person's mouth from a separate audio track is dual-use. Refuse user requests that target real public figures without consent, or that aim at defamatory or sexually explicit synthetic media. The skill itself does not gate inputs — the responsibility rests with the operator.

---

## Pick the right model

Listed newest first within each subtype. The agent picks one route based on: input shape (portrait still + audio vs source video + audio vs script-only), quality tier, and budget.
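
The routing criteria above can be sketched as a small shell function. This is only an illustration: the shape and tier labels (`video+audio`, `portrait+audio`, `script`, `premium`, `budget`) are hypothetical names, the model IDs are the ones listed in this section, and the script-only choice is an assumption because no default is marked for that subtype.

```shell
# pick_route SHAPE [TIER] prints a model ID from the list in this section.
# SHAPE: video+audio | portrait+audio | script   (hypothetical labels)
# TIER:  premium | budget                        (hypothetical labels)
pick_route() {
  shape=$1
  tier=${2:-premium}
  case "$shape" in
    video+audio)
      # Mouth-swap on existing footage: Pro for hero quality, v2 for batch or drafts.
      if [ "$tier" = "budget" ]; then
        echo "sync/sync/lipsync/v2"
      else
        echo "sync/sync/lipsync/v2/pro"
      fi
      ;;
    portrait+audio)
      # Avatar-style talking head from a single still.
      echo "bytedance/omnihuman/api"
      ;;
    script)
      # Assumption: no default is marked for script-only input.
      echo "kling/lipsync/text-to-video"
      ;;
    *)
      echo "unknown input shape: $shape" >&2
      return 1
      ;;
  esac
}
```

For example, `pick_route video+audio budget` selects the standard Sync Labs tier, and an unrecognized shape returns a non-zero status so a calling script can stop early.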

### Source video + audio → lip-synced video (mouth-swap on existing footage)

**Sync Labs sync v2 Pro** — `sync/sync/lipsync/v2/pro` *(default for premium)*

> Sync Labs' premium lip-sync — state-of-the-art mouth motion onto an existing video. Preserves the rest of the frame untouched.

> Pick for: hero-quality dubs, lipsync on professionally-shot video, foreign-language dubbing where mouth fidelity matters most.

> Avoid for: cost-sensitive batch jobs — drop to **sync v2**.

**Sync Labs sync v2** — [`sync/sync/lipsync/v2`](https://www.runcomfy.com/models/sync/sync/lipsync/v2?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

> Standard Sync Labs tier, same workflow as Pro.

> Pick for: scaled / batch lipsync jobs, drafts.

> Avoid for: hero delivery — use **v2 Pro**.

**Kling Lipsync (audio-to-video)** — [`kling/lipsync/audio-to-video`](https://www.runcomfy.com/models/kling/lipsync/audio-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

> Kling's lip-sync onto a source video, driven by an audio track.

> Pick for: Kling-pipeline integration; alternative to Sync Labs.

> Avoid for: top-tier mouth fidelity — Sync Labs Pro is the industry benchmark.

**Creatify Lipsync** — [`creatify/lipsync`](https://www.runcomfy.com/models/creatify/lipsync?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

> Creatify's lipsync endpoint.

> Pick for: Creatify-ecosystem workflows.

> Avoid for: general-purpose use when you are choosing among providers, unless cost or latency favors it.

### Portrait still + audio → talking-head video (avatar-style)

**OmniHuman** — `bytedance/omnihuman/api` *(default for avatar-style)*

> ByteDance's audio-driven full-body avatar. One portrait + one audio → video where the subject speaks / gestures naturally. Listed under RunComfy's `/feature/lip-sync` as the curated default.

> Pick for: UGC voiceover, virtual presenter, dubbed product demo from a single portrait.

> Avoid for: lip-sync onto an existing **video** (no portrait, want to preserve original motion) — use **Sync Labs v2** instead.

**Wan 2-7 with `audio_url`** — `wan-ai/wan-2-7/text-to-video`

> Open-weights t2v with `audio_url` field — prompt describes the scene, audio drives the mouth.

> Pick for: full scene control (not just a portrait) with a specific voiceover MP3 + open-weights pipeline.

> Avoid for: simplest "portrait talks" — use **OmniHuman**.

### Generate-and-sync from a script (no audio file available)

**Kling Lipsync (text-to-video)** — [`kling/lipsync/text-to-video`](https://www.runcomfy.com/models/kling/lipsync/text-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

> Generates speech audio in-pass from a script and syncs it to the resulting video.

> Pick for: "write a script → get a video with synced speech", no audio file needed.

> Avoid for: precise lip-sync to a specific MP3 (audio is regenerated each call, not locked).

**HappyHorse 1.0** — `happyhorse/happyhorse-1-0/text-to-video` (also `/image-to-video`)

> Arena #1 t2v / i2v with in-pass audio generated from prompt. Quote the spoken line inside the prompt with `says clearly: "…"`.

> Pick for: written script, in-pass audio with strong overall quality, social/UGC clips.

> Avoid for: locking mouth to a pre-recorded voiceover.
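
When the spoken line is quoted inside a prompt, the inner double quotes have to be escaped before the prompt is placed in the JSON passed to `--input`. The sketch below shows one way to do that in plain shell. The helper names are hypothetical, and `"prompt"` as the JSON field name is an assumption, so check the model page for the actual schema.

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a prompt that embeds the spoken line with the `says clearly: "…"` pattern.
spoken_prompt() {
  scene=$1
  line=$2
  printf '%s, says clearly: "%s"' "$scene" "$line"
}

prompt=$(spoken_prompt "A barista looks at the camera" "Your order is ready.")

# Assumption: the text field is named "prompt"; verify against the model schema.
input=$(printf '{"prompt": "%s"}' "$(json_escape "$prompt")")
printf '%s\n' "$input"
```

The resulting string can then be passed as the value of `--input`, wrapped in double quotes so the shell does not split it.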

---

## Route 1: Sync Labs sync v2 / Pro — default for mouth-swap

**Model**: `sync/sync/lipsync/v2/pro` (or `sync/sync/lipsync/v2`)

**Catalog**: [sync v2 Pro](https://www.runcomfy.com/models/sync/sync/lipsync/v2/pro?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync) · [sync v2](https://www.runcomfy.com/models/sync/sync/lipsync/v2?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

### Invoke

    runcomfy run sync/sync/lipsync/v2/pro \
      --input '{
        "video_url": "https://your-cdn.example/source-video.mp4",
        "audio_url": "https://your-cdn.example/voiceover.mp3"
      }' \
      --output-dir ./out
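
For scripted or batch use, the same call can be wrapped so the JSON is built from variables. This is a minimal sketch: `lipsync_input` and `run_lipsync` are hypothetical helper names, only the `run`, `--input`, and `--output-dir` arguments shown above are used, and the URLs are assumed to contain no double quotes or backslashes.

```shell
# Build the --input JSON for a source-video + audio lipsync call.
# Assumes the URLs contain no double quotes or backslashes.
lipsync_input() {
  printf '{"video_url": "%s", "audio_url": "%s"}' "$1" "$2"
}

# Run one job; the third argument picks the model and defaults to the Pro tier.
run_lipsync() {
  video_url=$1
  audio_url=$2
  model=${3:-sync/sync/lipsync/v2/pro}
  runcomfy run "$model" \
    --input "$(lipsync_input "$video_url" "$audio_url")" \
    --output-dir ./out
}
```

Passing `sync/sync/lipsync/v2` as the third argument to `run_lipsync` switches a draft or batch run to the standard tier without changing the rest of the call.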


### Tips

- **Source video provides everything except the mouth** — camera, lighting, background, body pose all preserved.

- **Audio quality drives mouth quality.** Clean voiceover (no music bed) → cleaner sync. Isolate the voice stem if needed.

- **Match audio length to video length.** A significant mismatch between audio and video duration leads to drift; trim the audio or extend the video first.

- Schema details on the [model page](https://www.runcomfy.com/models/sync/sync/lipsync/v2/pro?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync).
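
The length-matching tip can be checked before submitting a job. The sketch below compares two durations in seconds. The helper names and the 0.5-second default tolerance are assumptions, since the tip does not define how large a mismatch counts as significant, and `media_duration` relies on `ffprobe` from FFmpeg being installed and on the files being available locally.

```shell
# durations_match VIDEO_SECONDS AUDIO_SECONDS [TOLERANCE_SECONDS]
# Succeeds (exit 0) when the two durations differ by at most the tolerance.
# The 0.5 s default is an assumed threshold, not a documented one.
durations_match() {
  awk -v v="$1" -v a="$2" -v tol="${3:-0.5}" 'BEGIN {
    diff = v - a
    if (diff < 0) diff = -diff
    exit !(diff <= tol)
  }'
}

# Read a local media file's duration in seconds with ffprobe (requires FFmpeg).
media_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Example with literal durations: 12.0 s of video against 12.3 s of audio.
if durations_match 12.0 12.3; then
  echo "durations close enough"
else
  echo "trim the audio or extend the video first"
fi
```

With local copies of the inputs, the check would read `durations_match "$(media_duration source-video.mp4)" "$(media_duration voiceover.mp3)"`.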

## Route 2: OmniHuman — default for avatar from still

**Model**: `bytedance/omnihuman/api`

**Catalog**: [omnihuman](https://www.runcomfy.com/models/bytedance/omnihuman/api?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

### Invoke

    runcomfy run bytedance/omnihuman/api \
      --input '{
        "image_url": "https://your-cdn.example/portrait.jpg",
        "audio_url": "https://your-cdn.example/voiceover.mp3"
      }' \
      --output-dir ./out


### Tips

- **Portrait framing works best** — head-and-shoulders or upper body.

- **No prompt** — the model derives everything from image + audio. Don't fight that.

- See the [ai-avatar-video](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/ai-avatar-video) skill for the full avatar treatment.

## Route 3: Kling Lipsync — Kling-ecosystem mouth sync

**Model**: `kling/lipsync/audio-to-video` (existing video + audio) or `kling/lipsync/text-to-video` (script-only)

**Catalog**: [Kling lipsync a2v](https://www.runcomfy.com/models/kling/lipsync/audio-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync) · [Kling lipsync t2v](https://www.runcomfy.com/models/kling/lipsync/text-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=lipsync)

### Invoke (audio-to-video variant)

    runcomfy run kling/lipsync/audio-to-video \
      --input '{
        "video_url": "https://your-cdn.example/source-video.mp4",
        "audio_url": "https://your-cdn.example/voiceover.mp3"
      }' \
      --output-dir ./out
