SKILL.md


```bash
# Regenerate a single scene
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"

# List available voices and character presets
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters
```

## Character Presets

Use a character preset to get a natural-sounding voiceover instead of a literal reading of the on-screen text:

| Character | Description | Best For |
|-----------|-------------|----------|
| `literal` | Reads text exactly as written | Screen text, quotes |
| `narrator` | Professional storyteller, smooth, engaging | Explainers, documentaries |
| `salesperson` | Enthusiastic, persuasive, energetic | Marketing, ads |
| `expert` | Authoritative, confident, knowledgeable | Legal content, tutorials |
| `conversational` | Casual, friendly, natural | Social media, casual content |
| `dramatic` | Intense, emotional, impactful | Hooks, problem statements |
| `calm` | Soothing, reassuring, gentle | Trust-building, conclusions |

```bash
# Use narrator style globally
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/
```

Or set the character per scene in `scenes.json`:

```json
{
  "scenes": [
    { "id": "scene1", "text": "Problem statement", "character": "dramatic" },
    { "id": "scene2", "text": "Solution", "character": "calm" }
  ]
}
```


## Scene-Based Generation with Request Stitching

Generate multiple scenes with consistent prosody using ElevenLabs request stitching:
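Request stitching works by sending the IDs of earlier generations along with each new request, so the model continues the prosody instead of starting fresh. The TypeScript sketch below builds such a request body. It assumes the `previous_request_ids` field of the ElevenLabs text-to-speech API, which accepts at most three IDs; each response's ID is returned in its `request-id` header. `buildStitchedBody` is a hypothetical name, and `generate.js` may assemble its requests differently.

```typescript
// Sketch of an ElevenLabs text-to-speech request body that uses request stitching.
// Field names follow the public API; generate.js may build its requests differently.
interface StitchedTtsBody {
  text: string;
  model_id: string;
  previous_request_ids?: string[];
}

// The API accepts at most three previous request IDs.
const MAX_PREVIOUS_REQUEST_IDS = 3;

// priorRequestIds: IDs returned (in the "request-id" response header) for earlier
// scenes, oldest first. Only the most recent ones are sent.
function buildStitchedBody(
  text: string,
  modelId: string,
  priorRequestIds: string[],
): StitchedTtsBody {
  const body: StitchedTtsBody = { text, model_id: modelId };
  const recent = priorRequestIds.slice(-MAX_PREVIOUS_REQUEST_IDS);
  if (recent.length > 0) body.previous_request_ids = recent;
  return body;
}

// Example: a scene generated after four earlier ones; only the last three IDs are sent.
const body = buildStitchedBody("Record a short sample.", "eleven_multilingual_v2", [
  "req-0",
  "req-1",
  "req-2",
  "req-3",
]);
console.log(JSON.stringify(body)); // previous_request_ids holds req-1, req-2, req-3
```

The body would be POSTed as JSON to `/v1/text-to-speech/{voice_id}` with an `xi-api-key` header. Results are most consistent when every scene uses the same model.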

### scenes.json Format

```json
{
  "name": "product-demo",
  "voice": "George",
  "character": "narrator",
  "scenes": [
    {
      "id": "scene1",
      "text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
      "duration": 4.5,
      "character": "dramatic"
    },
    {
      "id": "scene2",
      "text": "With voice cloning, you can use your own voice for unlimited content.",
      "duration": 5.5
    },
    {
      "id": "scene3",
      "text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
      "duration": 6,
      "delay": 0.3
    }
  ]
}
```
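A missing brace or a misspelled preset only shows up when generation fails, so it can help to check the file first. Below is a minimal TypeScript sketch that assumes the field set shown above. The `Scene` and `ScenesFile` types and `parseScenesFile` are illustrative, not exports of `generate.js`. Treating `name` as required is also an assumption, based on its use as the output file prefix.

```typescript
// Illustrative types for scenes.json (assumed shape, based on the example above).
const CHARACTERS = [
  "literal",
  "narrator",
  "salesperson",
  "expert",
  "conversational",
  "dramatic",
  "calm",
] as const;
type Character = (typeof CHARACTERS)[number];

interface Scene {
  id: string;
  text: string;
  duration?: number; // expected length in seconds, compared during validation
  delay?: number; // seconds; exact meaning is defined by generate.js
  character?: Character;
}

interface ScenesFile {
  name: string; // assumed required: it prefixes the output file names
  voice?: string;
  character?: Character;
  scenes: Scene[];
}

function isCharacter(value: unknown): value is Character {
  return typeof value === "string" && (CHARACTERS as readonly string[]).includes(value);
}

// Hypothetical pre-flight check; throws on common hand-editing mistakes.
function parseScenesFile(json: string): ScenesFile {
  const data = JSON.parse(json) as ScenesFile; // throws on malformed JSON
  if (typeof data.name !== "string") throw new Error('missing "name"');
  if (!Array.isArray(data.scenes)) throw new Error('missing "scenes" array');
  if (data.character !== undefined && !isCharacter(data.character)) {
    throw new Error(`unknown character: ${String(data.character)}`);
  }
  const seen = new Set<string>();
  for (const scene of data.scenes) {
    if (typeof scene.id !== "string" || typeof scene.text !== "string") {
      throw new Error('each scene needs string "id" and "text"');
    }
    if (seen.has(scene.id)) throw new Error(`duplicate scene id: ${scene.id}`);
    seen.add(scene.id);
    if (scene.character !== undefined && !isCharacter(scene.character)) {
      throw new Error(`unknown character in ${scene.id}: ${String(scene.character)}`);
    }
  }
  return data;
}
```

In a Node script you would pass it the file contents, for example `parseScenesFile(fs.readFileSync("remotion/product-demo-scenes.json", "utf8"))`.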


### Generate All Scenes

```bash
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/product-demo-scenes.json \
  --output-dir public/audio/product-demo/
```


This creates the following files in the output directory:

- `product-demo-scene1.mp3` through `product-demo-sceneN.mp3` (one file per scene)
- `product-demo-combined.mp3` (all scenes stitched)
- `product-demo-info.json` (metadata with durations)
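Because the naming pattern is predictable, you can compute the file paths ahead of time, for example to reference them from a composition. The helper below is hypothetical. It assumes every file lands directly in `--output-dir` and uses the `name` from `scenes.json` as its prefix.

```typescript
// Hypothetical helper: predicts the file names listed above for a given scenes.json.
// Assumes files are written directly into --output-dir with the "name" prefix.
interface OutputPaths {
  scenes: Record<string, string>; // scene id -> per-scene mp3
  combined: string;
  info: string;
}

function expectedOutputs(outputDir: string, name: string, sceneIds: string[]): OutputPaths {
  const dir = outputDir.endsWith("/") ? outputDir : `${outputDir}/`;
  const scenes: Record<string, string> = {};
  for (const id of sceneIds) {
    scenes[id] = `${dir}${name}-${id}.mp3`;
  }
  return {
    scenes,
    combined: `${dir}${name}-combined.mp3`,
    info: `${dir}${name}-info.json`,
  };
}

const outputs = expectedOutputs("public/audio/product-demo/", "product-demo", [
  "scene1",
  "scene2",
  "scene3",
]);
console.log(outputs.scenes.scene1); // public/audio/product-demo/product-demo-scene1.mp3
```

Remotion's `staticFile()` resolves paths relative to `public/`, so drop that prefix before passing these paths to it.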

### Single Scene Regeneration

If a scene starts too early, has the wrong timing, or needs different text, regenerate only that scene:

```bash
# Regenerate scene2 with new text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene2 \
  --new-text "Updated scene 2 text" \
  --output-dir public/audio/project/

# Regenerate scene3 with a different character
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene3 \
  --character salesperson \
  --output-dir public/audio/project/

# Just regenerate (same text, same character)
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene1 \
  --output-dir public/audio/project/

# Embed a thumbnail into an MP4 video
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/my-video.mp4 \
  --thumbnail public/videos/my-thumbnail.png \
  --output public/videos/my-video-with-thumb.mp4
```


When regenerating a scene, the tool automatically:

- Uses request stitching with the previous scenes to keep prosody consistent
- Updates the `info.json` file with the new metadata
- Updates `scenes.json` when `--new-text` is provided
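The bookkeeping behind `--scene` and `--new-text` can be sketched as a pure function. It replaces one scene's text and collects the scenes before it, whose earlier generations are the natural context for stitching. This is a hypothetical illustration. `prepareRegeneration` is not part of `generate.js`, and this document does not say how the tool stores request IDs between runs.

```typescript
// Hypothetical sketch of what `--scene <id> --new-text "..."` has to do before
// regenerating: replace one scene's text and find the scenes that precede it.
interface SceneEntry {
  id: string;
  text: string;
}

interface RegenerationPlan<T extends SceneEntry> {
  scenes: T[]; // updated copy; the input array is not modified
  precedingSceneIds: string[]; // earlier scenes, oldest first (context for stitching)
}

function prepareRegeneration<T extends SceneEntry>(
  scenes: T[],
  sceneId: string,
  newText?: string,
): RegenerationPlan<T> {
  const index = scenes.findIndex((s) => s.id === sceneId);
  if (index === -1) throw new Error(`scene not found: ${sceneId}`);
  // Without newText the scene keeps its text ("just regenerate").
  const updated = scenes.map((s, i) =>
    i === index && newText !== undefined ? { ...s, text: newText } : s,
  );
  return {
    scenes: updated,
    precedingSceneIds: scenes.slice(0, index).map((s) => s.id),
  };
}

// Example: regenerate scene2 with new text.
const plan = prepareRegeneration(
  [
    { id: "scene1", text: "Generic text-to-speech sounds robotic. Your brand deserves better." },
    { id: "scene2", text: "With voice cloning, you can use your own voice for unlimited content." },
  ],
  "scene2",
  "Updated scene 2 text",
);
console.log(plan.precedingSceneIds); // [ 'scene1' ]
```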

## Thumbnail Embedding

Embed a thumbnail image into MP4 videos so platforms like Twitter, YouTube, and video players display your custom thumbnail instead of the first frame.

### Embed Thumbnail into Video

```bash
# Basic usage (outputs to video-thumb.mp4)
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/promo.mp4 \
  --thumbnail public/videos/thumbnail.png

# Custom output path
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/promo.mp4 \
  --thumbnail public/videos/thumbnail.png \
  --output public/videos/promo-final.mp4
```


### Workflow with Remotion

```bash
# 1. Render your video
npx remotion render MyVideo public/videos/my-video.mp4

# 2. Render your thumbnail (use a Still composition)
npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png

# 3. Embed the thumbnail
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/my-video.mp4 \
  --thumbnail public/videos/my-thumbnail.png \
  --output public/videos/my-video-final.mp4
```


### Supported Formats

- **Video**: MP4 (H.264/H.265)

- **Thumbnail**: PNG, JPG, JPEG

The embedding uses ffmpeg's `-disposition:v:1 attached_pic` flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
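You can reproduce the same attached-picture approach with ffmpeg directly. The argument list below follows the cover-image recipe in the ffmpeg documentation (`-map 0 -map 1 -c copy -c:v:1 png -disposition:v:1 attached_pic`). `thumbnailArgs` is a hypothetical helper, and the flags `generate.js` actually passes may differ.

```typescript
// Sketch: ffmpeg arguments that attach a cover image to an MP4 as an attached picture.
// generate.js may pass different or additional flags.
function thumbnailArgs(video: string, thumbnail: string, output: string): string[] {
  return [
    "-i", video, // input 0: the rendered video (video + audio streams)
    "-i", thumbnail, // input 1: the PNG/JPG cover image
    "-map", "0", // keep every stream from the video
    "-map", "1", // add the image as a second video stream
    "-c", "copy", // copy the original streams without re-encoding
    "-c:v:1", "png", // store the cover image stream as PNG
    "-disposition:v:1", "attached_pic", // mark the second video stream as the thumbnail
    output,
  ];
}

// Run with Node's child_process, for example:
//   spawnSync("ffmpeg", thumbnailArgs("in.mp4", "thumb.png", "out.mp4"), { stdio: "inherit" });
const args = thumbnailArgs(
  "public/videos/promo.mp4",
  "public/videos/thumbnail.png",
  "public/videos/promo-final.mp4",
);
console.log(["ffmpeg", ...args].join(" "));
```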

## Timing Validation

The skill automatically validates timing after generation using `ffprobe`:

### What It Checks

| Check | Threshold | Description |
|-------|-----------|-------------|
| Duration mismatch | >15% | Warns if the actual duration differs from the expected duration |
| Leading silence | >200ms | Audio starts late (voiceover delayed) |
| Trailing silence | >500ms | Unnecessary silence at the end |
| Speaking rate | Outside 2 to 4.5 words/sec | Optimal is ~3 words/sec |
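The checks in the table come down to a few comparisons. The minimal sketch below uses the thresholds above and assumes the measurements are already available in seconds. `timingWarnings` is hypothetical. Computing the rate as words divided by measured duration is also an assumption; the tool may, for instance, exclude silence.

```typescript
// Sketch of the timing checks, using the thresholds from the table above.
interface Measurement {
  expected: number; // seconds, "duration" from scenes.json
  actual: number; // seconds, measured
  leadingSilence: number; // seconds
  trailingSilence: number; // seconds
  words: number;
}

function timingWarnings(m: Measurement): string[] {
  const warnings: string[] = [];
  if (m.expected > 0) {
    const mismatch = Math.abs(m.actual - m.expected) / m.expected;
    if (mismatch > 0.15) warnings.push(`duration off by ${(mismatch * 100).toFixed(0)}%`);
  }
  if (m.leadingSilence > 0.2) warnings.push("leading silence over 200ms");
  if (m.trailingSilence > 0.5) warnings.push("trailing silence over 500ms");
  if (m.actual > 0) {
    // Assumed formula: words over the full measured duration.
    const wps = m.words / m.actual;
    if (wps < 2 || wps > 4.5) warnings.push(`speaking rate ${wps.toFixed(1)} words/sec`);
  }
  return warnings;
}

console.log(
  timingWarnings({ expected: 4.5, actual: 3.0, leadingSilence: 0.05, trailingSilence: 0.1, words: 8 }),
); // one warning: "duration off by 33%"
```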

### Validate Existing Audio

```bash
# Validate all scenes in a project
node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/
```


Example output:

```text
🔍 Validating product-demo (6 scenes)

❌ scene1: 3.00s (expected: 4.5s)
   ❌ Audio 1.50s shorter than expected
   👍 8 words @ 3.1 words/sec

⚠️ scene2: 6.35s (expected: 5.5s)
   ⚠️ Leading silence: 235ms (may start late)
   🐢 10 words @ 1.8 words/sec

✅ scene4: 4.36s (expected: 4s)
   👍 9 words @ 2.3 words/sec

📊 Total duration: 30.80s (expected: 30.00s)
```
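You can reproduce the duration measurement with ffprobe, which can print the container duration as a bare number. The sketch below assumes the standard ffprobe flags shown; the helper names are hypothetical, and this is not necessarily the invocation `generate.js` uses. Parsing is kept separate so it can be tested without ffprobe installed.

```typescript
// Sketch: ask ffprobe for a file's duration only, printed as a bare number of seconds.
function ffprobeDurationArgs(file: string): string[] {
  return [
    "-v", "error", // suppress everything except errors
    "-show_entries", "format=duration", // only the container duration
    "-of", "default=noprint_wrappers=1:nokey=1", // print the value without key or section
    file,
  ];
}

// Parse ffprobe's stdout (for example "3.000000\n") into seconds.
function parseDuration(stdout: string): number {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds)) {
    throw new Error(`unexpected ffprobe output: ${JSON.stringify(stdout)}`);
  }
  return seconds;
}

// Run it with Node's child_process, for example:
//   const out = execFileSync("ffprobe", ffprobeDurationArgs("scene1.mp3"), { encoding: "utf8" });
//   const seconds = parseDuration(out);
console.log(["ffprobe", ...ffprobeDurationArgs("product-demo-scene1.mp3")].join(" "));
```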


### Updated info.json

After validation, `info.json` also records the measured values:

```json
{
  "scenes": [
    {
      "id": "scene1",
      "duration": 4.5,
      "actualDuration": 3.0,
      "leadingSilence": 0.05,
      "wordsPerSecond": 3.1
    }
  ]
}
```


Use `actualDuration` in your Remotion composition for precise sync.
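Instead of copying durations by hand, you can turn the scene entries in `info.json` into frame ranges for `<Sequence>`. The TypeScript helper below is hypothetical and assumes the `duration` and `actualDuration` fields shown above. It rounds cumulative start times rather than each duration, so rounding errors do not build up across scenes.

```typescript
// Hypothetical helper: turn info.json scene entries into frame ranges for <Sequence>.
// Prefers the measured actualDuration and falls back to the expected duration.
interface InfoScene {
  id: string;
  duration: number; // expected seconds
  actualDuration?: number; // measured seconds, present after validation
}

interface SceneTiming {
  id: string;
  from: number; // first frame
  durationInFrames: number;
}

function toTimeline(scenes: InfoScene[], fps: number): SceneTiming[] {
  const timeline: SceneTiming[] = [];
  let elapsed = 0; // seconds since the start of the first scene
  for (const scene of scenes) {
    const seconds = scene.actualDuration ?? scene.duration;
    // Round the start and end positions, not the duration, to avoid drift.
    const from = Math.round(elapsed * fps);
    const to = Math.round((elapsed + seconds) * fps);
    timeline.push({ id: scene.id, from, durationInFrames: to - from });
    elapsed += seconds;
  }
  return timeline;
}

const timeline = toTimeline(
  [
    { id: "scene1", duration: 4.5, actualDuration: 3.0 },
    { id: "scene2", duration: 5.5 },
  ],
  30,
); // scene1: from 0, 90 frames; scene2: from 90, 165 frames
console.log(timeline);
```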

## Options

| Option | Description | Default |
|--------|-------------|---------|
| `--text`, `-t` | Text to convert to speech | Required (unless `--file` or `--scenes` is given) |
| `--file`, `-f` | Read text from a file | - |
| `--output`, `-o` | Output file path | `output.mp3` |
| `--output-dir` | Output directory for scenes | `public/audio` |
| `--voice`, `-v` | Voice name or ID | `George` |
| `--model`, `-m` | Model ID | `eleven_multilingual_v2` |
| `--character`, `-c` | Character preset | `literal` |
| `--scenes` | JSON file with scenes | - |
| `--scene` | ID of a single scene to regenerate | - |
| `--new-text` | New text for the regenerated scene | - |
| `--validate` | Validate an existing audio directory | - |
| `--skip-validation` | Skip automatic validation | `false` |
| `--embed-thumbnail` | Video file to embed the thumbnail into | - |
| `--thumbnail` | Thumbnail image file (PNG/JPG) | - |
| `--stability` | Voice stability (0-1) | Varies by character |
| `--similarity` | Voice similarity (0-1) | Varies by character |
| `--style` | Style exaggeration (0-1) | Varies by character |
| `--no-combined` | Skip the combined file | `false` |
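The "Varies by character" defaults suggest that each preset supplies its own voice settings and the three flags override them. The sketch below shows that merge. It assumes flags take precedence and maps them onto the ElevenLabs `voice_settings` fields `stability`, `similarity_boost`, and `style`. The preset values are passed in because this document does not list them, and `resolveVoiceSettings` is a hypothetical name.

```typescript
// Sketch: merge --stability / --similarity / --style over a character preset's defaults.
// Assumes CLI flags win over preset values; field names follow ElevenLabs voice_settings.
interface VoiceSettings {
  stability: number;
  similarity_boost: number;
  style: number;
}

interface VoiceFlags {
  stability?: number; // --stability
  similarity?: number; // --similarity
  style?: number; // --style
}

// All three flags are documented as 0-1 values.
function checkUnitRange(flag: string, value: number | undefined): void {
  if (value === undefined) return;
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new Error(`--${flag} must be between 0 and 1, got ${value}`);
  }
}

function resolveVoiceSettings(preset: VoiceSettings, flags: VoiceFlags): VoiceSettings {
  checkUnitRange("stability", flags.stability);
  checkUnitRange("similarity", flags.similarity);
  checkUnitRange("style", flags.style);
  return {
    stability: flags.stability ?? preset.stability,
    similarity_boost: flags.similarity ?? preset.similarity_boost,
    style: flags.style ?? preset.style,
  };
}
```

Here `preset` stands in for whatever values the chosen character defines inside `generate.js`.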

## Recommended Voices

| Voice | Style | Best For |
|-------|-------|----------|
| `George` | Warm, captivating British | Narration, explainers |
| `Antoni` | Professional, warm | Legal content, tutorials |
| `Arnold` | Authoritative, deep | Corporate, serious topics |
| `Josh` | Friendly, conversational | Marketing, casual content |

## Integration with Remotion

After generating scene voiceovers, use them in your composition:

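```tsx
import React from "react";
import { Audio, Sequence, staticFile, useVideoConfig } from "remotion";

// Use individual scene audio files for precise sync
const SCENE_DURATIONS = {
  scene1: 4.5, // From info.json (prefer actualDuration when available)
  scene2: 5.5,
  scene3: 8.0,
};

export const VideoWithVoiceover: React.FC = () => {
  const { fps } = useVideoConfig();
  const scene1Frames = Math.round(SCENE_DURATIONS.scene1 * fps);
  const scene2Frames = Math.round(SCENE_DURATIONS.scene2 * fps);

  // Files generated into public/audio/product-demo/; staticFile() paths are relative to public/
  return (
    <>
      <Sequence from={0} durationInFrames={scene1Frames}>
        <Audio src={staticFile("audio/product-demo/product-demo-scene1.mp3")} />
      </Sequence>
      {/* Scene 2 starts on the frame after scene 1 ends */}
      <Sequence from={scene1Frames} durationInFrames={scene2Frames}>
        <Audio src={staticFile("audio/product-demo/product-demo-scene2.mp3")} />
      </Sequence>
    </>
  );
};
```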


## Tips for Best Results

- **Use character presets**: Avoid reading on-screen text literally; use `narrator` or `expert` for a natural flow

- **Punctuation matters**: Use periods for pauses, commas for brief breaks

- **Numbers**: Write out numbers ("five hundred" not "500") for natural speech

- **Abbreviations**: Write full words ("twenty-four hours" not "24h")

- **Scene-by-scene**: Different scenes can have different characters (dramatic intro, calm CTA)

- **Fine-tune**: Use `--scene` to regenerate individual scenes without redoing everything

- **Request stitching**: Keeps voice consistent across all scenes
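You can check the number and abbreviation tips automatically before generating audio. Below is a hypothetical pre-flight helper that flags digit tokens, including ones with an attached unit such as `24h`. It does not catch abbreviations made only of letters.

```typescript
// Hypothetical pre-flight lint: flag digit tokens (optionally with units such as "24h"
// or "50%") that read better written out in words.
const DIGIT_TOKEN = /\d+(?:[.,:]\d+)*[A-Za-z%]*/g;

function scriptWarnings(text: string): string[] {
  const tokens: string[] = text.match(DIGIT_TOKEN) ?? [];
  return tokens.map((token) => `write out "${token}" in words`);
}

// Flags "24h" and "500"; "five hundred" would pass.
console.log(scriptWarnings("Available 24h for 500 users."));
```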

## Workflow Example

```bash
# 1. Create scenes.json with your script

# 2. Generate all scenes with narrator style
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/my-video-scenes.json \
  --character narrator \
  --output-dir public/audio/my-video/

# 3. Preview in Remotion; suppose scene2 starts too early

# 4. Regenerate just scene2 with updated text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/my-video-scenes.json \
  --scene scene2 \
  --new-text "Slightly longer text to fill the visual timing" \
  --output-dir public/audio/my-video/

# 5. Update the video composition with the new duration from info.json

# 6. Repeat until the timing is right
```
