SKILL.md
$27
Regenerate a single scene
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"
List available voices and character presets
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters
## Character Presets
Use character presets for more natural voiceovers instead of literal screen text reading:
| Character | Description | Best For |
|-----------|-------------|----------|
| `literal` | Reads text exactly as written | Screen text, quotes |
| `narrator` | Professional storyteller, smooth, engaging | Explainers, documentaries |
| `salesperson` | Enthusiastic, persuasive, energetic | Marketing, ads |
| `expert` | Authoritative, confident, knowledgeable | Legal content, tutorials |
| `conversational` | Casual, friendly, natural | Social media, casual content |
| `dramatic` | Intense, emotional, impactful | Hooks, problem statements |
| `calm` | Soothing, reassuring, gentle | Trust-building, conclusions |
Use narrator style globally
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/
Or set per-scene in scenes.json
{
"scenes": [
{ "id": "scene1", "text": "Problem statement", "character": "dramatic" },
{ "id": "scene2", "text": "Solution", "character": "calm" }
]
}
## Scene-Based Generation with Request Stitching
Generate multiple scenes with consistent prosody using ElevenLabs request stitching:
### scenes.json Format
{
"name": "product-demo",
"voice": "George",
"character": "narrator",
"scenes": [
{
"id": "scene1",
"text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
"duration": 4.5,
"character": "dramatic"
},
{
"id": "scene2",
"text": "With voice cloning, you can use your own voice for unlimited content.",
"duration": 5.5
},
{
"id": "scene3",
"text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
"duration": 6,
"delay": 0.3
}
]
}
### Generate All Scenes
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/product-demo-scenes.json \
--output-dir public/audio/product-demo/
This creates:
- `product-demo-scene1.mp3` through `sceneN.mp3`
- `product-demo-combined.mp3` (all scenes stitched)
- `product-demo-info.json` (metadata with durations)
### Single Scene Regeneration
If a scene starts too early, has wrong timing, or needs different text:
Regenerate scene2 with new text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene2 \
--new-text "Updated scene 2 text" \
--output-dir public/audio/project/
Regenerate scene3 with different character
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene3 \
--character salesperson \
--output-dir public/audio/project/
Just regenerate (same text, same character)
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene1 \
--output-dir public/audio/project/
Embed a thumbnail into an MP4 video
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/my-video.mp4 \
--thumbnail public/videos/my-thumbnail.png \
--output public/videos/my-video-with-thumb.mp4
The tool automatically:
- Uses request stitching from previous scenes for consistent prosody
- Updates the info.json file with new metadata
- Updates scenes.json if `--new-text` is provided
## Thumbnail Embedding
Embed a thumbnail image into MP4 videos so platforms like Twitter, YouTube, and video players display your custom thumbnail instead of the first frame.
### Embed Thumbnail into Video
Basic usage - outputs to video-thumb.mp4
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/promo.mp4 \
--thumbnail public/videos/thumbnail.png
Custom output path
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/promo.mp4 \
--thumbnail public/videos/thumbnail.png \
--output public/videos/promo-final.mp4
### Workflow with Remotion
1. Render your video
npx remotion render MyVideo public/videos/my-video.mp4
2. Render your thumbnail (use Still composition)
npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png
3. Embed the thumbnail
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/my-video.mp4 \
--thumbnail public/videos/my-thumbnail.png \
--output public/videos/my-video-final.mp4
### Supported Formats
- **Video**: MP4 (H.264/H.265)
- **Thumbnail**: PNG, JPG, JPEG
The embedding uses ffmpeg's `-disposition:v:1 attached_pic` flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
## Timing Validation
The skill automatically validates timing after generation using `ffprobe`:
### What It Checks
Check
Threshold
Description
Duration mismatch
>15%
Warns if actual differs from expected duration
Leading silence
>200ms
Audio starts late (voiceover delayed)
Trailing silence
>500ms
Unnecessary silence at end
Speaking rate
2-4.5 wps
Optimal ~3 words/second
### Validate Existing Audio
Validate all scenes in a project
node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/
Output example:
🔍 Validating product-demo (6 scenes)
❌ scene1: 3.00s (expected: 4.5s)
❌ Audio 1.50s shorter than expected
👍 8 words @ 3.1 words/sec
⚠️ scene2: 6.35s (expected: 5.5s)
⚠️ Leading silence: 235ms (may start late)
🐢 10 words @ 1.8 words/sec
✅ scene4: 4.36s (expected: 4s)
👍 9 words @ 2.3 words/sec
📊 Total duration: 30.80s (expected: 30.00s)
### Updated info.json
After validation, the info.json includes actual measurements:
{
"scenes": [
{
"id": "scene1",
"duration": 4.5,
"actualDuration": 3.0,
"leadingSilence": 0.05,
"wordsPerSecond": 3.1
}
]
}
Use `actualDuration` in your Remotion composition for precise sync.
## Options
Option
Description
Default
`--text`, `-t`
Text to convert to speech
Required (or --file/--scenes)
`--file`, `-f`
Read text from file
-
`--output`, `-o`
Output file path
`output.mp3`
`--output-dir`
Output directory for scenes
`public/audio`
`--voice`, `-v`
Voice name or ID
`George`
`--model`, `-m`
Model ID
`eleven_multilingual_v2`
`--character`, `-c`
Character preset
`literal`
`--scenes`
JSON file with scenes
-
`--scene`
Regenerate single scene ID
-
`--new-text`
New text for scene regen
-
`--validate`
Validate existing audio dir
-
`--skip-validation`
Skip auto-validation
false
`--embed-thumbnail`
Video file to embed thumbnail into
-
`--thumbnail`
Thumbnail image file (PNG/JPG)
-
`--stability`
Voice stability (0-1)
varies by character
`--similarity`
Voice similarity (0-1)
varies by character
`--style`
Style exaggeration (0-1)
varies by character
`--no-combined`
Skip combined file
false
## Recommended Voices
Voice
Style
Best For
`George`
Warm, captivating British
Narration, explainers
`Antoni`
Professional, warm
Legal content, tutorials
`Arnold`
Authoritative, deep
Corporate, serious topics
`Josh`
Friendly, conversational
Marketing, casual content
## Integration with Remotion
After generating scene voiceovers, use them in your composition:
import { Audio, Sequence, staticFile } from "remotion";
// Use individual scene audio files for precise sync
const SCENE_DURATIONS = {
scene1: 4.5, // From info.json
scene2: 5.5,
scene3: 8.0,
};
export const VideoWithVoiceover: React.FC = () => {
const { fps } = useVideoConfig();
const scene1Frames = Math.round(SCENE_DURATIONS.scene1 * fps);
const scene2Frames = Math.round(SCENE_DURATIONS.scene2 * fps);
return (
<>
<Sequence from={0} durationInFrames={scene1Frames}>
<Audio src={staticFile("audio/project/project-scene1.mp3")} />
<Scene1Visual />
</Sequence>
<Sequence from={scene1Frames} durationInFrames={scene2Frames}>
<Audio src={staticFile("audio/project/project-scene2.mp3")} />
<Scene2Visual />
</Sequence>
</>
);
};
## Tips for Best Results
- **Use character presets**: Don't read screen text literally - use `narrator` or `expert` for natural flow
- **Punctuation matters**: Use periods for pauses, commas for brief breaks
- **Numbers**: Write out numbers ("five hundred" not "500") for natural speech
- **Abbreviations**: Write full words ("twenty-four hours" not "24h")
- **Scene-by-scene**: Different scenes can have different characters (dramatic intro, calm CTA)
- **Fine-tune**: Use `--scene` to regenerate individual scenes without redoing everything
- **Request stitching**: Keeps voice consistent across all scenes
## Workflow Example
1. Create scenes.json with your script
2. Generate all scenes with narrator style
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/my-video-scenes.json \
--character narrator \
--output-dir public/audio/my-video/
3. Preview in Remotion, notice scene2 starts too early
4. Regenerate just scene2 with updated text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/my-video-scenes.json \
--scene scene2 \
--new-text "Slightly longer text to fill the visual timing" \
--output-dir public/audio/my-video/