SKILL.md
$27
audio = client.text_to_speech.convert(
text="Welcome to my video!",
voice_id="JBFqnCBsd6RMkjVDRZzb",
model_id="eleven_multilingual_v2",
voice_settings=VoiceSettings(
stability=0.5,
similarity_boost=0.75,
style=0.5,
speed=1.0
)
)
save(audio, "voiceover.mp3")
### Models
| Model | Quality | SSML Support | Notes |
|-------|---------|--------------|-------|
| `eleven_multilingual_v2` | Highest consistency | None | Stable, production-ready, 29 languages |
| `eleven_flash_v2_5` | Good | `<break>`, `<phoneme>` | Fast, supports pause/pronunciation tags |
| `eleven_turbo_v2_5` | Good | `<break>`, `<phoneme>` | Fastest latency |
| `eleven_v3` | Most expressive | None | Alpha — unreliable, needs prompt engineering |
**Choose:** multilingual_v2 for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).
### Voice Settings by Style
| Style | stability | similarity | style | speed |
|-------|-----------|------------|-------|-------|
| Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
| Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
| Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
### Pauses Between Sections
**With flash/turbo models:** Use SSML break tags inline:
...end of section. Start of next...
Max 3 seconds per break. Excessive breaks can cause speed artifacts.
**With multilingual_v2 / v3:** No SSML support. Options:
- Paragraph breaks (blank lines) — creates ~0.3-0.5s natural pause
- Post-process with ffmpeg: split audio and insert silence
**WARNING:** `...` (ellipsis) is NOT a reliable pause — it can be vocalized as a word/sound. Do not use ellipsis as a pause mechanism.
### Pronunciation Control
**Phonetic spelling (any model):** Write words as you want them pronounced:
- `Janus` → `Jan-us`
- `nginx` → `engine-x`
- Use dashes, capitals, apostrophes to guide pronunciation
**SSML phoneme tags (flash/turbo only):**
Janus
### Iterative Workflow
1. Generate → listen → identify pronunciation/pacing issues
2. Adjust: phonetic spellings, break tags, voice settings
3. Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine.
## Voice Cloning
### Instant Voice Clone
with open("sample.mp3", "rb") as f:
voice = client.voices.ivc.create(
name="My Voice",
files=[f],
remove_background_noise=True
)
print(f"Voice ID: {voice.voice_id}")
- Use `client.voices.ivc.create()` (not `client.voices.clone()`)
- Pass file handles in binary mode (`"rb"`), not paths
- Convert m4a first: `ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3`
- Multiple samples (2-3 clips) improve accuracy
- Save voice ID for reuse
**Professional Voice Clone:** Requires Creator plan+, 30+ min audio. See [reference.md](https://github.com/digitalsamba/claude-code-video-toolkit/blob/HEAD/.claude/skills/elevenlabs/reference.md).
## Sound Effects
Max 22 seconds per generation.
result = client.text_to_sound_effects.convert(
text="Thunder rumbling followed by heavy rain",
duration_seconds=10,
prompt_influence=0.3
)
with open("thunder.mp3", "wb") as f:
for chunk in result:
f.write(chunk)
**Prompt tips:** Be specific — "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"
## Music Generation
10 seconds to 5 minutes. Use `client.music.compose()` (not `.generate()`).
result = client.music.compose(
prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
music_length_ms=60000,
force_instrumental=True
)
with open("music.mp3", "wb") as f:
for chunk in result:
f.write(chunk)
**Prompt structure:** Genre, mood, instruments, tempo, use case. Add "no vocals" or use `force_instrumental=True` for background music.
## Remotion Integration
### Complete Workflow: Script to Synchronized Scene
VOICEOVER-SCRIPT.md → voiceover.py → public/audio/ → Remotion composition
↓ ↓ ↓ ↓
Scene narration Generate MP3 Audio files <Audio> component
with durations per scene with timing synced to scenes
### Step 1: Generate Per-Scene Audio
Use the toolkit's voiceover tool to generate audio for each scene:
Generate voiceover files for each scene
python tools/voiceover.py --scene-dir public/audio/scenes --json
Output:
public/audio/scenes/
├── scene-01-title.mp3
├── scene-02-problem.mp3
├── scene-03-solution.mp3
└── manifest.json (durations for each file)
The `manifest.json` contains timing info:
{
"scenes": [
{ "file": "scene-01-title.mp3", "duration": 4.2 },
{ "file": "scene-02-problem.mp3", "duration": 12.8 },
{ "file": "scene-03-solution.mp3", "duration": 15.3 }
],
"totalDuration": 32.3
}
### Step 2: Use Audio in Remotion Composition
// src/Composition.tsx
import { Audio, staticFile, Series, useVideoConfig } from 'remotion';
// Import scene components
import { TitleSlide } from './scenes/TitleSlide';
import { ProblemSlide } from './scenes/ProblemSlide';
import { SolutionSlide } from './scenes/SolutionSlide';
// Scene durations (from manifest.json, converted to frames at 30fps)
const SCENE_DURATIONS = {
title: Math.ceil(4.2 * 30), // 126 frames
problem: Math.ceil(12.8 * 30), // 384 frames
solution: Math.ceil(15.3 * 30), // 459 frames
};
export const MainComposition: React.FC = () => {
return (
<>
{/ Scene sequence /}
<Series>
<Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
<TitleSlide />
</Series.Sequence>
<Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
<ProblemSlide />
</Series.Sequence>
<Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
<SolutionSlide />
</Series.Sequence>
</Series>
{/ Audio track - plays continuously across all scenes /}
<Audio src={staticFile('audio/voiceover.mp3')} volume={1} />
{/ Optional: Background music at lower volume /}
<Audio src={staticFile('audio/music.mp3')} volume={0.15} />
</>
);
};
### Step 3: Per-Scene Audio (Alternative)
For more control, add audio to each scene individually:
// src/scenes/ProblemSlide.tsx
import { Audio, staticFile, useCurrentFrame } from 'remotion';
export const ProblemSlide: React.FC = () => {
const frame = useCurrentFrame();
return (
<div style={{ / slide styles / }}>
<h1>The Problem</h1>
{/ Scene content /}
{/ Audio starts when this scene starts (frame 0 of this sequence) /}
<Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
</div>
);
};
### Syncing Visuals to Voiceover
Calculate scene duration from audio, not the other way around:
// src/config/timing.ts
import manifest from '../../public/audio/scenes/manifest.json';
const FPS = 30;
// Convert audio durations to frame counts
export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
acc[name] = Math.ceil(scene.duration * FPS);
return acc;
}, {} as Record<string, number>);
// Usage in composition:
// <Series.Sequence durationInFrames={sceneDurations.title}>
### Audio Timing Patterns
import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';
// Fade in audio
export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
src,
fadeFrames = 30
}) => {
const frame = useCurrentFrame();
const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
extrapolateRight: 'clamp',
});
return <Audio src={src} volume={volume} />;
};
// Delayed audio start
export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
src,
delayFrames
}) => (
<Sequence from={delayFrames}>
<Audio src={src} />
</Sequence>
);
// Usage:
// <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
// <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
### Voiceover + Demo Video Sync
When a scene has both voiceover and demo video:
import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';
export const DemoScene: React.FC = () => {
const { durationInFrames, fps } = useVideoConfig();
// Calculate playback rate to fit demo into voiceover duration
const demoDuration = 45; // seconds (original demo length)
const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
const playbackRate = demoDuration / sceneDuration;
return (
<>
<OffthreadVideo
src={staticFile('demos/feature-demo.mp4')}
playbackRate={playbackRate}
/>
<Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
</>
);
};
### Error Handling
import { Audio, staticFile, delayRender, continueRender } from 'remotion';
import { useEffect, useState } from 'react';
export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
const [handle] = useState(() => delayRender());
const [audioReady, setAudioReady] = useState(false);
useEffect(() => {
const audio = new window.Audio(src);
audio.oncanplaythrough = () => {
setAudioReady(true);
continueRender(handle);
};
audio.onerror = () => {
console.error(Failed to load audio: ${src});
continueRender(handle); // Continue without audio rather than hang
};
}, [src, handle]);
if (!audioReady) return null;
return <Audio src={src} />;
};
### Toolkit Command: /generate-voiceover
The `/generate-voiceover` command handles the full workflow:
/generate-voiceover
- Reads VOICEOVER-SCRIPT.md
- Extracts narration for each scene
- Generates audio via ElevenLabs API
- Saves to public/audio/scenes/
- Creates manifest.json with durations
- Updates project.json with timing info