The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Subsequent calls take ~90ms to first sound.

```shell
# First run (slow - daemon starting)
speakturbo "Starting up"    # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"   # ~90ms
```
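If the startup cost of the first call matters, for example on the first notification of a session, you can pay it in advance. The helper below is a hypothetical sketch and is not part of speakturbo. It synthesizes a throwaway phrase to a temporary file using the documented `-o` and `-q` flags. That starts the daemon without playing anything. `SPEAKTURBO_BIN` is an assumed override, included only so the function can be exercised with a stub.

```shell
# Hypothetical pre-warm helper; not part of speakturbo itself.
# SPEAKTURBO_BIN is an assumed override (defaults to the real CLI).
warm_speakturbo() {
  local bin="${SPEAKTURBO_BIN:-speakturbo}"
  local dir status
  # /tmp is on the default -o allowlist, so no --allow-dir is needed.
  dir="$(mktemp -d /tmp/speakturbo-warm.XXXXXX)" || return 1
  # -o skips playback and -q hides status output; the call still starts
  # the daemon and loads the model if it is not already running.
  "$bin" "warming up" -o "$dir/warm.wav" -q < /dev/null
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

One way to use it is to run `warm_speakturbo &` at session start, so that later calls hit an already-warm daemon.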
## Usage

```shell
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices
```
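To turn a script of lines into separate clips, a plain loop over the documented `-o` and `-q` flags is enough. This sketch makes several assumptions:

- The input holds one utterance per line, and blank lines are skipped.
- The `batch_speak` name, the `clip_NNN.wav` naming and the `SPEAKTURBO_BIN` override are illustrative choices, not speakturbo features.
- The output directory is on the `-o` allowlist, such as the current working directory.

```shell
# Hypothetical batch helper: one WAV per non-empty line of a text file.
# Usage: batch_speak lines.txt [outdir]   (prints the number of clips)
batch_speak() {
  local input="$1" outdir="${2:-.}"
  local bin="${SPEAKTURBO_BIN:-speakturbo}"
  local n=0 line
  # IFS= and -r keep spaces and backslashes intact; the extra test
  # also handles a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    n=$((n + 1))
    # "$line" is one argument, so embedded quotes need no escaping.
    # </dev/null stops the command from consuming the loop's input.
    "$bin" "$line" -o "$(printf '%s/clip_%03d.wav' "$outdir" "$n")" -q < /dev/null || return 1
  done < "$input"
  echo "$n"
}
```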
## Available Voices

| Voice | Type |
|-------|------|
| alba | Female (default) |
| marius | Male |
| javert | Male |
| jean | Male |
| fantine | Female |
| cosette | Female |
| eponine | Female |
| azelma | Female |
## Performance

| Metric | Value |
|--------|-------|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
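The table's figures support quick back-of-the-envelope estimates. Reading the ~4x real-time factor as generation running about four times faster than playback, a clip of S seconds needs roughly S/4 seconds of synthesis on a warm daemon. At 24kHz mono, the same clip contains 24,000 × S samples. The helpers below are illustrative arithmetic only. Bit depth is not documented, so they do not estimate file size.

```shell
# Illustrative arithmetic based on the Performance table (not a speakturbo API).
SAMPLE_RATE_HZ=24000   # 24kHz mono
RTF=4                  # ~4x faster than real time (approximate)

# Number of mono samples in an N-second clip.
samples_for() { echo $(( $1 * SAMPLE_RATE_HZ )); }

# Approximate synthesis time in milliseconds for an N-second clip.
synth_ms_for() { echo $(( $1 * 1000 / RTF )); }

samples_for 10    # prints 240000
synth_ms_for 10   # prints 2500
```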
## Architecture

```text
speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)
```
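Because the CLI talks to the daemon over HTTP on port 7125, a script can check readiness itself through the `/health` endpoint used in the troubleshooting commands. This is a sketch under assumptions:

- The response contains `"status":"ready"` exactly as in the documented example, with no spaces.
- The helper names and the 10-second default timeout are hypothetical.

The helper only waits. It does not start the daemon; the CLI does that on first use.

```shell
# Hypothetical readiness helpers built on the documented /health endpoint.
SPEAKTURBO_HEALTH_URL="${SPEAKTURBO_HEALTH_URL:-http://127.0.0.1:7125/health}"

# Reads a /health JSON body on stdin; succeeds if it reports ready.
# Assumes the compact format shown in the docs: {"status":"ready",...}
health_is_ready() {
  grep -q '"status":"ready"'
}

# Polls /health about once per second until ready or the timeout (seconds).
wait_for_speakturbo() {
  local timeout="${1:-10}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if curl -fsS --max-time 1 "$SPEAKTURBO_HEALTH_URL" 2>/dev/null | health_is_ready; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```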
## Text Input

- Encoding: UTF-8
- Quotes in text: escape inner double quotes with a backslash:
  ```shell
  speakturbo "She said \"hello\""
  ```
- Long text: supported; audio streams as it is generated.
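When the text comes from a variable or a file rather than a literal, quoting the expansion is enough. The shell passes the contents as a single argument, embedded double quotes included, so no manual escaping is needed. This is a minimal sketch. `speak_file` and the `SPEAKTURBO_BIN` override are hypothetical names, and `$(...)` drops trailing newlines from the file.

```shell
# Hypothetical helper: speak the contents of a UTF-8 text file.
speak_file() {
  local bin="${SPEAKTURBO_BIN:-speakturbo}"
  local text
  text="$(cat -- "$1")" || return 1
  # "$text" is a single argument; inner quotes need no backslashes here.
  "$bin" "$text"
}
```

For example, `speak_file notes.txt`. For a variable, `msg='She said "hello"'; speakturbo "$msg"` has the same effect as the escaped literal.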
## Output Path Security

The `-o` flag only writes to directories that are on the allowlist. By default, these are:

- `/tmp` and system temp directories
- Your current working directory
- `~/.speakturbo/`

If you need to write elsewhere, use `--allow-dir`:

```shell
speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path
```

To permanently allow a directory, add it to `~/.speakturbo/config`:

```shell
mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config
```

The config file lists one directory per line. Lines starting with `#` are comments.
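Because the config format is plain text, scripts can maintain it. The sketch below appends a directory only if an identical line is not already present. That avoids the duplicates that repeated `echo >>` calls produce. It also prints the effective entries. Two things are assumptions:

- Blank lines are treated as ignorable.
- The function names and the `SPEAKTURBO_CONFIG` override are hypothetical.

```shell
# Hypothetical helpers for ~/.speakturbo/config (one directory per line).
SPEAKTURBO_CONFIG="${SPEAKTURBO_CONFIG:-$HOME/.speakturbo/config}"

# Print configured directories, skipping '#' comment lines and blank lines.
list_allowed_dirs() {
  [ -f "$SPEAKTURBO_CONFIG" ] || return 0
  grep -v -e '^#' -e '^[[:space:]]*$' "$SPEAKTURBO_CONFIG"
}

# Append a directory unless an identical line already exists.
allow_dir() {
  mkdir -p "$(dirname "$SPEAKTURBO_CONFIG")" || return 1
  if ! grep -qxF -- "$1" "$SPEAKTURBO_CONFIG" 2>/dev/null; then
    printf '%s\n' "$1" >> "$SPEAKTURBO_CONFIG"
  fi
}
```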
## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
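Since failures such as an unreachable daemon exit non-zero, scripts can branch on the exit code. Below is a sketch of a notifier that never blocks on speech: if speakturbo fails, it prints the message instead. The `notify` name and the `SPEAKTURBO_BIN` override are hypothetical.

```shell
# Hypothetical notifier: try speech, fall back to text on a non-zero exit.
notify() {
  local bin="${SPEAKTURBO_BIN:-speakturbo}"
  if "$bin" "$1" -q; then
    return 0
  fi
  # Any non-zero code lands here: 1 per the table above,
  # or 127 from the shell if the binary is not installed.
  printf 'speakturbo failed; message: %s\n' "$1" >&2
  return 1
}
```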
## When to Use

**Use speakturbo when:**
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient

**Use speak instead when:**
- You need custom voice cloning (Morgan Freeman, etc.):
  ```shell
  speak "text" --voice ~/.chatter/voices/morgan_freeman.wav
  ```
- You need emotion tags like `[laugh]` or `[sigh]`
- Quality/variety matters more than speed

See the speak skill documentation for full usage.
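The voice-cloning rule above can be folded into one entry point. This sketch routes to `speak` only when a reference voice file is given, using the `--voice` form shown for speak. Otherwise it calls speakturbo for low latency. It does not choose among speakturbo's built-in voices. The `say_text` name and the `SPEAK_BIN`/`SPEAKTURBO_BIN` overrides are hypothetical.

```shell
# Hypothetical router between the two tools.
# Usage: say_text "text" [voice.wav]
say_text() {
  local text="$1" voice_file="${2:-}"
  if [ -n "$voice_file" ]; then
    # Cloned voice requested: only speak supports this (~4-8s to first sound).
    "${SPEAK_BIN:-speak}" "$text" --voice "$voice_file"
  else
    # A built-in voice is fine: speakturbo (~90ms to first sound when warm).
    "${SPEAKTURBO_BIN:-speakturbo}" "$text"
  fi
}
```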
## Troubleshooting

### No audio plays

```shell
# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav   # macOS
aplay /tmp/test.wav    # Linux
```
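To make the manual playback check portable, a script can choose the player by operating system. `uname -s` reports `Darwin` on macOS and `Linux` on Linux. The `player_for` and `play_wav` helpers are hypothetical, and they assume `afplay` or `aplay` is installed, as in the commands above.

```shell
# Hypothetical helpers: pick the playback command for the current OS.
player_for() {
  case "$1" in
    Darwin) echo afplay ;;   # macOS
    Linux)  echo aplay ;;    # ALSA on Linux
    *)      return 1 ;;
  esac
}

play_wav() {
  local os player
  os="$(uname -s)"
  player="$(player_for "$os")" || { echo "no known player for $os" >&2; return 1; }
  "$player" "$1"
}
```

For example: `speakturbo "test" -o /tmp/test.wav && play_wav /tmp/test.wav`.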
### Daemon won't start

```shell
# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"   # Auto-restarts daemon
```
### First run is slow
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).
## Daemon Management

The daemon starts automatically on first use and shuts down after 1 hour of inactivity.

```shell
# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
```
## Comparison with speak

| Feature | speakturbo | speak |
|---------|------------|-------|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |