speakturbo-tts

Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.

  • Delivers audio in approximately 90ms after daemon warmup; the first run takes 2-5 seconds for model initialization.
  • Includes 8 pre-configured voices (alba, marius, javert, jean, fantine, cosette, eponine, azelma) accessible via simple command-line flags.
  • Supports file output with configurable directory allowlisting, quiet mode, and UTF-8 text input including long-form content.
  • Auto-starting daemon with 1-hour idle shutdown; use the speak skill instead for voice cloning and emotion tags.

INSTALLATION
npx skills add https://github.com/emzod/speak-turbo --skill speakturbo-tts
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Subsequent calls are ~90ms to first sound.

# First run (slow - daemon starting)

speakturbo "Starting up"  # ~2-5 seconds

# Second run (fast - daemon already running)

speakturbo "Now I'm fast"  # ~90ms
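
Because only the first call pays the startup cost, one option is to pre-warm the daemon before audio is actually needed. The following is a minimal sketch, not part of speakturbo itself: `warm_up_tts` is a hypothetical helper name, and it assumes the `-o` and `-q` flags shown in the Usage section can be combined.

```shell
# Sketch: pre-warm the daemon so the first user-facing call is fast.
# warm_up_tts is a hypothetical helper; combining -o and -q is an assumption.
warm_up_tts() {
    # Writing to /tmp (allowlisted by default) means nothing is played aloud.
    speakturbo "warm up" -o /tmp/speakturbo-warmup.wav -q
}
```

Calling `warm_up_tts` once at the start of a session (for example in an agent setup step) should leave later calls on the fast path until the daemon idles out.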

Usage

# Basic - plays immediately (default voice: alba)

speakturbo "Hello world"

# Save to file (no audio playback)

speakturbo "Hello" -o output.wav

# Save to specific file

speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)

speakturbo "Hello" -q

# List available voices

speakturbo --list-voices

Available Voices

| Voice | Type |
|---|---|
| alba | Female (default) |
| marius | Male |
| javert | Male |
| jean | Male |
| fantine | Female |
| cosette | Female |
| eponine | Female |
| azelma | Female |
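
Scripts that accept a voice name from elsewhere may want to validate it before calling the CLI. The sketch below hard-codes the eight names from the table above; `BUILTIN_VOICES` and `is_builtin_voice` are hypothetical names, and the list should be treated as an assumption to re-check against `speakturbo --list-voices`.

```shell
# Names copied from the voice table; re-check against `speakturbo --list-voices`.
BUILTIN_VOICES="alba marius javert jean fantine cosette eponine azelma"

# is_builtin_voice NAME: exit status 0 if NAME is one of the built-in voices.
is_builtin_voice() {
    # Reject empty names and names containing spaces up front.
    case "$1" in
        ""|*" "*) return 1 ;;
    esac
    case " $BUILTIN_VOICES " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, `is_builtin_voice "$name" || name=alba` falls back to the default voice when the requested one is unknown.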

Performance

| Metric | Value |
|---|---|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
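
To make these figures concrete, the arithmetic can be sketched in shell. Both helper names are hypothetical, and the duration estimate assumes 16-bit PCM samples with a 44-byte WAV header, which the source does not state.

```shell
# wav_duration_ms BYTES: approximate duration in ms of a 24 kHz mono WAV file.
# Assumes 16-bit PCM (2 bytes per sample) and a 44-byte header; neither is
# confirmed by the documentation.
wav_duration_ms() {
    echo $(( ($1 - 44) * 1000 / (24000 * 2) ))
}

# synthesis_time_ms AUDIO_MS: rough generation time at ~4x faster than real time.
synthesis_time_ms() {
    echo $(( $1 / 4 ))
}
```

Under those assumptions, a 480044-byte file holds about 10 seconds of audio, which would take roughly 2.5 seconds to generate in full; playback still starts after ~90ms because the audio streams as it is produced.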

Architecture

speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)

Text Input

  • Encoding: UTF-8
  • Quotes in text: Use escaping: speakturbo "She said \"hello\""
  • Long text: Supported, streams as it generates
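
For long-form text stored in a file, one approach is to pass the file contents as the single text argument, as in the examples above. This is a sketch with a hypothetical helper name (`speak_file`); whether speakturbo can also read text from stdin is not documented here, so the sketch does not rely on it.

```shell
# Single quotes avoid backslash escaping for text containing double quotes:
#   speakturbo 'She said "hello"'

# speak_file PATH: speak the contents of a UTF-8 text file.
# Hypothetical helper; the text is passed as one command-line argument.
speak_file() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    # Command substitution strips trailing newlines from the file contents.
    speakturbo "$(cat -- "$1")"
}
```

Very large files can exceed the operating system's limit on argument length, so splitting the text into chapters or paragraphs first may be necessary.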

Output Path Security

The -o flag only writes to directories that are on the allowlist. By default, these are:

  • /tmp and system temp directories
  • Your current working directory
  • ~/.speakturbo/

If you need to write elsewhere, use --allow-dir:

speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path

To permanently allow a directory, add it to ~/.speakturbo/config:

mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config

The config file is one directory per line. Lines starting with # are comments.
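
Appending with `echo >>` adds a duplicate line each time it runs. The sketch below adds a directory only if it is not already listed and prints the effective entries while skipping comments and blank lines. `CONFIG_FILE`, `list_allowed_dirs`, and `allow_dir_permanently` are hypothetical names used only by this sketch; speakturbo itself is not assumed to read any such variable.

```shell
# Config path from the section above; CONFIG_FILE is a variable local to this
# sketch (it can be overridden for testing) and is not read by speakturbo.
CONFIG_FILE="${CONFIG_FILE:-$HOME/.speakturbo/config}"

# list_allowed_dirs: print configured directories, skipping comments and blanks.
list_allowed_dirs() {
    [ -f "$CONFIG_FILE" ] || return 0
    grep -v -e '^#' -e '^[[:space:]]*$' "$CONFIG_FILE" || true
}

# allow_dir_permanently DIR: append DIR unless it is already listed.
allow_dir_permanently() {
    mkdir -p "$(dirname "$CONFIG_FILE")"
    if ! list_allowed_dirs | grep -qxF -- "$1"; then
        printf '%s\n' "$1" >> "$CONFIG_FILE"
    fi
}
```

Running `allow_dir_permanently /custom/path` twice leaves a single `/custom/path` line in the file.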

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
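
In a script, the exit code can be checked directly. The wrapper below is a sketch with a hypothetical name (`speak_checked`): it passes the exit status through and, on failure, points at the daemon log path mentioned under Daemon Management.

```shell
# speak_checked TEXT: run speakturbo quietly and report a non-zero exit.
# Hypothetical wrapper; the hint message is illustrative only.
speak_checked() {
    speakturbo "$1" -q && return 0
    status=$?
    echo "speakturbo exited with $status; see /tmp/speakturbo.log" >&2
    return "$status"
}
```

A caller can then write `speak_checked "Build finished" || echo "no audio feedback available"` and carry on without sound.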

When to Use

Use speakturbo when:

  • You need instant audio feedback (~90ms)
  • Speed matters more than voice variety
  • Built-in voices are sufficient

Use speak instead when:

  • You need custom voice cloning (Morgan Freeman, etc.)

speak "text" --voice ~/.chatter/voices/morgan_freeman.wav

  • You need emotion tags like [laugh], [sigh]
  • Quality/variety matters more than speed

See the speak skill documentation for full usage.
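
The decision above can be sketched as a small dispatcher for agents that have both skills installed. `choose_tts_tool` is a hypothetical helper, and treating any bracketed lowercase word as an emotion tag is an assumption about how such tags look, based on the [laugh] and [sigh] examples.

```shell
# choose_tts_tool TEXT [VOICE_WAV]: print "speak" or "speakturbo".
choose_tts_tool() {
    # A custom voice sample implies voice cloning, which only speak offers.
    if [ -n "${2:-}" ]; then
        echo speak
        return
    fi
    # Bracketed lowercase words such as [laugh] are treated as emotion tags.
    if printf '%s' "$1" | grep -q '\[[a-z][a-z]*]'; then
        echo speak
    else
        echo speakturbo
    fi
}
```

For example, `"$(choose_tts_tool "$text")" "$text"` runs whichever tool the text calls for.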

Troubleshooting

No audio plays:

# Check daemon is running

curl http://127.0.0.1:7125/health

# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually

speakturbo "test" -o /tmp/test.wav

afplay /tmp/test.wav  # macOS

aplay /tmp/test.wav   # Linux
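
The save-then-play check can be wrapped so it picks the player for the current platform. This is a sketch with hypothetical helper names (`wav_player`, `verify_audio`); it only knows the two players named above.

```shell
# wav_player: print the playback command for the current OS.
wav_player() {
    case "$(uname -s)" in
        Darwin) echo afplay ;;
        Linux)  echo aplay ;;
        *)      echo "no known player for $(uname -s)" >&2; return 1 ;;
    esac
}

# verify_audio: synthesise to a file, then play it outside speakturbo.
verify_audio() {
    player=$(wav_player) || return 1
    speakturbo "test" -o /tmp/test.wav && "$player" /tmp/test.wav
}
```

If `verify_audio` produces sound but plain `speakturbo "test"` does not, the problem is more likely in audio playback than in synthesis.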

Daemon won't start:

# Check port availability

lsof -i :7125

# Manually kill and restart

pkill -f "daemon_streaming"

speakturbo "test"  # Auto-restarts daemon

First run is slow:

This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).

Daemon Management

The daemon auto-starts on first use and auto-shuts down after 1 hour idle.

# Check status

curl http://127.0.0.1:7125/health

# Manual stop

pkill -f "daemon_streaming"

# View logs

cat /tmp/speakturbo.log
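
After a restart it can help to wait until the health endpoint reports ready before sending more text. The following is a minimal sketch with hypothetical helper names (`daemon_ready`, `wait_for_daemon`); it assumes the compact JSON shape shown in the Troubleshooting section and only polls, so the daemon must already have been started by a speakturbo call.

```shell
HEALTH_URL="http://127.0.0.1:7125/health"

# daemon_ready: succeed if the health endpoint reports "status":"ready".
# Assumes compact JSON without spaces, as in the expected output above.
daemon_ready() {
    curl -s --max-time 2 "$HEALTH_URL" 2>/dev/null | grep -q '"status":"ready"'
}

# wait_for_daemon [TRIES]: poll once per second, 10 tries by default.
wait_for_daemon() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        daemon_ready && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_daemon 5 || cat /tmp/speakturbo.log` prints the log when the daemon does not become ready within about five seconds.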

Comparison with speak

| Feature | speakturbo | speak |
|---|---|---|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | No | Yes |
| Emotion tags | No | Yes |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |
