nightingale-karaoke

ML-powered Karaoke app in Rust using Bevy, WhisperX, and Demucs for stem separation, lyrics transcription, and pitch scoring.

INSTALLATION
npx skills add https://github.com/aradotso/trending-skills --skill nightingale-karaoke
Run in your project or agent environment. Adjust flags if your CLI version differs.


Build from Source

Prerequisites:

  • Rust 1.85+ (edition 2024)
  • Linux additionally needs: libasound2-dev libudev-dev libwayland-dev libxkbcommon-dev

git clone https://github.com/rzru/nightingale
cd nightingale

# Release build (optimized)
cargo build --release

# Run directly
./target/release/nightingale

Release Packaging

# Linux / macOS
scripts/make-release.sh

# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -File scripts/make-release.ps1

Outputs a .tar.gz (Linux/macOS) or .zip (Windows) ready for distribution.

First Launch / Bootstrap

On first run, Nightingale downloads and configures:

  • ffmpeg binary
  • uv (Python package manager)
  • Python 3.10 via uv
  • PyTorch + WhisperX + audio-separator in a virtual environment
  • UVR Karaoke ONNX model and WhisperX large-v3 model

This takes 2–10 minutes depending on network speed. A progress screen is shown in-app.

To force re-bootstrap at any time:

./nightingale --setup

Bootstrap completion is marked by ~/.nightingale/vendor/.ready.
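The marker file makes it easy to check bootstrap state from a helper tool. The following is a minimal sketch, not Nightingale's own code: the function names are hypothetical, and the home directory is passed in as a parameter so no extra crate is needed.

```rust
use std::path::{Path, PathBuf};

/// Path of the bootstrap marker under a given home directory.
/// Hypothetical helper; mirrors the documented ~/.nightingale/vendor/.ready location.
fn ready_marker(home: &Path) -> PathBuf {
    home.join(".nightingale").join("vendor").join(".ready")
}

/// True when the marker exists as a regular file, i.e. bootstrap has completed.
fn is_bootstrapped(home: &Path) -> bool {
    ready_marker(home).is_file()
}
```

If this returns false, running ./nightingale --setup is the documented way to redo the bootstrap.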

CLI Flags

| Flag | Description |
|------|-------------|
| --setup | Force a re-run of the first-launch bootstrap (re-downloads vendor dependencies) |
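For wrappers or launch scripts written in Rust, detecting the flag takes only a few lines. This is an illustrative sketch using the standard library only; it is not how Nightingale itself parses arguments, and `setup_requested` is a hypothetical name.

```rust
/// Returns true if `--setup` appears among the arguments.
/// The first item is treated as the program name and skipped.
fn setup_requested<I, S>(args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    args.into_iter().skip(1).any(|arg| arg.as_ref() == "--setup")
}
```

In a real binary this would typically be called as `setup_requested(std::env::args())`.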

Keyboard & Gamepad Controls

Navigation

| Action | Keyboard | Gamepad |
|--------|----------|---------|
| Move | Arrow keys | D-pad / Left stick |
| Confirm | Enter | A (South) |
| Back | Escape | B (East) / Start |
| Switch panel | Tab | |
| Search | Type to filter | |

Playback

| Action | Keyboard | Gamepad |
|--------|----------|---------|
| Pause / Resume | Space | Start |
| Exit to menu | Escape | B (East) |
| Toggle guide vocals | G | |
| Guide volume up/down | + / - | |
| Cycle background | T | |
| Cycle video flavor | F | |
| Toggle microphone | M | |
| Next microphone | N | |
| Toggle fullscreen | F11 | |

Configuration

Main Config

Located at ~/.nightingale/config.json. Edit directly or via in-app settings.

{
  "music_folder": "/home/user/Music",
  "separator": "uvr",
  "guide_vocal_volume": 0.3,
  "background_theme": "plasma",
  "video_flavor": "nature",
  "default_profile": "Alice"
}

**separator options:** "uvr" (default, preserves backing vocals) | "demucs"

**background_theme options:** "plasma", "aurora", "waves", "nebula", "starfield", "video", "source_video"

**video_flavor options:** "nature", "underwater", "space", "city", "countryside"
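When editing config.json by hand or from a script, it can help to validate the option strings before writing them. The sketch below covers background_theme using the values listed above; the enum and its variant names are hypothetical and are not taken from the Nightingale source.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the documented background_theme values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackgroundTheme {
    Plasma,
    Aurora,
    Waves,
    Nebula,
    Starfield,
    Video,
    SourceVideo,
}

impl FromStr for BackgroundTheme {
    type Err = String;

    /// Accepts exactly the strings documented for config.json.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "plasma" => Ok(Self::Plasma),
            "aurora" => Ok(Self::Aurora),
            "waves" => Ok(Self::Waves),
            "nebula" => Ok(Self::Nebula),
            "starfield" => Ok(Self::Starfield),
            "video" => Ok(Self::Video),
            "source_video" => Ok(Self::SourceVideo),
            other => Err(format!("unknown background_theme: {other}")),
        }
    }
}
```

The same pattern would apply to video_flavor and separator.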

Profiles

Located at ~/.nightingale/profiles.json:

{
  "profiles": [
    {
      "name": "Alice",
      "scores": {
        "blake3_hash_of_song": {
          "stars": 4,
          "score": 87250,
          "played_at": "2026-03-18T21:00:00Z"
        }
      }
    }
  ]
}
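Because scores are keyed by song hash inside each profile, a cross-profile ranking for one song is a simple lookup and sort. The sketch below assumes simplified in-memory structs (no serde, no played_at field) and a hypothetical `leaderboard` function; it is not Nightingale's scoreboard code.

```rust
use std::collections::HashMap;

/// Simplified stand-ins for the profile data; field names follow profiles.json.
#[allow(dead_code)]
struct SongScore {
    stars: u8,
    score: u32,
}

struct Profile {
    name: String,
    scores: HashMap<String, SongScore>, // key = blake3 hash of the song
}

/// Names and scores of every profile that has played `song_hash`, highest score first.
fn leaderboard<'a>(profiles: &'a [Profile], song_hash: &str) -> Vec<(&'a str, u32)> {
    let mut rows: Vec<(&'a str, u32)> = profiles
        .iter()
        .filter_map(|p| p.scores.get(song_hash).map(|s| (p.name.as_str(), s.score)))
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}
```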

Pixabay Video Backgrounds (Dev)

API key is embedded in release builds. For local development, create .env at project root:

# .env
PIXABAY_API_KEY=your_pixabay_api_key

The release script (make-release.sh) sources .env automatically.

Data Storage Layout

~/.nightingale/
├── cache/              # Per-song stems, transcripts, lyrics (keyed by blake3 hash)
├── config.json         # App settings
├── profiles.json       # Player profiles and per-song scores
├── videos/             # Pre-downloaded Pixabay video backgrounds
├── sounds/             # Sound effects
├── vendor/
│   ├── ffmpeg          # ffmpeg binary
│   ├── uv              # uv binary
│   ├── python/         # Python 3.10
│   ├── venv/           # ML virtualenv (WhisperX, Demucs, audio-separator)
│   ├── analyzer/       # Python analyzer scripts
│   └── .ready          # Bootstrap completion marker
└── models/
    ├── torch/          # Demucs model weights
    ├── huggingface/    # WhisperX large-v3 weights
    └── audio_separator/ # UVR Karaoke ONNX model

Cache keys are blake3 hashes of the source file — re-analysis only triggers if the file changes or is manually invalidated.
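Any tool that builds a cache path from a hash should check the hash first, since the result may later be passed to a delete operation. The following sketch assumes one directory per song under cache/ (as the layout above suggests) and relies on the fact that a default BLAKE3 digest is 32 bytes, or 64 hex characters. The function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// True if `s` looks like a hex-encoded 32-byte BLAKE3 digest.
/// Accepts upper- and lowercase hex digits.
fn is_blake3_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Cache directory for a song, or None if the hash is malformed.
/// Rejecting malformed input keeps strings like "../x" out of the path.
fn song_cache_dir(home: &Path, hash: &str) -> Option<PathBuf> {
    if is_blake3_hex(hash) {
        Some(home.join(".nightingale").join("cache").join(hash))
    } else {
        None
    }
}
```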

Supported File Formats

Audio: .mp3, .flac, .ogg, .wav, .m4a, .aac, .wma

Video: .mp4, .mkv, .avi, .webm, .mov, .m4v

For video files, the audio track is extracted, the vocals are separated, and the original video automatically plays as the background.
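A library scanner can sort files into these two groups by extension. This is a minimal sketch under the assumption that matching is case-insensitive; the type and function names are hypothetical and not taken from the Nightingale source.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
enum MediaKind {
    Audio,
    Video,
}

// Extension lists copied from the supported formats above.
const AUDIO_EXTS: [&str; 7] = ["mp3", "flac", "ogg", "wav", "m4a", "aac", "wma"];
const VIDEO_EXTS: [&str; 6] = ["mp4", "mkv", "avi", "webm", "mov", "m4v"];

/// Classifies a path by its extension; returns None for unsupported or missing extensions.
fn classify(path: &Path) -> Option<MediaKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    if AUDIO_EXTS.contains(&ext.as_str()) {
        Some(MediaKind::Audio)
    } else if VIDEO_EXTS.contains(&ext.as_str()) {
        Some(MediaKind::Video)
    } else {
        None
    }
}
```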

Hardware Acceleration

PyTorch backend is auto-detected:

| Backend | Device | Notes |
|---------|--------|-------|
| CUDA | NVIDIA GPU | Fastest; ~2–5 min/song |
| MPS | Apple Silicon | macOS; WhisperX alignment falls back to CPU |
| CPU | Any | Always works; ~10–20 min/song |

UVR Karaoke model uses ONNX Runtime with CUDA (NVIDIA) or CoreML (Apple Silicon) automatically.

Processing Pipeline

Audio/Video file
       │
       ▼
 UVR Karaoke (ONNX) or Demucs (PyTorch)
       │  vocals.ogg + instrumental.ogg
       ▼
 LRCLIB API  ──▶  Synced lyrics fetch (if available)
       │
       ▼
 WhisperX large-v3  ──▶  Transcription + word-level timestamps
       │
       ▼
 Bevy App (Rust)
   - Plays instrumental audio
   - Synchronized word highlighting
   - Real-time pitch detection & scoring
   - GPU shader / video backgrounds
   - Scoreboards per profile

Code Patterns

Adding a New Background Theme (Bevy System)

// In your Bevy plugin, register a new background variant.
// `AppState` is assumed to be the app's existing Bevy state enum; import it from the crate.
use bevy::prelude::*;

/// Marker component for entities that belong to the custom background.
#[derive(Component)]
pub struct MyCustomBackground;

/// Spawns the background entity when playback starts.
pub fn spawn_custom_background(mut commands: Commands) {
    commands.spawn((
        MyCustomBackground,
        // ... your background components
    ));
}

pub struct CustomBackgroundPlugin;

impl Plugin for CustomBackgroundPlugin {
    fn build(&self, app: &mut App) {
        app.add_systems(OnEnter(AppState::Playing), spawn_custom_background);
    }
}

Extending Config Deserialization

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NightingaleConfig {
    pub music_folder: String,
    #[serde(default = "default_separator")]
    pub separator: StemSeparator,
    #[serde(default = "default_guide_volume")]
    pub guide_vocal_volume: f32,
}

// Manual Default so the fallback in load_config() uses the same values as the serde defaults.
impl Default for NightingaleConfig {
    fn default() -> Self {
        Self {
            music_folder: String::new(),
            separator: default_separator(),
            guide_vocal_volume: default_guide_volume(),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
pub enum StemSeparator {
    #[default]
    Uvr,
    Demucs,
}

fn default_guide_volume() -> f32 { 0.3 }

fn default_separator() -> StemSeparator { StemSeparator::Uvr }

// Load config; falls back to defaults if the file is missing or invalid
fn load_config() -> NightingaleConfig {
    let path = dirs::home_dir()
        .expect("Cannot determine home directory")
        .join(".nightingale/config.json");
    let raw = std::fs::read_to_string(&path).unwrap_or_default();
    serde_json::from_str(&raw).unwrap_or_default()
}

Triggering Re-analysis Programmatically

use std::fs;

/// Remove cached stems/transcript for a song to force re-analysis
fn invalidate_song_cache(song_hash: &str) {
    let cache_dir = dirs::home_dir()
        .expect("Cannot determine home directory")
        .join(".nightingale/cache")
        .join(song_hash);
    if cache_dir.exists() {
        fs::remove_dir_all(&cache_dir)
            .expect("Failed to remove cache directory");
        println!("Cache invalidated for {}", song_hash);
    }
}

Computing a Song's Blake3 Hash (for Cache Lookup)

use blake3::Hasher;
use std::fs::File;
use std::io::{BufReader, Read};

fn hash_file(path: &std::path::Path) -> String {
    let file = File::open(path).expect("Cannot open file");
    let mut reader = BufReader::new(file);
    let mut hasher = Hasher::new();
    // Stream the file in 64 KiB chunks so large files are never fully loaded into memory
    let mut buf = [0u8; 65536];
    loop {
        let n = reader.read(&mut buf).unwrap();
        if n == 0 { break; }
        hasher.update(&buf[..n]);
    }
    hasher.finalize().to_hex().to_string()
}

Profile Score Update Pattern

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Serialize, Deserialize)]
pub struct SongScore {
    pub stars: u8,
    pub score: u32,
    pub played_at: String,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Profile {
    pub name: String,
    pub scores: HashMap<String, SongScore>, // key = blake3 hash
}

fn update_score(profile: &mut Profile, song_hash: &str, stars: u8, score: u32) {
    profile.scores.insert(song_hash.to_string(), SongScore {
        stars,
        score,
        played_at: chrono::Utc::now().to_rfc3339(),
    });
}

Troubleshooting

Bootstrap Fails / Stuck on Setup Screen

# Force re-bootstrap
./nightingale --setup

# Or manually remove the vendor directory and restart
rm -rf ~/.nightingale/vendor
./nightingale

Song Analysis Hangs or Errors

# Check the analyzer venv is healthy
~/.nightingale/vendor/venv/bin/python -c "import whisperx; print('ok')"

# Re-bootstrap if broken
./nightingale --setup

macOS "App is damaged" Error

xattr -cr Nightingale.app

GPU Not Being Used

  • NVIDIA: Ensure CUDA drivers are installed and nvidia-smi shows your GPU.
  • Apple Silicon: MPS is used automatically on macOS with Apple Silicon; WhisperX alignment falls back to CPU (normal behavior).
  • Check ~/.nightingale/vendor/venv — if PyTorch installed the CPU-only build, re-bootstrap after installing CUDA drivers.

Cache Corruption / Wrong Lyrics

# Find the blake3 hash of your file (build a small tool or use b3sum)
b3sum /path/to/song.mp3

# Remove that song's cache
rm -rf ~/.nightingale/cache/<hash>

Then re-open the song in Nightingale to re-analyze.

Audio Playback Issues (Linux)

Ensure ALSA/PulseAudio/PipeWire is running. Install missing deps:

sudo apt install libasound2-dev libudev-dev libwayland-dev libxkbcommon-dev

Video Backgrounds Not Loading

Video backgrounds are pre-downloaded during setup via the Pixabay API. For development builds, ensure .env contains a valid PIXABAY_API_KEY. If videos are missing in a release build, run --setup to re-trigger the download.

Platform Targets

| Platform | Target Triple |
|----------|---------------|
| Linux x86_64 | x86_64-unknown-linux-gnu |
| Linux aarch64 | aarch64-unknown-linux-gnu |
| macOS ARM | aarch64-apple-darwin |
| macOS Intel | x86_64-apple-darwin |
| Windows x86_64 | x86_64-pc-windows-msvc |

Cross-compile with:

rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu
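Packaging helpers sometimes need to map the host platform to one of the triples above. The sketch below does that with a plain match; `target_triple` is a hypothetical helper, and the OS and architecture strings are the values exposed by `std::env::consts::OS` and `std::env::consts::ARCH`.

```rust
/// Maps an (os, arch) pair to one of the documented target triples.
/// Returns None for combinations not listed in the platform table.
fn target_triple(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("x86_64-unknown-linux-gnu"),
        ("linux", "aarch64") => Some("aarch64-unknown-linux-gnu"),
        ("macos", "aarch64") => Some("aarch64-apple-darwin"),
        ("macos", "x86_64") => Some("x86_64-apple-darwin"),
        ("windows", "x86_64") => Some("x86_64-pc-windows-msvc"),
        _ => None,
    }
}
```

For the current host, this could be called as `target_triple(std::env::consts::OS, std::env::consts::ARCH)`.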

License

GPL-3.0-or-later. See LICENSE.
