whisper-transcription

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from…

INSTALLATION
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill whisper-transcription
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Whisper Transcription

Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

When to Use This Skill

  • Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
  • Video subtitles - Generate SRT/VTT files for YouTube, social media
  • Interview extraction - Pull quotes and insights from recorded calls
  • Content audit - Make audio/video libraries searchable
  • Translation - Transcribe and translate foreign language content

What Claude Does vs What You Decide

Claude Does

You Decide

Structures production workflow

Final creative direction

Suggests technical approaches

Equipment and tool choices

Creates templates and checklists

Quality standards

Identifies best practices

Brand/voice decisions

Generates script outlines

Final script approval

Dependencies

pip install openai-whisper torch ffmpeg-python click

# Also requires ffmpeg installed on system

# macOS: brew install ffmpeg

# Ubuntu: sudo apt install ffmpeg

Commands

Transcribe Single File

python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt

python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt

Batch Transcription

python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/

Transcribe + Translate

python scripts/main.py translate foreign-audio.mp3 --to en

Extract Timestamps

python scripts/main.py timestamps podcast.mp3 --format json

Examples

Example 1: Podcast to Blog Post

# Transcribe 1-hour podcast

python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)

# Processing time: ~5 min for 1 hour audio on M1 Mac

Example 2: YouTube Subtitles

# Generate SRT for video upload

python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt

# Upload directly to YouTube/Vimeo

Example 3: Batch Process Interview Library

# Transcribe all recordings in folder

python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)

Model Selection Guide

Model

Speed

Accuracy

VRAM

Best For

tiny

Fastest

~70%

1GB

Quick drafts, short clips

base

Fast

~80%

1GB

Social media clips

small

Medium

~85%

2GB

Podcasts, interviews

medium

Slow

~90%

5GB

Professional transcripts

large

Slowest

~95%

10GB

Critical accuracy needs

Recommendation: Start with small for most marketing content. Use medium for client deliverables.

Output Formats

Format

Extension

Use Case

txt

.txt

Blog posts, analysis

srt

.srt

Video subtitles (YouTube)

vtt

.vtt

Web video subtitles

json

.json

Programmatic access

tsv

.tsv

Spreadsheet analysis

Performance Tips

  • GPU acceleration - 10x faster with CUDA GPU
  • Audio extraction - Script auto-extracts audio from video
  • Chunking - Long files auto-split for memory efficiency
  • Language detection - Automatic, or specify with --language

Skill Boundaries

What This Skill Does Well

  • Structuring audio production workflows
  • Providing technical guidance
  • Creating quality checklists
  • Suggesting creative approaches

What This Skill Cannot Do

  • Replace audio engineering expertise
  • Make subjective creative decisions
  • Access or edit audio files directly
  • Guarantee commercial success

Related Skills

Skill Metadata

  • Mode: cyborg
category: automation

subcategory: audio-processing

dependencies: [openai-whisper, torch, ffmpeg-python]

difficulty: beginner

time_saved: 10+ hours/week
BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card