transcription-automation

Automate audio/video transcription, meeting notes, subtitle generation, and content processing

INSTALLATION
npx skills add https://github.com/claude-office-skills/skills --skill transcription-automation
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Transcription Automation

Comprehensive skill for automating audio/video transcription and content processing.

Core Workflows

1. Transcription Pipeline

TRANSCRIPTION FLOW:

┌─────────────────┐
│  Audio/Video    │
│     Input       │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Pre-Processing │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT Engine   │
│  - Diarization  │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Post-Processing │
│  - Format       │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         ▼
┌─────────────────┐
│     Output      │
│  - Text/SRT/VTT │
│  - Summary      │
└─────────────────┘

2. Transcription Configuration

transcription_config:

  engine: whisper  # whisper, assembly_ai, deepgram

  audio_settings:

    sample_rate: 16000

    channels: mono

    format: wav

  transcription:

    language: auto  # or specific: en, zh, es

    model: large  # tiny, base, small, medium, large

    task: transcribe  # transcribe or translate

  features:

    speaker_diarization: true

    word_timestamps: true

    punctuation: true

    profanity_filter: false

  output:

    formats:

      - txt

      - srt

      - vtt

      - json

    include_confidence: true

    include_timestamps: true
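
The audio_settings above (16 kHz, mono, WAV) correspond to a conversion step before transcription. The sketch below builds an ffmpeg argument list for that conversion. The function name and file names are assumptions for illustration, and it presumes ffmpeg is installed and on the PATH.

```python
def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Return an ffmpeg argument list that resamples and downmixes audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,                 # input audio or video file
        "-vn",                     # drop any video stream
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-ac", str(channels),      # number of output channels (1 = mono)
        dst,                       # container is inferred from the extension
    ]


cmd = build_ffmpeg_command("meeting.mp4", "meeting.wav")
print(" ".join(cmd))
```

To run the conversion, pass the list to subprocess.run(cmd, check=True). Building the command as a list avoids shell quoting problems with file names that contain spaces.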

Meeting Transcription

Meeting Notes Template

meeting_transcript:

  metadata:

    title: "{{meeting_title}}"

    date: "{{date}}"

    duration: "{{duration}}"

    attendees: "{{speakers}}"

  output_template: |
    # {{title}}

    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}

    ## Summary
    {{ai_summary}}

    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}

    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}

    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    {{/each}}
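
The template above uses Handlebars-style placeholders. As a rough illustration of the output it describes, the following sketch renders the same layout with plain Python string formatting. The function name, the dictionary keys, and the sample data are assumptions, not part of the skill.

```python
def render_meeting_notes(meta, summary, key_points, action_items, segments):
    """Render meeting notes in the same layout as the template, as Markdown."""
    lines = [
        f"# {meta['title']}",
        "",
        f"**Date:** {meta['date']}",
        f"**Duration:** {meta['duration']}",
        f"**Attendees:** {', '.join(meta['attendees'])}",
        "",
        "## Summary",
        summary,
        "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in key_points]
    lines += ["", "## Action Items"]
    lines += [
        f"- [ ] {item['task']} - @{item['assignee']} - Due: {item['due_date']}"
        for item in action_items
    ]
    lines += ["", "## Full Transcript"]
    lines += [
        f"**[{seg['timestamp']}] {seg['speaker']}:** {seg['text']}"
        for seg in segments
    ]
    return "\n".join(lines)


# Hypothetical sample data
notes = render_meeting_notes(
    meta={"title": "Weekly Sync", "date": "2024-01-15", "duration": "45:32",
          "attendees": ["Dana", "Lee"]},
    summary="Reviewed the transcription rollout.",
    key_points=["Whisper large is accurate enough for internal notes"],
    action_items=[{"task": "Send recap", "assignee": "dana", "due_date": "2024-01-19"}],
    segments=[{"timestamp": "00:00:05", "speaker": "Dana", "text": "Welcome everyone."}],
)
print(notes)
```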

Speaker Diarization

diarization_config:

  min_speakers: 2

  max_speakers: 10

  speaker_labels:

    - name: "Speaker 1"

      voice_sample: "sample_1.wav"  # Optional

    - name: "Speaker 2"

      voice_sample: "sample_2.wav"

  output_format:

    speaker_prefix: true

    speaker_timestamps: true

  example_output: |
    [00:00:05] SPEAKER_1: Welcome everyone to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.
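
Speech-to-text engines and diarizers often return separate timelines: text segments from one, speaker turns from the other. One common way to combine them, sketched here under that assumption, is to give each text segment the speaker whose turn overlaps it the most. The field names (start, end, text, speaker) are hypothetical.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS, dropping fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def assign_speakers(segments, turns):
    """Label each segment with the speaker turn that overlaps it the most."""
    labeled = []
    for seg in segments:
        best, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best})
    return labeled


segments = [
    {"start": 5.0, "end": 11.0, "text": "Welcome everyone to today's meeting."},
    {"start": 12.0, "end": 17.0, "text": "Thanks for having us."},
]
turns = [
    {"start": 4.5, "end": 11.5, "speaker": "SPEAKER_1"},
    {"start": 11.5, "end": 17.5, "speaker": "SPEAKER_2"},
]
lines = [
    f"[{format_timestamp(s['start'])}] {s['speaker']}: {s['text']}"
    for s in assign_speakers(segments, turns)
]
print("\n".join(lines))
```

Segments that overlap no turn keep the label UNKNOWN, which makes gaps in the diarization easy to spot during review.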

Subtitle Generation

SRT Format

subtitle_config:

  format: srt

  timing:

    max_duration: 7  # seconds per subtitle

    min_gap: 0.1     # seconds between subtitles

    chars_per_line: 42

    max_lines: 2

  style:

    case: sentence  # sentence, upper, lower

    numbers: words  # words, digits

  example_output: |
    1
    00:00:05,000 --> 00:00:08,500
    Welcome to today's presentation
    about transcription automation.

    2
    00:00:09,000 --> 00:00:12,000
    Let me start by explaining
    the basic concepts.
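
As a minimal sketch of how cues in this format could be produced, the code below formats timestamps as HH:MM:SS,mmm and wraps text to the chars_per_line and max_lines limits from the config. The function names and the (start, end, text) cue tuples are assumptions. Text beyond max_lines is simply cut off here; a real pipeline would more likely split long cues instead.

```python
import textwrap


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues, chars_per_line=42, max_lines=2):
    """Build SRT text from (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Wrap to the line width, then keep at most max_lines lines
        lines = textwrap.wrap(text, width=chars_per_line)[:max_lines]
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append("\n".join([str(index), timing, *lines]))
    # Cues are separated by exactly one blank line
    return "\n\n".join(blocks) + "\n"


srt_text = to_srt([
    (5.0, 8.5, "Welcome to today's presentation about transcription automation."),
    (9.0, 12.0, "Let me start by explaining the basic concepts."),
])
print(srt_text)
```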

VTT Format

vtt_config:

  format: vtt

  features:

    cue_settings: true

    styling: true

  example_output: |
    WEBVTT

    00:00:05.000 --> 00:00:08.500 align:center
    Welcome to today's presentation
    about transcription automation.

    00:00:09.000 --> 00:00:12.000 align:center
    <v Speaker 1>Let me start by explaining
    the basic concepts.
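
The two formats differ mainly in the WEBVTT header, the millisecond separator (a period instead of a comma), and the optional cue numbers. The sketch below converts simple SRT text to VTT on that basis. The function name is an assumption, and it does not add cue settings such as align:center or voice tags.

```python
import re

# Matches the HH:MM:SS,mmm timestamps used in SRT timing lines
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert simple SRT text to WebVTT."""
    out = ["WEBVTT", ""]
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]  # drop the SRT cue number
        if not lines:
            continue
        # Only the timing line needs the comma-to-period change
        lines[0] = TIMING.sub(r"\1.\2", lines[0])
        out.extend(lines + [""])
    return "\n".join(out)


sample_srt = (
    "1\n00:00:05,000 --> 00:00:08,500\nWelcome\n\n"
    "2\n00:00:09,000 --> 00:00:12,000\nHello\n"
)
vtt_text = srt_to_vtt(sample_srt)
print(vtt_text)
```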

Integration Workflows

Zoom Integration

zoom_transcription:

  trigger:

    event: recording_completed

  workflow:

    - step: download_recording

      source: zoom_cloud

    - step: transcribe

      engine: whisper

      language: auto

    - step: diarize

      identify_speakers: true

    - step: generate_notes

      template: meeting_notes

      include_summary: true

      extract_action_items: true

    - step: distribute

      destinations:

        - notion_page

        - slack_channel

        - email_attendees
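
A workflow list like this one can be driven by a small dispatcher that looks up a handler for each step name and passes the remaining keys as parameters. The following is a sketch under that assumption. The handlers are stubs with hypothetical names and return values; real ones would call the Zoom API, an STT engine, and so on.

```python
def run_workflow(steps, handlers, context=None):
    """Run each step in order, passing a shared context dict between handlers."""
    context = dict(context or {})
    for step in steps:
        name = step["step"]
        params = {key: value for key, value in step.items() if key != "step"}
        if name not in handlers:
            raise KeyError(f"no handler registered for step {name!r}")
        context = handlers[name](context, **params)
    return context


# Stub handlers (hypothetical): they only record what a real step would do
def download_recording(context, source):
    return {**context, "audio_path": f"/tmp/{source}_recording.m4a"}


def transcribe(context, engine, language):
    text = f"<{engine} transcript of {context['audio_path']}>"
    return {**context, "transcript": text}


steps = [
    {"step": "download_recording", "source": "zoom_cloud"},
    {"step": "transcribe", "engine": "whisper", "language": "auto"},
]
handlers = {"download_recording": download_recording, "transcribe": transcribe}
result = run_workflow(steps, handlers)
print(result["transcript"])
```

Failing fast on an unknown step name surfaces typos in the workflow file before any recording is downloaded.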

YouTube Integration

youtube_subtitles:

  trigger:

    event: video_uploaded

  workflow:

    - step: download_audio

      source: youtube_video

    - step: transcribe

      engine: whisper

      task: transcribe

    - step: generate_subtitles

      formats: [srt, vtt]

    - step: translate

      target_languages: [es, zh, ja, de, fr]

    - step: upload_subtitles

      destination: youtube

      as_cc: true

Podcast Processing

podcast_workflow:

  input:

    source: rss_feed

    format: audio/mp3

  processing:

    - transcribe:

        engine: whisper

        model: large

    - generate_chapters:

        detect_topics: true

        min_duration: 60  # seconds

    - create_show_notes:

        summarize: true

        extract_links: true

        highlight_quotes: true

    - create_searchable_index:

        full_text: true

        timestamps: true

  output:

    - transcript_txt

    - chapters_json

    - show_notes_md

    - search_index
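
The create_searchable_index step combines full text with timestamps. One simple way to do that, sketched here with assumed names and sample data, is an inverted index that maps each word to the start times of the segments containing it.

```python
import re
from collections import defaultdict


def build_search_index(segments):
    """Map each lowercase word to the start times of segments that contain it."""
    index = defaultdict(list)
    for seg in segments:
        # Use a set so a word repeated within one segment is indexed once
        for word in set(re.findall(r"[a-z0-9']+", seg["text"].lower())):
            index[word].append(seg["start"])
    return dict(index)


# Hypothetical transcript segments
episode = [
    {"start": 0.0, "text": "Whisper handles noisy audio well."},
    {"start": 95.0, "text": "We compared Whisper with Deepgram."},
]
index = build_search_index(episode)
print(index["whisper"])
```

A lookup such as index.get("deepgram", []) then returns the positions a player could jump to.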

Language Support

Multi-Language Transcription

multilingual:

  auto_detect: true

  supported_languages:

    - code: en

      name: English

      model: large

    - code: zh

      name: Chinese

      model: large

    - code: es

      name: Spanish

      model: large

    - code: ja

      name: Japanese

      model: medium

  translation:

    enabled: true

    target: en

    preserve_original: true

Code-Switching

code_switching:

  enabled: true

  primary_language: en

  secondary_languages: [zh, es]

  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.

  handling:

    detect_language_per_segment: true

    tag_language_switches: true
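
To give a rough idea of what tagging language switches could look like, the sketch below splits a line into runs by writing system. This is only a heuristic and an assumption on our part: it can separate Chinese characters from Latin-script text, but it cannot tell English from Spanish, which would need a language-identification model. Function names are hypothetical.

```python
def is_han(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def tag_script_switches(text, primary="en", han_language="zh"):
    """Split text into (language, chunk) runs based on the script of each letter."""
    runs = []  # list of [language, chunk] pairs
    for ch in text:
        if ch.isalpha():
            lang = han_language if is_han(ch) else primary
            # Start a new run whenever the script changes
            if not runs or runs[-1][0] != lang:
                runs.append([lang, ""])
        elif not runs:
            # Leading punctuation or whitespace goes to the primary language
            runs.append([primary, ""])
        # Spaces and punctuation stay attached to the current run
        runs[-1][1] += ch
    return [(lang, chunk) for lang, chunk in runs]


tagged = tag_script_switches("about 人工智能 today")
print(tagged)
```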

Quality Enhancement

Post-Processing

post_processing:

  text_cleanup:

    - remove_filler_words: ["um", "uh", "like"]

    - fix_common_errors: true

    - normalize_numbers: true

  formatting:

    - add_punctuation: true

    - capitalize_sentences: true

    - paragraph_breaks: true

  speaker_attribution:

    - merge_short_segments: true

    - min_segment_duration: 1.0

  output_enhancement:

    - add_timestamps: true

    - highlight_keywords: true

    - generate_summary: true
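
Two of these options, remove_filler_words and merge_short_segments, are easy to illustrate. The sketch below is one possible implementation, with assumed function names and segment fields. It strips only "um" and "uh", because "like" is also an ordinary word and removing it blindly would damage sentences.

```python
import re

# Filler word, optional trailing comma or period, and following whitespace
FILLERS = re.compile(r"\b(?:um|uh)\b[,.]?\s*", re.IGNORECASE)


def remove_fillers(text):
    """Strip filler words and collapse any leftover double spaces."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def merge_short_segments(segments, min_duration=1.0):
    """Fold segments shorter than min_duration into the previous same-speaker one."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        is_short = seg["end"] - seg["start"] < min_duration
        if prev and is_short and prev["speaker"] == seg["speaker"]:
            prev["end"] = seg["end"]
            prev["text"] = f"{prev['text']} {seg['text']}"
        else:
            merged.append(dict(seg))  # copy so the input is not mutated
    return merged


raw = [
    {"speaker": "A", "start": 0.0, "end": 3.0, "text": "Let's begin."},
    {"speaker": "A", "start": 3.0, "end": 3.4, "text": "Okay."},
    {"speaker": "B", "start": 3.5, "end": 6.0, "text": "Sure."},
]
merged = merge_short_segments(raw)
print(remove_fillers("Um, so uh we should ship it."))
print([seg["text"] for seg in merged])
```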

Accuracy Metrics

TRANSCRIPTION QUALITY REPORT
═══════════════════════════════════════
File: meeting_2024_01_15.mp3
Duration: 45:32
Engine: Whisper Large

METRICS:
Word Error Rate (WER):  4.2%
Character Error Rate:   2.8%
Confidence Score:       0.94

SPEAKER DIARIZATION:
Speakers Detected: 4
Diarization Accuracy: 91%

PROCESSING TIME:
Total: 8m 23s
Real-time Factor: 0.18x

DETECTED ISSUES:
• Low confidence at 12:34 (background noise)
• Overlapping speech at 23:45
• Unknown speaker at 34:12
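
Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and the real-time factor is processing time divided by audio duration. The sketch below computes both. The function names and the sample sentences are assumptions, and no text normalization (case, punctuation) is applied.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed ref words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time relative to audio length; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# Figures from the report above: 8m 23s of processing for 45:32 of audio
rtf = real_time_factor(8 * 60 + 23, 45 * 60 + 32)
print(f"WER: {wer:.1%}, RTF: {rtf:.2f}x")
```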

API Examples

OpenAI Whisper

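from openai import OpenAI

# openai>=1.0 uses a client object; it reads OPENAI_API_KEY from the environment
client = OpenAI()

# Transcribe audio
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",  # required for timestamp_granularities
        timestamp_granularities=["word", "segment"],
    )

# Access results (segments are objects with attributes in recent SDK versions)
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")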

AssemblyAI

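import assemblyai as aai

# Set your API key before creating the transcriber
aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()
config = aai.TranscriptionConfig(
    speaker_labels=True,    # enable speaker diarization
    auto_chapters=True,     # generate chapter summaries
    entity_detection=True,  # detect named entities
)
transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config,
)

# Each utterance is one speaker turn
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")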

Best Practices

  • Quality Audio: Clean input produces better output
  • Choose the Right Model: Balance speed against accuracy
  • Use Diarization: Identify speakers clearly
  • Post-Process: Clean up automated output
  • Verify Critical Content: Have a human review important passages
  • Consider Privacy: Handle sensitive content with care
  • Store Efficiently: Compress and index transcripts
  • Provide Context: Vocabulary hints help with names and jargon