SKILL.md

Transcription Automation

Comprehensive skill for automating audio/video transcription and content processing.

Core Workflows

1. Transcription Pipeline

TRANSCRIPTION FLOW:

┌─────────────────┐

│  Audio/Video    │

│     Input       │

└────────┬────────┘

         ▼

┌─────────────────┐

│  Pre-Processing │

│  - Convert      │

│  - Enhance      │

│  - Split        │

└────────┬────────┘

         ▼

┌─────────────────┐

│  Transcription  │

│  - STT Engine   │

│  - Diarization  │

└────────┬────────┘

         ▼

┌─────────────────┐

│ Post-Processing │

│  - Format       │

│  - Timestamps   │

│  - Speakers     │

└────────┬────────┘

         ▼

┌─────────────────┐

│     Output      │

│  - Text/SRT/VTT │

│  - Summary      │

└─────────────────┘

2. Transcription Configuration

transcription_config:

  engine: whisper  # whisper, assembly_ai, deepgram

  audio_settings:

    sample_rate: 16000

    channels: mono

    format: wav

  transcription:

    language: auto  # or specific: en, zh, es

    model: large  # tiny, base, small, medium, large

    task: transcribe  # transcribe or translate

  features:

    speaker_diarization: true

    word_timestamps: true

    punctuation: true

    profanity_filter: false

  output:

    formats:

      - txt

      - srt

      - vtt

      - json

    include_confidence: true

    include_timestamps: true

Meeting Transcription

Meeting Notes Template

meeting_transcript:

  metadata:

    title: "{{meeting_title}}"

    date: "{{date}}"

    duration: "{{duration}}"

    attendees: "{{speakers}}"

  output_template: |

    # {{title}}

    **Date:** {{date}}

    **Duration:** {{duration}}

    **Attendees:** {{attendees}}

    ## Summary

    {{ai_summary}}

    ## Key Points

    {{#each key_points}}

    - {{this}}

    {{/each}}

    ## Action Items

    {{#each action_items}}

    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}

    {{/each}}

    ## Full Transcript

    {{#each segments}}

    **[{{timestamp}}] {{speaker}}:** {{text}}

    {{/each}}

Speaker Diarization

diarization_config:

  min_speakers: 2

  max_speakers: 10

  speaker_labels:

    - name: "Speaker 1"

      voice_sample: "sample_1.wav"  # Optional

    - name: "Speaker 2"

      voice_sample: "sample_2.wav"

  output_format:

    speaker_prefix: true

    speaker_timestamps: true

  example_output: |

    [00:00:05] SPEAKER_1: Welcome everyone to today's meeting.

    [00:00:12] SPEAKER_2: Thanks for having us.

    [00:00:18] SPEAKER_1: Let's start with the agenda.

Subtitle Generation

SRT Format

subtitle_config:

  format: srt

  timing:

    max_duration: 7  # seconds per subtitle

    min_gap: 0.1     # seconds between subtitles

    chars_per_line: 42

    max_lines: 2

  style:

    case: sentence  # sentence, upper, lower

    numbers: words  # words, digits

  example_output: |

    1

    00:00:05,000 --> 00:00:08,500

    Welcome to today's presentation

    about transcription automation.

    2

    00:00:09,000 --> 00:00:12,000

    Let me start by explaining

    the basic concepts.

VTT Format

vtt_config:

  format: vtt

  features:

    cue_settings: true

    styling: true

  example_output: |

    WEBVTT

    00:00:05.000 --> 00:00:08.500 align:center

    Welcome to today's presentation

    about transcription automation.

    00:00:09.000 --> 00:00:12.000 align:center

    <v Speaker 1>Let me start by explaining

    the basic concepts.

Integration Workflows

Zoom Integration

zoom_transcription:

  trigger:

    event: recording_completed

  workflow:

    - step: download_recording

      source: zoom_cloud

    - step: transcribe

      engine: whisper

      language: auto

    - step: diarize

      identify_speakers: true

    - step: generate_notes

      template: meeting_notes

      include_summary: true

      extract_action_items: true

    - step: distribute

      destinations:

        - notion_page

        - slack_channel

        - email_attendees

YouTube Integration

youtube_subtitles:

  trigger:

    event: video_uploaded

  workflow:

    - step: download_audio

      source: youtube_video

    - step: transcribe

      engine: whisper

      task: transcribe

    - step: generate_subtitles

      formats: [srt, vtt]

    - step: translate

      target_languages: [es, zh, ja, de, fr]

    - step: upload_subtitles

      destination: youtube

      as_cc: true

Podcast Processing

podcast_workflow:

  input:

    source: rss_feed

    format: audio/mp3

  processing:

    - transcribe:

        engine: whisper

        model: large

    - generate_chapters:

        detect_topics: true

        min_duration: 60  # seconds

    - create_show_notes:

        summarize: true

        extract_links: true

        highlight_quotes: true

    - create_searchable_index:

        full_text: true

        timestamps: true

  output:

    - transcript_txt

    - chapters_json

    - show_notes_md

    - search_index

Language Support

Multi-Language Transcription

multilingual:

  auto_detect: true

  supported_languages:

    - code: en

      name: English

      model: large

    - code: zh

      name: Chinese

      model: large

    - code: es

      name: Spanish

      model: large

    - code: ja

      name: Japanese

      model: medium

  translation:

    enabled: true

    target: en

    preserve_original: true

Code-Switching

code_switching:

  enabled: true

  primary_language: en

  secondary_languages: [zh, es]

  output: |

    [00:01:23] The next topic is about 人工智能,

    which has been muy importante in recent years.

  handling:

    detect_language_per_segment: true

    tag_language_switches: true

Quality Enhancement

Post-Processing

post_processing:

  text_cleanup:

    - remove_filler_words: ["um", "uh", "like"]

    - fix_common_errors: true

    - normalize_numbers: true

  formatting:

    - add_punctuation: true

    - capitalize_sentences: true

    - paragraph_breaks: true

  speaker_attribution:

    - merge_short_segments: true

    - min_segment_duration: 1.0

  output_enhancement:

    - add_timestamps: true

    - highlight_keywords: true

    - generate_summary: true

Accuracy Metrics

TRANSCRIPTION QUALITY REPORT

═══════════════════════════════════════

File: meeting_2024_01_15.mp3

Duration: 45:32

Engine: Whisper Large

METRICS:

Word Error Rate (WER):  4.2%

Character Error Rate:   2.8%

Confidence Score:       0.94

SPEAKER DIARIZATION:

Speakers Detected: 4

Diarization Accuracy: 91%

PROCESSING TIME:

Total: 8m 23s

Real-time Factor: 0.18x

DETECTED ISSUES:

• Low confidence at 12:34 (background noise)

• Overlapping speech at 23:45

• Unknown speaker at 34:12

API Examples

OpenAI Whisper

import openai

# Transcribe audio

with open("meeting.mp3", "rb") as audio_file:

    transcript = openai.Audio.transcribe(

        model="whisper-1",

        file=audio_file,

        response_format="verbose_json",

        timestamp_granularities=["word", "segment"]

    )

# Access results

for segment in transcript.segments:

    print(f"[{segment.start:.2f}] {segment.text}")

AssemblyAI

import assemblyai as aai

transcriber = aai.Transcriber()

config = aai.TranscriptionConfig(

    speaker_labels=True,

    auto_chapters=True,

    entity_detection=True

)

transcript = transcriber.transcribe(

    "https://example.com/meeting.mp3",

    config=config

)

for utterance in transcript.utterances:

    print(f"Speaker {utterance.speaker}: {utterance.text}")

Best Practices

Quality Audio: Clean input = better output

Choose Right Model: Balance speed vs accuracy

Use Diarization: Identify speakers clearly

Post-Process: Clean up automated output

Verify Critical Content: Human review important

Consider Privacy: Handle sensitive content

Store Efficiently: Compress and index

Provide Context: Vocabulary hints help

transcription-automation

SKILL.md

Transcription Automation

Core Workflows

1. Transcription Pipeline

2. Transcription Configuration

Meeting Transcription

Meeting Notes Template

Speaker Diarization

Subtitle Generation

SRT Format

VTT Format

Integration Workflows

Zoom Integration

YouTube Integration

Podcast Processing

Language Support

Multi-Language Transcription

Code-Switching

Quality Enhancement

Post-Processing

Accuracy Metrics

API Examples

OpenAI Whisper

AssemblyAI

Best Practices

Stop writing automation&scrapers

transcription-automation

SKILL.md

Transcription Automation

Core Workflows

1. Transcription Pipeline

2. Transcription Configuration

Meeting Transcription

Meeting Notes Template

Speaker Diarization

Subtitle Generation

SRT Format

VTT Format

Integration Workflows

Zoom Integration

YouTube Integration

Podcast Processing

Language Support

Multi-Language Transcription

Code-Switching

Quality Enhancement

Post-Processing

Accuracy Metrics

API Examples

OpenAI Whisper

AssemblyAI

Best Practices

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers