douyin-video-summary

Extract, transcribe, and summarize Douyin videos with local whisper.cpp processing.

Handles both short (v.douyin.com) and direct Douyin URLs by parsing video IDs and intercepting audio via browser JavaScript

Uses ffmpeg to convert audio to WAV and whisper.cpp for local Chinese transcription, with automatic Metal GPU acceleration on Apple Silicon

Generates structured summaries with core points, details, and one-line takeaways in markdown format

Optional Feishu (Lark) document sync to append summaries to collaborative docs via the Feishu Open API

INSTALLATION

npx skills add https://github.com/liu-wei-ai/douyin-video-summary --skill douyin-video-summary

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Douyin Video Summary

Name: douyin-video-summary
Author: liu-wei-ai

Summarize Douyin videos: extract audio → transcribe locally → AI summary.

Prerequisites

Install these tools (macOS example):

brew install whisper-cpp ffmpeg

# Download whisper.cpp GGML model (small recommended for speed/quality balance)

curl -L --create-dirs -o models/ggml-small.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin"

Workflow

When a Douyin link is received:

Step 1: Extract Video ID

Parse the Douyin URL to get the video ID. Douyin share links come in two formats:

Short link: https://v.douyin.com/xxxxx/ → follow redirect to get video ID

Direct link: https://www.douyin.com/video/7604713801732365681

# Follow redirect to get final URL, extract numeric video ID

curl -sL -o /dev/null -w '%{url_effective}' 'https://v.douyin.com/xxxxx/' | grep -oE '[0-9]{15,}'
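Both link formats can be handled by one helper. The following is a minimal sketch, assuming curl, sed, and grep are available; the function name extract_video_id is illustrative and not part of the skill. It prefers the ID in a /video/<id> path segment and only falls back to the first long digit run, since a redirected URL may carry other numeric query parameters.

```shell
# Hypothetical helper: print the numeric video ID for a Douyin URL.
extract_video_id() {
  local url="$1"
  # Short links carry no ID; follow redirects to get the final URL.
  case "$url" in
    *://v.douyin.com/*)
      url=$(curl -sL -o /dev/null -w '%{url_effective}' "$url") || return 1
      ;;
  esac
  local id
  # Prefer the ID in a /video/<id> path segment.
  id=$(printf '%s\n' "$url" | sed -nE 's#.*/video/([0-9]{15,}).*#\1#p')
  # Otherwise fall back to the first long run of digits.
  if [ -z "$id" ]; then
    id=$(printf '%s\n' "$url" | grep -oE '[0-9]{15,}' | head -n 1)
  fi
  # Non-zero exit status when no ID was found.
  [ -n "$id" ] && printf '%s\n' "$id"
}
```

Calling extract_video_id with either link format prints the ID on stdout, so the result can be captured with command substitution.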

Step 2: Get Audio via Browser

Douyin blocks direct downloads (yt-dlp and aria2c both get HTTP 403). Use the browser to intercept the audio URL:

Open the Douyin video page in the browser

Inject JS to intercept network requests before navigation:

window.__audioUrls = [];

const origOpen = XMLHttpRequest.prototype.open;

XMLHttpRequest.prototype.open = function(method, url) {

  if (url && (url.includes('.mp3') || url.includes('.m4a') || url.includes('mime_type=audio'))) {

    window.__audioUrls.push(url);

  }

  return origOpen.apply(this, arguments);

};

Navigate to the video page, click play to trigger audio loading

Retrieve intercepted URLs: window.__audioUrls

Download with curl (Referer header required):

curl -H "Referer: https://www.douyin.com/" -o audio.mp4 "<audio_url>"

Important: aria2c will 403 on Douyin CDN URLs. Always use curl with the Referer header.
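By default curl exits successfully even on an HTTP error and saves the error body to the output file, so a 403 can go unnoticed until ffmpeg fails. A hedged sketch of a wrapper that surfaces this early follows; the function name download_audio and the default output name are assumptions for illustration.

```shell
# Hypothetical helper: download an intercepted audio URL, failing loudly on HTTP errors.
download_audio() {
  local audio_url="$1" out="${2:-audio.mp4}"
  if [ -z "$audio_url" ]; then
    echo "usage: download_audio <audio_url> [output_file]" >&2
    return 2
  fi
  # --fail turns an HTTP error such as 403 into a non-zero exit status
  # instead of saving the error page as if it were audio.
  curl --fail --silent --show-error --location \
    -H "Referer: https://www.douyin.com/" \
    -o "$out" "$audio_url" || return 1
  # An empty file means nothing useful was downloaded.
  [ -s "$out" ]
}
```

The Referer header is the same one used in the curl command above; only the error handling is added.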

Step 3: Convert to WAV

ffmpeg -i audio.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav

Step 4: Transcribe with whisper.cpp

whisper-cli -m /path/to/ggml-small.bin -l zh -f audio.wav -otxt -of output

Use -l zh for Chinese content (or -l auto to auto-detect the language if unsure)

Apple Silicon GPU acceleration is automatic (Metal)

Performance: roughly 20 s to transcribe 5 min of audio on an M4
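Steps 3 and 4 can be chained into a single call. This is a sketch only, assuming ffmpeg and whisper-cli are on PATH; the function name transcribe_audio, its argument order, and its defaults are hypothetical. It checks that the input and model files exist before invoking either tool, so a wrong path produces a clear message.

```shell
# Hypothetical wrapper around Steps 3 and 4; names and defaults are illustrative.
transcribe_audio() {
  local input="$1" model="$2" lang="${3:-zh}" prefix="${4:-output}"
  # Fail early with a clear message rather than a tool-specific error.
  [ -f "$input" ] || { echo "missing input: $input" >&2; return 1; }
  [ -f "$model" ] || { echo "missing model: $model" >&2; return 1; }
  local wav="${prefix}.wav"
  # 16 kHz mono 16-bit PCM, same settings as Step 3; -y overwrites an existing file.
  ffmpeg -y -loglevel error -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || return 1
  # -otxt with -of writes the transcript to "${prefix}.txt".
  whisper-cli -m "$model" -l "$lang" -f "$wav" -otxt -of "$prefix" || return 1
  # Print the transcript path so callers can capture it.
  printf '%s\n' "${prefix}.txt"
}
```

For example, transcribe_audio audio.mp4 /path/to/ggml-small.bin would leave the transcript in output.txt and print that path.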

Step 5: Generate Summary

Read the transcription text and produce a structured summary:

📹 **[Video Title] | [Author]**

时长：X分X秒 | 发布：YYYY-MM-DD

🎯 **核心观点：[one-line core message]**

**1. [Point 1 title]**

• [detail]

• [detail]

**2. [Point 2 title]**

• [detail]

💬 **一句话总结：[concise takeaway]**
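If the fixed parts of the template need to be produced by a script rather than written by hand, they can be printed around the numbered point sections. The sketch below assumes the point sections arrive on stdin already formatted; the function name render_summary and its argument order are hypothetical.

```shell
# Hypothetical helper: print the summary skeleton; point sections are read from stdin.
render_summary() {
  local title="$1" author="$2" duration="$3" published="$4" core="$5" takeaway="$6"
  printf '📹 **%s | %s**\n\n' "$title" "$author"
  printf '时长：%s | 发布：%s\n\n' "$duration" "$published"
  printf '🎯 **核心观点：%s**\n\n' "$core"
  # Pass the numbered point sections through unchanged.
  cat
  printf '\n💬 **一句话总结：%s**\n' "$takeaway"
}
```

All values are passed through %s, so titles containing percent signs or backslashes are printed literally.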

Step 6 (Optional): Sync to Feishu Doc

If Feishu integration is configured, append the summary to a Feishu document using the Feishu Open API. See references/feishu-sync.md for the API details.

Tips

For short videos (<1min), the summary may be very brief; that's fine

If browser interception fails, retry once; Douyin pages sometimes need a second load

Clean up downloaded audio/wav files after processing to save disk space

The whisper.cpp small model is the best speed/quality tradeoff; the medium model may run out of memory on machines with 8 GB of RAM
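The retry and cleanup tips can be sketched as two small helpers. Both names (retry_once, with_scratch_dir) and the RETRY_DELAY variable are hypothetical, and the sketch assumes the interception step has been wrapped in some command that exits non-zero on failure.

```shell
# Hypothetical helper: run a command, retrying a single time if it fails.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying: $*" >&2
  sleep "${RETRY_DELAY:-2}"
  "$@"
}

# Hypothetical helper: run a command inside a scratch directory that is
# deleted afterwards, so intermediate audio/WAV files never accumulate.
with_scratch_dir() {
  (
    dir=$(mktemp -d) || exit 1
    # The EXIT trap fires when this subshell ends, on success or failure.
    trap 'rm -rf "$dir"' EXIT
    cd "$dir" || exit 1
    "$@"
    exit $?
  )
}
```

Because the scratch directory is removed on exit, anything worth keeping (for example the transcript text) should be printed or copied elsewhere by the wrapped command before it returns.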