SKILL.md

When to Use

User wants to create an explainer or tutorial video

User asks to "explain" something in video form

User wants narrated content with AI-generated visuals

User says "explainer video", "解说视频", "tutorial video"

When NOT to Use

User wants audio-only content without visuals (use /speech or /podcast)

User wants a podcast-style discussion (use /podcast)

User wants to generate a standalone image (use /image-gen)

User wants to read text aloud without video (use /speech)

Purpose

Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.

Hard Constraints

Always read config following shared/config-pattern.md before any interaction

Follow shared/cli-patterns.md for execution modes, error handling, and interaction patterns

Always follow shared/cli-authentication.md for auth checks

Never hardcode speaker IDs — always fetch from the speakers CLI when the user wants to change voice

Never save files to ~/Downloads/ or .listenhub/ — save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)

Explainer uses exactly 1 speaker

Mode must be info (for Info style) or story (for Story style) — never slides (use /slides skill instead)

Step -1: CLI Auth Check

Follow shared/config-pattern.md § CLI Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login per shared/cli-authentication.md — never ask the user to run commands manually.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0 (Zero-Question Boot).

If the file doesn't exist, silently create it with defaults and proceed:

mkdir -p ".listenhub/explainer"

echo '{"outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"

CONFIG_PATH=".listenhub/explainer/config.json"

CONFIG=$(cat "$CONFIG_PATH")

Do NOT ask any setup questions. Proceed directly to the Interaction Flow.

If the file exists, read the config silently and proceed:

CONFIG_PATH=".listenhub/explainer/config.json"

[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/explainer/config.json"

CONFIG=$(cat "$CONFIG_PATH")
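The two branches above can be folded into a single helper. This is only a sketch: the function name `load_explainer_config` is hypothetical, and the lookup order (project directory first, then `$HOME`, then create project-level defaults) is an assumption read off the snippets above; shared/config-pattern.md remains the authoritative description.

```shell
# Hypothetical helper (not part of the ListenHub CLI): resolve the explainer
# config path, create the defaults on first run, and load the JSON into CONFIG.
load_explainer_config() {
  project_path=".listenhub/explainer/config.json"
  home_path="$HOME/.listenhub/explainer/config.json"

  if [ -f "$project_path" ]; then
    CONFIG_PATH="$project_path"
  elif [ -f "$home_path" ]; then
    CONFIG_PATH="$home_path"
  else
    # No config found anywhere: write the defaults silently, ask no questions.
    mkdir -p ".listenhub/explainer"
    printf '%s\n' '{"outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > "$project_path"
    CONFIG_PATH="$project_path"
  fi

  CONFIG=$(cat "$CONFIG_PATH")
}
```

Calling `load_explainer_config` once at the start of a session leaves `CONFIG_PATH` and `CONFIG` set for the later steps.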

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:

Current configuration (explainer):

  Output mode: {inline / download / both}

  Language preference: {zh / en / not set}

  Default style: {info / story / not set}

  Default speaker: {speakerName / built-in default}

Then ask:

outputMode: Follow shared/output-mode.md § Setup Flow Question.

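Language (optional): "Default language?"

"Chinese (zh)"

"English (en)"

"Choose manually each time" → keep null

Style (optional): "Default style?"

"Info: informational presentation"

"Story: narrative storytelling"

"Choose manually each time" → keep null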

After collecting answers, save immediately:

NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')

echo "$NEW_CONFIG" > "$CONFIG_PATH"

CONFIG=$(cat "$CONFIG_PATH")
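The save command above writes only outputMode. If the language and style answers should be persisted in the same step, one possible approach is sketched below. The helper name `save_setup_answers` and the convention that an empty string means "choose manually each time" are assumptions made for illustration, not something the skill specifies.

```shell
# Hypothetical helper: merge the three setup answers into the config JSON.
# An empty language or style is stored as null ("choose manually each time").
# Arguments: $1 = current config JSON, $2 = outputMode, $3 = language, $4 = style
save_setup_answers() {
  printf '%s\n' "$1" | jq -c \
    --arg m "$2" --arg l "$3" --arg s "$4" \
    '. + {
       outputMode: $m,
       language: (if $l == "" then null else $l end),
       defaultStyle: (if $s == "" then null else $s end)
     }'
}
```

A call such as `NEW_CONFIG=$(save_setup_answers "$CONFIG" "$OUTPUT_MODE" "$LANGUAGE" "$STYLE")` would then replace the single-field jq expression; `LANGUAGE` and `STYLE` are placeholder variable names.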

Interaction Flow

Step 1: Topic / Content

Free text input. Ask the user:

What would you like to explain or introduce?

Accept: topic description, text content, or concept to explain.

Step 2: Language

If config.language is set, pre-fill and show in summary — skip this question.

Otherwise ask:

Question: "What language?"

Options:

  - "Chinese (zh)" — Content in Mandarin Chinese

  - "English (en)" — Content in English

  - "Japanese (ja)" — Content in Japanese

Step 3: Style

If config.defaultStyle is set, pre-fill and show in summary — skip this question.

Otherwise ask:

Question: "What style of explainer?"

Options:

  - "Info" — Informational, factual presentation style

  - "Story" — Narrative, storytelling approach

Step 4: Speaker Selection

Follow shared/speaker-selection.md:

If config.defaultSpeakers.{language} is set → use saved speaker silently

If not set → use built-in default from shared/speaker-selection.md for the language

Show the speaker in the confirmation summary (Step 6) — user can change from there if desired

Only show the full speaker list if the user explicitly asks to change voice

Speaker query: see shared/cli-speakers.md for listing and filtering speakers.

Only 1 speaker is supported for explainer videos.
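The lookup order described in this step could be sketched as follows. The helper name `resolve_speaker_id` is hypothetical, and the sketch assumes the config stores each language's speaker as a one-element array, which matches the update command in the After Successful Generation section. The built-in default is passed in by the caller rather than hardcoded.

```shell
# Hypothetical helper: choose the speaker ID for a language.
# Arguments: $1 = config JSON, $2 = language code,
#            $3 = built-in default ID (looked up by the caller in
#                 shared/speaker-selection.md, never hardcoded here)
resolve_speaker_id() {
  saved=$(printf '%s\n' "$1" | jq -r --arg lang "$2" '.defaultSpeakers[$lang][0] // empty')
  if [ -n "$saved" ]; then
    printf '%s\n' "$saved"   # saved speaker wins, used silently
  else
    printf '%s\n' "$3"       # otherwise fall back to the built-in default
  fi
}
```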

Step 5: Output Type

Question: "What output do you want?"

Options:

  - "Text script only" — Generate narration script, no video

  - "Text + Video" — Generate full explainer video with AI visuals

Step 6: Confirm & Generate

Summarize all choices:

Ready to generate explainer:

  Topic: {topic}

  Language: {language}

  Style: {info/story}

  Speaker: {speaker name}

  Output: {text only / text + video}

  Proceed?

Wait for explicit confirmation before running any CLI command.

Workflow

Run the CLI command with run_in_background: true and timeout: 660000. The CLI blocks until generation completes and returns the final result as JSON:

listenhub explainer create \
  --query "{topic}" \
  --mode {info|story} \
  --lang {en|zh|ja} \
  --speaker "{name}" \
  --speaker-id "{id}" \
  --timeout 600 \
  --json

If the command fails (non-zero exit), check stderr for error details. See shared/cli-patterns.md § Error Handling for exit codes and common errors.
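A minimal sketch of that failure check, assuming nothing about the CLI beyond "JSON on stdout, details on stderr, non-zero exit on failure". The wrapper name `run_and_capture` is hypothetical; what each exit code means is defined in shared/cli-patterns.md, not here.

```shell
# Hypothetical wrapper: run a command, keep its stdout in RESULT, and on a
# non-zero exit print the captured stderr and return the same exit code.
run_and_capture() {
  err_file=$(mktemp)
  if RESULT=$("$@" 2>"$err_file"); then
    rm -f "$err_file"
    return 0
  else
    status=$?
    echo "command failed with exit code $status:" >&2
    cat "$err_file" >&2
    rm -f "$err_file"
    return "$status"
  fi
}
```

Used as `run_and_capture listenhub explainer create ... --json`, it leaves the JSON in `RESULT` on success, so parsing only happens after a zero exit.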

Optional flags (add when applicable):

--source-url "{url}" — if the user provided a reference URL

--skip-audio — if text-only output (no video)

--image-size {2K|4K} — image resolution (default: 2K)

--aspect-ratio {16:9|9:16|1:1} — video aspect ratio (default: 16:9)

--style "{style}" — visual style for AI-generated images
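Assembling the required and optional flags can be sketched as below. The helper `build_explainer_args` and its positional parameters are hypothetical; only flags listed in this section are used, and the sketch prints one argument per line instead of running the CLI.

```shell
# Hypothetical helper: print the arguments for `listenhub explainer create`,
# one per line. Optional flags are added only when their input is present.
# Arguments: topic, mode, lang, speaker name, speaker ID,
#            source URL ("" if none), text-only ("yes" or "no")
build_explainer_args() {
  topic=$1; mode=$2; lang=$3; speaker=$4; speaker_id=$5; source_url=$6; text_only=$7

  # Hard constraint: mode must be info or story, never slides.
  case "$mode" in
    info|story) ;;
    *) echo "mode must be info or story, got: $mode" >&2; return 1 ;;
  esac

  set -- explainer create --query "$topic" --mode "$mode" --lang "$lang" \
    --speaker "$speaker" --speaker-id "$speaker_id" --timeout 600 --json
  if [ -n "$source_url" ]; then set -- "$@" --source-url "$source_url"; fi
  if [ "$text_only" = "yes" ]; then set -- "$@" --skip-audio; fi

  printf '%s\n' "$@"
}
```

Rejecting any mode other than info or story before the call keeps the slides case out of this skill, as the hard constraints require.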

Tell the user the task is submitted. When notified of completion, parse and present the result:

Parse the CLI JSON output for key fields:

EPISODE_ID=$(echo "$RESULT" | jq -r '.episodeId')

AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty')

VIDEO_URL=$(echo "$RESULT" | jq -r '.videoUrl // empty')

CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
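The branching on outputMode can be pictured as a small decision table. This sketch is an assumption based on the "inline or both" and "download or both" cases in this section; shared/output-mode.md stays authoritative, and the helper name `output_mode_plan` is hypothetical.

```shell
# Hypothetical helper: translate outputMode into two yes/no decisions.
# Prints "<show_inline> <save_files>".
output_mode_plan() {
  case "$1" in
    inline)   echo "yes no" ;;
    download) echo "no yes" ;;
    both)     echo "yes yes" ;;
    *)        echo "unknown outputMode: $1" >&2; return 1 ;;
  esac
}
```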

If text-only output:

**inline or both**: Present the script inline.

Present:

Explainer script generated!

"{title}"

View online: https://listenhub.ai/app/explainer/{episodeId}

**download or both**: Also save the script file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.

Save as {slug}-explainer.md in cwd (dedup if exists)

Present the save path in addition to the above summary.

If text + video output:

**inline or both**: Display video URL and audio URL as clickable links.

Present:

Explainer video generated!

Video link: {videoUrl}

Audio link: {audioUrl}

Credits used: {credits}

**download or both**: Also save files. Generate a topic slug following shared/config-pattern.md § Artifact Naming.

Create {slug}-explainer/ folder (dedup if exists)

Write script.md inside

Download audio:

listenhub download "{audioUrl}" -o "{slug}-explainer/audio.mp3"

Present:

Saved to the current directory:

  {slug}-explainer/

    script.md

    audio.mp3
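Both save paths say "dedup if exists" without spelling out the scheme here. One way to do it is sketched below, assuming a numeric suffix (`-2`, `-3`, ...). That suffix scheme and the helper name `dedup_path` are assumptions; the actual rule is in shared/config-pattern.md § Artifact Naming.

```shell
# Hypothetical helper: print the first unused path for a base name.
# Arguments: $1 = base name (e.g. "{slug}-explainer"),
#            $2 = extension (e.g. ".md"), or "" for a folder
dedup_path() {
  candidate="$1$2"
  n=2
  while [ -e "$candidate" ]; do
    candidate="$1-$n$2"   # assumed scheme: append -2, -3, ... before the extension
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

For example, `dedup_path "{slug}-explainer" ".md"` covers the text-only case and `dedup_path "{slug}-explainer" ""` covers the folder case.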

After Successful Generation

Update config with the choices made this session:

NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg lang "{language}" \
  --arg style "{info/story}" \
  --arg speakerId "{speakerId}" \
  '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"

Estimated times:

Text script only: 2-3 minutes

Text + Video: 5-10 minutes

Resources

CLI authentication: shared/cli-authentication.md

CLI patterns: shared/cli-patterns.md

Speaker query: shared/cli-speakers.md

Speaker selection guide: shared/speaker-selection.md

Config pattern: shared/config-pattern.md

Output mode: shared/output-mode.md

Composability

Invokes: speakers CLI (for speaker selection); may invoke /speech for voiceover

Invoked by: content-planner (Phase 3)

Example

User: "Create an explainer video introducing Claude Code"

Agent workflow:

Topic: "Claude Code introduction"

Ask language → "English"

Ask style → "Info"

Use default speaker "Mars" (cozy-man-english)

Ask output → "Text + Video"

# Run with run_in_background: true, timeout: 660000
listenhub explainer create \
  --query "Introduce Claude Code: what it is, key features, and how to get started" \
  --mode info \
  --lang en \
  --speaker "Mars" \
  --speaker-id "cozy-man-english" \
  --timeout 600 \
  --json

Parse result for episodeId, audioUrl, videoUrl, credits, and present to user.
