podcast

INSTALLATION
npx skills add https://github.com/marswaveai/skills --skill podcast
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

When to Use

  • User wants to create a podcast episode on any topic
  • User provides a URL or text and wants it turned into a podcast discussion
  • User asks for a "debate", "dialogue", or "discussion" format
  • User says "podcast", "播客" ("podcast"), or "录一期节目" ("record an episode")

When NOT to Use

  • User wants text-to-speech reading (use /speech)
  • User wants an explainer video with visuals (use /explainer)
  • User wants to generate an image (use /image-gen)
  • User only wants to extract content from a URL without generating audio (use /content-parser)

Purpose

Generate podcast episodes with 1-2 AI speakers discussing a topic. Supports quick overviews, deep analysis, and debate formats. Input can be a topic description, URL(s), or text. Output is a full audio episode with transcript.

Hard Constraints

  • Always check CLI auth following shared/cli-authentication.md
  • Follow shared/cli-patterns.md for command execution and error handling
  • Never hardcode speaker IDs in API calls — use built-in defaults from shared/speaker-selection.md as fallback only; fetch from the speakers API when the user wants to change voice
  • Never fabricate CLI commands or parameters
  • Always read config following shared/config-pattern.md before any interaction
  • Always follow shared/speaker-selection.md for speaker selection (text table + free-text input)
  • Never save files to ~/Downloads/ or .listenhub/ — save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)

Step -1: CLI Auth Check

Follow shared/cli-authentication.md § Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0 (Zero-Question Boot).

If the file doesn't exist, silently create it with defaults and proceed:

mkdir -p ".listenhub/podcast"
echo '{"outputMode":"inline","language":null,"defaultMode":"quick","defaultSpeakers":{}}' > ".listenhub/podcast/config.json"
CONFIG_PATH=".listenhub/podcast/config.json"
CONFIG=$(cat "$CONFIG_PATH")

Do NOT ask any setup questions. Proceed directly to the Interaction Flow.

If the file exists, read the config silently and proceed:

CONFIG_PATH=".listenhub/podcast/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/podcast/config.json"
CONFIG=$(cat "$CONFIG_PATH")
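The two branches above can be combined into one helper. This is a minimal bash sketch for illustration and is not part of the skill's shared scripts. The function name load_podcast_config is hypothetical. It also assumes that the $HOME fallback should be checked before any project-local defaults are created.

```shell
# Hypothetical helper (not from shared/config-pattern.md): load the podcast
# config, writing project-local defaults only when neither the project-local
# nor the $HOME config exists. Sets CONFIG_PATH and CONFIG.
load_podcast_config() {
  local local_path=".listenhub/podcast/config.json"
  local home_path="$HOME/.listenhub/podcast/config.json"

  if [ -f "$local_path" ]; then
    CONFIG_PATH="$local_path"
  elif [ -f "$home_path" ]; then
    CONFIG_PATH="$home_path"
  else
    # Zero-question boot: create the defaults silently, with no prompts.
    mkdir -p "$(dirname "$local_path")"
    echo '{"outputMode":"inline","language":null,"defaultMode":"quick","defaultSpeakers":{}}' > "$local_path"
    CONFIG_PATH="$local_path"
  fi
  CONFIG=$(cat "$CONFIG_PATH")
}
```

Calling load_podcast_config a second time is harmless. Once the file exists, the helper only reads it.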

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:

Current config (podcast):
  Output mode: {inline / download / both}
  Language: {zh / en / not set}
  Default mode: {quick / deep / debate / not set}
  Default speakers: {speakerName(s) / built-in defaults}

Then ask these questions in order and save:

- outputMode: Follow shared/output-mode.md § Setup Flow Question.

- Language (optional): "Default language?"

  • "Chinese (zh)"
  • "English (en)"
  • "Choose manually each time" → keep null

- Mode (optional): "Default podcast mode?"

  • "Quick: brief overview"
  • "Deep: in-depth analysis"
  • "Debate: debate-style dialogue"
  • "Choose manually each time" → keep null

After collecting answers, save immediately:

NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
# Save language only if the user picked one; the manual-choice option maps to null
if [ "$LANGUAGE" != "null" ]; then
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
# Save mode only if the user picked one
if [ "$MODE" != "null" ]; then
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg mode "$MODE" '. + {"defaultMode": $mode}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")

Interaction Flow

Step 1: Topic + Reference Materials

Ask for the topic and any optional reference materials together in a single question. Use either AskUserQuestion with two sub-questions or a single free-text prompt:

What topic would you like to turn into a podcast? If you have reference materials (URLs or text), include them here too.

Accept: topic description, URL(s), pasted text, or any combination.

Examples of valid input:

  • "AI developments in 2026"

Step 2: Mode

Default: "quick". Do not ask; use "quick" silently unless one of the following applies:

  • config.defaultMode is set to something else → use that value silently
  • User explicitly mentioned a mode keyword in Step 1 (e.g. "deep dive", "debate", "in depth") → infer mode from intent

Only ask this question if the user's intent is ambiguous AND no default is configured. In most cases, just use "quick".
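The following minimal bash sketch shows this silent mode resolution. It is for illustration only. The function name resolve_mode and the keyword list are assumptions, including the Chinese keywords 辩论 ("debate") and 深度 ("in depth"). The rule that an explicit keyword outranks config.defaultMode is also an assumption, because the steps above do not state a precedence.

```shell
# Hypothetical helper: resolve the podcast mode without asking the user.
#   $1 = the user's Step 1 request
#   $2 = config.defaultMode (may be empty or "null")
# Assumed precedence: explicit keyword > config.defaultMode > "quick".
resolve_mode() {
  local lower default_mode="${2:-}"
  # Lowercase ASCII letters so keyword matching is case-insensitive.
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

  case "$lower" in
    *debate*|*辩论*) echo "debate"; return ;;
    *"deep dive"*|*"in depth"*|*in-depth*|*深度*) echo "deep"; return ;;
  esac

  if [ -n "$default_mode" ] && [ "$default_mode" != "null" ]; then
    echo "$default_mode"
  else
    echo "quick"
  fi
}
```

A genuinely ambiguous request still falls through to "quick". That matches "In most cases, just use quick."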

Step 3: Language

Default: match the user's interaction language. Detect from the language the user used in Step 1:

  • If the user wrote in Chinese → zh
  • If the user wrote in English → en
  • If config.language is set → use that value

Never ask this question. Always infer silently. Show in the confirmation summary so the user can override if needed.
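Below is a rough bash sketch of this language inference. The function name detect_lang is hypothetical. The CJK test is a byte-level heuristic: any UTF-8 lead byte 0xE4 to 0xE9 marks a character in U+4000 to U+9FFF. That range also matches Japanese kanji. The sketch also assumes that a saved config.language takes precedence over detection.

```shell
# Hypothetical helper: choose the podcast language without asking.
#   $1 = the user's Step 1 request
#   $2 = config.language (may be empty or "null")
detect_lang() {
  local text="$1" configured="${2:-}"
  # Assumption: a saved config.language wins over detection.
  if [ -n "$configured" ] && [ "$configured" != "null" ]; then
    echo "$configured"
    return
  fi
  # Keep only bytes 0xE4-0xE9 (octal 344-351). In UTF-8 these bytes only
  # appear as lead bytes of characters in U+4000-U+9FFF, which covers the
  # common CJK ideographs. If any survive, assume the user wrote in Chinese.
  if [ -n "$(printf '%s' "$text" | LC_ALL=C tr -dc '\344-\351')" ]; then
    echo "zh"
  else
    echo "en"
  fi
}
```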

Step 4: Speaker Count

Default: 2 speakers (dialogue) — the most common and engaging format.

Skip this question. Debate mode requires 2 speakers. For quick/deep, default to 2 speakers as well.

Only use 1 speaker if the user explicitly requests a monologue or solo format.

Step 5: Speaker Selection

Follow shared/speaker-selection.md:

  • If config.defaultSpeakers.{language} is set → use saved speakers silently
  • If not set → use built-in defaults from shared/speaker-selection.md (no question asked)
  • Show the speaker(s) in the confirmation summary — user can change from there if desired
  • Only show the full speaker list if the user explicitly asks to change voices

For 2-speaker mode (dialogue/debate): use Primary + Secondary defaults for the language.
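Looking up saved speakers is one jq query. This minimal sketch assumes config.defaultSpeakers maps a language code to an array of speaker IDs, which matches the shape written after a successful generation. The function name saved_speakers is hypothetical.

```shell
# Hypothetical helper: print one saved speaker ID per line for a language,
# or nothing when config.defaultSpeakers has no entry for it.
#   $1 = config JSON, $2 = language code (e.g. en, zh)
saved_speakers() {
  printf '%s' "$1" |
    jq -r --arg lang "$2" '(.defaultSpeakers // {})[$lang] // [] | .[]'
}
```

A caller could collect the IDs with `mapfile -t SPEAKER_IDS < <(saved_speakers "$CONFIG" en)` and pass each one through --speaker-id. An empty array means the built-in defaults from shared/speaker-selection.md apply.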

Step 6: Confirm & Generate

Summarize all choices:

Ready to generate podcast:
  Topic: {topic}
  Mode: {mode}
  Language: {language}
  Speakers: {speaker name(s)}
  References: {yes/no + brief description}
  Proceed?

Wait for explicit confirmation before calling any CLI command. The user can adjust any parameter here before confirming.

Workflow

Generation

- Submit (background): Run the CLI command with run_in_background: true and timeout: 360000:

listenhub podcast create \
  --query "{topic}" \
  --source-url "{url}" \
  --source-text "{text}" \
  --mode {quick|deep|debate} \
  --lang {en|zh|ja} \
  --speaker "{name}" \
  --speaker "{name2}" \
  --json

Flag notes:

  • --query — the topic or question to discuss
  • --source-url — repeatable, one per URL reference
  • --source-text — repeatable, one per text block reference
  • --mode — one of quick, deep, debate
  • --lang — language code
  • --speaker — repeatable (max 2); use speaker display names
  • --speaker-id — alternative to --speaker; use speaker IDs instead of names
  • Omit --source-url / --source-text if the user provided no references

The CLI handles polling internally and returns the final result when generation completes.

- Tell the user the task has been submitted and that they will be notified when it finishes.

- When notified of completion, present the result:

Parse the CLI JSON output to extract fields: audioUrl, subtitlesUrl, audioDuration, credits.

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

**inline or both**: Display audioUrl as a clickable link.

Present:

Podcast generated!
Listen online: {audioUrl}
Subtitles: {subtitlesUrl} (if available)
Duration: {audioDuration / 1000}s
Credits used: {credits}

**download or both**: Also download the file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.

SLUG="{topic-slug}"  # e.g. "ai-developments"
NAME="${SLUG}-podcast.mp3"
# Dedup: if the file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
# -f: fail on HTTP errors instead of saving the error body; -L: follow redirects
curl -fsSL -o "$NAME" "{audioUrl}"

Present:

Saved to the current directory:
  {NAME}

- Offer to show the transcript or provide the download URL on request.
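The flag notes in the submit step say that --source-url and --source-text are repeatable and should be omitted when there are no references. A bash array makes both points easy to handle and keeps pasted text safe from quoting problems. This is a minimal sketch. The function name build_podcast_args and its input variables (TOPIC, MODE, LANG_CODE, SOURCE_URLS, SOURCE_TEXTS, SPEAKERS) are hypothetical. Only the flags themselves come from the flag notes.

```shell
# Hypothetical helper: build the argument list for `listenhub podcast create`
# into the global array ARGS.
# Inputs (hypothetical names):
#   TOPIC, MODE, LANG_CODE               - scalars
#   SOURCE_URLS, SOURCE_TEXTS, SPEAKERS  - bash arrays, possibly empty
build_podcast_args() {
  ARGS=(podcast create --query "$TOPIC" --mode "$MODE" --lang "$LANG_CODE")
  local item
  # One flag per reference; empty arrays add nothing, so the flags are omitted.
  for item in "${SOURCE_URLS[@]}"; do ARGS+=(--source-url "$item"); done
  for item in "${SOURCE_TEXTS[@]}"; do ARGS+=(--source-text "$item"); done
  # The flag notes allow at most two --speaker flags.
  for item in "${SPEAKERS[@]:0:2}"; do ARGS+=(--speaker "$item"); done
  ARGS+=(--json)
}
```

Example usage: set `TOPIC="The latest AI developments"; MODE=quick; LANG_CODE=en; SPEAKERS=("Mars" "Mia")`, call build_podcast_args, then run `listenhub "${ARGS[@]}"` in the background as described above.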
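The result-parsing step names four fields in the CLI's JSON output. This is a sketch of pulling them out with jq and printing the inline summary. It assumes that audioUrl, subtitlesUrl, audioDuration (in milliseconds), and credits are top-level keys. If the real output nests them, the jq paths would need adjusting. The function name present_result is hypothetical.

```shell
# Hypothetical helper: print the inline result summary from the CLI's --json output.
# Assumption: the four fields below are top-level keys; audioDuration is in ms.
present_result() {
  local json="$1" audio_url subtitles_url duration_s credits
  audio_url=$(printf '%s' "$json" | jq -r '.audioUrl // empty')
  subtitles_url=$(printf '%s' "$json" | jq -r '.subtitlesUrl // empty')
  # Divide in jq so fractional durations do not break shell arithmetic.
  duration_s=$(printf '%s' "$json" | jq -r '((.audioDuration // 0) / 1000) | floor')
  credits=$(printf '%s' "$json" | jq -r '.credits // empty')

  echo "Listen online: $audio_url"
  # Subtitles are optional, so print the line only when a URL is present.
  if [ -n "$subtitles_url" ]; then echo "Subtitles: $subtitles_url"; fi
  echo "Duration: ${duration_s}s"
  echo "Credits used: $credits"
}
```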
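The download step leaves the {topic-slug} placeholder to shared/config-pattern.md § Artifact Naming. Those rules are not reproduced here, so the following bash slug function is only an approximation and its name, slugify, is hypothetical. It lowercases the topic, collapses every run of non-alphanumeric characters into a single hyphen, and falls back to "untitled" when nothing usable is left. The fallback covers topics written entirely in Chinese.

```shell
# Hypothetical helper approximating topic-based artifact naming.
slugify() {
  local slug
  slug=$(printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  # Topics with no ASCII letters or digits would otherwise give an empty slug.
  echo "${slug:-untitled}"
}
```

For example, `SLUG=$(slugify "AI developments in 2026")` gives a name such as ai-developments-in-2026-podcast.mp3 before the dedup loop runs.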

After Successful Generation

Update config with the choices made this session:

# List every speaker ID used this session (one for solo, two for dialogue/debate)
NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg lang "{language}" \
  --arg mode "{mode}" \
  --argjson speakers '{"{language}": ["{speakerId}", "{speakerId2}"]}' \
  '. + {"language": $lang, "defaultMode": $mode, "defaultSpeakers": (.defaultSpeakers + $speakers)}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
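Hand-assembling the --argjson string breaks if a value contains a quote. In this minimal sketch, jq builds the speaker array itself, and the sketch stores every speaker ID used, so dialogue defaults keep both voices. The function name update_config is hypothetical. It relies on `--args` / `$ARGS.positional`, which assumes jq 1.6 or newer.

```shell
# Hypothetical helper: persist this session's choices and print the new config.
#   $1 = current config JSON, $2 = language, $3 = mode, remaining args = speaker IDs
update_config() {
  local config="$1" lang="$2" mode="$3" ids
  shift 3
  # Turn the remaining arguments into a JSON array; jq handles all quoting.
  ids=$(jq -nc '$ARGS.positional' --args "$@")
  # Path assignment keeps other languages' saved speakers and creates
  # .defaultSpeakers if it is missing.
  printf '%s' "$config" | jq -c --arg lang "$lang" --arg mode "$mode" --argjson ids "$ids" \
    '.language = $lang | .defaultMode = $mode | .defaultSpeakers[$lang] = $ids'
}
```

Usage: `update_config "$CONFIG" en quick "$ID1" "$ID2" > "$CONFIG_PATH"`.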

API Reference

  • Speaker list: shared/cli-speakers.md
  • Speaker selection guide: shared/speaker-selection.md
  • CLI patterns: shared/cli-patterns.md
  • CLI authentication: shared/cli-authentication.md
  • Config pattern: shared/config-pattern.md

Composability

  • Invokes: speakers API (for speaker selection)
  • Invoked by: content-planner (Phase 3)

Example

User: "Make a podcast about the latest AI developments"

Agent workflow:

  • Detect: podcast request, topic = "latest AI developments", no references
  • Infer: mode = "quick" (default), language = "en" (user wrote in English), 2 speakers (default)
  • Show confirmation summary → user confirms
listenhub podcast create \
  --query "The latest AI developments" \
  --mode quick \
  --lang en \
  --speaker "Mars" \
  --speaker "Mia" \
  --json

Wait for the CLI to return the result, then present it with the title and listen link.
