Generates multi-speaker AI podcasts from a topic, URL, or script via CLI. Covers LLM dialogue generation, per-speaker TTS synthesis, and MP3/WAV export.
Slash command: `/voxflow:podcast`
Generate multi-speaker AI podcasts entirely from the command line. Sister skills:
- `voxflow` (hub)
- `voxflow:video`
- `voxflow:transcribe`

Requires Node.js `^20.19.0 || >=22.12.0`. Install with `npm install -g voxflow`, then run `voxflow login`. No API keys are needed: all auth goes through `voxflow login`.
Quick start:

```bash
# 1. Log in (one-time; opens a browser)
voxflow login
# 2. Generate a podcast
voxflow podcast --topic "AI 时代的程序员出路"  # "Programmers' future in the AI era"
# 3. Output: podcast-2026-03-20T12-00-00.wav + .txt
```
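To review the dialogue before paying for TTS, generate the script only:

```bash
# Topic: "Introduction to quantum computing"
voxflow podcast \
  --topic "量子计算入门" \
  --template tutorial \
  --colloquial medium \
  --speakers 2 \
  --language zh-CN \
  --format json \
  --no-tts
```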
Outputs: `podcast-<ts>.txt` and `podcast-<ts>.podcast.json`.

Open the `.podcast.json` file: it contains the structured dialogue, speaker info, quality scores, and voice mappings. Edit it as needed, then synthesize it:
```bash
voxflow podcast --input podcast-2026-03-20.podcast.json --output final.wav
```
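The draft's filename carries a generation timestamp, so retyping it for the synthesis step gets tedious. A small helper can pick up the newest draft instead. This is a hypothetical shell function, not part of the voxflow CLI; it assumes your drafts live in one directory, match the `podcast-*.podcast.json` pattern, and that the most recently modified file is the one you want.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voxflow): print the most recently
# modified podcast-*.podcast.json draft in a directory (default: .).
latest_draft() {
  local dir=${1:-.}
  # ls -t lists newest first; if no draft matches, the error is silenced
  # and nothing is printed.
  ls -t "$dir"/podcast-*.podcast.json 2>/dev/null | head -n 1
}
```

With it, the synthesis step becomes `voxflow podcast --input "$(latest_draft)" --output final.wav`.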
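Or generate the script and audio in one pass, here with the AI SDK engine and three speakers:

```bash
# Topic: "The future of Web3"
voxflow podcast \
  --topic "Web3 的未来" \
  --engine ai-sdk \
  --colloquial high \
  --speakers 3 \
  --output web3-podcast.wav
```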
| Flag | Values | Default | Description |
|---|---|---|---|
| `--topic` | text | tech trends | Podcast topic or prompt |
| `--engine` | `auto`, `legacy`, `ai-sdk` | `auto` (→ `ai-sdk`) | Generation engine |
| `--template` | `interview`, `discussion`, `news`, `story`, `tutorial` | `interview` | Podcast template |
| `--colloquial` | `low`, `medium`, `high` | `medium` | Conversational tone level |
| `--speakers` | `1`, `2`, `3` | `2` | Number of speakers |
| `--language` | `zh-CN`, `en`, `ja` | `zh-CN` | Output language |
| `--length` | `short`, `medium`, `long` | `medium` | Script length |
| `--format` | `json` | none | Also write a `.podcast.json` file |
| `--input` | file path | none | Load a `.podcast.json` for synthesis |
| `--no-tts` | flag | false | Generate the script only; skip TTS |
| `--speed` | 0.5-2.0 | 1.0 | TTS playback speed |
| `--silence` | 0-5.0 | 0.5 | Silence between segments, in seconds |
| `--output` | file path | auto | Output file path |
| Feature | legacy | ai-sdk |
|---|---|---|
| Structured output | No | Yes (JSON) |
| Quality scoring | No | Yes (1-10) |
| Colloquial control | No | 3 levels |
| Intent tagging | No | Yes |
| Speaker metadata | Partial | Full |
| Multi-language | Chinese only | zh/en/ja |
| Operation | Cost (quota) |
|---|---|
| Script generation (medium, ~16 turns) | 2,000 |
| TTS per turn (native pause voice) | 50 |
| TTS per chunk (splice fallback voice) | 50 |
The number of TTS calls per turn depends on the voice. Voices flagged `nativePauseSupported: true` (most podcast voices) take 1 TTS call per turn, because TRTC honors the `<|break|>` / `<|s_break|>` pause markers natively (inserting ~250-430 ms of silence). Voices that haven't been verified (e.g. 旁白 narration voices) fall back to client-side splicing: the turn is synthesized as N separate chunks and joined, which costs N calls per turn.

A typical medium podcast (16 turns, all native voices) costs about 16 × 50 + 2,000 = 2,800 quota, so the free tier (10K/month) covers about 3 medium podcasts. Podcasts that mix in non-native voices cost slightly more, since those voices are billed per chunk rather than per turn. Tip: if you already have a script, pass `--script my.json` to skip the 2,000-quota LLM step entirely.
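The arithmetic above generalizes to a quick back-of-envelope estimate. The function below is a hypothetical sketch, not a voxflow command: it assumes the table's 2,000-quota script cost (quoted for a medium, ~16-turn script) and a flat 50 per TTS call, and it counts only the operations listed in the cost table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of voxflow): rough quota estimate built from
# the per-operation costs in the cost table.
#   $1  turns voiced by native-pause voices (1 TTS call per turn)
#   $2  total chunks across turns voiced by splice-fallback voices (1 call per chunk)
#   $3  1 if the LLM writes the script (default), 0 if you supply one
estimate_quota() {
  local native_turns=$1
  local spliced_chunks=$2
  local generate_script=${3:-1}
  local tts_call_cost=50     # per turn (native) or per chunk (fallback)
  local script_cost=0
  if [ "$generate_script" -eq 1 ]; then
    script_cost=2000         # medium-length script; other lengths may differ
  fi
  echo $(( script_cost + tts_call_cost * (native_turns + spliced_chunks) ))
}

estimate_quota 16 0   # typical medium podcast, all native voices: prints 2800
estimate_quota 14 6   # 14 native turns + 2 fallback turns of 3 chunks each: prints 3000
```

Free-tier planning is then integer division: `echo $(( 10000 / $(estimate_quota 16 0) ))` prints 3.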
More examples:

```bash
# English tech podcast
voxflow podcast --topic "AI ethics debate" --language en --template discussion
# Quick news briefing (short); topic: "This week's tech news"
voxflow podcast --topic "本周科技新闻" --template news --length short
# Casual chat with a high colloquial level; topic: "All about programmers working overtime"
voxflow podcast --topic "程序员加班那些事" --colloquial high
# JSON export for editing; topic: "Startup stories"
voxflow podcast --topic "创业故事" --format json --no-tts
# Synthesize an edited script at 1.1x speed
voxflow podcast --input edited-podcast.podcast.json --speed 1.1
```
Tips:
- Run `voxflow status` before generation.
- Use `--no-tts --format json` to inspect the dialogue before paying for TTS.
- Synthesize an edited script with `--input`.
- Play the result with `open podcast-*.wav` (macOS).

Install the plugin with `npx claudepluginhub voxflowstudio/skills --plugin voxflow`.