Two-host podcast video for any URL OR free-form topic — 1 minute, 4 acts × ~15s, native multi-shot dialogue, optional voice cloning for Host A. CANONICAL workflow for podcast-style spoken video. Accepts EITHER a URL (scraped and reviewed) OR a free-form brief like "I and Elon Musk talk about Mars", "two scientists debate AGI", "podcast about quantum computing", "interview with a VC about seed-stage fundraising". Use when the user asks to "make a podcast", "podcast about [thing]", "podcast review of [url]", "two-host explainer", "interview-style clip", "two-host take on [topic]", "two people talking on camera about [thing]", "podcast clip", "GRWM podcast", "I and X talk about Y", "interview with [persona] about [topic]" — or any variant of "two hosts discussing [anything]". No captions burned (native audio is the deliverable; auto-transcription mistranscribes domain-specific terms).
npx claudepluginhub pika-labs/pika-plugins --plugin pika

This skill uses the workspace's default tool permissions.
4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL **or** a free-form topic / brief.
| Param | Default | Notes |
|---|---|---|
| input | required | URL to review or free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| bg_img | auto-generated | Podcast studio background |
| host_a_img | auto-generated | Host A portrait — see Real-person handling below |
| host_b_img | auto-generated | Host B portrait — see Real-person handling below |
| voice_a | 876341503281471517 | Kling preset or cloned voice ID for Host A |
| voice_b | 829837252279803904 | Kling preset or cloned voice ID for Host B |
| use_avatar | off | Clone the user's identity voice as Host A via clone_voice |
| aspect_ratio | 16:9 | Output aspect ratio |
voice_a defaults to the Kling preset 876341503281471517 and voice_b to 829837252279803904. Do not ask "which voice?" or "should I clone yours?" before firing — only honor explicit overrides (voice_a=, voice_b=, use_avatar). The --yes flag is accepted as a no-op for backward compatibility.

Claude Desktop can't pass inline-pasted images to MCP tools yet (an Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as host_a_img / host_b_img, pause before Step 1 and kindly send them something like:
Heads up — pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
- Paste a URL if it's already hosted (Imgur, S3, your site) — fastest
- Zip it and attach the .zip — right-click → Compress (macOS) / Send to → Compressed folder (Windows) / zip pic.zip pic.png (Linux). I'll take it from there.
When a .zip arrives, unzip it via Bash, call upload_asset for a presigned PUT URL, push the bytes with curl -X PUT, then use the returned public_url as the parameter — all before Step 1. Already-hosted https://... URLs work as-is and skip this entirely.
If the user names a real public figure without attaching anything, do NOT auto-generate their likeness — Step 4 (Real-person handling) uses an archetype portrait instead.
Strip flags (--yes, --no-captions, etc.) and key=value parameters from $ARGUMENTS. If what remains is empty or whitespace-only, print this menu verbatim as your full response, then stop and wait for the user's next message — do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.
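A minimal sketch of that stripping pass. The function name and token rules here are illustrative, not part of the skill — note the caveat that a URL containing "=" in its query string would also match the key=value rule and need a finer check:

```shell
#!/usr/bin/env bash
# Drop --flags and key=value overrides; whatever survives is the input.
strip_args() {
  local kept=()
  for tok in "$@"; do
    case "$tok" in
      --*) ;;                # flag like --yes or --no-captions: drop
      *=*) ;;                # override like voice_a=123: drop
      *)   kept+=("$tok") ;;
    esac
  done
  printf '%s\n' "${kept[*]}"
}

strip_args --yes voice_a=123 podcast about quantum computing
# → podcast about quantum computing
```

If the printed result is empty or whitespace-only, show the menu; otherwise proceed to Step 1 with it.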
What would you like a podcast about? I can take any of:
- A website URL (product page, docs site, launch page) — e.g. https://pika.art
- A GitHub repo — e.g. https://github.com/anthropics/claude-code
- A blog post / article URL — e.g. a recent piece you'd like discussed
- A free-form topic or brief — e.g. "I and Elon Musk talk about Mars" or "two scientists debate AGI"
Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
Tip: you don't need to type /pika:podcast — just say things like "make a podcast about [thing]", "podcast review of [url]", or "I and [X] talk about [Y]" and I'll fire this skill automatically.
When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
Generate only what's not provided. Default archetype prompts:
- bg_img — modern podcast studio, two chairs, warm lighting, no people, 16:9
- host_a_img — enthusiastic host, studio portrait, left-side framing, 1:1
- host_b_img — pragmatic skeptic host, studio portrait, right-side framing, 1:1

If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe — see Real-person handling below.
Voice cloning (only if use_avatar is set): call identity_voice_info → { voice_id, platform, sample_url }. If sample_url is present, call clone_voice(voice_url=sample_url, voice_name="host_a_voice") and set voice_a to the returned Kling voice ID.

Strip flags (--yes, --no-captions, etc.) and key=value parameters from $ARGUMENTS. Inspect what remains.
URL mode — input contains a https?:// URL: call capture_website on the URL and use the scraped content as source material.

Topic mode — input is free-form prose (no URL): use the brief itself as source material. If the brief follows the "I and X talk about Y" pattern, Host A = the user (trigger the use_avatar flow if not already active, or fall back to the default avatar) and Host B = X.

Real-person handling — if the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):
- Do not auto-generate their likeness; use an archetype portrait tuned to the persona vibe instead (see the default archetype prompts above).
- If the user supplies host_a_img=<url> or host_b_img=<url>, use the provided image as-is. The user takes responsibility for likeness rights.
- Voices change only when the user passes explicit overrides (voice_a= / voice_b=) or invokes use_avatar (which clones the user's own voice for Host A).

This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.
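The URL-vs-topic split can be sketched as a simple substring check on the already-stripped input (the function name is illustrative; per the Mixed example below, a URL anywhere in the prose wins):

```shell
#!/usr/bin/env bash
# URL mode if the input contains an http(s) URL anywhere, else topic mode.
detect_mode() {
  case "$1" in
    *http://*|*https://*) echo "url" ;;
    *)                    echo "topic" ;;
  esac
}

detect_mode "podcast about https://pika.art with skeptical investor energy"  # → url
detect_mode "two scientists debate AGI"                                      # → topic
```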
Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.
Required (Matan rules — apply to both URL and topic modes):
Acts: Hook → Feature deep-dive → The Turn → Verdict (In topic mode the analogue: Hook → Substance → The Pivot → Verdict.)
Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially — do NOT parallelize.
Each act: one generate_reference_video call (kling-v3-omni, duration=15, sound=true). Pass reference_images=[bg_img, host_a_img, host_b_img] and voice_ids=[voice_a, voice_b]. Three shots per act. Dialogue lines in the prompt use voice markers:

<<<voice_1>>> '<HOST_A line>'
<<<voice_2>>> '<HOST_B line>'

Emotional beats per act:
After act 4, subagent calls edit_concat([act1, act2, act3, act4]) and returns the final video URL.
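edit_concat does the join server-side. Purely for reference, the same lossless join done locally would use ffmpeg's concat demuxer with a list file like this (act file names are placeholders; the ffmpeg line assumes all four acts share codec and resolution, so it is shown but not run here):

```shell
#!/usr/bin/env bash
# Concat-demuxer list: one 'file' line per act, in playback order.
cat > concat.txt <<'EOF'
file 'act1.mp4'
file 'act2.mp4'
file 'act3.mp4'
file 'act4.mp4'
EOF
# Lossless stream-copy join (no re-encode):
# ffmpeg -f concat -safe 0 -i concat.txt -c copy final.mp4
```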
Return the final video URL and a one-sentence verdict. Do not call add_captions — Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.
Rules:
- voice_ids must be valid Kling voice IDs — never use name-style strings like Calm_Man
- Host A always LEFT (<<<image_2>>>), Host B always RIGHT (<<<image_3>>>) — never swapped

URL mode (review a website / repo / blog):
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar
Topic mode (free-form brief):
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026
Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):
/pika:podcast podcast about https://pika.art with skeptical investor energy