Two-host podcast video for any URL OR free-form topic — 1 minute, 4 acts × ~15s, native multi-shot dialogue, optional voice cloning for Host A. CANONICAL workflow for podcast-style spoken video. Accepts EITHER a URL (scraped and reviewed) OR a free-form brief like "I and Elon Musk talk about Mars", "two scientists debate AGI", "podcast about quantum computing", "interview with a VC about seed-stage fundraising". Use when the user asks to "make a podcast", "podcast about [thing]", "podcast review of [url]", "two-host explainer", "interview-style clip", "two-host take on [topic]", "two people talking on camera about [thing]", "podcast clip", "GRWM podcast", "I and X talk about Y", "interview with [persona] about [topic]" — or any variant of "two hosts discussing [anything]". No captions burned (native audio is the deliverable; auto-transcription mistranscribes domain-specific terms).
npx claudepluginhub pika-labs/pika-plugins --plugin pika

This skill uses the workspace's default tool permissions.
4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL **or** a free-form topic / brief.
| Param | Default | Notes |
|---|---|---|
| input | required | URL to review or free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| bg_img | auto-generated | Podcast studio background |
| host_a_img | auto-generated | Host A portrait — see Real-person handling below |
| host_b_img | auto-generated | Host B portrait — see Real-person handling below |
| voice_a | 876341503281471517 | Kling preset or cloned voice ID for Host A |
| voice_b | 829837252279803904 | Kling preset or cloned voice ID for Host B |
| use_avatar | off | Clone the user's identity voice as Host A via clone_voice |
| aspect_ratio | 16:9 | Output aspect ratio |
voice_a defaults to the Kling preset 876341503281471517 and voice_b to 829837252279803904. Do not ask "which voice?" or "should I clone yours?" before firing — only honor explicit overrides (voice_a=, voice_b=, use_avatar). The --yes flag is accepted as a no-op for backward compatibility.

Claude Desktop can't pass inline-pasted images to MCP tools yet (an Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as host_a_img / host_b_img, pause before Step 1 and kindly send them something like:
Heads up — pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
- Paste a URL if it's already hosted (Imgur, S3, your site) — fastest
- Zip it and attach the .zip — right-click → Compress (macOS) / Send to → Compressed folder (Windows) / zip pic.zip pic.png (Linux). I'll take it from there.
When a .zip arrives, unzip it via Bash, call upload_asset for a presigned PUT URL, push the bytes with curl -X PUT, then use the returned public_url as the parameter — all before Step 1. Already-hosted https://... URLs work as-is and skip this entirely.
If the user names a real public figure without attaching anything, do NOT auto-generate their likeness — Step 4 (Real-person handling) uses an archetype portrait instead.
Strip flags (--yes, --no-captions, etc.) and key=value parameters from $ARGUMENTS. If what remains is empty or whitespace-only, print this menu verbatim as your full response, then stop and wait for the user's next message — do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.
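A minimal sketch of that stripping pass. The function name and token rules here are illustrative, not part of the skill — note the caveat that a URL containing "=" in its query string would also match the key=value rule and need a finer check:

```shell
#!/usr/bin/env bash
# Drop --flags and key=value overrides; whatever survives is the input.
strip_args() {
  local kept=()
  for tok in "$@"; do
    case "$tok" in
      --*) ;;                # flag like --yes or --no-captions: drop
      *=*) ;;                # override like voice_a=123: drop
      *)   kept+=("$tok") ;;
    esac
  done
  printf '%s\n' "${kept[*]}"
}

strip_args --yes voice_a=123 podcast about quantum computing
# → podcast about quantum computing
```

If the printed result is empty or whitespace-only, show the menu; otherwise proceed to Step 1 with it.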
What would you like a podcast about? I can take any of:
- A website URL (product page, docs site, launch page) — e.g. https://pika.art
- A GitHub repo — e.g. https://github.com/anthropics/claude-code
- A blog post / article URL — e.g. a recent piece you'd like discussed
- A free-form topic or brief — e.g. "I and Elon Musk talk about Mars" or "two scientists debate AGI"
Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
Tip: you don't need to type /pika:podcast — just say things like "make a podcast about [thing]", "podcast review of [url]", or "I and [X] talk about [Y]" and I'll fire this skill automatically.
When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
Generate only what's not provided. Default archetype prompts:
- bg_img — modern podcast studio, two chairs, warm lighting, no people, 16:9
- host_a_img — enthusiastic host, studio portrait, left-side framing, 1:1
- host_b_img — pragmatic skeptic host, studio portrait, right-side framing, 1:1

If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe — see Real-person handling below.
Voice cloning (only if use_avatar is set): call identity_voice_info → { voice_id, platform, sample_url }. If sample_url is present, call clone_voice(voice_url=sample_url, voice_name="host_a_voice") and set voice_a to the returned Kling voice ID.

Strip flags (--yes, --no-captions, etc.) and key=value parameters from $ARGUMENTS. Inspect what remains.
URL mode — input contains a https?:// URL: call capture_website on the URL and use the scraped content as source material.

Topic mode — input is free-form prose (no URL): use the brief itself as source material. If the brief follows the "I and X talk about Y" pattern, Host A = the user (trigger the use_avatar flow if not already active, or fall back to the default avatar) and Host B = X.

Real-person handling — if the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):
- Do not auto-generate their likeness; use an archetype portrait tuned to the persona vibe instead (see the default archetype prompts above).
- If the user supplies host_a_img=<url> or host_b_img=<url>, use the provided image as-is. The user takes responsibility for likeness rights.
- Voices change only when the user passes explicit overrides (voice_a= / voice_b=) or invokes use_avatar (which clones the user's own voice for Host A).

This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.
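The URL-vs-topic split can be sketched as a simple substring check on the already-stripped input (the function name is illustrative; per the Mixed example below, a URL anywhere in the prose wins):

```shell
#!/usr/bin/env bash
# URL mode if the input contains an http(s) URL anywhere, else topic mode.
detect_mode() {
  case "$1" in
    *http://*|*https://*) echo "url" ;;
    *)                    echo "topic" ;;
  esac
}

detect_mode "podcast about https://pika.art with skeptical investor energy"  # → url
detect_mode "two scientists debate AGI"                                      # → topic
```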
Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.
Required (Matan rules — apply to both URL and topic modes):
Acts: Hook → Feature deep-dive → The Turn → Verdict (In topic mode the analogue: Hook → Substance → The Pivot → Verdict.)
Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially — do NOT parallelize.
Each act: one generate_reference_video call (kling-v3-omni, duration=15, sound=true). Pass reference_images=[bg_img, host_a_img, host_b_img] and voice_ids=[voice_a, voice_b]. Three shots per act. Dialogue lines in the prompt use voice markers:

<<<voice_1>>> '<HOST_A line>'
<<<voice_2>>> '<HOST_B line>'

Emotional beats per act:
After act 4, subagent calls edit_concat([act1, act2, act3, act4]) and returns the final video URL.
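edit_concat does the join server-side. Purely for reference, the same lossless join done locally would use ffmpeg's concat demuxer with a list file like this (act file names are placeholders; the ffmpeg line assumes all four acts share codec and resolution, so it is shown but not run here):

```shell
#!/usr/bin/env bash
# Concat-demuxer list: one 'file' line per act, in playback order.
cat > concat.txt <<'EOF'
file 'act1.mp4'
file 'act2.mp4'
file 'act3.mp4'
file 'act4.mp4'
EOF
# Lossless stream-copy join (no re-encode):
# ffmpeg -f concat -safe 0 -i concat.txt -c copy final.mp4
```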
Return the final video URL and a one-sentence verdict. Do not call add_captions — Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.
Rules:
- voice_ids must be valid Kling voice IDs — never use name-style strings like Calm_Man
- Host A always LEFT (<<<image_2>>>), Host B always RIGHT (<<<image_3>>>) — never swapped

URL mode (review a website / repo / blog):
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar
Topic mode (free-form brief):
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026
Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):
/pika:podcast podcast about https://pika.art with skeptical investor energy