From ai-business-skills
Voice cloning, podcast, audiobook, and voiceover production using ElevenLabs, Murf, and PlayHT. Supports short clips, 30-60 min podcasts, and 1:10 repurposing.
npx claudepluginhub minhnv0807/ai-business-skills --plugin ai-business-skills
Slash command: /ai-business-skills:25-voice-clone-podcast-global
Guides voice cloning (ElevenLabs, HeyGen, Vbee) and AI audio production for podcasts, audiobooks, and voiceovers. Includes repurposing one podcast into ten short clips.
This skill focuses on audio AI: voice cloning, podcasts, audiobooks, and voiceovers. It pairs with 24-ai-avatar-production-global (video); combine both for full content-stack coverage.
Audio AI is the technology behind synthetic voices that sound nearly human. From a sample of your voice, a model learns to produce a synthetic copy of it (a voice clone). With text-to-speech (TTS), you write the text and the AI reads it aloud.
When to pick audio AI vs video AI:
| Situation | Pick audio AI | Pick video AI |
|---|---|---|
| Long-form content (>10 min) | YES, podcast format | NO, too long for AI video |
| You don't want to be on camera | YES | NO |
| You need a high volume of content fast | YES, 1 podcast = 10 shorts | YES, but more expensive |
| Audience listens while driving or at the gym | YES | NO |
| You need visuals for a demo | NO | YES |
| Thought-leader personal brand | YES, a podcast builds authority | YES, if your face is already part of the brand |
| Task | Time | Cost (USD/mo) |
|---|---|---|
| Voice clone setup | 30-60 min | $5-22 (ElevenLabs Starter/Creator) |
| 60s voiceover (TikTok) | 5-10 min | $5-22 |
| 30 min solo podcast | 1-2 hrs | $22-99 (ElevenLabs + Riverside) |
| Audiobook chapter (15 min) | 30-45 min | $22-99 |
| 1 podcast -> 10 clips | 1-2 hrs | $0-30 (Descript/Opus Clip) |
Ask up to 4 questions before starting:
Based on the answers, pick the matching use case and tool stack.
| Criterion | Minimum | Optimal |
|---|---|---|
| Length | 1 min (Free tier) | 3-5 min (Pro tier) |
| Room | Quiet, no echo | Acoustic treatment, rugs, curtains |
| Mic | Phone + headset mic | Condenser mic (AT2020, $80-100) |
| Distance | 20-30 cm | 15-20 cm with pop filter |
| Format | MP3 128 kbps | WAV 44.1 kHz |
| Content | One pre-written passage | Three passages: business / casual / emotional |
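The length and format rows above can be checked automatically before you upload a sample. Below is a minimal sketch using Python's standard-library `wave` module. It assumes the sample is an uncompressed WAV file. Room, mic, and distance still need a human listen.

```python
import wave

# Thresholds from the voice-sample table above.
MIN_SECONDS = 60           # "Minimum": 1 min
OPTIMAL_SECONDS = 180      # lower bound of the 3-5 min "Optimal" range
OPTIMAL_RATE_HZ = 44_100   # "Optimal": WAV 44.1 kHz


def check_voice_sample(path: str) -> dict:
    """Measure a WAV sample and compare it with the length and format rows.

    Only duration and sample rate are checked here.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    return {
        "seconds": round(seconds, 1),
        "sample_rate_hz": rate,
        "meets_minimum": seconds >= MIN_SECONDS,
        "meets_optimal": seconds >= OPTIMAL_SECONDS and rate == OPTIMAL_RATE_HZ,
    }
```

For example, `check_voice_sample("my_sample.wav")` (a hypothetical file name) returns the measured length, the sample rate, and whether the 1-minute minimum and the 3-minute / 44.1 kHz optimum are met. MP3 samples would need to be decoded to WAV first.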
Full reference: references/voice-clone-prompts-global.md, with sample scripts in five English variants (US/UK/AU/SG/IN) across three topics (business, lifestyle, educational).
| Tool | English clone quality | Price/mo | Setup time | Best for |
|---|---|---|---|---|
| ElevenLabs Creator | Excellent (10/10) | $22 | 30 min | Multilingual content creators |
| HeyGen Voice | Good (8/10) | Bundled with avatar | 15 min | Combo with video AI |
| Murf | Excellent (9/10) | $29-79 | 30 min | Corporate voiceover, e-learning |
| PlayHT | Excellent (9.5/10) | $39-99 | 30 min | API-driven, instant clone |
| Descript Overdub | Good (8/10) | $24 (Hobbyist) | 30 min | Podcast editing |
| Resemble.ai | Excellent (9/10) | $30-99 | 1 hr | Brand custom voice, emotion control |
Recommendations:
VOICE CLONE LICENSE AGREEMENT
I, [Full name], ID/passport: [number], grant [Brand/Company]:
1. Permission to use samples of my voice to create an AI voice clone.
2. Use of the voice clone in [scope: internal / advertising / podcast / etc.].
3. Term: from [DD/MM/YYYY] to [DD/MM/YYYY].
4. Right of withdrawal: I may request deletion of the voice clone at any time
in writing; the brand has 7 days to fully remove it.
5. Disclosure: the brand commits to disclose "AI-generated voice" wherever
required by applicable law or regulation (FTC guidance, EU AI Act, etc.).
Signed: ____________ Date: ____________
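Clauses 3 and 4 imply two dates worth tracking for each contributor. The first is whether today falls inside the signed term. The second is the deletion deadline after a withdrawal request. Below is a minimal sketch with the standard `datetime` module. The function names are hypothetical, and the term is assumed to include both end dates. The 7-day window is the one written into clause 4; change it if your signed agreement differs.

```python
from datetime import date, timedelta

# Clause 4 of the agreement above: full removal within 7 days of a written request.
REMOVAL_WINDOW = timedelta(days=7)


def license_active(term_start: date, term_end: date, on: date) -> bool:
    """Clause 3: use is allowed only inside the term (both ends assumed inclusive)."""
    return term_start <= on <= term_end


def removal_deadline(withdrawal_received: date) -> date:
    """Clause 4: last day by which the brand must have deleted the clone."""
    return withdrawal_received + REMOVAL_WINDOW


# Example with placeholder dates.
term = (date(2025, 1, 1), date(2025, 12, 31))
print(license_active(*term, on=date(2025, 6, 1)))   # True
print(removal_deadline(date(2025, 3, 28)))           # 2025-04-04
```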
Spec:
Script template (30s):
[HOOK 0-3s] "Did you know [shocking stat]?"
[PROBLEM 3-10s] "Most people are still stuck in [wrong loop]"
[SOLUTION 10-22s] "I tried [method], and here are 3 things..."
[PAYOFF 22-27s] "Result: [specific number]"
[CTA 27-30s] "Comment 'YES' to get the full breakdown"
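Before generating audio, check that each filled-in line fits its time slot. This sketch assumes a speaking rate of about 150 words per minute (2.5 words per second). That figure is a common rule of thumb, not a measured ElevenLabs number, so tune `WORDS_PER_SECOND` to your cloned voice's actual pace.

```python
import re

# Slot boundaries in seconds, from the 30s script template above.
SEGMENTS = {
    "HOOK": (0, 3),
    "PROBLEM": (3, 10),
    "SOLUTION": (10, 22),
    "PAYOFF": (22, 27),
    "CTA": (27, 30),
}
# Assumption: ~150 words per minute, i.e. 2.5 words per second.
WORDS_PER_SECOND = 2.5


def timing_report(script: dict) -> dict:
    """Map each segment name to (estimated_seconds, budget_seconds, fits)."""
    report = {}
    for name, (start, end) in SEGMENTS.items():
        words = len(re.findall(r"[\w']+", script.get(name, "")))
        estimated = words / WORDS_PER_SECOND
        budget = end - start
        report[name] = (round(estimated, 1), budget, estimated <= budget)
    return report
```

For instance, a 12-word hook is estimated at 4.8 s against its 3 s budget and is flagged as not fitting.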
Voice settings (ElevenLabs):
Structure:
Pacing:
Sound design:
Voice settings (ElevenLabs):
Structure:
Pacing:
Consistency check (most important):
Voice settings (ElevenLabs):
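Voice settings can also be sent through the ElevenLabs REST API instead of the web UI. The sketch below only builds the request with the standard library; it does not send it. It assumes the v1 text-to-speech endpoint, the `xi-api-key` header, and the `eleven_multilingual_v2` model ID. The stability, similarity_boost, and style values are placeholders, not recommended settings. Check the current ElevenLabs API reference before relying on any field name.

```python
import json
import urllib.request

# Assumed endpoint shape for ElevenLabs text-to-speech (v1 REST API).
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key: str, voice_id: str, text: str,
                      stability: float = 0.5, similarity_boost: float = 0.75,
                      style: float = 0.0,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a TTS POST request; setting values are placeholders."""
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
        },
    }
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key,
                 "Content-Type": "application/json",
                 "Accept": "audio/mpeg"},
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and write the response body to an .mp3 file. The voice ID is the one assigned to your cloned voice in ElevenLabs.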
| Tool | Price/mo | English quality | Multilingual | Setup | Pros | Cons | Best for |
|---|---|---|---|---|---|---|---|
| ElevenLabs | $5-99 | 10/10 | 30+ langs | 30 min | Best clone, multilingual | Pricier high tiers | Multilingual creator |
| HeyGen Voice | Bundle w/ avatar | 8/10 | 40+ langs | 15 min | Combo with avatar | Voice clone less expressive | Combo with video |
| Descript | $24-30 | 9/10 | EN focus | 30 min | Audio editing first | Weaker multilingual support | Podcast editing |
| Riverside | $19-29 | n/a (recording) | n/a | 5 min | Studio recording | Not TTS | Live podcast |
| Murf | $29-79 | 9/10 | 20+ langs | 30 min | 120+ voice library | Voice cloning limited to some tiers | Corporate voiceover |
| PlayHT | $39-99 | 9.5/10 | 100+ langs | 30 min | Strong API, instant clone | UI dense | Developer/API |
| Resemble.ai | $30-99 | 9/10 | 60+ langs | 1 hr | Custom emotion control | Steep learning curve | Brand custom voice |
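The comparison table can be turned into a quick shortlist filter. This sketch hard-codes the table's entry prices and language counts. It takes the lower bound of entries like "30+ langs" and treats "EN focus" as English only. HeyGen Voice is filtered out because its price is bundled, and Riverside is omitted because it records rather than generates speech. The entry price is the cheapest tier, which may not include voice cloning, so treat the result as a starting point.

```python
from typing import NamedTuple, Optional


class Tool(NamedTuple):
    name: str
    entry_price: Optional[int]  # lowest USD/mo from the table; None = bundled
    languages: int              # lower bound of "N+ langs"; 1 = English focus
    best_for: str


# Rows copied from the comparison table above (Riverside omitted: not TTS).
TOOLS = [
    Tool("ElevenLabs", 5, 30, "Multilingual creator"),
    Tool("HeyGen Voice", None, 40, "Combo with video"),
    Tool("Descript", 24, 1, "Podcast editing"),
    Tool("Murf", 29, 20, "Corporate voiceover"),
    Tool("PlayHT", 39, 100, "Developer/API"),
    Tool("Resemble.ai", 30, 60, "Brand custom voice"),
]


def shortlist(budget_usd: int, min_languages: int = 1) -> list:
    """Tools whose entry price fits the budget and cover enough languages, cheapest first."""
    fits = [t for t in TOOLS
            if t.entry_price is not None
            and t.entry_price <= budget_usd
            and t.languages >= min_languages]
    return [t.name for t in sorted(fits, key=lambda t: t.entry_price)]
```

For example, `shortlist(30)` returns ElevenLabs, Descript, Murf, and Resemble.ai, cheapest first. `shortlist(30, min_languages=30)` narrows that to ElevenLabs and Resemble.ai.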
Recommended combos 2025-2026:
Use case: a solo podcaster who wants a conversational format but has no co-host. The AI co-host is a second, synthetic voice that asks questions while you answer.
Step 1: Define the AI co-host's personality
Name: [AI co-host name]
Personality: curious, asks deep follow-ups, occasionally light humor
Role: asks the host questions, doesn't talk too much
Speaking style: casual, natural, addresses the host by first name
Knowledge level: average — asks questions like a listener would
Catchphrases: "Wow, that's wild." / "What does that mean exactly?" / "Can you go deeper?"
Step 2: Create a separate voice clone for the AI co-host
Step 3: Tool stack
[INTRO]
Host: Hey everyone, today [AI co-host] and I are diving into...
AI co-host: Hi all, I'm [name]. Today I want to dig into [topic] from [host]'s
point of view. Let's go!
[BODY — 5-7 Q&A pairs]
AI co-host: [Broad opening question]
Host: [Answers 2-3 minutes]
AI co-host: [Deeper follow-up]
Host: [Answers with a concrete example]
... repeat 5-7 times ...
[OUTRO]
AI co-host: Thanks [host] for sharing. The biggest thing I learned was...
Host: Thanks [AI co-host]. If you have questions, drop them in the comments...
Tip: pre-write 7-10 AI co-host questions in a doc and record all the host answers in one session. Then generate the co-host audio in ElevenLabs and splice it in with Descript.
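For a scriptable alternative to splicing in Descript, FFmpeg's concat demuxer can join the clips in episode order. This is a minimal sketch. It assumes each co-host question and each host answer has been exported as its own audio file with the same codec, sample rate, and channel layout. That is required for `-c copy`; otherwise drop those two arguments so FFmpeg re-encodes. The file and function names are hypothetical.

```python
from pathlib import Path


def write_concat_list(intro: list, questions: list, answers: list, outro: list,
                      list_path: str, output: str = "episode.mp3") -> list:
    """Write an FFmpeg concat list in episode order and return the command to run.

    Order: intro clips, then each co-host question followed by its host answer,
    then outro clips.
    """
    if len(questions) != len(answers):
        raise ValueError("each co-host question needs exactly one host answer")
    order = list(intro)
    for question, answer in zip(questions, answers):
        order += [question, answer]
    order += list(outro)
    # Concat-demuxer syntax: one "file '<path>'" per line; a single quote
    # inside a path is written as '\'' (close quote, escaped quote, reopen).
    lines = ["file '{}'".format(Path(p).as_posix().replace("'", "'\\''"))
             for p in order]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_path),
            "-c", "copy", output]
```

Relative paths in the list file are resolved relative to the list file's own directory. Run the returned command with `subprocess.run(cmd, check=True)`.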
[1] Record 60-min podcast (Riverside)
v
[2] Auto-transcript (Descript / Riverside)
v
[3] Identify hooks (10-15 quotable lines)
v
[4] Cut 30-60s clips per quote (Opus Clip / Descript)
v
[5] Add captions (auto-caption)
v
[6] Distribute across 4 platforms
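Step [4] can also be done with FFmpeg once you have quote timestamps from the transcript. This sketch widens short quotes to 30 s, centered on the quote, and trims long ones to their first 60 s. It then builds an FFmpeg cut command. The 30-60 s bounds come from step [4]; the function names are hypothetical. Clips near the end of the episode are not clamped, and cropping and captions (step [5]) are not handled here.

```python
def clip_window(quote_start: float, quote_end: float,
                min_len: float = 30.0, max_len: float = 60.0) -> tuple:
    """Return (start, duration) in seconds for a clip around one quote.

    Quotes shorter than min_len are padded evenly on both sides (never before
    0 s); quotes longer than max_len keep only their first max_len seconds.
    """
    length = quote_end - quote_start
    if length < min_len:
        start = max(0.0, quote_start - (min_len - length) / 2)
        return start, min_len
    return float(quote_start), min(float(length), max_len)


def ffmpeg_cut(source: str, start: float, duration: float, output: str) -> list:
    """FFmpeg command that re-encodes one clip; -ss before -i seeks the input."""
    return ["ffmpeg", "-ss", f"{start:.2f}", "-i", source,
            "-t", f"{duration:.2f}", output]
```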
Find moments in the transcript with these traits:
Target: 10-15 hooks per 60-min podcast. Pick the 10 best.
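To go from 10-15 candidate hooks to the best 10, one option is to rate each candidate and keep the top scorers while skipping near-duplicates. The 1-5 score and the 60-second minimum gap between hooks are assumptions for illustration, not rules from this skill.

```python
from typing import NamedTuple


class Hook(NamedTuple):
    start_s: float
    text: str
    score: int  # hypothetical 1-5 rating from whoever reviews the transcript


def pick_hooks(candidates: list, keep: int = 10, min_gap_s: float = 60.0) -> list:
    """Keep the highest-scoring hooks, skipping any that start within min_gap_s
    of a hook already chosen, and return them in timeline order."""
    chosen = []
    for hook in sorted(candidates, key=lambda h: h.score, reverse=True):
        if all(abs(hook.start_s - c.start_s) >= min_gap_s for c in chosen):
            chosen.append(hook)
        if len(chosen) == keep:
            break
    return sorted(chosen, key=lambda h: h.start_s)
```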
| Platform | Format | Length | Caption | Bonus |
|---|---|---|---|---|
| TikTok | 9:16 (1080×1920) | 30-60s | Bold caption on top | Trend audio overlay (low volume) |
| Instagram Reels | 9:16 | 15-90s | Clean subtitle, sans-serif font | Strong cover image |
| YouTube Shorts | 9:16 | <60s | Auto-caption | Title with target keyword |
| LinkedIn audio | 1:1 (square video w/ audio) | 60-120s | Subtitle below | Long-form thread (carousel) |
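The length column above determines which platforms a finished clip can go to without re-editing. This sketch encodes those ranges, treating the Shorts entry "<60s" as strictly under 60 s. It also includes an FFmpeg center-crop filter for each aspect ratio. The filter strings, including the 1080x1080 size for square video, assume a landscape source and are an illustration, not part of the original table.

```python
from typing import NamedTuple


class PlatformSpec(NamedTuple):
    min_s: float
    max_s: float
    max_inclusive: bool
    vf: str  # FFmpeg -vf filter; assumes a landscape source, center crop


VERTICAL = "crop=ih*9/16:ih,scale=1080:1920"  # 9:16, 1080x1920
SQUARE = "crop=ih:ih,scale=1080:1080"         # 1:1 (size is an assumption)

# Length ranges from the platform table above.
PLATFORMS = {
    "TikTok": PlatformSpec(30, 60, True, VERTICAL),
    "Instagram Reels": PlatformSpec(15, 90, True, VERTICAL),
    "YouTube Shorts": PlatformSpec(0, 60, False, VERTICAL),  # "<60s"
    "LinkedIn": PlatformSpec(60, 120, True, SQUARE),
}


def platforms_for(clip_seconds: float) -> list:
    """Platforms whose length range in the table includes this clip."""
    fits = []
    for name, spec in PLATFORMS.items():
        under_max = (clip_seconds <= spec.max_s if spec.max_inclusive
                     else clip_seconds < spec.max_s)
        if spec.min_s <= clip_seconds and under_max:
            fits.append(name)
    return fits
```

A 45-second clip, for example, fits TikTok, Instagram Reels, and YouTube Shorts but not LinkedIn. Each platform's `vf` string can be passed to `ffmpeg -vf` when exporting that version.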
Pro tip: tailor each clip to one platform, with platform-specific captions and a cover image; this maximizes reach.
Pass: 40/50 or higher. Below 40, re-render or re-record.
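The full QA rubric is not reproduced here. This sketch therefore assumes ten criteria scored 1-5 each (50 points total) and applies the 40-point pass threshold stated above. The criterion names you pass in are up to you.

```python
PASS_THRESHOLD = 40
CRITERIA_COUNT = 10      # assumption: 10 criteria x 5 points = 50
MAX_PER_CRITERION = 5


def qa_verdict(scores: dict) -> str:
    """Sum per-criterion scores (1-5 each) and apply the 40/50 pass rule."""
    if len(scores) != CRITERIA_COUNT:
        raise ValueError(f"expected {CRITERIA_COUNT} criteria, got {len(scores)}")
    for name, value in scores.items():
        if not 1 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{name}: score {value} is outside 1-{MAX_PER_CRITERION}")
    total = sum(scores.values())
    return "pass" if total >= PASS_THRESHOLD else "re-render or re-record"
```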
| Situation | Disclosure | Placement |
|---|---|---|
| Commercial advertising | REQUIRED | Caption + end of audio ("This audio uses an AI voice clone") |
| Personal brand podcast | RECOMMENDED (for transparency) | Episode description |
| Fiction audiobook | OPTIONAL | Credits at end |
| News/educational | REQUIRED | Beginning of audio + caption |
| Internal corporate content | NOT REQUIRED | n/a |
Disclosure caption template:
This audio uses AI voice cloning technology
(ElevenLabs / Murf / [tool name]). Content was written and reviewed by [Name].
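The disclosure table and the caption template can be combined into one lookup, so each publish step gets the right level, placement, and caption text. This is a minimal sketch. The situation keys are hypothetical identifiers for the table rows, and the caption wording follows the template above.

```python
# Rows of the disclosure table above; the keys are hypothetical identifiers.
DISCLOSURE_RULES = {
    "commercial_ad": ("REQUIRED", "Caption + end of audio"),
    "personal_podcast": ("RECOMMENDED", "Episode description"),
    "fiction_audiobook": ("OPTIONAL", "Credits at end"),
    "news_educational": ("REQUIRED", "Beginning of audio + caption"),
    "internal_corporate": ("NOT REQUIRED", None),
}


def disclosure_plan(situation: str, tools: list, reviewer: str) -> dict:
    """Return the disclosure level, placement, and filled-in caption template."""
    if situation not in DISCLOSURE_RULES:
        raise ValueError(f"unknown situation: {situation!r}")
    level, placement = DISCLOSURE_RULES[situation]
    caption = None
    if level != "NOT REQUIRED":
        caption = ("This audio uses AI voice cloning technology "
                   f"({' / '.join(tools)}). "
                   f"Content was written and reviewed by {reviewer}.")
    return {"level": level, "placement": placement, "caption": caption}
```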
Full reference: references/ai-video-disclosure-global.md, covering FTC, EU AI Act, FCC, and OFCOM requirements, a 3-tier disclosure framework, and situational templates (these also apply to audio).
Before publishing audio:
Skill 25 (Global) | v1.0.0