Generates realistic AI text-to-speech audio using Google Gemini TTS, ElevenLabs, and OpenAI TTS. Supports multi-speaker dialogues, podcasts, audiobooks, and voiceovers.
Install:
npx claudepluginhub michaelboeding/skills --plugin skills
Slash command:
/skills:voice-generation
Skill listing summary:
Generate realistic speech using AI (Google Gemini TTS, ElevenLabs, OpenAI TTS).
At least one API key is required:
- GOOGLE_API_KEY - For Google Gemini TTS (same key as video/image/music) ✅
- ELEVENLABS_API_KEY - For ElevenLabs high-quality voice synthesis
- OPENAI_API_KEY - For OpenAI TTS voices

Gemini TTS uses the same GOOGLE_API_KEY as video/image/music ✅

Parse the user's voice request for:
Choose based on requirements:
| Use Case | Recommended API | Reason |
|---|---|---|
| Default / Same key as video | Gemini TTS | Same GOOGLE_API_KEY ✅ |
| Multi-speaker dialogue | Gemini TTS | Up to 2 speakers built-in |
| Style/accent control | Gemini TTS | Natural language prompts |
| Voice cloning | ElevenLabs | Only API with cloning |
| 100+ voice options | ElevenLabs | Widest selection |
| Audiobook/podcast | ElevenLabs or Gemini | Both excellent for long content |
| Quick narration | OpenAI TTS | Fast, reliable |
| Budget-conscious | OpenAI TTS | Lower cost |
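The selection table can be read as a small decision rule. The sketch below is one possible encoding, not the skill's actual logic: `choose_api`, its parameters, and the fallback order are hypothetical, and only the environment variable names come from this document.

```python
import os

# Provider name -> environment variable named in this document.
KEY_FOR_API = {
    "gemini": "GOOGLE_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def choose_api(needs_cloning=False, speakers=1, wants_style=False,
               budget=False, env=None):
    """Return the first preferred provider whose API key is set, else None.

    Hypothetical helper: the preference order is an assumption read off
    the table above.
    """
    env = os.environ if env is None else env
    if needs_cloning or speakers > 2:
        # Cloning is ElevenLabs-only; more than 2 speakers needs separate calls.
        preferred = ["elevenlabs"]
    elif speakers == 2 or wants_style:
        # Built-in dialogue and natural-language style prompts.
        preferred = ["gemini"]
    elif budget:
        preferred = ["openai", "gemini", "elevenlabs"]
    else:
        # Table default: Gemini first because it shares GOOGLE_API_KEY.
        preferred = ["gemini", "elevenlabs", "openai"]
    for api in preferred:
        if env.get(KEY_FOR_API[api]):
            return api
    return None
```

A `None` result means no suitable key is configured, which is the point at which the user should be told which key is missing.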
Optimize text for speech:
Example transformation:
Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/:
For Google Gemini TTS (single speaker):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
--text "Welcome to our podcast!" \
--voice "Charon"
Gemini TTS with style direction:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
--text "Have a wonderful day!" \
--voice "Puck" \
--style "Say cheerfully with a British accent:"
Gemini TTS multi-speaker (dialogue):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
--multi \
--speaker "Host:Charon" \
--speaker "Guest:Aoede" \
--text "Host: Welcome to the show!
Guest: Thanks for having me!"
For ElevenLabs:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/elevenlabs.py \
--text "Your text here" \
--voice "Rachel" \
--model "eleven_multilingual_v2"
For OpenAI TTS:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/openai_tts.py \
--text "Your text here" \
--voice "nova" \
--model "tts-1-hd"
List Gemini voices:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py --list-voices
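When the Gemini commands above are assembled from code rather than typed by hand, building an argument list avoids shell-quoting problems with multi-line dialogue text. The following is a minimal sketch: `build_gemini_command` is a hypothetical helper, it uses only the flags shown above, and whether `--style` may be combined with `--multi` is an assumption not confirmed by this document.

```python
import os


def build_gemini_command(text, voice=None, style=None, speakers=None,
                         plugin_root=None):
    """Return an argv list for gemini_tts.py (hypothetical helper).

    speakers is an optional dict such as {"Host": "Charon", "Guest": "Aoede"}.
    """
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", "")
    script = f"{root}/skills/voice-generation/scripts/gemini_tts.py"
    argv = ["python3", script]
    if speakers:
        if len(speakers) > 2:
            # Limit stated in this document for Gemini multi-speaker mode.
            raise ValueError("Gemini TTS supports at most 2 speakers")
        argv.append("--multi")
        for name, speaker_voice in speakers.items():
            argv += ["--speaker", f"{name}:{speaker_voice}"]
    elif voice:
        argv += ["--voice", voice]
    if style:
        argv += ["--style", style]
    argv += ["--text", text]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; because no shell is involved, newlines and quotes in the text need no escaping.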
Missing API key: Inform the user which key is needed:
- Gemini: same GOOGLE_API_KEY as video/image - https://aistudio.google.com/apikey
- Gemini TTS requires the google-genai package: pip install google-genai
Text too long: Split into chunks and concatenate, or suggest shorter text.
Rate limit: Suggest waiting or trying a different API.
Unsupported language: Suggest an alternative API that supports the language.
Multi-speaker limit: Gemini TTS supports at most 2 speakers. For more speakers, use ElevenLabs with multiple calls.
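For the "text too long" case, one way to split is at sentence boundaries so that each chunk stays under the provider's character limit. The sketch below is an illustration under stated assumptions: `split_for_tts` is a hypothetical helper, and the 4,096 default is taken from the OpenAI TTS limit in the comparison table later in this document (5,000 would be the ElevenLabs figure).

```python
import re


def split_for_tts(text, max_chars=4096):
    """Split text into chunks of at most max_chars characters.

    Breaks after sentence-ending punctuation where possible and
    hard-wraps any single sentence that exceeds the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single oversized sentence is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized with a separate call and the audio files concatenated in order; the concatenation step depends on the output format and is not shown.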
Gemini TTS voices:

| Style | Voices | Best For |
|---|---|---|
| Bright/Upbeat | Zephyr, Puck, Aoede, Laomedeia | Marketing, cheerful content |
| Firm/Informative | Charon, Kore, Orus, Rasalgethi | News, tutorials, professional |
| Soft/Warm | Achernar, Sulafat, Vindemiatrix | Meditation, gentle narration |
| Smooth | Algieba, Despina, Callirrhoe | Audiobooks, storytelling |
| Clear | Erinome, Iapetus, Pulcherrima | Instructions, clarity |
| Character | Fenrir (excitable), Enceladus (breathy), Algenib (gravelly), Gacrux (mature) | Character voices, drama |
| Friendly | Achird, Zubenelgenubi (casual) | Casual, conversational |
Gemini TTS Style Tips:
- --style "Say angrily:" or --style "Whisper mysteriously:"
- --style "Speak with a British accent from London:"
- --style "Speak slowly and deliberately:"
- --style "Say excitedly with a Southern US accent:"

OpenAI TTS voices:

| Voice | Description | Best For |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Podcasts, casual |
| fable | Expressive, British | Storytelling |
| onyx | Deep, authoritative | Narration, professional |
| nova | Friendly, upbeat | Marketing, tutorials |
| shimmer | Soft, gentle | Meditation, ASMR |

ElevenLabs voices:

| Voice | Description | Best For |
|---|---|---|
| Rachel | Young female, American | Narration, audiobooks |
| Domi | Young female, energetic | Marketing, ads |
| Bella | Young female, soft | Storytelling |
| Antoni | Young male, well-rounded | Narration |
| Josh | Young male, deep | Audiobooks |
| Arnold | Mature male, authoritative | Documentary |
| Adam | Middle-aged male, deep | Narration |
| Sam | Young male, raspy | Character voices |
| Feature | Gemini TTS | ElevenLabs | OpenAI TTS |
|---|---|---|---|
| API Key | GOOGLE_API_KEY ✅ | ELEVENLABS_API_KEY | OPENAI_API_KEY |
| Voice quality | Excellent | Excellent | Very good |
| Voice variety | 30 voices | 100+ voices | 6 voices |
| Multi-speaker | ✅ Up to 2 | ❌ No | ❌ No |
| Style control | ✅ Natural language | Limited | ❌ No |
| Voice cloning | ❌ No | ✅ Yes | ❌ No |
| Languages | 24 | 29+ | 50+ |
| Speed control | Via prompts | Yes | Yes (0.25-4x) |
| Max length | 32k tokens | 5,000 chars | 4,096 chars |
| Output format | WAV (24kHz) | MP3, WAV | MP3, Opus, AAC, FLAC |
| Same key as video/image | ✅ Yes | ❌ No | ❌ No |
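
The table lists Gemini TTS output as 24 kHz WAV. If the audio arrives as raw PCM samples, it can be wrapped in a WAV container with the standard library. This is a sketch under assumptions: `pcm_to_wav_bytes` is a hypothetical helper, and 16-bit mono PCM is assumed here rather than stated in this document.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and return the file bytes.

    Defaults assume 24 kHz, mono, 16-bit samples (sample_width is in bytes).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Writing the returned bytes to a `.wav` file yields audio that standard players can open; converting to MP3 or another format would need an external tool.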