Generates voiceover audio via ElevenLabs TTS API with direct curl calls, voice tuning, and sound effects. For narration, audio ducking, and multilingual production — not voice AI agents or transcription.
Install: `npx claudepluginhub galbaz1/video-research-mcp`
Slash command: `/gr:tts-production`
Generate, tune, and mix voice-over audio using the ElevenLabs Text-to-Speech API.
Critical: Use direct API calls (curl), NOT the ElevenLabs MCP tools. The MCP `Text_To_Speech` tool returns 404 errors because of routing issues, while direct API calls work reliably.
Generate speech with character-level timestamps (the response is JSON containing base64-encoded audio):

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}/with-timestamps" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
      "stability": 0.75,
      "similarity_boost": 0.80,
      "style": 0.40,
      "use_speaker_boost": true
    }
  }' \
  --output /tmp/tts-response.json
```
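Decode the audio and read the clip duration:

```python
import base64
import json

# Load the JSON saved by the curl call above
with open('/tmp/tts-response.json', 'r') as f:
    data = json.load(f)

# The audio arrives base64-encoded; write it out as an MP3 file
audio_bytes = base64.b64decode(data['audio_base64'])
with open('output.mp3', 'wb') as f:
    f.write(audio_bytes)

# The last character's end time approximates the clip duration
ends = (data.get('alignment') or {}).get('character_end_times_seconds', [])
print(f'Duration: {ends[-1]:.2f}s' if ends else 'No timestamps')
```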
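For scripts that cannot shell out to curl, the same timestamped request can be made with Python's standard library. This is a minimal sketch and not part of the original skill. `load_api_key`, `build_tts_payload`, `decode_timestamped_response`, and `synthesize_with_timestamps` are hypothetical helper names. The `.env` fallback assumes the file named in the environment-variable table holds plain `KEY=VALUE` lines.

```python
from __future__ import annotations

import base64
import json
import os
import urllib.request
from pathlib import Path

API_BASE = "https://api.elevenlabs.io/v1"
# Fallback file from the environment-variable table; parsing assumes plain KEY=VALUE lines.
ENV_FILE = Path.home() / ".config" / "video-research-mcp" / ".env"


def load_api_key(env_file: Path = ENV_FILE) -> str:
    """Return ELEVENLABS_API_KEY from the shell, falling back to the .env file."""
    key = os.environ.get("ELEVENLABS_API_KEY")
    if key:
        return key
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "ELEVENLABS_API_KEY":
                return value.strip().strip("\"'")  # drop surrounding quotes
    raise RuntimeError("ELEVENLABS_API_KEY is not set")


def build_tts_payload(text: str, voice_settings: dict,
                      model_id: str = "eleven_multilingual_v2") -> dict:
    """Same JSON body as the curl example."""
    return {"text": text, "model_id": model_id, "voice_settings": voice_settings}


def decode_timestamped_response(data: dict) -> tuple[bytes, float | None]:
    """Decode audio_base64; the last character end time approximates the duration."""
    audio = base64.b64decode(data["audio_base64"])
    ends = (data.get("alignment") or {}).get("character_end_times_seconds") or []
    return audio, (ends[-1] if ends else None)


def synthesize_with_timestamps(text: str, voice_id: str, voice_settings: dict,
                               out_path: str = "output.mp3") -> float | None:
    """POST to the with-timestamps endpoint, write the MP3, return its duration."""
    request = urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}/with-timestamps",
        data=json.dumps(build_tts_payload(text, voice_settings)).encode("utf-8"),
        headers={"xi-api-key": load_api_key(), "Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx instead of saving an error body
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    audio, duration = decode_timestamped_response(data)
    Path(out_path).write_bytes(audio)
    return duration
```

For example, `synthesize_with_timestamps("Your text here", voice_id, {"stability": 0.75, "similarity_boost": 0.80, "style": 0.40, "use_speaker_boost": True})` writes `output.mp3` and returns the duration. It returns `None` when no alignment comes back. Because `urlopen` raises on a non-2xx status, a bad key or voice ID fails loudly rather than leaving an error body on disk.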
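Without timestamps, the endpoint returns the MP3 directly:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Your text here", "model_id": "eleven_multilingual_v2", "voice_settings": { "stability": 0.75, "similarity_boost": 0.80, "style": 0.40 } }' \
  --output output.mp3
```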
Generate a sound effect:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/sound-generation" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "text": "short sharp underwater splash blip", "duration_seconds": 1.0, "prompt_influence": 0.8 }' \
  --output sfx.mp3
```
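The plain text-to-speech endpoint and `/v1/sound-generation` both return raw audio bytes, so one small helper can cover both calls. This is a sketch under stated assumptions. `sfx_payload` and `post_for_audio` are hypothetical names, and the 0.8 default for `prompt_influence` mirrors the curl example. Omitting `duration_seconds` assumes the API then chooses a length on its own.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def sfx_payload(text: str, duration_seconds: float | None = None,
                prompt_influence: float = 0.8) -> dict:
    """Build the /sound-generation body; omit duration_seconds to let the API choose."""
    if not 0.0 <= prompt_influence <= 1.0:
        raise ValueError("prompt_influence must be between 0 and 1")
    payload = {"text": text, "prompt_influence": prompt_influence}
    if duration_seconds is not None:
        if duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        payload["duration_seconds"] = duration_seconds
    return payload


def post_for_audio(path: str, payload: dict, api_key: str, out_path: str) -> int:
    """POST JSON to an endpoint that returns raw audio, save it, return the byte count."""
    request = urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

`post_for_audio("/sound-generation", sfx_payload("short sharp underwater splash blip", 1.0), api_key, "sfx.mp3")` mirrors the curl call above. Passing `f"/text-to-speech/{voice_id}"` with a text-to-speech body reproduces the plain MP3 request.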
| Model | Use Case | Speed | Quality |
|---|---|---|---|
| `eleven_multilingual_v2` | Production: Dutch, English, mixed-language | Slow | Highest |
| `eleven_flash_v2_5` | Quick drafts, iteration | Fastest | Good |
| `eleven_turbo_v2_5` | Real-time, low latency | Fast | Good |
Always use `eleven_multilingual_v2` for final production; use flash/turbo only for iteration.
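One way to enforce this rule in a pipeline is to choose the model from the production stage instead of hard-coding it in each call. The stage names and `pick_model` are hypothetical. The mapping only restates the table above.

```python
# Stage names are illustrative; the model IDs come from the table.
MODELS_BY_STAGE = {
    "final": "eleven_multilingual_v2",  # production renders only
    "draft": "eleven_flash_v2_5",       # quick drafts and iteration
    "realtime": "eleven_turbo_v2_5",    # real-time, low-latency use
}


def pick_model(stage: str) -> str:
    """Map a production stage to a model ID, failing loudly on unknown stages."""
    try:
        return MODELS_BY_STAGE[stage]
    except KeyError:
        raise ValueError(
            f"unknown stage {stage!r}; expected one of {sorted(MODELS_BY_STAGE)}"
        ) from None
```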
| Parameter | Range | Effect | Production Range |
|---|---|---|---|
| `stability` | 0–1 | Low = expressive, high = consistent | 0.55–0.75 |
| `similarity_boost` | 0–1 | Voice-matching fidelity | 0.80–0.90 |
| `style` | 0–1 | Emotional expressiveness | 0.30–0.70 |
| `use_speaker_boost` | bool | Clarity enhancement | `true` for narration |
| `speed` | 0.5–2.0 | Top-level parameter, NOT inside `voice_settings`; only works with flash/turbo | Use FFmpeg `atempo` instead |
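The ranges above can be turned into a pre-flight check that rejects impossible values and flags settings outside the recommended production band. This is a minimal sketch. `check_voice_settings` is a hypothetical helper. Treating the production ranges as warnings rather than errors is a design choice, not an API rule.

```python
from __future__ import annotations

# Hard limits and recommended production bands, copied from the parameter table.
HARD_RANGES = {"stability": (0.0, 1.0), "similarity_boost": (0.0, 1.0), "style": (0.0, 1.0)}
PRODUCTION_RANGES = {"stability": (0.55, 0.75), "similarity_boost": (0.80, 0.90), "style": (0.30, 0.70)}


def check_voice_settings(settings: dict) -> list[str]:
    """Raise ValueError on invalid values; return warnings for off-band values."""
    warnings = []
    for name, (low, high) in HARD_RANGES.items():
        if name not in settings:
            continue
        value = settings[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        band_low, band_high = PRODUCTION_RANGES[name]
        if not band_low <= value <= band_high:
            warnings.append(f"{name}={value} is outside the production range {band_low}-{band_high}")
    boost = settings.get("use_speaker_boost")
    if boost is not None and not isinstance(boost, bool):
        raise ValueError("use_speaker_boost must be true or false")
    return warnings
```

Run on the warm/conversational preset, this returns one warning: its `similarity_boost` of 0.75 sits below the 0.80–0.90 band.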
Neutral narration (clean, informational):

```json
{ "stability": 0.75, "similarity_boost": 0.80, "style": 0.40 }
```

Cinematic narration (authoritative, confident):

```json
{ "stability": 0.55, "similarity_boost": 0.85, "style": 0.70, "use_speaker_boost": true }
```

Warm/conversational:

```json
{ "stability": 0.60, "similarity_boost": 0.75, "style": 0.55, "use_speaker_boost": true }
```
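To keep the presets in one place, they can live in a lookup table and be merged into the request body that curl sends with `-d`. This is a sketch. `PRESETS`, `voice_settings_for`, and `request_body` are hypothetical names, and the values are copied from the presets above.

```python
import json

# Values copied from the presets above.
PRESETS = {
    "neutral": {"stability": 0.75, "similarity_boost": 0.80, "style": 0.40},
    "cinematic": {"stability": 0.55, "similarity_boost": 0.85, "style": 0.70, "use_speaker_boost": True},
    "warm": {"stability": 0.60, "similarity_boost": 0.75, "style": 0.55, "use_speaker_boost": True},
}


def voice_settings_for(preset: str, **overrides) -> dict:
    """Copy a preset and apply per-call overrides without mutating the table."""
    settings = dict(PRESETS[preset])
    settings.update(overrides)
    return settings


def request_body(text: str, preset: str, model_id: str = "eleven_multilingual_v2", **overrides) -> str:
    """JSON string suitable for curl's -d flag."""
    return json.dumps({
        "text": text,
        "model_id": model_id,
        "voice_settings": voice_settings_for(preset, **overrides),
    })
```

`request_body("Your text here", "cinematic")` produces a string that can be passed to the curl examples with `-d`. Keyword overrides such as `style=0.6` tweak one parameter without editing the shared preset.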
Measure a clip's duration:

```bash
ffprobe -i clip.mp3 -show_entries format=duration -v quiet -of csv="p=0"
```

Speed up a clip with FFmpeg (max 1.35x sounds natural):

```bash
ffmpeg -y -i clip.mp3 -filter:a "atempo=1.2" -codec:a libmp3lame -b:a 192k clip-fast.mp3
```

- For a more expressive delivery, lower `stability` (more expressive) and raise `style` (more emotional).
- The `speed` parameter is silently ignored by `eleven_multilingual_v2`; use FFmpeg `atempo` instead.
- `atempo` above 1.35x sounds unnatural for narration.
- `eleven_multilingual_v2` auto-detects the language from the text.

| Variable | Required | Notes |
|---|---|---|
| `ELEVENLABS_API_KEY` | Yes | Set in the shell or in `~/.config/video-research-mcp/.env` |
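When a narration clip must fit a fixed slot, the `atempo` factor is the current duration divided by the target duration. The sketch below wraps the `ffprobe` and `ffmpeg` commands shown earlier and refuses factors above the 1.35x ceiling. `probe_duration`, `atempo_for`, and `atempo_command` are hypothetical helpers, and both tools are assumed to be on `PATH`.

```python
from __future__ import annotations

import subprocess

MAX_NATURAL_ATEMPO = 1.35  # above this, narration sounds unnatural (per the notes above)
MIN_ATEMPO = 0.5           # lower bound of FFmpeg's atempo filter


def probe_duration(path: str) -> float:
    """Run the ffprobe command above and parse the duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-i", path, "-show_entries", "format=duration",
         "-v", "quiet", "-of", "csv=p=0"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def atempo_for(current_seconds: float, target_seconds: float) -> float:
    """Tempo factor that makes a clip of current_seconds last target_seconds."""
    if current_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    factor = current_seconds / target_seconds
    if factor > MAX_NATURAL_ATEMPO:
        raise ValueError(f"needs {factor:.2f}x, above the {MAX_NATURAL_ATEMPO}x ceiling")
    if factor < MIN_ATEMPO:
        raise ValueError(f"needs {factor:.2f}x, below atempo's {MIN_ATEMPO}x minimum")
    return factor


def atempo_command(src: str, dst: str, factor: float) -> list[str]:
    """The ffmpeg re-encode command above, with the factor filled in."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={factor:.3f}",
            "-codec:a", "libmp3lame", "-b:a", "192k", dst]
```

For example, a 12.0 s clip that must fit a 10.0 s slot needs `atempo_for(12.0, 10.0)`, which is about 1.2x. Then `subprocess.run(atempo_command("clip.mp3", "clip-fast.mp3", factor), check=True)` writes the sped-up file.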