Install via:

```bash
npx claudepluginhub galbaz1/video-research-mcp
```

This skill uses the workspace's default tool permissions.
Generate, tune, and mix voice-over audio using the ElevenLabs Text-to-Speech API.
Critical: Use direct API calls (curl), NOT the ElevenLabs MCP tools. The MCP `Text_To_Speech` tool returns 404 due to routing issues; the direct API is reliable and battle-tested.
```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}/with-timestamps" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
      "stability": 0.75,
      "similarity_boost": 0.80,
      "style": 0.40,
      "use_speaker_boost": true
    }
  }' \
  --output /tmp/tts-response.json
```
```python
import base64
import json

# The /with-timestamps endpoint returns JSON, not raw audio:
# a base64-encoded MP3 plus per-character alignment data.
with open('/tmp/tts-response.json', 'r') as f:
    data = json.load(f)

# Decode the audio payload to an MP3 file.
audio_bytes = base64.b64decode(data['audio_base64'])
with open('output.mp3', 'wb') as f:
    f.write(audio_bytes)

# The last character's end time is the clip duration.
ends = data.get('alignment', {}).get('character_end_times_seconds', [])
print(f'Duration: {ends[-1]:.2f}s' if ends else 'No timestamps')
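The per-character alignment can be collapsed into word-level timings, useful for subtitles or beat-matching. A minimal sketch, assuming the alignment object also carries parallel `characters` and `character_start_times_seconds` arrays alongside the `character_end_times_seconds` used above:

```python
def word_timestamps(alignment):
    """Collapse per-character alignment into (word, start, end) tuples.

    Assumes parallel arrays: 'characters',
    'character_start_times_seconds', 'character_end_times_seconds'.
    """
    chars = alignment['characters']
    starts = alignment['character_start_times_seconds']
    ends = alignment['character_end_times_seconds']
    words, word, w_start, prev_end = [], '', None, None
    for ch, s, e in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the current word, if any.
            if word:
                words.append((word, w_start, prev_end))
                word, w_start = '', None
            continue
        if not word:
            w_start = s  # first character of a new word
        word += ch
        prev_end = e
    if word:  # flush trailing word
        words.append((word, w_start, prev_end))
    return words
```

For a clip aligned as `"hi there"` this yields one tuple per word, each spanning its first character's start to its last character's end.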
Without timestamps, the plain endpoint returns the MP3 directly:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "text": "...", "model_id": "eleven_multilingual_v2", "voice_settings": {...} }' \
  --output output.mp3
```
Sound effects use the sound-generation endpoint:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/sound-generation" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "text": "short sharp underwater splash blip", "duration_seconds": 1.0, "prompt_influence": 0.8 }' \
  --output sfx.mp3
```
| Model | Use Case | Speed | Quality |
|---|---|---|---|
| `eleven_multilingual_v2` | Production: Dutch, English, mixed language | Slow | Highest |
| `eleven_flash_v2_5` | Quick drafts, iteration | Fast | Good |
| `eleven_turbo_v2_5` | Real-time, low latency | Fastest | Acceptable |

Always use `eleven_multilingual_v2` for final production; use flash/turbo for iteration only.
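The guidance above can be captured in a small helper. The model IDs come from the table; the draft/final/realtime flags are this sketch's own convention:

```python
def pick_model(final: bool, realtime: bool = False) -> str:
    """Choose an ElevenLabs model ID per the table above.

    Final renders always get eleven_multilingual_v2; real-time use
    cases get turbo; everything else iterates on flash.
    """
    if final:
        return "eleven_multilingual_v2"
    if realtime:
        return "eleven_turbo_v2_5"
    return "eleven_flash_v2_5"
```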
| Parameter | Range | Effect | Production Range |
|---|---|---|---|
| `stability` | 0–1 | Low = expressive, high = consistent | 0.55–0.75 |
| `similarity_boost` | 0–1 | Voice matching fidelity | 0.80–0.90 |
| `style` | 0–1 | Emotional expressiveness | 0.30–0.70 |
| `use_speaker_boost` | bool | Clarity enhancement | `true` for narration |
| `speed` | 0.5–2.0 | Top-level param, NOT in `voice_settings`; only works with flash/turbo | n/a |
Neutral narration (clean, informational):

```json
{ "stability": 0.75, "similarity_boost": 0.80, "style": 0.40 }
```

Cinematic narration (authoritative, confident):

```json
{ "stability": 0.55, "similarity_boost": 0.85, "style": 0.70, "use_speaker_boost": true }
```

Warm/conversational:

```json
{ "stability": 0.60, "similarity_boost": 0.75, "style": 0.55, "use_speaker_boost": true }
```
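A minimal sketch of turning these presets into a full request body. The preset values come from above; the `PRESETS` dict, preset names, and `tts_payload` helper are this sketch's own:

```python
PRESETS = {
    "neutral": {"stability": 0.75, "similarity_boost": 0.80, "style": 0.40},
    "cinematic": {"stability": 0.55, "similarity_boost": 0.85, "style": 0.70,
                  "use_speaker_boost": True},
    "warm": {"stability": 0.60, "similarity_boost": 0.75, "style": 0.55,
             "use_speaker_boost": True},
}

def tts_payload(text: str, preset: str = "neutral",
                model_id: str = "eleven_multilingual_v2") -> dict:
    """Build the JSON body for the text-to-speech endpoint."""
    settings = dict(PRESETS[preset])
    # Guard the 0-1 parameters against typos before sending.
    for key, value in settings.items():
        if key != "use_speaker_boost" and not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} out of range: {value}")
    return {"text": text, "model_id": model_id, "voice_settings": settings}
```

The resulting dict serializes directly with `json.dumps` as the `-d` body of the curl calls above.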
Measure a clip's duration:

```bash
ffprobe -i clip.mp3 -show_entries format=duration -v quiet -of csv="p=0"
```

Speed up a clip with FFmpeg (max 1.35x sounds natural):

```bash
ffmpeg -y -i clip.mp3 -filter:a "atempo=1.2" -codec:a libmp3lame -b:a 192k clip-fast.mp3
```

Tips:

- Lower `stability` (more expressive), raise `style` (more emotional).
- The `speed` param is silently ignored by `eleven_multilingual_v2`; use FFmpeg `atempo` instead.
- `atempo` above 1.35x sounds unnatural for narration.
- The `eleven_multilingual_v2` model auto-detects language from the text.

| Variable | Required | Notes |
|---|---|---|
| `ELEVENLABS_API_KEY` | Yes | Set in shell or `~/.config/video-research-mcp/.env` |
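The duration and atempo notes combine into a small helper that computes the tempo factor needed to fit a clip into a target duration, clamped to the 1.35x natural-sounding ceiling. The `atempo_for` name and the choice never to slow clips down are this sketch's own:

```python
MAX_ATEMPO = 1.35  # above this, sped-up narration sounds unnatural

def atempo_for(duration_s: float, target_s: float) -> float:
    """Tempo factor to fit a clip of duration_s into target_s seconds.

    atempo > 1.0 speeds audio up; the result is clamped to MAX_ATEMPO,
    and clips shorter than the target are left untouched (factor 1.0).
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    factor = duration_s / target_s
    return max(1.0, min(factor, MAX_ATEMPO))
```

Feed the result into the filter string, e.g. `f'-filter:a "atempo={atempo_for(d, t)}"'`, with `d` taken from the ffprobe call above.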