Build and troubleshoot ElevenLabs TTS integrations in Node/Python/web apps: auth, voice/model selection, streaming vs batch generation, latency, fallbacks, secure API keys.
Use this skill when implementing or debugging ElevenLabs text-to-speech in production code. It emphasizes architecture decisions first, then API usage.
Related skills:
- Production reference architecture for ElevenLabs TTS/voice apps: TypeScript project structure, service layers, caching, API routes, queues, and monitoring.
- Voiceover generation via the ElevenLabs TTS API using curl: TTS with timestamps, voice tuning, multilingual models, and sound effects, with Python decoding. For production narration, excluding agents and transcription.
- Real-time voice AI applications and agents using the OpenAI Realtime API, Vapi, Deepgram, ElevenLabs, LiveKit, and WebRTC, with latency optimized for production voice experiences.
TTS is not just an API call; it is a UX contract across identity, latency, intelligibility, and reliability.
Before implementing, ask: which voice identity fits this context, what latency budget the UX allows, how intelligible the output must be in its playback environment, and what happens when generation fails.
Core principles:
Use this skill when requests involve: voice_id, model_id, output_format, and latency strategies.

Use a stable input model, for example: text, voiceId, modelId, outputFormat.

Return a stable output model, for example: audioUrl or a base64/blob reference, mimeType, durationMs (if known), and cacheHit.

Keep the API key server-side in an environment variable (ELEVENLABS_API_KEY).

❌ API key in frontend code. Why bad: key leakage and account abuse risk. Better: route all privileged calls through a backend or token broker.
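The stable input and output models described above can be sketched as plain TypeScript types. The names below are illustrative for this sketch, not an official ElevenLabs SDK:

```typescript
// Illustrative request/response models for a TTS service layer.
export interface TtsRequest {
  text: string;
  voiceId: string;
  modelId: string;
  outputFormat: string; // e.g. an mp3/pcm format identifier
}

export interface TtsResult {
  audioUrl?: string;    // or a base64/blob reference
  audioBase64?: string;
  mimeType: string;
  durationMs?: number;  // if known
  cacheHit: boolean;
}

// Normalize a request before hashing or caching it, so trivially
// different inputs ("hi " vs "hi") map to the same cache entry.
export function normalizeRequest(req: TtsRequest): TtsRequest {
  return { ...req, text: req.text.trim() };
}
```

Keeping these shapes stable means callers never depend on raw API responses, which makes it easy to swap providers or insert a cache later.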
❌ One-size-fits-all voice settings. Why bad: unnatural output across contexts (alerts vs narration vs dialogue). Better: maintain per-use-case presets.
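Per-use-case presets can be a simple lookup table. The field names mirror ElevenLabs' voice settings (stability, similarity boost), but the numeric values below are illustrative starting points to tune per product, not official recommendations:

```typescript
// Per-use-case voice setting presets (values are illustrative, tune per product).
interface VoicePreset {
  stability: number;       // 0..1: higher = more consistent, flatter delivery
  similarityBoost: number; // 0..1: higher = closer to the reference voice
}

const PRESETS: Record<"alert" | "narration" | "dialogue", VoicePreset> = {
  alert:     { stability: 0.9,  similarityBoost: 0.5  }, // flat and predictable
  narration: { stability: 0.6,  similarityBoost: 0.75 }, // steady but expressive
  dialogue:  { stability: 0.35, similarityBoost: 0.8  }, // more natural variation
};

function presetFor(useCase: keyof typeof PRESETS): VoicePreset {
  return PRESETS[useCase];
}
```

Centralizing presets this way also gives latency/quality experiments a single place to vary one setting at a time.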
❌ No timeout or fallback path. Why bad: blocked UX and brittle flows. Better: strict timeout + deterministic fallback behavior.
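A minimal sketch of the timeout-plus-fallback pattern; `generate` stands in for the real ElevenLabs call, and the names are illustrative rather than an official API:

```typescript
// Race any TTS call against a strict timeout.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("tts-timeout")), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't leave a stray timer running
  }
}

// On timeout or error, return a deterministic fallback so the UX never blocks.
async function speakOrFallback(
  generate: () => Promise<string>, // resolves to an audio URL
  fallbackUrl: string,             // e.g. a pre-rendered clip or silent track
  timeoutMs = 3000,
): Promise<string> {
  try {
    return await withTimeout(generate(), timeoutMs);
  } catch {
    return fallbackUrl;
  }
}
```

The fallback is deterministic by design: callers always get playable audio within the timeout budget, whatever the API does.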
❌ Re-generating identical text repeatedly. Why bad: wasted cost and latency. Better: content-hash caching and reuse.
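Content-hash caching can be as simple as hashing the tuple that determines the audio. A sketch using Node's built-in crypto module (the in-memory map is a stand-in for Redis, S3, or similar in production):

```typescript
import { createHash } from "node:crypto";

// Identical (text, voice, model, format) tuples map to the same key,
// so regenerating identical speech becomes a cache lookup.
function ttsCacheKey(
  text: string,
  voiceId: string,
  modelId: string,
  outputFormat: string,
): string {
  return createHash("sha256")
    .update(JSON.stringify([text.trim(), voiceId, modelId, outputFormat]))
    .digest("hex");
}

const cache = new Map<string, Uint8Array>();

function getOrGenerate(
  key: string,
  generate: () => Uint8Array, // stand-in for the real TTS call
): { audio: Uint8Array; cacheHit: boolean } {
  const hit = cache.get(key);
  if (hit) return { audio: hit, cacheHit: true };
  const audio = generate();
  cache.set(key, audio);
  return { audio, cacheHit: false };
}
```

Note the key includes voice, model, and format, not just text: the same sentence rendered with a different voice is different audio and must not collide.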
❌ Conflating latency and quality tuning. Why bad: random changes without measurable gains. Better: test one variable at a time with explicit success metrics.
IMPORTANT: Implementations should vary by product context.
Avoid converging on a single default voice/model for every task.
See references/api-patterns.md. Design the speech pipeline around UX and operational constraints first; the API call is the easy part, and production behavior is the real task.
Codex can do extraordinary work in this domain. Use these principles to unlock better decisions, adapt to context, and ship robust voice experiences.