🚨 EXECUTION NOTICE FOR CLAUDE
When you invoke this command via SlashCommand, the system returns THESE INSTRUCTIONS below.
YOU are the executor. This is NOT an autonomous subprocess.
- ✅ The phases below are YOUR execution checklist
- ✅ YOU must run each phase immediately using tools (Bash, Read, Write, Edit, TodoWrite)
- ✅ Complete ALL phases before considering this command done
- ❌ DON'T wait for "the command to complete" - YOU complete it by executing the phases
- ❌ DON'T treat this as status output - it IS your instruction set
Immediately after SlashCommand returns, start executing Phase 0, then Phase 1, etc.
See @CLAUDE.md section "SlashCommand Execution - YOU Are The Executor" for detailed explanation.
Available Skills
This command has access to the following skills from the elevenlabs plugin:
- api-authentication: API authentication patterns, SDK installation scripts, environment variable management, and connection testing for ElevenLabs. Use when setting up ElevenLabs authentication, installing ElevenLabs SDK, configuring API keys, testing ElevenLabs connection, or when user mentions ElevenLabs authentication, xi-api-key, ELEVENLABS_API_KEY, or ElevenLabs setup.
- mcp-integration
- production-deployment: Production deployment patterns for ElevenLabs API including rate limiting, error handling, monitoring, and testing. Use when deploying to production, implementing rate limiting, setting up monitoring, handling errors, testing concurrency, or when user mentions production deployment, rate limits, error handling, monitoring, ElevenLabs production.
- stt-integration: ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription.
- tts-integration
- vercel-ai-patterns
- voice-processing: Voice cloning workflows, voice library management, audio format conversion, and voice settings. Use when cloning voices, managing voice libraries, processing audio for voice creation, configuring voice settings, or when user mentions voice cloning, instant cloning, professional cloning, voice library, audio processing, voice settings, or ElevenLabs voices.
To use a skill:
!{skill skill-name}
Use skills when you need:
- Domain-specific templates and examples
- Validation scripts and automation
- Best practices and patterns
- Configuration generators
Skills provide pre-built resources to accelerate your work.
Security Requirements
CRITICAL: All generated files must follow security rules:
@docs/security/SECURITY-RULES.md
Key requirements:
- Never hardcode API keys or secrets
- Use placeholders: your_service_key_here
- Protect .env files with .gitignore
- Create .env.example with placeholders only
- Document key acquisition for users
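A minimal sketch of environment-only key loading (the helper name getElevenLabsApiKey is illustrative, not part of the SDK):

```typescript
// Hypothetical helper: read the key from the environment and fail loudly
// if it is missing, so it is never hardcoded in source files.
export function getElevenLabsApiKey(): string {
  const key = process.env.ELEVENLABS_API_KEY;
  if (!key) {
    throw new Error(
      "ELEVENLABS_API_KEY is not set. Copy .env.example to .env and add your key."
    );
  }
  return key;
}
```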
Arguments: $ARGUMENTS
Goal: Add comprehensive TTS capabilities to the project with support for multiple ElevenLabs voice models, streaming audio, voice selection, and audio playback controls.
Core Principles:
- Detect framework and adapt implementation (Next.js, React, Python, Node.js)
- Support all 4 voice models (Eleven v3, Flash v2.5, Turbo v2.5, Multilingual v2)
- Implement both standard and streaming TTS
- Create reusable components/functions
- Include voice selection interface
Phase 1: Discovery
Goal: Understand project structure and existing setup
Actions:
- Check if ElevenLabs SDK is already installed:
- TypeScript: !{bash npm list @elevenlabs/elevenlabs-js 2>/dev/null}
- Python: !{bash pip show elevenlabs 2>/dev/null}
- Detect framework (see the sketch after this phase's actions):
- Next.js: @package.json (check for "next")
- Python: @requirements.txt or @pyproject.toml
- React: @package.json (check for "react")
- Check if authentication is configured (@.env or @.env.local)
- Parse $ARGUMENTS for specific options (model preference, streaming, etc.)
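One way to implement the detection step is a plain Node/TypeScript sketch (the function name and return labels are illustrative):

```typescript
// Hypothetical detection sketch: pick a framework target from project files.
import { existsSync, readFileSync } from "node:fs";

type Framework = "nextjs" | "react" | "python" | "node";

function detectFramework(): Framework {
  if (existsSync("requirements.txt") || existsSync("pyproject.toml")) return "python";
  if (!existsSync("package.json")) return "node";
  const pkg = JSON.parse(readFileSync("package.json", "utf8"));
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  if (deps.next) return "nextjs"; // Next.js projects also depend on react, so check first
  if (deps.react) return "react";
  return "node";
}
```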
Phase 2: Requirements Gathering
Goal: Clarify TTS implementation needs
Actions:
- If $ARGUMENTS doesn't specify preferences, use AskUserQuestion to ask:
- Which voice model to prioritize? (v3 Alpha for quality, Flash v2.5 for speed, Turbo v2.5 for balance, Multilingual v2 for stability)
- Do you need streaming audio support? (real-time vs complete audio)
- Should we include voice selection UI? (dropdown/list of available voices)
- Where should TTS functionality be added? (new page, existing component, API route, etc.)
Phase 3: Planning
Goal: Design the TTS implementation approach
Actions:
- Based on detected framework, plan:
- Component structure (React components, Python functions, API routes)
- File locations following project conventions
- Voice model configuration strategy
- Audio playback implementation
- Error handling approach
- Present plan to user for confirmation
Phase 4: Implementation
Goal: Build TTS integration with specialized agent
Actions:
Launch the elevenlabs-tts-integrator agent to implement text-to-speech capabilities.
Provide the agent with a detailed prompt including:
- Context: Detected framework, existing project structure, SDK installation status
- Target: $ARGUMENTS (any specific requirements)
- Requirements:
- Create TTS function/component with support for all 4 models:
- Eleven v3 Alpha (eleven_v3) - highest quality, 70+ languages
- Eleven Flash v2.5 (eleven_flash_v2_5) - ultra-low latency ~75ms, 32 languages
- Eleven Turbo v2.5 (eleven_turbo_v2_5) - balanced speed/quality ~250ms
- Eleven Multilingual v2 (eleven_multilingual_v2) - stable, 29 languages
- Implement standard TTS (complete audio generation)
- Implement streaming TTS (real-time audio streaming) if requested
- Add voice selection interface (fetch from /v1/voices API)
- Create audio playback controls
- Include error handling and loading states
- Follow framework-specific patterns (React hooks, FastAPI routes, etc.)
- Add proper TypeScript types or Python type hints
- Use progressive documentation loading (fetch ElevenLabs TTS docs as needed)
- Expected output:
- TTS component/function created
- Voice selection UI (if requested)
- Audio playback implementation
- Example usage code (see the TypeScript sketch after this phase)
- Configuration for voice model selection
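To ground the agent prompt, here is a minimal TypeScript sketch of the core pieces. It assumes the @elevenlabs/elevenlabs-js SDK; helper names and the output format are illustrative, and method names should be verified against the installed SDK version:

```typescript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });

// The four model IDs this command targets.
export const TTS_MODELS = {
  v3: "eleven_v3",                        // highest quality, 70+ languages (alpha)
  flash: "eleven_flash_v2_5",             // ultra-low latency (~75 ms), 32 languages
  turbo: "eleven_turbo_v2_5",             // balanced speed/quality (~250 ms)
  multilingual: "eleven_multilingual_v2", // stable, 29 languages
} as const;

// Standard TTS: generate the complete audio for a piece of text.
export async function synthesize(text: string, voiceId: string, modelId: string) {
  return client.textToSpeech.convert(voiceId, {
    text,
    modelId,
    outputFormat: "mp3_44100_128",
  });
}

// Streaming TTS: emit audio chunks as they are generated.
export async function synthesizeStream(text: string, voiceId: string, modelId: string) {
  return client.textToSpeech.stream(voiceId, { text, modelId });
}

// Voice selection: list available voices via the documented /v1/voices endpoint.
export async function listVoices() {
  const res = await fetch("https://api.elevenlabs.io/v1/voices", {
    headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! },
  });
  if (!res.ok) throw new Error(`Failed to list voices: ${res.status}`);
  const { voices } = await res.json();
  return voices as Array<{ voice_id: string; name: string }>;
}

// Browser playback (sketch): buffer the byte stream into a Blob and play it.
export async function playAudio(stream: ReadableStream<Uint8Array>) {
  const chunks: BlobPart[] = [];
  const reader = stream.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  const audio = new Audio(URL.createObjectURL(new Blob(chunks, { type: "audio/mpeg" })));
  await audio.play();
}
```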
Phase 5: Verification
Goal: Ensure TTS implementation works correctly
Actions:
- Verify files were created in correct locations
- Check for TypeScript/Python errors:
- TypeScript: !{bash npx tsc --noEmit 2>/dev/null || echo "No TypeScript check available"}
- Python: !{bash python -m py_compile *.py 2>/dev/null || echo "No Python files to check"}
- Verify imports and dependencies
- Test that API key is properly referenced from environment
Phase 6: Summary
Goal: Guide user on using TTS features
Actions:
- Display implementation summary:
- Files created: [list of new files]
- Voice models available: [list of 4 models with descriptions]
- Features implemented: [standard TTS, streaming, voice selection, etc.]
- Provide usage instructions:
- How to convert text to speech
- How to select different voice models
- How to use streaming vs standard mode
- How to customize voice settings (stability, similarity boost, style) - see the sketch at the end of this command
- Show code example for detected framework
- Suggest next steps:
- Test with different voice models
- Explore voice cloning: /elevenlabs:add-voice-management
- Add Vercel AI SDK integration: /elevenlabs:add-vercel-ai-sdk
- Configure production features: /elevenlabs:add-production
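For the voice-settings item above, a hedged usage sketch: property names follow the REST voice_settings schema (stability, similarity_boost, style), and the camelCase shown assumes the current JS SDK - verify against the installed version. The voice ID is a placeholder:

```typescript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });

// "your_voice_id_here" is a placeholder - pick a real ID from the /v1/voices listing.
const audio = await client.textToSpeech.convert("your_voice_id_here", {
  text: "Hello from ElevenLabs!",
  modelId: "eleven_multilingual_v2",
  voiceSettings: {
    stability: 0.5,        // higher = more consistent, lower = more expressive
    similarityBoost: 0.75, // the dashboard's "Clarity + Similarity Enhancement"
    style: 0.0,            // style exaggeration (v2-family models)
  },
});
```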