🚨 EXECUTION NOTICE FOR CLAUDE
When you invoke this command via SlashCommand, the system returns THESE INSTRUCTIONS below.
YOU are the executor. This is NOT an autonomous subprocess.
- ✅ The phases below are YOUR execution checklist
- ✅ YOU must run each phase immediately using tools (Bash, Read, Write, Edit, TodoWrite)
- ✅ Complete ALL phases before considering this command done
- ❌ DON'T wait for "the command to complete" - YOU complete it by executing the phases
- ❌ DON'T treat this as status output - it IS your instruction set
Immediately after SlashCommand returns, start executing Phase 1, then Phase 2, etc.
See @CLAUDE.md section "SlashCommand Execution - YOU Are The Executor" for detailed explanation.
Available Skills
This command has access to the following skills from the elevenlabs plugin:
- api-authentication: API authentication patterns, SDK installation scripts, environment variable management, and connection testing for ElevenLabs. Use when setting up ElevenLabs authentication, installing ElevenLabs SDK, configuring API keys, testing ElevenLabs connection, or when user mentions ElevenLabs authentication, xi-api-key, ELEVENLABS_API_KEY, or ElevenLabs setup.
- mcp-integration
- production-deployment: Production deployment patterns for ElevenLabs API including rate limiting, error handling, monitoring, and testing. Use when deploying to production, implementing rate limiting, setting up monitoring, handling errors, testing concurrency, or when user mentions production deployment, rate limits, error handling, monitoring, ElevenLabs production.
- stt-integration: ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription.
- tts-integration
- vercel-ai-patterns
- voice-processing: Voice cloning workflows, voice library management, audio format conversion, and voice settings. Use when cloning voices, managing voice libraries, processing audio for voice creation, configuring voice settings, or when user mentions voice cloning, instant cloning, professional cloning, voice library, audio processing, voice settings, or ElevenLabs voices.
To use a skill:
!{skill skill-name}
Use skills when you need:
- Domain-specific templates and examples
- Validation scripts and automation
- Best practices and patterns
- Configuration generators
Skills provide pre-built resources to accelerate your work.
Security Requirements
CRITICAL: All generated files must follow security rules:
@docs/security/SECURITY-RULES.md
Key requirements:
- Never hardcode API keys or secrets
- Use placeholders: your_service_key_here
- Protect .env files with .gitignore
- Create .env.example with placeholders only
- Document key acquisition for users
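As a sketch of the expected pattern (assuming the ELEVENLABS_API_KEY variable name from the api-authentication skill), generated code should read the key from the environment and fail loudly when it is missing:

```typescript
// Read the ElevenLabs API key from the environment - never hardcode it.
// ELEVENLABS_API_KEY is the assumed variable name; .env.example carries only a placeholder.
const apiKey = process.env.ELEVENLABS_API_KEY;
if (!apiKey) {
  throw new Error(
    "ELEVENLABS_API_KEY is not set. Copy .env.example to .env and add your key."
  );
}
```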
Arguments: $ARGUMENTS
Goal: Add comprehensive STT transcription capabilities with Scribe v1 model, supporting 99 languages, speaker diarization, word-level timestamps, and seamless Vercel AI SDK integration.
Core Principles:
- Detect framework and adapt (Next.js, React, Python, Node.js)
- Support Vercel AI SDK experimental_transcribe for TypeScript projects
- Include native ElevenLabs SDK for Python projects
- Implement file upload handling and audio processing
- Provide speaker diarization and timestamping options
Phase 1: Discovery
Goal: Understand project setup and STT requirements
Actions:
- Load SDK documentation: @elevenlabs-documentation.md
- Check existing setup:
- SDK installed: !{bash npm list @elevenlabs/elevenlabs-js @ai-sdk/elevenlabs 2>/dev/null || pip show elevenlabs 2>/dev/null}
- Framework: @package.json or @pyproject.toml
- Authentication: @.env or @.env.local
- Parse $ARGUMENTS for preferences (language, diarization, timestamps)
Phase 2: Requirements Gathering
Goal: Clarify STT implementation details
Actions:
- If preferences are not specified, use AskUserQuestion to ask:
- Which approach? (Vercel AI SDK for TypeScript, Native SDK for Python)
- Do you need speaker diarization? (identify multiple speakers)
- Timestamp granularity? (word-level, segment-level, or none)
- Audio event detection? (laughter, applause, background sounds)
- Default language? (or auto-detect from 99 supported languages)
- File upload interface? (drag-drop, file picker, URL input)
Phase 3: Planning
Goal: Design the STT implementation
Actions:
- Based on framework, plan:
- Vercel AI SDK: Use experimental_transcribe with the @ai-sdk/elevenlabs provider (see the sketch after this list)
- Native SDK: Use the client.speech_to_text.convert() method (Python SDK)
- File upload: multipart/form-data handling
- Audio processing: format validation, size limits
- Output format: text, words array, speaker labels, timestamps
- Present plan for confirmation
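For the Vercel AI SDK path, a minimal transcription call could look like the sketch below. Option names follow the current @ai-sdk/elevenlabs provider docs; verify them against the installed version:

```typescript
import { experimental_transcribe as transcribe } from 'ai';
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { readFile } from 'node:fs/promises';

// Transcribe a local file with Scribe v1; audio also accepts a Uint8Array or URL.
const result = await transcribe({
  model: elevenlabs.transcription('scribe_v1'),
  audio: await readFile('./meeting.mp3'),
  providerOptions: {
    elevenlabs: {
      languageCode: 'en',    // omit to auto-detect
      diarize: true,         // speaker diarization
      tagAudioEvents: false, // audio event tagging (laughter, applause, ...)
    },
  },
});

console.log(result.text);     // full transcript
console.log(result.segments); // timestamped segments
```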
Phase 4: Implementation
Goal: Build STT integration with specialized agent
Actions:
Launch the elevenlabs-stt-integrator agent to implement speech-to-text capabilities.
Provide the agent with detailed requirements:
- Context: Detected framework, SDK status, project structure
- Target: $ARGUMENTS (specific requirements)
- Requirements:
Implement Scribe v1 transcription (99 languages, ≤5% WER for major languages)
- Add file upload interface with audio validation
- Configure transcription options:
- Language code (auto-detect or specify)
- Speaker diarization (up to 32 speakers)
- Timestamps granularity (word or segment level)
- Audio event tagging (optional)
- For Vercel AI SDK projects:
- Use experimental_transcribe from 'ai' package
- Configure providerOptions.elevenlabs settings
- Return structured TranscriptionResult
- For Native SDK projects:
- Use client.speech_to_text.convert()
- Handle async audio processing
- Format response consistently
- Add error handling for unsupported formats, large files
- Include loading states and progress indicators
- Use progressive documentation: fetch STT docs as needed
- Expected output:
- STT component/function with all features
- File upload interface
- Transcription result display
- Example usage code (see the route-handler sketch below)
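As a reference point for the agent's output, a hypothetical Next.js route handler tying together upload handling and transcription might look like the following; the path and field names are illustrative and should be adapted to the detected framework:

```typescript
// app/api/transcribe/route.ts - hypothetical path; adapt to the detected framework.
import { experimental_transcribe as transcribe } from 'ai';
import { elevenlabs } from '@ai-sdk/elevenlabs';

export async function POST(req: Request) {
  // Pull the uploaded file out of the multipart/form-data body.
  const form = await req.formData();
  const file = form.get('audio');
  if (!(file instanceof File)) {
    return Response.json({ error: 'Missing audio file' }, { status: 400 });
  }

  const result = await transcribe({
    model: elevenlabs.transcription('scribe_v1'),
    audio: new Uint8Array(await file.arrayBuffer()),
    providerOptions: { elevenlabs: { diarize: true } },
  });

  return Response.json({ text: result.text, segments: result.segments });
}
```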
Phase 5: Verification
Goal: Ensure STT works correctly
Actions:
- Verify files created
- Check TypeScript/Python syntax:
- TypeScript: !{bash npx tsc --noEmit 2>/dev/null || echo "No check"}
- Python: !{bash python -m py_compile *.py 2>/dev/null || echo "No Python"}
- Verify imports and dependencies
- Test audio file validation logic
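A minimal validation helper to test against is sketched below. The format list mirrors the summary in Phase 6; the size limit is a placeholder and must be confirmed against current ElevenLabs limits:

```typescript
// Hypothetical pre-upload check: extension and size only; the server re-validates.
const SUPPORTED_FORMATS = ['mp3', 'wav', 'm4a', 'webm'];
const MAX_BYTES = 25 * 1024 * 1024; // placeholder limit - confirm in ElevenLabs docs

function validateAudio(fileName: string, sizeBytes: number): string | null {
  const ext = fileName.split('.').pop()?.toLowerCase() ?? '';
  if (!SUPPORTED_FORMATS.includes(ext)) return `Unsupported format: .${ext}`;
  if (sizeBytes > MAX_BYTES) return 'File exceeds the size limit';
  return null; // null means valid
}
```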
Phase 6: Summary
Goal: Guide user on STT usage
Actions:
- Display summary:
- Files created: [list]
- Languages supported: 99 (excellent accuracy for 12 major languages)
- Features: diarization, timestamps, audio events
- Integration: Vercel AI SDK or Native SDK
- Usage instructions:
- Upload audio file (mp3, wav, m4a, webm, etc.)
- Transcription with diarization
- Access word-level timestamps
- Speaker identification
- Show code example (e.g. the segment-formatting snippet at the end of this command)
- Next steps:
- Combine with TTS: /elevenlabs:add-text-to-speech
- Build voice chat: /elevenlabs:add-vercel-ai-sdk
- Add streaming: /elevenlabs:add-streaming
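For the summary's code example, something like this sketch could be shown. The segment fields follow the AI SDK transcription result shape; speaker labels, when diarization is on, surface via provider metadata and are not shown here:

```typescript
// Render timestamped segments; the shape mirrors the AI SDK transcription result.
type Segment = { text: string; startSecond: number; endSecond: number };

function formatTranscript(segments: Segment[]): string {
  return segments
    .map((s) => `[${s.startSecond.toFixed(1)}s - ${s.endSecond.toFixed(1)}s] ${s.text}`)
    .join('\n');
}
```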