Text-to-speech and speech-to-text using fal.ai audio models. Use when the user requests "Convert text to speech", "Transcribe audio", "Generate voice", "Speech to text", "TTS", "STT", or similar audio tasks.
npx claudepluginhub joshuarweaver/cascade-content-creation-misc-1 --plugin fal-ai-community-skills
This skill uses the workspace's default tool permissions.
Text-to-speech and speech-to-text using state-of-the-art audio models on fal.ai.
To discover the best and latest audio models, use the search API:
# Search for text-to-speech models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --category "text-to-speech"
# Search for speech-to-text models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --category "speech-to-text"
# Search for music generation models
bash /mnt/skills/user/fal-generate/scripts/search-models.sh --query "music generation"
Or use the search_models MCP tool with relevant keywords like "tts", "speech", "music".
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh [options]
Arguments:
--text - Text to convert to speech (required)
--model - TTS model (defaults to fal-ai/minimax/speech-2.8-turbo)
--voice - Voice ID or name (model-specific)
Examples:
# Basic TTS (fast, good quality)
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "Hello, welcome to the future of AI."
# High quality with MiniMax HD
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "This is premium quality speech." \
--model "fal-ai/minimax/speech-2.8-hd"
# Natural voices with ElevenLabs
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "Natural sounding voice generation" \
--model "fal-ai/elevenlabs/tts/eleven-v3"
# Multi-language TTS
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
--text "Bonjour, bienvenue dans le futur." \
--model "fal-ai/chatterbox/text-to-speech/multilingual"
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh [options]
Arguments:
--audio-url - URL of audio file to transcribe (required)
--model - STT model (defaults to fal-ai/whisper)
--language - Language code (optional, auto-detected)
Examples:
# Transcribe with Whisper
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
--audio-url "https://example.com/audio.mp3"
# Transcribe with speaker diarization
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
--audio-url "https://example.com/meeting.mp3" \
--model "fal-ai/elevenlabs/speech-to-text/scribe-v2"
# Transcribe specific language
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
--audio-url "https://example.com/spanish.mp3" \
--language "es"
Use the search_models MCP tool or search-models.sh to find the best current model, then call mcp__fal-ai__generate with the discovered modelId.
Generating speech...
Model: fal-ai/minimax/speech-2.8-turbo
Speech generated!
Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s
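When the generated audio feeds a later step, the URL has to be pulled out of the script output. The following is a minimal sketch that assumes text-to-speech.sh prints its result on stdout in the format shown above. The helper name `extract_audio_url` is hypothetical and not part of the skill, and the sample text stands in for a live call.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): print the URL from the
# first "Audio URL:" line read on stdin.
# Assumes the output format matches the example output above.
extract_audio_url() {
  sed -n 's/^Audio URL:[[:space:]]*//p' | head -n 1
}

# Sample output copied from the example above, used instead of a live call.
sample_output='Generating speech...
Model: fal-ai/minimax/speech-2.8-turbo
Speech generated!
Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s'

audio_url="$(printf '%s\n' "$sample_output" | extract_audio_url)"
echo "$audio_url"
```

In real use, the output of text-to-speech.sh would be piped into the helper in place of the sample text.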
Transcribing audio...
Model: fal-ai/whisper
Transcription complete!
Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en
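The transcript can be extracted the same way. This sketch assumes speech-to-text.sh prints the transcript on a single `Text: "..."` line, as in the example output above. A multi-line transcript would need different handling. The helper name `extract_transcript` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): print the transcript from
# the first `Text: "..."` line on stdin, without the surrounding quotes.
# Assumes a single-line transcript, as in the example output above.
extract_transcript() {
  sed -n 's/^Text:[[:space:]]*"\(.*\)"$/\1/p' | head -n 1
}

# Sample output copied from the example above, used instead of a live call.
sample_output='Transcribing audio...
Model: fal-ai/whisper
Transcription complete!
Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en'

transcript="$(printf '%s\n' "$sample_output" | extract_transcript)"
echo "$transcript"
```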
Here's the generated speech:
[Download audio](https://v3.fal.media/files/.../speech.mp3)
• Duration: 5.2s | Model: fal-ai/minimax/speech-2.8-turbo
Here's the transcription:
"Hello, this is the transcribed text from the audio file."
• Duration: 12.5s | Language: English
Text-to-speech: search the text-to-speech category. Consider quality vs speed tradeoffs.
Music: search for music generation. Some models specialize in vocals, others in instrumental.
Speech-to-text: search the speech-to-text category. Consider whether you need speaker diarization or multi-language support.
Error: Generated audio is empty
Check that your text is not empty and contains valid content.
Error: Audio format not supported
Supported formats: MP3, WAV, M4A, FLAC, OGG
Convert your audio to a supported format.
Warning: Could not detect language, defaulting to English
Specify the language explicitly with --language option.
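To catch the "Audio format not supported" error before submitting a job, the URL can be checked against the supported formats listed above. This is a rough sketch: the function name `is_supported_audio` is hypothetical, and it only inspects the file extension in the URL, which is a heuristic. The service decides the actual format from the file contents, so a passing check does not guarantee acceptance.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the skill): succeed when the
# URL's file extension is one of MP3, WAV, M4A, FLAC, OGG.
# Extension-based only; it does not inspect the audio data itself.
is_supported_audio() {
  local url="${1%%[?#]*}"   # drop any query string or fragment
  local ext="${url##*.}"    # keep the text after the last dot
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|wav|m4a|flac|ogg) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_audio "https://example.com/audio.mp3"; then
  echo "format looks supported"
else
  echo "convert the file to MP3, WAV, M4A, FLAC, or OGG first"
fi
```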