fal-ai-master
Handles fal.ai STT/TTS: Whisper transcription/translation with timestamps, voice cloning via F5-TTS/XTTS/ElevenLabs, Kokoro multi-language TTS, and SRT subtitle generation. Provides endpoints, parameters, and TypeScript/Python code examples.
Install: npx claudepluginhub josiahsiegel/claude-plugin-marketplace --plugin fal-ai-master
This skill uses the workspace's default tool permissions.
| STT Model | Endpoint | Speed | Accuracy |
|---|---|---|---|
| Whisper | fal-ai/whisper | Medium | Highest |
| Whisper Turbo | fal-ai/whisper-turbo | Fast | High |
| Whisper Large v3 | fal-ai/whisper-large-v3 | Slow | Highest |
| TTS Model | Endpoint | Voice Clone | Quality |
|---|---|---|---|
| F5-TTS | fal-ai/f5-tts | Yes | High |
| ElevenLabs | fal-ai/elevenlabs/tts | Via API | Highest |
| Kokoro | fal-ai/kokoro/american-english | No | Good |
| XTTS | fal-ai/xtts | Yes | Good |
| Whisper Task | Use Case |
|---|---|
| transcribe | Same-language text |
| translate | Non-English → English |
| Whisper Parameter | Value |
|---|---|
| chunk_level | "segment" for timestamps |
| language | ISO code (e.g., "en") |
Use for audio processing: transcription, translation, voice cloning, multi-language TTS, and subtitle generation.
Related skills: fal-text-to-video, fal-api-reference, fal-model-guide
Complete reference for speech-to-text (STT) and text-to-speech (TTS) models on fal.ai.
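All examples below assume an installed and authenticated client (`npm install @fal-ai/client` or `pip install fal-client`). A minimal TypeScript setup sketch; the SDK also picks up the `FAL_KEY` environment variable automatically, so explicit config is optional:

```typescript
import { fal } from "@fal-ai/client";

// Optional: the client reads FAL_KEY from the environment by default.
fal.config({
  credentials: process.env.FAL_KEY, // your fal.ai API key
});
```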
Endpoint: fal-ai/whisper
Best For: Accurate transcription and translation
The industry-standard speech recognition model with support for 99+ languages.
import { fal } from "@fal-ai/client";
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: "https://example.com/speech.mp3",
task: "transcribe",
language: "en",
chunk_level: "segment"
}
});
console.log(result.text);
console.log(result.chunks); // With timestamps
import fal_client
result = fal_client.subscribe(
"fal-ai/whisper",
arguments={
"audio_url": "https://example.com/speech.mp3",
"task": "transcribe",
"language": "en",
"chunk_level": "segment"
}
)
print(result["text"])
for chunk in result["chunks"]:
print(f"[{chunk['timestamp'][0]:.2f}-{chunk['timestamp'][1]:.2f}] {chunk['text']}")
Whisper Parameters:
| Parameter | Type | Values | Description |
|---|---|---|---|
| audio_url | string | - | Audio file URL |
| task | string | "transcribe", "translate" | Transcribe or translate to English |
| language | string | ISO code | Source language (optional, auto-detected) |
| chunk_level | string | "segment" | Return timestamps |
| version | string | "3" | Whisper version |
Response Structure:
interface WhisperOutput {
text: string; // Full transcription
chunks?: Array<{
text: string;
timestamp: [number, number]; // [start, end] in seconds
}>;
}
Endpoint: fal-ai/whisper-turbo
Best For: Fast transcription
const result = await fal.subscribe("fal-ai/whisper-turbo", {
input: {
audio_url: "https://example.com/podcast.mp3",
task: "transcribe"
}
});
Endpoint: fal-ai/whisper-large-v3
Best For: Maximum accuracy
const result = await fal.subscribe("fal-ai/whisper-large-v3", {
input: {
audio_url: "https://example.com/meeting.mp3",
task: "transcribe",
language: "en"
}
});
Transcription with Timestamps:
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: audioUrl,
task: "transcribe",
chunk_level: "segment"
}
});
// Format as SRT subtitles
result.chunks.forEach((chunk, i) => {
const start = formatTime(chunk.timestamp[0]);
const end = formatTime(chunk.timestamp[1]);
console.log(`${i + 1}\n${start} --> ${end}\n${chunk.text}\n`);
});
function formatTime(seconds: number): string {
const h = Math.floor(seconds / 3600);
const m = Math.floor((seconds % 3600) / 60);
const s = Math.floor(seconds % 60);
const ms = Math.floor((seconds % 1) * 1000);
return `${h.toString().padStart(2, '0')}:${m.toString().padStart(2, '0')}:${s.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`;
}
Translation (Non-English to English):
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: "https://example.com/french-speech.mp3",
task: "translate", // Translates to English
language: "fr"
}
});
console.log(result.text); // English translation
Multi-Language Detection:
// Whisper auto-detects language if not specified
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: "https://example.com/unknown-language.mp3",
task: "transcribe"
// language omitted - auto-detect
}
});
Endpoint: fal-ai/f5-tts
Best For: Voice cloning from reference audio
const result = await fal.subscribe("fal-ai/f5-tts", {
input: {
gen_text: "Hello! Welcome to our product demonstration. We're excited to show you what we've built.",
ref_audio_url: "https://example.com/voice-sample.wav",
ref_text: "This is a sample of my voice for cloning purposes.",
model_type: "F5-TTS"
}
});
console.log(result.audio_url);
result = fal_client.subscribe(
"fal-ai/f5-tts",
arguments={
"gen_text": "Hello! Welcome to our product.",
"ref_audio_url": "https://example.com/voice-sample.wav",
"ref_text": "This is a sample of my voice."
}
)
print(result["audio_url"])
F5-TTS Parameters:
| Parameter | Type | Description |
|---|---|---|
| gen_text | string | Text to synthesize |
| ref_audio_url | string | Reference voice audio URL |
| ref_text | string | Transcript of reference audio |
| model_type | string | "F5-TTS" or "E2-TTS" |
| remove_silence | boolean | Remove silence from output |
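The `model_type` switch and `remove_silence` flag from the table combine naturally; a short sketch using the E2-TTS variant (the `gen_text` value is illustrative):

```typescript
const result = await fal.subscribe("fal-ai/f5-tts", {
  input: {
    gen_text: "Testing the E2-TTS variant.",
    ref_audio_url: "https://example.com/voice-sample.wav",
    ref_text: "This is a sample of my voice.",
    model_type: "E2-TTS",  // alternative to the default "F5-TTS"
    remove_silence: true   // trim silence from the generated audio
  }
});
console.log(result.audio_url);
```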
Endpoint: fal-ai/elevenlabs/tts
Best For: Premium voice quality
const result = await fal.subscribe("fal-ai/elevenlabs/tts", {
input: {
text: "Welcome to fal.ai! Let me tell you about our amazing AI models.",
voice_id: "21m00Tcm4TlvDq8ikWAM", // ElevenLabs voice ID
model_id: "eleven_multilingual_v2"
}
});
console.log(result.audio.url);
ElevenLabs Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| voice_id | string | ElevenLabs voice ID |
| model_id | string | TTS model version |
| stability | number | Voice stability (0-1) |
| similarity_boost | number | Voice similarity (0-1) |
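A sketch layering the `stability` and `similarity_boost` controls from the table onto the earlier request; the 0.7/0.8 values are illustrative, not recommendations:

```typescript
const result = await fal.subscribe("fal-ai/elevenlabs/tts", {
  input: {
    text: "A steadier, more consistent read of the same line.",
    voice_id: "21m00Tcm4TlvDq8ikWAM",
    model_id: "eleven_multilingual_v2",
    stability: 0.7,        // higher = more consistent delivery
    similarity_boost: 0.8  // higher = closer to the source voice
  }
});
console.log(result.audio.url);
```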
ElevenLabs Voice IDs (examples):
- 21m00Tcm4TlvDq8ikWAM - Rachel (female)
- AZnzlk1XvdvUeBnXmlld - Domi (female)
- EXAVITQu4vr4xnSDxMaL - Bella (female)
- ErXwobaYiN019PkySvjV - Antoni (male)
- VR6AewLTigWG4xSOukaG - Arnold (male)

Endpoint: fal-ai/kokoro/american-english
Best For: Multi-language, natural sounding
const result = await fal.subscribe("fal-ai/kokoro/american-english", {
input: {
text: "This is a test of the Kokoro text-to-speech system.",
voice: "af_bella" // Voice style
}
});
console.log(result.audio.url);
Kokoro Variants:
- fal-ai/kokoro/american-english - American English
- fal-ai/kokoro/british-english - British English
- fal-ai/kokoro/japanese - Japanese
- fal-ai/kokoro/mandarin - Mandarin Chinese

Kokoro Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| voice | string | Voice style identifier |
| speed | number | Speech speed multiplier (1.0 = normal) |
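A sketch applying the `speed` multiplier on another Kokoro variant; the `af_bella` example above suggests voice names are per-variant, so `bf_emma` here is an assumed placeholder — check the endpoint docs for valid values:

```typescript
const result = await fal.subscribe("fal-ai/kokoro/british-english", {
  input: {
    text: "This is a slightly slower British English read.",
    voice: "bf_emma", // assumed voice name for illustration
    speed: 0.9        // slightly slower than normal (1.0)
  }
});
console.log(result.audio.url);
```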
Endpoint: fal-ai/xtts
Best For: Open-source voice cloning
const result = await fal.subscribe("fal-ai/xtts", {
input: {
text: "Hello, this is a cloned voice speaking.",
audio_url: "https://example.com/voice-reference.wav",
language: "en"
}
});
XTTS Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| audio_url | string | Reference audio for cloning |
| language | string | Target language code |
| Model | Speed | Accuracy | Languages | Best For |
|---|---|---|---|---|
| Whisper | Medium | Highest | 99+ | Accuracy critical |
| Whisper Turbo | Fast | High | 99+ | Speed needed |
| Whisper Large v3 | Slow | Highest | 99+ | Maximum quality |
| Model | Quality | Voice Clone | Languages | Best For |
|---|---|---|---|---|
| F5-TTS | High | Yes | Multiple | Voice cloning |
| ElevenLabs | Highest | Via API | Many | Premium quality |
| Kokoro | Good | No | Multiple | Multi-language |
| XTTS | Good | Yes | 16 | Open-source |
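The two comparison tables reduce to a simple decision rule; a minimal helper sketch that encodes them (endpoints are the ones documented above; the priority flags are invented for illustration):

```typescript
type SttPriority = "speed" | "accuracy";

// Pick an STT endpoint per the comparison table above.
function pickSttEndpoint(priority: SttPriority): string {
  return priority === "speed" ? "fal-ai/whisper-turbo" : "fal-ai/whisper-large-v3";
}

// Pick a TTS endpoint per the comparison table above.
function pickTtsEndpoint(opts: { needsCloning?: boolean; premium?: boolean }): string {
  if (opts.needsCloning) return "fal-ai/f5-tts"; // or "fal-ai/xtts" for open-source
  if (opts.premium) return "fal-ai/elevenlabs/tts";
  return "fal-ai/kokoro/american-english";
}
```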
async function processAudio(audioUrl: string, targetLanguage: string = 'en') {
// 1. Transcribe
const transcription = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: audioUrl,
task: "transcribe",
chunk_level: "segment"
}
});
// 2. If English output is requested, also run Whisper's translate task
//    (it translates any source language to English)
let translation: any = null;
if (targetLanguage === 'en') {
translation = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: audioUrl,
task: "translate"
}
});
}
return {
original: transcription.text,
translated: translation?.text,
chunks: transcription.chunks
};
}
async function cloneVoiceAndSpeak(
referenceAudioUrl: string,
referenceText: string,
textToSpeak: string
) {
// Use F5-TTS for voice cloning
const result = await fal.subscribe("fal-ai/f5-tts", {
input: {
gen_text: textToSpeak,
ref_audio_url: referenceAudioUrl,
ref_text: referenceText,
remove_silence: true
}
});
return result.audio_url;
}
async function generateSubtitles(videoUrl: string): Promise<string> {
// Extract audio and transcribe
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: videoUrl, // Works with video URLs too
task: "transcribe",
chunk_level: "segment"
}
});
// Generate SRT format
let srt = '';
result.chunks.forEach((chunk, i) => {
srt += `${i + 1}\n`;
srt += `${formatSrtTime(chunk.timestamp[0])} --> ${formatSrtTime(chunk.timestamp[1])}\n`;
srt += `${chunk.text}\n\n`;
});
return srt;
}
function formatSrtTime(seconds: number): string {
const date = new Date(seconds * 1000);
return date.toISOString().slice(11, 23).replace('.', ','); // "HH:MM:SS,mmm"
}
async function generateAudioBook(chapters: string[], voiceId: string) {
const audioUrls = [];
for (const chapter of chapters) {
// Split into manageable chunks
const chunks = splitText(chapter, 5000);
for (const chunk of chunks) {
const result = await fal.subscribe("fal-ai/elevenlabs/tts", {
input: {
text: chunk,
voice_id: voiceId,
model_id: "eleven_multilingual_v2"
}
});
audioUrls.push(result.audio.url);
}
}
return audioUrls;
}
// Naive sentence-based splitter; assumes ". " delimits sentences.
function splitText(text: string, maxLength: number): string[] {
const chunks: string[] = [];
let current = '';
text.split('. ').forEach(sentence => {
if ((current + sentence).length < maxLength) {
current += sentence + '. ';
} else {
chunks.push(current.trim());
current = sentence + '. ';
}
});
if (current) chunks.push(current.trim());
return chunks;
}
interface STTInput {
audio_url: string;
task?: "transcribe" | "translate";
language?: string; // ISO 639-1 code
chunk_level?: "segment";
version?: string;
}
interface TTSInput {
// Common
text?: string;
gen_text?: string;
// Voice cloning
ref_audio_url?: string;
ref_text?: string;
audio_url?: string; // XTTS
// Voice selection
voice_id?: string; // ElevenLabs
voice?: string; // Kokoro
model_type?: string; // F5-TTS
// Control
speed?: number;
stability?: number;
similarity_boost?: number;
language?: string;
remove_silence?: boolean;
}
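Since each endpoint reads a different subset of this shared shape, a small builder can keep call sites tidy; a sketch (the field groupings mirror the interface comments above, and `buildTtsInput` is a hypothetical helper name):

```typescript
function buildTtsInput(endpoint: string, text: string, opts: Partial<TTSInput> = {}): TTSInput {
  switch (endpoint) {
    case "fal-ai/f5-tts":
      // Voice cloning: gen_text plus reference audio and its transcript.
      return { gen_text: text, ref_audio_url: opts.ref_audio_url, ref_text: opts.ref_text };
    case "fal-ai/elevenlabs/tts":
      return { text, voice_id: opts.voice_id, stability: opts.stability };
    case "fal-ai/xtts":
      return { text, audio_url: opts.audio_url, language: opts.language };
    default: // Kokoro variants
      return { text, voice: opts.voice, speed: opts.speed };
  }
}
```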
chunk_level: "segment" for subtitlestask: "translate" for non-English to English| Language | Code | STT | TTS |
|---|---|---|---|
| English | en | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| German | de | Yes | Yes |
| Italian | it | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Japanese | ja | Yes | Yes |
| Chinese | zh | Yes | Yes |
| Korean | ko | Yes | Yes |
| Russian | ru | Yes | Limited |
| Format | Extension | Supported |
|---|---|---|
| MP3 | .mp3 | Yes |
| WAV | .wav | Yes |
| M4A | .m4a | Yes |
| FLAC | .flac | Yes |
| OGG | .ogg | Yes |
| WebM | .webm | Yes |
| Video | .mp4 | Yes (audio extracted) |
| Model | Output Format |
|---|---|
| F5-TTS | WAV |
| ElevenLabs | MP3 |
| Kokoro | WAV |
| XTTS | WAV |
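Every model returns a URL rather than raw audio bytes; a Node.js sketch for downloading the result to disk (assumes Node 18+ for the built-in `fetch`; `saveAudio` is a hypothetical helper name):

```typescript
import { writeFile } from "node:fs/promises";

async function saveAudio(audioUrl: string, outPath: string): Promise<void> {
  const response = await fetch(audioUrl);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  // Buffer the whole response; fine for typical TTS clip sizes.
  const bytes = Buffer.from(await response.arrayBuffer());
  await writeFile(outPath, bytes);
}

// e.g. await saveAudio(result.audio_url, "speech.wav"); // F5-TTS/Kokoro/XTTS output WAV
```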
try {
const result = await fal.subscribe("fal-ai/whisper", {
input: { audio_url: audioUrl, task: "transcribe" }
});
} catch (error: any) {
if (error.status === 400) {
console.error("Invalid audio file or URL");
} else if (error.status === 413) {
console.error("Audio file too large");
} else {
console.error("Transcription failed:", error.message);
}
}
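Transient failures such as rate limits are worth retrying; a minimal backoff sketch around fal.subscribe (treating 429 and 5xx as retryable is an assumption, not documented fal.ai behavior):

```typescript
async function subscribeWithRetry<T>(
  endpoint: string,
  input: Record<string, unknown>,
  maxAttempts = 3
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fal.subscribe(endpoint, { input }) as T;
    } catch (error: any) {
      // Assumed retryable: rate limits and server-side errors.
      const retryable = error.status === 429 || error.status >= 500;
      if (!retryable || attempt === maxAttempts) throw error;
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise(r => setTimeout(r, 1000 * 2 ** (attempt - 1)));
    }
  }
}
```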