Complete fal.ai audio system. PROACTIVELY activate for: (1) Whisper speech-to-text, (2) Transcription with timestamps, (3) Translation to English, (4) F5-TTS voice cloning, (5) ElevenLabs premium TTS, (6) Kokoro multi-language TTS, (7) XTTS open-source cloning, (8) Subtitle generation (SRT), (9) Audio file formats. Provides: STT/TTS endpoints, language codes, voice cloning setup, timestamp formatting. Ensures accurate transcription and natural speech synthesis.
/plugin marketplace add JosiahSiegel/claude-plugin-marketplace
/plugin install fal-ai-master@claude-plugin-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
| STT Model | Endpoint | Speed | Accuracy |
|---|---|---|---|
| Whisper | fal-ai/whisper | Medium | Highest |
| Whisper Turbo | fal-ai/whisper-turbo | Fast | High |
| Whisper Large v3 | fal-ai/whisper-large-v3 | Slow | Highest |

| TTS Model | Endpoint | Voice Clone | Quality |
|---|---|---|---|
| F5-TTS | fal-ai/f5-tts | Yes | High |
| ElevenLabs | fal-ai/elevenlabs/tts | Via API | Highest |
| Kokoro | fal-ai/kokoro/american-english | No | Good |
| XTTS | fal-ai/xtts | Yes | Good |

| Whisper Task | Use Case |
|---|---|
| transcribe | Same language text |
| translate | Non-English → English |

| Whisper Parameter | Value |
|---|---|
| chunk_level | "segment" for timestamps |
| language | ISO code (e.g., "en") |
Use for audio processing: transcription, translation to English, subtitle (SRT) generation, voice cloning, and speech synthesis.

Related skills:

- fal-text-to-video
- fal-api-reference
- fal-model-guide

Complete reference for speech-to-text (STT) and text-to-speech (TTS) models on fal.ai.
Endpoint: fal-ai/whisper
Best For: Accurate transcription and translation
The industry-standard speech recognition model with support for 99+ languages.
import { fal } from "@fal-ai/client";
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: "https://example.com/speech.mp3",
task: "transcribe",
language: "en",
chunk_level: "segment"
}
});
console.log(result.text);
console.log(result.chunks); // With timestamps
import fal_client
result = fal_client.subscribe(
"fal-ai/whisper",
arguments={
"audio_url": "https://example.com/speech.mp3",
"task": "transcribe",
"language": "en",
"chunk_level": "segment"
}
)
print(result["text"])
for chunk in result["chunks"]:
print(f"[{chunk['timestamp'][0]:.2f}-{chunk['timestamp'][1]:.2f}] {chunk['text']}")
Whisper Parameters:
| Parameter | Type | Values | Description |
|---|---|---|---|
| audio_url | string | - | Audio file URL |
| task | string | "transcribe", "translate" | Transcribe or translate to English |
| language | string | ISO code | Source language (optional, auto-detected) |
| chunk_level | string | "segment" | Return timestamps |
| version | string | "3" | Whisper version |
Response Structure:
interface WhisperOutput {
text: string; // Full transcription
chunks?: Array<{
text: string;
timestamp: [number, number]; // [start, end] in seconds
}>;
}
Endpoint: fal-ai/whisper-turbo
Best For: Fast transcription
const result = await fal.subscribe("fal-ai/whisper-turbo", {
input: {
audio_url: "https://example.com/podcast.mp3",
task: "transcribe"
}
});
Endpoint: fal-ai/whisper-large-v3
Best For: Maximum accuracy
const result = await fal.subscribe("fal-ai/whisper-large-v3", {
input: {
audio_url: "https://example.com/meeting.mp3",
task: "transcribe",
language: "en"
}
});
Transcription with Timestamps:
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: audioUrl,
task: "transcribe",
chunk_level: "segment"
}
});
// Format as SRT subtitles
result.chunks.forEach((chunk, i) => {
const start = formatTime(chunk.timestamp[0]);
const end = formatTime(chunk.timestamp[1]);
console.log(`${i + 1}\n${start} --> ${end}\n${chunk.text}\n`);
});
function formatTime(seconds: number): string {
const h = Math.floor(seconds / 3600);
const m = Math.floor((seconds % 3600) / 60);
const s = Math.floor(seconds % 60);
const ms = Math.floor((seconds % 1) * 1000);
return `${h.toString().padStart(2, '0')}:${m.toString().padStart(2, '0')}:${s.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`;
}
Translation (Non-English to English):
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: "https://example.com/french-speech.mp3",
task: "translate", // Translates to English
language: "fr"
}
});
console.log(result.text); // English translation
Multi-Language Detection:
// Whisper auto-detects language if not specified
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: "https://example.com/unknown-language.mp3",
task: "transcribe"
// language omitted - auto-detect
}
});
Endpoint: fal-ai/f5-tts
Best For: Voice cloning from reference audio
const result = await fal.subscribe("fal-ai/f5-tts", {
input: {
gen_text: "Hello! Welcome to our product demonstration. We're excited to show you what we've built.",
ref_audio_url: "https://example.com/voice-sample.wav",
ref_text: "This is a sample of my voice for cloning purposes.",
model_type: "F5-TTS"
}
});
console.log(result.audio_url);
result = fal_client.subscribe(
"fal-ai/f5-tts",
arguments={
"gen_text": "Hello! Welcome to our product.",
"ref_audio_url": "https://example.com/voice-sample.wav",
"ref_text": "This is a sample of my voice."
}
)
print(result["audio_url"])
F5-TTS Parameters:
| Parameter | Type | Description |
|---|---|---|
| gen_text | string | Text to synthesize |
| ref_audio_url | string | Reference voice audio URL |
| ref_text | string | Transcript of reference audio |
| model_type | string | "F5-TTS" or "E2-TTS" |
| remove_silence | boolean | Remove silence from output |
Endpoint: fal-ai/elevenlabs/tts
Best For: Premium voice quality
const result = await fal.subscribe("fal-ai/elevenlabs/tts", {
input: {
text: "Welcome to fal.ai! Let me tell you about our amazing AI models.",
voice_id: "21m00Tcm4TlvDq8ikWAM", // ElevenLabs voice ID
model_id: "eleven_multilingual_v2"
}
});
console.log(result.audio.url);
ElevenLabs Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| voice_id | string | ElevenLabs voice ID |
| model_id | string | TTS model version |
| stability | number | Voice stability (0-1) |
| similarity_boost | number | Voice similarity (0-1) |
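The stability and similarity_boost knobs trade expressiveness against voice fidelity. A sketch of an input payload using them; the specific values are illustrative assumptions, not documented defaults:

```typescript
// Illustrative ElevenLabs input using the tuning parameters from the table
// above. The chosen values are assumptions, not recommended defaults.
const expressiveInput = {
  text: "Let me tell you a story about a distant mountain village.",
  voice_id: "21m00Tcm4TlvDq8ikWAM", // Rachel
  model_id: "eleven_multilingual_v2",
  stability: 0.3,        // lower = more expressive variation
  similarity_boost: 0.8, // higher = closer to the reference voice
};
```

Pass this object as the `input` of `fal.subscribe("fal-ai/elevenlabs/tts", ...)` exactly as in the example above.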
ElevenLabs Voice IDs (examples):

- 21m00Tcm4TlvDq8ikWAM - Rachel (female)
- AZnzlk1XvdvUeBnXmlld - Domi (female)
- EXAVITQu4vr4xnSDxMaL - Bella (female)
- ErXwobaYiN019PkySvjV - Antoni (male)
- VR6AewLTigWG4xSOukaG - Arnold (male)

Endpoint: fal-ai/kokoro/american-english
Best For: Multi-language, natural sounding
const result = await fal.subscribe("fal-ai/kokoro/american-english", {
input: {
text: "This is a test of the Kokoro text-to-speech system.",
voice: "af_bella" // Voice style
}
});
console.log(result.audio.url);
Kokoro Variants:
- fal-ai/kokoro/american-english - American English
- fal-ai/kokoro/british-english - British English
- fal-ai/kokoro/japanese - Japanese
- fal-ai/kokoro/mandarin - Mandarin Chinese

Kokoro Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| voice | string | Voice style identifier |
| speed | number | Speech speed multiplier |
Endpoint: fal-ai/xtts
Best For: Open-source voice cloning
const result = await fal.subscribe("fal-ai/xtts", {
input: {
text: "Hello, this is a cloned voice speaking.",
audio_url: "https://example.com/voice-reference.wav",
language: "en"
}
});
XTTS Parameters:
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| audio_url | string | Reference audio for cloning |
| language | string | Target language |

| Model | Speed | Accuracy | Languages | Best For |
|---|---|---|---|---|
| Whisper | Medium | Highest | 99+ | Accuracy critical |
| Whisper Turbo | Fast | High | 99+ | Speed needed |
| Whisper Large v3 | Slow | Highest | 99+ | Maximum quality |

| Model | Quality | Voice Clone | Languages | Best For |
|---|---|---|---|---|
| F5-TTS | High | Yes | Multiple | Voice cloning |
| ElevenLabs | Highest | Via API | Many | Premium quality |
| Kokoro | Good | No | Multiple | Multi-language |
| XTTS | Good | Yes | 16 | Open-source |
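The comparison tables can be collapsed into a small routing helper. This is a sketch: the priority labels and defaulting logic are assumptions layered on the tables, not a fal.ai API.

```typescript
// Maps a caller's priority onto an STT endpoint per the comparison table.
// The priority names are illustrative; endpoints come from the table above.
type SttPriority = "speed" | "accuracy" | "balanced";

function pickSttEndpoint(priority: SttPriority): string {
  switch (priority) {
    case "speed":
      return "fal-ai/whisper-turbo";    // fast, high accuracy
    case "accuracy":
      return "fal-ai/whisper-large-v3"; // slow, highest accuracy
    default:
      return "fal-ai/whisper";          // balanced default
  }
}
```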
async function processAudio(audioUrl: string, targetLanguage: string = 'en') {
// 1. Transcribe
const transcription = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: audioUrl,
task: "transcribe",
chunk_level: "segment"
}
});
// 2. If English output is requested, also run Whisper's translate task
// (translate converts non-English speech to English text)
let translation = null;
if (targetLanguage === 'en') {
translation = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: audioUrl,
task: "translate"
}
});
}
return {
original: transcription.text,
translated: translation?.text,
chunks: transcription.chunks
};
}
async function cloneVoiceAndSpeak(
referenceAudioUrl: string,
referenceText: string,
textToSpeak: string
) {
// Use F5-TTS for voice cloning
const result = await fal.subscribe("fal-ai/f5-tts", {
input: {
gen_text: textToSpeak,
ref_audio_url: referenceAudioUrl,
ref_text: referenceText,
remove_silence: true
}
});
return result.audio_url;
}
async function generateSubtitles(videoUrl: string): Promise<string> {
// Extract audio and transcribe
const result = await fal.subscribe("fal-ai/whisper", {
input: {
audio_url: videoUrl, // Works with video URLs too
task: "transcribe",
chunk_level: "segment"
}
});
// Generate SRT format
let srt = '';
result.chunks.forEach((chunk, i) => {
srt += `${i + 1}\n`;
srt += `${formatSrtTime(chunk.timestamp[0])} --> ${formatSrtTime(chunk.timestamp[1])}\n`;
srt += `${chunk.text}\n\n`;
});
return srt;
}
function formatSrtTime(seconds: number): string {
const date = new Date(seconds * 1000);
return date.toISOString().slice(11, 23).replace('.', ','); // HH:MM:SS,mmm
}
async function generateAudioBook(chapters: string[], voiceId: string) {
const audioUrls: string[] = [];
for (const chapter of chapters) {
// Split into manageable chunks
const chunks = splitText(chapter, 5000);
for (const chunk of chunks) {
const result = await fal.subscribe("fal-ai/elevenlabs/tts", {
input: {
text: chunk,
voice_id: voiceId,
model_id: "eleven_multilingual_v2"
}
});
audioUrls.push(result.audio.url);
}
}
return audioUrls;
}
function splitText(text: string, maxLength: number): string[] {
  const chunks: string[] = [];
  let current = '';
  text.split('. ').forEach(sentence => {
    if ((current + sentence).length < maxLength) {
      current += sentence + '. ';
    } else {
      if (current) chunks.push(current.trim()); // avoid pushing an empty chunk
      current = sentence + '. ';
    }
  });
  if (current) chunks.push(current.trim());
  return chunks;
}
interface STTInput {
audio_url: string;
task?: "transcribe" | "translate";
language?: string; // ISO 639-1 code
chunk_level?: "segment";
version?: string;
}
interface TTSInput {
// Common
text?: string;
gen_text?: string;
// Voice cloning
ref_audio_url?: string;
ref_text?: string;
audio_url?: string; // XTTS
// Voice selection
voice_id?: string; // ElevenLabs
voice?: string; // Kokoro
model_type?: string; // F5-TTS
// Control
speed?: number;
stability?: number;
similarity_boost?: number;
language?: string;
remove_silence?: boolean;
}
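One recurring gotcha in the union above is that F5-TTS takes its text in gen_text while the other models use text. A minimal sketch of a helper that keeps the fields straight; the helper names are my own, not part of @fal-ai/client:

```typescript
// Picks the correct text field per endpoint; gen_text is specific to F5-TTS.
// Helper names are illustrative, not part of the fal.ai client.
function textFieldFor(endpoint: string): "gen_text" | "text" {
  return endpoint === "fal-ai/f5-tts" ? "gen_text" : "text";
}

function buildTtsInput(endpoint: string, content: string): Record<string, string> {
  return { [textFieldFor(endpoint)]: content };
}
```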
- chunk_level: "segment" for subtitles
- task: "translate" for non-English to English

| Language | Code | STT | TTS |
|---|---|---|---|
| English | en | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| German | de | Yes | Yes |
| Italian | it | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Japanese | ja | Yes | Yes |
| Chinese | zh | Yes | Yes |
| Korean | ko | Yes | Yes |
| Russian | ru | Yes | Limited |

| Format | Extension | Supported |
|---|---|---|
| MP3 | .mp3 | Yes |
| WAV | .wav | Yes |
| M4A | .m4a | Yes |
| FLAC | .flac | Yes |
| OGG | .ogg | Yes |
| WebM | .webm | Yes |
| Video | .mp4 | Yes (audio extracted) |

| Model | Output Format |
|---|---|
| F5-TTS | WAV |
| ElevenLabs | MP3 |
| Kokoro | WAV |
| XTTS | WAV |
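When saving generated audio, the output-format table determines the right file extension. A sketch of a lookup helper; the mapping mirrors the table above, and falling back to WAV for unknown endpoints is an assumption:

```typescript
// Maps TTS endpoints to their output extension per the table above.
const outputExtension: Record<string, string> = {
  "fal-ai/f5-tts": "wav",
  "fal-ai/elevenlabs/tts": "mp3",
  "fal-ai/kokoro/american-english": "wav",
  "fal-ai/xtts": "wav",
};

function outputFilename(endpoint: string, base: string): string {
  // Assumption: default to .wav when the endpoint is not in the table.
  const ext = outputExtension[endpoint] ?? "wav";
  return `${base}.${ext}`;
}
```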
try {
const result = await fal.subscribe("fal-ai/whisper", {
input: { audio_url: audioUrl, task: "transcribe" }
});
} catch (error) {
if (error.status === 400) {
console.error("Invalid audio file or URL");
} else if (error.status === 413) {
console.error("Audio file too large");
} else {
console.error("Transcription failed:", error.message);
}
}
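For transient failures (timeouts, 5xx), a retry wrapper around the call above can help. This is a sketch: the backoff policy is an assumption, not fal.ai guidance, and 400/413 are treated as permanent per the handler above.

```typescript
// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retries a call unless the error looks permanent (bad input / too large).
async function withRetry<T>(call: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error: any) {
      lastError = error;
      if (error?.status === 400 || error?.status === 413) throw error; // not transient
      if (attempt < maxAttempts - 1) {
        await new Promise(r => setTimeout(r, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

Usage: wrap the `fal.subscribe` call from the snippet above in `withRetry(() => ...)`.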