From elevenlabs-pack
Implements ElevenLabs APIs for speech-to-speech voice conversion, text-to-sound-effects, audio noise removal, and transcription. For voice changing, SFX generation, or audio cleanup tasks.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin elevenlabs-packThis skill is limited to using the following tools:
Secondary ElevenLabs workflows beyond TTS: (1) Speech-to-Speech voice conversion, (2) Sound Effects generation from text descriptions, (3) Audio Isolation for noise removal, and (4) Speech-to-Text transcription.
Provides text-to-speech, speech-to-text with diarization, voice conversion, and audio processing via EachLabs API using ElevenLabs TTS, Whisper, and RVC models. Useful for TTS, transcription, or voice tasks.
Integrates RunwayML SDK for audio APIs (TTS, sound effects, voice isolation, dubbing) into Node.js and Python server-side code.
Implements ElevenLabs TTS with voice settings, instant voice cloning from audio samples, and WebSocket streaming. For building voice generation features.
Share bugs, ideas, or general feedback.
Secondary ElevenLabs workflows beyond TTS: (1) Speech-to-Speech voice conversion, (2) Sound Effects generation from text descriptions, (3) Audio Isolation for noise removal, and (4) Speech-to-Text transcription.
elevenlabs-install-auth setupTransform audio from one voice to another using POST /v1/speech-to-speech/{voice_id}:
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream, createWriteStream } from "fs";
import { Readable } from "stream";
import { pipeline } from "stream/promises";
const client = new ElevenLabsClient();
async function speechToSpeech(
sourceAudioPath: string,
targetVoiceId: string,
outputPath: string
) {
const audio = await client.speechToSpeech.convert(targetVoiceId, {
audio: createReadStream(sourceAudioPath),
model_id: "eleven_english_sts_v2", // STS-specific model
voice_settings: JSON.stringify({
stability: 0.5,
similarity_boost: 0.8,
style: 0.0,
}),
remove_background_noise: true, // Built-in noise removal
});
await pipeline(Readable.fromWeb(audio as any), createWriteStream(outputPath));
console.log(`Voice-converted audio saved to ${outputPath}`);
}
// Convert your voice recording to sound like "Rachel"
await speechToSpeech(
"my_recording.mp3",
"21m00Tcm4TlvDq8ikWAM",
"converted.mp3"
);
cURL equivalent:
curl -X POST "https://api.elevenlabs.io/v1/speech-to-speech/21m00Tcm4TlvDq8ikWAM" \
-H "xi-api-key: ${ELEVENLABS_API_KEY}" \
-F "audio=@my_recording.mp3" \
-F "model_id=eleven_english_sts_v2" \
-F 'voice_settings={"stability":0.5,"similarity_boost":0.8}' \
-F "remove_background_noise=true" \
--output converted.mp3
Generate cinematic sound effects from text descriptions using POST /v1/sound-generation:
async function generateSoundEffect(
description: string,
outputPath: string,
options?: {
duration?: number; // 0.5-30 seconds (null = auto)
promptInfluence?: number; // 0-1 (default 0.3, higher = follows prompt more closely)
loop?: boolean; // Seamless looping (default false)
}
) {
const audio = await client.textToSoundEffects.convert({
text: description,
duration_seconds: options?.duration,
prompt_influence: options?.promptInfluence ?? 0.3,
// model_id: "eleven_text_to_sound_v2", // default
});
await pipeline(Readable.fromWeb(audio as any), createWriteStream(outputPath));
console.log(`Sound effect saved to ${outputPath}`);
}
// Generate various sound effects
await generateSoundEffect(
"Heavy rain on a tin roof with distant thunder",
"rain.mp3",
{ duration: 10, promptInfluence: 0.6 }
);
await generateSoundEffect(
"Sci-fi laser gun firing three quick bursts",
"laser.mp3",
{ duration: 3, promptInfluence: 0.8 }
);
await generateSoundEffect(
"Gentle forest ambiance with birds chirping",
"forest_loop.mp3",
{ duration: 15, loop: true } // Seamless loop for background audio
);
cURL equivalent:
curl -X POST "https://api.elevenlabs.io/v1/sound-generation" \
-H "xi-api-key: ${ELEVENLABS_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"text": "Heavy rain on a tin roof with distant thunder",
"duration_seconds": 10,
"prompt_influence": 0.6
}' \
--output rain.mp3
Remove background noise from audio using POST /v1/audio-isolation:
async function isolateVoice(
noisyAudioPath: string,
cleanOutputPath: string
) {
const cleanAudio = await client.audioIsolation.audioIsolation({
audio: createReadStream(noisyAudioPath),
});
await pipeline(
Readable.fromWeb(cleanAudio as any),
createWriteStream(cleanOutputPath)
);
console.log(`Clean audio saved to ${cleanOutputPath}`);
}
// Remove background noise from a recording
await isolateVoice("noisy_interview.mp3", "clean_interview.mp3");
Streaming variant for large files (POST /v1/audio-isolation/stream):
async function isolateVoiceStreaming(
noisyAudioPath: string,
cleanOutputPath: string
) {
const stream = await client.audioIsolation.audioIsolationStream({
audio: createReadStream(noisyAudioPath),
});
const writer = createWriteStream(cleanOutputPath);
for await (const chunk of stream) {
writer.write(chunk);
}
writer.end();
}
cURL equivalent:
curl -X POST "https://api.elevenlabs.io/v1/audio-isolation" \
-H "xi-api-key: ${ELEVENLABS_API_KEY}" \
-F "audio=@noisy_interview.mp3" \
--output clean_interview.mp3
Transcribe audio with speaker diarization using POST /v1/speech-to-text:
async function transcribeAudio(audioPath: string) {
const result = await client.speechToText.convert({
audio: createReadStream(audioPath),
model_id: "scribe_v1", // ElevenLabs' STT model
// language_code: "en", // Optional: force language
// diarize: true, // Enable speaker detection
// timestamps_granularity: "word", // "word" or "character"
});
console.log("Transcription:", result.text);
// Word-level timestamps
if (result.words) {
for (const word of result.words) {
console.log(`[${word.start.toFixed(2)}-${word.end.toFixed(2)}] ${word.text}`);
}
}
return result;
}
await transcribeAudio("podcast_episode.mp3");
| Feature | Method | Endpoint | Billing |
|---|---|---|---|
| Speech-to-Speech | POST | /v1/speech-to-speech/{voice_id} | Per character |
| Sound Effects | POST | /v1/sound-generation | Per generation |
| Audio Isolation | POST | /v1/audio-isolation | 1,000 chars/min of audio |
| Audio Isolation Stream | POST | /v1/audio-isolation/stream | 1,000 chars/min of audio |
| Speech-to-Text | POST | /v1/speech-to-text | Per audio minute |
prompt_influence: 0.6-0.8 for precise results, 0.2-0.4 for creative variation| Aspect | Limit |
|---|---|
| Max file size | 500 MB |
| Max duration | 1 hour |
| Supported formats | MP3, WAV, M4A, FLAC, OGG, WEBM |
| PCM optimization | Use file_format: "pcm_s16le_16" for lowest latency |
| Error | HTTP | Cause | Solution |
|---|---|---|---|
model_can_not_do_voice_conversion | 400 | Wrong model for STS | Use eleven_english_sts_v2 |
audio_too_short | 400 | STS input under 1 second | Use longer audio clip |
audio_too_long | 400 | STS input over limit | Trim to under 5 minutes |
invalid_sound_prompt | 400 | Nonsensical SFX description | Write descriptive, specific prompts |
file_too_large | 413 | Audio isolation over 500MB | Compress or split the file |
quota_exceeded | 401 | Character/generation limit hit | Check usage dashboard |
For common errors, see elevenlabs-common-errors. For SDK patterns, see elevenlabs-sdk-patterns.