Use when "Whisper", "speech-to-text", "audio transcription", "ASR", or asking about "transcribe audio", "podcast transcription", "voice recognition", "multilingual speech", "subtitle generation"
Transcribes audio files to text using OpenAI's Whisper model with multilingual support.
/plugin marketplace add eyadsibai/ltk
/plugin install ltk@ltk-marketplace
This skill inherits all available tools. When active, it can use any tool Claude has access to.
OpenAI's multilingual speech recognition model.
pip install -U openai-whisper
# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
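To verify the install, you can list the checkpoints that whisper.load_model() accepts (a quick sketch using whisper.available_models() from the openai-whisper package):
import whisper
# Prints the available checkpoint names, e.g. tiny, base, small, medium, large, turbo
print(whisper.available_models())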
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
# Access segments with timestamps
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
| Model | Parameters | Relative speed | Required VRAM |
|---|---|---|---|
| tiny | 39M | ~32x | ~1 GB |
| base | 74M | ~16x | ~1 GB |
| small | 244M | ~6x | ~2 GB |
| medium | 769M | ~2x | ~5 GB |
| large | 1550M | 1x | ~10 GB |
| turbo | 809M | ~8x | ~6 GB |
Recommendation: use turbo for the best speed/quality trade-off; use base for quick prototyping.
model = whisper.load_model("turbo")
# Auto-detect
result = model.transcribe("audio.mp3")
# Specify language (faster)
result = model.transcribe("audio.mp3", language="en")
result = model.transcribe("spanish.mp3", task="translate")
# Input: Spanish audio -> Output: English text
result = model.transcribe(
    "audio.mp3",
    initial_prompt="This is a technical podcast about machine learning and AI."
)
result = model.transcribe("audio.mp3", word_timestamps=True)
for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['word']} ({word['start']:.2f}s - {word['end']:.2f}s)")
# Basic transcription
whisper audio.mp3 --model turbo
# Output formats
whisper audio.mp3 --output_format srt # Subtitles
whisper audio.mp3 --output_format vtt # WebVTT
whisper audio.mp3 --output_format json # JSON with timestamps
# Translation
whisper spanish.mp3 --task translate
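The same output formats are available from Python via whisper.utils.get_writer (a hedged sketch; the writer call signature can vary slightly between versions):
import whisper
from whisper.utils import get_writer
model = whisper.load_model("turbo")
result = model.transcribe("audio.mp3")
# Write audio.srt to the current directory; "vtt", "txt", "tsv", "json" also work
writer = get_writer("srt", ".")
writer(result, "audio.mp3")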
# Automatically uses GPU if available
model = whisper.load_model("turbo")
# Force CPU
model = whisper.load_model("turbo", device="cpu")
# Force GPU (10-20x faster)
model = whisper.load_model("turbo", device="cuda")
# pip install faster-whisper
from faster_whisper import WhisperModel
model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
from langchain.document_loaders import WhisperTranscriptionLoader
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
loader = WhisperTranscriptionLoader(file_path="audio.mp3")
docs = loader.load()
# Use in RAG
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
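Assuming the loader returns standard LangChain Document objects, the transcript can then be queried like any other vector store (a minimal sketch; the query string is illustrative):
# Retrieve the transcript chunks most similar to a question
matches = vectorstore.similarity_search("What was said about pricing?", k=3)
for doc in matches:
    print(doc.page_content)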