Extract transcript or subtitles from a local video file. Use this skill whenever the user asks to transcribe a video, extract speech-to-text, get subtitles, or wants a text version of what's said in a video. Also trigger on "提取字幕" (extract subtitles), "视频转文字" (video to text), "语音转文字" (speech to text), "transcribe", "extract audio text", or when the user references getting a script/transcript from any video file (mp4, mkv, mov, avi, webm). This skill is for LOCAL video files only; for YouTube or other online URLs, use the download-video skill first to get the file, then transcribe it.
npx claudepluginhub feiskyer/video-skills --plugin video-skills

This skill uses the workspace's default tool permissions.
Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.
Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any format ffmpeg can handle.
ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"
If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.
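To decide among tracks programmatically, the ffprobe JSON output can be parsed into a simple track list. A minimal sketch, assuming ffprobe is on PATH (the function names are illustrative, not part of the skill):

```python
import json
import subprocess

def parse_subtitle_streams(ffprobe_json: str) -> list[dict]:
    """Flatten ffprobe's JSON output into one dict per subtitle stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        {
            "index": s.get("index"),
            "codec": s.get("codec_name"),
            "language": s.get("tags", {}).get("language", "und"),
            "title": s.get("tags", {}).get("title", ""),
        }
        for s in streams
    ]

def list_subtitle_tracks(video_path: str) -> list[dict]:
    """Run the ffprobe command from the step above and parse its output."""
    cmd = [
        "ffprobe", "-v", "quiet", "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json", video_path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_subtitle_streams(out)
```

The language tag (e.g. "eng", "chi") is what to match against the video's primary language; "und" means the track is untagged and the user should be asked.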
# Extract as SRT (stream index 0 for first subtitle track; adjust if needed)
ffmpeg -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt" -y
After extraction, convert SRT to clean text:
Remove cue numbers, timestamp lines (matching \d{2}:\d{2}:\d{2}), and formatting tags (<i>, </i>, etc.). Save the clean transcript to <video_name>.txt next to the video file. Done; skip Step 3b.
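The SRT-to-text cleanup can be sketched in Python (the function name is illustrative; dropping purely numeric lines is a heuristic for cue numbers and could in principle drop a subtitle that is only a number):

```python
import re

def srt_to_text(srt: str) -> str:
    """Strip cue numbers, timestamp lines, and formatting tags from SRT."""
    lines = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit():
            continue  # blank separators and cue numbers
        if re.match(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->", line):
            continue  # timestamps like 00:00:01,000 --> 00:00:04,000
        line = re.sub(r"</?[a-zA-Z][^>]*>", "", line)  # <i>, </i>, <font ...>
        line = re.sub(r"\{\\[^}]*\}", "", line)        # ASS-style override tags
        if line:
            lines.append(line)
    return "\n".join(lines)
```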
Use the bundled transcription script. It reads credentials from ~/.transcribe_video.env.
Verify the env file exists:
test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"
If MISSING, tell the user to create ~/.transcribe_video.env with:
OPENAI_API_KEY=your-key-here
# Optional Base URL:
# OPENAI_API_BASE=https://<base-url>/v1/
# Optional Model Name:
# TRANSCRIBE_MODEL=gpt-4o-transcribe
Wait for the user to confirm before proceeding.
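The bundled script reads this file with python-dotenv; the expected format and defaults can be sketched with the standard library alone (function names and the env_file parameter are illustrative, not the script's actual API):

```python
from pathlib import Path

def read_env_file(env_file) -> dict:
    """Parse simple KEY=VALUE lines; comments and blanks are ignored."""
    settings = {}
    for line in Path(env_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def transcribe_settings(env_file=None) -> dict:
    """Apply the documented defaults on top of ~/.transcribe_video.env."""
    env_file = Path(env_file) if env_file else Path.home() / ".transcribe_video.env"
    if not env_file.exists():
        raise FileNotFoundError(f"Create {env_file} with OPENAI_API_KEY=... first")
    env = read_env_file(env_file)
    if not env.get("OPENAI_API_KEY"):
        raise ValueError(f"OPENAI_API_KEY missing from {env_file}")
    return {
        "api_key": env["OPENAI_API_KEY"],
        "base_url": env.get("OPENAI_API_BASE"),                     # optional
        "model": env.get("TRANSCRIBE_MODEL", "gpt-4o-transcribe"),  # optional
    }
```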
Verify dependencies:
python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1
If missing: pip install openai python-dotenv
python3 <skill_directory>/scripts/transcribe.py "<video_path>"
The script extracts audio (WAV, 16kHz mono), sends it to the API, and saves the transcript to <video_name>.txt next to the video file.
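The bundled transcribe.py is not reproduced here; a minimal sketch of the pipeline it describes, assuming ffmpeg on PATH and the openai package (function names are illustrative):

```python
import subprocess
import tempfile
from pathlib import Path

def ffmpeg_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    """ffmpeg invocation producing the 16 kHz mono WAV sent to the API."""
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1",
            "-ar", "16000", wav_path, "-y"]

def transcribe_video(video_path: str, model: str = "gpt-4o-transcribe") -> Path:
    """Extract audio to a temp WAV, transcribe it, save <video_name>.txt."""
    from openai import OpenAI  # deferred so the sketch loads without the package
    video = Path(video_path)
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "audio.wav")
        subprocess.run(ffmpeg_audio_cmd(str(video), wav),
                       check=True, capture_output=True)
        client = OpenAI()  # key, base URL, and model come from the env settings
        with open(wav, "rb") as audio:
            result = client.audio.transcriptions.create(model=model, file=audio)
    out = video.with_suffix(".txt")
    out.write_text(result.text, encoding="utf-8")
    return out
```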
Tell the user the transcript has been saved to <video_name>.txt next to the video file.