Extract transcript or subtitles from a local video file. Use this skill whenever the user asks to transcribe a video, extract speech-to-text, get subtitles, or wants a text version of what's said in a video. Also trigger on "提取字幕" (extract subtitles), "视频转文字" (video to text), "语音转文字" (speech to text), "transcribe", "extract audio text", or when the user references getting a script/transcript from any video file (mp4, mkv, mov, avi, webm). This skill is for LOCAL video files only; for YouTube or other online URLs, use the download-video skill first to get the file, then transcribe it.
npx claudepluginhub feiskyer/video-skills --plugin video-skills

This skill uses the workspace's default tool permissions.
Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.
Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any format ffmpeg can handle.
ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"
If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.
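To decide among tracks programmatically, the ffprobe JSON output can be parsed into a simple track list. A minimal sketch, assuming ffprobe is on PATH (the function names are illustrative, not part of the skill):

```python
import json
import subprocess

def parse_subtitle_streams(ffprobe_json: str) -> list[dict]:
    """Flatten ffprobe's JSON output into one dict per subtitle stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        {
            "index": s.get("index"),
            "codec": s.get("codec_name"),
            "language": s.get("tags", {}).get("language", "und"),
            "title": s.get("tags", {}).get("title", ""),
        }
        for s in streams
    ]

def list_subtitle_tracks(video_path: str) -> list[dict]:
    """Run the ffprobe command from the step above and parse its output."""
    cmd = [
        "ffprobe", "-v", "quiet", "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json", video_path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_subtitle_streams(out)
```

The language tag (e.g. "eng", "chi") is what to match against the video's primary language; "und" means the track is untagged and the user should be asked.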
# Extract as SRT (stream index 0 for first subtitle track; adjust if needed)
ffmpeg -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt" -y
After extraction, convert SRT to clean text:
Remove cue numbers, timestamp lines (matching \d{2}:\d{2}:\d{2}), and formatting tags (<i>, </i>, etc.). Save the clean transcript to <video_name>.txt next to the video file. Done; skip Step 3b.
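The SRT-to-text cleanup can be sketched in Python (the function name is illustrative; dropping purely numeric lines is a heuristic for cue numbers and could in principle drop a subtitle that is only a number):

```python
import re

def srt_to_text(srt: str) -> str:
    """Strip cue numbers, timestamp lines, and formatting tags from SRT."""
    lines = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit():
            continue  # blank separators and cue numbers
        if re.match(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->", line):
            continue  # timestamps like 00:00:01,000 --> 00:00:04,000
        line = re.sub(r"</?[a-zA-Z][^>]*>", "", line)  # <i>, </i>, <font ...>
        line = re.sub(r"\{\\[^}]*\}", "", line)        # ASS-style override tags
        if line:
            lines.append(line)
    return "\n".join(lines)
```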
Use the bundled transcription script. It reads credentials from ~/.transcribe_video.env.
Verify the env file exists:
test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"
If MISSING, tell the user to create ~/.transcribe_video.env with:
OPENAI_API_KEY=your-key-here
# Optional Base URL:
# OPENAI_API_BASE=https://<base-url>/v1/
# Optional Model Name:
# TRANSCRIBE_MODEL=gpt-4o-transcribe
Wait for the user to confirm before proceeding.
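The bundled script reads this file with python-dotenv; the expected format and defaults can be sketched with the standard library alone (function names and the env_file parameter are illustrative, not the script's actual API):

```python
from pathlib import Path

def read_env_file(env_file) -> dict:
    """Parse simple KEY=VALUE lines; comments and blanks are ignored."""
    settings = {}
    for line in Path(env_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def transcribe_settings(env_file=None) -> dict:
    """Apply the documented defaults on top of ~/.transcribe_video.env."""
    env_file = Path(env_file) if env_file else Path.home() / ".transcribe_video.env"
    if not env_file.exists():
        raise FileNotFoundError(f"Create {env_file} with OPENAI_API_KEY=... first")
    env = read_env_file(env_file)
    if not env.get("OPENAI_API_KEY"):
        raise ValueError(f"OPENAI_API_KEY missing from {env_file}")
    return {
        "api_key": env["OPENAI_API_KEY"],
        "base_url": env.get("OPENAI_API_BASE"),                     # optional
        "model": env.get("TRANSCRIBE_MODEL", "gpt-4o-transcribe"),  # optional
    }
```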
Verify dependencies:
python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1
If missing: pip install openai python-dotenv
python3 <skill_directory>/scripts/transcribe.py "<video_path>"
The script extracts audio (WAV, 16kHz mono), sends it to the API, and saves the transcript to <video_name>.txt next to the video file.
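The bundled transcribe.py is not reproduced here; a minimal sketch of the pipeline it describes, assuming ffmpeg on PATH and the openai package (function names are illustrative):

```python
import subprocess
import tempfile
from pathlib import Path

def ffmpeg_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    """ffmpeg invocation producing the 16 kHz mono WAV sent to the API."""
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1",
            "-ar", "16000", wav_path, "-y"]

def transcribe_video(video_path: str, model: str = "gpt-4o-transcribe") -> Path:
    """Extract audio to a temp WAV, transcribe it, save <video_name>.txt."""
    from openai import OpenAI  # deferred so the sketch loads without the package
    video = Path(video_path)
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "audio.wav")
        subprocess.run(ffmpeg_audio_cmd(str(video), wav),
                       check=True, capture_output=True)
        client = OpenAI()  # key, base URL, and model come from the env settings
        with open(wav, "rb") as audio:
            result = client.audio.transcriptions.create(model=model, file=audio)
    out = video.with_suffix(".txt")
    out.write_text(result.text, encoding="utf-8")
    return out
```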
Tell the user the transcript has been saved to <video_name>.txt next to the video file.