Generates SRT/VTT subtitles and plain text transcripts from video or audio files using AWS Transcribe and ffmpeg. Useful for captions, extracting speech, notes, or searchable content.
Slash command: `/startup:transcribe-video`
Generate subtitles and transcripts from `$ARGUMENTS` (a video or audio file path, optionally followed by a language code like `en-US` or `es-ES`) using AWS Transcribe.
Outputs .srt, .vtt, and .txt files next to the source file.
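
Splitting `$ARGUMENTS` into a file path and an optional language code could be sketched as follows. The helper name `parse_args` and the variables `INPUT`, `LANG_CODE`, and `BASE` are hypothetical, and the pattern used to recognise a language code (two or three lowercase letters, a hyphen, two uppercase letters) is an assumption that matches the codes listed in this document, not an exhaustive rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "<path> [language-code]" into INPUT and LANG_CODE,
# and derive BASE (the path without its extension) for the output files.
parse_args() {
  local args="$1"
  local last="${args##* }"   # text after the final space

  # Assumption: language codes look like en-US or es-ES.
  if [[ "$args" == *" "* && "$last" =~ ^[a-z]{2,3}-[A-Z]{2}$ ]]; then
    INPUT="${args% *}"       # everything before the final space
    LANG_CODE="$last"
  else
    INPUT="$args"
    LANG_CODE="en-US"        # default when no language code is given
  fi

  # Trim trailing whitespace left over from repeated spaces.
  INPUT="${INPUT%"${INPUT##*[![:space:]]}"}"

  # Assumes the file name has an extension, e.g. .mp4 or .mov.
  BASE="${INPUT%.*}"
}

parse_args "/videos/demo clip.mp4 es-ES"
echo "$INPUT | $LANG_CODE | $BASE.srt"
# prints: /videos/demo clip.mp4 | es-ES | /videos/demo clip.srt
```

Matching only the last token keeps paths that contain spaces intact, and `$BASE.srt`, `$BASE.vtt`, and `$BASE.txt` then land next to the source file.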
Prerequisites: ffmpeg and the aws CLI must be installed and configured.
- ffmpeg installed (`brew install ffmpeg`)
- aws CLI installed and configured with valid credentials (`brew install awscli && aws configure`)
- Required permissions: `s3:*` (create/delete buckets), `transcribe:*` (start/delete jobs)

Extract the audio track to a temporary MP3 file:
ffmpeg -i "input.mp4" -vn -acodec mp3 -q:a 2 "/tmp/transcribe-audio.mp3" -y
# Create a temporary S3 bucket and upload the extracted audio
BUCKET="tmp-transcribe-$(date +%s)"
aws s3 mb "s3://$BUCKET" --region us-east-1
aws s3 cp "/tmp/transcribe-audio.mp3" "s3://$BUCKET/audio.mp3"
# Start the transcription job, requesting SRT and VTT subtitles
JOB_NAME="tmp-job-$(date +%s)"
aws transcribe start-transcription-job \
  --transcription-job-name "$JOB_NAME" \
  --language-code en-US \
  --media-format mp3 \
  --media "MediaFileUri=s3://$BUCKET/audio.mp3" \
  --subtitles "Formats=srt,vtt" \
  --output-bucket-name "$BUCKET" \
  --region us-east-1
Language codes: en-US, es-ES, fr-FR, de-DE, pt-BR, ja-JP, zh-CN, it-IT, ko-KR, etc. Default to en-US if not specified.
# Poll every 5 seconds until the job reaches a terminal state
while true; do
  STATUS=$(aws transcribe get-transcription-job \
    --transcription-job-name "$JOB_NAME" \
    --region us-east-1 \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text)
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then break; fi
  sleep 5
done
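
The polling loop stops on both `COMPLETED` and `FAILED` and has no upper time limit. A variant that distinguishes the outcomes and gives up after a deadline might look like the sketch below. The function names `get_status` and `wait_for_job`, the 600-second default, and the return codes are all hypothetical choices, not part of the original skill.

```bash
#!/usr/bin/env bash
# Prints the job status; wraps the same get-transcription-job call used above.
get_status() {
  aws transcribe get-transcription-job \
    --transcription-job-name "$1" \
    --region us-east-1 \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text
}

# Hypothetical helper: poll until the job finishes.
# Returns 0 on COMPLETED, 1 on FAILED (or if the status call itself fails),
# and 2 if max_wait seconds pass without a terminal state.
wait_for_job() {
  local job="$1" max_wait="${2:-600}" interval="${3:-5}"
  local waited=0 status
  while [ "$waited" -le "$max_wait" ]; do
    status=$(get_status "$job") || return 1
    case "$status" in
      COMPLETED) return 0 ;;
      FAILED)    return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 2
}
```

A caller could then run `wait_for_job "$JOB_NAME" || echo "transcription did not complete"` and skip the download steps on failure while still running the cleanup. When a job fails, the same `get-transcription-job` call with `--query 'TranscriptionJob.FailureReason'` should report why.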
Save .srt and .vtt next to the original file:
aws s3 cp "s3://$BUCKET/$JOB_NAME.srt" "/path/to/input.srt"
aws s3 cp "s3://$BUCKET/$JOB_NAME.vtt" "/path/to/input.vtt"
Download the JSON result and extract the full transcript text:
aws s3 cp "s3://$BUCKET/$JOB_NAME.json" "/tmp/transcribe-result.json"
Then use a tool to extract the .results.transcripts[0].transcript field from the JSON and save it as a .txt file next to the original.
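
One possible way to do that extraction is sketched below. It assumes either `jq` or `python3` is available on the machine (neither is listed in the prerequisites), and the function name `extract_transcript` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write .results.transcripts[0].transcript from the
# Transcribe result JSON ($1) to a plain text file ($2).
extract_transcript() {
  local json="$1" out="$2"
  if command -v jq >/dev/null 2>&1; then
    # -r prints the raw string without JSON quoting
    jq -r '.results.transcripts[0].transcript' "$json" > "$out"
  else
    # Fallback when jq is not installed: read the same field with python3
    python3 - "$json" > "$out" <<'PY'
import json
import sys

with open(sys.argv[1], encoding="utf-8") as f:
    print(json.load(f)["results"]["transcripts"][0]["transcript"])
PY
  fi
}
```

For example, `extract_transcript "/tmp/transcribe-result.json" "/path/to/input.txt"` would place the transcript next to the subtitle files.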
IMPORTANT: Always clean up to avoid recurring S3 storage costs.
# Delete S3 bucket and all contents
aws s3 rb "s3://$BUCKET" --force --region us-east-1
# Delete the transcription job
aws transcribe delete-transcription-job --transcription-job-name "$JOB_NAME" --region us-east-1
# Delete temp audio file
rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
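
If all the steps run inside a single bash script (an assumption; the skill may issue them as separate commands), an `EXIT` trap is one way to make sure the cleanup also happens when an earlier step fails. The function name `cleanup` is hypothetical, and `BUCKET` and `JOB_NAME` are expected to be set by the earlier steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove every temporary resource created by the run.
cleanup() {
  # Each step is best-effort so that one failure does not skip the rest.
  if [ -n "${BUCKET:-}" ]; then
    aws s3 rb "s3://$BUCKET" --force --region us-east-1 || true
  fi
  if [ -n "${JOB_NAME:-}" ]; then
    aws transcribe delete-transcription-job \
      --transcription-job-name "$JOB_NAME" --region us-east-1 || true
  fi
  rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
}

# Run cleanup whenever the script exits, whether it succeeded or not.
trap cleanup EXIT
```

Registering the trap before the bucket is created means that a failed upload or a failed job still leaves no bucket behind to accrue storage costs.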
From actual transcription runs:
| Video | Duration | Audio Size | Transcribe Time | Subtitle Segments |
|---|---|---|---|---|
| X/Twitter clip | 2:40 | 2.5 MB | ~20 seconds | 83 |
| Screen recording | 18:45 | 11.4 MB | ~60 seconds | 500+ |
original-video.mp4
original-video.srt # Subtitles with timestamps (most compatible)
original-video.vtt # Web-optimized subtitles (for HTML5 <track>)
original-video.txt # Plain text transcript (no timestamps)
ls -lh /path/to/original-video.{srt,vtt,txt}