Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) Burning subtitles (hardcoding SRT/ASS/VTT), (2) Adding soft subtitle tracks, (3) Extracting subtitles from video, (4) Subtitle format conversion, (5) Styled captions (font, color, outline, shadow), (6) Subtitle positioning and alignment, (7) CEA-608/708 closed captions, (8) Text overlays with drawtext, (9) Whisper AI automatic transcription (FFmpeg 8.0+ with VAD, multi-language, GPU), (10) Batch subtitle processing. Provides: Format reference tables, styling parameter guide, position alignment charts, Whisper model comparison, VAD configuration, dynamic text examples, accessibility best practices. Ensures: Professional captions with proper styling and accessibility compliance.
/plugin marketplace add JosiahSiegel/claude-plugin-marketplace
/plugin install ffmpeg-master@claude-plugin-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
NEVER create new documentation files unless explicitly requested by the user.
| Task | Command |
|---|---|
| Burn SRT | ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4 |
| Burn ASS | ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4 |
| Add soft sub | ffmpeg -i video.mp4 -i subs.srt -c copy -c:s mov_text output.mp4 |
| Extract sub | ffmpeg -i video.mkv -map 0:s:0 output.srt |
| Style subs | -vf "subtitles=s.srt:force_style='FontSize=24,PrimaryColour=&HFFFFFF'" |
| Text overlay | -vf "drawtext=text='Hello':x=10:y=10:fontsize=24:fontcolor=white" |
| Format | Extension | Best For |
|---|---|---|
| SRT | .srt | Simple, universal |
| ASS | .ass | Styled, animated, anime |
| VTT | .vtt | Web/HTML5 video |
Use this skill for subtitle and caption operations. It is a complete guide to working with subtitles, closed captions, and text overlays using FFmpeg.
| Format | Extension | Features | Use Case |
|---|---|---|---|
| SubRip | .srt | Simple timing + text | Universal, web |
| Advanced SubStation Alpha | .ass/.ssa | Rich styling, positioning, effects | Anime, styled subs |
| WebVTT | .vtt | Web standard, cues, styling | HTML5 video |
| TTML/DFXP | .ttml/.dfxp | Broadcast, accessibility | Streaming services |
| MOV Text | .mov (embedded) | QuickTime native | Apple ecosystem |
| DVB Subtitle | (embedded) | Bitmap-based | European broadcast |
| PGS | .sup | Blu-ray bitmap subtitles | Blu-ray |
| CEA-608/708 | (embedded) | Closed captions | US broadcast, streaming |
SRT (SubRip):
- Simple text-based format
- Supports basic HTML tags (<b>, <i>, <u>)
- Widely compatible
- No positioning or advanced styling
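A minimal SRT file illustrates the format: a numeric cue index, a `start --> end` timing line (comma before milliseconds), then one or more text lines, with cues separated by a blank line:

```
1
00:00:01,000 --> 00:00:04,000
Hello, <i>world</i>!

2
00:00:05,500 --> 00:00:08,000
A second cue, wrapped
across two lines.
```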
ASS/SSA:
- Advanced styling (fonts, colors, outlines)
- Precise positioning anywhere on screen
- Animation and effects support
- Karaoke timing
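For reference, a stripped-down ASS file. Styling lives in the `Style:` line (these are the fields that `force_style` overrides by name), and inline override tags like `{\i1}` apply per-event:

```
[Script Info]
ScriptType: v4.00+
PlayResX: 1280
PlayResY: 720

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,24,&H00FFFFFF,&H0000FFFF,&H00000000,&H80000000,0,0,0,0,100,100,0,0,1,2,1,2,10,10,30,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0,0,0,,Hello {\i1}world{\i0}!
```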
WebVTT:
- HTML5 standard format
- CSS-like styling
- Cue settings for positioning
- Speaker identification
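A minimal WebVTT file showing the features above: the mandatory `WEBVTT` header, dot-separated milliseconds, cue settings (`line:`, `position:`, `align:`) on the timing line, and a `<v>` voice tag for speaker identification:

```
WEBVTT

1
00:00:01.000 --> 00:00:04.000 line:85% align:center
<v Narrator>Hello, world!

2
00:00:05.500 --> 00:00:08.000 position:10% align:start
A left-positioned cue.
```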
# Burn SRT subtitles
ffmpeg -i video.mp4 -vf "subtitles=subs.srt" output.mp4
# Burn ASS/SSA subtitles (preserves styling)
ffmpeg -i video.mp4 -vf "ass=subs.ass" output.mp4
# Burn subtitles from MKV container
ffmpeg -i video.mkv -vf "subtitles=video.mkv" output.mp4
# Burn specific subtitle track (index 0)
ffmpeg -i video.mkv -vf "subtitles=video.mkv:si=0" output.mp4
# Same, using the long option name (si is an alias for stream_index)
ffmpeg -i video.mkv -vf "subtitles=video.mkv:stream_index=1" output.mp4
# Force style (overrides subtitle styling)
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1'" \
output.mp4
# Yellow subtitles with black outline
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:force_style='FontSize=28,PrimaryColour=&H00FFFF,OutlineColour=&H000000,Outline=3'" \
output.mp4
# Larger font for accessibility
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:force_style='FontSize=36,Bold=1'" \
output.mp4
| Parameter | Description | Example |
|---|---|---|
| FontName | Font family | FontName=Arial |
| FontSize | Size in points | FontSize=24 |
| PrimaryColour | Text color (AABBGGRR) | PrimaryColour=&HFFFFFF |
| SecondaryColour | Karaoke color | SecondaryColour=&H00FFFF |
| OutlineColour | Outline/border color | OutlineColour=&H000000 |
| BackColour | Shadow/background color | BackColour=&H80000000 |
| Bold | Bold text (0/1) | Bold=1 |
| Italic | Italic text (0/1) | Italic=1 |
| Underline | Underlined text (0/1) | Underline=1 |
| Outline | Outline width | Outline=2 |
| Shadow | Shadow depth | Shadow=1 |
| Alignment | Position (numpad style) | Alignment=2 |
| MarginL/R/V | Margins in pixels | MarginV=50 |
ASS uses &HAABBGGRR format (Alpha, Blue, Green, Red):
- White: &HFFFFFF or &H00FFFFFF
- Black: &H000000 or &H00000000
- Yellow: &H00FFFF (00-Blue, FF-Green, FF-Red)
- Red: &H0000FF
- Blue: &HFF0000
- 50% transparent black: &H80000000
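Getting the byte order right is the usual stumbling block. A small helper (hypothetical name `rgb_to_ass`) that flips a standard RRGGBB hex value into ASS's &HAABBGGRR order:

```shell
# rgb_to_ass RRGGBB [AA] -> &HAABBGGRR  (alpha defaults to 00 = opaque)
rgb_to_ass() {
  local rgb="$1" alpha="${2:-00}"
  # Reverse the byte order: red and blue swap, alpha leads
  printf '&H%s%s%s%s\n' "$alpha" "${rgb:4:2}" "${rgb:2:2}" "${rgb:0:2}"
}

rgb_to_ass FFFF00      # yellow -> &H0000FFFF
rgb_to_ass 000000 80   # 50% transparent black -> &H80000000
```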
# Add SRT to MP4 (MOV text)
ffmpeg -i video.mp4 -i subs.srt \
-c copy -c:s mov_text \
output.mp4
# Add SRT to MKV
ffmpeg -i video.mp4 -i subs.srt \
-c copy -c:s srt \
output.mkv
# Add SRT to WebM (WebVTT)
ffmpeg -i video.webm -i subs.srt \
-c copy -c:s webvtt \
output.webm
# Add multiple languages
ffmpeg -i video.mp4 -i subs_en.srt -i subs_es.srt -i subs_fr.srt \
-map 0:v -map 0:a -map 1 -map 2 -map 3 \
-c copy -c:s mov_text \
-metadata:s:s:0 language=eng -metadata:s:s:0 title="English" \
-metadata:s:s:1 language=spa -metadata:s:s:1 title="Spanish" \
-metadata:s:s:2 language=fra -metadata:s:s:2 title="French" \
output.mp4
# Add ASS subtitles to MKV (preserves styling)
ffmpeg -i video.mp4 -i styled.ass \
-map 0 -map 1 \
-c copy -c:s ass \
output.mkv
# Set subtitle as default
ffmpeg -i video.mp4 -i subs.srt \
-c copy -c:s mov_text \
-disposition:s:0 default \
output.mp4
# Set forced subtitles (always display)
ffmpeg -i video.mp4 -i forced.srt \
-c copy -c:s mov_text \
-disposition:s:0 forced \
output.mp4
# Extract first subtitle track to SRT
ffmpeg -i video.mkv -map 0:s:0 output.srt
# Extract specific subtitle stream
ffmpeg -i video.mkv -map 0:s:1 output.srt
# Extract to ASS format
ffmpeg -i video.mkv -map 0:s:0 output.ass
# Extract to WebVTT
ffmpeg -i video.mkv -map 0:s:0 output.vtt
# Extract multiple subtitle tracks (one output file per track; %d patterns do not work for text subtitle outputs)
ffmpeg -i video.mkv -map 0:s:0 sub_0.srt -map 0:s:1 sub_1.srt
# Show all streams including subtitles
ffprobe -v error -show_entries stream=index,codec_name,codec_type:stream_tags=language,title \
-of csv=p=0 video.mkv
# Show only subtitle streams
ffprobe -v error -select_streams s \
-show_entries stream=index,codec_name:stream_tags=language,title \
-of csv=p=0 video.mkv
# SRT to ASS
ffmpeg -i subs.srt subs.ass
# ASS to SRT (loses styling)
ffmpeg -i subs.ass subs.srt
# SRT to WebVTT
ffmpeg -i subs.srt subs.vtt
# WebVTT to SRT
ffmpeg -i subs.vtt subs.srt
# Simple text overlay
ffmpeg -i video.mp4 \
-vf "drawtext=text='Hello World':x=10:y=10:fontsize=24:fontcolor=white" \
output.mp4
# Centered text
ffmpeg -i video.mp4 \
-vf "drawtext=text='Centered Text':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white" \
output.mp4
# Text at bottom center
ffmpeg -i video.mp4 \
-vf "drawtext=text='Bottom Text':x=(w-tw)/2:y=h-th-20:fontsize=32:fontcolor=white" \
output.mp4
# Text with background box
ffmpeg -i video.mp4 \
-vf "drawtext=text='With Background':x=10:y=10:fontsize=24:fontcolor=white:box=1:boxcolor=black@0.5:boxborderw=5" \
output.mp4
# Text with outline (border)
ffmpeg -i video.mp4 \
-vf "drawtext=text='Outlined Text':x=10:y=10:fontsize=48:fontcolor=white:borderw=3:bordercolor=black" \
output.mp4
# Shadow effect
ffmpeg -i video.mp4 \
-vf "drawtext=text='Shadow Text':x=10:y=10:fontsize=36:fontcolor=white:shadowcolor=black:shadowx=3:shadowy=3" \
output.mp4
# Custom font
ffmpeg -i video.mp4 \
-vf "drawtext=text='Custom Font':fontfile=/path/to/font.ttf:fontsize=24:fontcolor=white" \
output.mp4
# Display timestamp
ffmpeg -i video.mp4 \
-vf "drawtext=text='%{pts\:hms}':x=10:y=10:fontsize=24:fontcolor=white" \
output.mp4
# Display frame number
ffmpeg -i video.mp4 \
-vf "drawtext=text='Frame\: %{n}':x=10:y=10:fontsize=24:fontcolor=white" \
output.mp4
# Display current date/time
ffmpeg -i video.mp4 \
-vf "drawtext=text='%{localtime\:%Y-%m-%d %H\\\:%M\\\:%S}':x=10:y=10:fontsize=24:fontcolor=white" \
output.mp4
# Text from file
ffmpeg -i video.mp4 \
-vf "drawtext=textfile=message.txt:x=10:y=10:fontsize=24:fontcolor=white:reload=1" \
output.mp4
# Scrolling text (credits)
ffmpeg -i video.mp4 \
-vf "drawtext=textfile=credits.txt:x=(w-tw)/2:y=h-t*50:fontsize=24:fontcolor=white" \
output.mp4
# Fade in text
ffmpeg -i video.mp4 \
-vf "drawtext=text='Fade In':x=(w-tw)/2:y=(h-th)/2:fontsize=48:fontcolor=white:alpha='if(lt(t,1),t,1)'" \
output.mp4
# Text appears at specific time (3 seconds)
ffmpeg -i video.mp4 \
-vf "drawtext=text='Appears at 3s':x=10:y=10:fontsize=24:fontcolor=white:enable='gte(t,3)'" \
output.mp4
# Text visible between 2-5 seconds
ffmpeg -i video.mp4 \
-vf "drawtext=text='Visible 2-5s':x=10:y=10:fontsize=24:fontcolor=white:enable='between(t,2,5)'" \
output.mp4
# Blinking text
ffmpeg -i video.mp4 \
-vf "drawtext=text='BLINK':x=10:y=10:fontsize=36:fontcolor=red:alpha='if(mod(floor(t*2),2),1,0)'" \
output.mp4
FFmpeg 8.0 integrates OpenAI's Whisper speech recognition model via whisper.cpp, enabling fully local and offline automatic transcription and subtitle generation.
# Generate subtitles from video using Whisper
ffmpeg -i video.mp4 -vn \
-af "whisper=model=ggml-base.bin:language=auto:queue=3:destination=output.srt:format=srt" \
-f null -
# Specify language for better accuracy
ffmpeg -i video.mp4 -vn \
-af "whisper=model=ggml-base.bin:language=en:destination=english.srt:format=srt" \
-f null -
# Use larger model for higher quality
ffmpeg -i video.mp4 -vn \
-af "whisper=model=ggml-medium.bin:language=auto:destination=output.srt:format=srt" \
-f null -
# Live transcription from microphone (Linux PulseAudio)
ffmpeg -loglevel warning -f pulse -i default \
-af "highpass=f=200,lowpass=f=3000,whisper=model=ggml-medium-q5_0.bin:language=en:queue=10:destination=-:format=json:vad_model=for-tests-silero-v5.1.2-ggml.bin" \
-f null -
# Live transcription from microphone (Windows)
ffmpeg -loglevel warning -f dshow -i audio="Microphone" \
-af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
-f null -
# Live transcription from microphone (macOS)
ffmpeg -loglevel warning -f avfoundation -i ":0" \
-af "whisper=model=ggml-base.bin:language=en:queue=10:destination=-:format=text" \
-f null -
# Whisper writes to frame metadata, drawtext reads it
ffmpeg -i input.mp4 \
-af "whisper=model=ggml-base.en.bin:language=en" \
-vf "drawtext=text='%{metadata\:lavfi.whisper.text}':fontsize=24:fontcolor=white:x=10:y=h-th-10:box=1:boxcolor=black@0.5:boxborderw=5" \
output_with_subtitles.mp4
# Centered subtitles with styling
ffmpeg -i input.mp4 \
-af "whisper=model=ggml-base.bin:language=auto" \
-vf "drawtext=text='%{metadata\:lavfi.whisper.text}':fontsize=32:fontcolor=white:x=(w-tw)/2:y=h-th-50:borderw=2:bordercolor=black" \
output.mp4
# Generate JSON for custom processing
ffmpeg -i video.mp4 -vn \
-af "whisper=model=ggml-base.bin:language=auto:destination=output.json:format=json" \
-f null -
# Use Silero VAD for better speech detection
ffmpeg -i video.mp4 -vn \
-af "whisper=model=ggml-base.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=for-tests-silero-v5.1.2-ggml.bin:vad_threshold=0.5" \
-f null -
# VAD parameters for noisy audio
ffmpeg -i noisy_video.mp4 -vn \
-af "whisper=model=ggml-medium.bin:language=en:queue=20:destination=output.srt:format=srt:vad_model=silero.bin:vad_threshold=0.6:vad_min_speech_duration=0.2:vad_min_silence_duration=0.3" \
-f null -
# Force CPU processing (slower but works without GPU)
ffmpeg -i video.mp4 -vn \
-af "whisper=model=ggml-base.bin:language=en:use_gpu=false:destination=output.srt:format=srt" \
-f null -
| Model | Size | Speed | Quality | VRAM | Recommended For |
|---|---|---|---|---|---|
| tiny | 39 MB | Fastest | Basic | ~1 GB | Quick previews, low-resource |
| base | 74 MB | Fast | Good | ~1 GB | General use, balanced |
| small | 244 MB | Medium | Better | ~2 GB | Higher accuracy |
| medium | 769 MB | Slow | High | ~5 GB | Quality-critical |
| large | 1.55 GB | Slowest | Best | ~10 GB | Maximum accuracy |
| Parameter | Description | Default |
|---|---|---|
| model | Path to GGML model file | Required |
| language | Language code or "auto" | "auto" |
| format | Output format: "text", "srt", "json" | "text" |
| destination | Output file path or "-" for stdout | Required |
| queue | Buffer size in seconds | 3 |
| vad_model | Path to Silero VAD model | None |
| vad_threshold | VAD sensitivity (0-1) | 0.5 |
| vad_min_speech_duration | Min speech duration (seconds) | 0.1 |
| vad_min_silence_duration | Min silence duration (seconds) | 0.5 |
| use_gpu | Enable GPU acceleration | true |
Download GGML models from the whisper.cpp project:
# Download base model
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
# Download medium model for better quality
curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
# Extract CEA-608 captions from ATSC stream
ffmpeg -f lavfi -i "movie=broadcast.ts[out0+subcc]" -map 0:1 captions.srt
# Check for embedded CC (FFmpeg logs "Closed captions" on video streams carrying A53 side data)
ffmpeg -i video_with_cc.mp4 2>&1 | grep -i "closed captions"
# Convert SCC captions to SRT (FFmpeg ships an scc demuxer and a cc_dec decoder;
# injecting CEA-608 back into the video bitstream is not supported by the ffmpeg CLI
# and requires a dedicated captioning tool)
ffmpeg -i captions.scc captions.srt
# Or mux the decoded captions as a soft subtitle track
ffmpeg -i video.mp4 -i captions.scc -c:v copy -c:a copy -c:s mov_text output.mp4
7 (top-left) 8 (top-center) 9 (top-right)
4 (mid-left) 5 (mid-center) 6 (mid-right)
1 (bottom-left) 2 (bottom-center) 3 (bottom-right)
# Top center subtitles
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:force_style='Alignment=8,MarginV=20'" \
output.mp4
# Left-aligned subtitles
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:force_style='Alignment=1,MarginL=50,MarginV=30'" \
output.mp4
# Right side for speaker identification
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:force_style='Alignment=3,MarginR=50'" \
output.mp4
# Two subtitle tracks at different positions
ffmpeg -i video.mp4 \
-vf "[in]subtitles=speaker1.srt:force_style='Alignment=1,MarginL=50'[tmp];\
[tmp]subtitles=speaker2.srt:force_style='Alignment=3,MarginR=50'" \
output.mp4
#!/bin/bash
# burn_subs_batch.sh
for video in *.mp4; do
base="${video%.mp4}"
if [ -f "${base}.srt" ]; then
ffmpeg -i "$video" \
-vf "subtitles=${base}.srt:force_style='FontSize=24,Outline=2'" \
-c:a copy \
"output/${base}_subbed.mp4"
fi
done
#!/bin/bash
# convert_subs.sh
for srt in *.srt; do
base="${srt%.srt}"
ffmpeg -i "$srt" "${base}.vtt"
done
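The Whisper filter also lends itself to batch work. A sketch that transcribes every MP4 lacking an SRT, assuming FFmpeg 8.0+ built with whisper support and `ggml-base.bin` downloaded to the current directory (see the model download commands above):

```shell
#!/bin/bash
# transcribe_batch.sh - generate an SRT for every MP4 that lacks one
# Assumes ffmpeg 8.0+ with the whisper filter and ggml-base.bin in this directory.
model="ggml-base.bin"
for video in *.mp4; do
  [ -e "$video" ] || continue          # no matches: skip the literal glob
  base="${video%.mp4}"
  [ -f "${base}.srt" ] && continue     # keep existing subtitles
  ffmpeg -nostdin -i "$video" -vn \
    -af "whisper=model=${model}:language=auto:destination=${base}.srt:format=srt" \
    -f null - || echo "transcription failed for $video" >&2
done
```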
"Unable to find a suitable output format"
# Specify output format explicitly
ffmpeg -i video.mkv -map 0:s:0 -f srt output.srt
Garbled characters in subtitles
# Force UTF-8 encoding
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:charenc=UTF-8" \
output.mp4
Font not found
# Specify fonts directory
ffmpeg -i video.mp4 \
-vf "subtitles=subs.srt:fontsdir=/path/to/fonts" \
output.mp4
# List available fonts
fc-list : family | sort | uniq
Subtitle timing offset
# Shift subtitle timestamps by 2 seconds, then burn the shifted file
# (-itsoffset must precede the input it applies to; the subtitles filter has no offset option)
ffmpeg -itsoffset 2 -i subs.srt delayed.srt
ffmpeg -i video.mp4 \
-vf "subtitles=delayed.srt" \
output.mp4
Subtitles render too small on high-resolution video
# Increase the forced font size (libass sizes fonts against the subtitle script's play resolution, which may be far below a 4K frame)
ffmpeg -i video_4k.mp4 \
-vf "subtitles=subs.srt:force_style='FontSize=48,Outline=3'" \
output.mp4
# Check if subtitles are present
ffprobe -v error -select_streams s -show_entries stream=codec_name -of default=nw=1 video.mp4
# Count subtitle cues (each cue has exactly one " --> " timing line)
grep -c " --> " subs.srt
# Validate SRT format
ffmpeg -i subs.srt -f null -
This guide covers FFmpeg subtitle and caption operations. For text overlays with shapes and graphics, see the shapes-graphics skill.