From claude-commands
Analyzes video files by extracting frames, describing segments, and burning captions as PNG overlays using PIL and ffmpeg. Useful for adding visual descriptions or evidence annotations.
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands
How this skill is triggered (by the user, by Claude, or both):
Slash command
/claude-commands:video-caption
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Analyze a video by extracting frames at intervals, describe each segment visually, then burn captions at the top of the video using PIL-rendered PNG overlays and ffmpeg's `overlay` filter. This approach avoids dependency on ffmpeg's `libass`/`drawtext` filters, which are often absent in Homebrew builds.
When the video is evidence for a PR:
[...] .mp4, H.264)
[...] .vtt or .srt) for gist packaging
ffprobe -v quiet -print_format json -show_format -show_streams "video.mov" | python3 -c "
import json, sys
d = json.load(sys.stdin)
fmt = d['format']
vs = [s for s in d['streams'] if s['codec_type'] == 'video'][0]
has_audio = any(s['codec_type'] == 'audio' for s in d['streams'])
print(f'Duration: {float(fmt[\"duration\"]):.1f}s, {vs[\"width\"]}x{vs[\"height\"]} @ {vs[\"r_frame_rate\"]}fps, audio={has_audio}')
"
Note: track whether audio exists — it affects the final ffmpeg command.
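The inspection step can also be written as a small Python function, so that the audio flag and dimensions are available to later steps as values rather than printed text. This is a minimal sketch: `summarize_probe` and the sample dictionary are hypothetical, and the input is assumed to be the parsed JSON that `ffprobe -print_format json -show_format -show_streams` emits.

```python
import json


def summarize_probe(probe):
    """Reduce ffprobe's parsed JSON to the fields the later steps need."""
    video = next(s for s in probe["streams"] if s["codec_type"] == "video")
    has_audio = any(s["codec_type"] == "audio" for s in probe["streams"])
    # r_frame_rate is a fraction string such as "30000/1001"; convert it to a float.
    num, _, den = video["r_frame_rate"].partition("/")
    den = int(den) if den else 1
    fps = int(num) / den if den else 0.0
    return {
        "duration": float(probe["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        "fps": fps,
        "has_audio": has_audio,
    }


# Hypothetical sample shaped like ffprobe output for a silent screen recording.
sample = json.loads(
    '{"format": {"duration": "95.400000"},'
    ' "streams": [{"codec_type": "video", "width": 3354, "height": 2160,'
    ' "r_frame_rate": "60/1"}]}'
)
info = summarize_probe(sample)
print(info)
```

Keeping `info["has_audio"]` and `info["width"]` around avoids hard-coding them in the caption-rendering and burn-in steps.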
mkdir -p /tmp/video_frames
ffmpeg -i "video.mov" -vf "fps=1/10,scale=1280:-1" -q:v 3 /tmp/video_frames/frame_%03d.jpg -y
One frame per 10 seconds is sufficient. Adjust fps=1/N for longer videos.
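One way to pick N for longer videos is to cap the number of frames that need to be read. The sketch below assumes a budget of 24 frames and a floor of 10 seconds; both numbers, and the helper names `frame_interval` and `extract_command`, are illustrative rather than part of the skill.

```python
import math


def frame_interval(duration_s, max_frames=24, min_interval_s=10):
    """Seconds between sampled frames so that roughly max_frames are extracted."""
    return max(min_interval_s, math.ceil(duration_s / max_frames))


def extract_command(src, out_dir, interval_s):
    """Build the frame-extraction command with the chosen interval."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"fps=1/{interval_s},scale=1280:-1",
        "-q:v", "3",
        f"{out_dir}/frame_%03d.jpg",
    ]


print(frame_interval(95))    # short clip keeps the 10 s default
print(frame_interval(600))   # 10-minute clip is sampled more sparsely
print(" ".join(extract_command("video.mov", "/tmp/video_frames", frame_interval(600))))
```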
Use the Read tool on each frame image — Claude can view them directly. Read 4 at a time in parallel. Build a timeline:
0-10s: What's happening in the UI/screen/scene
10-20s: What changed, what action is occurring
...
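The timeline windows can be generated from the duration and the sampling interval, then paired with the descriptions written while reading the frames. This is a sketch under the assumption that `descriptions[i]` is the text written for window `i`; `segment_windows` and the sample descriptions are hypothetical.

```python
def segment_windows(duration_s, interval_s):
    """Split [0, duration_s) into consecutive windows of interval_s seconds."""
    windows = []
    start = 0
    while start < duration_s:
        end = min(start + interval_s, duration_s)
        windows.append((start, end))
        start += interval_s
    return windows


windows = segment_windows(25, 10)

# Hypothetical descriptions, one per window, written after viewing the frames.
descriptions = [
    "Settings page is open",
    "User toggles dark mode",
    "Theme change is applied",
]
captions = [(s, e, text) for (s, e), text in zip(windows, descriptions)]
print(captions)
```

The resulting `captions` list has the `(start_sec, end_sec, text)` shape used by the rendering step.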
from PIL import Image, ImageDraw, ImageFont
import textwrap, os

VIDEO_W = 3354   # match your video width
FONT_SIZE = 62
PADDING = 40
FONT = '/System/Library/Fonts/Helvetica.ttc'  # macOS

# captions: list of (start_sec, end_sec, text)
captions = [
    (0, 10, "Description of what's happening..."),
    # ... one tuple per segment of the timeline
]

font = ImageFont.truetype(FONT, FONT_SIZE)
os.makedirs('/tmp/caps', exist_ok=True)

# Estimate characters per line from an average glyph width of 0.55 * font size
avg_char_w = FONT_SIZE * 0.55
max_chars = int((VIDEO_W - 2 * PADDING) / avg_char_w)
line_h = FONT_SIZE + 10

for i, (start, end, text) in enumerate(captions):
    lines = textwrap.wrap(text, width=max_chars)
    cap_h = line_h * len(lines) + 2 * PADDING
    img = Image.new('RGBA', (VIDEO_W, cap_h), (0, 0, 0, 170))  # semi-transparent black
    draw = ImageDraw.Draw(img)
    y = PADDING
    for line in lines:
        draw.text((PADDING, y), line, font=font, fill=(255, 255, 255, 255))
        y += line_h
    img.save(f'/tmp/caps/cap_{i:02d}_{start}_{end}.png')
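The 0.55 average-glyph-width factor is an estimate, so lines with many wide characters may still overflow. A possible alternative, not what the skill itself does, is to wrap by measured pixel width. The sketch below takes the measuring function as a parameter so it runs without a font; `wrap_by_width` is a hypothetical helper.

```python
def wrap_by_width(text, max_px, measure):
    """Greedy word wrap: keep adding words while the measured line fits max_px."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and measure(candidate) > max_px:
            # The candidate is too wide: close the current line, start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Stand-in measure for demonstration only: pretend every character is 10 px wide.
demo = wrap_by_width("Clicking the Save button opens a dialog", 200, lambda s: 10 * len(s))
print(demo)
```

With Pillow 8.0 or later, passing `measure=font.getlength` and `max_px=VIDEO_W - 2 * PADDING` should give wrapping based on the actual font metrics.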
import subprocess

# Reuses `captions` from the rendering step.
has_audio = True  # assumption: replace with the value reported by the ffprobe step

# Build inputs and filtergraph
cmd = ['ffmpeg', '-y', '-i', 'video.mov']
for i, (start, end, _) in enumerate(captions):
    cmd += ['-i', f'/tmp/caps/cap_{i:02d}_{start}_{end}.png']

n = len(captions)
prev = '0:v'
parts = []
for i, (start, end, _) in enumerate(captions):
    # Intermediate outputs are v1, v2, ...; the last overlay writes to vout
    out = f'v{i+1}' if i < n - 1 else 'vout'
    parts.append(f"[{prev}][{i+1}:v]overlay=0:0:enable='between(t,{start},{end})'[{out}]")
    prev = out
cmd += ['-filter_complex', ';'.join(parts), '-map', '[vout]']

# Only add audio map if audio stream exists
if has_audio:
    cmd += ['-map', '0:a', '-c:a', 'copy']

cmd += ['-c:v', 'libx264', '-crf', '18', '-preset', 'fast', 'output captioned.mov']
subprocess.run(cmd, check=True, timeout=600)
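The PR-evidence notes mention a `.vtt` or `.srt` file for gist packaging. The same `captions` list could be serialized to WebVTT as a sidecar. This is a sketch, not the skill's own packaging step; `vtt_timestamp` and `captions_to_vtt` are hypothetical names.

```python
def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def captions_to_vtt(captions):
    """Serialize (start_sec, end_sec, text) tuples as a WebVTT document."""
    blocks = ["WEBVTT"]
    for start, end, text in captions:
        blocks.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}")
    # Cues are separated by blank lines; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


vtt = captions_to_vtt([(0, 10, "Settings page is open"), (10, 20, "User toggles dark mode")])
print(vtt)
```

Writing `vtt` to a file next to the captioned video keeps the caption text searchable and diffable alongside the burned-in version.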
| Step | Tool | Key detail |
|---|---|---|
| Inspect | ffprobe | Get duration, resolution, audio presence |
| Extract frames | ffmpeg fps=1/10 | 1 frame/10s is enough |
| Analyze | Read tool | Read 4 frames in parallel |
| Render captions | PIL Image.new('RGBA') | Semi-transparent: alpha=170 |
| Burn in | ffmpeg overlay + enable='between(t,s,e)' | Chain all overlays in one filtergraph |
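To make the chained filtergraph concrete, the sketch below builds the `-filter_complex` string for a list of time windows and prints it. It deliberately swaps `between(t,s,e)`, which is inclusive at both ends, for `gte(t,s)*lt(t,e)` so that back-to-back captions are not both enabled at the shared boundary. That swap and the function name `overlay_chain` are illustrative choices, not the skill's own.

```python
def overlay_chain(windows):
    """Build a -filter_complex string: caption inputs 1..N layered over video input 0."""
    parts, prev = [], "0:v"
    for i, (start, end) in enumerate(windows, start=1):
        # The last overlay writes to [vout]; earlier ones feed the next link.
        out = "vout" if i == len(windows) else f"v{i}"
        parts.append(
            f"[{prev}][{i}:v]overlay=0:0:enable='gte(t,{start})*lt(t,{end})'[{out}]"
        )
        prev = out
    return ";".join(parts)


graph = overlay_chain([(0, 10), (10, 20)])
print(graph)
```

Each link takes the previous link's output as its base, so a single ffmpeg invocation applies every caption in one encode.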
-map 0:a without audio → ffmpeg errors. Check streams first.
subtitles= or ass= filter → Often absent in Homebrew ffmpeg (no libass). Use PIL+overlay instead.
drawtext filter → Also absent if ffmpeg lacks freetype. Same fallback.
[...] textwrap.wrap with width = (video_w - 2*padding) / (font_size * 0.55).
/System/Library/Fonts/Helvetica.ttc
/System/Library/Fonts/SFNS.ttf
/System/Library/Fonts/SFNSMono.ttf
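Since these font paths are macOS-specific, a script can try them in order and fall back when none exists. The sketch below is a minimal version; `first_existing` is a hypothetical helper, and the DejaVu path is an assumed location on Debian/Ubuntu systems rather than something the skill lists.

```python
import os

FONT_CANDIDATES = [
    "/System/Library/Fonts/Helvetica.ttc",
    "/System/Library/Fonts/SFNS.ttf",
    "/System/Library/Fonts/SFNSMono.ttf",
    # Assumption: typical DejaVu location on Debian/Ubuntu, not from the skill.
    "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
]


def first_existing(paths):
    """Return the first path that exists on disk, or None if none do."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


font_path = first_existing(FONT_CANDIDATES)
print(font_path or "no candidate font found; pass an explicit path")
```

The result can be passed to `ImageFont.truetype` in place of the hard-coded `FONT` constant.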