Help us improve
Share bugs, ideas, or general feedback.
From peepshow
Extracts key frames from videos and animated images (GIF, APNG, WebP) into a viewable timeline using peepshow (ffmpeg). Also reads audio transcripts and metadata.
npx claudepluginhub t0mtaylor/peepshow --plugin peepshowHow this skill is triggered — by the user, by Claude, or both
Slash command
/peepshow:slidesThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Static images (JPG/PNG/static WebP) can be read natively. **This skill is only for video and animated images** — anything that has multiple frames across time. It uses `peepshow` (ffmpeg under the hood) to extract a sequence of relevant still frames so they can be viewed as a timeline.
Analyzes video files or YouTube URLs: extracts frames/audio, detects scenes/motion/silence/transitions via ffmpeg tools with structured workflow.
Downloads videos from YouTube, Instagram, X/Twitter, Vimeo, TikTok, or local paths, extracts frames and transcripts (via captions or on-device mlx-whisper), and lets Claude answer questions about the video content.
Decomposes videos into meaningful keyframes using ffmpeg scene detection filter. Extracts images on scene changes (threshold 0.01), timestamps from logs, supports MP4/MOV/WEBM/AVI/MKV. Adjusts sensitivity; warns on low frame counts indicating static videos.
Share bugs, ideas, or general feedback.
Static images (JPG/PNG/static WebP) can be read natively. This skill is only for video and animated images — anything that has multiple frames across time. It uses peepshow (ffmpeg under the hood) to extract a sequence of relevant still frames so they can be viewed as a timeline.
The user may provide the video (or animated image — GIF, APNG, animated WebP) in any of these forms:
/Volumes/share/...)http:// or https:// URLdata:video/...;base64,... or data:image/(gif|apng|webp|png);base64,... URI pasted into the prompt- to read bytes from stdin (you would pipe via Bash)If $ARGUMENTS contains the video reference, use it directly. Otherwise ask the user to share one of the above.
Run the CLI with the Bash tool. JSON is the most reliable output for parsing:
PEEPSHOW_CLIENT=claude-code PEEPSHOW_SESSION="${CLAUDE_SESSION_ID:-}" peepshow "$ARGUMENTS" --emit json
The PEEPSHOW_CLIENT + PEEPSHOW_SESSION env vars tag the run in the manifest and the access log so a shared peepshow serve instance can attribute every run + HTTP call back to the right Claude Code session. Both are optional — peepshow runs fine without them — but setting them costs nothing.
(If the path contains spaces, quote it. Use --emit paths if you prefer reading the human-readable list.)
Parse the output. In JSON mode, frames[].path is the ordered list of absolute paths. The video object gives you container, codec, resolution, fps, duration, and file size — useful context for the user's question without extra prompting. The video.tags object carries container-level metadata embedded in the file (title, artist, album_artist, director, producer, publisher, copyright, genre, description, creation_time, show, episode_id, season_number, etc.) — use it to ground your answer in what the video says it is before describing what you see. The extraction object tells you which strategy (scene vs fps) was used, how many frames were pruned, how many were dropped by the perceptual-hash dedup pass (framesDeduped + dedupDistance), and a coarse motion signal across the kept frames (motionSignalAvg numeric + motionSignalLevel low/medium/high). Use the motion signal to colour your narration (e.g. "rapid action segment" vs "near-static timelapse"); combine with framesDeduped == 0 + motionSignalLevel == "high" to recognise a high-information clip where the LLM should pay close attention to every frame.
Read the audio transcript, if present. When the input had an audio track, the JSON payload's audio object has:
audio.path — the extracted audio.m4a on disk.audio.durationSeconds, audio.codec, audio.peakDbfs, audio.silenceRatio — audio metadata.audio.transcript — when transcription ran (whisper.cpp or a cloud provider), this contains text (full concatenated transcript) and segments[] (each with start / end / text in seconds). Use the transcript to understand what was said in addition to what was shown. For conversational or narration-heavy videos the transcript is often more informative than the frames alone. Cross-reference segment timestamps against frame indices so you can quote who said what when.If audio.path === null the input had no audio track (GIF, APNG, animated WebP, silent video). Skip the audio step cleanly.
If audio.transcript === null or audio.transcript.skippedReason is set, transcription didn't run — just work with the frames.
Read each frame as an image with the Read tool, in order. They are named frame_0001.jpg, frame_0002.jpg, etc. and represent the timeline from earliest to latest.
Answer the user's question using both the frames and the transcript together. Reference timestamps when helpful (derive from video.durationSeconds, frame ordering, and transcript segments[].start). A useful pattern for longer clips: summarise the visual timeline in 2-3 beats, then weave in direct quotes from the transcript to ground what was said at those moments.
Annotate the report — MANDATORY. The JSON payload from step 1 includes an annotate block with the exact command. Without this step the manifest stays empty and the run shows up as no-analysis on the /runs page. Pipe a JSON object with your summary and perFrame covering every frame back into peepshow:
echo '{"summary":"<2-4 sentences describing the timeline>","perFrame":[{"idx":0,"text":"<frame 0 caption>"},{"idx":1,"text":"<frame 1 caption>"},{"idx":2,"text":"<frame 2 caption>"},…,{"idx":N-1,"text":"<frame N-1 caption>"}],"provider":"claude-code","model":"claude-opus-4-7"}' \
| peepshow report annotate "$OUTPUT_DIR"
$OUTPUT_DIR is the run's outputDir (the JSON payload's outputDir field — typically a /tmp/peepshow-... path). The annotate step rewrites manifest.json and report.html atomically; users opening the HTML now see your synthesis under the "LLM analysis" section.
perFrame.length MUST equal frames.length. Every extracted frame gets its own caption — no skipping, no key-beats-only summaries. The /runs page surfaces sparse coverage as partial-captions; agents that ship sparse perFrame are doing it wrong. peepshow logs a warning to stderr when sparse uploads are detected (and rejects them with --strict).
Run this for every invocation, including ones that didn't end with a question — the manifest is the durable record of what you understood. Only skip when the user explicitly asks for raw frames with no synthesis.
Pass these after the input when the defaults are not right:
--max 20 — cap the number of frames returned (default 40)--min 6 — ensure at least this many (falls back from scene detection to fps sampling)--threshold 0.2 — more sensitive scene detection (default 0.3; lower = more frames)--fps 0.5 — skip scene detection and sample at a fixed rate--width 960 — max output width in pixels (default 1280)--format png — output PNG instead of JPG--stats full — include the full stats block (video + extraction details)--stats off — suppress stats entirely if context is tightffmpeg not found: run npm install inside the plugin directory once to fetch the bundled ffmpeg binary.--threshold or force --fps 1.--threshold or lower --max.