From letmewatch
Extracts key frames from videos using ffmpeg scene detection and optionally transcribes audio with Whisper, for analyzing screen recordings, bug reports, tutorials, and demos.
npx claudepluginhub binyamineden/letmewatch --plugin letmewatch

This skill uses the workspace's default tool permissions.
Analyze video content by extracting key frames using ffmpeg scene detection and viewing them as images.
Decomposes videos into meaningful keyframes using ffmpeg's scene-detection filter. It extracts an image at each scene change (threshold 0.01), reads timestamps from the ffmpeg logs, and supports MP4/MOV/WEBM/AVI/MKV. It adjusts sensitivity as needed and warns when a low frame count suggests a mostly static video.
Analyzes video files or YouTube URLs: extracts frames and audio, and detects scenes, motion, silence, and transitions via ffmpeg tools in a structured workflow.
Extracts video frames with ffmpeg and reviews them via Claude's Read tool to verify UI flows and detect errors, stuck states, and test outcomes in videos.
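To make the extraction step concrete, here is a minimal sketch of driving ffmpeg scene detection from Python. It is an illustration under stated assumptions, not the bundled script: ffmpeg must be on PATH, and the output naming here is hypothetical (the actual script names frames by timestamp). The filter expression mirrors the 0.01 threshold mentioned above.

# Minimal sketch of scene-detection extraction; assumes ffmpeg is on PATH.
import subprocess

def extract_keyframes(video_path: str, out_dir: str, threshold: float = 0.01) -> None:
    subprocess.run(
        [
            "ffmpeg", "-i", video_path,
            # Keep only frames whose scene-change score exceeds the threshold,
            # and print per-frame metadata (timestamps) to the log for parsing.
            "-vf", f"select='gt(scene,{threshold})',metadata=print",
            "-vsync", "vfr",              # emit only the selected frames
            f"{out_dir}/frame_%04d.jpg",  # illustrative naming, not the skill's
        ],
        check=True,
        capture_output=True,
    )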
- Supported formats: .mp4, .mov, .mkv, .webm, .avi
- Invoke with /letmewatch:video, /letmewatch:video-last, or /letmewatch:video-dir
- Requires ffmpeg: brew install ffmpeg (macOS) or apt install ffmpeg (Linux)
- Optional transcription: pip install openai-whisper or pip install mlx-whisper (see the sketch after the extraction command below)
- Frames are named by timestamp (e.g., frame_01m23s.jpg) so you can reference specific moments

Run the extraction script bundled with this skill:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/letmewatch/video-extract.py "<video_path>"
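When Whisper is installed, the script can also produce a transcript. As a rough sketch of what that optional step looks like with the openai-whisper package (the script's actual integration may differ, and the model size and output path here are illustrative choices):

# Hedged sketch of the optional transcription step using openai-whisper.
import whisper

model = whisper.load_model("base")          # downloads the model on first use
result = model.transcribe("recording.mp4")  # ffmpeg decodes the audio internally
with open("transcript.txt", "w") as f:
    f.write(result["text"])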
Read the output to find:
- WORK_DIR — where frames are stored
- TOTAL_FRAMES — how many frames were extracted
- TRANSCRIPT — path to audio transcript (or "none")
- FRAMES — list of frame file paths

If TRANSCRIPT is not "none", read the transcript file first for audio/narration context.
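The exact output format of video-extract.py is not shown here, so the parser below is a hypothetical sketch: it assumes one "KEY: value" pair per line for the fields listed above, with FRAMES as a comma-separated list. The real format may differ.

# Hypothetical parser for the script's output; the assumed line format is
# "KEY: value", which is a guess based on the fields documented above.
def parse_extract_output(output: str) -> dict:
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "FRAMES":
            info[key] = [p.strip() for p in value.split(",") if p.strip()]
        elif key in ("WORK_DIR", "TOTAL_FRAMES", "TRANSCRIPT"):
            info[key] = value
    return info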
Read frames in batches of 8 using the Read tool (all 8 in parallel). For each batch, note the timestamp encoded in each filename (frame_00m23s.jpg = 0 minutes 23 seconds).

After viewing all frames, provide a timestamped summary, tailored to the type of video (bug report, tutorial, demo, screen recording).
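The frame_MMmSSs naming makes timestamps recoverable from filenames alone. A small sketch of that parsing plus the batch-of-8 grouping; the helper names are hypothetical, not part of the skill:

# Hypothetical helpers: parse "frame_00m23s.jpg"-style names into seconds
# and group frame paths into batches of 8 for parallel reading.
import re

def timestamp_seconds(filename: str) -> int:
    m = re.search(r"frame_(\d+)m(\d+)s", filename)
    if not m:
        raise ValueError(f"unexpected frame name: {filename}")
    return int(m.group(1)) * 60 + int(m.group(2))

def batches(frames: list[str], size: int = 8):
    for i in range(0, len(frames), size):
        yield frames[i:i + size]

# Example: timestamp_seconds("frame_01m23s.jpg") -> 83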
Remove the temp directory:
rm -rf <WORK_DIR>