Skill

video-ocr

Extract text from video files by sampling frames and running Apple Vision OCR, with optional perceptual deduplication

npx claudepluginhub varunr89/claude-marketplace --plugin ocr-toolkit

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ocr-toolkit:video-ocr

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Extracts frames from a video at a configurable FPS using ffmpeg, optionally deduplicates visually similar frames using perceptual hashing (aHash), then runs Apple Vision OCR on each kept frame in parallel.

SKILL.md

62 lines · ~664 tokens


Stats

Parent stars: 0

Maintenance: Good

Last commit: Mar 7, 2026


Video OCR

When to use

Use this skill when the user wants to extract text from a video -- for example, lecture recordings, tutorial screencasts, or presentation recordings in which the on-screen text changes over time.

Usage

python3 ${CLAUDE_PLUGIN_ROOT}/scripts/video_ocr.py <video_file> \
  [-o output.jsonl] \
  [--fps 3.0] \
  [--workers 8] \
  [--dedupe] [--dedupe-threshold 0.15] [--hash-size 8] \
  [--frames-out <dir>] \
  [--markdown <output.md>] [--markdown-images]
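
When the script is driven from another Python program rather than a shell, the command above can be assembled as an argument list. The following is only a sketch: the helper name `build_video_ocr_command` and the example paths are hypothetical, and it assumes `CLAUDE_PLUGIN_ROOT` is set by the plugin runtime. Only flags listed under Usage are emitted.

```python
import os
import shlex
import subprocess
from pathlib import Path


def build_video_ocr_command(video, output="ocr_output.jsonl", fps=3.0,
                            dedupe=False, plugin_root=None):
    """Assemble the argv list for scripts/video_ocr.py (hypothetical helper)."""
    # Assumption: CLAUDE_PLUGIN_ROOT is provided by the environment; fall back to ".".
    root = Path(plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", "."))
    cmd = ["python3", str(root / "scripts" / "video_ocr.py"), str(video),
           "-o", str(output), "--fps", str(fps)]
    if dedupe:
        cmd.append("--dedupe")
    return cmd


# Example paths are placeholders, not files shipped with the plugin.
cmd = build_video_ocr_command("lecture.mp4", output="lecture.jsonl", fps=1.0,
                              dedupe=True, plugin_root="/opt/ocr-toolkit")
print(shlex.join(cmd))

# To actually run it (needs the plugin, ffmpeg, and macOS):
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with video paths that contain spaces.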

Key arguments

| Argument | Default | Description |
| --- | --- | --- |
| `video` (positional) | required | Input video file path |
| `-o, --output` | `ocr_output.jsonl` | Output JSONL file path |
| `--fps` | `3.0` | Frames per second to extract |
| `--workers` | `8` | Number of parallel OCR workers |
| `--dedupe` | `false` | Deduplicate visually similar frames |
| `--dedupe-threshold` | `0.15` | Max visual difference ratio (0-1) to treat as similar |
| `--hash-size` | `8` | Perceptual hash size (hash_size x hash_size bits) |
| `--frames-out` | `None` | Directory to save kept frames as JPGs |
| `--markdown` | `None` | Optional Markdown output path |
| `--markdown-images` | `false` | Embed frame images in Markdown (requires `--frames-out`) |
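
To make the defaults concrete, the arguments could be declared with `argparse` roughly as follows. This is an illustrative reconstruction from the table, not the script's actual source; in particular, rejecting `--markdown-images` without `--frames-out` at parse time is an assumption about how that requirement is enforced.

```python
import argparse


def build_parser():
    """Parser mirroring the argument table (a sketch, not the real script)."""
    parser = argparse.ArgumentParser(prog="video_ocr.py")
    parser.add_argument("video", help="Input video file path")
    parser.add_argument("-o", "--output", default="ocr_output.jsonl")
    parser.add_argument("--fps", type=float, default=3.0)
    parser.add_argument("--workers", type=int, default=8)
    parser.add_argument("--dedupe", action="store_true")
    parser.add_argument("--dedupe-threshold", type=float, default=0.15)
    parser.add_argument("--hash-size", type=int, default=8)
    parser.add_argument("--frames-out", default=None)
    parser.add_argument("--markdown", default=None)
    parser.add_argument("--markdown-images", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: the "requires --frames-out" note is checked up front.
    if args.markdown_images and not args.frames_out:
        parser.error("--markdown-images requires --frames-out")
    return args


args = parse_args(["talk.mp4", "--dedupe", "--fps", "1"])
print(args.fps, args.dedupe, args.dedupe_threshold, args.hash_size)
# prints: 1.0 True 0.15 8
```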

Pipeline

1. ffmpeg extracts frames from the video at the specified FPS
2. If `--dedupe` is enabled, frames are compared using average perceptual hash (aHash) and similar consecutive frames are dropped
3. Apple Vision OCR runs in parallel on all kept frames
4. Results are sorted by frame number and written to JSONL (one JSON object per frame with `frame`, `time_sec`, and `text` fields)
5. Optionally, a Markdown file is generated with frame headings and OCR text blocks
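
The parallel OCR, sorting, and JSONL steps can be sketched as below. Several details are assumptions made for illustration: `fake_ocr` stands in for the Apple Vision call, a thread pool is used although the script may parallelize differently, and `time_sec` is derived as `(frame - 1) / fps` on the guess that frames are numbered from 1.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(frame_number):
    """Placeholder for Apple Vision OCR; returns dummy text for a frame."""
    return f"text from frame {frame_number}"


def ocr_frames(frame_numbers, fps, workers=8, ocr=fake_ocr):
    """OCR frames in parallel and return records sorted by frame number."""
    def process(n):
        # Assumption: frame n (1-based) was sampled at (n - 1) / fps seconds.
        return {"frame": n, "time_sec": round((n - 1) / fps, 3), "text": ocr(n)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(process, frame_numbers))
    return sorted(records, key=lambda r: r["frame"])


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


records = ocr_frames([7, 1, 4], fps=3.0)
print(to_jsonl(records), end="")
```

Each output line can be read back independently with `json.loads`, which is what makes JSONL convenient for long videos.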

Deduplication

The deduplication feature uses average perceptual hashing (aHash):

1. Each frame is resized to hash_size x hash_size grayscale
2. Pixels above the mean are mapped to 1, below to 0
3. Consecutive frames whose Hamming distance ratio is below `--dedupe-threshold` are dropped

This efficiently removes near-duplicate frames (e.g., static slides).
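
The hashing and comparison steps above can be illustrated in plain Python. This sketch works on small grids of grayscale values so it runs without Pillow; in practice the grid might come from something like `Image.open(path).convert("L").resize((8, 8))`. The function names are hypothetical, and comparing each frame against the last kept frame (rather than the immediately preceding one) is an assumption about how "consecutive" is interpreted.

```python
def ahash_bits(gray):
    """Average hash of a square grid of grayscale values (0-255)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    # Pixels above the mean map to 1, all others to 0.
    return [1 if p > mean else 0 for p in pixels]


def hamming_ratio(a, b):
    """Fraction of bit positions where two equal-length hashes differ (0-1)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dedupe_consecutive(hashes, threshold=0.15):
    """Return indices of frames to keep; near-duplicates of the last kept frame are dropped."""
    kept = []
    last = None
    for i, h in enumerate(hashes):
        # Assumption: a frame is dropped when its ratio is strictly below the threshold.
        if last is None or hamming_ratio(last, h) >= threshold:
            kept.append(i)
            last = h
    return kept


# Toy 2x2 "frames": a slide, the same slide with slight noise, then a different slide.
slide_a = [[10, 200], [10, 200]]
slide_a_noisy = [[12, 198], [9, 205]]
slide_b = [[200, 10], [200, 10]]

hashes = [ahash_bits(f) for f in (slide_a, slide_a_noisy, slide_b)]
print(dedupe_consecutive(hashes))
# prints: [0, 2]
```

Because aHash thresholds each pixel against the frame's own mean, small brightness noise leaves the bits unchanged, which is why the noisy copy of the first slide is dropped.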

Dependencies

ffmpeg (brew install ffmpeg) -- frame extraction
macOS + PyObjC (pip install pyobjc-core pyobjc-framework-Vision pyobjc-framework-Cocoa) -- Vision OCR
Pillow (pip install pillow) -- required for --dedupe
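
A quick preflight check for these dependencies might look like the following. It is a sketch only: the function name is hypothetical, and it relies on the usual import names for the listed packages (`objc` for pyobjc-core, `Vision` and `Cocoa` for the PyObjC framework wrappers, `PIL` for Pillow).

```python
import importlib.util
import shutil


def missing_dependencies(need_dedupe=False):
    """Return the names of dependencies that cannot be found on this machine."""
    missing = []
    # ffmpeg must be on PATH for frame extraction.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # Map import name -> pip package name.
    modules = {
        "objc": "pyobjc-core",
        "Vision": "pyobjc-framework-Vision",
        "Cocoa": "pyobjc-framework-Cocoa",
    }
    if need_dedupe:
        # Pillow is only needed when --dedupe is used.
        modules["PIL"] = "pillow"
    for module, package in modules.items():
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


problems = missing_dependencies(need_dedupe=True)
if problems:
    print("Missing:", ", ".join(problems))
else:
    print("All dependencies found")
```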
