# Sort Videos
Transcribe, summarize, and categorize downloaded videos — defaults to the folder Claude Code was launched in. Trigger on `/sort-videos`, when the user references a video file (yt-dlp output, Instagram reels, TikTok, YouTube downloads, conference talks, lectures), or asks to "transcribe this video", "summarize this talk", "make notes from this reel", "save this talk for listening later", "what's in this video", or "reprocess a video" — even when they don't say "sort". Transcribes with whisper-cpp, optionally OCRs frames for on-screen text, enriches from Instagram oEmbed captions, detects talks/lectures for an extended summary format, exports a tagged MP3 for talks, renames with a content-derived slug, moves into `<target>/AI Library/<topic>/` (created if absent), and writes a companion `.md` summary. Pass a file path or glob to reprocess a specific video.
Install via the plugin marketplace:

```bash
npx claudepluginhub tal/plugin-marketplace --plugin sort
```
Arguments passed: $ARGUMENTS
The target folder is wherever the videos live and where the AI Library should be created. Resolution order:
- If `$ARGUMENTS` is a directory path, that's the target folder (search for videos inside it).
- If `$ARGUMENTS` is a file path or glob, the target folder is the parent of those files.
- If no arguments were given, the target folder is the directory Claude Code was launched in.

The skill is location-agnostic: it works on `~/Downloads`, `~/Desktop`, a project folder, or anywhere else.
If arguments were provided (a file path, glob pattern, or filename):
- Resolve the argument relative to the target folder; globs are supported (e.g., `*.mp4`, `<topic>/*.webm`).
- The file may already be inside `<target>/AI Library/` — that's fine for re-runs. Process it in place.
- Regenerate the `.md` file alongside it, overwriting any existing one.

If no arguments were provided (batch mode):
Find video files at the root of the target folder and in <target>/Recents/ if it exists. Match any common video extension: .mp4, .webm, .mkv, .avi, .mov, .flv, .m4v, .ts, .wmv. Common yt-dlp naming pattern: <Platform> - <title> [<id>].<ext> (e.g., Instagram - Video by stuartbrazell [DWM02r5EtFq].mp4). Also match the legacy pattern Video by*.<ext> for older downloads. Do NOT include files already inside <target>/AI Library/.
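A minimal sketch of that discovery step, assuming GNU or BSD `find` (placeholders as in the rest of this document):

```bash
# Search the target root and Recents/ (if present), one level deep only,
# so files already inside AI Library/ are never picked up.
for dir in "<target>" "<target>/Recents"; do
  [ -d "$dir" ] || continue
  find "$dir" -maxdepth 1 -type f \( \
    -iname '*.mp4' -o -iname '*.webm' -o -iname '*.mkv' -o -iname '*.avi' -o \
    -iname '*.mov' -o -iname '*.flv' -o -iname '*.m4v' -o -iname '*.ts' -o \
    -iname '*.wmv' \)
done
```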
Create `<target>/AI Library/` if it doesn't exist before moving anything: `mkdir -p "<target>/AI Library"`.
For each video found, do the following:
Extract the audio and transcribe it with whisper-cpp:

```bash
ffmpeg -i "<video>" -ar 16000 -ac 1 -c:a pcm_s16le /tmp/<slug>.wav -y
whisper-cli -m /tmp/ggml-base.bin -f /tmp/<slug>.wav --no-timestamps
```

If the model file isn't present yet, download it first:

```bash
curl -L -o /tmp/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
```

For each video, use AskUserQuestion to ask the user whether they want to perform OCR on this specific video. Present these options:

- Text only (read on-screen text from the frames)
- Text and products (also identify products shown on screen)
- No OCR (skip frame analysis for this video)
If the user says yes to either OCR mode:
Extract frames using the bundled script:
```bash
bash "${CLAUDE_PLUGIN_ROOT}/scripts/extract-frames.sh" "<video>" "/tmp/<slug>_frames" 2
```
The 2 is the interval in seconds (1 frame every 2 seconds). The script outputs the frames directory path.
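For reference, the script's core is presumably equivalent to an `ffmpeg` fps filter like this (an assumption; the bundled script may do more):

```bash
# One frame every 2 seconds (fps=1/2), numbered sequentially.
mkdir -p "/tmp/<slug>_frames"
ffmpeg -i "<video>" -vf "fps=1/2" "/tmp/<slug>_frames/frame_%04d.png" -y
```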
Launch the video-ocr agent with the frames directory and the selected mode (text-only or text-and-products):
```
Analyze the frames in /tmp/<slug>_frames/ for on-screen content.
Mode: <text-only|text-and-products>
```
The agent returns structured markdown. Merge its output into the final .md file under an ## On-Screen Text section (after the transcription content).
Clean up frames: `rm -rf /tmp/<slug>_frames/`
After transcribing, extract the video ID from the filename (the [XXX] part before the extension) and the platform prefix (e.g., Instagram, TikTok, Youtube).
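One way to do that extraction in shell, assuming the yt-dlp naming pattern above (variable names illustrative):

```bash
# Example filename in the expected "<Platform> - <title> [<id>].<ext>" shape.
f="Instagram - Video by stuartbrazell [DWM02r5EtFq].mp4"
platform="${f%% - *}"   # text before the first " - "  -> "Instagram"
id="$(printf '%s\n' "$f" | sed -E 's/.*\[([^]]+)\]\.[^.]+$/\1/')"   # -> "DWM02r5EtFq"
```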
For Instagram videos, fetch the post caption via the oEmbed API:
```bash
curl -s "https://www.instagram.com/api/v1/oembed/?url=https://www.instagram.com/reel/<VIDEO_ID>/" | jq -r '.title'
```
For other platforms, skip the oEmbed step and rely on the transcription.
Use the caption to enrich the markdown whenever it adds context the transcription alone doesn't capture.
If the oEmbed request fails or returns no useful caption, proceed with just the transcription.
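A hedged sketch of that fetch-with-fallback (assuming the endpoint answers without authentication, which Instagram does not always allow):

```bash
# -f makes curl fail silently on HTTP errors; `// empty` turns a null
# title into an empty string so the check below works either way.
caption="$(curl -sf "https://www.instagram.com/api/v1/oembed/?url=https://www.instagram.com/reel/<VIDEO_ID>/" \
  | jq -r '.title // empty')"
if [ -z "$caption" ]; then
  echo "No usable caption; continuing with transcription only."
fi
```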
Classify each video as either short-form or talk-or-lecture. Talks/lectures get a more extensive summary in the markdown file (see step 6).
Use two signals — length and content — and combine them:
Length signal — get the duration with ffprobe:
```bash
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "<video>"
```
- Under 5 min: almost certainly short-form. Classify as short-form without asking.
- 5-15 min: ambiguous. Use the content signal to decide; ask if still unsure.
- Over 15 min: likely a talk/lecture. Use the content signal to confirm; ask if the content signal disagrees.

Content signal — look at the transcription for lecture/talk patterns (e.g., a single speaker developing one argument over time, references to slides, addressing an audience, or a Q&A segment).
Strong short-form signals that override length: recipe walkthrough, product haul, skit, vlog, reaction, tutorial < 10 steps, gameplay clip.
When to ask the user — if the signals disagree or are weak (e.g., a 12-minute cooking video that happens to be monologue-style, or a 20-minute podcast clip that isn't really a talk), use AskUserQuestion with two options: Short-form or Talk/lecture.
Do NOT ask when the classification is obvious (a 2-minute Instagram reel, or a clearly labeled conference talk from YouTube). Only ask when genuinely uncertain.
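A minimal sketch of the length-signal bucketing (thresholds as above; ffprobe prints fractional seconds, truncated here with parameter expansion):

```bash
dur="$(ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 "<video>")"
secs="${dur%.*}"   # "734.213000" -> "734"
if   [ "$secs" -lt 300 ]; then echo "short-form"
elif [ "$secs" -le 900 ]; then echo "ambiguous: lean on the content signal"
else                           echo "likely talk/lecture: confirm with content signal"
fi
```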
Based on the transcription (and OCR / caption enrichment, if collected), determine the best topic folder under <target>/AI Library/.
Discover existing folders at runtime — never hardcode topic names. Each folder builds up its own taxonomy across runs, and a folder list that ships with the skill would be wrong for everyone:
```bash
ls -1 "<target>/AI Library/" 2>/dev/null | grep -vE '^(Review|_|\.)'
```
Use that list as the candidate set. Reuse an existing folder whenever the content plausibly fits — that's how the user's taxonomy stays coherent over time, and minor variation ("Productivity" vs. "Self-Improvement") will fragment the library if you're too eager to create new folders.
Only create a new folder when nothing existing fits. Pick a descriptive 1-3 word Title Case name (e.g., Cooking, Woodworking, Personal Finance). Avoid hyper-specific names that won't catch future videos — prefer broad-but-clear topic names.
For videos classified as talk-or-lecture in step 4: pick a subject folder when one obviously matches the topic (a tech talk → a tech-related folder if the user has one). Otherwise use an existing educational-content folder if the user already has one; only create a new Talks & Lectures folder if nothing similar exists yet.
For new videos (not yet in AI Library):
- Rename to `<brief-description> - <platform> - <creator name> [<video ID>].<original-ext>` (e.g., `sesame-chicken-recipe - Instagram - Video by louishowardpt [DVs_UEwiIx9].mp4`). For legacy files without a platform prefix, use `Instagram` as the default. Keep the description short (2-5 words, lowercase, hyphenated).
- Move the file into the chosen topic folder, creating it under `<target>/AI Library/` if it doesn't exist.
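A sketch of that rename-and-move (the topic folder is hypothetical here):

```bash
topic="Cooking"   # chosen in step 5; illustrative
mkdir -p "<target>/AI Library/$topic"
mv "<video>" \
  "<target>/AI Library/$topic/sesame-chicken-recipe - Instagram - Video by louishowardpt [DVs_UEwiIx9].mp4"
```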
For re-runs (file already in AI Library):

- Keep the existing name and location; just regenerate the `.md` file with fresh content.

In both cases, create a matching `.md` file with the same base name. The structure depends on the classification from step 4:
For short-form videos:
- `# Video by <creator name>` as the title
- the transcription content
- an `## On-Screen Text` section with the extracted visual content, if OCR was run

For talk-or-lecture videos — write an extended summary that a reader could skim in 2-3 minutes to get most of the value:
- `# <Talk title> — <Speaker name>` (use the best available title from the caption, transcription intro, or filename)
- `## TL;DR`: 2-4 sentences stating the thesis and the main conclusion
- `## Outline`: a bulleted list of the major sections/topics in order, each with a one-line description
- `## Key Points`: the 5-10 most important arguments, claims, or ideas, each as a short paragraph with context
- `## Notable Quotes`: 2-5 direct quotes worth remembering, with speaker attribution if there are multiple speakers
- `## Terms & Concepts` (if applicable): definitions of jargon, frameworks, or named concepts introduced in the talk
- `## References` (if applicable): books, papers, people, tools, or prior work cited by the speaker
- `## Q&A Highlights` (if a Q&A is present): the most interesting exchanges, summarized
- `## Takeaways`: 3-5 bullets of actionable or memorable conclusions for the viewer
- `## On-Screen Text`: the extracted visual content (slide text is especially valuable for talks)

Do not dump the raw transcript in the talk/lecture format — the extended summary replaces it.
For videos classified as talk-or-lecture, also export a tagged MP3 alongside the video and markdown so the talk can be listened to later (e.g., on a phone or in a podcast app). Skip this step for short-form videos.
Output path: same folder and base name as the video, with .mp3 extension (e.g., <brief-description> - <platform> - <creator name> [<video ID>].mp3).
Use ffmpeg to extract and tag in a single pass:
```bash
ffmpeg -i "<video>" -vn -acodec libmp3lame -b:a 128k \
  -metadata title="<talk title>" \
  -metadata artist="<speaker name>" \
  -metadata album_artist="<speaker name>" \
  -metadata album="<series, conference, or platform>" \
  -metadata genre="Speech" \
  -metadata date="<YYYY if known>" \
  -metadata comment="<one-line TL;DR from the markdown>" \
  -id3v2_version 3 \
  "<output>.mp3" -y
```
Populate the tags from the data already gathered:
- `title` — the talk title (same as the H1 in the markdown, without the speaker suffix)
- `artist` / `album_artist` — the speaker's name (fall back to the creator/channel name from the filename if unknown)
- `album` — the conference, lecture series, podcast, or platform (e.g., YouTube, TED, Strange Loop 2024). Fall back to the platform prefix from the filename.
- `genre` — Speech (use Podcast if the source is clearly a podcast episode)
- `date` — the year, if it can be inferred from the caption or transcription; otherwise omit
- `comment` — the TL;DR text from the markdown, trimmed to a single line

If ffmpeg fails (e.g., corrupt audio), log the failure and continue with the rest of the pipeline — the markdown summary is still the primary output.
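Optionally, spot-check the tags after the export (`ffprobe` ships with ffmpeg):

```bash
ffprobe -v error \
  -show_entries format_tags=title,artist,album,genre,date,comment \
  -of default=noprint_wrappers=1 "<output>.mp3"
```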
When finished, print a summary table of all videos processed:
| Video | Type | Topic Folder | Outputs | Summary |
|---|---|---|---|---|
- Type — `short-form` or `talk-or-lecture`
- Outputs — `md` for every video, `md + mp3` for talks/lectures

Also list any videos that were skipped (blank audio, errors, etc.).
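For example, a run over the sample reel used in the examples above might add a row like this (entries illustrative):

| Video | Type | Topic Folder | Outputs | Summary |
|---|---|---|---|---|
| sesame-chicken-recipe - Instagram - Video by louishowardpt [DVs_UEwiIx9].mp4 | short-form | Cooking | md | Quick sesame chicken recipe reel |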