Transcribes YouTube, podcast, and audio URLs to clean verbatim text using auto-captions (default) or whisper-cpp with Silero VAD fallback. Outputs transcript and metadata files for accurate quotes in video research.
**Always before any research/finding extraction from a video or audio source.** Aggregator articles and same-cycle summaries hallucinate quotes — they conflate quotes from different talks by the same person. The verbatim transcript is the only trustable primary source.
Trigger conditions:
- /research is given a YouTube / podcast URL as the source
- a quote would otherwise have to be sourced as [paraphrase] or [aggregator]

Skip conditions:
- `_transcript-verbatim.md` already exists for this URL in the findings folder

The script writes two files:
| File | Content |
|---|---|
| `<out>/_transcript-verbatim.md` | Cleaned, paragraph-reflowed transcript with frontmatter (source URL, voice, method, capture date) |
| `<out>/_transcription-metadata.json` | Build metadata: word count, method used, repetition zones detected, model + prompt, runtime |
The verbatim file is the source-of-truth artifact. Findings, reports, and content angles may cite only quotes that can be grepped verbatim in this file. Everything else is paraphrase and must be source-tagged.
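That contract is easy to enforce mechanically. A minimal sketch, assuming a hypothetical `quoteIsVerbatim` helper and the default findings layout (neither is part of the shipped script):

```ts
import { join } from "node:path";

// Sketch: a cited quote passes only if it appears verbatim in the
// transcript file. Whitespace is normalized on both sides so the
// paragraph reflow doesn't break exact matching.
async function quoteIsVerbatim(findingsDir: string, quote: string): Promise<boolean> {
  const transcript = await Bun.file(join(findingsDir, "_transcript-verbatim.md")).text();
  const norm = (s: string) => s.replace(/\s+/g, " ").trim();
  return norm(transcript).includes(norm(quote));
}
```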
Auto-subs path (default): `yt-dlp --write-auto-subs --sub-langs en` → fetch VTT → strip karaoke tags → dedupe consecutive lines → reflow into paragraphs (the cleanup is sketched below). Known caption typos are corrected via `references/typo-fixes.json`; maintain it as new patterns surface.

Whisper path (`--whisper` flag): download audio via yt-dlp → convert to 16kHz mono wav → run whisper-cli with Silero VAD + a topic-seeded initial prompt.
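The auto-subs cleanup is mechanical enough to sketch. This assumes standard YouTube VTT with inline karaoke tags like `<00:00:01.000><c>word</c>`; the function name and the fixed-size reflow heuristic are illustrative, not the script's actual internals:

```ts
// Strip karaoke tags, dedupe the consecutive duplicate lines that
// YouTube's rolling captions produce, then reflow into paragraphs.
function cleanVtt(vtt: string): string {
  const lines = vtt
    .split("\n")
    // Drop the header, cue timing lines ("00:01 --> 00:03"), and blanks.
    .filter((l) => l.trim() && !l.startsWith("WEBVTT") && !l.includes("-->"))
    // Strip inline tags: <00:00:01.000>, <c>, </c>.
    .map((l) => l.replace(/<[^>]+>/g, "").trim())
    .filter(Boolean);

  // Rolling captions repeat the previous line; keep only transitions.
  const deduped = lines.filter((l, i) => l !== lines[i - 1]);

  // Naive reflow: fixed-size chunks. The real script can use smarter
  // paragraph boundaries (pauses, sentence ends).
  const words = deduped.join(" ").split(/\s+/);
  const paras: string[] = [];
  for (let i = 0; i < words.length; i += 80) {
    paras.push(words.slice(i, i + 80).join(" "));
  }
  return paras.join("\n\n");
}
```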
When to use whisper: a finding's verbatim quote contains a typo'd proper noun the user will want to cite.

When NOT to use whisper: everything else. Default to auto-subs and re-run with `--whisper` only when a quote demands it.
`--prompt` seeds whisper's context with topic-relevant proper nouns. Without a prompt, whisper hallucinates novel spellings ("OpenClaw" / "CORTCO" / "JSON things"). Pull the prompt from:
- the `--prompt "..."` flag, if the user provided one
- yt-dlp metadata (`--get-title --get-description`)
- the calling /research context

Always include in the prompt: speaker name(s), event name, and 5–10 likely proper nouns / acronyms.
Default output path: `{vault}/knowledge/research/findings/{slug}/`, where `{slug}` is derived from the `--slug` flag, the calling skill's findings dir, or `<voice>-<topic>` from yt-dlp metadata.
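That fallback chain, as a sketch (option and helper names are illustrative):

```ts
// --slug flag → calling skill's findings dir → <voice>-<topic>.
function deriveSlug(opts: { slugFlag?: string; findingsDir?: string; voice?: string; topic?: string }): string {
  if (opts.slugFlag) return opts.slugFlag;
  if (opts.findingsDir) return opts.findingsDir.split("/").filter(Boolean).pop()!;
  const kebab = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `${kebab(opts.voice ?? "unknown")}-${kebab(opts.topic ?? "untitled")}`;
}
```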
If the calling skill is /research, write to `{vault}/knowledge/research/findings/{research-slug}/_transcript-verbatim.md` so subsequent finding files in the same folder can [[wiki-link]] it.
When /research is invoked with a URL that's a YouTube watch link, podcast RSS item, or other audio/video source:
- run /transcribe-video first and write to the research's findings folder
- tag quotes pulled from the transcript [primary — verbatim], not [paraphrase] or [aggregator]

The script is `${CLAUDE_PLUGIN_ROOT}/scripts/transcribe-video.ts` — TS+bun per global rule.
Direct invoke:

```bash
bun ${CLAUDE_PLUGIN_ROOT}/scripts/transcribe-video.ts <URL> --out <DIR> [--whisper] [--prompt "..."]
```
The script requires:
- yt-dlp (`brew install yt-dlp`)
- ffmpeg (`brew install ffmpeg`)
- whisper-cli (`brew install whisper-cpp`) + model files (auto-downloaded to `~/.cache/whisper-models/` on the first whisper run)

If a dependency is missing, the script prints a single-line install command and exits with code 2. Don't try to auto-install — let the user run it.
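A sketch of that gate using `Bun.which`; the exact message format is an assumption:

```ts
const deps: Array<[string, string]> = [
  ["yt-dlp", "brew install yt-dlp"],
  ["ffmpeg", "brew install ffmpeg"],
  ["whisper-cli", "brew install whisper-cpp"],
];
for (const [bin, install] of deps) {
  if (!Bun.which(bin)) {
    // Single-line install command, exit code 2, no auto-install.
    console.error(`missing dependency: ${bin}. Install with: ${install}`);
    process.exit(2);
  }
}
```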
```bash
# default — auto-subs, save to current research findings dir
bun scripts/transcribe-video.ts https://www.youtube.com/watch?v=96jN2OCOfLs --out knowledge/research/findings/karpathy-vibe-to-agentic

# high quality with whisper, with topic-seeded prompt
bun scripts/transcribe-video.ts https://www.youtube.com/watch?v=96jN2OCOfLs \
  --out knowledge/research/findings/karpathy-vibe-to-agentic \
  --whisper \
  --prompt "Andrej Karpathy at Sequoia AI Ascent 2026. Topics: vibe coding, agentic engineering, Software 1.0/2.0/3.0, Claude Code, OpenCode, Codex, NanoGPT, jaggedness, verifiability, Menugen, Nano Banana."
```
| Failure | Cause | Mitigation |
|---|---|---|
| Repetition loop in whisper output | Beam search degenerate state on long silence / repeated content | Script detects + splices auto-sub text into repetition zone (detection sketched below) |
| Auto-subs unavailable | Channel disabled subs / region-locked | Automatic fallback to whisper |
| URL is not YouTube | Generic audio file | Skip yt-dlp subs path, go straight to ffmpeg + whisper |
| Whisper missing model | First run | Auto-download large-v3-turbo (1.6GB) + Silero VAD (885KB) — happens once, ~30 sec on a fast network |
| Wrong proper nouns in auto-subs | Caption upload defaults | Apply references/typo-fixes.json post-process; user can extend the file |
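The repetition-zone detection in the first row reduces to finding n-grams that repeat back-to-back. A sketch with illustrative thresholds (the script's actual detector may differ):

```ts
// Flag spans where the same n-gram repeats consecutively >= minRepeats
// times (whisper's beam-search degenerate state). Returns word-index
// spans to splice with auto-sub text.
function findRepetitionZones(text: string, n = 5, minRepeats = 4): Array<[number, number]> {
  const words = text.split(/\s+/);
  const zones: Array<[number, number]> = [];
  for (let i = 0; i + n * minRepeats <= words.length; i++) {
    const gram = words.slice(i, i + n).join(" ");
    let reps = 1;
    while (words.slice(i + reps * n, i + (reps + 1) * n).join(" ") === gram) reps++;
    if (reps >= minRepeats) {
      zones.push([i, i + reps * n]);
      i += reps * n - 1; // jump past the zone
    }
  }
  return zones;
}
```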
`references/typo-fixes.json` — keep current. When a transcript has a new misheard proper noun, add it. The file is sorted by precedence (longer phrases first to avoid partial-match collisions).
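A sketch of the post-process pass, assuming the file is a flat map of misheard → correct strings (the actual schema may differ); sorting longest-first at load time enforces the precedence rule even if the file drifts:

```ts
async function applyTypoFixes(text: string, fixesPath: string): Promise<string> {
  const fixes: Record<string, string> = await Bun.file(fixesPath).json();
  // Longest patterns first, so multi-word phrases win over their
  // substrings (avoids partial-match collisions).
  const patterns = Object.keys(fixes).sort((a, b) => b.length - a.length);
  for (const p of patterns) text = text.replaceAll(p, fixes[p]);
  return text;
}
```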
When whisper-cli or yt-dlp behavior shifts (e.g. a new flag default), update the script. Keep the SKILL.md decision rules stable — those are the contract with calling skills.