By lattifai
Automate AI-powered media workflows: download videos/audio/captions from YouTube/1000+ platforms, transcribe to timestamped markdown with speakers/chapters via Gemini, translate SRT/VTT/ASS to bilingual output via Claude/Gemini, force-align word-level timings, and batch-convert 30+ caption formats.
npx claudepluginhub lattifai/omni-captions-skills --plugin omnicaptions

Use when the user needs accurate, precise caption timing, or when aligning captions with audio/video using forced alignment. Corrects caption timing to match the actual speech. Uses the LattifAI Lattice-1 model.
Use when converting between caption formats (SRT, VTT, ASS, TTML, Gemini MD, etc.). Supports 30+ caption formats.
Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.
Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.
Use when translating captions into another language. Supports bilingual output and context-aware translation. Uses Claude natively by default; the Gemini API is optional.
Captions Made Easy — Claude Code Caption Skills
"I need bilingual captions for this Fireship vibe coding video https://youtube.com/watch?v=Tw18-4U7mts"
One sentence. Claude handles the download, transcription, and translation.
npx skills add https://github.com/lattifai/omni-captions-skills
Claude Code Plugin System:
/plugin marketplace add lattifai/omni-captions-skills
/plugin install omnicaptions@lattifai-omni-captions-skills
Local Development:
git clone https://github.com/lattifai/omni-captions-skills.git
claude --plugin-dir ./omni-captions-skills
❯ Make bilingual captions for this Fireship vibe coding video https://youtube.com/watch?v=Tw18-4U7mts
1
00:00:00,000 --> 00:00:03,200
Mass hysteria satisfies a deep human need.
群体性癔症满足了人类某种深层需求。
2
00:00:03,200 --> 00:00:07,440
Vibe coding is programming without actually writing any code yourself.
Vibe coding 就是不用自己写代码的编程方式。
| Skill | Description |
|---|---|
| transcribe | YouTube/video → Markdown with timestamps |
| translate | Translate captions, bilingual output supported |
| convert | Convert between 30+ caption formats |
| download | Download YouTube video/audio/captions |
| LaiCut | Forced alignment, word-level timing accuracy |
Invoke via /omnicaptions:transcribe or /omnicaptions-transcribe
Standard transcription gives only approximate timestamps. LaiCut uses the LattifAI Lattice-1 model to match text precisely to the audio waveform, achieving word-level accuracy.
Install LaiCut:
# Using uv (recommended, auto-configures package index)
uv pip install "omni-captions-skills[laicut]" --extra-index-url https://lattifai.github.io/pypi/simple/
# Using pip
pip install "omni-captions-skills[laicut]" --extra-index-url https://lattifai.github.io/pypi/simple/
Supported languages: English, Chinese, German, and mixed-language audio
Recommended workflow: align before translating, since translated text no longer matches the original audio
| Feature | API Key | Note |
|---|---|---|
| Translation | None required | Uses Claude by default, works out of the box |
| Transcription | Gemini API | Optional, only needed for transcription |
| LaiCut alignment | LattifAI API | Optional, only needed for precise alignment |
Gemini is only used for video transcription. When a video has no captions, you'll be prompted whether to transcribe it, and you can configure the API key at that point. Translation uses Claude by default and works out of the box.
API keys are prompted automatically and saved to ~/.config/omnicaptions/config.json
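If you prefer not to wait for the prompt, the config file can also be seeded by hand before the first run. A minimal sketch, assuming the tool reads a flat JSON object; the key names below are assumptions, not documented:

```shell
# Hypothetical: pre-seed the omnicaptions config instead of waiting for the prompt.
# The key names (gemini_api_key, lattifai_api_key) are guesses, not documented.
CONFIG_DIR="$HOME/.config/omnicaptions"
mkdir -p "$CONFIG_DIR"
cat > "$CONFIG_DIR/config.json" <<'EOF'
{
  "gemini_api_key": "YOUR_GEMINI_KEY",
  "lattifai_api_key": "YOUR_LATTIFAI_KEY"
}
EOF
```

Replace the placeholder values with real keys, or delete the file to trigger the interactive prompt again.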
# With captions: download → align → translate
omnicaptions download "https://youtube.com/watch?v=xxx"
omnicaptions LaiCut video.mp4 video.en.vtt -o video_LaiCut.srt
omnicaptions translate video_LaiCut.srt -l zh --bilingual
# Without captions: transcribe → align → translate
omnicaptions transcribe video.mp4
omnicaptions LaiCut video.mp4 video_GeminiUnd.md -o video_LaiCut.srt
omnicaptions translate video_LaiCut.srt -l zh --bilingual
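The two pipelines above differ only in whether a transcription step is needed, so they can be wrapped in one small script that branches on whether a caption file was downloaded. A dry-run sketch that only prints the commands it would run; the file names are placeholders, not real outputs:

```shell
#!/bin/sh
# Dry-run: print the omnicaptions pipeline for a video instead of executing it.
# File names below are placeholders; adjust them to the actual download output.
plan_pipeline() {
  video="$1"
  captions="$2"
  if [ -f "$captions" ]; then
    # Platform captions exist: skip transcription, go straight to alignment.
    echo "omnicaptions LaiCut $video $captions -o video_LaiCut.srt"
  else
    # No captions: transcribe first (Gemini), then align the transcript.
    echo "omnicaptions transcribe $video"
    echo "omnicaptions LaiCut $video video_GeminiUnd.md -o video_LaiCut.srt"
  fi
  # Align before translating, so timings match the original audio.
  echo "omnicaptions translate video_LaiCut.srt -l zh --bilingual"
}

plan_pipeline video.mp4 video.en.vtt
```

Dropping the `echo`s would execute the pipeline directly, but the dry-run form makes it easy to inspect the plan first.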
Credits: @dotey for the transcription prompt | Built on lattifai-captions