From lattifai-skills
Convert between 30+ caption/subtitle formats (SRT, VTT, ASS, JSON, TextGrid, LRC, FCPXML, Premiere, …) and shift timing. Trigger on "convert captions", "SRT to VTT", "转换字幕格式", "shift timing", "ASS styling", "karaoke effect", "导入Premiere", or any caption-format question. Do NOT trigger to fix timing accuracy (`/lai-align`) or translate (`/lai-translate`).
Install: `npx claudepluginhub lattifai/lattifai-skills --plugin lattifai-skills`

Slash command: `/lattifai-skills:lai-caption`
Convert, shift, and style caption files. Format is auto-detected from file extension.
`laicap-convert` and `lai caption convert` are the same command; `laicap-convert` is the shortcut entry point. It takes two positional arguments, `input_path` and `output_path`. All styling and rendering keys are flat top-level config (`render.*`, `ass.*`), not `caption.*`. For pipeline or non-interactive use, add `--direct -Y` (execute directly, skip confirmation).

All examples assume a single `<base>` (media stem or YouTube ID) reused across the pipeline; outputs land in the current directory.
```bash
# <base> = podcast
laicap-convert podcast.srt podcast.vtt
```
Common pairs:

```bash
laicap-convert podcast.aligned.json podcast.srt       # JSON → playback formats
laicap-convert podcast.srt podcast.ass                # basic → styled
laicap-convert podcast.transcript.md podcast.srt      # Gemini markdown → SRT
laicap-convert podcast.aligned.json podcast.TextGrid  # → Praat
```
Add -Y for non-interactive runs. Use input_format=srt (top-level flag) to override auto-detection when the extension is wrong.
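For batch runs, a loop over matching files can reuse the same two positional arguments. This is a sketch, not part of the CLI: `batch_convert` and the `LAICAP_CONVERT` override variable are hypothetical names introduced here, and placing `-Y` after the positional arguments is an assumption.

```bash
# Hypothetical helper: convert every ./*.<from> file to ./*.<to> with laicap-convert.
# LAICAP_CONVERT is a made-up override (handy for dry runs); defaults to laicap-convert.
batch_convert() {
  from_ext=$1
  to_ext=$2
  cmd=${LAICAP_CONVERT:-laicap-convert}
  for src in ./*."$from_ext"; do
    [ -e "$src" ] || continue              # no match: the glob stays unexpanded
    dst="${src%."$from_ext"}.$to_ext"      # ./podcast.srt -> ./podcast.vtt
    echo "converting $src -> $dst"
    "$cmd" "$src" "$dst" -Y || return 1    # -Y skips confirmation (placement assumed)
  done
}

# Usage: batch_convert srt vtt
```

Multi-dot stems keep their middle part: `podcast.aligned.json` becomes `podcast.aligned.srt`.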
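Shift every timestamp by a fixed offset in seconds (positive = forward, negative = backward):

```bash
laicap-shift podcast.srt podcast.shifted.srt 2.5   # forward 2.5 s
laicap-shift podcast.srt podcast.shifted.srt -1.0  # backward 1 s
```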
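To make the sign convention concrete, here is what a shift does to a single SRT timestamp (`HH:MM:SS,mmm`). This is a hypothetical `shift_ts` helper written in POSIX `awk`, not how `laicap-shift` is implemented; clamping at zero is this sketch's own choice, since what `laicap-shift` does with cues pushed before 0 s is not documented here.

```bash
# Hypothetical helper: shift_ts <HH:MM:SS,mmm> <offset-seconds>
# Clamps at 00:00:00,000 (an assumption of this sketch).
shift_ts() {
  printf '%s\n' "$1" | awk -F'[:,]' -v off="$2" '{
    o = off + 0
    # Offset in whole milliseconds, rounded half away from zero.
    delta = (o < 0) ? int(o * 1000 - 0.5) : int(o * 1000 + 0.5)
    ms = ($1 * 3600 + $2 * 60 + $3) * 1000 + $4 + delta
    if (ms < 0) ms = 0
    printf "%02d:%02d:%02d,%03d\n", int(ms / 3600000), int((ms % 3600000) / 60000), int((ms % 60000) / 1000), ms % 1000
  }'
}

# Usage: shift_ts 00:01:02,500 2.5   prints 00:01:05,000
```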
Top-level `ass.*` keys control styling (see `lai caption convert --help` for the full list):

```bash
laicap-convert podcast.srt podcast.ass \
  ass.font_name="Noto Sans CJK SC" \
  ass.font_size=48 \
  ass.primary_color="#FFFFFF"
```
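Colors are passed as `#RRGGBB`, which the tool presumably translates for you. If you open the generated `.ass` file to check or tweak a style, note that ASS itself writes style colors as `&HAABBGGRR`: blue-green-red byte order with a leading alpha byte, where `00` is opaque. The hypothetical `hex_to_ass` helper below only illustrates that mapping; it is not part of the CLI.

```bash
# Hypothetical helper: hex_to_ass "#RRGGBB" prints the ASS style color &H00BBGGRR.
hex_to_ass() {
  hex=${1#\#}
  case $hex in
    [0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]) ;;
    *) echo "expected #RRGGBB, got: $1" >&2; return 1 ;;
  esac
  r=$(printf '%s' "$hex" | cut -c1-2)
  g=$(printf '%s' "$hex" | cut -c3-4)
  b=$(printf '%s' "$hex" | cut -c5-6)
  # Alpha 00 (opaque), then the bytes reversed: blue, green, red.
  printf '&H00%s%s%s\n' "$b" "$g" "$r" | tr 'a-f' 'A-F'
}

# Usage: hex_to_ass "#FFC209"   prints &H0009C2FF
```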
Karaoke requires word-level JSON input, produced by `/lai-align` with `caption.render.word_level=true` (or `/lai-youtube` with the same flag); that upstream flag populates the `words` arrays in the JSON.
```bash
laicap-convert podcast.aligned.json podcast.karaoke.ass \
  ass.karaoke_effect=sweep \
  ass.karaoke_color_scheme=azure-gold
```
`render.word_level` scope: the default is word scope (per-word `\k`/`\kf`/`\ko` emission). Pass `render.word_level=false` only to switch to line scope (a single override at event start, so the whole text block animates together, CapCut-style "fade in"). You don't need `render.word_level=true` to enable karaoke; `ass.karaoke_effect` alone does it.
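For intuition about word-scope output, the sketch below turns per-word timings into ASS karaoke tags, whose durations are in centiseconds. The `karaoke_line` helper is hypothetical, and the effect-to-tag mapping (`sweep` → `\kf`, `instant` → `\k`, `outline` → `\ko`) is inferred from the effect descriptions rather than confirmed; emitting silences between words as empty syllables is likewise a common convention, not documented LattifAI behavior.

```bash
# Hypothetical helper: read "start end word" lines (seconds) on stdin and
# print one ASS event text with per-word karaoke tags (centiseconds).
# Assumed mapping: sweep -> \kf, instant -> \k, outline -> \ko.
karaoke_line() {
  case $1 in
    sweep)   tag=kf ;;
    instant) tag=k ;;
    outline) tag=ko ;;
    *) echo "unknown effect: $1" >&2; return 1 ;;
  esac
  awk -v tag="$tag" '
    NR == 1 { prev_end = $1 }                    # the event starts at the first word
    {
      if (NR > 1) printf " "                     # space between words
      gap = int(($1 - prev_end) * 100 + 0.5)     # silence before this word (cs)
      if (gap > 0) printf "{\\%s%d}", tag, gap   # empty syllable keeps later words in sync
      printf "{\\%s%d}%s", tag, int(($2 - $1) * 100 + 0.5), $3
      prev_end = $2
    }
    END { printf "\n" }
  '
}

# Usage: printf '0.00 0.30 Hello\n0.40 0.85 world\n' | karaoke_line sweep
# prints {\kf30}Hello {\kf10}{\kf45}world
```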
- `ass.karaoke_effect`: `sweep` (classic karaoke fill), `instant` (hard on/off), `outline` (outline-only highlight)
- `ass.karaoke_color_scheme` (12 presets, each tuning primary/secondary/outline/back colors): `azure-gold`, `sakura-purple`, `mint-ocean`, `gardenia-green`, `sunset-warm`, `prussian-elegant`, `burgundy-classic`, `langgan-spring`, `mars-teal`, `spring-field`, `navy-pink`, `apricot-dark`
- `ass.kinetic_style` (orthogonal per-word animation, 15 options grouped by feel):
  - `bounce`, `pop`, `shake`, `pulse`, `swing`
  - `fade`, `zoom`, `rise`, `typewriter`, `blur_in`
  - `glow`, `neon`, `wave`, `flicker`, `stagger`

Bilingual: format selection, typography conventions (line order, color, font-size ratio), and four scenario recipes (learning SRT, social ASS, karaoke, dual-track upload) all live in the `/lai-translate` §Bilingual Delivery Guide; don't duplicate them here. Minimal one-liner for a bilingual JSON from `/lai-translate`:
```bash
# `<base>.translated.json` is the merged JSON (source + translation) from
# `/lai-translate merge.py --bilingual`.
laicap-convert podcast.translated.json podcast.zh.translated.ass \
  ass.primary_color="#FFFFFF" \
  ass.translation_color="#FFC209"
```
Add ass.karaoke_effect=sweep ass.karaoke_color_scheme=... for per-word karaoke on the source line.
`ass.speaker_color=...` paints dialogue per speaker (requires `/lai-diarize` output in the source). Accepts:

- `""`: disabled (default)
- `"auto"`: 10-color LattifAI palette (cycles for more than 10 speakers)
- `"#RRGGBB,#RRGGBB,..."`: CSV of explicit colors, one per speaker in order of appearance

```bash
# Host = cyan-blue, Guest = pink (short CSV palette)
laicap-convert diarized.json out.ass \
  render.include_speaker_in_text=true \
  ass.speaker_color="#658AE4,#F7C3D9"
```
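The CSV form pairs colors with speakers in order of first appearance. The sketch below shows that pairing with a hypothetical `assign_speaker_colors` helper that reads one speaker label per segment; cycling back to the first color when speakers outnumber colors mirrors the `auto` palette but is an assumption for the CSV form.

```bash
# Hypothetical helper: read speaker labels (one per segment, in order) on stdin
# and print "speaker=color" for each speaker at its first appearance.
# Cycling the palette when speakers outnumber colors is an assumption.
assign_speaker_colors() {
  awk -v palette="$1" '
    BEGIN { n = split(palette, colors, ","); if (n == 0) exit 1 }
    !($0 in seen) {
      seen[$0] = 1
      printf "%s=%s\n", $0, colors[count % n + 1]
      count++
    }
  '
}

# Usage: printf 'Host\nGuest\nHost\n' | assign_speaker_colors '#658AE4,#F7C3D9'
```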
standardization.* reflows segments to meet CPL/CPS/duration rules before writing. Useful for delivering SRT/VTT to Netflix-class platforms or tighter YouTube specs.
Fields (all top-level):

- `standardization.min_duration` / `max_duration`: segment duration bounds (s)
- `standardization.min_gap`: minimum inter-segment gap (s)
- `standardization.max_lines` / `max_chars_per_line`: line-wrapping limits
- `standardization.optimal_cps`: target characters per second for readability
- `standardization.start_margin` / `end_margin`: pre/post roll per segment
- `standardization.margin_collision_mode`: `trim` (default), `drop`, …

Ready-made profiles:
```bash
# Netflix-ish: 42 CPL × 2 lines, 0.8-7 s
laicap-convert diarized.json out.netflix.srt \
  standardization.min_duration=0.8 \
  standardization.max_duration=7.0 \
  standardization.min_gap=0.08 \
  standardization.max_lines=2 \
  standardization.max_chars_per_line=42 \
  standardization.start_margin=0.05 \
  standardization.end_margin=0.15

# YouTube-ish: shorter cues, narrower lines
laicap-convert diarized.json out.youtube.srt \
  standardization.min_duration=0.5 \
  standardization.max_duration=5.0 \
  standardization.max_chars_per_line=35 \
  standardization.start_margin=0.03 \
  standardization.end_margin=0.10
```
start_margin / end_margin also benefit karaoke exports (give lyrics breathing room) — combine with the karaoke recipe above.
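Before delivery, a quick local check can confirm that no subtitle line exceeds the `max_chars_per_line` target. `check_cpl` below is a hypothetical helper (not part of the CLI) that scans an SRT file with `awk`; its count is only accurate for multibyte text such as CJK when your `awk` is locale-aware (for example gawk under a UTF-8 locale), since other implementations count bytes.

```bash
# Hypothetical helper: check_cpl <file.srt> [max]
# Prints offending lines and exits non-zero if any text line is longer than max (default 42).
check_cpl() {
  awk -v max="${2:-42}" '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    /^[0-9]+$/ { next }                # cue index line
    /-->/ { next }                     # timing line
    length($0) > max { printf "line %d (%d chars): %s\n", NR, length($0), $0; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage: check_cpl out.netflix.srt 42
```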
| Category | Formats (read & write unless noted) |
|---|---|
| Standard | .srt, .vtt, .ass / .ssa, .lrc, .txt, .md |
| Data | .json, .tsv, .csv, .aud |
| Linguistic | .TextGrid, .ttml / .xml |
| NLE | .fcpxml; .prproj.xml (write-only) |
Full list: lai caption convert --help.
Convert to .json first to keep word-level timing, speakers, and translations — then fan out to delivery formats.
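The hub-and-spoke flow can be scripted. `fan_out` and `LAICAP_CONVERT` below are hypothetical names for illustration; each call uses only the two positional arguments and `-Y` described above, with the flag's placement assumed.

```bash
# Hypothetical helper: fan_out <input.json> <base> <ext>...
# Writes <base>.<ext> for each requested extension from one JSON source.
fan_out() {
  src=$1
  base=$2
  shift 2
  cmd=${LAICAP_CONVERT:-laicap-convert}
  for ext in "$@"; do
    "$cmd" "$src" "$base.$ext" -Y || return 1   # stop at the first failure
  done
}

# Usage: fan_out podcast.aligned.json podcast srt vtt ass TextGrid
```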
| Problem | Fix |
|---|---|
| Unknown input format | Specify input_format=srt (top-level flag, not caption.input_format) |
| Encoding error | Re-save the file as UTF-8 |
| Karaoke has no highlighting | Source JSON needs `words` arrays: re-run `/lai-align` with `caption.render.word_level=true`, then convert with just `ass.karaoke_effect=sweep` (no `render.word_level` needed; the default is word scope) |
| No parameter named 'caption' / 'input' | For `laicap-convert`, styling keys are flat (`render.*`, `ass.*`) and input/output are positional (`input_path`/`output_path`); there is no `caption.*` namespace here |
| Plain text missing timing | Add timing via /lai-align first |
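For the encoding row, the standard `iconv` utility can do the re-save. The sketch assumes you know the source encoding (GBK, Big5, Shift_JIS, and Windows-1252 are common for legacy subtitle files; `iconv -l` lists the names your system accepts); `to_utf8` is a hypothetical wrapper.

```bash
# Hypothetical helper: to_utf8 <file> <source-encoding>
# Re-encodes the file to UTF-8 in place; leaves it untouched if iconv fails.
to_utf8() {
  tmp=$(mktemp) || return 1
  if iconv -f "$2" -t UTF-8 "$1" > "$tmp"; then
    cat "$tmp" > "$1"            # overwrite contents, keep the original permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage: to_utf8 old.srt GBK && laicap-convert old.srt old.vtt
```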
Related skills:

- `/lai-align`: fix timing accuracy (conversion doesn't change timing)
- `/lai-transcribe`: generate captions from audio first
- `/lai-translate`: translate before bilingual conversion
- `/lai-diarize`: add speaker labels for speaker-colored output