Help us improve
Share bugs, ideas, or general feedback.
From lattifai-skills
Translate captions into another language (or produce bilingual captions) while preserving segment count, timing, speaker labels, AND source punctuation density (no inserted em-dashes, parentheses, or bracketed glosses unless the source had them — downstream rendering shows every character). **Primary path uses this session's LLM directly — no API key, no model config.** Trigger on "translate captions", "翻译字幕", "翻译成中文/英文", "make bilingual subtitles", or "translate this" when working with caption files. CLI `lai translate run` is the secondary path for headless / oversized runs.
npx claudepluginhub lattifai/lattifai-skills --plugin lattifai-skillsHow this skill is triggered — by the user, by Claude, or both
Slash command
/lattifai-skills:lai-translateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Preferred model: Claude Sonnet** (cost-efficient for this agent-driven workload). This skill runs on whatever model is active in the parent session — any Claude model works; no hard switch. If you're deliberately on Opus / Haiku, that's fine.
Translates SRT/VTT/ASS caption files and YouTube transcripts to other languages with context-aware LLM processing and bilingual output. Claude default; optional Gemini API.
Convert between 30+ caption/subtitle formats (SRT, VTT, ASS, JSON, TextGrid, LRC, FCPXML, Premiere, …) and shift timing. Trigger on "convert captions", "SRT to VTT", "转换字幕格式", "shift timing", "ASS styling", "karaoke effect", "导入Premiere", or any caption-format question. Do NOT trigger to fix timing accuracy (`/lai-align`) or translate (`/lai-translate`).
Translates subtitle blocks between languages while enforcing broadcast constraints: 42 chars per line, max two lines, and readable speed for viewers.
Share bugs, ideas, or general feedback.
Preferred model: Claude Sonnet (cost-efficient for this agent-driven workload). This skill runs on whatever model is active in the parent session — any Claude model works; no hard switch. If you're deliberately on Opus / Haiku, that's fine.
This skill translates using the agent's own LLM capability — the model you are running right now. It does not call out to any external translation service by default. Helper scripts do the mechanical parsing, chunking, and writing so the agent only has to produce translations.
Primary path (agent-driven, default): chunk.py → agent translates → merge.py → validate.py.
Secondary path: lai translate run (CLI with its own LLM backend) — use only when there is no agent in the loop (batch pipelines, CI) or when a transcript is too large to fit in the agent's context.
Any violation is a bug; validate.py enforces them.
start / end, speaker — preserved verbatimtext preserved, translation non-empty[MUSIC], [APPLAUSE], …) — keep them as-is inside translations<source> — caption file (SRT, VTT, JSON, ASS, …) parsed via lattifai-captions (30+ formats)zh, ja, ko, es, fr, de, …)--bilingual — keep source text + attach translation (renderers emit dual lines)--mode refined — agent does a self-critique revision pass on accuracy / naturalness / terminology / voice--chunk-size N — segments per chunk fed to the agent (default 30)<base> = source media stem (e.g. podcast from podcast.mp3) or YouTube video ID. Files all land in the current directory:
# 1. Split the source into agent-sized chunks (JSON, stripped to essentials)
python skills/lai-translate/scripts/chunk.py podcast.aligned.json --chunk-size 30 -o podcast.chunks.json
# 2. Agent translates each chunk and writes <base>.translation.json:
# {"target_lang": "zh",
# "items": [{"idx": 0, "translation": "..."}, ...]}
# 3. Merge translations back into the source format
# replace mode (default) — produces a standalone <lang> SRT:
python skills/lai-translate/scripts/merge.py podcast.aligned.json podcast.translation.json -o podcast.zh.srt
# bilingual mode — source + translation lines (SRT/VTT/ASS/JSON):
python skills/lai-translate/scripts/merge.py podcast.aligned.json podcast.translation.json --bilingual -o podcast.zh.translated.srt
# 4. Validate invariants
python skills/lai-translate/scripts/validate.py podcast.aligned.json podcast.zh.translated.srt
If -o is omitted, merge.py derives <base>.<lang>[.translated]<ext> automatically (e.g. podcast.aligned.json translated to zh yields podcast.zh.json / podcast.zh.translated.json). Pipeline-stage suffixes (.aligned, .transcript, .diarized, .translation) are stripped from the source stem to recover a clean <base>.
<base>.translation.json with idx matching the source — do not add/remove/reorder itemsDo NOT inject punctuation absent from the source. Production audit on a
sister pipeline (14 k zh supervisions): 16 % had —— em-dashes added to
sentences whose source contained no dash, and ~24 cases added parentheses
that were never spoken. Both deform delivery — em-dashes force a hard
pause, parentheses get rendered or read as an aside.
When the source has none of them, the translation must not introduce any of:
—— / — double or single em-dash-- ASCII double-hyphen(…) / (…) parentheses (full-width or half-width)【…】 / […] bracketsRewrite with the connector the source already implies (period, comma, "because" / "也就是" / "因为", clause split). Examples:
src: "I think it's nextgen because these things go crazy."
✗ "我觉得那就是下一代打法——这些东西就是会疯传。"
✓ "我觉得那就是下一代打法,因为这些东西就是会疯传。"
src: "the demultiplexer or demux."
✗ "解复用器(也叫 demux)。"
✓ "解复用器,也叫 demux。"
validate.py flags these as punct_drift_* warnings (stderr), but the
agent should not produce them in the first place — the validator is a
safety net, not an excuse.
The CLI subcommand is lai translate caption (not lai translate run); input / output are positional:
lai translate caption podcast.aligned.json podcast.zh.srt \
translation.target_lang=zh \
translation.mode=normal
Configure an LLM backend once (required for this path only):
lai config set translation.llm.model_name gemini-3-flash-preview
# Gemini key: see /lai-transcribe
# Or OpenAI-compatible:
lai config set translation.llm.model_name gpt-4o
lai config set OPENAI_API_KEY <your-key>
This guide is the canonical reference for bilingual captions. /lai-caption and /lai-karaoke cross-reference it instead of duplicating content.
| Scenario | Recommended format | Why |
|---|---|---|
| Language learning / foreign-film viewing | SRT dual-line (source on top, translation below) | Universal player support, readable on any device; fastest to ship |
| Social short video (抖音 / 小红书 / Reels / TikTok) | ASS dual-color (source primary, translation in accent) | Inline styling, brand color, adaptive font_size for vertical video |
| Lyric / karaoke video | ASS bilingual + karaoke (per-word sweep on source, translation line below) | Keeps per-word highlight; see /lai-karaoke §Bilingual variant |
| Broadcast / platform upload (YouTube / Netflix / Bilibili) | Two independent tracks — one SRT per language | Platforms want separate subtitle streams, not dual-line merged |
| Further processing / re-rendering | Bilingual JSON (both text and translation populated) | Round-trippable; preserves words, speaker, and enables any of the above later |
#00FFFF (cool), Amber #FFC209 (warm), Pink #F7C3D9 (soft). Aim for contrast ratio ≥ 2:1 against both the source line and the background/lai-caption §Broadcast-Grade Profiles for per-line wrapping rulesAll four recipes assume <base>.aligned.json exists (produced by /lai-align or /lai-youtube) and the agent has produced <base>.translation.json via step 2 of the §Primary Path. Pick <base> once (media stem or YouTube ID) and reuse.
R1 — Language-learning SRT (dual-line, universal player)
python skills/lai-translate/scripts/merge.py \
podcast.aligned.json podcast.translation.json \
--bilingual -o podcast.zh.translated.srt
python skills/lai-translate/scripts/validate.py \
podcast.aligned.json podcast.zh.translated.srt
Default: source on top, translation below. Works in VLC, YouTube upload, PotPlayer, mpv, Premiere.
R2 — Social short video (ASS dual-color, adaptive font)
# First merge into bilingual JSON (preserves words + translation)
python skills/lai-translate/scripts/merge.py \
podcast.aligned.json podcast.translation.json \
--bilingual -o podcast.translated.json
# Then render ASS — MUST set karaoke_effect for translation_color to take effect
# (see "Rendering constraint" below). `instant` = zero animation, just the dual-
# color output we want for social.
laicap-convert --direct -Y podcast.translated.json podcast.zh.translated.ass \
ass.karaoke_effect=instant \
ass.primary_color="#FFFFFF" \
ass.translation_color="#FFC209" \
ass.font_size=86 \
ass.play_res_x=1080 ass.play_res_y=1920 \
standardization.start_margin=0.05 \
standardization.end_margin=0.15
For portrait 9:16 use ass.font_size=86, landscape 16:9 76, 4K landscape 151 (see /lai-karaoke §Adaptive Font Size).
Rendering constraint —
ass.translation_coloris injected into the dialogue only whenass.karaoke_effectis set (sweep/instant/outline). Plain ASS without a karaoke effect renders both lines in one style (both useprimary_color). Usekaraoke_effect=instantto get dual-color without any per-word animation.
R3 — Bilingual karaoke (per-word sweep + translation line)
Route through /lai-karaoke — it handles aspect-aware font sizing, presets, and the ass.translation_color wiring. Quick form:
laicap-convert --direct -Y podcast.translated.json podcast.karaoke.zh.translated.ass \
ass.karaoke_effect=sweep \
ass.karaoke_color_scheme=azure-gold \
ass.translation_color="#00FFFF" \
ass.font_size=<from probe_media.py>
R4 — Two independent tracks (platform upload)
# Source-language SRT — skip translation entirely, just convert <base>.aligned.json
laicap-convert --direct -Y podcast.aligned.json podcast.en.srt
# Target-language SRT — merge WITHOUT --bilingual (replace mode)
python skills/lai-translate/scripts/merge.py \
podcast.aligned.json podcast.translation.json -o podcast.zh.srt
Upload both files as separate subtitle tracks. This is what YouTube / Bilibili multi-language upload and Netflix TTG workflows expect — dual-line merged files get rejected at QC.
The bilingual JSON produced by merge.py --bilingual is the canonical intermediate format — it carries text, translation, target_lang, words (if any), and speaker (if any). Once you have it, every rendering downstream (R1–R3, karaoke variants, speaker-colored bilingual, …) is a single laicap-convert call away. Build the JSON once, render many.
| Problem | Fix |
|---|---|
merge.py reports N missing translations | Re-translate those idxes and re-run merge.py |
validate.py flags segment count drift | Bug — re-run chunk.py + merge.py from a clean state |
| Terminology inconsistent across chunks | Run in refined mode, or pass earlier chunks as context |
| Source-language text leaks into translation | Refined mode's self-critique; validate validate.py on JSON output |
| Transcript too large for the agent | Fall back to CLI (secondary path) |
| Source language unclear | Ask the user to confirm before translating |
/lai-transcribe, /lai-align — produce the caption file first/lai-caption — convert translated output (e.g., bilingual ASS)/lai-summarize — summarize instead of translating