From video-recap-skills
Authors and validates timestamped Chinese narration scripts (解说词) for analyzed videos. Reads understanding brief + VLM analysis, writes narration.json with BLOCK-based writing rules, 7:3 narration/silence ratio, and anti-hallucination constraints.
Slash command
/video-recap-skills:video-script
Authoring + validation of the narration script. The agent writes work_dir/narration.json
following the rules below; then validate.py lints it against the understanding index, and in
full mode time-aligns it to quiet windows.
Read work_dir/agent_narration_brief.md (scenes, durations, quiet windows, char budget) first.
Digest long dialogue via asr_writing_chunks.json; judge "is there speech/a silent slot here?"
via timeline_fusion.json. Check raw vlm_analysis.json / asr_result.json for details.
In full mode, timestamps are original-video time. In orchestrated cut mode, pass 1 only writes clip_plan.json; after edited_source.mp4 exists, pass 2 writes narration.json in output timeline time.
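Before writing, run python3 skills/video-recap/scripts/recap_inspect.py --work-dir <work_dir> state to see the current mode, which artifacts are missing, and what to write next. When writing narration in cut pass 2, use recap_inspect.py --work-dir <work_dir> clip-map --output-start <s> --output-end <e> (or --source-start/--source-end) to cross-check the output↔source timelines: confirm which stretch of the original a given stretch of the edit corresponds to, and whether it crosses a cut boundary or falls inside a removed range.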
[
{"start": 5.0, "end": 12.0, "narration": "解说文本。", "pause_after_ms": 250, "overlaps_speech": true}
]
| Field | Meaning |
|---|---|
| start / end | narration start/end seconds: original-video time in full mode; output timeline time in orchestrated cut pass 2 |
| narration | narration text |
| pause_after_ms | pause after segment, default 250 (keeps a tight rhythm) |
| overlaps_speech | overlaps original dialogue; default true for continuous-bed style, false only in true silence |
Optionally also author original_subtitles.json — [{start,end,text}] (OUTPUT time) — the calibrated
original dialogue burned during the original-audio gaps (ASR errors/names fixed, only what is actually
spoken there). Rendered in 「」 to set it apart from narration. If omitted, assemble falls back to a
conservative auto-ASR mapping. See the brief's 原声留白字幕 section.
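Estimate length from the character count / speech budget in the brief header: split a block that runs too long, and merge blocks that are too short into one complete thought. When --context or background_research.json supplies character names, prefer those names.

A separate quality pass (LLM-as-judge), distinct from the mechanical lint below. Needs the chat API key (same as VLM).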
1. Run python3 scripts/review.py --work-dir <work_dir>. It auto-detects cut mode and grounds against the OUTPUT timeline when a validated cut is present; pass --timeline source to force source grounding.
2. Read narration_review.md. For every error finding (ESPECIALLY category=hallucination: a claim not grounded in the visual/ASR evidence), revise narration.json and re-run review until either:
   - the review is OK with zero error findings, OR
   - an entry in work_dir/narration_review_override.md names WHICH finding (segment + category), WHY it is acceptable, and who signed off.

   Unaddressed error findings with no override entry mean the draft is NOT ready.
3. Run validate.py (the hard gate).

GATE rule: review NEVER blocks the tooling (it leans on a flaky chat API and a re-render is cheap). validate.py is the deterministic hard gate. The override log makes "we saw the finding and chose to ship it" auditable. review.py and validate.py never read it; it is a record for the human in the loop.
Override block shape — work_dir/narration_review_override.md (append-only):
## Override — <date>
- Finding: segment 4 / category=hallucination
- Reviewer said: "‘他早已知情’无画面/对白依据"
- Decision: KEEP — grounded in the --context synopsis (s2 reveal); reviewer lacked that context.
- Signed: <agent/human>
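Because the override log is append-only and follows a fixed block shape, entries can be generated rather than typed by hand. The sketch below is one possible helper, not part of the skill's scripts: `format_override` and `append_override` are hypothetical names, and the ISO date format is an assumption (the source only shows `<date>`).

```python
from datetime import date


def format_override(segment, category, reviewer_said, decision, signed, on=None):
    """Build one override entry in the block shape documented above."""
    day = (on or date.today()).isoformat()  # assumption: ISO dates
    lines = [
        f"## Override — {day}",
        f"- Finding: segment {segment} / category={category}",
        f'- Reviewer said: "{reviewer_said}"',
        f"- Decision: {decision}",
        f"- Signed: {signed}",
    ]
    return "\n".join(lines) + "\n"


def append_override(path, entry):
    """Append-only: 'a' mode never rewrites earlier entries."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("\n" + entry)


entry = format_override(
    segment=4,
    category="hallucination",
    reviewer_said="‘他早已知情’无画面/对白依据",
    decision="KEEP: grounded in the --context synopsis",
    signed="agent",
    on=date(2025, 1, 2),
)
```

Passing the result to `append_override("work_dir/narration_review_override.md", entry)` would add it to the log without touching prior entries.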
python3 scripts/validate.py --work-dir <work_dir> --mode full # or --mode cut
Writes narration_lint.json; in full mode rewrites narration.json with quiet-window alignment.
Fix any lint errors and re-run until clean.
The orchestrated video-recap --edit-mode cut flow is cut-first / narrate-second:
1. Pass 1 writes work_dir/clip_plan.json only, using original-time source ranges to keep.
2. edited_source.mp4 is produced and the brief is rebuilt with kept clips on the output timeline.
3. Pass 2 writes narration.json directly in output time (0..edited_source.mp4 duration). Validate with --mode cut_output via the orchestrator.

The direct video-cut legacy path can still map original-time narration, but the orchestrated recap path does not use that remap.
{"target_duration": "10m", "clips": [{"start": 12.0, "end": 38.0, "reason": "冲突开端"}]}
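To see how source ranges in a clip plan relate to output-timeline time, the following sketch lays kept clips end to end and reports which clips a given output span touches. It assumes clips are concatenated in listed order with hard cuts and no transitions; `output_ranges` and `clips_under` are hypothetical helpers, the second clip is invented for illustration, and recap_inspect.py clip-map remains the authoritative check.

```python
def output_ranges(clips):
    """Map kept source clips to back-to-back ranges on the output timeline."""
    ranges, cursor = [], 0.0
    for clip in clips:
        length = clip["end"] - clip["start"]
        ranges.append({
            "source": (clip["start"], clip["end"]),
            "output": (cursor, cursor + length),
        })
        cursor += length
    return ranges


def clips_under(ranges, out_start, out_end):
    """Return the kept clips that the output span [out_start, out_end] touches."""
    return [
        r for r in ranges
        if r["output"][0] < out_end and out_start < r["output"][1]
    ]


plan = {"clips": [
    {"start": 12.0, "end": 38.0, "reason": "冲突开端"},
    {"start": 60.0, "end": 75.0, "reason": "hypothetical second clip"},
]}
ranges = output_ranges(plan["clips"])
# A narration segment at 24..30 s output time spans both kept clips,
# so it crosses a cut boundary.
crossing = clips_under(ranges, 24.0, 30.0)
```

A segment whose span touches more than one clip straddles a cut, which is the situation the clip-map check is meant to surface before narration is written.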
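Cut-mode writing notes (the narration must match the edited footage, not the original):
- clip_plan.json uses original-video source timestamps and picks out a complete story arc; it contains no narration.
- Once edited_source.mp4 exists, the brief lists the OUTPUT ranges of the kept clips; estimate the number of narration beats from the actual edited duration, not the original duration.
- [start,end] in narration.json lands directly on the edited 0..total timeline; no source→output mapping is applied afterwards, and nothing is silently dropped.

When the title/genre is clear but plot context is missing, first write background_research.json following the background research guide (背景调研指南), then write the narration; otherwise the narration can only describe what is on screen. When the substrate is thin, the brief lowers the density target to a ceiling rather than a quota: better to write less and stay factual than to pad with shot descriptions.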
- review.py does NOT edit narration.json and does NOT block the pipeline: it is advisory.
- validate.py does NOT rewrite the meaning of the text: it only checks/aligns timing and quiet windows.

npx claudepluginhub worldwonderer/video-recap-skills

Generates Chinese-narration recap videos from source files. Orchestrates video understanding, narration writing, scene cutting, voiceover synthesis, and final assembly using a single MiMo API key and ffmpeg.