Automatically generate Chinese-narrated video recaps from any source video using a single MiMo API key and ffmpeg. The pipeline analyzes video content (scene detection, ASR, visual understanding), writes timestamped narration scripts, cuts and assembles clips, synthesizes Chinese TTS voiceover, and produces a final recap video with subtitles and loudness normalization.
Assemble a final recap video: mux narration audio over the source video, duck the original audio under the narration, render subtitles (SRT/ASS, optionally burned in), and loudness-normalize. Use as the last stage of the video-recap bundle. Consumes the source video + tts_meta.json (+ narration placement); produces recap_<name>.mp4 + subtitles.srt/.ass. Trigger words: 视频合成, 混音, 字幕, 压字幕, assemble video, mux, ducking, subtitles, 成片.
Cut a long video down to selected source ranges (montage / clip assembly). Part of the video-recap bundle: in the orchestrated (two-pass) flow, consumes clip_plan.json + the source video, produces edited_source.mp4; the agent then writes narration.json against the output timeline. When invoked standalone WITHOUT --no-narration-map, also remaps an existing narration.json → narration_mapped.json (legacy single-pass path). Trigger words: 视频剪辑, 剪辑式解说, video cut, clip plan, 拼剪.
Generate a Chinese-narration recap video from an input video, end to end. Use when the user gives a video file (.mp4 / .mov / .mkv / .webm) and asks to add narration, generate voiceover, dub, summarize, or produce a recap (short drama / TV series / film / documentary / popular science). Orchestrates the video-* skill bundle: understanding → (agent writes narration) → cut → voiceover → assemble. Trigger words: 视频解说, 视频旁白, 生成解说, 视频recap, video recap, voiceover, narration, auto-dub, recap.
Write a timestamped Chinese narration script (解说词 / 旁白) for an already-analyzed video, then lint/validate it. Use after video-understanding has produced agent_narration_brief.md + vlm_analysis.json, when you need to author the recap narration (style, anti-hallucination, character-count formula, density, hook/throughline). Input: the understanding index in work_dir. Output: narration.json (validated). Trigger words: 解说词, 写解说, 视频旁白, narration script, 写稿, 解说文案.
Analyze a video into a structured understanding index: scene detection, ASR transcript, per-scene visual (VLM) analysis, silence windows, a fused timeline, and a narration-writing brief. Use to understand / index / summarize what happens in a video, or as the first stage of the video-recap bundle before writing narration. Input: a video file. Output: scenes.json, asr_result.json, vlm_analysis.json, silence_periods.json, timeline_fusion.json, agent_narration_brief.md. Trigger words: 视频理解, 视频分析, 视频索引, video understanding, analyze video, 看懂视频.
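In Claude Code, a single sentence is enough to turn a video into a narrated recap. Locally you only need ffmpeg plus an API key for Xiaomi's MiMo Token Plan: no GPU, no model downloads, and it runs on macOS / Linux / Windows.
Beyond the finished video, you can also export a JianYing (剪映) draft in one click for manual polishing, with the source footage, narration, BGM, and subtitles.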
flowchart LR
research["Background research"] --> understand
video(["Video"]) --> understand["Understand<br/>Scenes · ASR · VLM"] --> script["Script<br/>Agent"] --> voiceover["Voiceover<br/>MiMo TTS"] --> assemble["Assemble<br/>Mix · Subtitles"] --> output(["Recap"])
understand -. Cut mode, cut first then narrate .-> cut["Cut<br/>Edit first"] -.-> script
classDef io fill:#eef6ff,stroke:#4f86c6,color:#1f2937;
classDef opt fill:#f3f4f6,stroke:#9ca3af,color:#374151;
class video,output io;
class research,cut opt;
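ffmpeg, and no other dependencies.
background_research.json: with it, the VLM can more easily tell who is who.
--edit-mode cut first cuts the long video into a finished edit, then the narration is written against that edit, so the timelines align naturally.
① Install the plugin by pasting this into Claude Code:
Install this plugin: https://github.com/worldwonderer/video-recap-skills
② Install ffmpeg (the pipeline itself needs no pip install; the scripts use only the standard library plus the ffmpeg on PATH, with Python 3.10+):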
brew install ffmpeg # macOS
sudo apt install ffmpeg # Debian/Ubuntu
choco install ffmpeg # Windows (or scoop / winget install ffmpeg)
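Subtitles are burned into the picture by default, which requires an ffmpeg built with libass (the subtitles filter); the packages above almost always include it. If your ffmpeg was built without libass, the run fails immediately with a hint before any work starts (you can also add --no-burn-subtitles to output an MP4 without the black subtitle bar plus an external .srt file). Run python3 scripts/recap.py --doctor to self-check.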
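The libass requirement can also be checked by hand. The following is a minimal sketch, not the plugin's own --doctor logic: it assumes that listing filters with `ffmpeg -hide_banner -filters` and looking for a `subtitles` row is a sufficient signal. The function names are hypothetical.

```python
import shutil
import subprocess


def has_subtitles_filter(filters_listing: str) -> bool:
    """Return True if an `ffmpeg -filters` listing contains the `subtitles` filter."""
    for line in filters_listing.splitlines():
        parts = line.split()
        # Filter rows have the form "<flags> <name> <io> <description>",
        # so the filter name is the second whitespace-separated column.
        if len(parts) >= 2 and parts[1] == "subtitles":
            return True
    return False


def ffmpeg_can_burn_subtitles() -> bool:
    """Query the ffmpeg found on PATH (hypothetical helper, not part of the plugin)."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False  # ffmpeg is not installed or not on PATH
    result = subprocess.run(
        [exe, "-hide_banner", "-filters"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_subtitles_filter(result.stdout)
```

If `ffmpeg_can_burn_subtitles()` returns False, either install a build that includes libass or run with --no-burn-subtitles.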
③ Configure the MiMo API key (one key drives ASR / VLM / TTS):
export MIMO_API_KEY=your-mimo-key
# tp-* Token-Plan keys connect to a cluster automatically; you can optionally pick cn | sgp | ams:
export MIMO_TOKEN_PLAN_CLUSTER=cn
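Pay-as-you-go sk-* keys default to https://api.xiaomimimo.com/v1. Everything else has a default; to configure separate keys/URLs or to change the model, voice, loudness, subtitles, and so on, see the configuration manual.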
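The key-prefix rule above can be summarized in a few lines. This is only an illustrative sketch: `describe_key_routing` is a hypothetical helper, the handling of unrecognized prefixes is an assumption, and the Token-Plan cluster URLs are deliberately left out because the text does not state them.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.xiaomimimo.com/v1"  # stated in the text for sk-* keys
CLUSTERS = ("cn", "sgp", "ams")  # the documented MIMO_TOKEN_PLAN_CLUSTER values


def describe_key_routing(env: Mapping[str, str]) -> dict:
    """Classify MIMO_API_KEY by prefix (hypothetical helper, not the plugin's code)."""
    key = env.get("MIMO_API_KEY", "")
    if not key:
        raise ValueError("MIMO_API_KEY is not set")
    if key.startswith("tp-"):
        # Token-Plan key: the cluster is optional; None means "not pinned here".
        cluster: Optional[str] = env.get("MIMO_TOKEN_PLAN_CLUSTER")
        if cluster is not None and cluster not in CLUSTERS:
            raise ValueError(f"unknown cluster {cluster!r}, expected one of {CLUSTERS}")
        return {"plan": "token-plan", "cluster": cluster, "base_url": None}
    if key.startswith("sk-"):
        return {"plan": "pay-as-you-go", "cluster": None, "base_url": DEFAULT_BASE_URL}
    # Assumption: any other prefix is treated as an error in this sketch.
    raise ValueError("unrecognized key prefix (expected tp-* or sk-*)")
```

Calling `describe_key_routing(os.environ)` after the exports above shows which path the current key falls under.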
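Hand it a video and, while you are at it, give some background on the video:
Make a narrated recap of /path/to/video.mp4. This is episode 1 of 《庆余年》, and the protagonist is 范闲.
It analyzes the video, writes narration informed by that background, and produces a subtitled recap_<name>.mp4.
Cut /path/to/long.mp4 into a narrated short of about ten minutes, with subtitles burned into the picture.
Behind the scenes, an orchestrator chains the stages together and pauses midway for the Agent to write the narration (cut mode pauses twice: first to write clip_plan.json and pick segments, then, once the edit is cut, to write narration.json against that edit). Before the first run you can self-check the environment: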
python3 skills/video-recap/scripts/recap.py --doctor
| Skill | Responsibility | Input → Output (work_dir contract) |
|---|---|---|
| video-understanding | Scene detection · frame extraction · ASR (mimo-v2.5-asr) · VLM (mimo-v2.5) · timeline fusion · brief generation (--consolidate indexing on by default) | video → scenes / asr_result / vlm_analysis / silence_periods / timeline_fusion / agent_narration_brief.md |
| video-script | Writing rules (SKILL.md) + review (LLM judge) + lint/validation | brief + index → narration.json |
| video-cut | Clip plan → cut and assemble the edit (cut mode cuts first and narrates afterwards; narration is written on the edit's timeline, so no remapping is needed) | clip_plan.json + video → edited_source.mp4 |
| video-voiceover | Synthesize narration audio (MiMo TTS, mimo-v2.5-tts) | narration.json → tts_segments/ + tts_meta.json |
| video-assemble | Mixing · ducking the original audio · subtitle rendering · multi-track timeline (optional JianYing export) | video + tts_meta → recap_<name>.mp4 + subtitles.srt/.ass + timeline.json |
| video-recap | Orchestrator + --doctor | video → recap_<name>.mp4 |
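Because each stage communicates only through files in work_dir, the contract in the table can be inspected with a short script. This is a sketch under assumptions: the file names come from the skill descriptions in this document, `missing_outputs` is a hypothetical helper, and the cut-mode artifacts are omitted because they exist only in cut mode.

```python
from pathlib import Path

# Expected files per stage, taken from the skill descriptions in this document.
STAGE_OUTPUTS = {
    "video-understanding": [
        "scenes.json",
        "asr_result.json",
        "vlm_analysis.json",
        "silence_periods.json",
        "timeline_fusion.json",
        "agent_narration_brief.md",
    ],
    "video-script": ["narration.json"],
    "video-voiceover": ["tts_meta.json"],
}


def missing_outputs(work_dir) -> dict:
    """Return {stage: [missing file names]} for stages that are not yet complete."""
    root = Path(work_dir)
    report = {}
    for stage, names in STAGE_OUTPUTS.items():
        missing = [name for name in names if not (root / name).exists()]
        if missing:
            report[stage] = missing
    return report
```

An empty result means every listed artifact is present; otherwise the first stage with missing files is the natural place to resume.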
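recap_<video>.mp4: the finished video (fixed output name, overwritten in place on every run, so iterating on the narration refreshes the same file).
subtitles.srt (subtitles are burned in by default and subtitles.ass is produced as well; --no-burn-subtitles turns burning off)
work_dir/narration.json: the narration script (narration_lint.json holds timing diagnostics, narration_review.md holds review comments)
work_dir/agent_narration_brief.md: the timing and scene brief for the Agent
work_dir/vlm_analysis.json · asr_result.json · silence_periods.json · timeline_fusion.json: understanding artifacts
work_dir/clip_plan.json · edited_source.mp4 · recap_phase.json: cut-mode artifacts (narration is written on the edit's timeline; recap_phase.json records cut/voiceover progress so a run can resume from a checkpoint)
work_dir/timeline.json · work_dir/assembly_manifest.json · tts_segments/ · tts_meta.json: multi-track timeline, render record, and TTS audio
In the stretches of original audio between narration blocks, the original dialogue is burned in as subtitles (set off from the narration with 「」). By default these subtitles are proofread by the Agent with ASR as the fallback, but ASR timing is coarse and occasionally drifts from the original audio. For better accuracy, drop a subtitle file into work_dir and it becomes the preferred source: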
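work_dir/user_subtitles.json: [{"start": seconds, "end": seconds, "text": "dialogue line"}], used directly on the edit's timeline; or wrap it as {"timeline": "source", "lines": [...]} to use the source timeline, and the system maps it onto the edit automatically according to the clip plan.
work_dir/user_subtitles.srt / .ass: parsed on the source timeline by default and mapped onto the edit.
Priority: your subtitle file > the Agent-proofread original_subtitles.json > ASR fallback. When the source is accurate, each line lands precisely in its matching gap instead of relying on rough timing estimates.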
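To make the source-to-edit mapping concrete, here is a minimal sketch of how subtitle lines on the source timeline could be shifted onto the edit's timeline. It is not the plugin's implementation: the clip plan is assumed to be an ordered list of kept (start, end) source ranges in seconds, which is a hypothetical simplification of clip_plan.json, and `map_subtitles` is an illustrative name.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (source_start, source_end) of one kept clip, in seconds


def map_subtitles(lines: List[Dict], clips: List[Range]) -> List[Dict]:
    """Shift subtitle lines from the source timeline onto the concatenated edit."""
    mapped = []
    for line in lines:
        offset = 0.0  # where the current clip begins on the edit's timeline
        for clip_start, clip_end in clips:
            # Keep only the part of the line that overlaps this kept range.
            start = max(line["start"], clip_start)
            end = min(line["end"], clip_end)
            if start < end:
                mapped.append({
                    "start": offset + start - clip_start,
                    "end": offset + end - clip_start,
                    "text": line["text"],
                })
                break  # a line is placed once, in the first clip it overlaps
            offset += clip_end - clip_start
    return mapped  # lines that fall entirely in removed footage are dropped
```

With clips `[(10.0, 20.0), (50.0, 60.0)]`, a line at 52 to 55 seconds in the source lands at 12 to 15 seconds in the edit, and a line at 30 to 31 seconds is dropped because that footage was cut.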
npx claudepluginhub worldwonderer/video-recap-skills