video-recap-skills

Name: video-recap-skills
Author: worldwonderer


Automatically generate Chinese-narrated video recaps from any source video using a single MiMo API key and ffmpeg. The pipeline analyzes video content (scene detection, ASR, visual understanding), writes timestamped narration scripts, cuts and assembles clips, synthesizes Chinese TTS voiceover, and produces a final recap video with subtitles and loudness normalization.

npx claudepluginhub worldwonderer/video-recap-skills

Popularity

Stars: 287 (Top 5%; median 0, average 283)
Installs: median 0, average 1
Forks: Top 5% (median 0, average 36)

Health & Quality

Maintenance: Excellent, 10.0/10 (Top 25%; median 7/10, average 7.4/10)
Community: 42% (median 42%, average 42.1%)

What's Inside

Skills: 6

video-assemble

/video-assemble

Assemble a final recap video: mux narration audio over the source video, duck the original audio under the narration, render subtitles (SRT/ASS, optionally burned in), and loudness-normalize. Use as the last stage of the video-recap bundle. Consumes the source video + tts_meta.json (+ narration placement); produces recap_<name>.mp4 + subtitles.srt/.ass. Trigger words: 视频合成, 混音, 字幕, 压字幕, assemble video, mux, ducking, subtitles, 成片.

video-cut

/video-cut

Cut a long video down to selected source ranges (montage / clip assembly). Part of the video-recap bundle: in the orchestrated (two-pass) flow, it consumes clip_plan.json + the source video and produces edited_source.mp4; the agent then writes narration.json against the output timeline. When invoked standalone WITHOUT --no-narration-map, it also remaps an existing narration.json → narration_mapped.json (legacy single-pass path). Trigger words: 视频剪辑, 剪辑式解说, video cut, clip plan, 拼剪.

video-recap

/video-recap

Generate a Chinese-narration recap video from an input video, end to end. Use when the user gives a video file (.mp4 / .mov / .mkv / .webm) and asks to add narration, generate voiceover, dub, summarize, or produce a recap (short dramas / TV series / films / documentaries / popular science). Orchestrates the video-* skill bundle: understanding → (agent writes narration) → cut → voiceover → assemble. Trigger words: 视频解说, 视频旁白, 生成解说, 视频recap, video recap, voiceover, narration, auto-dub, recap.

video-script

/video-script

Write a timestamped Chinese narration script (commentary / voiceover) for an already-analyzed video, then lint/validate it. Use after video-understanding has produced agent_narration_brief.md + vlm_analysis.json, when you need to author the recap narration (style, anti-hallucination, character-count formula, density, hook/throughline). Input: the understanding index in work_dir. Output: narration.json (validated). Trigger words: 解说词, 写解说, 视频旁白, narration script, 写稿, 解说文案.

video-understanding

/video-understanding

Analyze a video into a structured understanding index: scene detection, ASR transcript, per-scene visual (VLM) analysis, silence windows, a fused timeline, and a narration-writing brief. Use to understand / index / summarize what happens in a video, or as the first stage of the video-recap bundle before writing narration. Input: a video file. Output: scenes.json, asr_result.json, vlm_analysis.json, silence_periods.json, timeline_fusion.json, agent_narration_brief.md. Trigger words: 视频理解, 视频分析, 视频索引, video understanding, analyze video, 看懂视频.

Stats

Version: 0.3.0
Released: Jun 19, 2026
Language: Python
Stars: 287
Forks: 49
Maintenance: Excellent
License: MIT
Last Commit: Jun 19, 2026
Added: Jun 14, 2026


README

video-recap-skills


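In Claude Code, a single sentence is enough to turn a video into a narrated recap. Locally you only need ffmpeg plus an API key from Xiaomi's MiMo Token Plan: no GPU, no model downloads, and it runs on macOS, Linux, and Windows.

Demo

Besides the finished video, you can export a Jianying (剪映) draft in one click for manual fine-tuning, covering the source video, narration, BGM, and subtitles.

What it is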

flowchart LR
    research["Background research"] --> understand
    video(["Video"]) --> understand["Understand<br/>scenes · ASR · VLM"] --> script["Script<br/>Agent"] --> voiceover["Voiceover<br/>MiMo TTS"] --> assemble["Assemble<br/>mixing · subtitles"] --> output(["Recap"])
    understand -. cut mode, cut first then narrate .-> cut["Cut<br/>edit down first"] -.-> script
    classDef io fill:#eef6ff,stroke:#4f86c6,color:#1f2937;
    classDef opt fill:#f3f4f6,stroke:#9ca3af,color:#374151;
    class video,output io;
    class research,cut opt;

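Why use it

One key runs the whole pipeline. ASR, VLM, and TTS all go through Xiaomi MiMo; locally there are no dependencies other than ffmpeg.
Research first when it is called for. When the title or plot is clearly identified, or the brief flags the material as thin, character relationships and plot background are saved to background_research.json so the VLM can more easily tell who is who.
Narration comes in blocks, and so does the original audio. Narration runs as continuous segments and each block is voiced in a single pass; the gaps between blocks play the best original audio back in full at full volume, at roughly a 70/30 split.
Cut first, then narrate, so the picture lines up. --edit-mode cut first cuts the long video down to the final edit, then the narration is written against that edit, so the timelines align by construction.
Keep editing in Jianying. Optionally export a multi-track Jianying draft, with the source video, narration, BGM, and subtitles each on its own track.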

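Installation

① Install the plugin by pasting this into Claude Code:

Install this plugin: https://github.com/worldwonderer/video-recap-skills

② Install ffmpeg (the pipeline itself needs no pip install; the scripts use only the standard library plus the ffmpeg on PATH, with Python 3.10+):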

brew install ffmpeg                        # macOS
sudo apt install ffmpeg                     # Debian/Ubuntu
choco install ffmpeg                        # Windows (or scoop / winget install ffmpeg)

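Subtitles are burned into the picture by default, which requires an ffmpeg built with libass (the subtitles filter); the packages above nearly always include it. If your ffmpeg was built without libass, the run fails immediately at startup with a hint (you can also pass --no-burn-subtitles to output an MP4 without the black subtitle bar plus an external .srt file). Run python3 scripts/recap.py --doctor to self-check.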
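The libass requirement can also be checked by hand. The following is a minimal sketch, not the plugin's own `--doctor` logic: it assumes only that `ffmpeg -filters` lists one filter per row with the filter name in the second column, and the helper names `has_filter` and `ffmpeg_can_burn_subtitles` are hypothetical.

```python
import shutil
import subprocess


def has_filter(filters_output: str, name: str) -> bool:
    """Return True if `name` is listed as a filter in `ffmpeg -filters` output."""
    for line in filters_output.splitlines():
        parts = line.split()
        # Filter rows look like: " T.. subtitles   V->V   Render text subtitles ..."
        # The first column holds flags and the second column is the filter name.
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


def ffmpeg_can_burn_subtitles() -> bool:
    """Check whether the ffmpeg on PATH exposes the libass-backed `subtitles` filter."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False  # ffmpeg is not installed or not on PATH
    result = subprocess.run(
        [exe, "-hide_banner", "-filters"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_filter(result.stdout, "subtitles")
```

If `ffmpeg_can_burn_subtitles()` returns `False`, either install a build with libass or run with `--no-burn-subtitles`.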

③ Configure the MiMo API key (one key drives ASR / VLM / TTS):

export MIMO_API_KEY=your-mimo-key
# Token-Plan keys (tp-*) connect to a cluster automatically; optionally pick cn | sgp | ams:
export MIMO_TOKEN_PLAN_CLUSTER=cn

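Pay-as-you-go sk-* keys default to https://api.xiaomimimo.com/v1. Everything else has defaults; to configure keys/URLs separately or to change the model, voice, loudness, subtitles, and so on, see the configuration guide.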
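The key-prefix routing described above could be sketched as follows. This is an illustration under stated assumptions rather than the plugin's actual code: the function name `describe_key` and the returned dictionary shape are hypothetical, rejecting unknown prefixes is an assumption, and because the README does not give the Token-Plan cluster URLs, the sketch returns only the cluster name.

```python
import os

PAYG_BASE_URL = "https://api.xiaomimimo.com/v1"  # stated in the README
CLUSTERS = ("cn", "sgp", "ams")                   # stated in the README


def describe_key(env) -> dict:
    """Classify MIMO_API_KEY by prefix (hypothetical helper, not the plugin's API)."""
    key = env.get("MIMO_API_KEY", "")
    if not key:
        raise ValueError("MIMO_API_KEY is not set")
    if key.startswith("tp-"):
        # Token-Plan key: the cluster is optional; None means "let the tool choose".
        cluster = env.get("MIMO_TOKEN_PLAN_CLUSTER")
        if cluster is not None and cluster not in CLUSTERS:
            raise ValueError(f"unknown cluster {cluster!r}; expected one of {CLUSTERS}")
        return {"kind": "token-plan", "cluster": cluster}
    if key.startswith("sk-"):
        # Pay-as-you-go key: uses the default base URL.
        return {"kind": "pay-as-you-go", "base_url": PAYG_BASE_URL}
    # Assumption: any other prefix is treated as a configuration error.
    raise ValueError("unrecognized key prefix; expected tp-* or sk-*")


if __name__ == "__main__":
    try:
        print(describe_key(os.environ))
    except ValueError as exc:
        print(f"configuration problem: {exc}")
```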

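Usage

Hand it a video and add a little background about it:

Make a narrated recap of /path/to/video.mp4. This is episode 1 of 《庆余年》 (Joy of Life); the protagonist is Fan Xian (范闲).

It analyzes the video, writes narration informed by that background, and produces a subtitled recap_<name>.mp4.

Cut /path/to/long.mp4 down to a narrated short of about ten minutes, with subtitles burned into the picture.

Behind the scenes, an orchestrator runs the stages in sequence and pauses in the middle for the Agent to write the narration (cut mode pauses twice: first to write clip_plan.json and pick the clips, then, once the edit has been cut, to write narration.json against that edit). Before the first run, you can self-check the environment: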

python3 skills/video-recap/scripts/recap.py --doctor

Architecture

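Skill	Responsibility	Input → Output (`work_dir` contract)
video-understanding	Scene detection · frame extraction · ASR (`mimo-v2.5-asr`) · VLM (`mimo-v2.5`) · timeline fusion · brief generation (`--consolidate` indexing is on by default)	`video` → `scenes / asr_result / vlm_analysis / silence_periods / timeline_fusion / agent_narration_brief.md`
video-script	Writing rules (SKILL.md) + review (LLM judge) + lint/validation	`brief + index` → `narration.json`
video-cut	Clip plan → cut and join the edit (cut mode cuts first and narrates afterwards; narration is written on the edited timeline, so no remapping is needed)	`clip_plan.json + video` → `edited_source.mp4`
video-voiceover	Synthesize narration audio (MiMo TTS, `mimo-v2.5-tts`)	`narration.json` → `tts_segments/ + tts_meta.json`
video-assemble	Mixing · ducking the original audio · subtitle rendering · multi-track timeline (optional Jianying export)	`video + tts_meta` → `recap_<name>.mp4 + subtitles.srt/.ass + timeline.json`
video-recap	Orchestrator + `--doctor`	`video` → `recap_<name>.mp4`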
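Because the stages communicate through files in `work_dir`, a quick way to see how far a run got is to check which outputs exist. The sketch below is not part of the plugin: the helper `missing_outputs` is hypothetical, the file names are taken from the table and skill descriptions, and it assumes every listed file sits directly in `work_dir`.

```python
from pathlib import Path

# Expected outputs per stage, using file names from the README.
# Assumption: all of these live directly inside work_dir.
STAGE_OUTPUTS = {
    "video-understanding": [
        "scenes.json",
        "asr_result.json",
        "vlm_analysis.json",
        "silence_periods.json",
        "timeline_fusion.json",
        "agent_narration_brief.md",
    ],
    "video-script": ["narration.json"],
    "video-cut": ["edited_source.mp4"],  # only produced in cut mode
    "video-voiceover": ["tts_meta.json"],
}


def missing_outputs(work_dir) -> dict:
    """Map each stage to the expected files that are absent from work_dir."""
    root = Path(work_dir)
    report = {}
    for stage, names in STAGE_OUTPUTS.items():
        missing = [name for name in names if not (root / name).exists()]
        if missing:
            report[stage] = missing
    return report
```

A stage that does not appear in the returned dictionary has all of its listed outputs in place; `video-cut` will always be reported as missing when the run is not in cut mode.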

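Output

recap_<video>.mp4: the final video (fixed output name; each run overwrites it in place, so iterating on the narration refreshes the same file).
subtitles.srt: subtitles are burned in by default and subtitles.ass is produced alongside; --no-burn-subtitles turns burning off.
work_dir/narration.json: the narration script (with narration_lint.json for timing diagnostics and narration_review.md for review comments)
work_dir/agent_narration_brief.md: the timing and scene brief for the Agent
work_dir/vlm_analysis.json · asr_result.json · silence_periods.json · timeline_fusion.json: understanding artifacts
work_dir/clip_plan.json · edited_source.mp4 · recap_phase.json: cut-mode artifacts (narration is written on the edited timeline; recap_phase.json records cut/voiceover progress so an interrupted run can resume)
work_dir/timeline.json · work_dir/assembly_manifest.json · tts_segments/ · tts_meta.json: the multi-track timeline, render record, and TTS audio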

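Bring your own original-audio subtitles (optional, more accurate)

In the original-audio gaps between narration blocks, the original dialogue is burned in as subtitles (set apart from narration with 「」). By default these subtitles are proofread by the Agent with ASR as the fallback, but ASR timing is coarse and occasionally drifts from the original audio. For better accuracy, drop a subtitle file into work_dir and it will be used as the preferred source:

work_dir/user_subtitles.json: [{"start": seconds, "end": seconds, "text": "line"}], used directly on the edited timeline; or wrap it as {"timeline": "source", "lines": [...]} to use the source timeline, and the system maps it onto the edit automatically according to the clip plan.
work_dir/user_subtitles.srt / .ass: parsed on the source timeline by default and mapped onto the edit.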
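To make the source-to-edit mapping concrete, here is a minimal sketch of one way it can work. It is not the plugin's implementation: it assumes the clip plan reduces to an ordered list of kept `(source_start, source_end)` ranges that are concatenated to form the edit (the real clip_plan.json schema is not shown in this README), and `map_to_edit` is a hypothetical name.

```python
def map_to_edit(lines, clips):
    """Map subtitle lines from source time to edited time.

    lines: list of {"start": float, "end": float, "text": str} on the source timeline.
    clips: ordered list of (source_start, source_end) ranges kept in the edit
           (assumed shape; the real clip plan format may differ).
    """
    mapped = []
    offset = 0.0  # where the current clip begins on the edited timeline
    for clip_start, clip_end in clips:
        for line in lines:
            # Keep only the part of the line that falls inside this clip.
            start = max(line["start"], clip_start)
            end = min(line["end"], clip_end)
            if start < end:
                mapped.append({
                    "start": offset + (start - clip_start),
                    "end": offset + (end - clip_start),
                    "text": line["text"],
                })
        offset += clip_end - clip_start
    return mapped


clips = [(10.0, 20.0), (40.0, 50.0)]
lines = [
    {"start": 12.0, "end": 14.0, "text": "first"},
    {"start": 30.0, "end": 32.0, "text": "cut away"},
    {"start": 41.0, "end": 43.5, "text": "second"},
]
for item in map_to_edit(lines, clips):
    print(item)
```

Lines that fall entirely outside the kept ranges are dropped, and a line that straddles a cut is trimmed to the part that survives. In this sketch a line spanning two kept clips would appear once per clip.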

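Priority: your subtitle file › Agent-proofread original_subtitles.json › ASR fallback. When the source is accurate, each line lands precisely in its corresponding gap instead of relying on rough time estimates.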
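The priority order can be read as a first-match lookup over candidate files. The sketch below is illustrative only: `pick_subtitle_source` is a hypothetical helper, the relative order of the three user file formats is an assumption, and so is the placement of original_subtitles.json and the use of asr_result.json as the ASR fallback inside work_dir.

```python
from pathlib import Path

# Candidates in priority order: user file, then Agent-proofread, then ASR.
# Assumptions: json is tried before srt and ass, and every file sits in work_dir.
CANDIDATES = [
    ("user", "user_subtitles.json"),
    ("user", "user_subtitles.srt"),
    ("user", "user_subtitles.ass"),
    ("agent", "original_subtitles.json"),
    ("asr", "asr_result.json"),
]


def pick_subtitle_source(work_dir):
    """Return (kind, path) for the highest-priority subtitle source, or None."""
    root = Path(work_dir)
    for kind, name in CANDIDATES:
        path = root / name
        if path.is_file():
            return kind, path
    return None
```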

Reference docs
