From media-fx
Automates raw video editing: removes silences, retakes, duplicates, and garbage segments via Whisper STT analysis; optionally replaces narration with ElevenLabs TTS. Trims silence dynamically using Korean text analysis. Intended for screen recordings and lectures.
npx claudepluginhub dandacompany/dantelabs-agentic-school --plugin media-fx
This skill uses the workspace's default tool permissions.
Automated video editing pipeline that uses Whisper speech-to-text to detect and remove silence gaps, retakes/duplicates, and garbage segments. Optionally replaces narration with AI-generated TTS audio and trims silence dynamically.
# Install dependencies
brew install ffmpeg
pip install -r ~/.claude/skills/video-editor/scripts/requirements.txt
# Load ElevenLabs credentials (needed for TTS workflows)
source ~/.claude/auth/elevenlabs.env
# Main script
~/.claude/skills/video-editor/scripts/video_editor.py
Standard editing workflow to remove silence, retakes, and garbage from recordings.
SCRIPT="$HOME/.claude/skills/video-editor/scripts/video_editor.py"
# Step 1: Analyze
python $SCRIPT analyze recording.mp4
# Step 2: Review *_edit_plan.json, then execute
python $SCRIPT execute recording.mp4 recording_edit_plan.json
Replace original narration with AI-generated TTS voice, then trim excess silence.
SCRIPT="$HOME/.claude/skills/video-editor/scripts/video_editor.py"
source ~/.claude/auth/elevenlabs.env
# Steps 1-2: Same as Workflow A
python $SCRIPT analyze recording.mp4
python $SCRIPT execute recording.mp4 recording_edit_plan.json
# Step 3: Prepare TTS segments (maps timestamps, filters garbage)
python $SCRIPT tts-prepare recording_edited.mp4 recording_whisper.json recording_edit_plan.json
# Step 4: Review *_tts_segments.json for text errors, then generate TTS
python $SCRIPT tts-generate recording_edited.mp4 recording_edited_tts_segments.json \
--voice-id YOUR_VOICE_ID
# Step 5: Trim silence dynamically
python $SCRIPT trim-silence recording_edited_tts.mp4 \
recording_edited_tts_segments.json recording_edited_tts/
Transcribe video with Whisper and generate an edit plan.
python scripts/video_editor.py analyze INPUT_VIDEO [OPTIONS]
Output files (saved alongside the input):
*_whisper.json - Full Whisper transcription with timestamps
*_analysis.json - Detailed analysis (gaps, duplicates, garbage)
*_edit_plan.json - KEEP/REMOVE segment list

Key parameters:
-m, --whisper-model: Whisper model size (default: medium)
-l, --language: Language code (default: ko)
--silence-threshold: Auto-remove silences above this duration in seconds (default: 10.0)
--min-gap: Minimum gap to report, in seconds (default: 2.0)

Apply the edit plan to produce the edited video.
python scripts/video_editor.py execute INPUT_VIDEO EDIT_PLAN.json [OPTIONS]
Key parameters:
-o, --output: Output file path (default: *_edited.ext)
--skip-denoise: Skip audio denoising
-s, --denoise-strength: Spectral gating strength 0.0-1.0 (default: 0.4)
--highpass/--lowpass: Bandpass filter range in Hz (default: 80-13000)

Map Whisper segments to edited video timestamps and prepare for TTS generation.
python scripts/video_editor.py tts-prepare EDITED_VIDEO WHISPER.json EDIT_PLAN.json [OPTIONS]
This command maps Whisper segments to the edited timeline, filters out garbage segments, and writes *_tts_segments.json for review.
Key parameters:
--corrections: JSON file with text corrections ({"wrong": "correct", ...})

Output: *_tts_segments.json - Array of segments with edited timestamps and cleaned text.
Review this file before proceeding to fix Whisper transcription errors.
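The timestamp remapping that tts-prepare performs can be sketched in Python. This is a minimal illustration, not the script's actual implementation; remap_to_edited is a hypothetical helper name, and the plan shape assumed is the KEEP/REMOVE format this document describes.

```python
def remap_to_edited(t, plan):
    """Map a timestamp from the original video onto the edited timeline,
    given an edit plan of KEEP/REMOVE segments sorted by start time."""
    edited = 0.0  # seconds of kept footage before the current plan segment
    for seg in plan:
        if seg["action"] != "KEEP":
            continue
        if t < seg["start"]:
            return edited                       # t fell in a removed region; snap forward
        if t <= seg["end"]:
            return edited + (t - seg["start"])  # t lies inside this kept segment
        edited += seg["end"] - seg["start"]
    return edited                               # t was past the last kept segment
```

For example, with a plan of KEEP 0-56.48, REMOVE 56.48-77.52, KEEP 77.52-189.46, an original timestamp of 100.0 maps to about 78.96 on the edited timeline.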
Generate TTS audio using ElevenLabs API and create video with new narration.
python scripts/video_editor.py tts-generate EDITED_VIDEO SEGMENTS.json --voice-id VOICE_ID [OPTIONS]
This command generates one TTS audio clip per segment via the ElevenLabs API and assembles a video with the new narration.
Key parameters:
--voice-id (required): ElevenLabs voice ID
--tts-model: ElevenLabs model (default: eleven_multilingual_v2)
--stability: Voice stability 0.0-1.0 (default: 0.5)
--similarity-boost: Similarity boost 0.0-1.0 (default: 0.8)
--style: Style exaggeration 0.0-1.0 (default: 0.3)
--force: Re-generate all TTS files (ignores cache)
--skip-denoise: Skip audio denoising
-s, --denoise-strength: Denoise strength (default: 0.4)

Output files:
*_tts/tts_NNN.mp3 - Individual TTS audio files (cached for re-runs)
*_tts.mp4 - Video with TTS narration

Caching: Successfully generated TTS files are reused on re-run. Use --force to regenerate.
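The cache-skip behavior can be sketched as below. This is an assumption-laden illustration, not the script's real code; segments_to_generate is a hypothetical helper, and only the tts_NNN.mp3 naming comes from this document.

```python
from pathlib import Path

def segments_to_generate(segments, tts_dir, force=False):
    """Return (index, segment) pairs that still need TTS generation.
    Cached tts_NNN.mp3 files are skipped unless force is set."""
    pending = []
    for i, seg in enumerate(segments):
        path = Path(tts_dir) / f"tts_{i:03d}.mp3"
        if force or not path.exists():
            pending.append((i, seg))
    return pending
```

On a re-run, only segments whose mp3 is missing (e.g. after a failed API call) are regenerated.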
Authentication: Requires ELEVENLABS_API_KEY environment variable. Load with source ~/.claude/auth/elevenlabs.env.
Trim silence from video with dynamic or fixed caps. Designed for TTS videos where silence gaps between segments need to be reduced.
python scripts/video_editor.py trim-silence TTS_VIDEO SEGMENTS.json TTS_DIR/ [OPTIONS]
Dynamic mode (default): Analyzes Korean text endings to set an appropriate silence cap per segment:
Sentence endings (., !, ?): 0.5s cap
Comma/connector endings: 0.3s cap
Continuing phrases: 0.15s cap

Fixed mode: Applies a uniform silence cap to all gaps.
Key parameters:
--mode: dynamic (default) or fixed
--cap: Fixed silence cap in seconds (default: 0.5)
--cap-sentence: Dynamic sentence-ending cap (default: 0.5)
--cap-comma: Dynamic comma-connector cap (default: 0.3)
--cap-continue: Dynamic continuing-phrase cap (default: 0.15)

Examples:
# Dynamic mode (recommended for Korean)
python $SCRIPT trim-silence video_tts.mp4 segments.json tts_dir/
# Fixed mode with 0.5s cap
python $SCRIPT trim-silence video_tts.mp4 segments.json tts_dir/ --mode fixed --cap 0.5
# Tighter dynamic caps for fast-paced content
python $SCRIPT trim-silence video_tts.mp4 segments.json tts_dir/ \
--cap-sentence 0.3 --cap-comma 0.2 --cap-continue 0.1
Whisper may transcribe Korean words incorrectly. Create a corrections JSON file to fix recurring errors:
{
"프론프트": "프롬프트",
"핵심 기동": "핵심 기둥",
"파이성": "파이썬"
}
Apply during tts-prepare:
python $SCRIPT tts-prepare video.mp4 whisper.json plan.json --corrections fixes.json
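Applying such a corrections map amounts to simple string replacement over each segment's text. A minimal sketch, assuming the {"wrong": "correct"} file shape above; apply_corrections is a hypothetical helper, not the script's API.

```python
import json

def apply_corrections(text, corrections):
    """Replace each recurring mis-transcription with its correction."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text

# Corrections in the same shape as the fixes.json example above
fixes = json.loads('{"프론프트": "프롬프트", "파이성": "파이썬"}')
```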
Edit plans are JSON arrays of KEEP/REMOVE segments, e.g.:
[
{"action": "KEEP", "start": 0.0, "end": 56.48, "note": ""},
{"action": "REMOVE", "start": 56.48, "end": 77.52, "note": "무음 21.0초"},
{"action": "KEEP", "start": 77.52, "end": 189.46, "note": ""}
]
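A quick sanity check over a plan in this shape can catch gaps or overlaps before executing cuts. This is a sketch under the assumptions above; check_plan is a hypothetical helper, not part of the script.

```python
def check_plan(plan):
    """Verify the plan covers the timeline contiguously, then return
    the total duration of KEEP segments (the edited video's length)."""
    for prev, cur in zip(plan, plan[1:]):
        # Adjacent segments must share a boundary (within float tolerance)
        assert abs(prev["end"] - cur["start"]) < 1e-6, f"gap/overlap at {prev['end']}"
    return sum(s["end"] - s["start"] for s in plan if s["action"] == "KEEP")
```

For the example plan above, the edited video keeps 56.48s + 111.94s = 168.42s of footage.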
Gaps between Whisper segments exceeding --silence-threshold (default 10s) are automatically marked as REMOVE.
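The gap rule can be expressed compactly. A sketch only, assuming Whisper segments carry start/end seconds; find_silence_removals is a hypothetical name, and the 무음 ("silence") note string merely mirrors the example plan's format.

```python
def find_silence_removals(segments, threshold=10.0):
    """Emit REMOVE entries for gaps between consecutive Whisper
    segments that exceed the silence threshold (seconds)."""
    removals = []
    prev_end = None
    for seg in segments:
        if prev_end is not None:
            gap = seg["start"] - prev_end
            if gap > threshold:
                removals.append({
                    "action": "REMOVE",
                    "start": prev_end,
                    "end": seg["start"],
                    "note": f"무음 {gap:.1f}초",  # "silence N.N s"
                })
        prev_end = seg["end"]
    return removals
```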
Segments sharing the same opening text (first 15 characters) are identified as retakes. The plan keeps only the last occurrence.
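The keep-last-occurrence rule can be sketched as follows, assuming segments carry a "text" field; drop_retakes is a hypothetical helper, not the script's actual function.

```python
def drop_retakes(segments, prefix_len=15):
    """Among segments sharing the same opening text (first prefix_len
    characters), keep only the last occurrence."""
    last = {}
    for i, seg in enumerate(segments):
        last[seg["text"][:prefix_len]] = i  # later retakes overwrite earlier ones
    return [segments[i] for i in sorted(last.values())]
```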
Garbage segments are detected by: duration under 0.3s, empty text, or Whisper hallucination patterns.
For trim-silence dynamic mode, segment endings are classified into three tiers using Korean linguistic patterns: sentence endings, comma/connector endings, and continuing phrases.
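A simplified classifier illustrates the tier choice; the real script presumably uses richer Korean patterns than bare punctuation, and silence_cap is a hypothetical helper. Default caps match the --cap-sentence/--cap-comma/--cap-continue defaults.

```python
def silence_cap(text, cap_sentence=0.5, cap_comma=0.3, cap_continue=0.15):
    """Pick a per-segment silence cap from how the segment's text ends."""
    t = text.rstrip()
    if t.endswith((".", "!", "?")):   # full sentence ending
        return cap_sentence
    if t.endswith(","):               # comma / connector pause
        return cap_comma
    return cap_continue               # phrase continues into the next segment
```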
Video cuts use ffmpeg stream copy (-c copy); skipping re-encoding preserves the original quality.
TTS audio is cached in the *_tts/ directory; re-running tts-generate skips existing files.
Quota costs: 1 character of text consumes roughly 1 character of ElevenLabs quota. Estimate: 30 chars/segment x 100 segments = 3,000 chars.
Models:
eleven_multilingual_v2 - High quality, Korean support (default)
eleven_turbo_v2_5 - Faster, slightly lower quality

Rate limiting: Automatic retry with exponential backoff (3 retries, 5s/10s/20s waits).
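The retry schedule can be sketched generically. A minimal illustration of 3 retries with doubling waits, not the script's actual code; with_retries is a hypothetical wrapper around any API call.

```python
import time

def with_retries(fn, retries=3, base_wait=5.0):
    """Call fn, retrying on failure with doubling waits
    (5s, 10s, 20s with the defaults) before giving up."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise               # out of retries; surface the error
            time.sleep(base_wait * (2 ** attempt))
```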