From ai-video-producer
Reformat an existing script for a specific text-to-speech model (ElevenLabs, OpenAI TTS, Google, Azure, Hume, Chatterbox, etc.). Splits VO into model-friendly chunks, inserts pacing/emphasis/SSML markers where the model supports them, strips bracketed visual direction, normalises numbers/abbreviations the way the chosen voice expects, and emits a clean `scripts/tts/<model>/<NN>.txt` (or `.ssml`) file per beat. Use when the user has a locked script and is about to generate VO.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin ai-video-producer

This skill uses the workspace's default tool permissions.
You take a finished script and produce per-model, per-line TTS input files.
Inputs:

- `scripts/final/script.md` (or a draft path the user passes).
- Target model: `elevenlabs`, `openai-tts`, `google-tts`, `azure-tts`, `hume`, or `chatterbox`.
- `brief/tools-and-models.md`, if listed.

| Model | Format | Notable conventions |
|---|---|---|
| ElevenLabs | plain text + inline `<break time="500ms"/>` | Keep chunks ≤ 5000 chars per request. Don't over-use breaks; the v3 models pace naturally. |
| OpenAI TTS | plain text | No SSML. Use punctuation for pacing. ≤ 4096 chars per request. |
| Google Cloud TTS | SSML | Full SSML supported: `<break>`, `<emphasis>`, `<prosody rate="…" pitch="…">`. |
| Azure (neural) | SSML with `<voice>` + `mstts:express-as` styles | Style + degree work for some voices; check the voice's supported styles. |
| Hume | plain text + emotional context note | Hume responds to expressed emotion in the prompt itself. |
| Chatterbox | plain text + reference voice | No SSML; rely on punctuation and the reference clip for prosody. |
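
The per-request limits in the table can be enforced with a small chunker. The following is a minimal sketch, not the plugin's actual implementation: `CHAR_LIMITS` and `chunk_text` are hypothetical names, and only the two limits stated in the table are included. It breaks at sentence boundaries and hard-splits only when a single sentence exceeds the limit.

```python
import re

# Limits copied from the table above; models with no stated limit are left out
# on purpose, so a caller has to choose a limit for them explicitly.
CHAR_LIMITS = {"elevenlabs": 5000, "openai-tts": 4096}


def chunk_text(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` characters, at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Last resort: a single sentence longer than the limit is cut mid-text.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text(vo_text, CHAR_LIMITS["openai-tts"])` would keep every chunk within the OpenAI limit listed above, where `vo_text` stands for the voice-over text of one beat.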
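
- Strip bracketed visual direction (e.g. `[wide shot of skyline]`).
- Expand abbreviations (`API` → keep, `e.g.` → "for example") if the model is known to mangle them.
- Normalise numbers (`2026` → "twenty twenty-six"; usually unnecessary for ElevenLabs but required for some Azure voices).
- For SSML models, write `scripts/tts/<model>/<NN>-<slug>.ssml`.
- For plain-text models, write `scripts/tts/<model>/<NN>-<slug>.txt`.
- Write `scripts/tts/<model>/INDEX.md` listing each file, character count, estimated duration (chars ÷ ~14 per second, or about 70 ms per character, for a typical speech rate), and the voice ID to use.
- When one beat is split into several chunks, suffix the beat number with a letter: `01a`, `01b`.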
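
The clean-up and bookkeeping steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the plugin's code: the function names are hypothetical, and the default rate of 14 characters per second is an assumed typical speech rate, exposed as a parameter so it can be tuned per voice.

```python
import re
import string


def strip_visual_direction(line: str) -> str:
    """Remove [bracketed] direction and collapse the whitespace left behind."""
    without = re.sub(r"\[[^\]]*\]", "", line)
    return re.sub(r"\s{2,}", " ", without).strip()


def estimate_seconds(text: str, chars_per_second: float = 14.0) -> float:
    """Rough duration estimate; the default rate is an assumption, not a measurement."""
    return round(len(text) / chars_per_second, 1)


def chunk_ids(beat: int, count: int) -> list[str]:
    """Return '01' for an unsplit beat, or '01a', '01b', ... for a split one.

    Supports at most 26 chunks per beat, since suffixes are single letters.
    """
    if count == 1:
        return [f"{beat:02d}"]
    return [f"{beat:02d}{string.ascii_lowercase[i]}" for i in range(count)]
```

A caller could combine these per beat: strip the direction, chunk the result, then record each chunk's ID, character count, and `estimate_seconds` value in `INDEX.md`.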