From claude-blog
Generates audio narration of blog posts using Google Gemini TTS. Supports summary, full read-aloud, and two-speaker podcast modes with 30 voices. Outputs MP3 and HTML5 embed code. Useful for 'narrate blog', 'tts', or 'podcast mode' requests.
```
npx claudepluginhub agricidaniel/claude-blog --plugin claude-blog
```

This skill uses the workspace's default tool permissions.
Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
| Command | What it does |
|---|---|
| `/blog audio generate <file>` | Generate audio narration of a blog post |
| `/blog audio voices` | Show available voices with characteristics |
| `/blog audio setup` | Check or configure the API key for Gemini TTS |
Scripts must be invoked through the venv wrapper (`scripts/run.py`) and require the `GOOGLE_AI_API_KEY` environment variable (the same key used by blog-image).

```
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv
```
Before generating audio, check for the API key:

```
echo $GOOGLE_AI_API_KEY
```

If it is not set, have the user export it:

```
export GOOGLE_AI_API_KEY=your-key
```

This is the same key used by `/blog image` -- if image generation works, audio works too.

For `/blog audio setup`:

- Confirm `GOOGLE_AI_API_KEY` is set in the environment.
- If the key is configured via MCP (`.mcp.json`), it is already available.
- Verify with a dry run: `python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json`

For `/blog audio voices`:
Load references/voices.md and present the voice catalog to the user.
Ask the user which voice they prefer, or recommend based on content type:
For /blog audio generate <file>:
Read the file and extract:
Ask the user (or auto-select if they specified --mode):
| Mode | When to use | Output |
|---|---|---|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode: Write a 200-300 word spoken summary of the article. Rules:
Full mode: Strip the markdown content to clean spoken text:
Dialogue mode: Write a 2-person conversation script about the article:
```
[Speaker1] What's the key takeaway here?
```

If the user chose a voice, use it. Otherwise, recommend based on mode:
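A short dialogue script in this format might look like the following (an illustrative sketch -- the topic and lines are invented, only the `[Speaker1]`/`[Speaker2]` labeling convention comes from this skill):

```text
[Speaker1] Welcome back! Today we're looking at a post about build caching.
[Speaker2] Right -- the author argues most CI pipelines rebuild far too much.
[Speaker1] What's the key takeaway here?
[Speaker2] Cache by content hash rather than timestamp, and invalidation gets much simpler.
```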
Write the prepared text to a temp file, then call:
```
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json
```

```
# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json
```
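The `--json` flag makes the script print a machine-readable result. A minimal sketch of consuming that result in a shell step -- the helper name and the `path` field are assumptions, not confirmed by this skill's docs:

```shell
# Hypothetical helper: pull the output path out of a JSON result on stdin.
# Assumes the result object has a "path" field -- adjust to the real schema.
extract_audio_path() {
  python3 -c 'import json, sys; print(json.load(sys.stdin).get("path", ""))'
}

# Demonstrated here with a stubbed result; the real call would pipe the
# generate_audio.py --json output into the helper instead.
echo '{"path": "audio/post-slug.mp3", "duration": "3:42"}' | extract_audio_path
```

In a real run this would be `python3 scripts/run.py generate_audio.py ... --json | extract_audio_path`.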
Model selection:
- `flash` (default): Fast, cheap. Good for summaries and standard narration.
- `pro`: Higher quality. Use for dialogue mode or premium content.

Present the result to the user:
```html
<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```

```html
<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```

```
[audio src="audio/post-slug.mp3"]
```
Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".
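Placing the player below the first H2 can be automated. A hedged sketch using awk -- the file names and the embed markup here are illustrative placeholders:

```shell
# Sample post (illustrative).
printf '# Title\n\n## Intro\n\nBody text.\n\n## Details\n\nMore.\n' > post.md

# Insert an audio embed line right after the first H2 heading only.
awk '{ print }
     /^## / && !done {
       print ""
       print "<audio controls preload=\"metadata\"><source src=\"audio/post-slug.mp3\" type=\"audio/mpeg\"></audio>"
       done = 1
     }' post.md > post_with_audio.md
```

The `done` flag ensures later H2 headings are left untouched.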
When invoked internally from blog-write:
Input:
- `text`: Prepared text (already cleaned by Claude)
- `voice`: Voice name (default: Charon)
- `voice2`: Second voice for dialogue (optional)
- `model`: `flash` or `pro`
- `output_path`: Where to save the file

Output:
```markdown
### Audio Narration
- **Path:** /path/to/audio/post-slug.mp3
- **Duration:** 3:42
- **Voice:** Charon
- **Embed:** `<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>`
```
Graceful fallback: If GOOGLE_AI_API_KEY is not set, return immediately
with no error. The writing workflow continues without audio. Never block
blog-write because audio generation is unavailable.
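The fallback can be sketched as a guard at the entry point. This helper is hypothetical (not part of the skill's scripts); it only illustrates the "return silently" behavior:

```shell
# Hypothetical guard: report whether audio generation can proceed.
# A caller would skip audio (without erroring) on "skip", so blog-write
# continues uninterrupted.
audio_available() {
  if [ -z "${GOOGLE_AI_API_KEY:-}" ]; then
    echo "skip"
  else
    echo "ok"
  fi
}
audio_available
```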
| Error | Resolution |
|---|---|
| GOOGLE_AI_API_KEY not set | Get key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install with `sudo apt install ffmpeg`; falls back to WAV output. |
| Rate limited | Wait and retry. Check limits at https://aistudio.google.com/rate-limit |
| Text too long (>32k tokens) | Split into sections, generate separately |
| Unknown voice name | Run /blog audio voices to see valid options |
| API error | Check key validity, model availability (preview models) |
| API key missing (internal call) | Return silently -- writing workflow continues |
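For the "text too long" case, one way to split a post into per-section files before generating each part separately -- a sketch that splits on H2 headings; the file names are illustrative:

```shell
# Sample long post (illustrative).
printf '## Part One\n\nFirst section.\n\n## Part Two\n\nSecond section.\n' > long_post.md

# Each H2 heading starts a new numbered section file:
# section_1.txt, section_2.txt, ...
awk '/^## / { n++ } n { print > ("section_" n ".txt") }' long_post.md

ls section_*.txt
```

Each `section_N.txt` can then be passed to `generate_audio.py` on its own.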
Load on-demand -- do NOT load all at startup:
references/voices.md -- Full 30-voice catalog, recommendations by content type, dialogue pairings