Enables natural voice conversations in Claude Code via STT/TTS. Invoke it with the voicemode:converse MCP tool; it also auto-activates on voice mentions.
From voicemode
npx claudepluginhub mbailey/voicemode --plugin voicemode
This skill uses the workspace's default tool permissions.
If VoiceMode isn't working or MCP fails to connect, run:
/voicemode:install
After install, reconnect MCP: /mcp → select voicemode → "Reconnect" (or restart Claude Code).
Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).
Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).
| Task | Use | Why |
|---|---|---|
| Voice conversations | MCP voicemode:converse | Faster - server already running |
| Service start/stop | MCP voicemode:service | Works within Claude Code |
| Installation | CLI voice-mode-install | One-time setup |
| Configuration | CLI voicemode config | Edit settings directly |
| Diagnostics | CLI voicemode diag | Administrative tasks |
Use the converse MCP tool to speak to users and hear their responses:
# Speak and listen for response (most common usage)
voicemode:converse("Hello! What would you like to work on?")
# Speak without waiting (for narration while working)
voicemode:converse("Searching the codebase now...", wait_for_response=False)
For most conversations, just pass your message - defaults handle everything else.
Use default converse tool parameters unless there's a good reason not to. Timing parameters (listen_duration_max, listen_duration_min) use smart defaults with silence detection - don't override unless the user requests it or you see a clear need. Defaults are configurable by the user via ~/.voicemode/voicemode.env.
| Parameter | Default | Description |
|---|---|---|
| message | required | Text to speak |
| wait_for_response | true | Listen after speaking |
| voice | auto | TTS voice |
For all parameters, see Converse Parameters.
Use wait_for_response=False when announcing actions.
When performing actions during a voice conversation, use parallel tool calls to eliminate dead air. Send the voice message and the action in the same turn so they execute concurrently.
# FAST: announce and act in one turn, so voice and action fire simultaneously
# Turn 1: speak (fire-and-forget) + do the work (all parallel)
voicemode:converse("Checking that now.", wait_for_response=False)
bash("git status")
Agent(prompt="Research X", run_in_background=True)
# Turn 2: speak the results (with listening)
voicemode:converse("Here's what I found: ...", wait_for_response=True)
# SLOW: announce and act in separate turns, adding an unnecessary sequential delay
# Turn 1: speak
voicemode:converse("Checking that now.", wait_for_response=False)
# Turn 2: do the work
bash("git status")
# Turn 3: speak results
voicemode:converse("Here's what I found: ...", wait_for_response=True)
| Scenario | Approach | Why |
|---|---|---|
| Announce + do work | Parallel | No dependency between speech and action |
| Announce + spawn agent | Parallel | Agent runs in background anyway |
| Check result then report | Sequential | Need result before speaking |
| Listen for response | Sequential | wait_for_response=True blocks until user finishes |
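The difference between the two patterns can be illustrated with a small simulation. This is a sketch only: `speak` and `run_action` are hypothetical stand-ins built on `asyncio.sleep`, not VoiceMode or Claude Code APIs, and the 0.2-second delays are arbitrary.

```python
import asyncio
import time


async def speak(message: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a fire-and-forget speak call."""
    await asyncio.sleep(delay)  # pretend TTS playback takes `delay` seconds
    return f"spoke: {message}"


async def run_action(name: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a tool call such as running git status."""
    await asyncio.sleep(delay)  # pretend the tool takes `delay` seconds
    return f"done: {name}"


async def parallel_turn() -> list:
    # Both calls are issued together, so their waits overlap.
    results = await asyncio.gather(
        speak("Checking that now."), run_action("git status")
    )
    return list(results)


async def sequential_turns() -> list:
    # Each call finishes before the next one starts, so the waits add up.
    return [await speak("Checking that now."), await run_action("git status")]


def timed(coro_factory):
    """Run a coroutine function and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_factory())
    return result, time.perf_counter() - start


parallel_result, parallel_elapsed = timed(parallel_turn)
sequential_result, sequential_elapsed = timed(sequential_turns)
print(f"parallel: {parallel_elapsed:.2f}s, sequential: {sequential_elapsed:.2f}s")
```

Both variants produce the same results; only the elapsed time differs, which is the dead air the parallel pattern removes.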
Use wait_for_response=False for the speak call when combining with other tools.
When the user asks you to wait or give them time:
Short pauses (up to 60 seconds): If the user says something ending with "wait" (e.g., "hang on", "give me a sec", "wait"), VoiceMode automatically pauses for 60 seconds then resumes listening. This is built-in.
Longer pauses (2+ minutes): Use bash sleep N where N is seconds. For example, if the user says "give me 5 minutes":
sleep 300 # Wait 5 minutes
Then call converse again when the wait is over:
voicemode:converse("Five minutes is up. Ready when you are.")
Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).
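The choice between the built-in pause and an explicit sleep can be sketched as a small helper. This is illustrative only: `short_pause_seconds` and `plan_pause` are hypothetical names, reading the setting from the process environment is an assumption, and so is treating every request longer than the configured short pause as a `sleep` (the text above only gives 60 seconds and 2+ minutes as reference points).

```python
import os

DEFAULT_WAIT_SECONDS = 60  # documented default for VOICEMODE_WAIT_DURATION


def short_pause_seconds(env=None) -> int:
    """Read VOICEMODE_WAIT_DURATION from a mapping, falling back to 60."""
    env = os.environ if env is None else env
    raw = env.get("VOICEMODE_WAIT_DURATION", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_WAIT_SECONDS  # missing or non-integer value
    return value if value > 0 else DEFAULT_WAIT_SECONDS


def plan_pause(requested_seconds: int, env=None) -> str:
    """Return the shell command to run, or '' when the built-in pause is enough.

    Assumption: anything longer than the short pause is handled with `sleep N`.
    """
    if requested_seconds <= short_pause_seconds(env):
        return ""  # rely on VoiceMode's automatic pause
    return f"sleep {requested_seconds}"


# "give me 5 minutes" -> 300 seconds
print(plan_pause(5 * 60, env={}))
```

Passing an explicit `env` mapping keeps the helper easy to test without touching the real environment.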
If Whisper STT fails but the audio was recorded successfully, you can manually transcribe the saved audio file:
# Transcribe the most recent recording
whisper-cli ~/.voicemode/audio/latest-STT.wav
# Or check if file exists first (safe for inclusion in automation)
if [ -f ~/.voicemode/audio/latest-STT.wav ]; then
whisper-cli ~/.voicemode/audio/latest-STT.wav
fi
Requirements:
- VOICEMODE_SAVE_AUDIO=true in ~/.voicemode/voicemode.env
- VOICEMODE_SAVE_ALL=true (saves all audio and transcriptions)
- VOICEMODE_DEBUG=true (enables debug mode with audio saving)
How it works:
- Recordings are saved in ~/.voicemode/audio/ with timestamps
- The latest-STT.wav symlink always points to the most recent recording
When to use:
See also: Troubleshooting - No Speech Detected
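A lookup for the most recent recording could look like the sketch below. It assumes recordings are `.wav` files directly inside the audio directory and falls back to modification time when the `latest-STT.wav` symlink is missing or broken; `find_latest_recording` is a hypothetical helper, not part of VoiceMode.

```python
from pathlib import Path
from typing import Optional


def find_latest_recording(audio_dir: Path) -> Optional[Path]:
    """Return the newest STT recording in audio_dir, or None if there is none."""
    latest = audio_dir / "latest-STT.wav"
    if latest.exists():  # follows the symlink; False if the target is gone
        return latest.resolve()

    # Fallback (assumption): pick the most recently modified .wav file.
    # is_file() is False for a broken symlink, so a stale link is skipped.
    candidates = [p for p in audio_dir.glob("*.wav") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    recording = find_latest_recording(Path.home() / ".voicemode" / "audio")
    print(recording if recording else "No saved recordings found")
```

The returned path can then be passed to `whisper-cli` as shown above.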
voicemode service status # All services
voicemode service status whisper # Specific service
Shows service status including running state, ports, and health.
# Install VoiceMode CLI and configure services
uvx voice-mode-install --yes
# Install local services (Apple Silicon recommended)
voicemode service install whisper
voicemode service install kokoro
See Getting Started for detailed steps.
# Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
# View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)
| Service | Port | Purpose |
|---|---|---|
| whisper | 2022 | Speech-to-text |
| kokoro | 8880 | Text-to-speech |
| voicemode | 8765 | HTTP/SSE server |
Actions: status, start, stop, restart, logs, enable, disable
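A quick reachability check against the ports in the table can be sketched as follows. It only tests whether something accepts TCP connections on localhost; it does not confirm that the listener is actually Whisper or Kokoro, and it is not a substitute for `voicemode service status`. The `SERVICE_PORTS` mapping copies the table above; the helper names are hypothetical.

```python
import socket

# Default ports copied from the service table.
SERVICE_PORTS = {"whisper": 2022, "kokoro": 8880, "voicemode": 8765}


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


def check_services(ports=None) -> dict:
    """Map each service name to whether its port accepts connections."""
    ports = SERVICE_PORTS if ports is None else ports
    return {name: is_port_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name:10s} {'listening' if is_up else 'not reachable'}")
```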
voicemode config list # Show all settings
voicemode config set VOICEMODE_TTS_VOICE nova # Set default voice
voicemode config edit # Edit config file
Config file: ~/.voicemode/voicemode.env
See Configuration Guide for all options.
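For scripts that need to read a setting without shelling out to the CLI, a minimal parser is sketched below. It assumes the file uses plain `KEY=value` lines with `#` comment lines, which is the usual `.env` convention but is not confirmed by this page; inline trailing comments are not handled. `parse_env_text` and `load_voicemode_env` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional


def parse_env_text(text: str) -> dict:
    """Parse KEY=value lines, skipping blanks and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        if "=" not in line:
            continue  # not a KEY=value line
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def load_voicemode_env(path: Optional[Path] = None) -> dict:
    """Return settings from the config file, or {} if it does not exist."""
    if path is None:
        path = Path.home() / ".voicemode" / "voicemode.env"
    return parse_env_text(path.read_text()) if path.is_file() else {}


sample = "# voice settings\nVOICEMODE_TTS_VOICE=nova\nVOICEMODE_SAVE_AUDIO=true\n"
print(parse_env_text(sample))
```

For interactive changes, `voicemode config set` and `voicemode config edit` remain the documented route.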
Background music during VoiceMode sessions with track-level control.
# Core playback
voicemode dj play /path/to/music.mp3 # Play a file or URL
voicemode dj status # What's playing
voicemode dj pause # Pause playback
voicemode dj resume # Resume playback
voicemode dj stop # Stop playback
# Navigation and volume
voicemode dj next # Skip to next chapter
voicemode dj prev # Go to previous chapter
voicemode dj volume 30 # Set volume to 30%
# Music For Programming
voicemode dj mfp list # List available episodes
voicemode dj mfp play 49 # Play episode 49
voicemode dj mfp sync # Convert CUE files to chapters
# Music library
voicemode dj find "daft punk" # Search library
voicemode dj library scan # Index ~/Audio/music
voicemode dj library stats # Show library info
# Play history and favorites
voicemode dj history # Show recent plays
voicemode dj favorite # Toggle favorite on current track
Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).
# Service management
voicemode service status # All services
voicemode service start whisper # Start a service
voicemode service logs kokoro # View logs
# Diagnostics
voicemode deps # Check dependencies
voicemode diag info # System info
voicemode diag devices # Audio devices
# DJ Mode
voicemode dj play <file|url> # Start playback
voicemode dj status # What's playing
voicemode dj next/prev # Navigate chapters
voicemode dj stop # Stop playback
voicemode dj mfp play 49 # Music For Programming
Transfer voice conversations between Claude Code agents for multi-agent workflows.
Use cases:
# 1. Announce the transfer
voicemode:converse("Transferring you to a project agent.", wait_for_response=False)
# 2. Spawn with voice instructions (mechanism depends on your setup)
spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")
# 3. Go quiet - let new agent take over
Hand-back:
voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)
# Stop conversing, exit or go idle
See Call Routing for comprehensive guides:
Expose local Whisper (STT) and Kokoro (TTS) to other devices on your Tailnet via HTTPS.
*.ts.net domains
# Expose TTS (Kokoro on port 8880)
tailscale serve --bg --set-path /v1/audio/speech http://localhost:8880/v1/audio/speech
# Expose STT (Whisper on port 2022)
tailscale serve --bg --set-path /v1/audio/transcriptions http://localhost:2022/v1/audio/transcriptions
# Verify configuration
tailscale serve status
# Reset all serve config
tailscale serve reset
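The two serve mappings above follow one pattern: the same path is used on the local service and on the Tailnet hostname. The sketch below builds both the command arguments and the resulting URL from a single table. The ports and paths come from the commands above; `myhost` and `example-tailnet` are placeholder values, and the helper names are hypothetical.

```python
import shlex

# Local port and request path for each service, taken from the commands above.
ROUTES = {
    "tts": (8880, "/v1/audio/speech"),
    "stt": (2022, "/v1/audio/transcriptions"),
}


def serve_command(service: str) -> list:
    """Build the argument list for the matching `tailscale serve` call."""
    port, path = ROUTES[service]
    target = f"http://localhost:{port}{path}"
    return ["tailscale", "serve", "--bg", "--set-path", path, target]


def public_url(service: str, hostname: str, tailnet: str) -> str:
    """Build the HTTPS URL the service is reachable at on the Tailnet."""
    _, path = ROUTES[service]
    return f"https://{hostname}.{tailnet}.ts.net{path}"


for service in ROUTES:
    print(shlex.join(serve_command(service)))
    print("  ->", public_url(service, "myhost", "example-tailnet"))
```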
After setup, endpoints are available at:
- https://<hostname>.<tailnet>.ts.net/v1/audio/speech
- https://<hostname>.<tailnet>.ts.net/v1/audio/transcriptions
https://app.voicemode.dev origins
In the VoiceMode Connect web app settings (app.voicemode.dev/settings), set:
- https://<hostname>.<tailnet>.ts.net
- https://<hostname>.<tailnet>.ts.net
Audio feedback tones that play during Claude Code tool use. Toggle with voicemode soundfonts on/off. See Soundfonts Guide.
| Topic | Link |
|---|---|
| Converse Parameters | All Parameters |
| Installation | Getting Started |
| Configuration | Configuration Guide |
| Claude Code Plugin | Plugin Guide |
| Whisper STT | Whisper Setup |
| Kokoro TTS | Kokoro Setup |
| Pronunciation | Pronunciation Guide |
| Troubleshooting | Troubleshooting |
| Soundfonts | Soundfonts Guide |
| CLI Reference | CLI Docs |
| DJ Mode | Background Music |