From ac-tools
Installs and configures VoiceMode MCP server for voice interactions in Claude Code using local Kokoro TTS and Whisper STT, with bash commands for uvx install, MCP addition, and endpoint config.
Slash command: /ac-tools:setup-voice-mode
Install and configure VoiceMode MCP for voice interactions with Claude Code.
uvx voice-mode-install --yes
claude mcp add --scope user voicemode -- uvx --refresh voice-mode
voicemode config set VOICEMODE_TTS_BASE_URLS http://127.0.0.1:8880/v1
voicemode config set VOICEMODE_STT_BASE_URLS http://127.0.0.1:2022/v1
voicemode config set VOICEMODE_PREFER_LOCAL true
voicemode config set VOICEMODE_ALWAYS_TRY_LOCAL true
This is critical. Without explicit VOICEMODE_TTS_BASE_URLS and VOICEMODE_STT_BASE_URLS settings, the default endpoint list includes https://api.openai.com/v1 as a fallback, which crashes with OPENAI_API_KEY errors even when the local services are running.
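As a guard against the fallback problem described above, a value can be checked before it is written to the config. The following is a minimal sketch, not part of VoiceMode: `urls_are_local_only` is a hypothetical helper, and it assumes that "local" means an `http://` URL on `127.0.0.1` or `localhost`.

```shell
# Hypothetical helper (not part of VoiceMode): succeed only if every URL in a
# comma-separated list points at this machine.
# Assumption: "local" means http://127.0.0.1 or http://localhost.
urls_are_local_only() {
    list=$1
    # An empty list is treated as "not local only".
    [ -n "$list" ] || return 1
    old_ifs=$IFS
    IFS=,
    for url in $list; do
        case $url in
            http://127.0.0.1:*|http://127.0.0.1/*|http://localhost:*|http://localhost/*)
                ;;
            *)
                # Any other host (for example api.openai.com) fails the check.
                IFS=$old_ifs
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
    return 0
}
```

For example, `urls_are_local_only "http://127.0.0.1:8880/v1" && voicemode config set VOICEMODE_TTS_BASE_URLS http://127.0.0.1:8880/v1` only writes the setting when the list contains no cloud endpoint.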
claude mcp list
Test voice interaction with the mcp__voicemode__converse tool.
Kokoro TTS may take 5+ minutes to load on first run while it downloads and initializes the model (~111MB). Check status with:
voicemode service kokoro status
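Because the first load can take several minutes, it may be convenient to poll instead of re-running the status command by hand. The sketch below is a generic retry loop; `retry_until` is a hypothetical helper, and whether `voicemode service kokoro status` exits non-zero while Kokoro is still loading is an assumption to verify on your install.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds or the
# attempts run out.
# $1 = max attempts, $2 = seconds between attempts, remaining args = command.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only if another attempt follows.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumes the status command signals readiness via its exit code):
#   retry_until 40 10 voicemode service kokoro status
```

With 40 attempts spaced 10 seconds apart, the loop gives up after roughly 6.5 minutes, which leaves some margin over the 5+ minute first-run load.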
Two MCP restarts required:
Without the second restart, you may get "OpenAI API key" errors even with local config.
Edit config with:
voicemode config edit
List all options:
voicemode config list
| Setting | Description |
|---|---|
| VOICEMODE_PREFER_LOCAL | Prefer local providers over cloud (true/false) |
| VOICEMODE_ALWAYS_TRY_LOCAL | Always attempt local providers first (true/false) |
| VOICEMODE_SAVE_AUDIO | Save audio files (true/false, default: false) |
| VOICEMODE_WHISPER_MODEL | Whisper model (tiny, base, small, medium, large-v2) |
| VOICEMODE_KOKORO_DEFAULT_VOICE | Default voice (e.g., af_sky) |
| OPENAI_API_KEY | Required only for cloud processing |
- For local-only processing, set VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1 and VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1 (no API key needed).
- For cloud-only processing, set OPENAI_API_KEY and set URLs to https://api.openai.com/v1.
- For local processing with cloud fallback, set OPENAI_API_KEY and set URLs to http://127.0.0.1:8880/v1,https://api.openai.com/v1 (TTS) and http://127.0.0.1:2022/v1,https://api.openai.com/v1 (STT).
- If OpenAI API key errors appear, make sure VOICEMODE_TTS_BASE_URLS and VOICEMODE_STT_BASE_URLS point to local endpoints only (step 3, the endpoint configuration above). The PREFER_LOCAL flag alone is NOT sufficient; it does not remove OpenAI from the fallback chain.
- To inspect Kokoro, check its logs with: voicemode service kokoro logs

The default tiny model is fast but less accurate. For better transcription:
| Model | Size | Accuracy | Speed |
|---|---|---|---|
| tiny | 75MB | ~70% | Fastest |
| small | 466MB | ~82% | Fast |
| medium | 1.4GB | ~88% | Moderate |
voicemode config set VOICEMODE_WHISPER_MODEL small
# or for best accuracy:
voicemode config set VOICEMODE_WHISPER_MODEL medium
Restart Whisper service after changing:
voicemode service whisper restart
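The change-then-restart sequence above can be wrapped so that a mistyped model name is caught before anything is written. This is a sketch under assumptions: `is_known_whisper_model` and `set_whisper_model` are hypothetical helpers, and the accepted names are taken only from the settings table in this document, so the list may be incomplete for your VoiceMode version.

```shell
# Hypothetical helper: accept only the model names listed in this document.
is_known_whisper_model() {
    case $1 in
        tiny|base|small|medium|large-v2) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: set the model, then restart Whisper so it takes effect.
set_whisper_model() {
    if ! is_known_whisper_model "$1"; then
        echo "unknown Whisper model: $1" >&2
        return 1
    fi
    # Restart only if the config change succeeded.
    voicemode config set VOICEMODE_WHISPER_MODEL "$1" &&
        voicemode service whisper restart
}

# Example:
#   set_whisper_model small
```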
For significantly faster transcription on Apple Silicon, convert Whisper to Core ML:
# Install whisper.cpp via Homebrew
brew install whisper-cpp
# Set Whisper directory
WHISPER_DIR=~/.voicemode/services/whisper
# 1. Download model
cd "$WHISPER_DIR/models"
./download-ggml-model.sh medium
# 2. Install Python dependencies
pip3 install torch coremltools openai-whisper ane_transformers
# 3. Convert to Core ML
cd "$WHISPER_DIR"
./models/generate-coreml-model.sh medium
# 4. Update config
voicemode config set VOICEMODE_WHISPER_MODEL medium
# 5. Restart Whisper
voicemode service whisper restart
# Check Core ML model exists
ls -la $WHISPER_DIR/models/ggml-medium-encoder.mlmodelc
When running, logs should show: GPU: Metal, Core ML: Enabled
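The existence check above can be turned into a small reusable test, for instance to run before restarting Whisper. This is a minimal sketch: `coreml_model_path` and `coreml_model_ready` are hypothetical helpers, and the path layout is assumed to match the `ggml-medium-encoder.mlmodelc` example shown above for other model names as well.

```shell
# Hypothetical helper: build the expected Core ML encoder path.
# $1 = Whisper directory, $2 = model name (e.g. medium).
# Assumption: other models follow the same ggml-<model>-encoder.mlmodelc naming.
coreml_model_path() {
    printf '%s/models/ggml-%s-encoder.mlmodelc\n' "$1" "$2"
}

# Hypothetical helper: succeed if the converted model exists on disk.
coreml_model_ready() {
    [ -e "$(coreml_model_path "$1" "$2")" ]
}

# Example:
#   coreml_model_ready "$WHISPER_DIR" medium || echo "Core ML model missing" >&2
```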
npx claudepluginhub waterplanai/agentic-config --plugin ac-tools