From voicemode
Adds custom voices to VoiceMode TTS using local mlx-audio. Clones voices from 3-9s reference clips for impressions or voice=<name> in converse. Manages installation and troubleshooting.
npx claudepluginhub mbailey/voicemode --plugin voicemode
How this skill is triggered (by the user, by Claude, or both):
Slash command
/voicemode:impressions
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Make VoiceMode speak in any voice. The model takes a short reference clip and synthesises fresh speech in that voice via local Qwen3-TTS on top of mlx-audio.
Status: Preview / experimental. Apple Silicon only. Opt-in.
voice= argument in voicemode:converse doesn't match a known Kokoro voice
mlx-audio service
# 1. Install the local TTS service (one-time, Apple Silicon only)
voicemode service install mlx-audio
# 2. Add a voice from a reference clip
voicemode clone add fleabag ~/Downloads/fleabag-clip.wav
# 3. Use it
voicemode converse --voice fleabag
In the MCP converse tool, pass voice="fleabag" -- VoiceMode auto-routes any voice that matches a profile in VOICEMODE_VOICES_DIR to mlx-audio instead of Kokoro / OpenAI.
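The routing rule above can be pictured with a small sketch. This is not VoiceMode's actual implementation: the function name `pick_backend` and the returned labels are hypothetical, and the sketch assumes that "matches a profile" means a directory `<name>/` containing `default.wav` under the voices directory.

```python
from pathlib import Path


def pick_backend(voice: str, voices_dir: Path) -> str:
    """Return which TTS backend a voice name would route to (illustrative only)."""
    # Assumption: a profile is a directory <voices_dir>/<voice>/ holding default.wav.
    # is_file() follows symlinks, so a symlinked default.wav also counts.
    if (voices_dir / voice / "default.wav").is_file():
        return "mlx-audio"
    # Any other name falls through to the regular Kokoro / OpenAI voices.
    return "kokoro-or-openai"
```

Because the profile check runs first in this sketch, a profile directory whose name equals a built-in voice name would win over the built-in voice.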
voicemode clone add validates the input before doing any expensive work:
default.wav.
If your source is longer than 9 seconds, trim with the same one-liner the runtime error suggests:
ffmpeg -i in.wav -ss 0 -t 8 out.wav
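Before trimming, it can help to check how long a clip actually is. The following is a minimal sketch using Python's standard `wave` module. The helper names `clip_duration_seconds` and `duration_problem` are hypothetical, and the 3-9 second bounds are taken from the clip length quoted in this document, not from VoiceMode's validator source. It only reads uncompressed PCM WAV files.

```python
import wave
from typing import Optional

MIN_SECONDS = 3.0  # lower bound quoted in this document for reference clips
MAX_SECONDS = 9.0  # upper bound quoted in this document for reference clips


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_problem(path: str) -> Optional[str]:
    """Return a human-readable problem, or None if the length is in range."""
    seconds = clip_duration_seconds(path)
    if seconds < MIN_SECONDS:
        return f"too short: {seconds:.1f}s (need at least {MIN_SECONDS:.0f}s)"
    if seconds > MAX_SECONDS:
        return f"too long: {seconds:.1f}s (trim to at most {MAX_SECONDS:.0f}s)"
    return None
```

A `None` result means only that the length is in range; it says nothing about noise, music beds, or other quality issues in the clip.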
Voices live as directories under ~/.voicemode/voices/<name>/:
~/.voicemode/voices/fleabag/
├── default.wav # required: 3-9s of clean reference audio, mono 24kHz 16-bit PCM
└── voice.md # auto-generated by `voicemode clone add` -- name, source, duration, format, transcript
voice.md carries YAML front matter with name, source (original input path), duration_seconds, format (literal mono 24kHz 16-bit PCM, loudnorm I=-16 TP=-1.5 LRA=11), and transcript. It documents what the clip is and where it came from.
voices.json at the voices root is retained as a legacy index -- voicemode clone add writes an entry pointing at <name>/default.wav so older consumers keep working. Prefer the directory layout above for new work.
Multiple WAVs are allowed alongside default.wav; symlink whichever one is "active" to default.wav. A directory with multiple WAVs and no default.wav is treated as a sample bin and skipped.
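The layout rules above (a profile needs `default.wav`; several WAVs with no `default.wav` make a sample bin) can be expressed as a short sketch. The helper names `classify_voice_dir` and `activate_sample` are hypothetical, and the `other` label for directories the text does not describe is an assumption.

```python
import os
from pathlib import Path


def classify_voice_dir(voice_dir: Path) -> str:
    """Label a directory under the voices root per the rules described above."""
    if (voice_dir / "default.wav").exists():
        return "profile"
    wavs = [p for p in voice_dir.glob("*.wav") if p.name != "default.wav"]
    if len(wavs) > 1:
        return "sample-bin"  # multiple WAVs, no default.wav: skipped
    return "other"  # not covered by the text; this label is an assumption


def activate_sample(voice_dir: Path, wav_name: str) -> None:
    """Point default.wav at one of the WAVs in the directory via a relative symlink."""
    if wav_name == "default.wav":
        raise ValueError("pick a sample other than default.wav itself")
    target = voice_dir / wav_name
    if not target.is_file():
        raise FileNotFoundError(target)
    link = voice_dir / "default.wav"
    if link.is_symlink():
        link.unlink()  # replace a previous symlink
    elif link.exists():
        raise FileExistsError(f"{link} is a regular file; refusing to overwrite")
    os.symlink(wav_name, link)  # relative link keeps the directory relocatable
```

The sketch refuses to replace a regular `default.wav` so that a clip written by `voicemode clone add` is never silently discarded.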
5-9 seconds of clean conversational speech beats 30 seconds of noisy podcast audio. The model copies what it hears -- including hum, music beds, and laugh tracks. See docs/finding-samples.md for ranking heuristics, an mlx-whisper word-timestamp ranker concept, and ffmpeg loudnorm recipes.
| Variable | Default | Purpose |
|---|---|---|
| VOICEMODE_VOICES_DIR | ~/.voicemode/voices | Where voice profiles live |
| VOICEMODE_REMOTE_VOICES_DIR | (unset) | Path on remote mlx-audio host (path translation) |
| VOICEMODE_MLX_AUDIO_BASE_URL | http://127.0.0.1:8890/v1 | OpenAI-compatible mlx-audio endpoint |
| VOICEMODE_IMPRESSIONS_MODEL | mlx-community/Qwen3-TTS-12Hz-1.7B-Base-bf16 | Hugging Face model ID |
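For scripts that need the same settings, the table translates into environment lookups with fallbacks. This is a sketch only: the `ImpressionsSettings` dataclass and the `load_settings` function are hypothetical names, not part of VoiceMode. The variable names and defaults are copied from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ImpressionsSettings:
    voices_dir: Path
    remote_voices_dir: Optional[str]  # None when the variable is unset
    mlx_audio_base_url: str
    impressions_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> ImpressionsSettings:
    """Read the four documented variables, falling back to the documented defaults."""
    return ImpressionsSettings(
        voices_dir=Path(env.get("VOICEMODE_VOICES_DIR", "~/.voicemode/voices")).expanduser(),
        remote_voices_dir=env.get("VOICEMODE_REMOTE_VOICES_DIR"),
        mlx_audio_base_url=env.get("VOICEMODE_MLX_AUDIO_BASE_URL", "http://127.0.0.1:8890/v1"),
        impressions_model=env.get(
            "VOICEMODE_IMPRESSIONS_MODEL", "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-bf16"
        ),
    )
```

Passing a plain dict instead of `os.environ` makes the lookup easy to exercise without touching the real environment.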
The unreleased 8.7.0 candidate used VOICEMODE_CLONE_* names. They're honoured in 8.7.x with a one-shot deprecation warning and removed in 8.8.0:
| Deprecated | Use instead |
|---|---|
| VOICEMODE_CLONE_BASE_URL | VOICEMODE_MLX_AUDIO_BASE_URL |
| VOICEMODE_CLONE_MODEL | VOICEMODE_IMPRESSIONS_MODEL |
| VOICEMODE_CLONE_PORT | VOICEMODE_MLX_AUDIO_PORT |
If you see those in a user's voicemode.env, suggest updating them.
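Spotting the old names can be automated. The sketch below scans the text of an env file and reports each deprecated assignment with its replacement. The function name `find_deprecated` is hypothetical, and the sketch assumes `voicemode.env` uses plain `NAME=value` lines (optionally prefixed with `export`); the rename mapping is copied from the table above.

```python
import re
from typing import List, Tuple

# Mapping copied from the deprecation table above.
RENAMES = {
    "VOICEMODE_CLONE_BASE_URL": "VOICEMODE_MLX_AUDIO_BASE_URL",
    "VOICEMODE_CLONE_MODEL": "VOICEMODE_IMPRESSIONS_MODEL",
    "VOICEMODE_CLONE_PORT": "VOICEMODE_MLX_AUDIO_PORT",
}

# Matches `NAME=` at the start of a line, with an optional leading `export`.
# Commented-out lines start with `#` and therefore never match.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def find_deprecated(env_text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, old_name, new_name) for each deprecated assignment."""
    hits = []
    for number, line in enumerate(env_text.splitlines(), start=1):
        match = _ASSIGNMENT.match(line)
        if match and match.group(1) in RENAMES:
            hits.append((number, match.group(1), RENAMES[match.group(1)]))
    return hits
```

The function only reports; it leaves the file untouched so the user can decide when to rename.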
A profile named af_sky (or any other Kokoro voice name) shadows the Kokoro voice. Pick distinctive names like fleabag, mike-2026, bryan_morning.