
voicemode

Voice interaction for Claude Code: natural voice conversations using ElevenLabs text-to-speech (TTS) and speech-to-text (STT). Use when users mention voice mode, ask to speak, talk, or converse, or ask about voice status or voice troubleshooting. ElevenLabs only: eleven_v3 TTS model and Scribe v2 Realtime STT with local Silero VAD.

From voicemode

Install

Run in your terminal

npx claudepluginhub harshav167/ava --plugin voicemode

Tool Access

This skill uses the workspace's default tool permissions.


Stats


Last commit: Mar 23, 2026


VoiceMode for Claude Code

Natural voice conversations with Claude Code using ElevenLabs text-to-speech (TTS) and speech-to-text (STT).

The Jarvis Goal

VoiceMode aims for a Jarvis-like voice assistant experience: the AI speaks to you and listens, as in a real conversation. Use converse for ALL communication, and never respond with text in the chat.

Setup

1. Configure MCP Server

VoiceMode runs as an HTTP MCP server on port 8765. Add the following to your Claude Code MCP settings (~/.claude/settings.json):

{
  "mcpServers": {
    "voicemode": {
      "type": "http",
      "url": "http://127.0.0.1:8765/mcp"
    }
  }
}
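If ~/.claude/settings.json already holds other keys or MCP servers, pasting the block above by hand can clobber them. The sketch below merges the voicemode entry into whatever is already there. The helper names (add_voicemode_server, update_settings_file) are hypothetical. It also assumes the file is plain JSON without comments. The settings path and the entry itself are copied from the step above.

```python
import json
from pathlib import Path

# Entry copied from the JSON snippet above.
VOICEMODE_ENTRY = {"type": "http", "url": "http://127.0.0.1:8765/mcp"}


def add_voicemode_server(settings: dict) -> dict:
    """Return a copy of `settings` with the voicemode MCP entry added.

    Other top-level keys and other MCP servers are preserved; the input
    dict is not mutated.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["voicemode"] = dict(VOICEMODE_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path = Path.home() / ".claude" / "settings.json") -> None:
    """Read the settings file (empty if missing), merge, and write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_voicemode_server(current), indent=2) + "\n")
```

Keeping the merge as a pure function on a dict makes it easy to preview the result before anything is written to disk.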

2. Configure ElevenLabs

Set your ElevenLabs API key:

# In ~/.voicemode/voicemode.env
ELEVENLABS_API_KEY=your-key-here
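The env file is a plain KEY=value list. The sketch below is a minimal reader for it. It assumes simple lines: no multi-line values, no `export` prefix, and no variable expansion. parse_env and load_voicemode_env are illustrative names and are not part of VoiceMode.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments.

    Matching single or double quotes around a value are stripped.
    This is a simplified reader, not a full dotenv implementation.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_voicemode_env(path: Path = Path.home() / ".voicemode" / "voicemode.env") -> dict[str, str]:
    """Return the parsed file, or an empty dict if it does not exist."""
    return parse_env(path.read_text()) if path.exists() else {}
```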

ElevenLabs provides:

TTS: eleven_v3 model with Donna voice (cloned)
STT: Scribe v2 Realtime (streaming WebSocket with manual commit mode)

Usage

Use the converse MCP tool. Always use these defaults:

# Speak and listen for response
converse(message="Hello! What would you like to work on?", speed=1.2, listen_duration_min=5, listen_duration_max=60)

# Speak without waiting (narration while working)
converse(message="Searching the codebase now...", wait_for_response=false, speed=1.2)

# User wants to say something long
converse(message="Go ahead, I'm listening.", disable_silence_detection=true, listen_duration_max=120, speed=1.2)
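Each converse call is an MCP tool call under the hood. The sketch below shows roughly what one looks like as a JSON-RPC 2.0 `tools/call` request, with the defaults above filled in unless overridden. converse_request is a hypothetical helper. It only builds the request body. It does not perform the initialization handshake or send the session headers that a real Streamable HTTP client needs. Python's False serializes to JSON false, matching the lowercase spelling in the examples above.

```python
import json
from itertools import count

# Defaults this skill asks for on every spoken turn.
CONVERSE_DEFAULTS = {"speed": 1.2, "listen_duration_min": 5, "listen_duration_max": 60}
_request_ids = count(1)


def converse_request(message: str, **overrides) -> dict:
    """Build an MCP `tools/call` request body for the converse tool.

    Keyword arguments override the defaults (for example wait_for_response=False).
    """
    arguments = {"message": message, **CONVERSE_DEFAULTS, **overrides}
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "converse", "arguments": arguments},
    }


# Narration while working: speak, then return without listening.
print(json.dumps(converse_request("Searching the codebase now...", wait_for_response=False)))
```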

Parameter	Default	Description
`message`	required	Text to speak
`wait_for_response`	`true`	Listen for a reply after speaking
`speed`	`1.2`	Speech rate; always use 1.2 (the ElevenLabs maximum)
`listen_duration_min`	`5`	Minimum seconds to record before silence can end the turn, so the user isn't cut off mid-sentence
`listen_duration_max`	`60`	Maximum seconds to record
`vad_aggressiveness`	`1`	VAD strictness (0-3); lower values tolerate longer pauses
`disable_silence_detection`	`false`	Set `true` to record for the full `listen_duration_max`
`metrics_level`	`summary`	Output detail: `minimal`, `summary`, or `verbose`
`wait_for_conch`	`false`	Queue behind another speaker if one is active
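A client-side check of arguments against the table can catch mistakes before a call is sent. The sketch below is illustrative only. validate_converse_args is a hypothetical helper, and the lower speed bound of 0.7 is an assumption about the range ElevenLabs accepts. The server remains the authority on which values it accepts.

```python
VALID_METRICS_LEVELS = {"minimal", "summary", "verbose"}


def validate_converse_args(args: dict) -> list[str]:
    """Return a list of problems with a converse argument dict (empty if none).

    Missing optional arguments fall back to the defaults in the table.
    The 0.7 lower bound on speed is an assumption, not from the table.
    """
    problems = []
    if not args.get("message"):
        problems.append("message is required")
    speed = args.get("speed", 1.2)
    if not 0.7 <= speed <= 1.2:
        problems.append(f"speed {speed} outside 0.7-1.2")
    vad = args.get("vad_aggressiveness", 1)
    if vad not in (0, 1, 2, 3):
        problems.append(f"vad_aggressiveness {vad} not in 0-3")
    lo = args.get("listen_duration_min", 5)
    hi = args.get("listen_duration_max", 60)
    if lo > hi:
        problems.append(f"listen_duration_min {lo} exceeds listen_duration_max {hi}")
    level = args.get("metrics_level", "summary")
    if level not in VALID_METRICS_LEVELS:
        problems.append(f"metrics_level {level!r} not one of {sorted(VALID_METRICS_LEVELS)}")
    return problems
```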

Best Practices

Voice-only communication -- ALL responses go through converse, never text
Speed 1.2 always -- Max ElevenLabs speed, user prefers fast speech
Narrate without waiting -- Use wait_for_response=false when announcing actions
One question at a time -- Don't bundle multiple questions
Parallel calls -- Combine converse(msg, wait_for_response=false) with other tools in one turn for zero dead air
Long input -- Set disable_silence_detection=true and listen_duration_max=120 when user needs to speak at length

Parallel Tool Calls (Zero Dead Air)

When performing actions during a voice conversation, use parallel tool calls to eliminate dead air:

# FAST: One turn -- voice and action fire simultaneously
converse("Checking that now.", wait_for_response=false, speed=1.2)
bash("git status")

# Then speak the results
converse("Here's what I found: ...", wait_for_response=true, speed=1.2)
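Issuing both calls in one turn pays off because their latencies overlap. The asyncio sketch below models only the timing. speak and run_command are stand-ins with made-up durations. In Claude Code the overlap comes from the model emitting both tool calls in the same turn, not from any asyncio code. Running the calls sequentially costs the sum of the two delays (about 0.5 s here). Running them in parallel costs roughly the longer one (about 0.3 s).

```python
import asyncio
import time


async def speak(text: str) -> None:
    """Stand-in for converse(..., wait_for_response=false): ~0.2 s of playback."""
    await asyncio.sleep(0.2)


async def run_command(cmd: str) -> str:
    """Stand-in for a tool call such as bash("git status"): ~0.3 s of work."""
    await asyncio.sleep(0.3)
    return f"output of {cmd}"


async def sequential() -> float:
    """Speak first, then act: dead air while the command runs."""
    start = time.perf_counter()
    await speak("Checking that now.")
    await run_command("git status")
    return time.perf_counter() - start


async def parallel() -> float:
    """Start both at once; total time is roughly the slower of the two."""
    start = time.perf_counter()
    await asyncio.gather(speak("Checking that now."), run_command("git status"))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(sequential()):.2f}s")
    print(f"parallel:   {asyncio.run(parallel()):.2f}s")
```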

Configuration

Config file: ~/.voicemode/voicemode.env

ElevenLabs Settings

Variable	Default	Description
`ELEVENLABS_API_KEY`	(none)	API key -- required
`VOICEMODE_ELEVENLABS_TTS_MODEL`	`eleven_v3`	TTS model
`VOICEMODE_ELEVENLABS_TTS_VOICE`	`k4hP4cQadSZQc0Oar2Ld`	Voice ID (Donna)
`VOICEMODE_ELEVENLABS_STT_MODEL`	`scribe_v2_realtime`	STT model
`VOICEMODE_ELEVENLABS_REALTIME_STT`	`true`	Use realtime streaming STT
`VOICEMODE_SILENCE_THRESHOLD_MS`	`2000`	Silence (ms) after speech before the recording is committed (2000 ms = 2.0 s)
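The sketch below turns the variables in the table into a typed settings object. It applies the documented defaults and fails fast when the API key is missing. ElevenLabsSettings and load_settings are hypothetical names. Treating "1", "true", "yes", and "on" as true is an assumed convention, not documented VoiceMode behavior.

```python
from collections.abc import Mapping
from dataclasses import dataclass

TRUTHY = {"1", "true", "yes", "on"}  # assumed boolean spelling


@dataclass(frozen=True)
class ElevenLabsSettings:
    api_key: str
    tts_model: str
    tts_voice: str
    stt_model: str
    realtime_stt: bool
    silence_threshold_ms: int

    @property
    def silence_threshold_s(self) -> float:
        return self.silence_threshold_ms / 1000


def load_settings(env: Mapping[str, str]) -> ElevenLabsSettings:
    """Build settings from a mapping such as os.environ, using the table's defaults."""
    api_key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not api_key:
        raise ValueError("ELEVENLABS_API_KEY is required")
    realtime = env.get("VOICEMODE_ELEVENLABS_REALTIME_STT", "true")
    return ElevenLabsSettings(
        api_key=api_key,
        tts_model=env.get("VOICEMODE_ELEVENLABS_TTS_MODEL", "eleven_v3"),
        tts_voice=env.get("VOICEMODE_ELEVENLABS_TTS_VOICE", "k4hP4cQadSZQc0Oar2Ld"),
        stt_model=env.get("VOICEMODE_ELEVENLABS_STT_MODEL", "scribe_v2_realtime"),
        realtime_stt=realtime.strip().lower() in TRUTHY,
        silence_threshold_ms=int(env.get("VOICEMODE_SILENCE_THRESHOLD_MS", "2000")),
    )
```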

Architecture

Server: Single HTTP MCP server on http://127.0.0.1:8765/mcp
Auto-start: Managed by launchd (macOS) via scripts/voicemode-server.sh
TTS: ElevenLabs eleven_v3 with convert() + play() via ffplay
STT: ElevenLabs Scribe v2 Realtime (WebSocket streaming) with manual commit mode
VAD: Local Silero VAD (ONNX, no PyTorch) for silence detection; sends a manual commit when silence exceeds the threshold (VOICEMODE_SILENCE_THRESHOLD_MS, 2.0 s by default)
Audio caching: Recordings cached in memory for crash resilience -- if ElevenLabs disconnects mid-stream, cached audio is batch-transcribed
Audio I/O: Direct mic/speaker access on the host machine
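The interaction between the VAD and the manual commit can be pictured as a small loop over per-frame speech probabilities. The sketch below is one plausible reading of the architecture and the converse parameters, not VoiceMode's actual code. commit_time_ms is hypothetical. The 32 ms frame (512 samples at 16 kHz) and the 0.5 speech threshold are assumptions. It also assumes that silence only counts once some speech has been heard. The parameters min_ms, max_ms, and disable_silence_detection mirror listen_duration_min, listen_duration_max, and disable_silence_detection, converted to milliseconds.

```python
from typing import Optional, Sequence

FRAME_MS = 32  # assumption: 512-sample Silero frames at 16 kHz


def commit_time_ms(
    speech_probs: Sequence[float],
    *,
    threshold: float = 0.5,
    silence_ms: int = 2000,
    min_ms: int = 5000,
    max_ms: int = 60000,
    disable_silence_detection: bool = False,
) -> Optional[int]:
    """Return the elapsed ms at which a manual commit would be sent.

    Commits once silence after detected speech reaches `silence_ms`, but never
    before `min_ms`; always stops at `max_ms`. Returns None if the frames run out first.
    """
    heard_speech = False
    silent_for = 0
    for i, prob in enumerate(speech_probs):
        elapsed = (i + 1) * FRAME_MS
        if prob >= threshold:
            heard_speech = True
            silent_for = 0
        else:
            silent_for += FRAME_MS
        if elapsed >= max_ms:
            return elapsed
        if (
            not disable_silence_detection
            and heard_speech
            and silent_for >= silence_ms
            and elapsed >= min_ms
        ):
            return elapsed
    return None
```

With 3.2 s of speech followed by silence, the commit lands once 2016 ms of silence (63 frames) has accumulated. A short utterance is held open until the minimum duration has passed.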

Server Management

# Via script (manages launchd plist)
scripts/voicemode-server.sh setup    # Create launchd plist + start
scripts/voicemode-server.sh start    # Start server
scripts/voicemode-server.sh stop     # Stop server
scripts/voicemode-server.sh restart  # Restart server
scripts/voicemode-server.sh status   # Check status
scripts/voicemode-server.sh logs     # Tail server logs
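For a quick health check outside the script, the sketch below tests only whether something accepts TCP connections on 127.0.0.1:8765. It does not confirm that the listener is a working MCP server, so `status` and `logs` above remain the primary tools. server_listening is a hypothetical helper.

```python
import socket


def server_listening(host: str = "127.0.0.1", port: int = 8765, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print("port 8765 is accepting connections" if server_listening() else "nothing listening on 8765")
```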

Related Skills

VoiceMode Connect -- Remote voice via mobile/web clients