Installs the OpenAI Whisper transcription plugin for NanoTars, which converts voice notes from WhatsApp, Telegram, and Discord channels into agent-readable text.
npx claudepluginhub terrifiedbug/nanotars-skills --plugin nanotars-transcription

This skill uses the workspace's default tool permissions.
Automatic voice message transcription via OpenAI's Whisper API. When users send voice notes on any channel (WhatsApp, Telegram, Discord), the transcription hook converts them to text before the agent sees the message.
Works with any channel plugin that sets mediaType='audio' and mediaHostPath on inbound messages.
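Under the hood, the hook sends the downloaded audio file to OpenAI's transcription endpoint and wraps the result in the [Voice: ...] format described below. A minimal sketch, assuming OPENAI_API_KEY is set; the transcribe helper name and the naive JSON parsing are illustrative, while the endpoint and the whisper-1 model name are OpenAI's public API:

```shell
# Hypothetical helper mirroring what the transcription hook does.
# Takes a path to an audio file, returns "[Voice: <transcript>]".
transcribe() {
  audio="$1"
  if [ -z "$OPENAI_API_KEY" ]; then
    echo "OPENAI_API_KEY not set" >&2
    return 1
  fi
  # whisper-1 is OpenAI's transcription model; response is JSON
  # with a "text" field. The sed parse is naive, for illustration.
  curl -sS https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F model=whisper-1 \
    -F file=@"$audio" |
    sed -n 's/.*"text"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/[Voice: \1]/p'
}
```

The real plugin also handles download, format conversion, and error cases; this only shows the core API call.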
Before installing, verify NanoTars is set up:
[ -d node_modules ] && echo "DEPS: ok" || echo "DEPS: missing"
docker image inspect nanoclaw-agent:latest &>/dev/null && echo "IMAGE: ok" || echo "IMAGE: not built"
if grep -q "ANTHROPIC_API_KEY\|CLAUDE_CODE_OAUTH_TOKEN" .env 2>/dev/null || [ -f "$HOME/.claude/.credentials.json" ]; then echo "AUTH: ok"; else echo "AUTH: missing"; fi
If any check fails, tell the user to run /nanotars-setup first and stop.
grep "^OPENAI_API_KEY=" .env 2>/dev/null && echo "KEY_SET" || echo "KEY_MISSING"
[ -d plugins/transcription ] && echo "PLUGIN_EXISTS" || echo "PLUGIN_MISSING"
If already configured, ask the user if they want to reconfigure or just verify.
Use the AskUserQuestion tool to present this:
You'll need an OpenAI API key for Whisper transcription.
Get one at: https://platform.openai.com/api-keys
Cost:
$0.006 per minute of audio ($0.003 per typical 30-second voice note)
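The per-note figure is just the per-minute rate prorated by duration; a quick sanity check on the arithmetic (rate taken from the text above):

```shell
# Whisper pricing: $0.006 per minute, prorated per second.
awk 'BEGIN { rate_per_min = 0.006; secs = 30; printf "$%.4f\n", rate_per_min * secs / 60 }'
# prints $0.0030 for a 30-second voice note
```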
Wait for user to provide their API key.
# Remove existing line if present
# (on macOS/BSD sed, use: sed -i '' '/^OPENAI_API_KEY=/d' .env)
sed -i '/^OPENAI_API_KEY=/d' .env
# Add the new key
echo "OPENAI_API_KEY=THE_KEY_HERE" >> .env
mkdir -p plugins
cp -r ${CLAUDE_PLUGIN_ROOT}/files/ plugins/transcription/
The plugin loader handles npm install automatically on next startup (dependencies: true in manifest).
By default this plugin is available to all groups and channel types. To restrict access, edit plugins/transcription/plugin.json and set:
"groups" to specific group folder names (e.g., ["main"]) instead of ["*"]
"channels" to specific channel types (e.g., ["whatsapp"]) instead of ["*"]
Ask the user if they want to restrict access. Most users will keep the defaults.
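For illustration, a restricted plugins/transcription/plugin.json might look like this. The "groups", "channels", and "dependencies" fields come from this document; the "name" field is a placeholder and the real manifest likely has additional fields:

```json
{
  "name": "transcription",
  "dependencies": true,
  "groups": ["main"],
  "channels": ["whatsapp"]
}
```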
npm run build
nanotars restart # or launchctl on macOS
Tell the user:
Voice transcription is ready! Test it by sending a voice note in any registered chat.
Voice messages appear to the agent as:
[Voice: <transcribed text>]
Watch for transcription in the logs:
tail -f logs/nanotars.log | grep -i "voice\|transcri"
If transcription fails, verify that:
OPENAI_API_KEY is set in .env and has credits
The channel plugin sets mediaType/mediaHostPath on inbound audio messages
Per-group credential overrides: Not applicable. Transcription is a system-wide service that processes all inbound audio.
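The first item can be partially checked from the shell. A sketch, assuming a hypothetical check_openai_key helper name; the /v1/models endpoint is OpenAI's public API, and note this only verifies the key is valid, not that the account has credits:

```shell
# Prints the HTTP status of an authenticated OpenAI API call:
# 200 means the key is accepted; 401 means invalid or revoked.
check_openai_key() {
  if [ -z "$OPENAI_API_KEY" ]; then
    echo "KEY_MISSING"
    return 1
  fi
  curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    https://api.openai.com/v1/models
}
```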
rm -rf plugins/transcription/
sed -i '/^OPENAI_API_KEY=/d' .env