From claudeclaw
Adds OpenAI Whisper API transcription to ClaudeClaw WhatsApp channel. Downloads and transcribes voice notes into text as [Voice: <transcript>] for agent responses.
npx claudepluginhub sbusso/claudeclaw
This skill uses the workspace's default tool permissions.
This skill adds automatic voice message transcription to ClaudeClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
Check if src/transcription.ts exists. If it does, skip to Phase 3 (Configure). The code changes are already in place.
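The existence check above can be scripted. A minimal sketch, assuming it is run from the project root; the function name `transcription_applied` is hypothetical and not part of ClaudeClaw:

```shell
# Hypothetical helper: succeeds if the transcription module is already present.
# Takes the project root as an optional argument (defaults to the current directory).
transcription_applied() {
  [ -f "${1:-.}/src/transcription.ts" ]
}

if transcription_applied; then
  echo "src/transcription.ts found: skip to Phase 3 (Configure)"
else
  echo "src/transcription.ts missing: apply the code changes first"
fi
```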
Use AskUserQuestion to collect information:
AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?
If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.
Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.
git remote -v
If whatsapp is missing, add it:
git remote add whatsapp https://github.com/qwibitai/claudeclaw-whatsapp.git
git fetch whatsapp skill/voice-transcription
git merge whatsapp/skill/voice-transcription || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}
This merges in:
- src/transcription.ts (voice transcription module using OpenAI Whisper)
- src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
- src/channels/whatsapp.test.ts
- openai npm dependency in package.json
- OPENAI_API_KEY in .env.example
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
npm install --legacy-peer-deps
npm run build
npx vitest run src/channels/whatsapp.test.ts
All tests must pass and build must be clean before proceeding.
If the user doesn't have an API key:
I need you to create an OpenAI API key:
- Go to https://platform.openai.com/api-keys
- Click "Create new secret key"
- Give it a name (e.g., "ClaudeClaw Transcription")
- Copy the key (starts with sk-)
Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)
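To estimate the cost for other clip lengths, the per-minute rate quoted above can be applied directly. A small sketch; the function name `whisper_cost_usd` is hypothetical, and the rate is taken from this page rather than checked against current OpenAI pricing:

```shell
# Hypothetical helper: estimated transcription cost in USD for a clip of N seconds,
# using the $0.006-per-minute rate quoted above.
whisper_cost_usd() {
  awk -v secs="$1" 'BEGIN { printf "%.4f\n", secs / 60 * 0.006 }'
}

whisper_cost_usd 30    # typical 30-second voice note, prints 0.0030
whisper_cost_usd 600   # ten minutes of audio, prints 0.0600
```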
Wait for the user to provide the key.
Add to .env:
OPENAI_API_KEY=<their-key>
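If .env may already contain an older key, replacing the entry avoids duplicate lines. A minimal sketch, assuming the `sk-` prefix mentioned above is a good enough sanity check; the function name `set_openai_key` is hypothetical:

```shell
# Hypothetical helper: set OPENAI_API_KEY in an env file (default: .env),
# replacing any existing entry instead of appending a duplicate.
set_openai_key() {
  key="$1"
  file="${2:-.env}"
  case "$key" in
    sk-*) ;;                                             # expected prefix
    *) echo "key does not start with sk-" >&2; return 1 ;;
  esac
  touch "$file"
  # Keep every line except a previous OPENAI_API_KEY entry, then add the new one.
  # grep exits non-zero when it selects no lines, which is fine here.
  grep -v '^OPENAI_API_KEY=' "$file" > "$file.tmp" || true
  echo "OPENAI_API_KEY=$key" >> "$file.tmp"
  mv "$file.tmp" "$file"
}
```

Called as `set_openai_key <their-key>` from the project root, it leaves all other entries in .env untouched.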
Sync to container environment:
mkdir -p data/env && cp .env data/env/env
The container reads environment from data/env/env, not .env directly.
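Because the container only sees data/env/env, it is worth confirming that the key actually arrived there after the copy. A minimal sketch, meant to be run from the project root; the function name `sync_env` is hypothetical:

```shell
# Hypothetical helper: copy .env to the path the container reads,
# then confirm the synced file defines a non-empty OPENAI_API_KEY.
sync_env() {
  mkdir -p data/env && cp .env data/env/env || return 1
  # The trailing "." requires at least one character after the "=".
  grep -q '^OPENAI_API_KEY=.' data/env/env
}
```

`sync_env && echo "key synced"` reports success only when both the copy and the key check pass.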
Service name: derived from the directory name: com.claudeclaw.<dirname> (macOS) / claudeclaw-<dirname> (Linux). For example, if cwd is my-assistant, the macOS service is com.claudeclaw.my-assistant. Determine the correct service name before running the service commands below.
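The naming rule can be expressed as a small function. A sketch under the assumption that `uname -s` prints `Darwin` on macOS and that any other value should be treated as Linux; the function name `service_name` is hypothetical:

```shell
# Hypothetical helper: derive the service name from an OS name (as printed by
# `uname -s`) and a project directory, following the naming rule above.
service_name() {
  dir=$(basename "$2")
  case "$1" in
    Darwin) echo "com.claudeclaw.$dir" ;;   # macOS (launchctl)
    *)      echo "claudeclaw-$dir" ;;       # Linux (systemctl --user)
  esac
}

# Print the service name for the current machine and working directory.
service_name "$(uname -s)" "$PWD"
```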
npm run build
launchctl kickstart -k gui/$(id -u)/com.claudeclaw.<dirname> # macOS
# Linux: systemctl --user restart claudeclaw-<dirname>
Tell the user:
Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.
tail -f logs/claudeclaw.log | grep -i voice
Look for:
- Transcribed voice message: successful transcription with character count
- OPENAI_API_KEY not set: key missing from .env
- OpenAI transcription failed: API error (check key validity, billing)
- Failed to download audio message: media download issue
If transcription does not work, check that OPENAI_API_KEY is set in .env AND synced to data/env/env, then verify the key against the API:
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
Check logs for the specific error.
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.
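When working through these checks, it can help to count how often each of the log messages listed earlier appears. A minimal sketch, assuming the messages appear verbatim in logs/claudeclaw.log; the function name `voice_log_summary` is hypothetical:

```shell
# Hypothetical helper: count occurrences of each known voice-related log
# message in a log file (defaults to logs/claudeclaw.log).
voice_log_summary() {
  log="${1:-logs/claudeclaw.log}"
  [ -f "$log" ] || { echo "no log file: $log" >&2; return 1; }
  for msg in \
    "Transcribed voice message" \
    "OPENAI_API_KEY not set" \
    "OpenAI transcription failed" \
    "Failed to download audio message"
  do
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so the exit status is deliberately ignored.
    count=$(grep -c -F "$msg" "$log" || true)
    echo "$count  $msg"
  done
}
```

Running `voice_log_summary` from the project root prints one line per message. A non-zero count on any of the last three points to the matching cause in the list of log messages.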