BridgeSpeak
Give agents a voice. Ship audio-first.
A cross-agent skill from BridgeMind that gives Claude Code, Hermes, and OpenClaw agents the ability to speak.
Powered by OpenAI's gpt-realtime-2. Plays on the user's speakers via the system's native audio player.
Why BridgeSpeak?
AI coding agents are getting good enough to work alongside you — but they still only talk through text. When you're walking the dog, driving, cooking, or staring at another monitor, a silent terminal is dead air. A short voice read-out — "Build complete. All 47 tests passed." — turns the agent into a teammate you can actually leave running.
The OpenAI Realtime API (gpt-realtime-2) ships flagship voices (marin, cedar) that finally sound human enough that you stop noticing they're synthesized. But wiring a WebSocket, decoding base64 PCM chunks, wrapping them in a WAV header, and piping that to the right system audio player on macOS / Linux / Windows is a half-day of plumbing that every agent author otherwise rewrites from scratch.
BridgeSpeak is that half-day, packaged once. Drop it into any agent that follows the agentskills.io standard. Your agent gains a single capability: bash speak.sh "text". Everything else — auth, transport, audio formats, player detection, headless fallback — is encapsulated.
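To make that scope concrete, here is a minimal sketch of the plumbing speak.py packages: one request over the Realtime WebSocket, base64 pcm16 deltas reassembled into a WAV file, then a hand-off to the OS player. It is illustrative rather than the shipped code: the event names follow OpenAI's beta Realtime API (response.audio.delta / response.done) and may differ in newer API versions, and the header keyword is additional_headers on websockets 14+ (extra_headers on older releases).

```python
# Illustrative sketch, not the shipped speak.py. Assumes the beta Realtime
# event names (response.audio.delta / response.done) and websockets >= 14
# (older releases spell the keyword extra_headers).
import asyncio, base64, json, os, subprocess, tempfile, wave
import websockets  # the one runtime dependency

async def speak(text: str, voice: str = "marin") -> None:
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
               "OpenAI-Beta": "realtime=v1"}
    pcm = bytearray()
    async with websockets.connect(url, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["audio", "text"],
                         "voice": voice,
                         "instructions": f"Say exactly: {text}"},
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.audio.delta":
                pcm += base64.b64decode(event["delta"])   # raw pcm16 chunk
            elif event["type"] == "response.done":
                break
    # Raw pcm16 @ 24 kHz mono needs a WAV header before any player accepts it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        path = f.name
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(24000)
        wav.writeframes(bytes(pcm))
    subprocess.run(["afplay", path])  # macOS; speak.py detects the right player

asyncio.run(speak("Build complete. All 47 tests passed."))
```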
What's Inside
| Component | Type | What It Does |
|---|---|---|
| bridgespeak | Skill | The capability. Auto-loaded when the user asks the agent to speak / read aloud / narrate / announce. Tells the agent how to invoke speak.sh, which voice to pick, and when not to speak. |
| speak.py | Script | Python WebSocket client. Connects to gpt-realtime-2, streams pcm16 @ 24 kHz mono, wraps it as WAV, plays it with the system's native player. ~280 lines, no external dependencies beyond websockets. |
| speak.sh | Wrapper | POSIX shell entry point for macOS / Linux. Picks Python 3, forwards args. |
| speak.ps1 | Wrapper | PowerShell entry point for Windows. |
Install
As a Claude Code plugin
claude plugin install bridgespeak@bridgemind-plugins
Or copy the skill manually (Claude Code, Cursor, Codex, Gemini CLI, …)
# Project-level
mkdir -p .claude/skills
cp -r skills/bridgespeak .claude/skills/
# Personal / global
mkdir -p ~/.claude/skills
cp -r skills/bridgespeak ~/.claude/skills/
Then make the script executable:
chmod +x ~/.claude/skills/bridgespeak/scripts/speak.sh
Hermes (NousResearch)
# Hermes loads skills from skills/<category>/<name>/
mkdir -p ~/.hermes/skills/voice
cp -r skills/bridgespeak ~/.hermes/skills/voice/bridgespeak
cp -r scripts ~/.hermes/skills/voice/bridgespeak/scripts
Hermes will pick up metadata.hermes.required_environment_variables and pass OPENAI_API_KEY through.
OpenClaw
# OpenClaw default skill workspace
mkdir -p ~/.openclaw/workspace/skills
cp -r skills/bridgespeak ~/.openclaw/workspace/skills/bridgespeak
cp -r scripts ~/.openclaw/workspace/skills/bridgespeak/scripts
Or publish to clawdhub: clawdhub publish ./skills/bridgespeak.
Or symlink during development
ln -s "$(pwd)/skills/bridgespeak" ~/.claude/skills/bridgespeak
ln -s "$(pwd)/scripts" ~/.claude/skills/bridgespeak/scripts
Install the runtime dependency
python3 -m pip install --user websockets
That's the only Python dep. Audio playback uses the system's native player: nothing to install on macOS (afplay is built-in); one apt install on Linux (pulseaudio-utils, alsa-utils, or ffmpeg) if you don't already have paplay / aplay / ffplay; built-in on Windows via Media.SoundPlayer.
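Under the hood, picking the player is just a PATH probe. A minimal sketch of that detection logic, assuming the candidate list above (speak.py's actual ordering may differ):

```python
# Sketch of native-player detection; speak.py's real ordering may differ.
import platform, shutil, subprocess

def play_wav(path: str) -> None:
    system = platform.system()
    if system == "Darwin":
        subprocess.run(["afplay", path], check=True)   # ships with macOS
    elif system == "Windows":
        # Media.SoundPlayer comes with .NET, so stock PowerShell can play WAVs.
        subprocess.run(["powershell", "-NoProfile", "-Command",
                        f"(New-Object Media.SoundPlayer '{path}').PlaySync()"],
                       check=True)
    else:
        for player in ("paplay", "aplay", "ffplay"):   # first hit on PATH wins
            if shutil.which(player):
                args = ([player, "-nodisp", "-autoexit", path]
                        if player == "ffplay" else [player, path])
                subprocess.run(args, check=True)
                return
        raise RuntimeError("No audio player found: apt install "
                           "pulseaudio-utils, alsa-utils, or ffmpeg")
```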
Set your OpenAI API key
export OPENAI_API_KEY=sk-...
Or persist to a chmod-600 config file — see api-key-setup.md.
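Whichever route you take, the lookup order is the same: environment variable first, then file. A hypothetical sketch of that resolution (the ~/.bridgespeak/api_key path is an assumption; api-key-setup.md documents the real location and format):

```python
# Hypothetical key resolution: OPENAI_API_KEY wins, else a chmod-600 file.
# The ~/.bridgespeak/api_key path is an assumption, not the documented one.
import os, stat
from pathlib import Path

def resolve_api_key() -> str:
    if key := os.environ.get("OPENAI_API_KEY"):
        return key
    cfg = Path.home() / ".bridgespeak" / "api_key"
    if cfg.is_file():
        mode = stat.S_IMODE(cfg.stat().st_mode)
        if mode & 0o077:                      # group/other bits must be clear
            raise PermissionError(f"{cfg} should be chmod 600, is {oct(mode)}")
        return cfg.read_text().strip()
    raise RuntimeError("Set OPENAI_API_KEY or create the key file "
                       "(see api-key-setup.md)")
```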
How It Works
One-shot text-to-audio over WebSocket
The script does exactly four things: