voice-stt

Local push-to-talk speech-to-text for Linux: faster-whisper on CUDA, hardware PTT via evdev, and Unix-socket fanout to any consumer, including a Claude Code channel that streams dictated transcripts straight into a running Claude Code session.

Architecture

   PTT key (hardware button, hotkey, etc.)
        │
        ▼
   voice-stt-ptt (evdev listener)
        │
        ▼
   voice-stt start/stop  ──►  voice-sttd  (holds model in VRAM, captures mic)
                                    │
                                    ▼
                              OUT_SOCK (Unix socket, line-delimited UTF-8)
                                    │
        ┌───────────────────────────┼───────────────────────────┐
        ▼                           ▼                           ▼
  voice-stt listen           voice-stt type              voice-stt clip
  (stdout, pipe              (xdotool into                (xclip clipboard)
   to anything)               focused window)
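To show how small a consumer can be, here is a minimal Python reader for the output socket, similar in spirit to voice-stt listen. It is a sketch, not the repo's code. It assumes the line-delimited UTF-8 framing shown in the diagram, and the path in the usage line is a placeholder for whatever OUT_SOCK is on your machine.

```python
import socket
from typing import Iterator


def read_transcripts(sock_path: str) -> Iterator[str]:
    """Connect to the daemon's output socket and yield one transcript per line.

    Assumes the line-delimited UTF-8 framing described in the diagram.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        # makefile() gives buffered, line-oriented reads and decodes UTF-8 for us.
        with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:  # ends when the daemon closes the connection
                text = line.rstrip("\n")
                if text:  # skip empty lines, if any
                    yield text
```

Usage (placeholder path): `for t in read_transcripts("/path/to/out.sock"): print(t, flush=True)`.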

The daemon broadcasts each utterance to all connected output clients, so you can run as many consumers in parallel as you want.
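The daemon's source isn't reproduced here. To make the fanout idea concrete, the hypothetical Fanout class below is one way to implement it: it accepts any number of clients on a Unix socket, writes every transcript line to each of them, and drops clients that have disconnected.

```python
import os
import socket
import threading


class Fanout:
    """Hypothetical line broadcaster: every connected client gets every line."""

    def __init__(self, path: str) -> None:
        if os.path.exists(path):
            os.unlink(path)  # remove a stale socket left by a previous run
        self._server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self._server.bind(path)
        self._server.listen()
        self._clients: list[socket.socket] = []
        self._lock = threading.Lock()
        # Daemon thread, so a blocked accept() never keeps the process alive.
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self) -> None:
        while True:
            try:
                conn, _ = self._server.accept()
            except OSError:
                return  # server socket was closed
            with self._lock:
                self._clients.append(conn)

    def client_count(self) -> int:
        with self._lock:
            return len(self._clients)

    def broadcast(self, text: str) -> None:
        # One transcript per line: embedded newlines would break the framing.
        payload = (text.replace("\n", " ") + "\n").encode("utf-8")
        with self._lock:
            alive = []
            for conn in self._clients:
                try:
                    conn.sendall(payload)
                    alive.append(conn)
                except OSError:
                    conn.close()  # client went away; forget it
            self._clients = alive

    def close(self) -> None:
        self._server.close()
        with self._lock:
            for conn in self._clients:
                conn.close()
            self._clients.clear()
```

One trade-off of this simple design: sendall() blocks, so a client that stops reading can stall delivery to the others. A daemon under heavier use might switch to non-blocking sockets or a per-client queue.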

One-time setup

Clone the repo anywhere you like — the examples below assume ~/projects/voice-stt:

git clone https://github.com/MaxInertia/claude-voice-input-channel.git ~/projects/voice-stt
cd ~/projects/voice-stt

System packages (Ubuntu/Debian):

# required
sudo apt install libportaudio2

# optional — only if you want the X11-specific consumers and hotkey:
sudo apt install xdotool xclip xbindkeys

libportaudio2 is required by sounddevice to open the mic. The X11 packages are only needed if you want the voice-stt type (xdotool) or voice-stt clip (xclip) consumers, or the keyboard-PTT fallback (xbindkeys). The daemon, PTT listener, Claude Code channel, and the listen consumer all work on any Linux display server without them.
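To see which of the optional consumers your machine can run, checking PATH with shutil.which is enough. The feature-to-binary mapping comes straight from the paragraph above; the helper itself is a convenience sketch, not part of the repo.

```python
import shutil

# External binary each optional feature shells out to, per the paragraph above.
OPTIONAL_TOOLS = {
    "voice-stt type": "xdotool",
    "voice-stt clip": "xclip",
    "keyboard-PTT fallback": "xbindkeys",
}


def missing_optional_tools(which=shutil.which) -> dict[str, str]:
    """Return {feature: binary} for every optional binary not found on PATH."""
    return {feature: tool for feature, tool in OPTIONAL_TOOLS.items() if which(tool) is None}


for feature, tool in missing_optional_tools().items():
    print(f"{feature}: install {tool} to enable it")
```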

Install uv (if you don't already have it):

curl -LsSf https://astral.sh/uv/install.sh | sh
# new shells pick it up automatically; for the current shell:
export PATH="$HOME/.local/bin:$PATH"

CUDA libs (cuBLAS + cuDNN) are pulled in as Python deps (nvidia-cublas-cu12, nvidia-cudnn-cu12) and dlopen'd at startup by daemon.py, so you need neither a system-wide libcudnn nor any LD_LIBRARY_PATH changes. You only need a working NVIDIA driver (check with nvidia-smi).
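daemon.py's actual loader isn't shown in this README. The general technique is to dlopen the wheel-provided libraries with RTLD_GLOBAL before faster-whisper (through CTranslate2) asks for them by soname, so the dynamic linker reuses the copies already in memory. The sketch below assumes the wheels place their libraries under site-packages/nvidia/<component>/lib/; check that layout on your own install before relying on it.

```python
import ctypes
import glob
import os


def find_nvidia_libs(roots: list[str]) -> list[str]:
    """Collect shared objects from nvidia/<component>/lib/ under each root (assumed layout)."""
    found: list[str] = []
    for root in roots:
        pattern = os.path.join(root, "nvidia", "*", "lib", "*.so*")
        found.extend(sorted(glob.glob(pattern)))
    return found


def preload(paths: list[str]) -> list[str]:
    """dlopen each library with RTLD_GLOBAL, retrying until a pass makes no progress.

    RTLD_GLOBAL makes the symbols visible to libraries loaded later. Retrying lets a
    library whose dependency appears later in the list load on a subsequent pass.
    Returns the paths that still failed to load.
    """
    pending = list(paths)
    while pending:
        failed = []
        for path in pending:
            try:
                ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            except OSError:
                failed.append(path)
        if len(failed) == len(pending):
            return failed  # nothing new loaded this pass; give up on the rest
        pending = failed
    return []
```

Usage sketch: `import site; still_failing = preload(find_nvidia_libs(site.getsitepackages()))`, run before the model is loaded. The retry loop is there because libraries in these packages depend on one another (libcublas.so.12 on libcublasLt.so.12, for instance), and alphabetical order can try a library before its dependency.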

Install the Python dependencies (from the repo root):

uv sync

Configuration is read from a .env file at the repo root. Copy the example and edit it if you want to change the defaults (model, compute type, input device, PTT key; each is documented inline in the example):

cp .env.example .env
$EDITOR .env

The defaults in .env.example work out of the box for an 8 GB NVIDIA GPU on a modern Linux desktop with PipeWire. You can skip editing .env entirely and the daemon will run with the built-in defaults.
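The README doesn't describe the loader itself, but "built-in defaults, overridden by .env" can be sketched in a few lines. The key names and default values below are hypothetical stand-ins (only medium.en comes from this README); use the names documented in .env.example.

```python
from pathlib import Path

# Hypothetical keys and defaults; the real names are documented in .env.example.
DEFAULTS = {
    "WHISPER_MODEL": "medium.en",
    "COMPUTE_TYPE": "float16",
    "INPUT_DEVICE": "",
    "PTT_KEY": "KEY_F13",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks, # comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching quotes: KEY="value" -> value
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_config(env_path: Path) -> dict[str, str]:
    """Built-in defaults, overridden by whatever .env sets (if the file exists)."""
    config = dict(DEFAULTS)
    if env_path.is_file():
        config.update(parse_env(env_path.read_text(encoding="utf-8")))
    return config
```

This parser handles only plain KEY=VALUE lines with optional quotes; a full loader such as python-dotenv also understands export prefixes, escapes, and variable expansion.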

The first run of the daemon downloads the model (~1.5 GB for medium.en) from Hugging Face into ~/.cache/huggingface/. After that it runs fully offline: no audio, transcripts, or telemetry leave the machine.
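If you want to check whether the model is already cached (for example before going offline), the cache location can be resolved from the environment. The precedence below follows huggingface_hub's documented variables in recent releases (HF_HUB_CACHE, then HF_HOME, then XDG_CACHE_HOME); verify it against your installed version. The models--Systran--faster-whisper-medium.en name used in the test assumes faster-whisper fetches medium.en from the Systran repo.

```python
import os
from pathlib import Path


def hf_hub_cache(env=os.environ) -> Path:
    """Resolve the Hugging Face hub cache directory.

    Precedence (assumed to mirror huggingface_hub): HF_HUB_CACHE, then HF_HOME/hub,
    then $XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    """
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg).expanduser() / "huggingface" / "hub"


def cached_models(env=os.environ) -> list[str]:
    """List downloaded model repos (cache directories named models--<org>--<name>)."""
    root = hf_hub_cache(env)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith("models--"))
```

Setting HF_HUB_OFFLINE=1 should make huggingface_hub skip the network and serve from the local cache, which is a cheap way to confirm the daemon really runs without downloading anything.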

Run

The scripts/voice-stt-svc helper starts both the daemon and the PTT listener in the background and stops them again on request. You can run it directly from the repo:

./scripts/voice-stt-svc start    # launch voice-sttd + voice-stt-ptt (backgrounded)
./scripts/voice-stt-svc status   # show pids / running state
./scripts/voice-stt-svc logs     # tail both log files
./scripts/voice-stt-svc stop     # kill both, clean up sockets
./scripts/voice-stt-svc restart
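The wrapper's internals aren't documented here. If your own tooling needs the same running/stopped answer that status gives, the common pidfile pattern looks like the sketch below. The pidfile path is hypothetical, and os.kill(pid, 0) only checks that the process exists without signalling it.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """True if a process with this pid exists (signal 0 probes, never kills)."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def status_from_pidfile(pidfile: Path) -> str:
    """Hypothetical pidfile-based status, in the spirit of `voice-stt-svc status`."""
    try:
        pid = int(pidfile.read_text().strip())
    except (FileNotFoundError, ValueError):
        return "stopped"
    return f"running (pid {pid})" if pid_is_running(pid) else f"stale pidfile (pid {pid})"
```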

Optional: if you have a personal bin directory on your PATH (commonly ~/bin or ~/.local/bin), symlink the wrapper into it so you can call it as a bare voice-stt-svc from anywhere:

# example — adjust the target directory to wherever your PATH picks up
# personal binaries (check with: echo $PATH)
ln -sf "$PWD/scripts/voice-stt-svc" ~/.local/bin/voice-stt-svc

Logs land at /tmp/voice-stt-daemon.log and /tmp/voice-stt-ptt.log. There is no autostart on boot — you launch it when you want it.

Once voice-stt-svc start reports both running, hold your configured PTT key and speak. To consume the transcripts, run any consumer in the foreground:

cd ~/projects/voice-stt
uv run voice-stt listen           # stdout
uv run voice-stt type             # type into focused window
uv run voice-stt clip             # copy to clipboard
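The type and clip consumers are described only by the tools they drive. A sketch of that final step, assuming each transcript is handed to its own xdotool or xclip process (not necessarily how the repo does it):

```python
import subprocess
from typing import Iterable


def xdotool_argv(text: str) -> list[str]:
    # --clearmodifiers releases held modifiers before typing;
    # "--" ends option parsing so a transcript starting with "-" is typed literally.
    return ["xdotool", "type", "--clearmodifiers", "--", text]


def xclip_argv() -> list[str]:
    # xclip reads the new clipboard contents from stdin.
    return ["xclip", "-selection", "clipboard"]


def deliver(lines: Iterable[str], mode: str, run=subprocess.run) -> None:
    """Hand each transcript to xdotool ("type") or xclip ("clip")."""
    for text in lines:
        if mode == "type":
            run(xdotool_argv(text), check=True)
        elif mode == "clip":
            run(xclip_argv(), input=text.encode("utf-8"), check=True)
        else:
            raise ValueError(f"unknown mode: {mode}")
```

lines can be any iterable of strings, for instance `(line.rstrip("\n") for line in sys.stdin)` with `uv run voice-stt listen` piped in. Passing run in as a parameter keeps the function testable without an X server.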

Push-to-talk hotkey

voice-stt-ptt is a small evdev listener that watches for a chosen key's press/release and calls voice-stt start / voice-stt stop accordingly. Hold the key to dictate; release to transcribe.
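The listener's source isn't shown here, but the press/release behavior described above can be sketched as a small state machine over Linux input events. It assumes the standard EV_KEY values (1 press, 0 release, 2 autorepeat); PushToTalk and its callbacks are hypothetical names.

```python
from typing import Callable, Iterable, Tuple

EV_KEY = 1  # event type for keys/buttons (linux/input-event-codes.h)
KEY_UP, KEY_DOWN, KEY_HOLD = 0, 1, 2  # EV_KEY values: release, press, autorepeat


class PushToTalk:
    """Turn raw (type, code, value) key events into start/stop calls."""

    def __init__(self, key_code: int, on_start: Callable[[], None], on_stop: Callable[[], None]):
        self.key_code = key_code
        self.on_start = on_start
        self.on_stop = on_stop
        self.recording = False

    def handle(self, ev_type: int, code: int, value: int) -> None:
        if ev_type != EV_KEY or code != self.key_code:
            return  # some other key, or a sync/misc event
        if value == KEY_DOWN and not self.recording:
            self.recording = True
            self.on_start()
        elif value == KEY_UP and self.recording:
            self.recording = False
            self.on_stop()
        # KEY_HOLD (autorepeat) is ignored: holding the key must not restart capture.

    def feed(self, events: Iterable[Tuple[int, int, int]]) -> None:
        for ev_type, code, value in events:
            self.handle(ev_type, code, value)
```

With python-evdev you would feed it from `InputDevice(path).read_loop()`, passing each event's `type`, `code`, and `value`, and have the callbacks run `voice-stt start` / `voice-stt stop`. Ignoring value 2 matters because a keyboard emits repeat events for as long as the key is held.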

This requires read access to /dev/input/event*, so add yourself to the input group once:

sudo usermod -aG input $USER
# log out and back in so the new group membership takes effect
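To confirm the change after logging back in, a check like the one below works. It is a sketch that only inspects permissions and opens nothing. It also shows why the re-login is needed: os.getgroups() reports the groups the current session started with, so a newly added group only shows up in a new login session.

```python
import glob
import grp
import os


def in_group(name: str) -> bool:
    """True if the current process already carries the named group."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        return False  # no such group on this system
    return gid in os.getgroups() or gid == os.getegid()


def readable_devices(pattern: str = "/dev/input/event*") -> list[str]:
    """Event device nodes this process is allowed to read."""
    return sorted(p for p in glob.glob(pattern) if os.access(p, os.R_OK))


print("in input group:", in_group("input"))
print("readable event devices:", readable_devices())
```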
