From dora-skills
Builds audio processing pipelines with dora-rs: STT (Whisper), TTS (Kokoro), VAD, microphone capture, and full voice assistant dataflows.
npx claudepluginhub zhanghandong/dora-skills --plugin dora-skills
Slash command: /dora-skills:domain-audio
> Building audio and speech applications with dora-rs
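Dora supports audio processing pipelines including speech-to-text (Whisper), text-to-speech (Kokoro), voice activity detection (VAD), microphone capture, and full voice assistant dataflows.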
- id: stt
  build: pip install dora-distil-whisper
  path: dora-distil-whisper
  inputs:
    audio: microphone/audio
  outputs:
    - text
  env:
    MODEL: base  # tiny, base, small, medium, large
    LANGUAGE: en
- id: tts
  build: pip install dora-kokoro-tts
  path: dora-kokoro-tts
  inputs:
    text: llm/response
  outputs:
    - audio
  env:
    VOICE: af_bella
- id: vad
  build: pip install dora-vad
  path: dora-vad
  inputs:
    audio: microphone/audio
  outputs:
    - speech_segments
    - is_speaking
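To make the `is_speaking` output concrete, here is a minimal sketch of a speech/silence decision based on RMS energy. This is a deliberately simple stand-in and is not how `dora-vad` works internally; the function name and the threshold value are assumptions chosen for illustration.

```python
import numpy as np


def is_speaking(chunk: np.ndarray, threshold: float = 0.01) -> bool:
    """Return True when the chunk's RMS energy exceeds the threshold.

    Expects float samples normalized to [-1, 1]. The default threshold is a
    hypothetical value and needs tuning for a real microphone.
    """
    if chunk.size == 0:
        return False
    rms = float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))
    return rms > threshold


# 100 ms of silence and 100 ms of a 440 Hz tone at 16 kHz
silence = np.zeros(1600, dtype=np.float32)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(1600) / 16000).astype(np.float32)

print(is_speaking(silence), is_speaking(tone))
```

An energy threshold is cheap but reacts to any loud noise, which is one reason a model-based VAD node is usually preferable in a real pipeline.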
nodes:
  # Microphone input
  - id: microphone
    build: pip install dora-microphone
    path: dora-microphone
    inputs:
      tick: dora/timer/millis/100
    outputs:
      - audio
    env:
      SAMPLE_RATE: "16000"
      CHANNELS: "1"

  # Voice activity detection
  - id: vad
    build: pip install dora-vad
    path: dora-vad
    inputs:
      audio: microphone/audio
    outputs:
      - speech_audio
      - is_speaking

  # Speech to text
  - id: stt
    build: pip install dora-distil-whisper
    path: dora-distil-whisper
    inputs:
      audio: vad/speech_audio
    outputs:
      - text
    env:
      MODEL: base

  # Language model
  - id: llm
    build: pip install dora-qwen
    path: dora-qwen
    inputs:
      text: stt/text
    outputs:
      - response

  # Text to speech
  - id: tts
    build: pip install dora-kokoro-tts
    path: dora-kokoro-tts
    inputs:
      text: llm/response
    outputs:
      - audio

  # Audio playback
  - id: speaker
    build: pip install dora-pyaudio
    path: dora-pyaudio
    inputs:
      audio: tts/audio
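Each input in the dataflow above names its source as `node_id/output_name`, so a typo in either half leaves a node without data. The following is a rough sketch of a wiring check over an already-parsed dataflow. The helper name `find_unresolved_inputs` is hypothetical and not part of dora, and the node list is written out as a plain Python structure to avoid assuming a YAML parser is installed.

```python
def find_unresolved_inputs(nodes):
    """Return (node_id, input_name, source) for inputs with no matching output."""
    outputs = {
        f"{node['id']}/{out}" for node in nodes for out in node.get("outputs", [])
    }
    problems = []
    for node in nodes:
        for name, source in node.get("inputs", {}).items():
            # An input is either a "node/output" string or a mapping with a "source" key
            if isinstance(source, dict):
                source = source["source"]
            # Sources under dora/ (such as timers) are provided by dora itself
            if source.startswith("dora/"):
                continue
            if source not in outputs:
                problems.append((node["id"], name, source))
    return problems


# The voice assistant pipeline, reduced to ids, inputs, and outputs
pipeline = [
    {"id": "microphone", "inputs": {"tick": "dora/timer/millis/100"}, "outputs": ["audio"]},
    {"id": "vad", "inputs": {"audio": "microphone/audio"}, "outputs": ["speech_audio", "is_speaking"]},
    {"id": "stt", "inputs": {"audio": "vad/speech_audio"}, "outputs": ["text"]},
    {"id": "llm", "inputs": {"text": "stt/text"}, "outputs": ["response"]},
    {"id": "tts", "inputs": {"text": "llm/response"}, "outputs": ["audio"]},
    {"id": "speaker", "inputs": {"audio": "tts/audio"}},
]

print(find_unresolved_inputs(pipeline))
```

An empty list means every input resolves to a declared output. Pointing `stt` at `vad/speech_segments` instead would be reported, because the pipeline's `vad` node declares `speech_audio`.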
# microphone_node.py
import queue

import numpy as np
import pyarrow as pa
import sounddevice as sd
from dora import Node

node = Node()

# Audio parameters
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK_SIZE = 1024

# Thread-safe queue: the callback runs on sounddevice's audio thread
audio_queue = queue.Queue()


def audio_callback(indata, frames, time, status):
    if status:
        print(status)
    # Copy the block, because sounddevice reuses the input buffer
    audio_queue.put(indata.copy())


with sd.InputStream(
    samplerate=SAMPLE_RATE,
    channels=CHANNELS,
    dtype="int16",  # stt_node.py normalizes 16-bit samples
    callback=audio_callback,
    blocksize=CHUNK_SIZE,
):
    for event in node:
        if event["type"] == "INPUT" and event["id"] == "tick":
            # Drain everything captured since the last tick
            chunks = []
            while not audio_queue.empty():
                chunks.append(audio_queue.get_nowait())
            if chunks:
                # Combine and send audio chunks as an Arrow array
                audio = np.concatenate(chunks, axis=0)
                node.send_output("audio", pa.array(audio.ravel()))
        elif event["type"] == "STOP":
            break
# stt_node.py
import numpy as np
import pyarrow as pa
import whisper
from dora import Node

node = Node()
model = whisper.load_model("base")

for event in node:
    if event["type"] == "INPUT" and event["id"] == "audio":
        # Inputs arrive as Arrow arrays; this assumes 16-bit samples at 16 kHz
        audio = event["value"].to_numpy().astype(np.float32) / 32768.0
        # Transcribe
        result = model.transcribe(audio, fp16=False)
        text = result["text"].strip()
        if text:
            node.send_output("text", pa.array([text]))
    elif event["type"] == "STOP":
        break
# tts_node.py
import numpy as np
import pyarrow as pa
from dora import Node
from TTS.api import TTS

node = Node()
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

for event in node:
    if event["type"] == "INPUT" and event["id"] == "text":
        # Text arrives as a one-element Arrow array
        text = event["value"][0].as_py()
        # Generate speech
        wav = tts.tts(text)
        audio = np.array(wav, dtype=np.float32)
        node.send_output("audio", pa.array(audio))
    elif event["type"] == "STOP":
        break
# Audio as numpy array
# - Shape: (samples,) or (samples, channels)
# - dtype: np.float32 or np.int16
# - Sample rate: typically 16000 or 44100
# 16-bit integer audio
audio_int16 = np.array([...], dtype=np.int16)
# Float32 audio (normalized to [-1, 1])
audio_float32 = audio_int16.astype(np.float32) / 32768.0
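The conversion above goes one way only. As a small sketch of both directions, the helpers below (names are illustrative, not part of dora) convert 16-bit samples to normalized float32 and back, clipping before the cast so out-of-range values saturate instead of wrapping around.

```python
import numpy as np


def int16_to_float32(audio: np.ndarray) -> np.ndarray:
    """Map int16 samples to float32 in [-1, 1)."""
    return audio.astype(np.float32) / 32768.0


def float32_to_int16(audio: np.ndarray) -> np.ndarray:
    """Map float samples in [-1, 1] to int16, saturating anything outside."""
    scaled = np.clip(audio, -1.0, 1.0) * 32767.0
    return np.round(scaled).astype(np.int16)


samples = np.array([0, 16384, -32768, 32767], dtype=np.int16)
as_float = int16_to_float32(samples)
back = float32_to_int16(as_float)

print(as_float.dtype, back.dtype)
```

Dividing by 32768 and multiplying by 32767 is a common convention that keeps both directions inside the valid range. It means a round trip can differ from the original by one step, which is inaudible in practice.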
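- Use appropriate chunk sizes: smaller chunks lower latency but add per-message overhead. At 16000 Hz, a 1024-sample chunk holds 64 ms of audio.
- Use VAD to reduce processing: forward only detected speech to the STT node instead of the raw microphone stream.
- Choose appropriate model sizes: smaller Whisper models (tiny, base) run faster, while larger ones (medium, large) are more accurate.
- Buffer audio for batch processing by setting queue_size on the input: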
inputs:
  audio:
    source: microphone/audio
    queue_size: 10  # Buffer multiple chunks
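`queue_size` only controls how many messages wait on the input. Joining them into larger windows for batch processing still has to happen inside the receiving node. One way to do that is sketched below, assuming float32 mono chunks; the `ChunkBuffer` class is hypothetical and not part of dora.

```python
import numpy as np


class ChunkBuffer:
    """Accumulate variable-size audio chunks and release fixed-size windows."""

    def __init__(self, window_samples: int):
        self.window_samples = window_samples
        self._pending = np.zeros(0, dtype=np.float32)

    def push(self, chunk: np.ndarray) -> list:
        """Add a chunk and return every complete window now available."""
        self._pending = np.concatenate([self._pending, chunk.astype(np.float32)])
        windows = []
        while self._pending.size >= self.window_samples:
            windows.append(self._pending[: self.window_samples])
            self._pending = self._pending[self.window_samples :]
        return windows

    @property
    def pending_samples(self) -> int:
        """Number of samples waiting for the next window."""
        return int(self._pending.size)


# One-second windows at 16 kHz, fed with 1024-sample chunks
buffer = ChunkBuffer(window_samples=16000)
for _ in range(20):
    for window in buffer.push(np.zeros(1024, dtype=np.float32)):
        print("window ready:", window.shape)
```

Inside a node, `push` would be called once per `audio` input event, and each returned window passed to the model. Longer windows give the model more context at the cost of added latency.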
| Node | Install command | Purpose |
|---|---|---|
| dora-microphone | pip install dora-microphone | Audio input |
| dora-pyaudio | pip install dora-pyaudio | Audio output |
| dora-distil-whisper | pip install dora-distil-whisper | Speech-to-text |
| dora-kokoro-tts | pip install dora-kokoro-tts | Text-to-speech |
| dora-vad | pip install dora-vad | Voice activity detection |
| dora-outtetts | pip install dora-outtetts | TTS alternative |