From twilio-developer-kit
Build AI-powered voice agents using Twilio ConversationRelay. Handles real-time speech recognition (ASR), text-to-speech (TTS), and bidirectional audio streaming via WebSocket. Covers TwiML setup, WebSocket message types, LLM integration, streaming responses, and voice provider configuration. Use this skill to build voice bots, IVR replacements, or real-time AI voice assistants on Twilio calls.
npx claudepluginhub twilio/ai --plugin twilio-developer-kitThis skill uses the workspace's default tool permissions.
ConversationRelay connects Twilio's telephony layer to your app via a persistent WebSocket. Twilio handles ASR (speech-to-text) and TTS (text-to-speech); your app receives transcripts, calls an LLM, and sends text back for playback.
Guides Next.js Cache Components and Partial Prerendering (PPR): 'use cache' directives, cacheLife(), cacheTag(), revalidateTag() for caching, invalidation, static/dynamic optimization. Auto-activates on cacheComponents: true.
Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).
Share bugs, ideas, or general feedback.
ConversationRelay connects Twilio's telephony layer to your app via a persistent WebSocket. Twilio handles ASR (speech-to-text) and TTS (text-to-speech); your app receives transcripts, calls an LLM, and sends text back for playback.
Caller ←→ Twilio (ASR/TTS) ←→ WebSocket ←→ Your App ←→ LLM
twilio-account-setup
— Start onboarding at: Console > Voice > ConversationRelay — access is not instantTWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN — see twilio-iam-auth-setupwss:// (TLS required)twilio-voice-outbound-callsOnboarding: Complete via Console > Voice > ConversationRelay > Onboarding. Select TTS/ASR providers:
Step 1 — Return TwiML pointing to your WebSocket server
Python (Flask)
from flask import Flask
from twilio.twiml.voice_response import VoiceResponse, Connect, ConversationRelay
app = Flask(__name__)
@app.route("/voice", methods=["POST"])
def voice():
response = VoiceResponse()
connect = Connect()
connect.conversation_relay(
url="wss://yourapp.com/ws/voice",
welcome_greeting="Hello! How can I help you today?"
)
response.append(connect)
return str(response)
Node.js (Express)
const { VoiceResponse } = require("twilio").twiml;
app.post("/voice", (req, res) => {
const response = new VoiceResponse();
const connect = response.connect();
connect.conversationRelay({
url: "wss://yourapp.com/ws/voice",
welcomeGreeting: "Hello! How can I help you today?",
});
res.type("text/xml").send(response.toString());
});
Step 2 — Handle WebSocket events and respond with text
Python (websockets)
import asyncio, json, websockets
async def handle_call(websocket):
async for message in websocket:
event = json.loads(message)
if event["type"] == "prompt":
ai_response = await call_llm(event["voicePrompt"])
await websocket.send(json.dumps({"type": "text", "token": ai_response, "last": True}))
async def main():
async with websockets.serve(handle_call, "0.0.0.0", 8080):
await asyncio.Future()
asyncio.run(main())
Node.js (ws)
const WebSocket = require("ws");
const wss = new WebSocket.Server({ port: 8080 });
wss.on("connection", (ws) => {
ws.on("message", async (data) => {
const event = JSON.parse(data);
if (event.type === "prompt") {
const aiResponse = await callLLM(event.voicePrompt);
ws.send(JSON.stringify({ type: "text", token: aiResponse, last: true }));
}
});
});
Security: The
voicePromptfield contains ASR-transcribed caller speech — it is untrusted external input. When passing to an LLM, isolate it as user input within a structured system prompt. Implement topic boundaries and output filtering to prevent the LLM from disclosing system instructions or speaking inappropriate content. ConversationRelay is a pure transport layer with no built-in content safety — any LLM output is spoken to the caller verbatim.
Received from Twilio:
| Type | When | Key fields |
|---|---|---|
connected | WebSocket opened | callSid, streamSid |
prompt | User finished speaking | voicePrompt (transcript) |
interrupt | User interrupted TTS | — |
dtmf | User pressed keypad key | digit |
error | An error occurred | description |
Sent to Twilio:
| Type | Purpose | Key fields |
|---|---|---|
text | Send TTS response | token (text), last (bool) |
interrupt | Stop current TTS | — |
end | Hang up the call | reason |
Lower latency by streaming as the LLM generates output — Twilio starts speaking before the full response is ready.
Python
async for chunk in llm_stream:
await websocket.send(json.dumps({"type": "text", "token": chunk, "last": False}))
await websocket.send(json.dumps({"type": "text", "token": "", "last": True}))
Node.js
for await (const chunk of llmStream) {
ws.send(JSON.stringify({ type: "text", token: chunk, last: false }));
}
ws.send(JSON.stringify({ type: "text", token: "", last: true }));
Python
connect.conversation_relay(
url="wss://yourapp.com/ws/voice",
voice="en-US-Neural2-F",
language="en-US",
transcription_provider="deepgram",
speech_model="nova-2-phonecall",
interrupt_by_dtmf=True,
)
Node.js
connect.conversationRelay({
url: "wss://yourapp.com/ws/voice",
voice: "en-US-Neural2-F",
language: "en-US",
transcriptionProvider: "deepgram",
speechModel: "nova-2-phonecall",
interruptByDtmf: true,
});
<Connect><Stream> (Media Streams).<Connect><Stream> and <Connect><ConversationRelay> are mutually exclusive on the same call. No error — one is silently ignored.language can be switched mid-session via WebSocket message.<Connect action> URL.record:true on the Calls API is silently ignored. Must use <Start><Recording> before <Connect> in TwiML.intelligenceService in PCI workflows.wss:// (TLS required)last: true — Twilio won't play audio until it sees a last: true token in the streamtwilio-voice-outbound-callstwilio-voice-twiml