Skill

azure-ai-voicelive-ts

Builds real-time bidirectional voice AI applications using Azure AI Voice Live SDK in JavaScript/TypeScript for Node.js and browsers.

JavaScript

Popularity

Parent stars

37,902

Parent forks

6,202

Slash command

/antigravity-awesome-skills:azure-ai-voicelive-ts

Stats

Language: Python
Maintenance: Excellent
Last Commit: May 16, 2026

@azure/ai-voicelive (JavaScript/TypeScript)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.

Installation

npm install @azure/ai-voicelive @azure/identity
# TypeScript users (type definitions are a development dependency)
npm install --save-dev @types/node

Current Version: 1.0.0-beta.3

Supported Environments:

Node.js LTS versions (20+)
Modern browsers (Chrome, Firefox, Safari, Edge)

Environment Variables

AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
# Optional: Logging
AZURE_LOG_LEVEL=info
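
A minimal sketch of validating these variables at startup, so a missing or malformed endpoint fails early rather than at connection time. `loadVoiceLiveConfig` and `VoiceLiveEnvConfig` are hypothetical names, not part of the SDK; the function takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical helper, not part of @azure/ai-voicelive.
interface VoiceLiveEnvConfig {
  endpoint: string;
  apiKey: string | undefined; // undefined means "use Microsoft Entra ID"
  logLevel: string | undefined;
}

function loadVoiceLiveConfig(
  env: Record<string, string | undefined>,
): VoiceLiveEnvConfig {
  const endpoint = (env["AZURE_VOICELIVE_ENDPOINT"] ?? "").trim();
  if (endpoint === "") {
    throw new Error("AZURE_VOICELIVE_ENDPOINT is not set");
  }
  // Require an https:// URL with a non-empty host.
  if (!/^https:\/\/[^\s/]+/.test(endpoint)) {
    throw new Error("AZURE_VOICELIVE_ENDPOINT must be an https:// URL");
  }
  // Treat empty strings the same as unset variables.
  const apiKey = (env["AZURE_VOICELIVE_API_KEY"] ?? "").trim() || undefined;
  const logLevel = (env["AZURE_LOG_LEVEL"] ?? "").trim() || undefined;
  return { endpoint, apiKey, logLevel };
}
```

In Node.js this would be called as `loadVoiceLiveConfig(process.env)`.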

Authentication

Microsoft Entra ID (Recommended)

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

const client = new VoiceLiveClient(endpoint, credential);

API Key

import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
// Read the key from the environment rather than hardcoding it
const credential = new AzureKeyCredential(process.env.AZURE_VOICELIVE_API_KEY!);

const client = new VoiceLiveClient(endpoint, credential);

Client Hierarchy

VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession()      → Configure session options
    ├── subscribe()          → Event handlers (Azure SDK pattern)
    ├── sendAudio()          → Stream audio input
    ├── addConversationItem() → Add messages/function outputs
    └── sendEvent()          → Send raw protocol events

Quick Start

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;

// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

// Subscribe to events
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle streaming audio output
    // (playAudioChunk is an application-defined playback helper, not part of the SDK)
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseTextDelta: async (event, context) => {
    // Handle streaming text
    process.stdout.write(event.delta);
  },
  onInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
});

// Send audio from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
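
Microphone capture often yields buffers larger than one would want to send in a single call. The sketch below splits a buffer into fixed-size frames before handing each one to `sendAudioChunk`. `splitIntoChunks` and `FRAME_BYTES` are hypothetical names, and the 20 ms frame size is an assumption chosen for illustration, not a documented SDK requirement.

```typescript
// Hypothetical helper, not part of @azure/ai-voicelive.
// Splits a captured buffer into frames of at most `chunkBytes` bytes.
function splitIntoChunks(buffer: ArrayBuffer, chunkBytes: number): ArrayBuffer[] {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkBytes) {
    // slice() copies the bytes and clamps the end index to the buffer length.
    chunks.push(buffer.slice(offset, offset + chunkBytes));
  }
  return chunks;
}

// Assumed frame size: 20 ms of 24 kHz, 16-bit mono audio
// = 24000 samples/s * 0.02 s * 2 bytes = 960 bytes.
const FRAME_BYTES = 960;
```

Each frame returned by `splitIntoChunks(captured, FRAME_BYTES)` can then be passed to `sendAudioChunk` in order; the last frame may be shorter than the rest.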

Session Configuration

await session.updateSession({
  // Modalities
  modalities: ["audio", "text"],
  
  // System instructions
  instructions: "You are a customer service representative.",
  
  // Voice selection
  voice: {
    type: "azure-standard",  // or "azure-custom", "openai"
    name: "en-US-AvaNeural",
  },
  
  // Turn detection (VAD)
  turnDetection: {
    type: "server_vad",      // or "azure_semantic_vad"
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  
  // Audio formats
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
  
  // Tools (function calling)
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" }
        },
        required: ["location"]
      }
    }
  ],
  toolChoice: "auto",
});

Event Handling (Azure SDK Pattern)

The SDK uses a subscription-based event handling pattern:

const subscription = session.subscribe({
  // Connection lifecycle
  onConnected: async (args, context) => {
    console.log("Connected:", args.connectionId);
  },
  onDisconnected: async (args, context) => {
    console.log("Disconnected:", args.code, args.reason);
  },
  onError: async (args, context) => {
    console.error("Error:", args.error.message);
  },
  
  // Session events
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onSessionUpdated: async (event, context) => {
    console.log("Session updated");
  },
  
  // Audio input events (VAD)
  onInputAudioBufferSpeechStarted: async (event, context) => {
    console.log("Speech started at:", event.audioStartMs);
  },
  onInputAudioBufferSpeechStopped: async (event, context) => {
    console.log("Speech stopped at:", event.audioEndMs);
  },
  
  // Transcription events
  onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
  onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  
  // Response events
  onResponseCreated: async (event, context) => {
    console.log("Response started");
  },
  onResponseDone: async (event, context) => {
    console.log("Response complete");
  },
  
  // Streaming text
  onResponseTextDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  onResponseTextDone: async (event, context) => {
    console.log("\n--- Text complete ---");
  },
  
  // Streaming audio
  onResponseAudioDelta: async (event, context) => {
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseAudioDone: async (event, context) => {
    console.log("Audio complete");
  },
  
  // Audio transcript (what assistant said)
  onResponseAudioTranscriptDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  
  // Function calling
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const result = await getWeather(args.location);
      
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(result),
      });
      
      await session.sendEvent({ type: "response.create" });
    }
  },
  
  // Catch-all for debugging
  onServerEvent: async (event, context) => {
    console.log("Event:", event.type);
  },
});

// Clean up when done
await subscription.close();
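
The delta handlers above receive text in fragments. When the complete text of a response is needed (for logging or display), one option is to buffer the fragments and join them when the matching "done" event arrives. A minimal sketch, with `DeltaAccumulator` as a hypothetical helper that is not part of the SDK:

```typescript
// Hypothetical helper, not part of @azure/ai-voicelive.
// Collects streamed text fragments and returns the joined text on flush.
class DeltaAccumulator {
  private parts: string[] = [];

  // Call from a delta handler such as onResponseTextDelta.
  append(delta: string): void {
    this.parts.push(delta);
  }

  // Call from the matching done handler such as onResponseTextDone.
  // Returns the full text and resets the buffer for the next response.
  flush(): string {
    const text = this.parts.join("");
    this.parts = [];
    return text;
  }
}
```

In a handler object this would look like `onResponseTextDelta: async (event) => { acc.append(event.delta); }` paired with `onResponseTextDone: async () => { console.log(acc.flush()); }`.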

Function Calling

// Define tools in session config
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "Help users with weather information.",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City and state or country",
          },
        },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await fetchWeather(args.location);
      
      // Send function result
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });
      
      // Trigger response generation
      await session.sendEvent({ type: "response.create" });
    }
  },
});
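
With more than one tool, the `if (event.name === ...)` chain grows quickly, and a malformed argument string makes `JSON.parse` throw inside the handler. The sketch below shows one way to centralize dispatch so that every call produces a JSON string suitable for the `output` field, including on failure. `ToolRegistry`, `ToolHandler`, and `parseToolArguments` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical helpers, not part of @azure/ai-voicelive.
type ToolHandler = (args: Record<string, unknown>) => unknown | Promise<unknown>;

type ParsedArguments =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; error: string };

// Parses the raw JSON argument string without throwing.
function parseToolArguments(raw: string): ParsedArguments {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return { ok: false, error: "Arguments were not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "Arguments must be a JSON object" };
  }
  return { ok: true, value: parsed as Record<string, unknown> };
}

class ToolRegistry {
  private readonly handlers = new Map<string, ToolHandler>();

  register(toolName: string, handler: ToolHandler): void {
    this.handlers.set(toolName, handler);
  }

  // Always resolves to a JSON string, so a failure can still be
  // reported back as the function output instead of being dropped.
  async run(toolName: string, rawArguments: string): Promise<string> {
    const handler = this.handlers.get(toolName);
    if (!handler) {
      return JSON.stringify({ error: `Unknown tool: ${toolName}` });
    }
    const parsedArgs = parseToolArguments(rawArguments);
    if (!parsedArgs.ok) {
      return JSON.stringify({ error: parsedArgs.error });
    }
    try {
      const result = await handler(parsedArgs.value);
      return JSON.stringify(result ?? null);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return JSON.stringify({ error: message });
    }
  }
}
```

Inside `onResponseFunctionCallArgumentsDone`, the handler body would then reduce to `output: await registry.run(event.name, event.arguments)` followed by the `response.create` event.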

Voice Options

Voice Type	Config	Example
Azure Standard	`{ type: "azure-standard", name: "..." }`	`"en-US-AvaNeural"`
Azure Custom	`{ type: "azure-custom", name: "...", endpointId: "..." }`	Custom voice endpoint
Azure Personal	`{ type: "azure-personal", speakerProfileId: "..." }`	Personal voice clone
OpenAI	`{ type: "openai", name: "..." }`	`"alloy"`, `"echo"`, `"shimmer"`

Supported Models

Model	Description	Use Case
`gpt-4o-realtime-preview`	GPT-4o with real-time audio	High-quality conversational AI
`gpt-4o-mini-realtime-preview`	Lightweight GPT-4o	Fast, efficient interactions
`phi4-mm-realtime`	Phi multimodal	Cost-effective applications

Turn Detection Options

// Server VAD (default)
turnDetection: {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
}

// Azure Semantic VAD (smarter detection)
turnDetection: {
  type: "azure_semantic_vad",
}

// Azure Semantic VAD (English optimized)
turnDetection: {
  type: "azure_semantic_vad_en",
}

// Azure Semantic VAD (Multilingual)
turnDetection: {
  type: "azure_semantic_vad_multilingual",
}
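
One way to keep these options type-safe in application code is a discriminated union over the four `type` values listed above. The sketch below also shows a selection helper; the policy it encodes (English-only locales get the English-optimized variant, anything else gets the multilingual variant, no locale information falls back to server VAD) is an illustrative assumption, not SDK guidance. `TurnDetectionConfig` and `chooseTurnDetection` are hypothetical names.

```typescript
// Hypothetical types and helper, not part of @azure/ai-voicelive.
// The field names mirror the configuration objects shown above.
type TurnDetectionConfig =
  | {
      type: "server_vad";
      threshold: number;
      prefixPaddingMs: number;
      silenceDurationMs: number;
    }
  | { type: "azure_semantic_vad" }
  | { type: "azure_semantic_vad_en" }
  | { type: "azure_semantic_vad_multilingual" };

function chooseTurnDetection(locales: string[]): TurnDetectionConfig {
  if (locales.length === 0) {
    // No language information: fall back to the server VAD values shown above.
    return {
      type: "server_vad",
      threshold: 0.5,
      prefixPaddingMs: 300,
      silenceDurationMs: 500,
    };
  }
  // Matches "en" and regional tags such as "en-US" or "en-GB".
  const allEnglish = locales.every((locale) => /^en(-|$)/i.test(locale));
  return allEnglish
    ? { type: "azure_semantic_vad_en" }
    : { type: "azure_semantic_vad_multilingual" };
}
```

The result can be passed as the `turnDetection` field of the session configuration.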

Audio Formats

Format	Sample Rate	Use Case
`pcm16`	24kHz	Default, high quality
`pcm16-8000hz`	8kHz	Telephony
`pcm16-16000hz`	16kHz	Voice assistants
`g711_ulaw`	8kHz	Telephony (US)
`g711_alaw`	8kHz	Telephony (EU)
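
The sample rate and encoding determine how many bytes a given duration of audio occupies, which matters when sizing buffers or frames. A small sketch of that arithmetic, assuming mono audio (an assumption, not stated in the table): PCM16 uses 2 bytes per sample, and G.711 uses 1 byte per sample. `FORMAT_INFO` and `bytesForDuration` are hypothetical names.

```typescript
// Hypothetical helper, not part of @azure/ai-voicelive. Assumes mono audio.
type AudioFormat =
  | "pcm16"
  | "pcm16-8000hz"
  | "pcm16-16000hz"
  | "g711_ulaw"
  | "g711_alaw";

const FORMAT_INFO: Record<AudioFormat, { sampleRateHz: number; bytesPerSample: number }> = {
  "pcm16": { sampleRateHz: 24000, bytesPerSample: 2 },
  "pcm16-8000hz": { sampleRateHz: 8000, bytesPerSample: 2 },
  "pcm16-16000hz": { sampleRateHz: 16000, bytesPerSample: 2 },
  "g711_ulaw": { sampleRateHz: 8000, bytesPerSample: 1 },
  "g711_alaw": { sampleRateHz: 8000, bytesPerSample: 1 },
};

// Returns the number of bytes needed to hold `durationMs` of audio.
function bytesForDuration(format: AudioFormat, durationMs: number): number {
  const { sampleRateHz, bytesPerSample } = FORMAT_INFO[format];
  const samples = Math.round((sampleRateHz * durationMs) / 1000);
  return samples * bytesPerSample;
}

console.log(bytesForDuration("pcm16", 20)); // 960
console.log(bytesForDuration("g711_ulaw", 20)); // 160
```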

Key Types Reference

Type	Purpose
`VoiceLiveClient`	Main client for creating sessions
`VoiceLiveSession`	Active WebSocket session
`VoiceLiveSessionHandlers`	Event handler interface
`VoiceLiveSubscription`	Active event subscription
`ConnectionContext`	Context for connection events
`SessionContext`	Context for session events
`ServerEventUnion`	Union of all server events

Error Handling

import {
  VoiceLiveError,
  VoiceLiveConnectionError,
  VoiceLiveAuthenticationError,
  VoiceLiveProtocolError,
} from "@azure/ai-voicelive";

const subscription = session.subscribe({
  onError: async (args, context) => {
    const { error } = args;
    
    if (error instanceof VoiceLiveConnectionError) {
      console.error("Connection error:", error.message);
    } else if (error instanceof VoiceLiveAuthenticationError) {
      console.error("Auth error:", error.message);
    } else if (error instanceof VoiceLiveProtocolError) {
      console.error("Protocol error:", error.message);
    }
  },
  
  onServerError: async (event, context) => {
    console.error("Server error:", event.error?.message);
  },
});

Logging

import { setLogLevel } from "@azure/logger";

// Enable info-level logging (use "verbose" for the most detailed output)
setLogLevel("info");

// Or via environment variable
// AZURE_LOG_LEVEL=info

Browser Usage

// Browser requires bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";

// Use browser-compatible credential
const credential = new InteractiveBrowserCredential({
  clientId: "your-client-id",
  tenantId: "your-tenant-id",
});

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);

// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });

// Process audio and send to session
// ... (see samples for full implementation)
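
The Web Audio API delivers samples as 32-bit floats in the range [-1, 1], while the `pcm16` format expects 16-bit signed integers. The sketch below shows only that conversion step and is not the SDK samples' implementation. `floatToPcm16` is a hypothetical helper, and little-endian byte order is an assumption.

```typescript
// Hypothetical helper, not part of @azure/ai-voicelive.
// Converts Float32 samples in [-1, 1] to 16-bit signed PCM.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const out = new ArrayBuffer(samples.length * 2);
  const view = new DataView(out);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Scale asymmetrically: -1 maps to -32768 and +1 maps to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian (assumed)
  }
  return out;
}
```

The returned `ArrayBuffer` has the same type that `session.sendAudio` receives in the Quick Start.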

Best Practices

Prefer Microsoft Entra ID credentials: DefaultAzureCredential in Node.js, InteractiveBrowserCredential in browsers; never hardcode API keys
Set both modalities: include ["text", "audio"] for voice assistants
Use Azure Semantic VAD: better turn detection than basic server VAD
Handle all error types: connection, authentication, and protocol errors
Clean up subscriptions: call subscription.close() when done
Use an appropriate audio format: pcm16 at 24 kHz for best quality, 8 kHz formats for telephony

Reference Links

Resource	URL
npm Package	https://www.npmjs.com/package/@azure/ai-voicelive
GitHub Source	https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive
Samples	https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples
API Reference	https://learn.microsoft.com/javascript/api/@azure/ai-voicelive

When to Use

Use this skill when building real-time, bidirectional voice applications with the Azure AI Voice Live SDK (@azure/ai-voicelive) in JavaScript or TypeScript, for Node.js or browsers.

Limitations

Use this skill only when the task clearly matches the scope described above.
Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
