Skill

podcast-generation

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

From microsoft-azure-skills

Install

Run in your terminal

npx claudepluginhub jadecli/jadecli-claude-plugins --plugin microsoft-azure-skills

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

references/acceptance-criteria.md

references/architecture.md

references/code-examples.md

scripts/pcm_to_wav.py

Stats

Parent Repo Stars: 0

Parent Repo Forks: 0

Last Commit: Feb 9, 2026

Core Workflow

Backend Audio Generation

import base64

from openai import AsyncOpenAI


async def generate_audio(endpoint, api_key, prompt):
    # Convert HTTPS endpoint to WebSocket URL (drop any trailing slash first)
    ws_url = endpoint.rstrip("/").replace("https://", "wss://") + "/openai/v1"

    client = AsyncOpenAI(
        websocket_base_url=ws_url,
        api_key=api_key
    )

    audio_chunks = []
    transcript_parts = []

    async with client.realtime.connect(model="gpt-realtime-mini") as conn:
        # Configure for audio-only output
        await conn.session.update(session={
            "output_modalities": ["audio"],
            "instructions": "You are a narrator. Speak naturally."
        })

        # Send text to narrate
        await conn.conversation.item.create(item={
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": prompt}]
        })
        await conn.response.create()

        # Collect streaming events until the response is complete
        async for event in conn:
            if event.type == "response.output_audio.delta":
                audio_chunks.append(base64.b64decode(event.delta))
            elif event.type == "response.output_audio_transcript.delta":
                transcript_parts.append(event.delta)
            elif event.type == "response.done":
                break

    # Convert PCM to WAV (pcm_to_wav is defined in scripts/pcm_to_wav.py)
    pcm_audio = b''.join(audio_chunks)
    wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
    return wav_audio, "".join(transcript_parts)
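The backend snippet calls pcm_to_wav, which lives in scripts/pcm_to_wav.py and is not reproduced on this page. As a rough idea of what such a helper might look like, here is a minimal sketch using only the standard library. It assumes the collected audio is 16-bit mono PCM; the channels and sample_width parameters are illustrative assumptions and may not match the actual script.

```python
import io
import wave


def pcm_to_wav(pcm_audio: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Assumption: samples are 16-bit (2 bytes) mono unless stated otherwise.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_audio)
    # The wave writer has finalized the header; the buffer itself stays open.
    return buffer.getvalue()
```

Only the container header is added; the samples themselves are copied through unchanged, so a wrong sample width or channel count would produce distorted playback rather than an error.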

Frontend Audio Playback

// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
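The frontend reads a base64-encoded WAV from response.audio_data, so the backend has to encode the WAV bytes before returning them as JSON. The following is one possible sketch of that step. The function name build_audio_payload and the transcript field are hypothetical; only the audio_data field name comes from the frontend snippet above.

```python
import base64


def build_audio_payload(wav_audio: bytes, transcript_parts: list[str]) -> dict:
    """Build a JSON-serializable response body for the frontend.

    "audio_data" matches the field the frontend decodes; "transcript"
    is an assumed extra field for the collected transcript text.
    """
    return {
        "audio_data": base64.b64encode(wav_audio).decode("ascii"),
        "transcript": "".join(transcript_parts),
    }
```

A FastAPI route could return this dictionary directly, since every value in it is a plain string.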

Voice Options

Voice	Character
alloy	Neutral
echo	Warm
fable	Expressive
onyx	Deep
nova	Friendly
shimmer	Clear
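If the voice name comes from user input, it may help to check it against the table before using it. This is a small sketch of such a check; the names VOICE_CHARACTERS and pick_voice are hypothetical, and how the chosen voice is then passed to the session configuration is not shown here because this page does not specify it.

```python
# Voice names and characters copied from the table above.
VOICE_CHARACTERS = {
    "alloy": "Neutral",
    "echo": "Warm",
    "fable": "Expressive",
    "onyx": "Deep",
    "nova": "Friendly",
    "shimmer": "Clear",
}


def pick_voice(requested, default="alloy"):
    """Return a known voice name, falling back to the default if unknown."""
    name = (requested or "").strip().lower()
    return name if name in VOICE_CHARACTERS else default
```

Falling back to a default instead of raising keeps audio generation working when a client sends an unrecognized voice name; raising an error would be the stricter alternative.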

Podcast Generation with GPT Realtime Mini

Quick Start

Environment Configuration

Core Workflow

Backend Audio Generation

Frontend Audio Playback

Voice Options

Realtime API Events

Audio Format

References
