Provides guidance for accessing live audio, video, transcript, chat, and screen share from Zoom meetings via WebSocket RTMS protocol. For backend AI/ML apps, transcription, streaming, and analysis.
Expert guidance for accessing live audio, video, transcript, chat, and screen share data from Zoom meetings, webinars, Video SDK sessions, and Zoom Contact Center Voice in real-time. RTMS uses a WebSocket-based protocol with open standards and does not require a meeting bot to capture the media plane.
RTMS is primarily a backend media ingestion service.
A common optional architecture: use RTMS for the media/data plane, and use frontend frameworks or Zoom Apps for presentation and user interaction.
- Official Documentation: https://developers.zoom.us/docs/rtms/
- SDK Reference (JS): https://zoom.github.io/rtms/js/
- SDK Reference (Python): https://zoom.github.io/rtms/py/
- Sample Repository: https://github.com/zoom/rtms-samples
| Product | Webhook Event | Payload ID | App Type |
|---|---|---|---|
| Meetings | meeting.rtms_started / meeting.rtms_stopped | meeting_uuid | General App |
| Webinars | webinar.rtms_started / webinar.rtms_stopped | meeting_uuid (same!) | General App |
| Video SDK | session.rtms_started / session.rtms_stopped | session_id | Video SDK App |
| Zoom Contact Center Voice | Product-specific RTMS/ZCC Voice events | Product-specific stream/session identifiers | Contact Center / approved RTMS integration |
Once connected, the core signaling/media socket model is shared across products. Meetings, webinars, and Video SDK sessions use the familiar start/stop webhooks. Zoom Contact Center Voice adds its own RTMS/ZCC Voice event family and should be treated as the same transport model with product-specific event payloads.
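The event-to-identifier mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not part of the RTMS SDK: `EVENT_ID_FIELDS` and `resolveStreamIdentity` are hypothetical names, and Zoom Contact Center Voice is left out because its event names and identifiers are product-specific.

```javascript
// Hypothetical lookup (not an SDK API): which product an RTMS webhook belongs to
// and which payload field carries its identifier, per the table above.
const EVENT_ID_FIELDS = new Map([
  ['meeting.rtms_started', { product: 'meeting', idField: 'meeting_uuid' }],
  ['meeting.rtms_stopped', { product: 'meeting', idField: 'meeting_uuid' }],
  ['webinar.rtms_started', { product: 'webinar', idField: 'meeting_uuid' }], // not webinar_uuid
  ['webinar.rtms_stopped', { product: 'webinar', idField: 'meeting_uuid' }],
  ['session.rtms_started', { product: 'videosdk', idField: 'session_id' }],
  ['session.rtms_stopped', { product: 'videosdk', idField: 'session_id' }],
]);

// Returns { product, id, started } for a known RTMS event, or null for any other event.
function resolveStreamIdentity(event, payload) {
  const entry = EVENT_ID_FIELDS.get(event);
  if (!entry) return null;
  const id = payload[entry.idField];
  if (!id) {
    throw new Error(`${event} payload is missing ${entry.idField}`);
  }
  return { product: entry.product, id, started: event.endsWith('.rtms_started') };
}
```

Keeping the mapping in one place means the rest of a handler can work with a single `id` value regardless of which product sent the webhook.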
RTMS is a data pipeline that gives your app access to live media from Zoom meetings, webinars, and Video SDK sessions without participant bots. Instead of having automated clients join meetings, use RTMS to collect media data directly from Zoom's infrastructure.
| Media Type | Format | Use Cases |
|---|---|---|
| Audio | PCM (L16), G.711, G.722, Opus | Transcription, voice analysis, recording |
| Video | H.264, JPG, PNG | Recording, AI vision, thumbnails, active participant selection |
| Screen Share | H.264, JPG, PNG | Content capture, slide extraction |
| Transcript | JSON text | Meeting notes, search, compliance |
| Chat | JSON text | Archive, sentiment analysis |
- Transcript language is controlled with src_language and enable_lid. Default behavior is LID (language identification) enabled. Set enable_lid: false to force a fixed language.
- Single-participant video is available when data_opt is set to VIDEO_SINGLE_INDIVIDUAL_STREAM.
- To close a stream gracefully, send STREAM_CLOSE_REQ over the signaling socket and wait for STREAM_CLOSE_RESP.

| Approach | Best For | Complexity |
|---|---|---|
| SDK (@zoom/rtms) | Most use cases | Low - handles WebSocket complexity |
| Manual WebSocket | Custom protocols, other languages | High - full protocol implementation |
Need RTMS access? Post in Zoom Developer Forum requesting RTMS access with your use case.
import rtms from "@zoom/rtms";

// RTMS start events across products (meetings, webinars, Video SDK)
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Handle webhook events
rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();

  client.onAudioData((data, timestamp, metadata) => {
    console.log(`Audio from ${metadata.userName}: ${data.length} bytes`);
  });

  client.onTranscriptData((data, timestamp, metadata) => {
    const text = data.toString('utf8');
    console.log(`${metadata.userName}: ${text}`);
  });

  client.onJoinConfirm((reason) => {
    console.log(`Joined session: ${reason}`);
  });

  // SDK handles all WebSocket connections automatically
  // Accepts both meeting_uuid and session_id transparently
  client.join(payload);
});
For full control or non-SDK languages, implement the two-phase WebSocket protocol:
const express = require('express');
const WebSocket = require('ws');
const crypto = require('crypto');

// Credentials come from the environment (ZOOM_CLIENT_ID / ZOOM_CLIENT_SECRET)
const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;

const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

const app = express();
app.use(express.json()); // parse JSON webhook bodies into req.body

// 1. Generate signature
// For meetings/webinars: uses meeting_uuid. For Video SDK: uses session_id.
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// 2. Handle webhook
app.post('/webhook', (req, res) => {
  res.status(200).send(); // CRITICAL: Respond immediately!

  const { event, payload } = req.body;
  if (RTMS_EVENTS.includes(event)) {
    connectToRTMS(payload);
  }
});

// 3. Connect to signaling WebSocket
function connectToRTMS(payload) {
  const { server_urls, rtms_stream_id } = payload;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = payload.meeting_uuid || payload.session_id;
  const signature = generateSignature(CLIENT_ID, idValue, rtms_stream_id, CLIENT_SECRET);

  const signalingWs = new WebSocket(server_urls);

  signalingWs.on('open', () => {
    signalingWs.send(JSON.stringify({
      msg_type: 1, // Handshake request
      protocol_version: 1,
      meeting_uuid: idValue,
      rtms_stream_id,
      signature,
      media_type: 9 // AUDIO(1) | TRANSCRIPT(8)
    }));
  });

  // ... handle responses, connect to media WebSocket
}

app.listen(3000); // port is an arbitrary choice for this example
See: Manual WebSocket Guide for complete implementation.
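Because only one connection per stream is allowed and a retried webhook can trigger a second `connectToRTMS` call, a manual implementation usually needs some record of which streams it has already claimed. The following is a minimal in-memory sketch under that assumption; `ActiveStreams` is a hypothetical helper, not part of any Zoom library, and a multi-process deployment would need a shared store instead.

```javascript
// Hypothetical in-memory registry of streams this process is already handling,
// keyed by rtms_stream_id. Not part of the RTMS SDK.
class ActiveStreams {
  constructor() {
    this.streams = new Map();
  }

  // Returns true if the caller should open a connection,
  // false if this stream was already claimed (for example, a retried webhook).
  claim(streamId, info = {}) {
    if (this.streams.has(streamId)) return false;
    this.streams.set(streamId, { ...info, claimedAt: Date.now() });
    return true;
  }

  // Forget a stream once it has stopped; returns true if it was being tracked.
  release(streamId) {
    return this.streams.delete(streamId);
  }

  get size() {
    return this.streams.size;
  }
}
```

In the webhook handler above, the call to `connectToRTMS(payload)` could be guarded with `activeStreams.claim(payload.rtms_stream_id)`, and the matching `*.rtms_stopped` event could call `release` with the same ID.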
Combine types with bitwise OR:
| Type | Value | Description |
|---|---|---|
| Audio | 1 | PCM audio samples |
| Video | 2 | H.264/JPG video frames |
| Screen Share | 4 | Separate from video! |
| Transcript | 8 | Real-time speech-to-text |
| Chat | 16 | In-meeting chat messages |
| All | 32 | All media types |
Example: Audio + Transcript = 1 | 8 = 9
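The bitmask arithmetic can be sketched with two small helpers. The numeric values come from the table above; the names `MediaType`, `combineMediaTypes`, and `includesMediaType` are illustrative only and are not SDK exports.

```javascript
// Values copied from the media type table; names are illustrative.
const MediaType = Object.freeze({
  AUDIO: 1,
  VIDEO: 2,
  SCREEN_SHARE: 4,
  TRANSCRIPT: 8,
  CHAT: 16,
  ALL: 32, // its own flag in the table, not the OR of the other five (which would be 31)
});

// Combine named media types into a single media_type value with bitwise OR.
function combineMediaTypes(...names) {
  return names.reduce((mask, name) => {
    if (!Object.prototype.hasOwnProperty.call(MediaType, name)) {
      throw new Error(`Unknown media type: ${name}`);
    }
    return mask | MediaType[name];
  }, 0);
}

// Check whether a specific flag is set in a mask with bitwise AND.
function includesMediaType(mask, name) {
  return (mask & MediaType[name]) !== 0;
}
```

For example, `combineMediaTypes('AUDIO', 'TRANSCRIPT')` yields the same `9` used in the handshake request earlier. Since video and screen share are separate flags, requesting both means including both names.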
| Issue | Solution |
|---|---|
| Only 1 connection allowed | New connections kick out existing ones. Track active sessions! |
| Respond 200 immediately | If the webhook response is delayed, Zoom retries the event, which creates duplicate connections |
| Heartbeat mandatory | Respond to msg_type 12 with msg_type 13, or connection dies |
| Reconnection is YOUR job | RTMS doesn't auto-reconnect. Media keep-alive tolerance is now about 65s; signaling remains around 60s |
| Transcript language drift | Use src_language plus enable_lid: false when you want fixed-language transcription instead of automatic language switching |
| Audio params must match the support matrix | Do not mix arbitrary content_type, codec, sample_rate, channel, and send_rate. The media server rejects unsupported combinations |
| Single participant video only | VIDEO_SINGLE_INDIVIDUAL_STREAM supports one participant at a time. A new VIDEO_SUBSCRIPTION_REQ overrides the previous selection |
| Graceful close is explicit now | Use STREAM_CLOSE_REQ / STREAM_CLOSE_RESP when your backend wants to terminate the stream cleanly |
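The heartbeat row above can be sketched as a pure function that turns an incoming frame into the reply to send. The message type numbers 12 and 13 come from the table; echoing the request's `timestamp` field back in the reply is an assumption about the message shape and should be checked against the protocol reference. `buildKeepAliveReply` is a hypothetical name.

```javascript
const KEEP_ALIVE_REQ = 12;  // from the table above
const KEEP_ALIVE_RESP = 13; // from the table above

// Returns the JSON string to send back for a keep-alive request,
// or null when the frame is not JSON or is some other message type.
function buildKeepAliveReply(rawMessage) {
  let msg;
  try {
    msg = JSON.parse(rawMessage.toString());
  } catch {
    return null; // not a JSON frame
  }
  if (!msg || msg.msg_type !== KEEP_ALIVE_REQ) return null;
  // Assumption: the reply echoes the timestamp carried by the request.
  return JSON.stringify({ msg_type: KEEP_ALIVE_RESP, timestamp: msg.timestamp });
}
```

With the `ws` package, a `message` listener on each socket could call this function and `send` the result whenever it is not null, before handing the frame to the rest of the message handling.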
# Required - Authentication
ZM_RTMS_CLIENT=your_client_id # Zoom OAuth Client ID
ZM_RTMS_SECRET=your_client_secret # Zoom OAuth Client Secret
# Optional - Webhook server
ZM_RTMS_PORT=8080 # Default: 8080
ZM_RTMS_PATH=/webhook # Default: /
# Optional - Logging
ZM_RTMS_LOG_LEVEL=info # error, warn, info, debug, trace
ZM_RTMS_LOG_FORMAT=progressive # progressive or json
ZM_RTMS_LOG_ENABLED=true
ZOOM_CLIENT_ID=your_client_id
ZOOM_CLIENT_SECRET=your_client_secret
ZOOM_SECRET_TOKEN=your_webhook_token # For webhook validation
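The secret token is what a manual webhook endpoint uses to prove a request came from Zoom. The sketch below follows the general Zoom webhook scheme as I understand it (an `endpoint.url_validation` challenge answered with an HMAC of `plainToken`, and an `x-zm-signature` header computed over `v0:{timestamp}:{body}`); treat those details as assumptions to confirm against the webhook documentation. The function names are hypothetical.

```javascript
const crypto = require('crypto');

// HMAC-SHA256 as lowercase hex.
function hmacHex(secret, message) {
  return crypto.createHmac('sha256', secret).update(message).digest('hex');
}

// Response body for an endpoint.url_validation challenge (assumed shape).
function buildUrlValidationResponse(plainToken, secretToken) {
  return { plainToken, encryptedToken: hmacHex(secretToken, plainToken) };
}

// Compares the x-zm-signature header with a signature computed over the raw body.
// `timestamp` is assumed to come from the x-zm-request-timestamp header.
function isValidZoomSignature({ signature, timestamp, rawBody }, secretToken) {
  const expected = Buffer.from(`v0=${hmacHex(secretToken, `v0:${timestamp}:${rawBody}`)}`);
  const received = Buffer.from(String(signature || ''));
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

The signature check needs the request body exactly as it was sent, so a handler that relies on it should capture the raw body rather than re-serializing a parsed object.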
Event subscriptions:
- meeting.rtms_started
- meeting.rtms_stopped
- webinar.rtms_started (if using webinars)
- webinar.rtms_stopped (if using webinars)

Scopes:
- meeting:read:meeting_audio
- meeting:read:meeting_video
- meeting:read:meeting_transcript
- meeting:read:meeting_chat
- webinar:read:webinar_audio (if using webinars)
- webinar:read:webinar_video (if using webinars)
- webinar:read:webinar_transcript (if using webinars)
- webinar:read:webinar_chat (if using webinars)

Video SDK event subscriptions:
- session.rtms_started
- session.rtms_stopped

| Repository | Description |
|---|---|
| rtms-samples | RTMSManager, boilerplates, AI samples |
| rtms-quickstart-js | JavaScript SDK quickstart |
| rtms-quickstart-py | Python SDK quickstart |
| rtms-sdk-cpp | C++ SDK |
| zoom-rtms | Main SDK repository |
| Sample | Description |
|---|---|
| rtms-meeting-assistant-starter-kit | AI meeting assistant with summaries |
| arlo-meeting-assistant | Production meeting assistant with DB |
| videosdk-rtms-transcribe-audio | Whisper transcription |
Need help? Start with the Integrated Index section below for complete navigation.
This section was migrated from SKILL.md.
RTMS provides real-time access to live audio, video, transcript, chat, and screen share from Zoom meetings, webinars, and Video SDK sessions.
Treat RTMS as a backend service for receiving and processing media streams.
Do not model RTMS as a frontend-only SDK.
If you're new to RTMS, follow this order:
1. Run preflight checks first -> RUNBOOK.md
2. Understand the architecture -> concepts/connection-architecture.md
3. Choose your approach -> SDK or Manual
4. Understand the lifecycle -> concepts/lifecycle-flow.md
5. Configure media types -> references/media-types.md
6. Troubleshoot issues -> troubleshooting/common-issues.md
rtms/
├── SKILL.md                           # Main skill overview
├── SKILL.md                           # This file - navigation guide
│
├── concepts/                          # Core architectural patterns
│   ├── connection-architecture.md     # Two-phase WebSocket design
│   └── lifecycle-flow.md              # Webhook to streaming flow
│
├── examples/                          # Complete working code
│   ├── sdk-quickstart.md              # Using @zoom/rtms SDK
│   ├── manual-websocket.md            # Raw protocol implementation
│   ├── rtms-bot.md                    # Complete RTMS bot implementation
│   └── ai-integration.md              # Transcription and analysis
│
├── references/                        # Reference documentation
│   ├── media-types.md                 # Audio, video, transcript, chat, share
│   ├── data-types.md                  # All enums and constants
│   ├── connection.md                  # WebSocket protocol details
│   └── webhooks.md                    # Event subscription
│
└── troubleshooting/                   # Problem solving guides
    └── common-issues.md               # FAQ and solutions
- Meetings: webhook event meeting.rtms_started. Uses General App with OAuth.
- Webinars: webhook event webinar.rtms_started. Payload still uses meeting_uuid (NOT webinar_uuid).
- Video SDK: webhook event session.rtms_started. Payload uses session_id (NOT meeting_uuid).

concepts/connection-architecture.md
RTMS uses two separate WebSocket connections: a signaling connection and a media connection.
examples/sdk-quickstart.md vs examples/manual-websocket.md
| SDK | Manual |
|---|---|
| Handles WebSocket complexity | Full protocol control |
| Automatic reconnection | DIY reconnection |
| Less code | More code |
| Best for most use cases | Best for custom requirements |
troubleshooting/common-issues.md
Two-Phase WebSocket Design
Webhook Response Timing
Heartbeat is Mandatory
Signature Generation
- HMAC-SHA256(clientSecret, "clientId,meetingUuid,streamId")
- For Video SDK, use session_id in place of meetingUuid
- For webinars, use meeting_uuid (not webinar_uuid)

Media Types are Bitmasks
Screen Share is SEPARATE from Video
-> Media Types - Check configuration
-> Data Types
Based on Zoom RTMS SDK v1.x and official documentation as of 2026.
Happy coding!
Remember: Start with SDK Quickstart for the fastest path, or Manual WebSocket if you need full control.