Develops real-time bidirectional audio/video/text streaming apps with Gemini Live API via WebSockets. Handles VAD, native audio, function calling, session management, ephemeral tokens. Covers Python/JS/TS SDKs.
```
npx claudepluginhub joshuarweaver/cascade-ai-ml-engineering --plugin google-gemini-gemini-skills-1
```

This skill uses the workspace's default tool permissions.
The Live API enables **low-latency, real-time voice and video interactions** with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
Key capabilities:

- Native audio output with thinking (via `thinkingLevel`)
- Voice activity detection (VAD)
- Function calling
- Session management
- Ephemeral tokens

> [!NOTE]
> The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.
Models:

- `gemini-3.1-flash-live-preview` — Optimized for low-latency, real-time dialogue. Native audio output, thinking (via `thinkingLevel`). 128k context window. This is the recommended model for all Live API use cases.

> [!WARNING]
> The following Live API models are deprecated and will be shut down. Migrate to `gemini-3.1-flash-live-preview`.

- `gemini-2.5-flash-native-audio-preview-12-2025` — Migrate to `gemini-3.1-flash-live-preview`.
- `gemini-live-2.5-flash-preview` — Released June 17, 2025. Shutdown: December 9, 2025.
- `gemini-2.0-flash-live-001` — Released April 9, 2025. Shutdown: December 9, 2025.
SDKs:

- Python: `google-genai` — `pip install google-genai`
- JavaScript/TypeScript: `@google/genai` — `npm install @google/genai`

> [!WARNING]
> The legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Use the new SDKs above.
To streamline real-time audio/video app development, use a third-party partner integration that supports the Gemini Live API over WebRTC or WebSockets.
Audio input format: `audio/pcm;rate=16000`.

> [!IMPORTANT]
> Use `send_realtime_input` / `sendRealtimeInput` for all real-time user input (audio, video, and text). `send_client_content` / `sendClientContent` is only supported for seeding initial context history (requires setting `initial_history_in_client_content` in `history_config`). Do not use it to send new user messages during the conversation.
> [!WARNING]
> Do not use `media` in `sendRealtimeInput`. Use the specific keys: `audio` for audio data, `video` for images/video frames, and `text` for text input.
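Where seeding initial history is needed, the following is a minimal Python sketch, assuming an already-open `session` (created as shown below) and the `history_config` / `initial_history_in_client_content` setting described in the note above; the exact config shape is not shown here and should be checked against the docs.

```python
from google.genai import types

# Hypothetical sketch: seed prior turns once at session start, then switch to
# send_realtime_input for all live user input. Assumes the connect config also
# enables initial_history_in_client_content via history_config (see note above).
async def seed_history(session):
    await session.send_client_content(
        turns=[
            types.Content(role="user", parts=[types.Part(text="My name is Ada.")]),
            types.Content(role="model", parts=[types.Part(text="Nice to meet you, Ada!")]),
        ],
        turn_complete=True,
    )
```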
```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
```
```javascript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });
```
```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    system_instruction=types.Content(
        parts=[types.Part(text="You are a helpful assistant.")]
    ),
)

async with client.aio.live.connect(
    model="gemini-3.1-flash-live-preview", config=config
) as session:
    pass  # Session is active
```
```javascript
const session = await ai.live.connect({
  model: 'gemini-3.1-flash-live-preview',
  config: {
    responseModalities: ['audio'],
    systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] }
  },
  callbacks: {
    onopen: () => console.log('Connected'),
    onmessage: (response) => console.log('Message:', response),
    onerror: (error) => console.error('Error:', error),
    onclose: () => console.log('Closed')
  }
});
```
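Function calling (listed under key capabilities) is configured through the same connect config. A hedged Python sketch, assuming the standard `google-genai` tool-declaration types; verify the Live API's exact tool-call flow against the docs.

```python
from google.genai import types

# Hypothetical example tool: the name, schema, and description are illustrative.
get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    tools=[types.Tool(function_declarations=[get_weather])],
)
```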
```python
await session.send_realtime_input(text="Hello, how are you?")
```
```javascript
session.sendRealtimeInput({ text: 'Hello, how are you?' });
```
```python
# chunk: raw PCM audio bytes sampled at 16 kHz
await session.send_realtime_input(
    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
)
```
```javascript
// chunk: Buffer of raw PCM audio sampled at 16 kHz
session.sendRealtimeInput({
  audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' }
});
```
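In the snippets above, `chunk` is left undefined. As a rough sketch of how such chunks might be produced, here is a hypothetical helper that streams a prerecorded raw PCM file (16 kHz, 16-bit mono assumed) in roughly real-time-sized pieces.

```python
import asyncio
from google.genai import types

# Hypothetical helper: 3200 bytes of 16-bit mono PCM at 16 kHz is about 100 ms
# of audio, so sleep ~100 ms between chunks to approximate real-time pacing.
async def stream_pcm_file(session, path, chunk_bytes=3200):
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            await session.send_realtime_input(
                audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
            )
            await asyncio.sleep(0.1)
```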
```python
# frame: raw JPEG-encoded bytes
await session.send_realtime_input(
    video=types.Blob(data=frame, mime_type="image/jpeg")
)
```
```javascript
// frame: Buffer of JPEG-encoded image bytes
session.sendRealtimeInput({
  video: { data: frame.toString('base64'), mimeType: 'image/jpeg' }
});
```
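Similarly, `frame` is assumed to be JPEG bytes. A hypothetical sketch using OpenCV (an assumption, not part of the SDK) to capture and send roughly one webcam frame per second:

```python
import asyncio

import cv2  # assumption: OpenCV used only for illustration
from google.genai import types

# Hypothetical helper: grab frames from the default camera, JPEG-encode them,
# and send about one frame per second alongside the audio stream.
async def stream_camera(session, fps=1.0):
    cap = cv2.VideoCapture(0)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            encoded, jpeg = cv2.imencode(".jpg", frame)
            if encoded:
                await session.send_realtime_input(
                    video=types.Blob(data=jpeg.tobytes(), mime_type="image/jpeg")
                )
            await asyncio.sleep(1.0 / fps)
    finally:
        cap.release()
```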
> [!IMPORTANT]
> A single server event can contain multiple content parts simultaneously (e.g., audio chunks and transcript). Always process all parts in each event to avoid missing content.
```python
async for response in session.receive():
    content = response.server_content
    if content:
        # Audio — process ALL parts in each event
        if content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    audio_data = part.inline_data.data
        # Transcription
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")
        # Interruption
        if content.interrupted is True:
            pass  # Stop playback, clear audio queue
```
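The `audio_data` bytes collected above are raw PCM. As a simple way to verify output, here is a hypothetical helper that writes accumulated chunks to a WAV file; the 24 kHz output rate is an assumption to confirm against the docs for your model.

```python
import wave

# Hypothetical helper: join raw PCM chunks and save them as a mono 16-bit WAV.
def write_wav(path, pcm_chunks, rate=24000):
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)    # mono
        wf.setsampwidth(2)    # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"".join(pcm_chunks))
```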
```javascript
// Inside the onmessage callback
const content = response.serverContent;
if (content?.modelTurn?.parts) {
  for (const part of content.modelTurn.parts) {
    if (part.inlineData) {
      const audioData = part.inlineData.data; // Base64 encoded
    }
  }
}
if (content?.inputTranscription) console.log('User:', content.inputTranscription.text);
if (content?.outputTranscription) console.log('Gemini:', content.outputTranscription.text);
if (content?.interrupted) { /* Stop playback, clear audio queue */ }
```
Response modalities: TEXT or AUDIO per session, not both.

When migrating from `gemini-2.5-flash-native-audio-preview-12-2025` to `gemini-3.1-flash-live-preview`:

- Update the model name from `gemini-2.5-flash-native-audio-preview-12-2025` to `gemini-3.1-flash-live-preview`.
- Use `thinkingLevel` (`minimal`, `low`, `medium`, `high`) instead of `thinkingBudget`. The default is `minimal` for lowest latency.
- `send_client_content` is only for seeding initial context history (set `initial_history_in_client_content` in `history_config`). Use `send_realtime_input` for text during conversation.
- Use `TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO` instead of `TURN_INCLUDES_ONLY_ACTIVITY`. If sending constant video frames, consider sending only during audio activity to reduce costs.
- Use `send_realtime_input` for all real-time user input (audio, video, text). Reserve `send_client_content` only for seeding initial context history.
- Send `audioStreamEnd` when the mic is paused to flush cached audio (see the sketch after this list).
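For the mic-pause item above, a minimal Python sketch; the `audio_stream_end` keyword follows the SDK's snake_case form of `audioStreamEnd` and should be verified against your SDK version.

```python
# Hypothetical sketch: signal that the audio stream is paused so cached audio
# is flushed; resume normal send_realtime_input(audio=...) when the mic restarts.
async def on_mic_paused(session):
    await session.send_realtime_input(audio_stream_end=True)
```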
If the `search_documentation` tool (from the Google MCP server) is available, use it as your only documentation source:

- Call `search_documentation` with your query.

> [!IMPORTANT]
> When MCP tools are present, never fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching.
If no MCP documentation tools are available, fetch from the official docs index:
llms.txt URL: https://ai.google.dev/gemini-api/docs/llms.txt
This index contains links to all documentation pages in .md.txt format. Use web fetch tools to:
- Fetch `llms.txt` to discover available documentation pages
- Fetch individual pages in `.md.txt` format (e.g., https://ai.google.dev/gemini-api/docs/live-session.md.txt)

> [!IMPORTANT]
> Those are not all the documentation pages. Use the `llms.txt` index to discover available documentation pages.
The Live API supports 70 languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Hindi, Arabic, Russian, and many more. Native audio models automatically detect and switch languages.