Build LiveKit Agent backends in Python. Use this skill when creating voice AI agents, voice assistants, or any realtime AI application using LiveKit's Python Agents SDK (livekit-agents). Covers AgentSession, Agent class, function tools, STT/LLM/TTS models, turn detection, and multi-agent workflows.
npx claudepluginhub codestackr/livekit-skills
Slash command
/livekit-agents-py:skills/agents-py
Build voice AI agents with LiveKit's Python Agents SDK.
This skill works alongside the LiveKit MCP server, which provides direct access to the latest LiveKit documentation, code examples, and changelogs. Use these tools when you need up-to-date information that may have changed since this skill was created.
Available MCP tools:
docs_search - Search the LiveKit docs site
get_pages - Fetch specific documentation pages by path
get_changelog - Get recent releases and updates for LiveKit packages
code_search - Search LiveKit repositories for code examples
get_python_agent_example - Browse 100+ Python agent examples
When to use MCP tools:
When to use local references:
Use MCP tools and local references together for the best experience.
Consult these resources as needed:
uv add "livekit-agents[silero,turn-detector]~=1.3" \
"livekit-plugins-noise-cancellation~=0.2" \
"python-dotenv"
Use the LiveKit CLI to load your credentials into a .env.local file:
lk app env -w
Or manually create a .env.local file:
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
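Before starting an agent, it can help to confirm that the three variables above are actually set. The following is a minimal sketch using only the standard library; the helper name missing_livekit_env is hypothetical and not part of the LiveKit SDK.

```python
import os
from typing import Mapping, Optional

# The three variables listed in .env.local above.
REQUIRED_VARS = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL")


def missing_livekit_env(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of livekit-agents.
    """
    # Fall back to the process environment when no mapping is passed in.
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]
```

Calling missing_livekit_env() after load_dotenv(".env.local") returns an empty list when all three credentials are present, and the names of the missing ones otherwise.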
# Agent using a separate STT, LLM, and TTS pipeline.
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            Keep responses concise, 1-3 sentences. No markdown or emojis.""",
        )


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt="assemblyai/universal-streaming:en",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
# Agent using a single realtime model (OpenAI Realtime) instead of STT + LLM + TTS.
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import openai, noise_cancellation

load_dotenv(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice AI assistant."
        )


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(voice="coral")
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
Define agent behavior by subclassing Agent:
from livekit.agents import Agent, function_tool


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Your system prompt here",
        )

    async def on_enter(self) -> None:
        """Called when agent becomes active."""
        await self.session.generate_reply(
            instructions="Greet the user"
        )

    async def on_exit(self) -> None:
        """Called before agent hands off to another agent."""
        pass

    @function_tool()
    async def my_tool(self, param: str) -> str:
        """Tool description for the LLM."""
        return f"Result: {param}"
The session orchestrates the voice pipeline:
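from livekit.agents import AgentSession
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    stt="assemblyai/universal-streaming:en",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-3:voice_id",  # replace voice_id with a real voice ID
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)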
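The stt, llm, and tts values above are plain strings that appear to follow a provider/model pattern with an optional suffix after a colon (a language code for STT, a voice ID for TTS). As a rough illustration of that shape, here is a small parser; parse_model_string and ModelRef are hypothetical names, and the format is inferred from the examples in this document rather than from a published grammar.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    provider: str
    model: str
    variant: Optional[str]  # language code or voice ID, if present


def parse_model_string(value: str) -> ModelRef:
    """Split "provider/model[:variant]" into its parts.

    Hypothetical helper; the format is an assumption based on the
    example strings, not something livekit-agents exposes.
    """
    provider, sep, rest = value.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    # Everything after the first colon is treated as the variant.
    model, _, variant = rest.partition(":")
    if not model:
        raise ValueError(f"missing model name in {value!r}")
    return ModelRef(provider, model, variant or None)
```

For example, parse_model_string("deepgram/nova-3:en") yields provider "deepgram", model "nova-3", and variant "en", while a string without a colon such as "openai/gpt-4.1-mini" has a variant of None.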
Key methods:
session.start(room, agent) - Start the session
session.say(text) - Speak text directly
session.generate_reply(instructions) - Generate LLM response
session.interrupt() - Stop current speech
session.update_agent(new_agent) - Switch to different agent
Use the @function_tool decorator:
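from livekit.agents import Agent, RunContext, function_tool


# WeatherAgent is an illustrative name; the tool is a method on an Agent subclass.
class WeatherAgent(Agent):
    @function_tool()
    async def get_weather(self, context: RunContext, location: str) -> str:
        """Get the current weather for a location."""
        # Placeholder result; a real tool would query a weather service.
        return f"Weather in {location}: Sunny, 72°F"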
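The docstring in the tool above is labeled as the description for the LLM, and the type-hinted parameters describe the arguments. The sketch below uses only the standard library to show what can be read from such a function. describe_tool is a hypothetical helper, the simplified get_weather here drops self and context so it runs without the SDK, and none of this is meant to reflect how livekit-agents builds its tool schema internally.

```python
import inspect
from typing import get_type_hints


async def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"


def describe_tool(func) -> dict:
    """Collect the name, description, and parameter types of a function.

    Hypothetical helper for illustration only.
    """
    hints = get_type_hints(func)
    params = {
        # Use the type's name when it has one, else its string form.
        name: getattr(hints[name], "__name__", str(hints[name]))
        for name in inspect.signature(func).parameters
        if name in hints
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }
```

A tool with a missing docstring or untyped parameters would produce an empty description or an incomplete parameter map here, which is a reasonable argument for always writing both.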
# Development mode with auto-reload
uv run agent.py dev
# Console mode (local testing)
uv run agent.py console
# Production mode
uv run agent.py start
# Download required model files
uv run agent.py download-files
Use model strings for simple configuration without separate provider API keys:
STT (Speech-to-Text):
"assemblyai/universal-streaming:en" - AssemblyAI streaming
"deepgram/nova-3:en" - Deepgram Nova
"cartesia/ink" - Cartesia STT
LLM (Large Language Model):
"openai/gpt-4.1-mini" - GPT-4.1 mini (recommended)
"openai/gpt-4.1" - GPT-4.1
"openai/gpt-5" - GPT-5
"gemini/gemini-3-flash" - Gemini 3 Flash
"gemini/gemini-2.5-flash" - Gemini 2.5 Flash
TTS (Text-to-Speech):
"cartesia/sonic-3:{voice_id}" - Cartesia Sonic 3
"elevenlabs/eleven_turbo_v2_5:{voice_id}" - ElevenLabs
"deepgram/aura:{voice}" - Deepgram Aura
Run lk app env -w to load LiveKit Cloud credentials into your environment.