Use this skill when the user asks about Amazon Nova Sonic, Nova 2 Sonic, AWS speech-to-speech, Bedrock Realtime audio, Bedrock bidirectional streaming, InvokeModelWithBidirectionalStream, or wants to build voice apps with Amazon's Nova Sonic model. Provides complete API reference for the @aws-sdk/client-bedrock-runtime bidirectional streaming, audio configuration, voice selection, and tool calling.
From computernpx claudepluginhub chendren/computerThis skill uses the workspace's default tool permissions.
Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.
Migrates code, prompts, and API calls from Claude Sonnet 4.0/4.5 or Opus 4.1 to Opus 4.5, updating model strings on Anthropic, AWS, GCP, Azure platforms.
Details PluginEval's skill quality evaluation: 3 layers (static, LLM judge), 10 dimensions, rubrics, formulas, anti-patterns, badges. Use to interpret scores, improve triggering, calibrate thresholds.
Real-time speech-to-speech via Amazon Bedrock's bidirectional HTTP/2 streaming API.
| Model | Model ID | Notes |
|---|---|---|
| Nova Sonic v1 | amazon.nova-sonic-v1:0 | Original, 11 voices |
| Nova 2 Sonic | amazon.nova-2-sonic-v1:0 | Latest, 16+ voices, polyglot |
import {
BedrockRuntimeClient,
InvokeModelWithBidirectionalStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";
const client = new BedrockRuntimeClient({ region: "us-east-1" });
Package: @aws-sdk/client-bedrock-runtime v3.787+
Auth: AWS credentials (IAM, env vars, profiles — NOT API keys)
Nova Sonic uses HTTP/2 bidirectional streaming — NOT WebSocket.
The request body is an AsyncIterable<chunk> that yields JSON events.
The response is also a stream of events.
Client ──AsyncIterable──→ Bedrock (HTTP/2) ──Stream──→ Client
│ sessionStart │ contentStart
│ audioInputEvent (base64 PCM) │ audioOutput (base64 PCM)
│ textInput │ textOutput
│ toolResult │ toolUse
│ sessionEnd │ completionEnd
async function* createRequestStream() {
// First event: configure the session
yield {
chunk: {
bytes: Buffer.from(JSON.stringify({
event: {
sessionStart: {
inferenceConfiguration: {
maxTokens: 1024,
topP: 0.9,
temperature: 0.7,
},
audioInputConfiguration: {
mediaType: "audio/lpcm",
sampleRateHertz: 16000,
sampleSizeBits: 16,
channelCount: 1,
audioType: "SPEECH",
encoding: "base64",
},
audioOutputConfiguration: {
mediaType: "audio/lpcm",
sampleRateHertz: 24000,
sampleSizeBits: 16,
channelCount: 1,
voiceId: "matthew",
encoding: "base64",
audioType: "SPEECH",
},
toolUse: {
tools: [/* tool definitions */],
toolChoice: { auto: {} },
},
systemPrompt: [{ text: "You are a helpful assistant." }],
},
},
})),
},
};
}
// Stream audio as base64-encoded PCM chunks (16kHz, Int16 LE, mono)
yield {
chunk: {
bytes: Buffer.from(JSON.stringify({
event: {
audioInputEvent: {
audioChunk: pcmBuffer.toString('base64'),
},
},
})),
},
};
yield {
chunk: {
bytes: Buffer.from(JSON.stringify({
event: {
textInput: {
text: "Hello, what time is it?",
},
},
})),
},
};
yield {
chunk: {
bytes: Buffer.from(JSON.stringify({
event: {
toolResult: {
toolUseId: "tool-call-id",
content: [{ text: JSON.stringify(result) }],
status: "success",
},
},
})),
},
};
yield {
chunk: {
bytes: Buffer.from(JSON.stringify({
event: { sessionEnd: {} },
})),
},
};
const command = new InvokeModelWithBidirectionalStreamCommand({
modelId: "amazon.nova-2-sonic-v1:0",
body: createRequestStream(), // AsyncIterable
});
const response = await client.send(command);
for await (const event of response.body) {
if (event.chunk?.bytes) {
const msg = JSON.parse(Buffer.from(event.chunk.bytes).toString());
if (msg.event?.audioOutput) {
// Base64-encoded PCM audio chunk (24kHz, Int16 LE, mono)
const pcm = Buffer.from(msg.event.audioOutput.audioChunk, 'base64');
// Play or buffer
}
if (msg.event?.textOutput) {
console.log('Text:', msg.event.textOutput.text);
}
if (msg.event?.toolUse) {
// Model wants to call a tool
const { toolName, toolUseId, content } = msg.event.toolUse;
// Execute tool, then yield toolResult event
}
if (msg.event?.contentStart) {
// New content block starting (role info)
}
if (msg.event?.contentEnd) {
// Content block finished
}
if (msg.event?.completionEnd) {
// Full response complete
}
}
}
| Language | Feminine | Masculine |
|---|---|---|
| English (US) | tiffany | matthew |
| English (GB) | amy | — |
| French | ambre | florian |
| Italian | beatrice | lorenzo |
| German | greta | lennart |
| Spanish | lupe | carlos |
All v1 voices plus additional voices across 8 languages. Polyglot support: Tiffany can speak all languages in a single conversation.
Default recommendation: tiffany (polyglot, versatile) or matthew (clear, authoritative).
| Direction | Format | Sample Rate | Encoding |
|---|---|---|---|
| Input | PCM 16-bit LE mono | 16 kHz | base64 audio/lpcm |
| Output | PCM 16-bit LE mono | 24 kHz | base64 audio/lpcm |
Note: Same rates as Gemini (16kHz in, 24kHz out). Different from OpenAI (24kHz both).
{
tools: [{
toolSpec: {
name: "get_time",
description: "Get the current time and date.",
inputSchema: {
json: {
type: "object",
properties: {},
},
},
},
}],
toolChoice: { auto: {} }, // or { any: {} } or { tool: { name: "..." } }
}
| Limit | Value |
|---|---|
| Max session duration | 8 minutes |
| Context window | 300K tokens |
| Audio chunk size | 1024 samples |
| Continuation | Reconnect with chat history |
| Aspect | Nova Sonic | Gemini Live | OpenAI Realtime |
|---|---|---|---|
| Protocol | HTTP/2 bidi stream | WebSocket | WebSocket |
| SDK | @aws-sdk/client-bedrock-runtime | @google/genai | openai/realtime/ws |
| Auth | AWS IAM credentials | API key | API key |
| Audio in rate | 16kHz | 16kHz | 24kHz |
| Audio out rate | 24kHz | 24kHz | 24kHz |
| Body format | AsyncIterable of JSON events | SDK methods | JSON messages |
| VAD | Built-in | Explicit signals | Built-in semantic |
| Session limit | 8 min | 15 min | 30 min |
| Tool calling | toolUse/toolResult events | sendToolResponse() | conversation.item.create |
| Voices | 16 (polyglot) | 30 | 13 |