Skill

openrouter-stt

Transcribes speech to text via OpenRouter's API using curl or bash scripts. Encodes audio as base64 JSON input for models like google/chirp-3 or openai/whisper-1; returns transcript and usage in JSON.

Bash

OpenAI

ai-ml

cli-tools

npx claudepluginhub openrouterteam/skills --plugin openrouter


Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/openrouter:openrouter-stt



Stats

Language: TypeScript

Stars: 143

Forks: 14

Maintenance: Excellent

Last Commit: May 7, 2026


OpenRouter Speech-to-Text

Transcribe audio via POST /api/v1/audio/transcriptions using curl. Requires OPENROUTER_API_KEY (get one at https://openrouter.ai/keys). If unset, stop and ask.

This endpoint is not OpenAI-compatible. The body is JSON with base64 audio under input_audio: { data, format } — not multipart/form-data with a file field the way OpenAI's /v1/audio/transcriptions works. Do not point the OpenAI SDK at this endpoint; it will send the wrong shape. Use curl, fetch, or requests directly.

One call, JSON back

Both request and response are JSON. The response body carries:

text — the transcript.
usage — always includes cost. Providers additionally report either seconds of audio billed or a token breakdown (total_tokens, input_tokens, output_tokens), depending on how they price the request. Don't assume both are present.

Sample response (duration-priced provider, e.g. google/chirp-3):

{
  "text": "I used to rule the world.",
  "usage": {
    "seconds": 20,
    "cost": 0.005333
  }
}

Sample response (token-priced provider):

{
  "text": "Hello, this is a test of speech-to-text transcription.",
  "usage": {
    "total_tokens": 113,
    "input_tokens": 83,
    "output_tokens": 30,
    "cost": 0.000508
  }
}
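Because the billing fields differ by provider, code that reads usage should branch on what is actually present. A minimal sketch in Python, assuming the response has already been parsed into a dict; `summarize_usage` is a hypothetical helper name, not part of any OpenRouter SDK:

```python
from typing import Any, Mapping


def summarize_usage(usage: Mapping[str, Any]) -> str:
    """Describe a usage object without assuming which billing fields exist."""
    parts = []
    if "seconds" in usage:  # duration-priced providers
        parts.append(f"{usage['seconds']}s of audio")
    if "total_tokens" in usage:  # token-priced providers
        parts.append(
            f"{usage['total_tokens']} tokens "
            f"({usage.get('input_tokens', '?')} in / {usage.get('output_tokens', '?')} out)"
        )
    # cost is described as always present; .get() just keeps the helper tolerant.
    cost = usage.get("cost")
    if cost is not None:
        parts.append(f"cost {cost}")
    return ", ".join(parts) if parts else "no usage reported"


print(summarize_usage({"seconds": 20, "cost": 0.005333}))
print(summarize_usage(
    {"total_tokens": 113, "input_tokens": 83, "output_tokens": 30, "cost": 0.000508}
))
```

Both sample responses above go through the same helper, so callers never index a field that one pricing style omits.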

Drop-in workflow

#!/usr/bin/env bash
set -euo pipefail

# Fail early with a clear message if the key is missing.
: "${OPENROUTER_API_KEY:?Set OPENROUTER_API_KEY (https://openrouter.ai/keys)}"

MODEL="google/chirp-3"
FORMAT="wav"                          # wav, mp3, flac, m4a, ogg, webm, aac
AUDIO="audio.wav"
BODY=$(mktemp)
PAYLOAD=$(mktemp)
trap 'rm -f "$BODY" "$PAYLOAD"' EXIT   # clean up temp files on every exit path

# Stream the base64 into jq on stdin (-R raw input, -s slurp into one string).
# Passing it with --arg would put the whole payload on jq's argv (E2BIG / ARG_MAX).
base64 < "$AUDIO" | tr -d '\n' \
  | jq -Rs --arg model "$MODEL" --arg fmt "$FORMAT" \
      '{model: $model, input_audio: {data: ., format: $fmt}}' > "$PAYLOAD"

# --data-binary @file likewise keeps the payload off curl's argv.
http_code=$(curl -sS -X POST https://openrouter.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  --output "$BODY" \
  -w '%{http_code}' \
  --data-binary @"$PAYLOAD")

if [[ "$http_code" != "200" ]]; then
  echo "STT failed (HTTP $http_code):" >&2
  cat "$BODY" >&2
  exit 1
fi

jq -r '.text' "$BODY"

Discovering STT models

Filter the models endpoint by output modality to list transcription models.

curl -sS "https://openrouter.ai/api/v1/models?output_modalities=transcription" \
  | jq '.data[] | {id, name, pricing}'

Models are provider-namespaced — use the full slug (google/chirp-3, openai/whisper-1, openai/whisper-large-v3), not the short name.
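The same lookup can be done from Python. A minimal sketch using only the standard library; the function names are illustrative, and the response shape (a top-level `data` array whose entries carry `id`, `name`, and `pricing`) is inferred from the jq filter above rather than from a schema:

```python
import json
import urllib.request
from typing import Any, Dict, List

MODELS_URL = "https://openrouter.ai/api/v1/models?output_modalities=transcription"


def pick_fields(models_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mirror the jq filter: keep id, name, and pricing for each model entry."""
    return [
        {"id": m.get("id"), "name": m.get("name"), "pricing": m.get("pricing")}
        for m in models_response.get("data", [])
    ]


def fetch_transcription_models(url: str = MODELS_URL, timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Fetch the filtered model list. Performs a network call when invoked."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return pick_fields(json.load(resp))


# Offline demonstration with a hand-written stand-in for the API response.
sample = {"data": [{"id": "google/chirp-3", "name": "Chirp 3", "pricing": {}, "extra": 1}]}
print(pick_fields(sample))
```

Calling `fetch_transcription_models()` and reading each entry's `id` yields the full slugs to pass as `model`.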

Parameters

Field	Required	Notes
`model`	yes	Full model slug from `/api/v1/models?output_modalities=transcription`.
`input_audio.data`	yes	Base64-encoded raw audio bytes. Not a data URI — just the base64 payload, no `data:audio/...;base64,` prefix.
`input_audio.format`	yes	`wav`, `mp3`, `flac`, `m4a`, `ogg`, `webm`, or `aac`. Must match the actual bytes. Support varies by provider.
`language`	no	ISO-639-1 code (`en`, `ja`, `fr`). Auto-detected if omitted.
`temperature`	no	0–1. Lower is more deterministic.
`provider`	no	Provider passthrough — see below.
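The table maps directly onto a small request builder. The sketch below is one way to assemble the body with client-side checks; `build_payload` and `ALLOWED_FORMATS` are hypothetical names, and the validation only reflects the constraints listed in the table (the server remains the authority):

```python
import base64
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}


def build_payload(
    audio_bytes: bytes,
    model: str,
    fmt: str,
    language: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /api/v1/audio/transcriptions."""
    if "/" not in model:
        raise ValueError(f"use the full provider/model slug, got {model!r}")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    payload: Dict[str, Any] = {
        "model": model,
        "input_audio": {
            # Plain base64 of the raw bytes, with no data: URI prefix.
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": fmt,
        },
    }
    # Optional fields are omitted entirely when unset.
    if language is not None:
        payload["language"] = language
    if temperature is not None:
        payload["temperature"] = temperature
    return payload


print(build_payload(b"RIFF", "google/chirp-3", "wav", language="en"))
```

Leaving unset optional fields out of the body, rather than sending null, lets server-side defaults such as language auto-detection apply.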

Picking an audio format

wav / flac — uncompressed or lossless. Highest quality; largest uploads.
mp3 / m4a / aac — compressed. Smaller payloads, which matters because base64 inflates bytes by ~33% on top of whatever the file already weighs.
webm / ogg — typical for browser recordings (MediaRecorder).

The format field must match the actual container/codec of the bytes. A file saved as .wav that is actually mp3 will be rejected or mis-decoded. When in doubt, confirm with ffprobe <file>.
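When ffprobe is not available, the leading bytes of the file usually identify the container. The function below is a rough heuristic based on well-known magic numbers, not a substitute for a real probe; `sniff_audio_format` is a hypothetical helper, and it cannot tell WebM from other Matroska files or M4A from other MP4-family files:

```python
from typing import Optional


def sniff_audio_format(head: bytes) -> Optional[str]:
    """Guess the container from the first bytes; returns None if unrecognized."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"  # EBML header, shared with Matroska in general
    if head[4:8] == b"ftyp":
        return "m4a"  # ISO base media file (MP4 family)
    if head[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, which usually precedes MP3 frames
    if len(head) >= 2 and head[0] == 0xFF:
        layer_bits = head[1] & 0x06
        if (head[1] & 0xF0) == 0xF0 and layer_bits == 0:
            return "aac"  # ADTS sync word with layer bits 00
        if (head[1] & 0xE0) == 0xE0 and layer_bits != 0:
            return "mp3"  # MPEG audio frame sync
    return None


print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_audio_format(b"\xff\xfb\x90\x00"))
```

A typical use is to read the first 12 bytes of the file and compare the guess with the intended `format` value before uploading.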

Provider-specific options

Provider passthrough goes under provider.options.<slug> and is only forwarded when that provider handles the request. Example — Groq's prompt for vocabulary hinting:

{
  "model": "openai/whisper-large-v3",
  "input_audio": { "data": "UklGRiQA...", "format": "wav" },
  "provider": {
    "options": {
      "groq": {
        "prompt": "Expected vocabulary: OpenRouter, API, transcription"
      }
    }
  }
}

Options under any other slug are ignored. Check each provider's upstream docs for the passthrough keys it accepts.
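Passthrough options can be layered onto an existing request body programmatically. A small sketch, assuming the body is a plain dict; `with_provider_options` is a hypothetical helper that only builds the `provider.options.<slug>` nesting described above:

```python
import copy
from typing import Any, Dict


def with_provider_options(payload: Dict[str, Any], slug: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of payload with options merged under provider.options.<slug>."""
    out = copy.deepcopy(payload)  # leave the caller's dict untouched
    out.setdefault("provider", {}).setdefault("options", {}).setdefault(slug, {}).update(options)
    return out


base = {
    "model": "openai/whisper-large-v3",
    "input_audio": {"data": "UklGRiQA...", "format": "wav"},
}
body = with_provider_options(
    base, "groq", {"prompt": "Expected vocabulary: OpenRouter, API, transcription"}
)
print(body["provider"])
```

Calling the helper again with a different slug adds a sibling key under `options`, so one body can carry hints for more than one provider.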

TypeScript (fetch)

import fs from "fs";

const audio = await fs.promises.readFile("audio.wav");
const data = audio.toString("base64");

const res = await fetch("https://openrouter.ai/api/v1/audio/transcriptions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/chirp-3",
    input_audio: { data, format: "wav" },
  }),
});

if (!res.ok) {
  throw new Error(`STT failed (HTTP ${res.status}): ${await res.text()}`);
}

const result = await res.json();
console.log(result.text);

Python (requests)

import base64
import os
import requests

with open("audio.wav", "rb") as f:
    data = base64.b64encode(f.read()).decode("utf-8")

res = requests.post(
    "https://openrouter.ai/api/v1/audio/transcriptions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/chirp-3",
        "input_audio": {"data": data, "format": "wav"},
    },
    timeout=120,  # requests has no default timeout; adjust for long clips
)

if not res.ok:
    raise RuntimeError(f"STT failed (HTTP {res.status_code}): {res.text}")

print(res.json()["text"])

Troubleshooting

Garbled or empty text — the format field probably doesn't match the actual bytes, or the audio is silent/corrupted. Confirm with ffprobe audio.wav.

400 with "Invalid base64" or silent failure — data must be just base64, not a data URI (data:audio/wav;base64,...). Strip the prefix if you copied it from a browser FileReader.
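Stripping the prefix is a one-line string operation. A sketch of a defensive normalizer, assuming the value arrives as a string that is either bare base64 or a `data:` URI; `strip_data_uri` is a hypothetical name:

```python
def strip_data_uri(value: str) -> str:
    """Return the bare base64 payload, removing a data: URI prefix if present."""
    if not value.startswith("data:"):
        return value  # already bare base64 (or at least not a data URI)
    header, sep, payload = value.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("data URI is not base64-encoded")
    return payload


print(strip_data_uri("data:audio/wav;base64,UklGRg=="))
print(strip_data_uri("UklGRg=="))
```

Running every input through this before building the request makes browser-sourced and file-sourced audio follow the same path.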

400 with a ZodError — a required field is missing or the wrong type. The body looks like {"success":false,"error":{"name":"ZodError","message":"[...]"}} — the nested message JSON string names the bad path (commonly input_audio.data or input_audio.format).
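Since the useful detail sits inside a JSON string, it has to be decoded a second time. The sketch below assumes the nested message is an array of Zod-style issues, each with a `path` array; that inner shape is an assumption based on how Zod normally serializes errors, and `zod_error_paths` is a hypothetical helper:

```python
import json
from typing import Any, Dict, List


def zod_error_paths(body: Dict[str, Any]) -> List[str]:
    """Extract dotted field paths from a ZodError response body, if any."""
    error = body.get("error") or {}
    if error.get("name") != "ZodError":
        return []
    try:
        issues = json.loads(error.get("message", "[]"))  # second decode
    except (TypeError, json.JSONDecodeError):
        return []
    if not isinstance(issues, list):
        return []
    paths = []
    for issue in issues:
        path = issue.get("path") if isinstance(issue, dict) else None
        if isinstance(path, list):
            paths.append(".".join(str(p) for p in path))
    return paths


# Hand-written example body; the issue contents are illustrative.
example = {
    "success": False,
    "error": {
        "name": "ZodError",
        "message": json.dumps(
            [{"code": "invalid_type", "path": ["input_audio", "data"], "message": "Required"}]
        ),
    },
}
print(zod_error_paths(example))
```

If the inner message does not parse or has another shape, the helper returns an empty list and the raw body should be logged instead.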

413 / request too large — base64 inflates bytes by ~33%, so a large raw file becomes an even larger JSON payload. Use a smaller source file (compressed format, lower sample rate, or trimmed clip).
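The inflation is easy to compute up front: base64 emits 4 output characters for every 3 input bytes, rounded up to a whole group. A short worked sketch (the function name is illustrative, and the small JSON envelope around the string is not counted):

```python
import base64


def base64_length(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


# Cross-check the formula against the real encoder on a small input.
assert base64_length(10) == len(base64.b64encode(b"\x00" * 10))

raw = 3_000_000  # a 3 MB source file
encoded = base64_length(raw)
print(encoded, f"{encoded / raw - 1:.0%} larger")
```

Checking this number before upload makes it possible to re-encode or trim the clip locally instead of discovering the limit through a 413.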

Model not found — use the full slug from /api/v1/models?output_modalities=transcription (google/chirp-3, not chirp-3).
