Skill

venice-audio-transcription

Transcribe audio files to text via POST /audio/transcriptions. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats (wav/flac/m4a/aac/mp4/mp3/ogg/webm), response formats (json/text), timestamps, and language hints. OpenAI-compatible multipart.

npx claudepluginhub veniceai/skills

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/venice:venice-audio-transcription

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

`POST /api/v1/audio/transcriptions` takes an audio file and returns text. It's OpenAI-compatible with `multipart/form-data` — the OpenAI SDK's `audio.transcriptions.create()` works unchanged.

SKILL.md

107 lines · ~1.2k tokens

Similar Skills

lai-transcribe

Transcribe audio/video to timestamped captions with Gemini (100+ languages) or local Parakeet / SenseVoice models. Trigger on "transcribe", "speech to text", "转录", "语音转文字", "generate captions from audio", or when the user provides an audio/video file with no text. If the YouTube video already has captions, prefer `/lai-youtube`.

2 tools

lattifai-skills

deepgram-performance-tuning

2.2k

Optimizes Deepgram transcription via ffmpeg preprocessing, model selection, streaming, parallel processing, and Redis caching for lower latency and higher throughput.

1 file5 tools

deepgram-pack

openrouter-stt

143

Transcribes speech to text via OpenRouter's API using curl or bash scripts. Encodes audio as base64 JSON input for models like google/chirp-3 or openai/whisper-1; returns transcript and usage in JSON.

1 file

openrouter

Stats

LanguagePython

Stars73

Forks8

MaintenanceExcellent

Last CommitApr 21, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

Venice Transcription (`/audio/transcriptions`)

POST /api/v1/audio/transcriptions takes an audio file and returns text. It's OpenAI-compatible with multipart/form-data — the OpenAI SDK's audio.transcriptions.create() works unchanged.

Use when

You need STT (speech-to-text) for voice notes, meetings, podcasts, short audio.
You need timestamps for subtitles / chapters.
You want to pick between fast local-style models (Parakeet) and large multilingual ones (Whisper, Wizper, Scribe).

For long video / YouTube transcription, see venice-video's /video/transcriptions (takes a public video URL directly).

Minimal request

curl https://api.venice.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -F "file=@./meeting.m4a" \
  -F "model=nvidia/parakeet-tdt-0.6b-v3" \
  -F "response_format=json" \
  -F "timestamps=false"

{ "text": "Alright everyone, let's kick off the meeting..." }

With timestamps=true, json format also returns segment/word timings (schema is model-specific).

Request (`multipart/form-data`)

Field	Type	Default	Notes
`file`	binary	—	Required. Audio file. Supported: `wav`, `wave`, `flac`, `m4a`, `aac`, `mp4`, `mp3`, `ogg`, `webm`. Base64 is not accepted — upload as a real file.
`model`	enum	`nvidia/parakeet-tdt-0.6b-v3`	See models below.
`response_format`	`json` / `text`	`json`	`text` returns `text/plain` body.
`timestamps`	bool	`false`	Include segment/word timestamps (JSON only).
`language`	string	—	ISO 639-1 hint (e.g. `en`, `ja`). Only Whisper-family models honor it; others auto-detect.

Models

Model ID	Notes
`nvidia/parakeet-tdt-0.6b-v3`	Default. Fast, English-first, great for real-time-ish flows.
`openai/whisper-large-v3`	Large multilingual, honors `language` hint.
`fal-ai/wizper`	Whisper variant, competitive on quality/latency tradeoff.
`elevenlabs/scribe-v2`	ElevenLabs Scribe, strong on noisy audio.
`stt-xai-v1`	xAI Speech-to-Text.

GET /models?type=asr returns the current catalog. ASR pricing is pricing.per_audio_second.usd — cost scales with audio duration.

OpenAI SDK

import OpenAI from 'openai'
import fs from 'node:fs'

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1',
})

const out = await client.audio.transcriptions.create({
  file: fs.createReadStream('meeting.m4a'),
  model: 'openai/whisper-large-v3',
  response_format: 'json',
  language: 'en',
  // @ts-expect-error — Venice-specific extra, passes through multipart
  timestamps: true,
})

console.log(out.text)

Batch / long files

Venice doesn't expose native chunking. For files > ~30 min, split client-side on silence with ffmpeg or pydub, transcribe each chunk, then concatenate with offset timestamps.

ffmpeg -i long.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3

Errors

Code	Meaning
`400`	Bad params, unsupported audio format, empty file, or file larger than 25 MB (this endpoint returns `400` with `"Maximum size is 25MB"`, not `413`).
`401`	Auth / Pro-only.
`402`	Insufficient balance.
`415`	Wrong `Content-Type` — must be `multipart/form-data`.
`422`	Validation / upstream ASR error (e.g. zero-length audio, upstream provider 422). Not a "content policy" code on this path.
`429`	Rate limited.
`500` / `503`	Transient; retry with jitter.

Gotchas

file must be uploaded as a real multipart file part. JSON + base64 is not supported here.
Timestamps are only surfaced in the JSON response shapes (json, verbose_json, srt, vtt). With response_format: text the handler returns a plain text/plain body containing just the transcript — you'll lose any timestamp data, so pick verbose_json / srt / vtt when you need timings.
language is Whisper-specific. Parakeet / Scribe ignore it and auto-detect.
Peak concurrency limits apply — on 429, back off; big batches should throttle to ~5 parallel requests.
Content-policy rejection on the transcript is returned as 422 with an error string; it does not surface suggested_prompt on this path.

venice-audio-transcription

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

venice-audio-transcription

Popularity

Invocation

Context Preview

SKILL.md

Venice Transcription (/audio/transcriptions)

Use when

Minimal request

Request (multipart/form-data)

Models

OpenAI SDK

Batch / long files

Errors

Gotchas

Similar Skills

Help us improve

Venice Transcription (/audio/transcriptions)

Use when

Minimal request

Request (multipart/form-data)

Models

OpenAI SDK

Batch / long files

Errors

Gotchas

Venice Transcription (`/audio/transcriptions`)

Request (`multipart/form-data`)

Venice Transcription (`/audio/transcriptions`)

Request (`multipart/form-data`)