parakeet-stt

From sundial-org-awesome-openclaw-skills-4

Transcribes audio files to text locally using NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). Runs about 30x faster than realtime, auto-detects 25 languages, and provides an OpenAI-compatible API for offline processing.

Python

Install

npx claudepluginhub joshuarweaver/cascade-ai-ml-agents-misc-2 --plugin sundial-org-awesome-openclaw-skills-4

Tool Access

This skill uses the workspace's default tool permissions.

Stats

Stars: 586

Forks: 75

Last commit: Feb 1, 2026

Parakeet TDT (Speech-to-Text)

Local transcription using NVIDIA Parakeet TDT 0.6B v3 with ONNX Runtime. Runs on CPU — no GPU required. ~30x faster than realtime.

Installation

# Clone the repo
git clone https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai.git
cd parakeet-tdt-0.6b-v3-fastapi-openai

# Run with Docker (recommended)
docker compose up -d parakeet-cpu

# Or run directly with Python
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 5000

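Default port is 5000. If the server is reachable at another address (for example, because Docker maps it to a different host port), set PARAKEET_URL to that address (e.g., http://localhost:5092); the examples below read it.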

API Endpoint

OpenAI-compatible API at $PARAKEET_URL (default: http://localhost:5000).
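Python clients can apply the same default in one place. A minimal sketch, assuming PARAKEET_URL holds the server root without the `/v1` suffix (as in the curl examples); the helper names are illustrative, not part of the project:

```python
import os

DEFAULT_PARAKEET_URL = "http://localhost:5000"  # documented default port


def parakeet_base_url() -> str:
    """Server root from PARAKEET_URL, falling back to the documented default."""
    url = os.environ.get("PARAKEET_URL", "").strip()
    # Treat unset and empty the same way, and drop a trailing slash so paths join cleanly.
    return (url or DEFAULT_PARAKEET_URL).rstrip("/")


def transcriptions_endpoint() -> str:
    """Full URL of the OpenAI-compatible transcription route."""
    return parakeet_base_url() + "/v1/audio/transcriptions"
```

Stripping the trailing slash avoids URLs like `http://localhost:5092//v1/...` when the variable is set with one.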

Quick Start

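# Point at the server (falls back to the default port 5000 if unset)
export PARAKEET_URL="${PARAKEET_URL:-http://localhost:5000}"

# Transcribe audio file (plain text)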
curl -X POST $PARAKEET_URL/v1/audio/transcriptions \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=text"

# Get timestamps and segments
curl -X POST $PARAKEET_URL/v1/audio/transcriptions \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=verbose_json"

# Generate subtitles (SRT)
curl -X POST $PARAKEET_URL/v1/audio/transcriptions \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=srt"
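For scripts that cannot install extra packages, the same multipart upload can be built with the Python standard library alone. This is a sketch, not the project's own client: `build_transcription_request` and `transcribe` are hypothetical names, and it assumes the server accepts the same two form fields (`file`, `response_format`) that the curl examples send.

```python
import mimetypes
import os
import urllib.request
import uuid


def build_transcription_request(base_url: str, audio_path: str,
                                response_format: str = "text") -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl -F examples."""
    boundary = uuid.uuid4().hex  # random separator, unlikely to occur in the audio bytes
    filename = os.path.basename(audio_path)
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    with open(audio_path, "rb") as f:
        audio = f.read()

    body = b"".join([
        # Text field, like: -F "response_format=text"
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="response_format"\r\n\r\n'
         f"{response_format}\r\n").encode(),
        # File field, like: -F "file=@/path/to/audio.mp3"
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {mime}\r\n\r\n").encode(),
        audio,
        b"\r\n",
        # Closing boundary ends the multipart body.
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, audio_path: str, response_format: str = "text",
               timeout: float = 600.0) -> str:
    """Send the request and return the body as text (raises HTTPError on 4xx/5xx)."""
    req = build_transcription_request(base_url, audio_path, response_format)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `transcribe("http://localhost:5000", "/path/to/audio.mp3", "srt")` would return the subtitle text, assuming the server is running. The whole file is read into memory, which is fine for typical recordings but not for very large ones.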

Python / OpenAI SDK

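import os
from openai import OpenAI

# Server root without a trailing slash; the SDK needs the /v1 prefix appended.
base_url = os.getenv("PARAKEET_URL", "http://localhost:5000").rstrip("/")

client = OpenAI(
    base_url=base_url + "/v1",
    api_key="not-needed",  # placeholder: the client requires a key, the local server does not check it
)

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="parakeet-tdt-0.6b-v3",
        file=f,
        response_format="text",  # with "text", the SDK returns a plain string
    )
print(transcript)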

Response Formats

| Format | Output |
|---|---|
| `text` | Plain text |
| `json` | `{"text": "..."}` |
| `verbose_json` | Segments with timestamps and words |
| `srt` | SRT subtitles |
| `vtt` | WebVTT subtitles |
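If you already have a `verbose_json` result and also want subtitles, SRT cues can be derived locally instead of transcribing again. A sketch assuming each segment carries `start` and `end` in seconds and a `text` string, as in OpenAI's verbose_json schema; check that this server's segments use the same keys. The function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn verbose_json-style segment dicts into numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)
```

With a raw response body, usage would look like `segments_to_srt(json.loads(raw)["segments"])`. When only subtitles are needed, requesting `srt` directly is simpler.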

Supported Languages (25)

English, Spanish, French, German, Italian, Portuguese, Polish, Russian, Ukrainian, Dutch, Swedish, Danish, Finnish, Norwegian, Greek, Czech, Romanian, Hungarian, Bulgarian, Slovak, Croatian, Lithuanian, Latvian, Estonian, Slovenian

Language is auto-detected — no configuration needed.

Web Interface

Open $PARAKEET_URL in a browser for a drag-and-drop transcription UI.

Docker Management

# Check status
docker ps --filter "name=parakeet"

# View logs
docker logs -f <container-name>

# Restart (run compose commands from the cloned repo directory)
docker compose restart

# Stop
docker compose down
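After `docker compose up -d`, the container may need some time before it accepts requests (for instance while the model loads). A small polling helper can wait for it. This is a sketch assuming the server answers plain GET requests at its root, which the web interface suggests; `wait_for_server` is a hypothetical name, and any HTTP response, even an error status, counts as "up".

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll base_url until the server answers or `timeout` seconds pass."""
    # Bypass any http_proxy settings: the server is usually on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(base_url, timeout=interval):
                return True  # successful response
        except urllib.error.HTTPError:
            return True  # error status, but something is listening
        except OSError:
            pass  # refused, reset, or timed out (URLError is an OSError): not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script can call `wait_for_server("http://localhost:5000")` right after starting the container and skip transcription if it returns `False`.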

Why Parakeet over Whisper?

Speed: ~30x faster than realtime on CPU
Accuracy: Comparable to Whisper large-v3
Privacy: Runs 100% locally, no cloud calls
Compatibility: Drop-in replacement for OpenAI's transcription API
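"~30x faster than realtime" means roughly 30 seconds of audio are transcribed per second of wall-clock time, so a 60-minute recording would take about 60 / 30 = 2 minutes. To check this on your own hardware, divide audio length by elapsed time. A sketch with illustrative helper names; it measures duration only for uncompressed WAV files via the standard `wave` module, so MP3 input would need another tool to get its length.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def speed_vs_realtime(audio_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of audio processed per wall-clock second (30.0 means ~30x realtime)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds
```

Time one request with `time.perf_counter()` before and after, then pass the WAV duration and the elapsed seconds to `speed_vs_realtime`. The elapsed time includes upload and HTTP overhead, so the result slightly understates the model's own speed.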