From sundial-org-awesome-openclaw-skills-4
Transcribes audio files (OGG, MP3, WAV, M4A) using Google's Gemini API or Vertex AI via Python CLI. Auto-detects gcloud ADC or API key for fast transcription of voice messages.
npx claudepluginhub joshuarweaver/cascade-ai-ml-agents-misc-2 --plugin sundial-org-awesome-openclaw-skills-4
This skill uses the workspace's default tool permissions.
Transcribe audio files using Google's Gemini API or Vertex AI. Default model is `gemini-2.0-flash-lite` for fastest transcription.
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
The script will automatically detect and use ADC when available.
Set GEMINI_API_KEY in environment (e.g., ~/.env or ~/.clawdbot/.env)
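The detection order described here (ADC first, then `GEMINI_API_KEY`) can be sketched roughly as follows. This is not the script's actual code: `choose_auth` is a hypothetical helper, and checking for the credentials file on disk is an assumption about how ADC availability might be tested. The path used is the standard location gcloud writes to on Linux and macOS.

```python
from pathlib import Path
from typing import Mapping

# Well-known file written by `gcloud auth application-default login`
# on Linux/macOS (Windows stores it under the gcloud folder in APPDATA).
DEFAULT_ADC_FILE = (
    Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
)


def choose_auth(env: Mapping[str, str], adc_file: Path = DEFAULT_ADC_FILE) -> str:
    """Return "adc" or "api_key", preferring ADC (hypothetical helper)."""
    # An explicit credentials file set via the environment also counts as ADC.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return "adc"
    if adc_file.is_file():
        return "adc"
    # Fall back to the API key only when no ADC credentials were found.
    if env.get("GEMINI_API_KEY"):
        return "api_key"
    raise RuntimeError(
        "No credentials found: run `gcloud auth application-default login` "
        "or set GEMINI_API_KEY"
    )
```

Calling `choose_auth(os.environ)` would report which credential source wins on the current machine, which can help when a transcription unexpectedly goes through the wrong account.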
- .ogg / .opus (Telegram voice messages)
- .mp3
- .wav
- .m4a
# Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg
# Force Vertex AI
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex
# With a specific model
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro
# Vertex AI with specific project and region
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1
# With Clawdbot media
python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg
| Option | Description |
|---|---|
| `<audio_file>` | Path to the audio file (required) |
| `--model`, `-m` | Gemini model to use (default: `gemini-2.0-flash-lite`) |
| `--vertex`, `-v` | Force use of Vertex AI with ADC |
| `--project`, `-p` | GCP project ID (for Vertex, defaults to gcloud config) |
| `--region`, `-r` | GCP region (for Vertex, default: `us-central1`) |
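For readers who want to see how these options fit together, here is a minimal `argparse` sketch that mirrors the table. It is a reconstruction from the documented flags and defaults, not the parser from `transcribe.py` itself; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe an audio file with Gemini.",
    )
    parser.add_argument("audio_file", help="Path to the audio file (required)")
    parser.add_argument(
        "--model", "-m",
        default="gemini-2.0-flash-lite",
        help="Gemini model to use",
    )
    parser.add_argument(
        "--vertex", "-v",
        action="store_true",
        help="Force use of Vertex AI with ADC",
    )
    parser.add_argument(
        "--project", "-p",
        default=None,  # None here stands for "fall back to gcloud config"
        help="GCP project ID (for Vertex)",
    )
    parser.add_argument(
        "--region", "-r",
        default="us-central1",
        help="GCP region (for Vertex)",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["audio.ogg", "--vertex"])` yields the default model and region with `vertex` set to `True`.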
Any Gemini model that supports audio input can be used. Recommended models:
| Model | Notes |
|---|---|
| `gemini-2.0-flash-lite` | Default. Fastest transcription speed. |
| `gemini-2.0-flash` | Fast and cost-effective. |
| `gemini-2.5-flash-lite` | Lightweight 2.5 model. |
| `gemini-2.5-flash` | Balanced speed and quality. |
| `gemini-2.5-pro` | Higher quality, slower. |
| `gemini-3-flash-preview` | Latest flash model. |
| `gemini-3-pro-preview` | Latest pro model, best quality. |
See Gemini API Models for the latest list.
For Clawdbot voice message handling:
# Transcribe incoming voice message
TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py "$AUDIO_PATH")
echo "User said: $TRANSCRIPT"
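The same integration can be done from Python when the caller is not a shell script. The sketch below assumes, as the shell capture above implies, that the transcript is written to stdout and that a non-zero exit code signals failure. The `transcribe` wrapper and its 120-second timeout are hypothetical choices, not part of the skill.

```python
import subprocess
import sys
from pathlib import Path

# Script location taken from the usage examples in this document.
DEFAULT_SCRIPT = Path.home() / ".claude" / "skills" / "gemini-stt" / "transcribe.py"


def transcribe(audio_path: str, script: Path = DEFAULT_SCRIPT,
               timeout: float = 120.0) -> str:
    """Run the CLI on one file and return the transcript read from stdout."""
    result = subprocess.run(
        [sys.executable, str(script), audio_path],
        capture_output=True,  # keep stdout (transcript) and stderr (errors) apart
        text=True,
        timeout=timeout,      # assumed limit; tune for long recordings
    )
    if result.returncode != 0:
        # Surface the script's stderr message to the caller.
        raise RuntimeError(
            f"transcription failed (exit {result.returncode}): "
            f"{result.stderr.strip()}"
        )
    return result.stdout.strip()
```

Raising an exception rather than returning an empty string keeps a failed transcription from being mistaken for a silent voice message.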
The script exits with code 1 and prints to stderr on: