This skill should be used when the user asks to "transcribe this audio", "transcribe this recording", "convert speech to text", "transcribe voice memo", "transcribe this file", "dictation", "speech recognition", "speech-to-text", "STT", or needs to transcribe audio files, voice memos, interviews, or recordings. Provides a resilient default trio transcription pipeline (AssemblyAI Universal-3 + ElevenLabs Scribe v2 + Cohere local) with Claude-powered merge, manual-merge fallback, resumable runs, and a learning correction dictionary.
Install via:

```bash
npx claudepluginhub oliverames/ames-claude --plugin ames-standalone-skills
```

This skill is limited to using the following tools:
Dual-engine audio transcription with Claude-powered interactive merge and dictionary learning. Optionally scales to an eight-model ensemble pipeline with Opus 4.6 consensus merge and structural review for maximum accuracy.
Invoke by saying "transcribe this audio", "transcribe [file]", "fix this transcript", or "batch transcribe [folder]". For setup, run `bash ${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/setup.sh`.
| Flag | Description |
|---|---|
| `--fix-transcript FILE` | Correct an existing transcript (`.srt`, `.vtt`, `.txt`, `.md`); skips audio engines entirely |
| `--context NAME` | Load a named per-project context overlay (e.g. `--context bcbs-vt`). First use triggers a short interview. Omit NAME to list saved contexts. |
| `--review` | Interactively review applied corrections before saving; dispute false positives to log them for dictionary cleanup |
| `--engines E1,E2,...` | Choose transcription engines (default: `assemblyai-u3-pro,scribe-v2,cohere-transcribe`). Run `--list-engines` for all IDs. |
| `--list-engines` | Print all available engine IDs and aliases, then exit |
| `--speakers "A,B"` | Comma-separated speaker names to help identification |
| `--no-diarization` | Disable speaker diarization (faster; for single-speaker recordings) |
| `--doctor` | Verify Python runtime, API key resolution, ffmpeg/ffprobe, SDK imports, HF token presence, and Claude merge availability |
| `--check-engine scribe-v2` | Run an engine startup self-test without transcribing |
| `--merge-mode manual` | Skip Claude, save normalized per-engine outputs, and generate a comparison bundle with a recommended base transcript |
| `--resume` | Reuse completed per-engine outputs from the run directory |
| `--rerun-engine ENGINE_ID` | Re-run just one engine while resuming |
| `--use-system-python` / `--engine-python scribe-v2=/path/to/python` | Escape hatches for runtime selection |
**Fix-transcript mode** (`--fix-transcript FILE`): accepts an existing transcript file (`.srt`, `.vtt`, `.txt`, `.md`) and runs it through the dictionary + LLM review pipeline, with no audio re-transcription. Useful for correcting outputs from Whisper, YouTube, Riverside, Descript, or any other STT tool.
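The dictionary pass can be sketched as a simple whole-word substitution over the transcript text. This is a minimal illustration, assuming a flat "wrong → right" mapping; the real schema in `transcription-dictionary.json` may differ.

```python
import re

# Hypothetical correction entries for illustration only; the real
# dictionary ships with the skill and evolves via the learning loop.
corrections = {"open a i": "OpenAI", "assembly ai": "AssemblyAI"}

def apply_dictionary(text: str, corrections: dict[str, str]) -> str:
    """Apply whole-word, case-insensitive corrections to transcript text."""
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        text = pattern.sub(right, text)
    return text

print(apply_dictionary("We used open a i and assembly ai today.", corrections))
# We used OpenAI and AssemblyAI today.
```

The LLM review pass then runs on top of this mechanically corrected text, so the dictionary handles the deterministic fixes and the model handles the judgment calls.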
Workflow:
1. Loads the dictionary, plus the named `--context` overlay if provided
2. Applies corrections and the LLM review pass
3. Saves the corrected `.md` plus a copy of the original alongside it (never modifies the source)

**Three-engine pipeline.** Engines run in parallel (cloud) or sequentially (local), then Claude Code headless merges all transcripts with per-engine weighting: AssemblyAI for speaker structure, ElevenLabs for word accuracy, Cohere as tiebreaker. If headless Claude is unavailable or rate-limited, the script falls back to headless Codex when it is installed.
Default engines: AssemblyAI Universal-3 Pro (SLAM-1; diarization + chapters + entities) + ElevenLabs Scribe v2 (best cloud accuracy) + Cohere Transcribe (local, #1 HF-WER, free). If one engine fails, the run continues, marks the failure clearly, and emits a `default trio degraded` notice. It can optionally retry with Voxtral Small as a fallback.
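The failure-tolerant parallel run can be sketched as follows. This is an illustrative sketch, not the actual `smart-transcribe.py` implementation; `demo_engine` and the simulated outage are stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def demo_engine(name: str) -> str:
    # Stand-in for a real engine call; "scribe-v2" simulates a cloud outage.
    if name == "scribe-v2":
        raise RuntimeError("503 from ElevenLabs")
    return f"transcript from {name}"

def run_trio(engines, run_engine):
    """Run engines in parallel; a single failure does not abort the run."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {pool.submit(run_engine, e): e for e in engines}
        for future, engine in futures.items():
            try:
                results[engine] = future.result()
            except Exception as exc:
                failures[engine] = str(exc)
    # A non-empty failure set is what gets surfaced as "default trio degraded".
    degraded = bool(failures)
    return results, failures, degraded

results, failures, degraded = run_trio(
    ["assemblyai-u3-pro", "scribe-v2", "cohere-transcribe"], demo_engine)
# degraded is True; the two surviving transcripts still go to the merge step.
```

The key design point is that a degraded run still produces a merged transcript from the surviving engines, with the failure clearly labeled rather than silently dropped.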
An eight-model pipeline for maximum accuracy. Invoked via the transcribe command when the user requests "ensemble", "maximum accuracy", "full pipeline", or selects a multi-model preset.
Phases:
Available Models:
Benchmarks use two different methodologies (AA-WER and HF-WER), so results are not directly comparable across systems:
| # | Model | Type | Cost/1K min | Notes |
|---|---|---|---|---|
| 1 | ElevenLabs Scribe v2 | Cloud | $6.67 | 2.3% AA-WER (#1), 5.83% HF-WER (#6); best overall for cloud accuracy |
| 2 | Mistral Voxtral Small | Cloud | $4.00 | 2.9% AA-WER; context biasing via prompt |
| 3 | Google Gemini 3 Pro | Cloud | $18.40 | 2.9% AA-WER; multimodal, most expensive cloud option |
| 4 | AssemblyAI Universal-3 Pro | Cloud | $3.50 | 3.2% AA-WER; best speaker diarization (used for speaker scaffolding) |
| 5 | OpenAI GPT-4o Transcribe | Cloud | ~$6.00 | ~2.46% WER (OpenAI self-reported); RL-trained ASR |
| 6 | OpenAI GPT-4o Mini Transcribe | Cloud | ~$3.00 | Decorrelated errors from GPT-4o full; budget option |
| 7 | Voxtral Mini Realtime | Local | Free | 4B params, mlx-audio on Apple Silicon; 7.68% HF-WER |
| 8 | Cohere Transcribe | Local | Free | Apache 2.0; 5.42% HF-WER (#1 on leaderboard); 524x RTFx, 3x faster than Whisper; 14 languages |
Presets:
Detailed strengths and weaknesses for each engine. Use these to choose the right engines for your recording type.
1. ElevenLabs Scribe v2 (scribe-v2)
2. Mistral Voxtral Small (voxtral-small)
3. Google Gemini 3 Pro (gemini-pro)
4. AssemblyAI Universal-3 Pro (assemblyai-u3-pro)
5. OpenAI GPT-4o Transcribe (gpt4o-transcribe)
6. OpenAI GPT-4o Mini Transcribe (gpt4o-mini-transcribe)
7. Voxtral Mini Realtime (voxtral-mini-local) — local
8. Cohere Transcribe (cohere-transcribe) — local
Every LLM merge/review pass now produces a structured transparency report appended to the output `.md`.
The report is also printed to the terminal after every run. With --review, the APPLIED section becomes interactive: accept, dispute (logs to suggestions file for later dict cleanup), or skip each item.
Named contexts (e.g. `bcbs-vt`) live in `~/.config/smart-transcribe/contexts/<name>.json`. They use the same schema as the main dictionary and deep-merge on top of it:
Terms flow into each engine's `keyterms_prompt` / `keyterms` fields.

**Claude merge.** `claude -p --model opus --effort medium` reads per-engine capability profiles and produces structured 4-section output (metadata, transcript, transparency report, suggestions). If Claude is rate-limited or unavailable, the fallback is `codex exec` in headless mode.

**Ensemble mode.** All 8 engines are available via `--engines`. Cloud engines run in parallel; local engines run sequentially. The Claude Code headless merge step is the same.
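The context deep-merge described above can be sketched as a recursive dictionary merge where the overlay wins on conflicts. The schema fragments below are hypothetical illustrations, not the real dictionary format:

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay onto base; overlay values win on conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical schema fragments for illustration only.
main_dict = {"corrections": {"bcbs": "BCBS"}, "speakers": ["Oliver"]}
context = {"corrections": {"vt": "Vermont"}, "speakers": ["Kate"]}
merged = deep_merge(main_dict, context)
# merged["corrections"] now holds both entries; non-dict values like
# "speakers" are replaced wholesale rather than concatenated.
```

Because nested dicts merge while scalars and lists replace, a context can add project-specific corrections without losing the main dictionary's entries.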
Scripts and data files:

- `${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/smart-transcribe.py`
- `${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/ensemble.py`
- `${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/batch-transcribe-folder.py`
- `${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/setup.sh`
- `${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/data/transcription-dictionary.json`
- `~/.config/smart-transcribe/suggested-additions.jsonl`

Environment variables are checked first; otherwise all keys are resolved at runtime from 1Password (Development vault) via `op item get`.
Required for standard 3-engine mode:
- `ASSEMBLYAI_API_KEY`: AssemblyAI Universal-3 / SLAM-1
- `ELEVENLABS_API_KEY`: Scribe v2

Optional (additional engines):
- `MISTRAL_API_KEY`: Voxtral Small
- `GOOGLE_API_KEY`: Gemini 3 Pro
- `OPENAI_API_KEY`: GPT-4o Transcribe + Mini
- `HF_TOKEN`: required for Cohere Transcribe (gated HuggingFace repo)

Merge (always required):
- Claude Code CLI (`claude`) should be authenticated; the merge uses `claude -p --model opus --effort medium`
- Codex CLI (`codex`) is the automatic fallback merge runner when Claude headless is unavailable or rate-limited
- `ffmpeg`: audio format conversion (installed via Homebrew); handles .qta, .m4a, .mp3, etc.
- Python dependencies (`scripts/requirements.txt`): `setup.sh` auto-detects the highest available Python >= 3.13; re-run it after upgrading Python to rebuild venvs
- `torch`, `transformers`, `soundfile`, `librosa`: for Cohere Transcribe (local); model cached at `~/.cache/huggingface/hub/`

Run `bash ${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/setup.sh` to create dedicated Python 3.13 engine runtimes and install Python dependencies.
API keys are resolved from 1Password at runtime — no keys.env configuration needed.
The plugin uses a seed + user dictionary architecture:
- Seed dictionary (`${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/data/transcription-dictionary.json`): read-only reference that ships with the skill.
- User dictionary (`~/.config/smart-transcribe/transcription-dictionary.json` by default): the personal, evolving copy that the learning loop writes to.

The dictionary contains:
After each transcription, new terms are identified and presented to the user for approval before being added to the user dictionary.
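The approve-then-write learning loop can be sketched as folding approved entries from the suggestions JSONL into the user dictionary. The `{"wrong": ..., "right": ...}` record shape is an assumption for illustration; the demo uses temp files in place of the real `~/.config/smart-transcribe/` paths.

```python
import json
import tempfile
from pathlib import Path

def approve_suggestions(suggestions_path: Path, user_dict_path: Path,
                        approved: set[str]) -> dict:
    """Fold user-approved suggestions from the JSONL log into the user dictionary."""
    user_dict = (json.loads(user_dict_path.read_text())
                 if user_dict_path.exists() else {"corrections": {}})
    for line in suggestions_path.read_text().splitlines():
        item = json.loads(line)  # assumed shape: {"wrong": ..., "right": ...}
        if item["wrong"] in approved:
            user_dict["corrections"][item["wrong"]] = item["right"]
    user_dict_path.write_text(json.dumps(user_dict, indent=2))
    return user_dict

# Demo with temporary files standing in for the real config paths.
tmp = Path(tempfile.mkdtemp())
(tmp / "suggested-additions.jsonl").write_text(
    '{"wrong": "cohere", "right": "Cohere"}\n'
    '{"wrong": "ktor", "right": "Ktor"}\n')
result = approve_suggestions(tmp / "suggested-additions.jsonl",
                             tmp / "transcription-dictionary.json",
                             approved={"cohere"})
# Only the approved entry lands in the dictionary; the rest stay in the log.
```

Keeping unapproved suggestions in the JSONL log means nothing is silently discarded: a later `--review` session can still promote or dispute them.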
Scripts at `${CLAUDE_PLUGIN_ROOT}/skills/smart-transcribe/scripts/`. Skill source: `plugins/ames-standalone-skills/skills/smart-transcribe/` in the ames-claude repo.