End-to-end audio transcription pipeline: preprocess (denoise, VAD, format normalization, speaker sampling), transcribe (Gemini, AssemblyAI, local Whisper), post-process (clean fillers, structure, blog/summary/notes), combine versions, and export.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-transcription
Remove filler words, false starts, and immediate repetitions from an existing transcript file. Preserves meaning, tone, and structure. Use when the user asks to clean a transcript, remove fillers, or tidy up a raw transcription.
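As a rough illustration of what removing fillers and immediate repetitions involves, here is a minimal regex-based sketch. It is not the plugin's method (the listing does not say how cleaning is implemented); the filler list and the name `clean_fillers` are hypothetical, and a naive pattern like this will also collapse legitimate doubled words such as "had had".

```python
import re

# Hypothetical filler list for illustration only; real speech needs more context.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b,?\s*", re.IGNORECASE)
# An immediately repeated word, e.g. "I I" or "the the the".
REPEAT = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)


def clean_fillers(text: str) -> str:
    """Drop simple fillers and immediate word repetitions from a transcript."""
    text = FILLERS.sub("", text)       # remove "um", "uh," and similar
    text = REPEAT.sub(r"\1", text)     # keep one copy of a repeated word
    return re.sub(r"\s{2,}", " ", text).strip()


print(clean_fillers("Um, so I I think, uh, we should go"))
```

Capitalisation and punctuation repair are deliberately left out; they are the part where a language model does noticeably better than patterns.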
Concatenate multiple audio recordings into one file and send as a single transcription job. Preserves a segment map so timestamps in the unified transcript can be traced back to their source files. Use when the user has several voice memos / chapters / interview parts that belong to one session and wants a single combined transcript instead of N separate ones.
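The segment map idea can be sketched as follows: record each source file's offset in the combined audio, then look up any transcript timestamp against those offsets. This is a sketch under assumptions, not the plugin's actual format; `Segment`, `build_segment_map`, and `locate` are hypothetical names, and durations are assumed to be known in seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    source: str      # original file name
    start: float     # offset of this file within the combined audio, in seconds
    duration: float  # length of this file, in seconds


def build_segment_map(files):
    """files: list of (name, duration_seconds) in concatenation order."""
    segments, offset = [], 0.0
    for name, duration in files:
        segments.append(Segment(name, offset, duration))
        offset += duration
    return segments


def locate(segments, t):
    """Map a timestamp in the combined audio to (source file, local time)."""
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1
    if i < 0 or t > segments[-1].start + segments[-1].duration:
        raise ValueError(f"timestamp {t} is outside the combined audio")
    seg = segments[i]
    return seg.source, t - seg.start


segs = build_segment_map([("part1.m4a", 60.0), ("part2.m4a", 30.0)])
print(locate(segs, 75.0))
```

A timestamp that falls exactly on a boundary is attributed to the later file, which is an arbitrary choice made here for simplicity.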
Concatenate multiple transcript variants (raw, cleaned, structured, blog, summary, notes) into a single combined document — markdown with variant headers, or Typst-rendered PDF with page numbers and variant name in the footer. Use when the user asks to combine versions, merge transcript variants, or produce a single multi-version document.
Onboarding for Claude-Transcription — registers the user's transcription and denoise backends (cloud APIs, MCP servers, local binaries, custom endpoints) into a user-data config file. No provider hardcoded; user picks what they have. Use when the user asks to configure claude-transcription, set up the plugin, add a new transcription backend, or pick a default provider.
Remove background noise from an audio file. Cloud providers (Auphonic default, ElevenLabs, Dolby.io) or local (DeepFilterNet ML, ffmpeg afftdn non-ML). Use when the user asks to denoise, clean up noise, remove hum/hiss, or enhance speech audio before transcription.
Use when the user wants to download audio from a YouTube URL (or YouTube playlist). Triggers on phrases like "download audio from youtube", "grab the audio from this yt link", "yt-dlp this", or any YouTube URL paired with a request to save audio. Saves to ~/audio/yt-raw using yt-dlp.
Export a transcript (or any pipeline output) by emailing it, uploading to Google Drive, or copying to clipboard. Use when the user asks to send, email, upload, save to Drive, or copy a transcript.
Mine a transcript for durable personal context — facts about the speaker (location, role, projects, preferences, relationships, ongoing work) — and write them to a context file for later reuse. De-duplicates against existing context so the same facts aren't re-extracted on every memo. Use when the user wants to build a knowledge base from voice memos, or asks to "pull out context from this".
Cluster unique voices in an audio recording and extract a short sample of each to a file, then prompt the user to label them. Feeds diarization in downstream transcription. Use when the user asks to identify speakers, extract voice samples, prep for diarization, or label voices.
Stepwise transcript refinement that never overwrites prior stages. Stage 1 removes filler words and adds paragraphs; stage 2 adds subheadings; stage 3 asks the user what's next (notes, blog, summary, PDF, translation, or stop). Use when the user hands over a raw transcript and wants it progressively improved with all intermediate versions preserved.
Diagnose-and-treat workflow for very messy recordings where the standard preprocessor isn't enough. Generates a spectrogram, identifies problematic frequency bands (AC hum, rumble, narrow-band whines, broadband hiss, clipping), and proposes targeted ffmpeg filters. ONLY use when the user explicitly invokes it — the normal pipeline runs `preprocess-for-transcription` directly and skips diagnostic work. Trigger phrases the user might use — "this audio is messy", "transcription failed because of noise", "fix this rough recording", "the audio is bad and the transcript is garbage".
Convert audio to a transcription-friendly format — stereo to mono, resample to 16kHz, compress WAV to opus or mp3, reduce bitrate. Use when the user asks to normalize audio format, downmix, resample, compress, or prepare audio for transcription.
Prepare audio for transcription — format normalise (mono/16kHz/opus), loudness normalise (EBU R128), and collapse long silences (silero-vad). Optional denoise pass. Use before sending to AssemblyAI or any ASR for cleaner results, smaller uploads, and lower cost. Use when the user asks to "preprocess audio", "prep for transcription", "clean up a recording before sending it", or just hands over a raw voice memo to be transcribed.
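The format and loudness steps described above map onto a single ffmpeg invocation. The sketch below only builds the argument list; the loudness targets (`I=-16`, `TP=-1.5`, `LRA=11`), the 24k bitrate, and the function names are assumptions chosen for illustration, not values taken from the plugin.

```python
import subprocess


def preprocess_cmd(src: str, dst: str, bitrate: str = "24k") -> list[str]:
    """Build an ffmpeg command: mono, 16 kHz, EBU R128 loudness, Opus output."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ac", "1",                               # downmix to mono
        "-ar", "16000",                           # resample to 16 kHz
        "-af", "loudnorm=I=-16:TP=-1.5:LRA=11",   # assumed loudness targets
        "-c:a", "libopus", "-b:a", bitrate,       # compress to Opus
        dst,
    ]


def preprocess(src: str, dst: str) -> None:
    """Run the command; requires ffmpeg built with libopus on PATH."""
    subprocess.run(preprocess_cmd(src, dst), check=True)


print(" ".join(preprocess_cmd("memo.wav", "memo.opus")))
```

Silence collapsing and the optional denoise pass are separate steps and are not covered by this command.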
One-time setup for local Whisper transcription — installs faster-whisper and downloads a default model. Use when the user asks to set up whisper, install whisper, or prepare for local transcription.
Segment a mixed-intent transcript into separate files by category — prompts, context, action items, background, decisions. Useful when the user has dictated a dev brief, code review, or planning session that interleaves multiple kinds of content. Use when the user says "split this transcript", "separate the prompts from the context", or hands over a transcript that is clearly multi-purpose.
Add section headers, logical groupings, and paragraph breaks to a cleaned transcript, while preserving the speaker's wording. Use when the user asks to structure a transcript, add headers, organize by topic, or make a transcript readable without rewriting it.
Transcribe audio via AssemblyAI with word-level timestamps and speaker diarization. Best for meetings, interviews, and anything needing speaker labels or timecodes. Use when the user asks for a timestamped transcript, diarized transcript, or multi-speaker transcription.
Transcribe audio via Gemini with filler words and pause-words removed at transcription time. Produces a lightly cleaned transcript in one pass. Use when the user asks for a cleaned transcript, filler-free transcription, or a readable first-pass transcript via Gemini.
Transcribe an audio file to a raw verbatim transcript using the gemini-transcription MCP server. Preserves every word including fillers. Use when the user asks for a raw transcript, verbatim transcription, or unedited text via Gemini.
Transcribe a podcast episode and produce two outputs in one pass — (1) the raw API transcript exactly as the provider returned it, and (2) a podcast-formatted reading version with filler words removed, section headers added, and speaker labels preserved. Use whenever the user provides a podcast audio file or says "transcribe this podcast".
Transcribe audio locally using Whisper (offline, no cloud). Use when the user asks for a local transcription, offline transcription, privacy-preserving transcription, or explicitly requests Whisper.
Rewrite a transcript into a publishable blog post — narrative flow, tightened prose, engaging title, and intro/outro. Use when the user asks to turn a transcript into a blog post, convert recording to article, or draft a post from an interview.
Convert a transcript into structured meeting or study notes with headers, bullets, and callouts. Use when the user asks for notes from a recording, study notes, meeting minutes, or structured notes from a transcript.
Render a single transcript to a PDF using Typst, with the source file's modification timestamp and page numbers in the footer. Use when the user asks to "make a PDF of this transcript", "render to PDF", "transcript as PDF", or otherwise wants a printable single-version document. For multi-version (raw + cleaned + structured + blog) bundles, use `combine-versions` instead.
Produce an executive summary and bullet-point highlights from a transcript. Use when the user asks for a summary, TL;DR, exec summary, key takeaways, or highlights of a recording.
Apply a stylistic / formatting transformation to a transcript using a named prompt from the user's Text-Transformation-Prompt-Library (206 transformations covering blog outlines, briefs, business correspondence, analysis documents, meeting notes, and more). Use when the user asks to "convert this to <style>", "format like a <type>", or names a specific transformation. Fetches the catalog on demand and caches it; never commits the prompts into this plugin.
Translate a transcript from its source language into a target language while preserving structure (paragraphs, subheadings, speaker labels, timestamps). Use when the user wants a transcript in another language — for sharing with someone who doesn't speak the source language, for publishing in a target market, or for review.
Remove long silences from an audio file using VAD (voice activity detection). Use when the user asks to truncate silence, remove gaps, trim pauses, or apply VAD to an audio recording.
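To illustrate the idea, assume a VAD has already produced a sorted list of speech intervals. The hypothetical helper below (not part of the plugin) decides which stretches of the original audio to keep so that no pause exceeds `max_gap` seconds; the 0.5 s default is an arbitrary assumption.

```python
def collapse_silences(speech, max_gap=0.5):
    """speech: sorted, non-overlapping (start, end) speech intervals in seconds.

    Returns the intervals of the original audio to keep, such that no pause
    between consecutive speech intervals is longer than max_gap.
    """
    keep = []
    for start, end in speech:
        if keep and start - keep[-1][1] <= max_gap:
            # Short pause: keep it by extending the previous interval.
            keep[-1] = (keep[-1][0], end)
        else:
            if keep:
                # Long pause: retain only max_gap of it after the previous speech.
                keep[-1] = (keep[-1][0], keep[-1][1] + max_gap)
            keep.append((start, end))
    return keep


print(collapse_silences([(0.0, 2.0), (2.3, 4.0), (10.0, 12.0)]))
```

Keeping a short tail of each long pause, rather than cutting it entirely, avoids words running together at the joins.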
Sanity-check the transcription plugin's environment — confirm required API keys are set and reachable via cheap ping. Use when the user asks "is transcription working?", reports a transcription failure, or after changing API keys / shell config.
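The first half of such a check, confirming that keys are set, can be sketched in a few lines. The key names come from the requirements listed for this plugin; which of them are actually needed depends on the providers the user has configured, so treating all four as required is an assumption. The "cheap ping" step is provider-specific and is not shown. `missing_keys` is a hypothetical name.

```python
import os

# Key names from the plugin's requirements; treating all as required is an assumption.
REQUIRED_KEYS = [
    "AUPHONIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "ASSEMBLYAI_API_KEY",
    "DOLBY_API_KEY",
]


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of keys that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [k for k in required if not env.get(k, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    print("all keys set" if not missing else f"missing: {', '.join(missing)}")
```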
End-to-end audio-to-transcript pipeline as a Claude Code plugin. Preprocess long-form recordings, transcribe via cloud or local engines, post-process into clean/structured/blog/summary formats, combine versions, and export.
audio ──► preprocess ──► transcribe ──► post-process ──► combine ──► export
denoise - remove background noise. Cloud: Auphonic (default, cheapest), ElevenLabs Voice Isolator, Dolby.io. Local: DeepFilterNet (ML) or ffmpeg afftdn (non-ML).
truncate-silence - VAD-based silence removal (silero-vad or ffmpeg silenceremove).
normalize-format - stereo→mono, 16kHz resample, compress WAV to opus/mp3.
extract-speaker-samples - cluster unique voices in the audio, emit short per-speaker samples for user to label (feeds diarization).
transcribe-gemini-raw - verbatim via gemini-transcription MCP.
transcribe-gemini-cleaned - filler words removed at transcription time.
transcribe-assemblyai - timestamped + diarization via AssemblyAI.
transcribe-whisper-local - offline transcription via local Whisper.
setup-whisper - one-time local Whisper installation.
clean-transcript - strip fillers (ums, likes, repetitions) from an existing transcript.
structure-transcript - add headers, logical sections, paragraph breaks.
transcript-to-blog - rewrite as a publishable blog post.
transcript-to-summary - executive summary + bullet highlights.
transcript-to-notes - structured meeting/study notes.
combine-versions - concatenate raw + cleaned + structured into one doc. Markdown (variant headers) or Typst → PDF (page numbers, variant in footer).
export-transcript - email, upload to Google Drive, or copy to clipboard.
configure - onboarding: pick denoise + transcription providers, set default output directory, toggle cloud/local preference.
configure or per-invocation.
See CLAUDE.md for config file schema and conventions.
claude plugins marketplace add danielrosehill/Claude-Code-Plugins
claude plugins install claude-transcription@danielrosehill
Then restart Claude Code.
ffmpeg - preprocessing
python3 + uv - for skills that shell out to Python (DeepFilterNet, silero-vad, pyannote, whisper)
gemini-transcription, gws-personal (for Drive export)
AUPHONIC_API_KEY, ELEVENLABS_API_KEY, ASSEMBLYAI_API_KEY, DOLBY_API_KEY
License: MIT