From claude-transcription
Diagnose-and-treat workflow for very messy recordings where the standard preprocessor isn't enough. Generates a spectrogram, identifies problematic frequency bands (AC hum, rumble, narrow-band whines, broadband hiss, clipping), and proposes targeted ffmpeg filters. ONLY use when the user explicitly invokes it — the normal pipeline runs `preprocess-for-transcription` directly and skips diagnostic work. Trigger phrases the user might use — "this audio is messy", "transcription failed because of noise", "fix this rough recording", "the audio is bad and the transcript is garbage".
```
npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-transcription
```

This skill uses the workspace's default tool permissions.
Deliberate, opt-in workflow for audio that the standard pipeline can't handle. Diagnoses what's wrong, recommends targeted remediations, and only applies them once the user confirms.
If the audio is only mildly noisy, `preprocess-for-transcription` is the right tool; hand off to `preprocess-for-transcription` instead of continuing here. Otherwise, render a spectrogram and run a numerical analysis:
```
scripts/render-spectrogram.sh INPUT.opus
uv run --quiet --with numpy --with soundfile --with scipy \
  scripts/analyse-spectrum.py INPUT.opus
```
The analyser produces a JSON report covering:
- `-af` strings, ready to drop into a pipeline

Show the user:
Do NOT apply anything yet.
Ask the user which suggestions to apply. They may want all, some, or none. They may also want to skip the messy-audio-fix entirely after seeing the diagnosis ("looks fine to me, just transcribe it").
If multiple narrow-band whines are detected, treat them as a single block — applying or skipping all of them together is fine; cherry-picking individual whines is over-engineering for ASR prep.
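The narrow-band whine detection described above can be sketched with a Welch PSD and peak picking. This is a hypothetical illustration, not the skill's actual `analyse-spectrum.py`; the function name and the 15 dB prominence default are assumptions:

```python
import numpy as np
from scipy.signal import welch, find_peaks

def find_narrowband_whines(samples, sr, prominence_db=15.0):
    """Return frequencies (Hz) of narrow spectral peaks that stand
    well above the surrounding noise floor: candidate whines to notch."""
    freqs, psd = welch(samples, fs=sr, nperseg=4096)
    psd_db = 10.0 * np.log10(psd + 1e-20)  # avoid log(0) on silent bins
    peaks, _ = find_peaks(psd_db, prominence=prominence_db)
    return [float(freqs[p]) for p in peaks]
```

Each peak found this way maps onto one `equalizer` notch in the filter chain; a real analyser would additionally have to distinguish harmonics of AC hum from independent whines.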
Build the ffmpeg filter chain in order:
1. `highpass=f=90`
2. `equalizer=...` filters with `t=q:w=2:g=-25`
3. `loudnorm=I=-16:TP=-1.5:LRA=11`
4. `-ac 1 -ar 16000`
5. `-c:a libopus -b:a 24k`

If broadband HF hiss was flagged, EQ won't fix it: invoke the standalone denoise skill (DeepFilterNet or Auphonic) before building the filter chain, then run this skill's filter chain on the denoised output.
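Assembling those pieces into an `-af` string might look like the sketch below; the function name and defaults are illustrative, not the skill's actual code:

```python
def build_af_chain(whine_hz, highpass_hz=90):
    """Compose the ffmpeg -af string: high-pass for rumble, one
    equalizer notch per detected whine, then loudness normalisation."""
    filters = [f"highpass=f={highpass_hz}"]
    for hz in whine_hz:
        filters.append(f"equalizer=f={hz}:t=q:w=2:g=-25")
    filters.append("loudnorm=I=-16:TP=-1.5:LRA=11")
    return ",".join(filters)
```

The result drops into an invocation such as `ffmpeg -i in.opus -af "<chain>" -ac 1 -ar 16000 -c:a libopus -b:a 24k out.opus`.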
If clipping was flagged, warn the user that no amount of filtering can recover the clipped peaks. Proceed anyway — ASR usually copes — but flag that the source recording was already damaged.
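As a sketch of how clipping might be flagged (hypothetical, assuming float samples normalised to [-1, 1]; the threshold is an assumption, not the analyser's actual value):

```python
import numpy as np

def clipping_fraction(samples, threshold=0.999):
    """Fraction of samples pinned at or beyond full scale.
    A non-trivial fraction suggests the waveform was clipped upstream."""
    return float(np.mean(np.abs(samples) >= threshold))
```

A fraction above a small cutoff (say, 0.1% of samples) is a plausible trigger for the clipping warning, since isolated full-scale samples can occur in undamaged audio.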
Output: <stem>.fixed.opus in audio/processed/.
After applying, re-render the spectrogram on the output and present it side-by-side with the original. Visual confirmation that the targeted bands actually got attenuated.
Then either:
- hand off to `transcribe-assemblyai` (or whichever ASR), OR
- denoise with a stronger backend.

Artifacts:
- `<stem>.spectrogram.png` and `<stem>.spectrogram.fixed.png` in `audio/processed/`
- `<stem>.spectrum-analysis.json` (the diagnostic report)
- `<stem>.fixed.opus` (final cleaned output)

Voice isolation and speaker separation (pyannote-audio / ElevenLabs Voice Isolator) are out of scope for this skill.