From claude-transcription
Concatenate multiple audio recordings into one file and send as a single transcription job. Preserves a segment map so timestamps in the unified transcript can be traced back to their source files. Use when the user has several voice memos / chapters / interview parts that belong to one session and wants a single combined transcript instead of N separate ones.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-transcription

This skill uses the workspace's default tool permissions.
Stitch together multiple audio files and run them through preprocessing + ASR as a single job. Cheaper (one upload, one job) and produces a coherent single transcript.
Before concatenation, all files must share the same sample rate, channel count, and codec. If they don't, normalise each one first via the preprocess-for-transcription skill (or just the format-normalise pass of it). Cheapest: normalise all to mono / 16kHz / opus, then concat.
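A minimal sketch of that normalise pass, assuming FILES holds the source paths (it is reused by the concat step below) and targeting the mono / 16kHz / opus format just described; the output directory is illustrative:

for f in "${FILES[@]}"; do
  # -ac 1 = mono, -ar 16000 = 16kHz, libopus = opus codec
  ffmpeg -i "$f" -ac 1 -ar 16000 -c:a libopus "audio/processed/$(basename "${f%.*}").opus"
done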
Use ffmpeg's concat demuxer (lossless when codecs match):
# Build the concat list (absolute paths, one "file" entry per source)
{
  for f in "${FILES[@]}"; do
    printf "file '%s'\n" "$(realpath "$f")"
  done
} > /tmp/concat-list.txt

# -safe 0 permits absolute paths; -c copy keeps the concat lossless
ffmpeg -f concat -safe 0 -i /tmp/concat-list.txt -c copy COMBINED.opus
If a boundary marker is desired (helps the user / ASR see file breaks), insert a short silence between segments. Easiest approach: insert a 1s silent opus snippet between every two files in the concat list.
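A sketch of that interleaving, assuming the mono / 16kHz target from the normalise step (ffmpeg's anullsrc source generates the silence):

# Generate a 1s silent snippet matching the target format
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -c:a libopus /tmp/silence.opus

# Interleave it between every two sources in the concat list
{
  for i in "${!FILES[@]}"; do
    [ "$i" -gt 0 ] && printf "file '/tmp/silence.opus'\n"
    printf "file '%s'\n" "$(realpath "${FILES[$i]}")"
  done
} > /tmp/concat-list.txt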
Write <combined-stem>.segments.json capturing where each source file lives in the combined timeline:
{
  "combined": "audio/processed/session.combined.opus",
  "boundary_silence_seconds": 1.0,
  "segments": [
    {"index": 0, "source": "memo-01.opus", "start": 0.0, "end": 312.4},
    {"index": 1, "source": "memo-02.opus", "start": 313.4, "end": 654.1},
    {"index": 2, "source": "memo-03.opus", "start": 655.1, "end": 901.2}
  ]
}
Durations come from ffprobe -i <file> -show_entries format=duration. This sidecar is what makes diarisation timestamps and "where in the recording" references traceable back to the original files.
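One way to build the sidecar (a sketch, not the skill's canonical implementation; assumes the FILES array from the concat step, uses bc for the float arithmetic, and hardcodes the session paths for illustration):

GAP=1.0  # boundary silence between segments
start=0
{
  printf '{\n  "combined": "audio/processed/session.combined.opus",\n'
  printf '  "boundary_silence_seconds": %s,\n  "segments": [\n' "$GAP"
  for i in "${!FILES[@]}"; do
    dur=$(ffprobe -v error -show_entries format=duration \
      -of default=noprint_wrappers=1:nokey=1 "${FILES[$i]}")
    end=$(echo "$start + $dur" | bc)
    printf '    {"index": %d, "source": "%s", "start": %s, "end": %s}' \
      "$i" "$(basename "${FILES[$i]}")" "$start" "$end"
    [ "$i" -lt $((${#FILES[@]} - 1)) ] && printf ','
    printf '\n'
    start=$(echo "$end + $GAP" | bc)  # next segment starts after the boundary silence
  done
  printf '  ]\n}\n'
} > audio/processed/session.segments.json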
Invoke preprocess-for-transcription on the combined file (silence-trim is fine here — boundary silences are 1s, well under the 2.5s threshold, so they'll be preserved as boundary cues). Then invoke transcribe-assemblyai (or whichever ASR the user picked).
After transcription, offer to split the resulting transcript back into per-source files using the segment map and AssemblyAI's word-level timestamps. Output: transcripts/<source-stem>.from-combined.md for each source.
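A sketch of that split with jq, assuming transcript.json holds the AssemblyAI response (its words array carries start/end in milliseconds) alongside the segment map:

MAP=audio/processed/session.segments.json
jq -c '.segments[]' "$MAP" | while read -r seg; do
  src=$(jq -r '.source' <<<"$seg")
  s=$(jq '.start * 1000' <<<"$seg")   # segment bounds in ms
  e=$(jq '.end * 1000' <<<"$seg")
  # Keep only the words whose start time falls inside this segment
  jq -r --argjson s "$s" --argjson e "$e" \
    '[.words[] | select(.start >= $s and .start < $e) | .text] | join(" ")' \
    transcript.json > "transcripts/${src%.*}.from-combined.md"
done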
Outputs:
- source files moved to audio/raw/ (if the user is organising)
- audio/processed/<session-name>.combined.opus
- audio/processed/<session-name>.segments.json
- transcripts/<session-name>.combined.raw.md (then iterative-refine takes over)

Example run:

Inputs: 3 files, 26m 48s total
Combined: audio/processed/session.combined.opus (mono 16kHz, 4.1 MB)
Segment map: audio/processed/session.segments.json
Boundaries: 1.0s silence between each source
Next: send to AssemblyAI? (y/n)
Single file? Just preprocess-for-transcription + transcribe; no concat needed.