Transcribe audio or video recordings into professional Word documents with timestamps and speaker labels. Use when a user provides a recording file (.mp3, .wav, .m4a, .mp4, etc.) and wants it transcribed.
This skill uses the workspace's default tool permissions.
Legal Transcriber
Transcribe recordings using the local Whisper AI model. All processing is 100% local — no audio data leaves the machine. Follow these steps in order.
Skill Directory
Scripts are in the scripts/ subdirectory of this skill's directory.
Resolve SKILL_DIR as the absolute path of this SKILL.md file's parent directory.
Step 1: Validate
Confirm the user gave a path to an audio/video file. Supported: .wav, .mp3, .m4a, .flac, .ogg, .wma, .aac, .mp4, .mov, .avi, .mkv, .webm.
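As a sketch, the extension check could look like this (the extension list matches the one above; the helper name is illustrative):

```python
from pathlib import Path

# Supported extensions, as listed in Step 1.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".flac", ".ogg", ".wma", ".aac",
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}

def is_supported(path: str) -> bool:
    """Return True if the file extension is one the skill can transcribe."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported("deposition.MP4"))  # extension check is case-insensitive
print(is_supported("notes.txt"))
```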
Step 2: Check Dependencies
python3 "$SKILL_DIR/scripts/check_dependencies.py"
Parse the JSON output:
- If `status` is `"ok"`:
  - If `pyannote.available` is true AND `hf_token_found` is true → speaker diarization will work. Proceed to Step 3.
  - If `pyannote.available` is false OR `hf_token_found` is false → transcription will work but without speaker labels. Tell the user: "Transcription will work, but speaker identification is not available. To enable it, you need a free HuggingFace account and token. Want to proceed without speaker labels, or set that up first?"
    - If the user wants to proceed without speakers, continue normally. The script handles this gracefully.
- If `status` is `"missing_dependencies"` — tell the user what's missing and offer to install.
- If the script fails — report the error.
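A minimal sketch of this branching, assuming a payload where `pyannote.available` is a nested field (the real script's exact JSON layout may differ):

```python
import json

# Hypothetical example payload; field names follow the description above.
payload = json.loads(
    '{"status": "ok", "pyannote": {"available": false}, "hf_token_found": false}'
)

if payload["status"] == "ok":
    diarization_ready = payload["pyannote"]["available"] and payload["hf_token_found"]
    if not diarization_ready:
        print("Transcription will work, but speaker identification is not available.")
elif payload["status"] == "missing_dependencies":
    print("Missing dependencies:", payload.get("missing", []))
```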
Step 3: Prepare Model
Check if the Whisper model is already cached:
ls -d ~/.cache/huggingface/hub/models--Systran--faster-whisper-medium/snapshots 2>/dev/null && echo "cached" || echo "not_cached"
- If cached: model is ready, proceed.
- If not cached: tell the user the model will download during transcription (first run only, ~1.5 GB).
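The same cache check, sketched in Python rather than shell:

```python
from pathlib import Path

# Same location the shell one-liner above inspects.
snapshots = (
    Path.home()
    / ".cache/huggingface/hub/models--Systran--faster-whisper-medium/snapshots"
)
result = "cached" if snapshots.is_dir() else "not_cached"
print(result)
```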
Step 3.5: Resolve File Path
Important: File paths may need resolution (~ expansion, relative paths, etc.).
python3 "$SKILL_DIR/scripts/resolve_path.py" "<user_file_path>"
Parse the JSON output:
- If `status` is `"found"` — use the `resolved_path` as the input file for all subsequent steps.
- If `status` is `"not_found"` — ask the user for the full path (e.g. /Users/name/Downloads/file.mp4).
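The resolution logic can be approximated like this (a sketch under assumptions, not the actual resolve_path.py implementation):

```python
import json
import os

def resolve(user_path: str) -> dict:
    """Expand ~ and relative segments, then report whether the file exists."""
    resolved = os.path.realpath(os.path.expanduser(user_path))
    status = "found" if os.path.exists(resolved) else "not_found"
    return {"status": status, "resolved_path": resolved}

print(json.dumps(resolve("~/Downloads/recording.mp4")))
```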
Step 4: Transcribe (Background + Polling)
Set WORK_DIR to {parent_dir}/{filename_without_ext}_transcript_work (using the resolved parent dir from Step 3.5).
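The WORK_DIR naming convention can be derived with pathlib (the input path here is illustrative):

```python
from pathlib import Path

resolved_input = Path("/Users/name/Downloads/hearing.mp4")  # hypothetical resolved path
work_dir = resolved_input.parent / f"{resolved_input.stem}_transcript_work"
print(work_dir)  # /Users/name/Downloads/hearing_transcript_work
```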
- Before starting, tell the user:
Heads up before we begin: Audio transcription is a computationally intensive process — the Whisper AI model will use a significant amount of your computer's CPU and memory while it runs. A few things to keep in mind:
- Processing time depends on the length of the recording. A 10-minute file may take 3-5 minutes; a 1-hour file could take 15-30 minutes or more.
- Avoid running other heavy tasks (video editing, large downloads, other AI tools) while the transcription is in progress — it will slow things down and may cause issues.
- Your computer's fans may spin up — that's completely normal.
- I'll give you regular progress updates so you always know where things stand.
- Launch transcription in background:
mkdir -p "$WORK_DIR"
nohup python3 "$SKILL_DIR/scripts/transcribe_audio.py" \
"<resolved_input_file>" "$WORK_DIR" \
--model auto --language auto \
> "$WORK_DIR/worker_stdout.log" 2>"$WORK_DIR/worker_stderr.log" &
echo $!
Capture the PID from the `echo $!` output. If the user explicitly asked to skip speaker detection, add `--no-diarize`. If the user specified a maximum speaker count, add `--max-speakers N`.
- Tell the user: "Transcription started! Monitoring progress..."
- Polling loop — read `$WORK_DIR/status.json` using the Read tool every 10 seconds. You MUST give the user a status update on every single poll — never poll silently. Use friendly, varied messages so the user knows things are progressing:
  - If the file doesn't exist yet or `status` is `"starting"`:
    - "Starting up the transcription engine..."
    - If this persists for >3 polls, say: "The engine is still initializing — this can take a moment on first run."
  - If `status` is `"running"`:
    - Always report the `stage` and `progress` percentage with a brief message.
    - Map stages to user-friendly descriptions:
      - `"extracting_audio"` → "Extracting audio from video file..."
      - `"loading_model"` → "Loading the Whisper AI model..."
      - `"transcribing"` → "Transcribing audio... {progress}% complete"
      - `"diarizing"` → "Identifying speakers..."
      - `"writing_outputs"` → "Almost done — writing transcript files..."
    - Include the `message` field if it has useful detail.
    - For long transcriptions (>3 polls at the same stage), add reassurance: "Still working — this is normal for longer recordings."
  - If `status` is `"completed"`:
    - Tell the user: "Transcription complete!" and proceed to Step 5.
  - If `status` is `"error"`:
    - Report the `error` message to the user and stop.
- To verify the process is still alive (if status seems stale):
  kill -0 <PID> 2>/dev/null; echo $?
  Exit 0 = alive, non-zero = dead. If dead but status.json doesn't show completed/error, check `$WORK_DIR/worker_stderr.log` for crash details.
- Once completed, proceed to Step 5.
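One poll iteration of the loop above might be sketched like this (the status-file fields follow the descriptions in this step; the helper function is illustrative):

```python
import json
from pathlib import Path

# Stage-to-message mapping from the polling instructions above.
STAGE_MESSAGES = {
    "extracting_audio": "Extracting audio from video file...",
    "loading_model": "Loading the Whisper AI model...",
    "transcribing": "Transcribing audio... {progress}% complete",
    "diarizing": "Identifying speakers...",
    "writing_outputs": "Almost done — writing transcript files...",
}

def poll_message(work_dir: str) -> str:
    """Translate status.json into the user-facing update for one poll."""
    status_file = Path(work_dir) / "status.json"
    if not status_file.exists():
        return "Starting up the transcription engine..."
    status = json.loads(status_file.read_text())
    if status["status"] == "starting":
        return "Starting up the transcription engine..."
    if status["status"] == "running":
        template = STAGE_MESSAGES.get(status.get("stage"), "Working...")
        return template.format(progress=status.get("progress", 0))
    if status["status"] == "completed":
        return "Transcription complete!"
    return f"Error: {status.get('error', 'unknown')}"
```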
Step 5: Analyze Transcript
Read $WORK_DIR/metadata.json for duration, language, speakers, etc. Then determine the transcript size:
wc -l < "$WORK_DIR/transcript.txt"
Small transcript (500 lines or fewer)
Read the entire $WORK_DIR/transcript.txt directly. Proceed to Step 6 with the full transcript in context.
Large transcript (more than 500 lines)
The transcript is too large for a single context window. Use parallel agents to analyze it in sections.
- Calculate sections — divide lines evenly into chunks of ~500 lines each:
  agent_count = min(5, ceil(total_lines / 500))
  - Each agent gets a contiguous line range (e.g., Agent 1: lines 1–500, Agent 2: lines 501–1000, etc.). With more than 2,500 lines the agent count is capped at 5, so each section grows proportionally.
- Create analysis directory:
  mkdir -p "$WORK_DIR/analysis"
Spawn agents in parallel — launch all agents at once using the Agent tool (
subagent_type: "general-purpose"). Each agent's prompt:You are analyzing a section of a transcript file. Read lines {start_line} to {end_line} of: {work_dir}/transcript.txt (Use the Read tool with offset={start_line - 1} and limit={end_line - start_line + 1}) Write your analysis to: {work_dir}/analysis/section_{N}.md Use this exact format: ## Section {N}: Lines {start_line}–{end_line} ### Summary [2-3 paragraphs summarizing what was discussed in this section] ### Key Topics - [Topic 1] - [Topic 2] ### Action Items - [Action item, if any] ### Notable Quotes - "[Exact quote]" — Speaker (timestamp) - "[Exact quote]" — Speaker (timestamp) -
- Wait for all agents to complete, then read all `$WORK_DIR/analysis/section_*.md` files.
- Synthesize — combine the agent outputs into a unified analysis:
- Merge all section summaries into a cohesive Executive Summary (2-3 paragraphs)
- Consolidate all Key Topics (deduplicate)
- Collect all Action Items
- Select the best 5-10 Notable Quotes across all sections
Proceed to Step 6 with the synthesized analysis.
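The section math above can be sketched as follows (note that with the 5-agent cap, lines are divided evenly rather than fixed at exactly 500 per section):

```python
import math

def plan_sections(total_lines: int, target: int = 500, max_agents: int = 5):
    """Return (start_line, end_line) ranges, 1-indexed and inclusive."""
    agent_count = min(max_agents, math.ceil(total_lines / target))
    size = math.ceil(total_lines / agent_count)
    return [
        (i * size + 1, min((i + 1) * size, total_lines))
        for i in range(agent_count)
    ]

print(plan_sections(1200))  # [(1, 400), (401, 800), (801, 1200)]
```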
Step 6: Create Document
First, write the analysis to a JSON file using the Write tool:
Write to $WORK_DIR/analysis.json:
{
"executive_summary": "Your 2-3 paragraph executive summary here",
"key_topics": ["Topic 1", "Topic 2"],
"action_items": ["Action item 1", "Action item 2"],
"notable_quotes": ["\"Quote\" — Speaker (timestamp)"]
}
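If written programmatically rather than with the Write tool, the same file is a few lines with the json module (the analysis values and the work directory here are placeholders):

```python
import json
from pathlib import Path

work_dir = Path("/tmp/hearing_transcript_work")  # hypothetical WORK_DIR
analysis = {
    "executive_summary": "Two-to-three paragraph summary goes here.",
    "key_topics": ["Topic 1", "Topic 2"],
    "action_items": ["Action item 1"],
    "notable_quotes": ['"Quote" — Speaker (timestamp)'],
}
work_dir.mkdir(parents=True, exist_ok=True)
(work_dir / "analysis.json").write_text(
    json.dumps(analysis, indent=2, ensure_ascii=False)
)
```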
Then generate the document:
python3 "$SKILL_DIR/scripts/create_document.py" \
"$WORK_DIR" \
"{parent_dir}/{filename_without_ext}_transcript.docx" \
--analysis "$WORK_DIR/analysis.json"
The script reads transcript.txt and metadata.json from the work directory and generates a professional .docx with:
- Title page with filename
- Metadata table (duration, language, model, speakers, word count, date)
- Executive Summary
- Key Topics
- Action Items (if any)
- Speaker Statistics (if diarization data available)
- Full Transcript with timestamps and speaker labels
- Notable Quotes
If the script succeeds, tell the user where the document was saved. If it fails, report the error.