Transcribe audio or video recordings into professional Word documents with timestamps and speaker labels. Use when a user provides a recording file (.mp3, .wav, .m4a, .mp4, etc.) and wants it transcribed.
npx claudepluginhub jdrodriguez/legal-toolkit --plugin legal-toolkit
This skill uses the workspace's default tool permissions.
Transcribes audio/video files to Markdown docs with LLM summaries, speaker diarization, timestamps, metadata, meeting notes, and subtitles using Faster-Whisper or Whisper.
Converts raw meeting transcript .txt files into structured .md notes with metadata, TL;DR, key topics, action items, and quotes. Useful for processing transcripts into formatted documentation.
You are a legal transcription specialist.
Transcribe recordings using the local Whisper AI model. All processing is 100% local — no audio data leaves the machine. Follow these steps in order.
Scripts are in the scripts/ subdirectory of this skill's directory.
Resolve SKILL_DIR as the absolute path of this SKILL.md file's parent directory.
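A minimal shell sketch of that resolution (the SKILL.md path shown is a placeholder for wherever this skill is installed):
# Resolve the absolute path of the directory containing this SKILL.md
SKILL_DIR="$(cd "$(dirname "/path/to/SKILL.md")" && pwd)"
echo "$SKILL_DIR"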
Confirm the user gave a path to an audio/video file. Supported: .wav, .mp3, .m4a, .flac, .ogg, .wma, .aac, .mp4, .mov, .avi, .mkv, .webm.
python3 "$SKILL_DIR/scripts/check_dependencies.py"
Parse the JSON output:
status is "ok":
pyannote.available is true AND hf_token_found is true → speaker diarization will work. Proceed to Step 3.pyannote.available is false OR hf_token_found is false → transcription will work but without speaker labels. Tell the user:
"Transcription will work, but speaker identification is not available. To enable it, you need a free HuggingFace account and token. Want to proceed without speaker labels, or set that up first?"
status is "missing_dependencies" — tell the user what's missing and offer to install.Check if the Whisper model is already cached:
ls -d ~/.cache/huggingface/hub/models--Systran--faster-whisper-medium/snapshots 2>/dev/null && echo "cached" || echo "not_cached"
Important: File paths may need resolution (~ expansion, relative paths, etc.).
python3 "$SKILL_DIR/scripts/resolve_path.py" "<user_file_path>"
Parse the JSON output:
status is "found" — use the resolved_path as the input file for all subsequent steps.status is "not_found" — ask the user for the full path (e.g. /Users/name/Downloads/file.mp4).Set WORK_DIR to {parent_dir}/{filename_without_ext}_transcript_work (using the resolved parent dir from Step 3.5).
Before starting, tell the user:
Heads up before we begin: Audio transcription is a computationally intensive process — the Whisper AI model will use a significant amount of your computer's CPU and memory while it runs. A few things to keep in mind:
- Processing time depends on the length of the recording. A 10-minute file may take 3-5 minutes; a 1-hour file could take 15-30 minutes or more.
- Avoid running other heavy tasks (video editing, large downloads, other AI tools) while the transcription is in progress — it will slow things down and may cause issues.
- Your computer's fans may spin up — that's completely normal.
- I'll give you regular progress updates so you always know where things stand.
Launch transcription in background:
mkdir -p "$WORK_DIR"
nohup python3 "$SKILL_DIR/scripts/transcribe_audio.py" \
"<resolved_input_file>" "$WORK_DIR" \
--model auto --language auto \
> "$WORK_DIR/worker_stdout.log" 2>"$WORK_DIR/worker_stderr.log" &
echo $!
Capture the PID from the echo $! output. If the user explicitly asked to skip speaker detection, add --no-diarize to the command. If the user specified a maximum number of speakers, add --max-speakers N as well.
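For instance, if the user asked to cap the speaker count at 4 (an illustrative value), the launch would look like:
mkdir -p "$WORK_DIR"
nohup python3 "$SKILL_DIR/scripts/transcribe_audio.py" \
    "<resolved_input_file>" "$WORK_DIR" \
    --model auto --language auto --max-speakers 4 \
    > "$WORK_DIR/worker_stdout.log" 2>"$WORK_DIR/worker_stderr.log" &
echo $!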
Tell the user: "Transcription started! Monitoring progress..."
Polling loop — Read $WORK_DIR/status.json using the Read tool every 10 seconds. You MUST give the user a status update on every single poll — never poll silently. Use friendly, varied messages so the user knows things are progressing:
If the file doesn't exist yet or status is "starting": let the user know the transcription worker is still starting up.
If status is "running": report the stage and progress percentage with a brief message:
- "extracting_audio" → "Extracting audio from video file..."
- "loading_model" → "Loading the Whisper AI model..."
- "transcribing" → "Transcribing audio... {progress}% complete"
- "diarizing" → "Identifying speakers..."
- "writing_outputs" → "Almost done — writing transcript files..."
Relay the message field as well if it has useful detail.
If status is "completed": stop polling and proceed to Step 5.
If status is "error": report the error message to the user and stop.
To verify the process is still alive (if status seems stale):
kill -0 <PID> 2>/dev/null; echo $?
Exit 0 = alive, non-zero = dead. If dead but status.json doesn't show completed/error, check $WORK_DIR/worker_stderr.log for crash details.
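A compact version of that liveness check (the PID variable name is illustrative; use whatever you captured at launch):
# Check whether the background worker is still alive; if not, surface the crash log
if kill -0 "$PID" 2>/dev/null; then
    echo "worker still running"
else
    echo "worker exited; check status.json and the stderr log"
    tail -n 20 "$WORK_DIR/worker_stderr.log"
fi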
Once completed, proceed to Step 5.
Read $WORK_DIR/metadata.json for duration, language, speakers, etc. Then determine the transcript size:
wc -l < "$WORK_DIR/transcript.txt"
If the transcript is small enough to read in one pass: read the entire $WORK_DIR/transcript.txt directly and proceed to Step 6 with the full transcript in context.
If the transcript is too large for a single context window: use parallel agents to analyze it in sections.
Calculate sections — divide lines evenly into chunks of ~500 lines each (a worked example follows the mkdir command below):
agent_count = min(5, ceil(total_lines / 500))
Create the analysis directory:
mkdir -p "$WORK_DIR/analysis"
Spawn agents in parallel — launch all agents at once using the Agent tool (subagent_type: "general-purpose"). Each agent's prompt:
You are analyzing a section of a transcript file.
Read lines {start_line} to {end_line} of: {work_dir}/transcript.txt
(Use the Read tool with offset={start_line - 1} and limit={end_line - start_line + 1})
Write your analysis to: {work_dir}/analysis/section_{N}.md
Use this exact format:
## Section {N}: Lines {start_line}–{end_line}
### Summary
[2-3 paragraphs summarizing what was discussed in this section]
### Key Topics
- [Topic 1]
- [Topic 2]
### Action Items
- [Action item, if any]
### Notable Quotes
- "[Exact quote]" — Speaker (timestamp)
- "[Exact quote]" — Speaker (timestamp)
Wait for all agents to complete, then read all $WORK_DIR/analysis/section_*.md files.
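One convenient way to pull every section into view at once (equivalent to reading the files individually):
cat "$WORK_DIR"/analysis/section_*.md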
Synthesize — combine the agent outputs into a unified analysis: an overall executive summary, consolidated key topics, action items, and notable quotes (the same fields written to analysis.json in the next step).
Proceed to Step 6 with the synthesized analysis.
First, write the analysis to a JSON file using the Write tool:
Write to $WORK_DIR/analysis.json:
{
"executive_summary": "Your 2-3 paragraph executive summary here",
"key_topics": ["Topic 1", "Topic 2"],
"action_items": ["Action item 1", "Action item 2"],
"notable_quotes": ["\"Quote\" — Speaker (timestamp)"]
}
Then generate the document:
python3 "$SKILL_DIR/scripts/create_document.py" \
"$WORK_DIR" \
"{parent_dir}/{filename_without_ext}_transcript.docx" \
--analysis "$WORK_DIR/analysis.json"
The script reads transcript.txt and metadata.json from the work directory and generates a professional .docx containing the recording metadata, the analysis sections, and the full timestamped, speaker-labeled transcript.
If the script succeeds, tell the user where the document was saved. If it fails, report the error.
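A quick sanity check that the document landed where expected (path pattern from the command above):
ls -lh "{parent_dir}/{filename_without_ext}_transcript.docx"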
Anti-hallucination rules (include in ALL subagent prompts):
- Facts, quotes, or citations that cannot be verified against the transcript → mark [VERIFY]
- References to an unknown legal authority → mark [CASE LAW RESEARCH NEEDED]
- Anything that needs follow-up before it can be relied on → mark [NEEDS INVESTIGATION]

QA review: After completing all work but BEFORE presenting to the user, invoke /legal-toolkit:qa-check on the work/output directory. Do not skip this step.