From alltuu-downloader
Transcribes local audio/video files to Markdown and SRT subtitles using MLX/Whisper (Apple Silicon) or faster-whisper (others).
npx claudepluginhub chujianyun/skills --plugin photoplus-downloader
How this skill is triggered: by the user, by Claude, or both
Slash command
/alltuu-downloader:local-audio-transcriber
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
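Converts local recordings, audio files, or video files provided by the user into text. Core goal: when the user sends audio, return the transcript directly, and also save a Markdown transcript and SRT subtitles locally.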
Existing files such as .m4a, .mp3, .wav, .aac, .flac, .ogg, .opus, .mp4, .mov, and .mkv
python3 - <<'PY'
import platform, sys
print(sys.platform, platform.machine())
try:
    import mlx.core as mx
    print("mlx", mx.default_device())
except Exception as e:
    print(type(e).__name__ + ": " + str(e))
PY
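The environment check above feeds a simple decision: MLX on Apple Silicon when the GPU device is healthy, faster-whisper everywhere else. The following is a minimal sketch of that decision, not the skill's actual code. The function names `pick_engine` and `probe_mlx_device` are hypothetical, and the string form `Device(gpu, 0)` is assumed to match what `mx.default_device()` prints.

```python
import platform
import sys
from typing import Optional


def probe_mlx_device() -> Optional[str]:
    """Return the printed form of MLX's default device, or None if MLX is unusable."""
    try:
        import mlx.core as mx
        return str(mx.default_device())
    except Exception:
        return None


def pick_engine(sys_platform: str, machine: str, mlx_device: Optional[str]) -> str:
    """Choose "mlx" only on Apple Silicon with a working GPU device (assumed rule)."""
    on_apple_silicon = sys_platform == "darwin" and machine == "arm64"
    # Assumption: a healthy MLX/Metal setup reports "Device(gpu, 0)".
    if on_apple_silicon and mlx_device == "Device(gpu, 0)":
        return "mlx"
    # Everything else (Intel Mac, Linux, Windows, broken Metal) falls back.
    return "faster-whisper"


if __name__ == "__main__":
    print(pick_engine(sys.platform, platform.machine(), probe_mlx_device()))
```

Keeping the decision in a pure function separate from the probe makes the fallback rule easy to check without an Apple Silicon machine at hand.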
python3.13 -m venv /tmp/local-audio-transcriber-mlx
/tmp/local-audio-transcriber-mlx/bin/python -m pip install -U pip mlx-whisper
python3 -m pip install -U faster-whisper
--language zh; omit the language argument when the language is uncertain:
/tmp/local-audio-transcriber-mlx/bin/python {skill_dir}/scripts/transcribe.py "input.m4a" --language zh --print-text
.md and .srt files; do not generate txt, json, or vtt.
# Long Chinese recordings on Apple Silicon: prefer MLX + Apple GPU + turbo-q4
/tmp/local-audio-transcriber-mlx/bin/python {skill_dir}/scripts/transcribe.py "recording.m4a" --language zh --print-text
# Specify MLX and the model explicitly
/tmp/local-audio-transcriber-mlx/bin/python {skill_dir}/scripts/transcribe.py "recording.m4a" --engine mlx --model mlx-community/whisper-large-v3-turbo-q4 --language zh --print-text
# Non-Apple-Silicon / CUDA / CPU: use faster-whisper
python3 {skill_dir}/scripts/transcribe.py "recording.m4a" --engine faster-whisper --model small --language zh --print-text
# Auto-detect the language and generate Markdown and SRT
python3 {skill_dir}/scripts/transcribe.py "video.mp4" --print-text
# Batch-transcribe multiple files into a given directory
python3 {skill_dir}/scripts/transcribe.py *.m4a --language zh --output-dir ./transcripts
# faster-whisper with a larger model: higher quality but slower
python3 {skill_dir}/scripts/transcribe.py "meeting.m4a" --engine faster-whisper --model medium --language zh --print-text
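Each run of the commands above produces an SRT file alongside the Markdown transcript. As an illustration of what that SRT output involves, here is a minimal sketch that turns timed segments into SRT text. The internals of `transcribe.py` are not shown in this document, so the helper names `srt_timestamp` and `segments_to_srt` and the `(start, end, text)` tuple layout are assumptions made for the example.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: running index, time range, text, then a blank separator line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "hello"), (1.5, 3.0, "world")]))
```

SRT uses a comma before the milliseconds, unlike WebVTT, which uses a period.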
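mlx, default model: mlx-community/whisper-large-v3-turbo-q4
faster-whisper, default model: small
whisper-large-v3-turbo-q4
whisper-large-v3-turbo-q4 is a quantized model, well suited to unified-memory machines such as an M1 with 16GB; the first run downloads the model, and later runs use the local cache
--device mps can be extremely slow on an M1; for the local Apple GPU route, prefer MLX over PyTorch MPS
condition_on_previous_text, to keep Whisper from falling into repetitive hallucination loops on spoken Chinese recordings
--condition-on-previous-text
--model medium
int8, which uses less memory
float16
--no-vad-filter
If small produces obvious wrong words, upgrade to whisper-large-v3-turbo-q4 first rather than relying on heavy post-hoc proofreading alone
--condition-on-previous-text
ModuleNotFoundError: faster_whisper: run python3 -m pip install -U faster-whisper
ModuleNotFoundError: mlx_whisper: in the virtual environment, run /tmp/local-audio-transcriber-mlx/bin/python -m pip install -U mlx-whisper
externally-managed-environment: do not add --break-system-packages; use python3.13 -m venv /tmp/local-audio-transcriber-mlx instead
mlx default device is not Device(gpu, 0): MLX/Metal is not working; fall back to faster-whisper or check the system environment
ffmpeg, or ask the user to switch to .m4a/.mp3/.wav
--model small or --model base
--device mps; prefer this script's --engine mlx
--language zh
--model medium, or disable VAD: --no-vad-filter
--condition-on-previous-text; if necessary, switch to whisper-large-v3-turbo-q4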
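The notes above mention `condition_on_previous_text`, VAD filtering, and `int8` compute for faster-whisper. The sketch below shows how those settings might map onto the faster-whisper Python API. It is not the skill's `transcribe.py`: the helper names are hypothetical, and the defaults (conditioning off, VAD on, `int8` on CPU) are assumptions inferred from the opt-in `--condition-on-previous-text` and opt-out `--no-vad-filter` flags.

```python
from typing import Optional


def build_transcribe_options(language: Optional[str] = None,
                             condition_on_previous_text: bool = False,
                             vad_filter: bool = True) -> dict:
    """Collect keyword arguments for WhisperModel.transcribe (assumed defaults)."""
    options = {
        # Off by default here, since conditioning can feed repetition loops.
        "condition_on_previous_text": condition_on_previous_text,
        "vad_filter": vad_filter,
    }
    if language:
        # Leave the key out entirely so the library auto-detects the language.
        options["language"] = language
    return options


def transcribe(path: str, model_name: str = "small", **overrides):
    """Run faster-whisper on CPU with int8 and return (segments, language)."""
    from faster_whisper import WhisperModel  # imported lazily; needs faster-whisper

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, **build_transcribe_options(**overrides))
    # segments is a generator; materialize it into plain tuples.
    return [(s.start, s.end, s.text) for s in segments], info.language
```

For example, `transcribe("meeting.m4a", model_name="medium", language="zh")` would mirror the `--model medium --language zh` command, under the same assumptions.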