
lai-transcribe

Transcribe audio/video to timestamped captions with Gemini (100+ languages) or local Parakeet / SenseVoice models. Trigger on "transcribe", "speech to text", "转录" ("transcribe"), "语音转文字" ("speech to text"), "generate captions from audio", or when the user provides an audio/video file with no text. If a YouTube video already has captions, prefer `/lai-youtube`.

npx claudepluginhub lattifai/lattifai-skills --plugin lattifai-skills


Invocation


Slash command: /lattifai-skills:lai-transcribe

User invocable

Model invocable


Tool Access

This skill is limited to the following tools:

Read, Bash(lai:*)


Similar Skills

omnicaptions-transcribe (omnicaptions · 3 tools)
Transcribes audio/video from YouTube URLs or local files to structured markdown with timestamps, speaker labels, and chapters using Google Gemini API.

lai-align (lattifai-skills · 2 tools)
Align existing captions to audio/video with word-level precision using the Lattice-1 model. Trigger when the user has both a media file AND a caption/transcript that need to be synchronized, or says "fix caption timing", "字幕对不上" ("captions don't line up"), "对齐字幕" ("align captions"), "word-level timestamps", "karaoke timing", "timestamps are off". Do NOT trigger without existing text; use `/lai-transcribe` first.

transcribe-video (startup · 5 tools)
Generates SRT/VTT subtitles and plain text transcripts from video or audio files using AWS Transcribe and ffmpeg. Useful for captions, extracting speech, notes, or searchable content.

Stats

Language: Python
Stars: 24
Forks: 1
Maintenance: Excellent
Last commit: May 15, 2026


LattifAI Transcription

Generates timestamped text from audio/video. Default is Gemini (fast, broad language coverage); local models run offline on GPU.

Prerequisites

The Gemini models need an API key (free at https://aistudio.google.com/apikey). Store it once with:

lai config set GEMINI_API_KEY <your-key>

Basic Command

Pick a <base> (the media file's stem, or the YouTube video ID) and reuse it for every later step of the pipeline. Outputs are written to the current directory:

# <base> = podcast (from podcast.mp3)
lai transcribe run podcast.mp3 podcast.transcript.json
# shortcut:
lai-transcribe podcast.mp3 podcast.transcript.json
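Deriving <base> by hand is easy to get wrong when the media file sits in another directory or has extra dots in its name. The sketch below assumes a POSIX shell; `media_base` is a hypothetical helper, not part of the lai CLI. It drops the directory and only the last extension, so the output path stays relative and lands in the current directory.

```shell
# Hypothetical helper (not part of lai): print the <base> for a local media file.
# Strips any leading directory, then only the last extension.
media_base() {
  name=${1##*/}                # /data/podcast.mp3 -> podcast.mp3
  printf '%s\n' "${name%.*}"   # podcast.mp3 -> podcast
}
```

For example, `lai transcribe run /data/podcast.mp3 "$(media_base /data/podcast.mp3).transcript.json"` names the output podcast.transcript.json in the current directory.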

Gemini accepts YouTube URLs directly — no download needed:

# <base> = la0CaZ2R8EY (the YouTube video ID)
lai transcribe run "https://youtu.be/la0CaZ2R8EY" la0CaZ2R8EY.transcript.json
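When the input is a URL, <base> is the video ID. A minimal sketch for extracting it, assuming a POSIX shell; `youtube_id` is a hypothetical helper, and it only parses the string. This section shows only the youtu.be form being passed to Gemini, so the watch?v= branch is for naming, not a claim about which URL forms lai accepts.

```shell
# Hypothetical helper (not part of lai): print the video ID from a YouTube URL,
# for use as <base>. Handles youtu.be/ID and youtube.com/watch?v=ID and drops
# trailing query parameters such as ?si=... or &t=....
youtube_id() {
  case "$1" in
    *youtu.be/*) id=${1#*youtu.be/} ;;
    *watch\?v=*) id=${1#*watch\?v=} ;;
    *) return 1 ;;                 # not a recognized YouTube URL
  esac
  printf '%s\n' "${id%%[?&]*}"     # cut at the first ? or &
}
```

For example, `youtube_id "https://youtu.be/la0CaZ2R8EY"` prints la0CaZ2R8EY, matching the output name used above.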

Output naming: prefer <base>.transcript.json so it pipes cleanly into /lai-align (which writes <base>.aligned.json). Use <base>.srt (or another caption extension) when the transcript itself is the final deliverable and no alignment step follows.

Models

Model	Languages	Requires
`gemini-3-flash-preview` (default)	100+	Gemini API key
`gemini-3.1-pro-preview`	100+, highest quality	Gemini API key
`nvidia/parakeet-tdt-0.6b-v3`	24, offline	GPU + `nemo_toolkit`
`FunAudioLLM/SenseVoiceSmall`	zh / en / ja / ko / Cantonese, offline	GPU

To switch models, override transcription.model_name:

lai transcribe run audio.mp4 output.srt transcription.model_name=gemini-3.1-pro-preview
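When comparing models on the same input, a small wrapper keeps the override syntax in one place. This is a sketch assuming a POSIX shell; `run_transcribe` and the DRY_RUN variable are hypothetical and belong to the sketch, not to lai. It uses only the `transcription.model_name=` override shown above.

```shell
# Hypothetical wrapper (not part of lai): run `lai transcribe run INPUT OUTPUT`,
# adding transcription.model_name=MODEL when a third argument is given.
# With DRY_RUN=1 it prints the command instead of running it.
run_transcribe() {
  input=$1
  output=$2
  model=${3:-}
  set -- lai transcribe run "$input" "$output"
  if [ -n "$model" ]; then
    set -- "$@" "transcription.model_name=$model"
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # arguments joined by spaces, for review only
  else
    "$@"
  fi
}
```

`DRY_RUN=1 run_transcribe audio.mp4 output.srt gemini-3.1-pro-preview` prints the same command as above; drop DRY_RUN=1 to run it. Building the argument list with `set --` keeps paths containing spaces intact when the command actually runs.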

Common Options

transcription.language=zh: force the language (otherwise it is auto-detected)
media.streaming_chunk_secs=300: process long audio in chunks of this many seconds
The output format is inferred from the output file's extension: .srt / .vtt / .ass / .json / .txt. Use .json when you plan to follow up with /lai-align.
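Because the format follows the extension, a quick check before a long run can catch a mistyped name such as .str. A minimal sketch assuming a POSIX shell; `supported_output` is a hypothetical helper whose extension list is copied from the line above. Matching is case-sensitive, so .SRT is rejected.

```shell
# Hypothetical helper (not part of lai): succeed only when the output path ends
# in one of the extensions listed in this section. Matching is case-sensitive.
supported_output() {
  case "$1" in
    *.srt|*.vtt|*.ass|*.json|*.txt) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `supported_output "$out" && lai transcribe run podcast.mp3 "$out" transcription.language=zh`.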

Common Issues

Problem	Fix
`GEMINI_API_KEY not set`	`lai config set GEMINI_API_KEY <your-key>`
Upload timeout / file >2 GB	Split the audio or switch to a local model
Wrong language detected	Force with `transcription.language=en`
Timestamps are coarse	Follow up with `/lai-align`
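The >2 GB case can be caught before uploading. A sketch assuming a POSIX shell with `wc` and `tr`; `under_upload_limit` is a hypothetical helper, and reading "2 GB" as 2 GiB (2,147,483,648 bytes) is an assumption, since the table does not say which unit applies.

```shell
# Hypothetical helper (not part of lai): succeed when FILE is smaller than the
# 2 GB limit from Common Issues, taken here as 2 GiB (an assumption).
# Returns 1 if the file is too large, 2 if it is not a readable regular file.
under_upload_limit() {
  [ -f "$1" ] && [ -r "$1" ] || return 2
  size=$(wc -c < "$1" | tr -d '[:space:]')   # byte count; tr strips BSD padding
  [ "$size" -lt $((2 * 1024 * 1024 * 1024)) ] || return 1
}
```

If it returns 1, split the audio or switch to a local model, as the table suggests.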

Related Skills

/lai-align — sharpen timestamps after transcription
/lai-diarize — add speaker labels
/lai-translate — translate the transcript
/lai-youtube — YouTube end-to-end (download + caption + align)
/lai-caption — convert output format
