Skill

youtube-to-docs

Processes YouTube videos to extract transcripts, generate infographics, create TTS audio summaries and videos. Useful for video-to-document conversion and visualization.

Python

ai-ml

documentation

npx claudepluginhub doit-artificial-intelligence/youtube-to-docs --plugin youtube-to-docs

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/youtube-to-docs:youtube-to-docs

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill allows you to process YouTube videos to extract transcripts, generate AI summaries, create infographics, and even produce video summaries. You have access to the `youtube-to-docs:process_video` tool which handles these operations.

SKILL.md

154 lines · ~2.7k tokens

Similar Skills

youtube-summarizer

Extracts transcripts from YouTube videos via youtube-transcript-api Python library or web scraping and generates detailed STAR + R-I-S-E summaries for educational content documentation.

7 files

claude-superskills

youtube-digest

771

Extracts YouTube transcripts/metadata, generates Korean summaries/insights Markdown docs, tests comprehension with 9 tiered quizzes (easy/medium/hard), and enables deep research follow-ups.

4 files

youtube-digest

Infographic

Generates a summary and infographic for a YouTube video using the youtube-to-docs tool, with optional Gemini model selection.

youtube-to-docs

Stats

LanguagePython

Parent stars46

Parent forks7

MaintenanceGood

Last CommitMar 14, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

YouTube to Docs

Overview

This skill allows you to process YouTube videos to extract transcripts, generate AI summaries, create infographics, and even produce video summaries. You have access to the youtube-to-docs:process_video tool which handles these operations.

Requirements & Dependencies

The youtube-to-docs:process_video tool is a high-level interface that relies on several optional libraries ("extras") and system binaries to function. These are managed automatically when running via the provided MCP configuration or uv.

Python Libraries: Many features (audio extraction, video generation, cloud storage) require specific extras.
System Binaries: Features like video creation (combine_infographic_audio) require ffmpeg (handled by the static-ffmpeg library).
Automatic Setup: The MCP server (configured in .mcp.json) uses uv run --all-extras to ensure all necessary libraries are installed in a managed environment before execution.

Workflows

1. Transcript Extraction

Use this when the user simply wants the text transcript of a video, without additional AI processing.

Goal: Get the raw text from a YouTube video.
Tool: youtube-to-docs:process_video
Required Argument: url (The YouTube link)
Defaults: By default, process_video fetches the transcript from YouTube.
Example Prompt: "Get the transcript for https://www.youtube.com/watch?v=..."

2. Infographic Generation

Use this when the user wants a visual summary or "infographic" representing the video's content.

Goal: Create a visual summary (image).
Tool: youtube-to-docs:process_video
Required Argument: url (The YouTube link)
Optional Arguments:
- infographic_model: The image generation model to use.
- model: The text model for summarization (required context for the image).
Model Selection Strategy:
- Pro (High Quality): Use if "gemini pro" is requested.
  - model='gemini-3.1-pro-preview'
  - infographic_model='gemini-3-pro-image-preview'
- Flash (Default/Speed/Cost): Use if "gemini flash" is requested or no preference is stated.
  - model='gemini-3-flash-preview'
  - infographic_model='gemini-3.1-flash-image-preview'

Alt Text (Accessibility): By default, process_video generates multimodal alt text using the summary model (image-to-text) for any created infographic. Use alt_text_model to override the model for this step.
Confirmation: Proceed without asking for extra confirmation unless parameters are missing.

3. Kitchen Sink (Comprehensive Processing)

Use this when the user asks for "everything", a "kitchen sink" run, or a "video summary". This generates transcripts, text summaries, Q&A, audio summaries (TTS), infographics, and combines them into a video file.

Goal: Generate all possible artifacts, including a video file.
Tool: youtube-to-docs:process_video
Required Argument: url (The YouTube link)
Optional Arguments:
- all_suite: Shortcut to set models ('gemini-flash' or 'gemini-pro').
- combine_infographic_audio: Set to True to create the final video (Requires video extra).
- verbose: Set to True for detailed logging.
- translate: Translate all outputs to a target language. Format: {model}-{language} e.g. gemini-3-flash-preview-es, or aws-translate-{language} e.g. aws-translate-es to use AWS Translate directly, or gcp-translate-{language} e.g. gcp-translate-es to use Google Cloud Translation directly.
Model Selection Strategy:
- Pro (Default): all_suite='gemini-pro' (best for video quality).
- Flash: all_suite='gemini-flash' (faster).
Language Handling:
- "spanish" or "es" -> translate='gemini-3-flash-preview-es'
- "french" or "fr" -> translate='gemini-3-flash-preview-fr'
- Default -> omit translate (English only)

4. Custom / Advanced Usage

Use this when the user specifies particular models or output locations.

Output Locations:
- Local: Default.
- Google Drive: output_file='workspace' (Requires workspace extra).
- SharePoint: output_file='sharepoint' (Requires m365 extra).
- Memory: output_file='memory' (keeps artifacts in memory, no files on disk).
Transcription Source:
- Default is YouTube captions.
- To use AI for transcription (STT), set transcript_source to a model name (e.g., 'gemini-3-flash-preview' or 'gcp-chirp3').
- Note: gcp- models require PROJECT_ID and optional YTD_GCS_BUCKET_NAME environment variables.

Tool Reference: `youtube-to-docs:process_video`

Argument	Description	Required Extra	Examples
`url`	Required. YouTube URL, ID, Playlist ID, or Channel Handle.	-	`https://youtu.be/...`, `@channel`
`model`	LLM for summaries/Q&A.	`gcp` / `azure`	`gemini-3-flash-preview`
`infographic_model`	Model for generating the infographic image.	`gcp`	`gemini-3-pro-image-preview`
`alt_text_model`	Model for generating multimodal alt text for the infographic.	`gcp`	`gemini-3-flash-preview`
`tts_model`	Model for text-to-speech audio.	`gcp`	`gemini-2.5-flash-preview-tts-Kore`, `gcp-chirp3-Kore`
`all_suite`	Shortcut to apply a suite of models.	`gcp`, `audio`, `video`	`gemini-pro`, `gemini-flash`
`combine_infographic_audio`	Boolean. If True, creates an MP4 video.	`video`	`True`
`translate`	Translate all outputs to a target language. Format: `{model}-{language}`, `aws-translate-{language}`, or `gcp-translate-{language}`.	-	`gemini-3-flash-preview-es`, `aws-translate-es`, `gcp-translate-es`
`post_process`	Post-process the transcript with JSON operations. Results added as CSV columns.	-	`'{"word count": "apple"}'`, `'{"word count": ["apple", "banana"]}'`
`output_file`	Destination for the CSV report.	`workspace` / `m365`	`workspace`, `sharepoint`, `memory`
`transcript_source`	Source for transcript (default: 'youtube').	`audio`, `gcp` (for Chirp)	`gemini-3-flash-preview`, `gcp-chirp3`

Examples

User: "Get me a transcript of this video." Action: Call youtube-to-docs:process_video(url='...')

User: "Make an infographic for this video using Gemini Pro." Action: Call youtube-to-docs:process_video(url='...', model='gemini-3.1-pro-preview', infographic_model='gemini-3-pro-image-preview')

User: "Do a kitchen sink run on this video in Spanish." Action: Call youtube-to-docs:process_video(url='...', all_suite='gemini-pro', combine_infographic_audio=True, verbose=True, translate='gemini-3-flash-preview-es')

User: "Summarize this playlist and save it to Drive." Action: Call youtube-to-docs:process_video(url='PL...', model='gemini-3-flash-preview', output_file='workspace')

User: "Count how many times 'apple' appears in this video's transcript." Action: Call youtube-to-docs:process_video(url='...', post_process='{"word count": "apple"}')

Development & CLI Usage

While this skill primarily uses the youtube-to-docs:process_video tool, you can also run the underlying CLI manually for testing or development.

Note on CLI Syntax: The video URL/ID is a positional argument and is required. Do NOT use --url.

Always use uv to run the tool (do not use python directly) to ensure dependencies are correctly resolved:

# General Syntax:
uv run youtube-to-docs <video_url_or_id> [options]

# Example: Get transcript
uv run youtube-to-docs https://www.youtube.com/watch?v=B0x2I_doX9o

# Example: Kitchen sink with gemini-pro suite
uv run youtube-to-docs B0x2I_doX9o --all gemini-pro --verbose

# Example: Translate to Spanish
uv run youtube-to-docs B0x2I_doX9o -m gemini-3-flash-preview -tr gemini-3-flash-preview-es

# Example: Post-process transcript to count word occurrences
uv run youtube-to-docs B0x2I_doX9o -pp '{"word count": ["apple", "banana"]}'

See docs/usage.md for full documentation and docs/development.md for setup details.

MCP Configuration: The MCP server definition is located in .mcp.json. It is explicitly configured to use uv with --all-extras to ensure the correct environment and dependencies are used:

"command": "uv",
"args": [ ..., "run", "--all-extras", "python", "-m", "youtube_to_docs.mcp_server" ]

youtube-to-docs

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

youtube-to-docs

Popularity

Invocation

Context Preview

SKILL.md

YouTube to Docs

Overview

Requirements & Dependencies

Workflows

1. Transcript Extraction

2. Infographic Generation

3. Kitchen Sink (Comprehensive Processing)

4. Custom / Advanced Usage

Tool Reference: youtube-to-docs:process_video

Examples

Development & CLI Usage

Similar Skills

Help us improve

YouTube to Docs

Overview

Requirements & Dependencies

Workflows

1. Transcript Extraction

2. Infographic Generation

3. Kitchen Sink (Comprehensive Processing)

4. Custom / Advanced Usage

Tool Reference: youtube-to-docs:process_video

Examples

Development & CLI Usage

Tool Reference: `youtube-to-docs:process_video`

Tool Reference: `youtube-to-docs:process_video`