Processes YouTube videos to extract transcripts, generate infographics, create TTS audio summaries and videos. Useful for video-to-document conversion and visualization.
From youtube-to-docs

npx claudepluginhub doit-artificial-intelligence/youtube-to-docs --plugin youtube-to-docs

This skill uses the workspace's default tool permissions.
This skill allows you to process YouTube videos to extract transcripts, generate AI summaries, create infographics, and even produce video summaries. You have access to the youtube-to-docs:process_video tool which handles these operations.
The youtube-to-docs:process_video tool is a high-level interface that relies on several optional libraries ("extras") and system binaries to function. These are managed automatically when running via the provided MCP configuration or uv.
- Video operations (combine_infographic_audio) require ffmpeg (handled by the static-ffmpeg library).
- The MCP configuration (.mcp.json) uses uv run --all-extras to ensure all necessary libraries are installed in a managed environment before execution.

Use this when the user simply wants the text transcript of a video, without additional AI processing.

- Tool: youtube-to-docs:process_video
- Parameters: url (the YouTube link)
- process_video fetches the transcript from YouTube.

Use this when the user wants a visual summary or "infographic" representing the video's content.

- Tool: youtube-to-docs:process_video
- Parameters:
  - url (the YouTube link)
  - infographic_model: The image generation model to use.
  - model: The text model for summarization (required context for the image).
- Model pairings:
  - model='gemini-3.1-pro-preview' with infographic_model='gemini-3-pro-image-preview'
  - model='gemini-3-flash-preview' with infographic_model='gemini-3.1-flash-image-preview'
- process_video generates multimodal alt text using the summary model (image-to-text) for any created infographic. Use alt_text_model to override the model for this step.

Use this when the user asks for "everything", a "kitchen sink" run, or a "video summary". This generates transcripts, text summaries, Q&A, audio summaries (TTS), infographics, and combines them into a video file.

- Tool: youtube-to-docs:process_video
- Parameters:
  - url (the YouTube link)
  - all_suite: Shortcut to set models ('gemini-flash' or 'gemini-pro').
  - combine_infographic_audio: Set to True to create the final video (requires video extra).
  - verbose: Set to True for detailed logging.
  - translate: Translate all outputs to a target language. Format: {model}-{language}, e.g. gemini-3-flash-preview-es; aws-translate-{language}, e.g. aws-translate-es, to use AWS Translate directly; or gcp-translate-{language}, e.g. gcp-translate-es, to use Google Cloud Translation directly.
- Suite options:
  - all_suite='gemini-pro' (best for video quality).
  - all_suite='gemini-flash' (faster).
- Language options:
  - translate='gemini-3-flash-preview-es'
  - translate='gemini-3-flash-preview-fr'
  - Omit translate (English only).

Use this when the user specifies particular models or output locations.

- output_file='workspace' (requires workspace extra).
- output_file='sharepoint' (requires m365 extra).
- output_file='memory' (keeps artifacts in memory, no files on disk).
- Set transcript_source to a model name (e.g., 'gemini-3-flash-preview' or 'gcp-chirp3').
- gcp- models require PROJECT_ID and optional YTD_GCS_BUCKET_NAME environment variables.

Arguments for youtube-to-docs:process_video:

| Argument | Description | Required Extra | Examples |
|---|---|---|---|
| url | Required. YouTube URL, ID, Playlist ID, or Channel Handle. | - | https://youtu.be/..., @channel |
| model | LLM for summaries/Q&A. | gcp / azure | gemini-3-flash-preview |
| infographic_model | Model for generating the infographic image. | gcp | gemini-3-pro-image-preview |
| alt_text_model | Model for generating multimodal alt text for the infographic. | gcp | gemini-3-flash-preview |
| tts_model | Model for text-to-speech audio. | gcp | gemini-2.5-flash-preview-tts-Kore, gcp-chirp3-Kore |
| all_suite | Shortcut to apply a suite of models. | gcp, audio, video | gemini-pro, gemini-flash |
| combine_infographic_audio | Boolean. If True, creates an MP4 video. | video | True |
| translate | Translate all outputs to a target language. Format: {model}-{language}, aws-translate-{language}, or gcp-translate-{language}. | - | gemini-3-flash-preview-es, aws-translate-es, gcp-translate-es |
| post_process | Post-process the transcript with JSON operations. Results added as CSV columns. | - | '{"word count": "apple"}', '{"word count": ["apple", "banana"]}' |
| output_file | Destination for the CSV report. | workspace / m365 | workspace, sharepoint, memory |
| transcript_source | Source for transcript (default: 'youtube'). | audio, gcp (for Chirp) | gemini-3-flash-preview, gcp-chirp3 |
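The translate format in the table ({model}-{language}, aws-translate-{language}, gcp-translate-{language}) is easy to mistype by hand. The following is a minimal sketch of assembling that value and the keyword arguments for a kitchen sink run. Both function names are hypothetical helpers written for illustration and are not part of youtube-to-docs; only the argument names and example values come from the table.

```python
from typing import Optional


def build_translate_value(language: str, engine: str = "gemini-3-flash-preview") -> str:
    """Return a translate value such as 'gemini-3-flash-preview-es'.

    `engine` is either a model name or one of the direct services
    'aws-translate' / 'gcp-translate' listed in the argument table.
    Hypothetical helper: it only formats the string.
    """
    language = language.strip().lower()
    if not language:
        raise ValueError("language code must not be empty")
    return f"{engine}-{language}"


def kitchen_sink_args(url: str, language: Optional[str] = None) -> dict:
    """Assemble keyword arguments for a 'kitchen sink' run (hypothetical helper)."""
    args = {
        "url": url,
        "all_suite": "gemini-pro",
        "combine_infographic_audio": True,
        "verbose": True,
    }
    # Leaving translate out keeps the outputs in English only.
    if language is not None:
        args["translate"] = build_translate_value(language)
    return args
```

The resulting dictionary mirrors the arguments one would pass to youtube-to-docs:process_video; the sketch does not call the tool itself.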
User: "Get me a transcript of this video."
Action: Call youtube-to-docs:process_video(url='...')
User: "Make an infographic for this video using Gemini Pro."
Action: Call youtube-to-docs:process_video(url='...', model='gemini-3.1-pro-preview', infographic_model='gemini-3-pro-image-preview')
User: "Do a kitchen sink run on this video in Spanish."
Action: Call youtube-to-docs:process_video(url='...', all_suite='gemini-pro', combine_infographic_audio=True, verbose=True, translate='gemini-3-flash-preview-es')
User: "Summarize this playlist and save it to Drive."
Action: Call youtube-to-docs:process_video(url='PL...', model='gemini-3-flash-preview', output_file='workspace')
User: "Count how many times 'apple' appears in this video's transcript."
Action: Call youtube-to-docs:process_video(url='...', post_process='{"word count": "apple"}')
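Because post_process takes a JSON string, quoting mistakes are a likely source of errors. A small sketch of building that string with Python's standard json module follows. The helper name build_post_process is hypothetical; the "word count" operation and its single-word and list forms are taken from the examples above.

```python
import json


def build_post_process(words):
    """Serialize a 'word count' operation for the post_process argument.

    Accepts a single word or an iterable of words, matching the two
    forms shown in the argument table. Hypothetical helper.
    """
    target = words if isinstance(words, str) else list(words)
    return json.dumps({"word count": target})


single = build_post_process("apple")
several = build_post_process(["apple", "banana"])
```

Here single is the string {"word count": "apple"} and several is {"word count": ["apple", "banana"]}, which are the same payloads used in the examples.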
While this skill primarily uses the youtube-to-docs:process_video tool, you can also run the underlying CLI manually for testing or development.
Note on CLI Syntax: The video URL/ID is a positional argument and is required. Do NOT use --url.
Always use uv to run the tool (do not use python directly) to ensure dependencies are correctly resolved:
# General Syntax:
uv run youtube-to-docs <video_url_or_id> [options]
# Example: Get transcript
uv run youtube-to-docs https://www.youtube.com/watch?v=B0x2I_doX9o
# Example: Kitchen sink with gemini-pro suite
uv run youtube-to-docs B0x2I_doX9o --all gemini-pro --verbose
# Example: Translate to Spanish
uv run youtube-to-docs B0x2I_doX9o -m gemini-3-flash-preview -tr gemini-3-flash-preview-es
# Example: Post-process transcript to count word occurrences
uv run youtube-to-docs B0x2I_doX9o -pp '{"word count": ["apple", "banana"]}'
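When scripting these invocations, building the command as a list avoids shell-quoting problems, especially with the JSON passed to -pp. The sketch below assembles such a list using only the flags that appear in the examples above (-m, --all, -tr, -pp, --verbose). The function build_cli_command is a hypothetical helper, and it does not execute anything.

```python
import shlex


def build_cli_command(video, model=None, suite=None, translate=None,
                      post_process=None, verbose=False):
    """Build an argument list for `uv run youtube-to-docs` (hypothetical helper).

    The video URL or ID is positional and always comes first;
    there is no --url flag.
    """
    cmd = ["uv", "run", "youtube-to-docs", video]
    if model:
        cmd += ["-m", model]
    if suite:
        cmd += ["--all", suite]
    if translate:
        cmd += ["-tr", translate]
    if post_process:
        cmd += ["-pp", post_process]
    if verbose:
        cmd.append("--verbose")
    return cmd


kitchen_sink = build_cli_command("B0x2I_doX9o", suite="gemini-pro", verbose=True)
printable = shlex.join(kitchen_sink)  # shell-safe string for logging
```

The list can be handed to subprocess.run(kitchen_sink, check=True) from the project directory; printable is only meant for display.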
See docs/usage.md for full documentation and docs/development.md for setup details.
MCP Configuration:
The MCP server definition is located in .mcp.json. It is explicitly configured to use uv with --all-extras to ensure the correct environment and dependencies are used:
"command": "uv",
"args": [ ..., "run", "--all-extras", "python", "-m", "youtube_to_docs.mcp_server" ]
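To confirm that a server entry follows this pattern, one could check it programmatically. The sketch below inspects a single entry with "command" and "args" keys, as in the fragment above. The function name uses_all_extras and the sample dictionaries are hypothetical, and the sample leaves out the arguments that are elided ("...") in the fragment.

```python
def uses_all_extras(server: dict) -> bool:
    """Return True if the entry launches via uv and passes --all-extras after 'run'.

    Hypothetical check; `server` is one entry with 'command' and 'args' keys.
    """
    args = server.get("args", [])
    if server.get("command") != "uv" or "run" not in args:
        return False
    # --all-extras is an option of `uv run`, so look for it after 'run'.
    return "--all-extras" in args[args.index("run") + 1:]


# Hypothetical sample entries for illustration only.
sample_ok = {
    "command": "uv",
    "args": ["run", "--all-extras", "python", "-m", "youtube_to_docs.mcp_server"],
}
sample_plain_python = {
    "command": "python",
    "args": ["-m", "youtube_to_docs.mcp_server"],
}
```

An entry like sample_plain_python would bypass the managed environment, which is the situation the configuration is meant to avoid.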