From obsidian-vault-agent
Crawls online course pages to extract lecture schedules, slides, and videos, then synthesizes structured vault notes per lecture. Also refines existing course notes.
npx claudepluginhub tuan3w/obsidian-vault-agent --plugin obsidian-vault-agent
How this skill is triggered (by the user, by Claude, or both):
Slash command
/obsidian-vault-agent:course <course-url> [lecture range, e.g. 1-3]
<Purpose>
Process mode (default): Crawl an online course page, extract the lecture schedule with all materials (slides PDFs, YouTube videos, readings), then process each lecture into a detailed vault-formatted note with embedded slide images and synthesized transcripts. Creates a Course index note linking everything together.
Refine mode: Re-read existing course lecture notes and improve them — deepen equation explanations, add missing analogies, fix structure, strengthen cross-references. Triggered when user says "refine", "improve", "fix", or references existing course notes rather than a URL.
This is an orchestrator — it coordinates downloading, extraction, and synthesis across multiple lectures, producing a complete course knowledge package in the vault.
<Use_When>
<Do_Not_Use_When>
Parse the course URL from $ARGUMENTS. If no URL, ask the user.
Fetch the course page and extract structured lecture data:
WebFetch(
url="COURSE_URL",
prompt="Extract the complete course structure as JSON. For each lecture/session, include:
- number (int)
- title (string)
- date (string, if available)
- slides_url (string or null — look for PDF links to slides/lecture notes)
- video_url (string or null — look for YouTube links)
- readings (array of {title, url} — papers, blog posts, textbook chapters)
- description (string or null — any summary text)
Also extract:
- course_title (string)
- course_code (string or null)
- instructors (array of strings)
- course_url (string — the page URL)
- course_notes_url (string or null — if there's a single PDF of all course notes)
Return ONLY valid JSON, no markdown fencing."
)
Parse the JSON response. If the page has relative URLs for slides/videos, resolve them against the course URL's base.
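A minimal sketch of the URL resolution step, assuming the parsed JSON has already been loaded into a list of lecture dicts with the field names requested in the prompt above. The function name `resolve_lecture_urls` and the example.edu URLs are illustrative, not part of the skill.

```python
from urllib.parse import urljoin


def resolve_lecture_urls(course_url: str, lectures: list[dict]) -> list[dict]:
    """Return copies of the lecture dicts with slide, video, and reading URLs made absolute."""
    resolved = []
    for lecture in lectures:
        item = dict(lecture)  # shallow copy so the input is not mutated
        for key in ("slides_url", "video_url"):
            if item.get(key):  # leave null or missing fields untouched
                item[key] = urljoin(course_url, item[key])
        item["readings"] = [
            {**r, "url": urljoin(course_url, r["url"])} if r.get("url") else dict(r)
            for r in item.get("readings") or []
        ]
        resolved.append(item)
    return resolved


lectures = [{
    "number": 1,
    "slides_url": "../docs/lecture_01.pdf",
    "video_url": None,
    "readings": [{"title": "Paper", "url": "papers/a.pdf"}],
}]
resolved = resolve_lecture_urls("https://example.edu/course/index.html", lectures)
print(resolved[0]["slides_url"])
```

`urljoin` leaves already-absolute links (such as YouTube URLs) unchanged, so it is safe to apply to every URL field.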
Handle edge cases:
- Relative URLs (e.g. ../docs/lecture_01.pdf): resolve to absolute URLs
Generate a short, memorable course hashtag from the course code or title. This tag will be used consistently across ALL notes for this course.
Rules for the tag:
- Examples: #mit-diffusion, #cs231n-vision, #stanford-rl, #fast-ai-dl
- Format: institution-topic or code-topic, lowercase, hyphens
Also derive a COURSE_SLUG for asset folder naming (same as tag without #).
Example: assets folder assets/mit-diffusion/, tag #mit-diffusion.
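The tag rules can be sketched as a small helper. This assumes the institution (or course code) and the topic have already been chosen by hand or by the model; `make_course_tag` and `course_slug` are hypothetical names used only for illustration.

```python
import re


def make_course_tag(prefix: str, topic: str) -> str:
    """Build a tag such as '#mit-diffusion' from an institution or course code plus a topic."""
    def normalize(text: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"#{normalize(prefix)}-{normalize(topic)}"


def course_slug(tag: str) -> str:
    """COURSE_SLUG is the tag without the leading '#'."""
    return tag.lstrip("#")


tag = make_course_tag("MIT", "Diffusion")
slug = course_slug(tag)
assets_dir = f"assets/{slug}/"
print(tag, assets_dir)
```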
Present the extracted lecture list to the user in a clear table:
Found N lectures for "Course Title":
Course tag: #mit-diffusion
Assets folder: assets/mit-diffusion/
| # | Title | Slides | Video | Readings |
|---|-------|--------|-------|----------|
| 1 | Topic | PDF | YT | 2 papers |
| 2 | Topic | PDF | — | 1 paper |
...
Which lectures should I process? (default: all)
Options: "all", "1-3", "1,3,5", or specific numbers
If $ARGUMENTS includes a range (e.g., "1-3"), skip confirmation and use that range.
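One way to interpret the selection formats listed above ("all", "1-3", "1,3,5", or specific numbers) is sketched below. It assumes lectures are numbered 1 through N; `parse_lecture_selection` is a hypothetical helper, not something the skill ships.

```python
def parse_lecture_selection(selection: str, total: int) -> list[int]:
    """Turn 'all', '1-3', '1,3,5', or '2' into a sorted list of lecture numbers."""
    text = selection.strip().lower()
    if text in ("", "all"):  # empty input falls back to the default: all
        return list(range(1, total + 1))

    chosen: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            chosen.update(range(start, end + 1))  # ranges are inclusive
        else:
            chosen.add(int(part))

    out_of_range = sorted(n for n in chosen if not 1 <= n <= total)
    if out_of_range:
        raise ValueError(f"lecture numbers out of range: {out_of_range}")
    return sorted(chosen)


print(parse_lecture_selection("1-3", total=5))
```

Raising on out-of-range numbers is a design choice for this sketch; asking the user to re-enter the selection would fit the skill's flow equally well.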
Not all sources are equal. A 50-minute video transcript where the instructor explains intuition, tells stories, and works through examples is 10x richer than a terse slide deck with equations and bullet points. The skill must be smart about which sources to use:
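Priority order (use the best available, not just one):
1. Video transcript: the primary content source whenever a YouTube video exists
2. Course notes PDF: written prose with explanations, often richer than individual slide PDFs
3. Slide PDF: the weakest source, terse bullets and equations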
Key rule: When a YouTube video exists, ALWAYS extract its transcript even if slides are also available. The transcript is the primary content source; slides are supplementary visual aids.
For each lecture in scope, spawn a parallel subagent. Each subagent does:
SKILL_DIR="${CLAUDE_SKILL_DIR}"
COURSE_SLUG="mit-diffusion" # derived from course tag
LECTURE_NUM="01"
SLIDES_DIR="temp/course-slides-${COURSE_SLUG}"
mkdir -p "$SLIDES_DIR"
# Download PDF (-f makes curl exit non-zero on HTTP errors instead of saving an error page)
curl -fsSL "SLIDES_PDF_URL" -o "$SLIDES_DIR/lecture-${LECTURE_NUM}.pdf"
# Convert PDF pages to images
uv run "$SKILL_DIR/scripts/extract_pdf_slides.py" \
"$SLIDES_DIR/lecture-${LECTURE_NUM}.pdf" \
--output-dir "$SLIDES_DIR/lecture-${LECTURE_NUM}-frames" \
--prefix "${COURSE_SLUG}-L${LECTURE_NUM}"
The script outputs images and a manifest JSON. Copy selected frames to a
course-specific subfolder in assets/:
ASSETS_DIR="assets/${COURSE_SLUG}"
mkdir -p "$ASSETS_DIR"
cp "$SLIDES_DIR/lecture-${LECTURE_NUM}-frames"/*.png "$ASSETS_DIR/"
All slides for this course live in assets/${COURSE_SLUG}/ (here assets/mit-diffusion/), keeping them organized
and easy to find. The embed syntax still works: ![[mit-diffusion-L01-03.png]]
(Obsidian resolves short names across subfolders).
Reuse the youtube skill's fetch script:
YT_SKILL_DIR="${CLAUDE_PLUGIN_ROOT}/skills/youtube"
YT_OUTPUT="temp/course-yt-${COURSE_SLUG}-L${LECTURE_NUM}.json"
uv run "$YT_SKILL_DIR/scripts/fetch_youtube.py" "VIDEO_URL" --lang en > "$YT_OUTPUT"
Read the JSON output. If the transcript fetch fails, proceed with slides only.
If course notes PDF exists: Read the relevant section from the course notes PDF. This is often richer than individual slide PDFs — it's written prose with explanations, not just bullet points. Pass this to the noter agent as primary text content alongside any transcript.
If no video transcript and no course notes: Read the slide PDF directly via the Read tool. This is the weakest source — the noter agent must reconstruct meaning from terse bullets and equations. Flag this in the agent prompt so it knows to add more explanatory context.
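The source-selection logic above can be sketched as a function that maps the available sources to the tier names the agent prompt uses for CONTENT QUALITY TIER. The function name and the decision to raise when no named tier applies are assumptions of this sketch.

```python
def content_quality_tier(has_transcript: bool, has_course_notes: bool, has_slides: bool) -> str:
    """Pick the content quality tier passed to the course-noter agent."""
    if has_transcript:
        # The transcript is the primary source; slides only add visuals.
        return "transcript+slides" if has_slides else "transcript-only"
    if has_course_notes and has_slides:
        return "notes+slides"
    if has_slides:
        # Weakest source: the agent must add more explanatory context.
        return "slides-only"
    # Assumption: the skill names no tier for these combinations.
    raise ValueError("no named tier for this combination of sources")


print(content_quality_tier(has_transcript=False, has_course_notes=True, has_slides=True))
```

Course notes, when present, are still passed to the agent alongside a transcript; the tier only tells the agent how much reconstruction work to expect.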
Read the agent definition:
Read("${CLAUDE_SKILL_DIR}/agents/course-noter.md")
Launch the course-noter agent:
Agent(
subagent_type="general-purpose",
model="sonnet",
run_in_background=true,
prompt="You are Course Noter. Follow these instructions exactly:
[INSERT FULL CONTENT OF agents/course-noter.md HERE]
COURSE CONTEXT:
- Course: [course_title]
- Lecture [number] of [total]: [lecture_title]
- Date: [date]
- Instructors: [instructors]
- Other lectures in this course: [list of other lecture titles for cross-referencing]
SOURCE LINKS (include these in the note's Source Materials section):
- Video: [YouTube URL or null]
- Slides PDF: [PDF URL or null]
- Course page: [course URL]
CONTENT QUALITY TIER: [one of: transcript+slides, transcript-only, notes+slides, slides-only]
(If slides-only: you're working from terse bullet points and equations.
Work harder to explain WHY each concept matters, add intuitive analogies,
and fill in the reasoning the instructor would have spoken aloud.)
SLIDE FRAMES (for ![[embedding]]):
[frame manifest — filename, page number, brief description of each slide]
EXISTING VAULT NOTES ON RELATED TOPICS:
[search results from vault]
TRANSCRIPT (if available — this is your PRIMARY content source):
[full transcript text]
COURSE NOTES TEXT (if available — richer than slides alone):
[relevant section from course notes PDF]
PDF TEXT CONTENT (fallback when no transcript or course notes):
[extracted PDF text]
READINGS LISTED:
[titles and URLs of any readings for this lecture]
Produce the note body following the Output Format. Do NOT include frontmatter."
)
For each processed lecture, create a note file:
---
id: YYYYMMDDHHMMSS
type: lecture
processing_status: inbox
link: "COURSE_URL"
created_date: YYYY-MM-DD
updated_date: YYYY-MM-DD
---
[AGENT OUTPUT — starts with # title]
Naming convention: (Lecture) Short Tag - L01 Topic Title.md
Example: (Lecture) MIT Diffusion - L01 Flow and Diffusion Models.md
The short tag in the filename matches the course tag (capitalized for readability).
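A sketch of how the filename and frontmatter could be assembled, assuming the readable short tag (e.g. "MIT Diffusion") is supplied by the caller, since capitalization cannot be derived reliably from the hashtag. Both function names are hypothetical, and stripping filesystem-unsafe characters from the title is an added assumption.

```python
import re
from datetime import datetime


def lecture_note_filename(short_tag: str, number: int, title: str) -> str:
    """Build '(Lecture) Short Tag - L01 Topic Title.md'."""
    # Assumption: drop characters that are invalid in filenames on common systems.
    safe_title = re.sub(r'[\\/:*?"<>|]', "", title).strip()
    return f"(Lecture) {short_tag} - L{number:02d} {safe_title}.md"


def lecture_frontmatter(course_url: str, now: datetime) -> str:
    """Render the lecture note frontmatter with a timestamp id."""
    today = now.strftime("%Y-%m-%d")
    return "\n".join([
        "---",
        f"id: {now.strftime('%Y%m%d%H%M%S')}",
        "type: lecture",
        "processing_status: inbox",
        f'link: "{course_url}"',
        f"created_date: {today}",
        f"updated_date: {today}",
        "---",
    ])


name = lecture_note_filename("MIT Diffusion", 1, "Flow and Diffusion Models")
front = lecture_frontmatter("https://example.edu/course/", datetime(2025, 1, 2, 3, 4, 5))
print(name)
print(front)
```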
Place in notes/ml/ for ML/AI courses, notes/ + appropriate subfolder for others.
After the lecture notes are written, create the course index note:
---
id: YYYYMMDDHHMMSS
type: course
processing_status: inbox
link: "COURSE_URL"
created_date: YYYY-MM-DD
updated_date: YYYY-MM-DD
---
# (Course) Course Title
- **🏷️Tags** : #course #mit-diffusion #diffusion #flow-matching #MM-YYYY
## Overview
- **Institution**: MIT / Stanford / etc.
- **Instructors**: Names
- **Lectures processed**: N of M
- **Course tag**: `#mit-diffusion` — use this to find all notes from this course
## Lectures
- [[(Lecture) MIT Diffusion - L01 Flow and Diffusion Models]] — one-line summary
- [[(Lecture) MIT Diffusion - L02 Flow Matching]] — one-line summary
- ...
## Course Materials
- [Course page](URL)
- [Course notes PDF](URL) (if available)
## Key Concepts Across Lectures
- [[(Term) Concept]] — appears in L01, L03, L05
- [[(Term) Another Concept]] — introduced in L02, applied in L04
## Related links
- [[(Type) Related Vault Note]] — connection
Show:
<Tool_Usage>
<Escalation_And_Stop_Conditions>
<Refine_Mode>
If $ARGUMENTS contains NO URL and instead references existing notes ("refine", "improve", "fix", mentions a course name or lecture number, or complains about note quality), enter refine mode instead of process mode.
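The mode decision can be approximated as follows. This is a rough sketch: it only checks for a URL and the three trigger words quoted above, whereas the skill also treats mentions of a course name, a lecture number, or complaints about note quality as refine triggers, which needs the model's judgment. `detect_mode` is a hypothetical name.

```python
import re


def detect_mode(arguments: str) -> str:
    """Return 'process', 'refine', or 'ask' for a raw $ARGUMENTS string."""
    if re.search(r"https?://\S+", arguments):
        return "process"  # a course URL always means process mode
    if re.search(r"\b(refine|improve|fix)\w*", arguments.lower()):
        return "refine"
    return "ask"  # no URL and no clear refine signal: ask the user


print(detect_mode("refine the MIT diffusion notes"))
```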
Search the vault for the course index note and all lecture notes:
Glob(pattern="notes/**/*Course*KEYWORD*.md")
Glob(pattern="notes/**/*Lecture*KEYWORD*.md")
If ambiguous, ask the user which course.
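Outside the agent, the same two searches can be reproduced with `pathlib`, which may help when checking by hand what the Glob patterns will match. The vault layout (a `notes/` folder under the vault root) follows the patterns above; `find_course_notes` is an illustrative name.

```python
from pathlib import Path


def find_course_notes(vault_root: Path, keyword: str) -> dict[str, list[Path]]:
    """Find course index notes and lecture notes whose filenames contain the keyword."""
    notes_dir = vault_root / "notes"
    return {
        "course": sorted(notes_dir.glob(f"**/*Course*{keyword}*.md")),
        "lecture": sorted(notes_dir.glob(f"**/*Lecture*{keyword}*.md")),
    }


def is_ambiguous(matches: dict[str, list[Path]]) -> bool:
    """More than one (or no) course index note means the user should be asked."""
    return len(matches["course"]) != 1
```

Matching is case-sensitive on most Linux filesystems, so the keyword should use the same capitalization as the note filenames.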
For each lecture in scope:
Present a brief diagnosis to the user: "Here's what I'd fix in each note..." Get confirmation before proceeding.
Spawn one subagent per lecture note. Each subagent:
The agent prompt should include:
Show the user what changed in each note — section-level summary, not diffs. Suggest running /process on individual notes for deeper engagement.
</Refine_Mode>
$ARGUMENTS