From yt-is: YouTube transcript extraction via Selenium Firefox browser automation (bypasses bot detection).

Install with `npx claudepluginhub enduser123/yt-is`

Slash command: `/yt-is:yt-selenium`
Extract YouTube transcripts using Selenium Firefox browser automation. Bypasses YouTube's TLS fingerprinting bot detection by running a real Firefox browser with your authenticated session.
Selenium is a fallback method when faster approaches fail due to bot detection:
| Method | Speed | Reliability | When to Use |
|---|---|---|---|
| yt-dlp (WEB client) | Fast (~5s) | High | Public videos, no bot detection |
| yt-dlp + cookies | Medium (~10s) | Medium | Age-restricted videos |
| Selenium Firefox | Slow (~15-30s) | High | Bot-check failures, TLS blocking |
| Whisper | Very slow (~60s) | Very High | No captions available |
```bash
# Dry run: show pending videos
python -m csf.csf_selenium
# Extract transcripts (all channels)
python -m csf.csf_selenium --run
# Specific channel only
python -m csf.csf_selenium --run --channel "https://youtube.com/@channel"
# Language preference
python -m csf.csf_selenium --run --lang es
# Parallel workers (default: 1)
python -m csf.csf_selenium --run --workers 2
```
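The flags above can be mirrored in a small `argparse` parser, which makes the dry-run default explicit: without `--run`, nothing is extracted. This is a hypothetical reconstruction for illustration, not the actual parser in `csf/csf_selenium.py`. Only the flag names (including `--profile`, used in the profile section) and the `--workers` default of 1 come from this page; the help text and the `None` defaults for the other options are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented csf.csf_selenium flags."""
    parser = argparse.ArgumentParser(prog="python -m csf.csf_selenium")
    # Without --run the command only lists pending videos (dry run).
    parser.add_argument("--run", action="store_true",
                        help="extract transcripts instead of listing pending videos")
    parser.add_argument("--channel", help="restrict to one channel URL")
    parser.add_argument("--lang", help="preferred transcript language code, e.g. es")
    parser.add_argument("--workers", type=int, default=1,
                        help="parallel Firefox workers (default: 1)")
    parser.add_argument("--profile", help="Firefox profile to launch")
    return parser
```

For example, `build_parser().parse_args(["--run", "--workers", "2"])` yields `run=True` and `workers=2`, while an empty argument list keeps `run=False`.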
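For each video, the skill launches Firefox with your profile, opens the YouTube page, clicks the transcript button, extracts the transcript text, and caches it in `transcripts.sqlite` via `csf.cache`.

The skill searches for Firefox profiles in this order:

1. `*.Profile 1*` (dedicated download profile, preferred)
2. `.default` / `.default-release`

Option 1: Use existing profile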
```bash
# Find your Firefox profile
ls "$APPDATA/Mozilla/Firefox/Profiles/"
# Pass the profile to the skill invocation
python -m csf.csf_selenium --run --profile "ProfileForDownloading"
```
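The profile search order can be sketched as a directory scan under `Profiles/`. This is a minimal sketch assuming each profile is a directory directly inside that folder and that the tiers are checked in the order listed above (`*.Profile 1*`, then `*.default`, then `*.default-release`). `find_firefox_profile` is a hypothetical helper, not the skill's own code.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional

# Assumed tier order, taken from the list above; first tier with a match wins.
PROFILE_PATTERNS = ("*.Profile 1*", "*.default", "*.default-release")


def find_firefox_profile(profiles_dir: Path) -> Optional[Path]:
    """Return the preferred profile directory, or None if nothing matches."""
    dirs = sorted(p for p in profiles_dir.iterdir() if p.is_dir())
    for pattern in PROFILE_PATTERNS:
        # fnmatch matches the whole name, so "*.default" does not
        # match "xxxx.default-release".
        matches = [p for p in dirs if fnmatch(p.name, pattern)]
        if matches:
            return matches[0]
    return None
```

On Windows you would call it as `find_firefox_profile(Path(os.environ["APPDATA"]) / "Mozilla" / "Firefox" / "Profiles")`.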
Option 2: Create dedicated profile
In Firefox, open `about:profiles` and create a new profile to use only for downloading.

Selenium has built-in rate limit protection:
| Metric | Value |
|---|---|
| Jitter range | 2-10 seconds between requests |
| Circuit breaker | Opens after 3 consecutive 429s |
| Cooldown duration | 5 minutes (300 seconds) |
| Backoff multiplier | 2x per consecutive failure (max 32x) |
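The policy in the table above can be sketched as a small circuit breaker. `RateLimitGuard` is hypothetical (it is not `csf.batch_scheduler.BatchScheduler`), and two details are assumptions: the backoff multiplier scales the jitter delay, and the consecutive-429 count survives a cooldown until a request succeeds, so the multiplier can keep growing toward the 32x cap. The clock is injectable so the behavior can be tested without sleeping.

```python
import random
import time
from typing import Callable, Optional


class RateLimitGuard:
    """Hypothetical sketch of the documented policy; not csf.batch_scheduler."""

    JITTER_RANGE_S = (2.0, 10.0)  # jitter between requests
    TRIP_AFTER = 3                # consecutive 429s that open the breaker
    COOLDOWN_S = 300.0            # 5-minute cooldown once open
    MAX_MULTIPLIER = 32           # backoff cap

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.consecutive_429s = 0
        self.opened_at: Optional[float] = None

    def record_success(self) -> None:
        # Any successful request resets the backoff.
        self.consecutive_429s = 0

    def record_429(self) -> None:
        self.consecutive_429s += 1
        if self.consecutive_429s >= self.TRIP_AFTER and self.opened_at is None:
            self.opened_at = self._clock()

    def is_open(self) -> bool:
        """True while requests should be paused."""
        if self.opened_at is None:
            return False
        if self._clock() - self.opened_at >= self.COOLDOWN_S:
            # Cooldown over: close the breaker. The 429 count is kept
            # (assumption) so backoff keeps growing if limits persist.
            self.opened_at = None
            return False
        return True

    def backoff_multiplier(self) -> int:
        # 2x per consecutive failure, capped at 32x.
        return min(2 ** self.consecutive_429s, self.MAX_MULTIPLIER)

    def next_delay(self) -> float:
        """Seconds to sleep before the next request while the breaker is closed."""
        low, high = self.JITTER_RANGE_S
        return random.uniform(low, high) * self.backoff_multiplier()
```

A worker loop would check `is_open()` before each video, sleep `next_delay()` between requests, and call `record_429()` or `record_success()` depending on the response.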
- Input: `batch_status.sqlite` (pending videos marked by `/yt-is`)
- Output: `transcripts.sqlite` via `csf.cache`
- Rate limiting: `BatchScheduler`
- Alternative: `/yt-dlp` (can run both, compare results)

| File | Purpose |
|---|---|
| `csf/csf_selenium.py` | CLI entry point and main loop |
| `csf/transcript.py` | `_fetch_via_selenium_firefox()` implementation |
| `csf/cache.py` | Transcript caching (`set_cached_transcript()`) |
| `csf/batch_scheduler.py` | Round-robin scheduling and rate limit tracking |
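A minimal caching sketch using Python's built-in `sqlite3`. The table layout and the `open_cache`, `cache_transcript`, and `get_cached` helpers are hypothetical; the real `set_cached_transcript()` in `csf/cache.py` may use a different schema and signature. Keying on `(video_id, lang)` with `INSERT OR REPLACE` means a re-run overwrites a transcript instead of duplicating it.

```python
import sqlite3
from typing import Optional

# Hypothetical schema, not the one used by csf.cache.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    video_id   TEXT NOT NULL,
    lang       TEXT NOT NULL,
    transcript TEXT NOT NULL,
    source     TEXT NOT NULL,
    PRIMARY KEY (video_id, lang)
)
"""


def open_cache(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def cache_transcript(conn: sqlite3.Connection, video_id: str, lang: str,
                     transcript: str, source: str = "selenium_firefox") -> None:
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?)",
            (video_id, lang, transcript, source),
        )


def get_cached(conn: sqlite3.Connection, video_id: str, lang: str) -> Optional[str]:
    row = conn.execute(
        "SELECT transcript FROM transcripts WHERE video_id = ? AND lang = ?",
        (video_id, lang),
    ).fetchone()
    return row[0] if row else None
```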
```text
/yt-is sync
    ↓
batch_status.sqlite (pending videos)
    ↓
python -m csf.csf_selenium --run
    ↓
For each video:
  1. Launch Firefox with profile
  2. Navigate to YouTube page
  3. Click transcript button
  4. Extract transcript text
  5. Cache to transcripts.sqlite
    ↓
Complete: all videos processed
```
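Steps 1 to 4 of the flow above might look like this with Selenium 4. This is a sketch, not `_fetch_via_selenium_firefox()`: the CSS selectors for the transcript button and segment rows are assumptions about YouTube's current markup (which changes often, and the button may only appear after expanding the description), and `fetch_via_firefox`/`join_segments` are hypothetical names. Passing `-profile <dir>` through `FirefoxOptions` makes geckodriver run Firefox on an existing profile, so the page loads with your signed-in session. Selenium is imported inside the function so the module still loads, and raises the `selenium not installed` error, when the package is missing.

```python
from typing import List

WATCH_URL = "https://www.youtube.com/watch?v={video_id}"
# ASSUMED selectors; YouTube's markup changes without notice.
TRANSCRIPT_BUTTON = "button[aria-label='Show transcript']"
SEGMENT_ROW = "ytd-transcript-segment-renderer"


def join_segments(segments: List[str]) -> str:
    """Collapse whitespace in each row's text and drop blank rows."""
    cleaned = [" ".join(s.split()) for s in segments]
    text = "\n".join(s for s in cleaned if s)
    if not text:
        raise RuntimeError("transcript panel was empty")
    return text


def fetch_via_firefox(video_id: str, profile_dir: str, timeout: float = 20.0) -> str:
    """Steps 1-4: launch Firefox, open the page, click the button, read the rows."""
    try:
        from selenium import webdriver
        from selenium.common.exceptions import TimeoutException
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.ui import WebDriverWait
    except ImportError as exc:
        raise RuntimeError("selenium not installed") from exc

    options = webdriver.FirefoxOptions()
    # Run Firefox on the existing profile so YouTube sees your signed-in session.
    options.add_argument("-profile")
    options.add_argument(profile_dir)
    driver = webdriver.Firefox(options=options)  # 1. launch Firefox
    try:
        driver.get(WATCH_URL.format(video_id=video_id))  # 2. navigate
        wait = WebDriverWait(driver, timeout)
        try:
            button = wait.until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, TRANSCRIPT_BUTTON)))
        except TimeoutException as exc:
            raise RuntimeError("transcript button not found") from exc
        button.click()  # 3. open the transcript panel
        try:
            rows = wait.until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, SEGMENT_ROW)))
        except TimeoutException as exc:
            raise RuntimeError("transcript panel was empty") from exc
        # 4. each row's text is assumed to hold a timestamp plus the caption line
        return join_segments([row.text for row in rows])
    finally:
        driver.quit()
```

Firefox locks a profile while it is open, so close any regular Firefox window using the same profile before running a worker.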
Install Selenium (`pip install selenium`) and geckodriver:

```bash
# Windows (using winget)
winget install Mozilla.GeckoDriver
# Or download manually
# https://github.com/mozilla/geckodriver/releases
```
| Error | Cause | Resolution |
|---|---|---|
| `selenium not installed` | Missing Python package | `pip install selenium` |
| `geckodriver not found` | WebDriver not on PATH | Install geckodriver |
| `transcript button not found` | Video has no captions | Skip, no transcript available |
| `transcript panel was empty` | Transcript loading failed | Retry or use a different method |
| `rate limited (429)` | Too many requests | Circuit breaker opens, waits 5 minutes |
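The first two rows of the table can be caught before any browser launches. A sketch of a hypothetical `preflight()` check using only the standard library: `importlib.util.find_spec` reports whether Selenium is importable, and `shutil.which` looks for `geckodriver` on `PATH`. Recent Selenium 4 releases include Selenium Manager, which can download a driver automatically, so the PATH check may be stricter than your setup requires.

```python
import importlib.util
import shutil
from typing import List


def preflight() -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems: List[str] = []
    if importlib.util.find_spec("selenium") is None:
        problems.append("selenium not installed: pip install selenium")
    if shutil.which("geckodriver") is None:
        problems.append("geckodriver not found: install it and add it to PATH")
    return problems
```

Calling `preflight()` at startup and exiting early on a non-empty result avoids failing halfway through a batch.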
| Metric | Value |
|---|---|
| Per-video time | 15-30 seconds (browser overhead) |
| Throughput (1 worker) | ~2-4 videos/minute |
| Throughput (2 workers) | ~4-8 videos/minute |
| Memory per worker | ~500MB (Firefox process) |
| CPU usage | Moderate (browser rendering) |
Note: Selenium is slower than API-based methods due to browser overhead, but more reliable against bot detection.
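The throughput rows follow from the per-video time: one worker at 30 s per video completes 60 / 30 = 2 videos per minute, and at 15 s it completes 60 / 15 = 4; two workers double both figures. The sketch below reproduces that arithmetic. Whether the 15-30 s figure already includes the 2-10 s jitter is not stated on this page, so `jitter_s` is an assumption you can set to see its effect; the function names are illustrative.

```python
def videos_per_minute(seconds_per_video: float, workers: int = 1,
                      jitter_s: float = 0.0) -> float:
    """Each worker processes one video at a time, plus optional jitter."""
    return workers * 60.0 / (seconds_per_video + jitter_s)


def memory_mb(workers: int, per_worker_mb: int = 500) -> int:
    """Approximate Firefox memory footprint for all workers."""
    return workers * per_worker_mb
```

For example, `videos_per_minute(30, jitter_s=10)` gives 1.5, and `memory_mb(2)` gives 1000 MB for two workers.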
Related skills:

- `/yt-dlp`: fast transcript download (try first)
- `/yt-nlm`: NotebookLM transcript ingestion (high quality)
- `/yt-is`: video discovery and tracking

See the fallback chain documentation in `csf/transcript.py`:
```python
# Chain order (fetch_transcript_chain):
# 1. yt-dlp (WEB client, curl_cffi TLS)
# 2. yt-dlp with English fallback
# 3. yt-dlp with any available language
# 4. yt-dlp with cookies (age-restricted)
# 5. Selenium Firefox ← This skill
# 6. Selenium Firefox with English fallback
# 7. Selenium Firefox with any available language
```
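The chain is an ordered list of methods where the first one to return a transcript wins. A minimal sketch of that pattern, assuming each method is a callable that takes a video ID and returns text, `None`, or raises; `fetch_with_fallback` and the fetcher signature are hypothetical, and the real `fetch_transcript_chain` in `csf/transcript.py` may differ.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical fetcher type: video ID in, transcript text (or None) out.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(video_id: str,
                        chain: Iterable[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (name, fetcher) in order; return (name, transcript) for the first hit."""
    errors: List[str] = []
    for name, fetch in chain:
        try:
            text = fetch(video_id)
        except Exception as exc:  # one failed method should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: no transcript")
    raise RuntimeError("all methods failed; " + "; ".join(errors))
```

Returning the method name alongside the text makes it easy to record which step of the chain produced each cached transcript.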