English | 한국어
insane-search
The scraper that's too stubborn to quit.
403. WAF. CAPTCHA. Empty SPA. Login wall. When every normal tool taps out, insane-search is just getting started. Four probe phases. Auto-installs TLS impersonation. Discovers hidden APIs through a real browser. Tries everything — and for every site that claimed to be "blocked," something always works.
No API keys. No signup. No config. Install, and watch Claude Code stop giving up.
Quick Start • How it works • What's in the index • References • Requirements
Quick Start
1. Add the marketplace
/plugin marketplace add https://github.com/fivetaku/gptaku_plugins.git
2. Install the plugin
/plugin install insane-search
3. Restart Claude Code
That's it. No config, no API keys, no env vars.
4. Start asking
Just talk normally. Sites that block normal fetches get bypassed automatically.
"Show me what's trending on r/LocalLLaMA"
"What did @openclaw post on X recently?"
"Search X for posts about insane-search"
"Summarize this YouTube video"
"Search Coupang for under ₩100,000 keyboards"
"Read this Naver blog post for me"
"네이버에서 클로드코드 관련 뉴스 찾아줘" (Korean: "Find news about Claude Code on Naver")
"Find LinkedIn articles about Claude Code plugins"
Why insane-search?
- It doesn't know the word "blocked" — No pre-judged "this site can't be accessed" labels. Every site gets the full chain. Coupang? Coupang falls. LinkedIn? Full article body extracted. Yozm? Chrome UA and done
- Identity spoofing built in — Phase 2 doesn't just swap TLS fingerprints. It builds a full browser identity: homepage cookie warming, referrer chains, locale-matched headers. Sites like fmkorea (HTTP 430) and LinkedIn (login wall) fall to this alone
- Intent routing — "Fetch this URL" and "Search X for this keyword" are different problems. insane-search routes keywords through WebSearch or Naver Search first, gets URLs, then fetches content. Two-stage pipeline, automatic
- Installs its own weapons — Missing curl_cffi for TLS fingerprint bypass? Installs it. Missing feedparser? Installs it. Missing yt-dlp? Installs it. You don't even notice
- 4 probe phases, not 1 — special-endpoint index → lightweight probes (WebFetch, Jina, curl UA/URL variants) → TLS impersonation with identity spoofing → real browser. Each phase escalates only when the previous hits a wall
- Finds hidden APIs — Phase 3 doesn't just render the page. It watches the browser's network traffic, catches the actual JSON API the site uses internally, and hands it back for reuse
- Zero setup friction — No API keys, no OAuth, no developer portals. Everything runs on public endpoints and auto-installable libraries
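The intent-routing bullet above boils down to a small decision: a raw URL skips straight to the probe chain, while a keyword query first goes through a search stage that produces URLs to fetch. A minimal sketch — the `route` function and its return shape are illustrative, not the plugin's actual internals:

```python
def route(query: str):
    """Two-stage pipeline: URLs are fetched directly; keyword queries
    go through a search step (WebSearch / Naver Search) that yields
    URLs, each of which is then fetched by the probe chain."""
    if query.startswith(("http://", "https://")):
        return ("fetch", [query])      # stage 2 only: probe this URL
    return ("search-then-fetch", [])   # stage 1 fills in the URL list first
```

So `route("https://example.com/post")` hits the probe chain immediately, while `route("claude code plugins")` is searched first and the resulting URLs are fetched one by one.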
How it works
When Claude Code needs to fetch a URL, insane-search runs a 4-phase adaptive scheduler. Each phase only runs if the previous phase failed or detected specific blocking signals.
Phase 0: Special endpoint index
↓ not in index or failed
Phase 1: Lightweight probes (parallel)
• WebFetch + Jina Reader
• curl with Chrome / mobile / Googlebot UAs
• URL variants: m.{domain}, .json, /rss, /feed
• Sidecar: AMP cache, archive.today, Wayback (low-trust)
↓ 403/429/WAF headers/challenge body detected
Phase 2: TLS impersonation + identity spoofing
• curl_cffi with safari → chrome → firefox
• Identity spoofing: homepage cookie warming → referrer chain → locale headers
• Behavioral challenge detection (Akamai _abck) → skip to Phase 3
• Auto-installs if missing: pip install curl_cffi
↓ TLS bypass failed or JS challenge detected
Phase 3: Full browser
• Playwright MCP (browser_navigate → snapshot → evaluate)
• Also discovers hidden APIs via network_requests
↓ login/paywall detected
Exit: "authentication required" — no amount of probing will fix this
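Phase 2's identity spoofing can be sketched roughly as follows. This is an illustrative sketch, not the plugin's code: `identity_headers` and `fetch_with_identity` are made-up names, the fallback chain comes from the diagram above, and the bare `impersonate` aliases (`"safari"`, `"chrome"`, `"firefox"`) assume a recent curl_cffi release.

```python
from urllib.parse import urlsplit

IMPERSONATE_ORDER = ["safari", "chrome", "firefox"]  # fallback chain from the diagram

def identity_headers(url: str, locale: str = "en-US") -> dict:
    """Locale-matched headers plus a homepage referrer, so the request
    looks like a click from inside the site rather than a cold fetch."""
    host = urlsplit(url).netloc
    return {
        "Accept-Language": f"{locale},{locale.split('-')[0]};q=0.9",
        "Referer": f"https://{host}/",
        "Sec-Fetch-Site": "same-origin",
    }

def fetch_with_identity(url: str):
    """Warm cookies on the homepage, then fetch the target page,
    escalating through TLS fingerprints until one returns 200."""
    from curl_cffi import requests  # auto-installed by the plugin when missing
    host = urlsplit(url).netloc
    for target in IMPERSONATE_ORDER:
        session = requests.Session(impersonate=target)
        session.get(f"https://{host}/")  # cookie warming on the homepage
        resp = session.get(url, headers=identity_headers(url))
        if resp.status_code == 200:
            return resp.text
    return None  # fall through to Phase 3 (real browser)
```

The key move is that the second request carries cookies and a referrer from the first, so the site sees a session, not a drive-by fetch.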
Core principle: don't pre-exclude any method. Don't skip a method because a dependency is missing — install it and try. Don't skip because a site is "known to be hard" — the site changes, and the method might work now.
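The "install it and try" rule reduces to a tiny ensure-import helper. A minimal sketch — the `ensure` name is made up, and the plugin's real installer may differ:

```python
import importlib
import subprocess
import sys

def ensure(module, pip_name=None):
    """Import a module, pip-installing it first if it's missing."""
    try:
        return importlib.import_module(module)
    except ImportError:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module]
        )
        return importlib.import_module(module)

# e.g. ensure("curl_cffi") before Phase 2, ensure("feedparser") for /rss probes
```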
Every HTML response is also scanned for OGP tags and JSON-LD structured data — so even partial responses yield titles, summaries, prices, or profile info.
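That scan needs nothing beyond the standard library. A minimal sketch, assuming the response is HTML — the `MetaScan` class name is illustrative:

```python
import json
from html.parser import HTMLParser

class MetaScan(HTMLParser):
    """Collect OGP <meta> tags and JSON-LD <script> blocks from HTML."""
    def __init__(self):
        super().__init__()
        self.og, self.jsonld = {}, []
        self._in_ld, self._buf = False, []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property", "").startswith("og:"):
            self.og[a["property"][3:]] = a.get("content", "")   # og:title -> title
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_ld = True

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self._in_ld = False
            try:
                self.jsonld.append(json.loads("".join(self._buf)))
            except ValueError:
                pass  # malformed JSON-LD: keep whatever else the page yields
            self._buf = []
```

Even a partial response that only got as far as the `<head>` can yield `og["title"]` and a JSON-LD `Product` block with a price.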
What's in the index
Only special endpoints that the generic chain can't discover on its own. Everything else — Naver blogs, Coupang, LinkedIn, Medium, Korean news sites, Substack, most forums — is handled by the adaptive scheduler without explicit entries.
Platform-specific APIs