From grabber-development
Comprehensive Python web scraping knowledge base covering stealth browser automation (Patchright, Camoufox, Nodriver), TLS/HTTP fingerprint impersonation (curl_cffi, primp), anti-bot bypass (Cloudflare, DataDome, PerimeterX), CAPTCHA solving, proxy architecture, AI-assisted extraction (Crawl4AI, Firecrawl, ScrapeGraphAI), framework selection (Scrapy, Crawlee), rate limiting, and production observability. TRIGGER WHEN: building, optimizing, or debugging Python web scrapers. DO NOT TRIGGER WHEN: the task is outside the specific scope of this component.
npx claudepluginhub acaprino/alfio-claude-plugins --plugin grabber-development

This skill uses the workspace's default tool permissions.
Knowledge base for building production-grade Python web scraping systems. Covers the full stack from target assessment through production observability.
For every scraping task, follow this sequence:
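1. Inspect network traffic with `page.on("request")` and `page.on("response")`
2. Check for embedded `<script>` JSON before parsing the DOM
3. Try an HTTP client with `impersonate="chrome"`; if it works -- done

| Target Profile | HTTP Client | Browser | Framework |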
|---|---|---|---|
| No JS, no protection | curl_cffi | none | Scrapy / httpx |
| JS-rendered, no protection | none | Playwright | Crawlee |
| Basic Cloudflare | curl_cffi + cf_clearance | Patchright (for cookie) | Scrapy |
| Heavy Cloudflare | none | Patchright persistent | Crawlee |
| DataDome | none | Camoufox + ghost-cursor | custom |
| PerimeterX | none | Nodriver / Patchright | custom |
| AI extraction needed | none | Crawl4AI / Firecrawl | standalone |
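
Before reaching for any of the browser options in the table, the sequence's check for embedded `<script>` JSON can often be done with the standard library alone. The following is a minimal sketch, not the plugin's own implementation: the class name `ScriptJSONParser`, the helper `extract_embedded_json`, and the sample HTML are all hypothetical, and it assumes the target embeds its data in `<script>` tags typed `application/json` or `application/ld+json`.

```python
import json
from html.parser import HTMLParser


class ScriptJSONParser(HTMLParser):
    """Collect JSON payloads found inside JSON-typed <script> tags."""

    # Assumption: only these two script types are treated as data carriers.
    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-typed script tag opens.
        if tag == "script" and dict(attrs).get("type") in self.JSON_TYPES:
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self._capturing = False
            try:
                self.payloads.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # Skip script bodies that are not valid JSON.


def extract_embedded_json(html: str) -> list:
    """Return every JSON payload embedded in the page's script tags."""
    parser = ScriptJSONParser()
    parser.feed(html)
    parser.close()
    return parser.payloads


# Hypothetical page fragment used only for illustration.
SAMPLE_HTML = (
    '<html><head><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"price": 19.99}}</script></head>'
    "<body><script>var x = 1;</script></body></html>"
)

if __name__ == "__main__":
    print(extract_embedded_json(SAMPLE_HTML))
```

If this returns the data you need, the "No JS, no protection" row applies and no browser is required. Plain JavaScript assigned to a variable (for example `window.__STATE__ = {...}`) is not covered by this sketch and would need its own extraction step.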

| Tier | Type | Price Range | Use When |
|---|---|---|---|
| 0 | No proxy | free | Unprotected targets, development |
| 1 | Datacenter | $0.10-0.50/GB | Light protection, high volume |
| 2 | ISP (static residential) | $0.53-1.47/IP | Account management, login flows |
| 3 | Residential | $0.49-8.00/GB | Anti-bot bypass, geo-targeting |
| 4 | Mobile | $4-13/GB | Highest trust, last resort |
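
One way to put the tier table to work is to start at the cheapest tier and move up only when the target blocks you. The sketch below encodes the table's tiers and a simple escalation rule. The escalation policy itself is an assumption rather than something the source prescribes, and `ProxyTier`, `PROXY_TIERS`, `BLOCK_STATUSES`, and `next_tier` are hypothetical names. Treating HTTP 403 as the only block signal is also an assumption; real targets may signal blocks through challenge pages or other status codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyTier:
    tier: int
    kind: str
    use_when: str


# Tiers copied from the table above, cheapest first.
PROXY_TIERS = (
    ProxyTier(0, "No proxy", "Unprotected targets, development"),
    ProxyTier(1, "Datacenter", "Light protection, high volume"),
    ProxyTier(2, "ISP (static residential)", "Account management, login flows"),
    ProxyTier(3, "Residential", "Anti-bot bypass, geo-targeting"),
    ProxyTier(4, "Mobile", "Highest trust, last resort"),
)

# Assumption: a 403 response means the current tier is being blocked.
BLOCK_STATUSES = frozenset({403})


def next_tier(current: int, status: int) -> int:
    """Return the tier number to use for the next attempt.

    Escalates by one tier on a block signal and stays put otherwise.
    The highest tier is a ceiling: there is nothing to escalate to.
    """
    highest = PROXY_TIERS[-1].tier
    if status in BLOCK_STATUSES and current < highest:
        return current + 1
    return current


if __name__ == "__main__":
    tier = 0
    for status in (200, 403, 403, 200):
        tier = next_tier(tier, status)
        print(status, "->", PROXY_TIERS[tier].kind)
```

Escalating one step at a time keeps spend on the cheaper tiers for as long as they work, which matches the table's framing of mobile proxies as a last resort.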
field-guide.md -- full 2025-2026 Python web scraping field guide covering browser stealth, TLS fingerprinting, behavioral biometrics, anti-bot bypass, CAPTCHA solving, proxy landscape, frameworks, AI-assisted scraping, GraphQL reverse engineering, rate limiting, and observability