From obscura
Multi-step web data collection pipeline using Obscura. Combines fetch + scrape for discover-then-collect workflows. Use when you need to crawl a site systematically.
npx claudepluginhub epicsagas/obscura-plugin --plugin obscura
How this skill is triggered (by the user, by Claude, or both):
Slash command
/obscura:obscura-pipeline
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Multi-step data collection with Obscura. Combines `obscura fetch` (discover) and `obscura scrape` (collect) into a pipeline.
/obscura-pipeline <index-url> [--extract-links-selector <css>] [--eval <js>] [--concurrency <N>]
1. fetch index/listing page → extract target URLs
2. filter/deduplicate URLs
3. scrape target URLs in parallel → extract data
4. aggregate results
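Step 2 (filter/deduplicate) is left to the caller. A minimal Python sketch of one way to do it, assuming the discovered links are already available as a list of strings; `filter_urls` and its `path_prefix` parameter are hypothetical helpers, not part of Obscura:

```python
from urllib.parse import urldefrag, urlparse


def filter_urls(links, index_url, path_prefix="/"):
    """Keep same-host links under path_prefix, dropping fragments and duplicates."""
    host = urlparse(index_url).netloc
    seen = set()
    kept = []
    for link in links:
        # Strip "#fragment" so the same page is not scraped twice.
        url, _fragment = urldefrag(link.strip())
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc != host or not parts.path.startswith(path_prefix):
            continue  # skip other hosts and unrelated sections
        if url not in seen:
            seen.add(url)
            kept.append(url)  # first-seen order is preserved
    return kept


# Hypothetical sample of links discovered on an index page.
links = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-1#comments",
    "https://example.com/about",
    "https://other.example.org/blog/post-9",
    "https://example.com/blog/post-2",
]
print(filter_urls(links, "https://example.com/blog", path_prefix="/blog/"))
```

The surviving list is what would be passed on to the scrape step.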
obscura fetch <index-url> --quiet --dump links
Or with selector for specific link groups:
obscura fetch <index-url> --quiet \
--eval "JSON.stringify(Array.from(document.querySelectorAll('<selector> a')).map(a => a.href))"
obscura scrape <url1> <url2> ... \
--eval "<extraction expression>" \
--concurrency 5 \
--format json
Parse JSON output and structure the result as the user needs (table, list, file).
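As a rough illustration of the structuring step, the sketch below turns extracted records into a Markdown table. It assumes each page's `--eval` result has already been isolated as a JSON string; the exact envelope that `--format json` wraps around those values is not shown in this document, so the sample input and the `to_markdown_table` helper are hypothetical:

```python
import json


def to_markdown_table(records, columns):
    """Render a list of dicts as a Markdown table with the given columns."""

    def cell(value):
        text = "" if value is None else str(value)
        # Escape pipes and flatten newlines so each record stays on one row.
        return text.replace("|", "\\|").replace("\n", " ").strip()

    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "|".join("---" for _ in columns) + "|",
    ]
    for record in records:
        lines.append("| " + " | ".join(cell(record.get(c)) for c in columns) + " |")
    return "\n".join(lines)


# Hypothetical sample: one JSON.stringify(...) result per scraped page.
raw_results = [
    '{"title": "Post 1", "body": "First post"}',
    '{"title": "Post 2", "body": null}',
]
records = [json.loads(raw) for raw in raw_results]
print(to_markdown_table(records, ["title", "body"]))
```

Missing or `null` fields (for example, when `querySelector` finds nothing) render as empty cells rather than raising.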
# Discover
obscura fetch https://example.com/blog --quiet --dump links
# Collect (after filtering to post URLs)
obscura scrape https://example.com/blog/post-1 https://example.com/blog/post-2 \
--eval "JSON.stringify({title: document.title, body: document.querySelector('article')?.innerText})" \
--format json
obscura fetch https://news.ycombinator.com --quiet \
--eval "JSON.stringify(Array.from(document.querySelectorAll('.athing')).map(el => ({
title: el.querySelector('.titleline > a')?.textContent,
url: el.querySelector('.titleline > a')?.href,
score: document.getElementById('score_' + el.id)?.textContent
})))"
# Step 1: get product links
obscura fetch https://shop.example.com/products --quiet \
--eval "JSON.stringify(Array.from(document.querySelectorAll('.product-link')).map(a => a.href))"
# Step 2: scrape each product
obscura scrape <product-url-1> <product-url-2> ... \
--eval "JSON.stringify({name: document.querySelector('h1')?.textContent, price: document.querySelector('.price')?.textContent})" \
--format json
| Situation | Action |
|---|---|
| Login required at any step | Stop — use Playwright instead |
| Infinite scroll / load-more button | Use --selector to wait for content; clicking is not possible |
| > 50 URLs | Split into batches of 20–30 |
| Rate limiting / 429 errors | Drop --concurrency to 2, add delay between batches |
| CAPTCHA | Stop — obscura cannot solve CAPTCHAs |
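The batching and rate-limit rows can be sketched in Python as follows. This is an illustration under assumptions: the helper names, the batch size of 25, and the 2-second delay are arbitrary choices, and only the `obscura scrape` flags already shown in this document are used.

```python
import subprocess
import time


def batches(urls, size=25):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def scrape_command(urls, eval_expr, concurrency=5):
    """Build the argv for one `obscura scrape` call."""
    return ["obscura", "scrape", *urls,
            "--eval", eval_expr,
            "--concurrency", str(concurrency),
            "--format", "json"]


def plan(urls, eval_expr, size=25, concurrency=5):
    """Return one command per batch."""
    return [scrape_command(b, eval_expr, concurrency) for b in batches(urls, size)]


def run_batches(commands, delay_seconds=2.0, runner=subprocess.run):
    """Run each command in turn, sleeping between batches to ease rate limits."""
    outputs = []
    for index, command in enumerate(commands):
        if index:
            time.sleep(delay_seconds)
        result = runner(command, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs


# Hypothetical input: 60 product URLs, so three batches (25, 25, 10).
urls = [f"https://shop.example.com/products/{i}" for i in range(60)]
commands = plan(urls, "JSON.stringify({name: document.title})", concurrency=2)
print([len(c) - 8 for c in commands])  # URLs per command
```

Calling `run_batches(commands)` would then execute the batches one at a time; passing a different `runner` makes it possible to dry-run the plan without invoking the CLI.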
For index pages with pagination (e.g., blog page 1, 2, 3...):
Check for pagination links (rel="next", /page/N patterns):
# Detect pagination
obscura fetch https://example.com/blog --quiet \
--eval "JSON.stringify(Array.from(document.querySelectorAll('.pagination a, a[rel=next]')).map(a => ({text: a.textContent.trim(), href: a.href})))"
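Once the pagination links are detected, the list of index pages to fetch can be derived from them. The Python sketch below assumes the JSON array produced by the command above (objects with `text` and `href`), a `/page/N` URL pattern, and that the starting index URL is page 1; `page_urls` is a hypothetical helper.

```python
import json
import re

# Matches a trailing /page/N or /page/N/ path segment.
PAGE_RE = re.compile(r"/page/(\d+)/?$")


def page_urls(pagination_json, index_url):
    """Return the index URL plus every distinct /page/N link, in page order."""
    links = json.loads(pagination_json)
    pages = {1: index_url}
    for link in links:
        match = PAGE_RE.search(link["href"])
        if match:
            # setdefault keeps the first URL seen for each page number.
            pages.setdefault(int(match.group(1)), link["href"])
    return [pages[n] for n in sorted(pages)]


# Hypothetical output of the pagination-detection command.
sample = json.dumps([
    {"text": "2", "href": "https://example.com/blog/page/2"},
    {"text": "3", "href": "https://example.com/blog/page/3"},
    {"text": "Next", "href": "https://example.com/blog/page/2"},
])
print(page_urls(sample, "https://example.com/blog"))
```

Only pages linked from the current page are found this way; if later pages reveal further page numbers, the detection step would need to be repeated on the last page found. Sites that paginate with query strings (for example `?page=2`) would need a different pattern.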
For deeper discovery (e.g., category pages → product listing → product pages):
Level 0: fetch index → extract category URLs
Level 1: scrape categories → extract item URLs
Level 2: scrape items → extract data
At each level, apply URL deduplication and same-domain filtering before proceeding.
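The level-by-level discovery can be sketched as a breadth-first loop in Python. Here `extract_links` is a hypothetical callable standing in for an `obscura fetch` or `obscura scrape` call with a link-extracting `--eval`; the in-memory `site` dictionary exists only to make the example runnable.

```python
from urllib.parse import urlparse


def crawl_levels(start_url, extract_links, depth=2):
    """Breadth-first discovery: returns one de-duplicated URL list per level."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    levels = [[start_url]]
    for _ in range(depth):
        next_level = []
        for url in levels[-1]:
            for link in extract_links(url):
                # Same-domain filter plus de-duplication across all levels.
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    next_level.append(link)
        levels.append(next_level)
    return levels


# Hypothetical site graph: index -> categories -> products.
site = {
    "https://shop.example.com/": [
        "https://shop.example.com/c/a",
        "https://shop.example.com/c/b",
        "https://ads.example.net/x",
    ],
    "https://shop.example.com/c/a": [
        "https://shop.example.com/p/1",
        "https://shop.example.com/p/2",
    ],
    "https://shop.example.com/c/b": [
        "https://shop.example.com/p/2",
        "https://shop.example.com/p/3",
        "https://shop.example.com/",
    ],
}
levels = crawl_levels("https://shop.example.com/", lambda url: site.get(url, []))
for depth, urls in enumerate(levels):
    print(f"Level {depth}: {len(urls)} URL(s)")
```

Because `seen` is shared across levels, a product linked from two categories is scraped once, and links back to the index are ignored.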