obscura-pipeline

Multi-step web data collection pipeline using Obscura. Combines fetch + scrape for discover-then-collect workflows. Use when you need to crawl a site systematically.

Invocation

Slash command

/obscura:obscura-pipeline

User invocable

Model invocable

Inline context

Default effort

Context Preview

Multi-step data collection with Obscura. Combines `obscura fetch` (discover) and `obscura scrape` (collect) into a pipeline.

SKILL.md

122 lines · ~974 tokens

Stats

Language: Rust
Stars: 2
Forks: 1
Maintenance: Excellent
Last commit: Jun 25, 2026

Usage

/obscura-pipeline <index-url> [--extract-links-selector <css>] [--eval <js>] [--concurrency <N>]

Core pattern

1. fetch index/listing page  →  extract target URLs
2. filter/deduplicate URLs
3. scrape target URLs in parallel  →  extract data
4. aggregate results

Step-by-step instructions

Step 1 — Discover URLs

obscura fetch <index-url> --quiet --dump links

Or, with a selector to target a specific group of links:

obscura fetch <index-url> --quiet \
  --eval "JSON.stringify(Array.from(document.querySelectorAll('<selector> a')).map(a => a.href))"

Step 2 — Filter

Remove duplicates
Remove off-domain links
Remove pagination/nav links if not needed
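
The filtering rules above can be sketched as a small shell function. This is a minimal sketch, assuming the discovered links have already been saved one per line; that may not match what `obscura fetch` prints (the `--eval` variant above emits a JSON array, which would need converting first). `filter_urls` is a hypothetical helper name, not part of Obscura.

```shell
# filter_urls HOST  (hypothetical helper, not an Obscura command)
# Reads URLs on stdin, one per line (an assumption about how the links were
# saved), strips #fragments, keeps only http(s) URLs whose host is exactly
# HOST, and removes duplicates while preserving first-seen order.
filter_urls() {
  host="$1"
  sed 's/#.*$//' |
    awk -v host="$host" '
      {
        url = $0
        rest = url
        # Drop anything that is not an http(s) link (mailto:, javascript:, ...)
        if (sub(/^https?:\/\//, "", rest) == 0) next
        # After removing the scheme, the first "/"-separated part is host[:port]
        split(rest, parts, "/")
        if (parts[1] != host) next
        # Print each URL only the first time it is seen
        if (!(url in seen)) { seen[url] = 1; print url }
      }'
}
```

For example, `filter_urls example.com < links.txt > targets.txt` would leave only same-host, de-duplicated links. Pagination or navigation links could then be dropped with an extra `grep -v` on whatever path pattern the site uses.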

Step 3 — Collect

obscura scrape <url1> <url2> ... \
  --eval "<extraction expression>" \
  --concurrency 5 \
  --format json

Step 4 — Aggregate

Parse JSON output and structure the result as the user needs (table, list, file).

Concrete examples

Blog post index → all post titles + content

# Discover
obscura fetch https://example.com/blog --quiet --dump links

# Collect (after filtering to post URLs)
obscura scrape https://example.com/blog/post-1 https://example.com/blog/post-2 \
  --eval "JSON.stringify({title: document.title, body: document.querySelector('article')?.innerText})" \
  --format json

Hacker News front page → titles + scores

obscura fetch https://news.ycombinator.com --quiet \
  --eval "JSON.stringify(Array.from(document.querySelectorAll('.athing')).map(el => ({
    title: el.querySelector('.titleline > a')?.textContent,
    url: el.querySelector('.titleline > a')?.href,
    score: document.getElementById('score_' + el.id)?.textContent
  })))"

Product listing → all product pages → prices

# Step 1: get product links
obscura fetch https://shop.example.com/products --quiet \
  --eval "JSON.stringify(Array.from(document.querySelectorAll('.product-link')).map(a => a.href))"

# Step 2: scrape each product
obscura scrape <product-url-1> <product-url-2> ... \
  --eval "JSON.stringify({name: document.querySelector('h1')?.textContent, price: document.querySelector('.price')?.textContent})" \
  --format json

Limitations & when to stop

Situation	Action
Login required at any step	Stop — use Playwright instead
Infinite scroll / load-more button	Use `--selector` to wait for the content; clicks are not possible, so a load-more button cannot be triggered
More than 50 URLs	Split into batches of 20–30
Rate limiting / 429 errors	Drop `--concurrency` to 2, add delay between batches
CAPTCHA	Stop — obscura cannot solve CAPTCHAs
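
The batching and delay advice in the table can be sketched with `xargs -n`. This is one possible approach, not something Obscura provides: `run_in_batches` is a hypothetical helper, and it assumes the target URLs are stored one per line and contain no quotes, backslashes, or whitespace (which `xargs` would interpret).

```shell
# run_in_batches SIZE DELAY CMD [ARGS...]  (hypothetical helper)
# Reads URLs on stdin and runs "CMD ARGS... <urls>" once per batch of at most
# SIZE URLs, sleeping DELAY seconds after each batch.
run_in_batches() {
  size="$1"
  delay="$2"
  shift 2
  # xargs -n caps how many stdin items are appended to each invocation.
  # The inner sh receives: DELAY, then the command with its batch of URLs.
  # It runs the command, pauses, and passes the command's exit status on.
  # Note: xargs implementations differ on whether empty input still runs once.
  xargs -n "$size" sh -c 'delay="$1"; shift; "$@"; rc=$?; sleep "$delay"; exit "$rc"' sh "$delay" "$@"
}
```

A call such as `run_in_batches 25 5 obscura scrape --eval "<extraction expression>" --concurrency 2 --format json < targets.txt` would issue one scrape per 25 URLs with a 5-second pause. How the JSON output of separate batches should be merged depends on Obscura's output format and is not covered by this sketch.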

Pagination handling

For index pages with pagination (e.g., blog page 1, 2, 3...):

1. Fetch first index page
2. Extract pagination links (look for "Next", ">", rel="next", /page/N patterns)
3. Fetch each subsequent index page to discover more URLs
4. Collect ALL discovered URLs across all pagination pages
5. Then run the scrape step on the complete URL set

# Detect pagination
obscura fetch https://example.com/blog --quiet \
  --eval "JSON.stringify(Array.from(document.querySelectorAll('.pagination a, a[rel=next]')).map(a => ({text: a.textContent.trim(), href: a.href})))"
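
Walking the pagination links until no new index pages turn up can be sketched as a queue with a visited list. This is a sketch under stated assumptions: `crawl_pages` is a hypothetical helper, and its first argument is a command you supply that takes one index-page URL and prints that page's pagination hrefs one per line (for instance a wrapper around the detection command above that unpacks its JSON).

```shell
# crawl_pages LINKS_CMD START_URL [MAX_PAGES]  (hypothetical helper)
# LINKS_CMD URL must print the pagination links found on URL, one per line.
# Prints every distinct index page visited, in visit order.
crawl_pages() {
  links_cmd="$1"
  max="${3:-50}"                 # safety cap so a link loop cannot run forever
  todo=$(mktemp)
  seen=$(mktemp)
  printf '%s\n' "$2" > "$todo"
  n=0
  while [ -s "$todo" ] && [ "$n" -lt "$max" ]; do
    # Pop the first queued URL
    url=$(head -n 1 "$todo")
    tail -n +2 "$todo" > "$todo.next" && mv "$todo.next" "$todo"
    [ -n "$url" ] || continue
    # Skip pages that were already visited
    grep -qxF -- "$url" "$seen" && continue
    printf '%s\n' "$url" >> "$seen"
    n=$((n + 1))
    # Queue whatever pagination links this page exposes
    "$links_cmd" "$url" >> "$todo"
  done
  cat "$seen"
  rm -f "$todo" "$seen"
}
```

The printed list is the full set of index pages; each one would then go through the discovery step before the combined URL set is scraped.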

Multi-level pipeline

For deeper discovery (e.g., category pages → product listing → product pages):

Level 0: fetch index → extract category URLs
Level 1: scrape categories → extract item URLs
Level 2: scrape items → extract data

At each level, apply URL deduplication and same-domain filtering before proceeding.
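
Chaining the levels can be sketched as a pipeline of identical stages, each turning a list of parent URLs into a de-duplicated list of child URLs. `expand_level` is a hypothetical helper, and the extractor command passed to it is assumed to print child URLs one per line for a given parent URL; neither is part of Obscura.

```shell
# expand_level EXTRACT_CMD  (hypothetical helper)
# For every URL on stdin, runs "EXTRACT_CMD URL", which is assumed to print the
# child URLs found on that page one per line, then removes duplicates across
# the whole level while keeping first-seen order.
expand_level() {
  cmd="$1"
  while IFS= read -r url; do
    if [ -n "$url" ]; then
      # </dev/null stops the extractor from consuming the remaining URL list
      "$cmd" "$url" < /dev/null
    fi
  done | awk '!seen[$0]++'
}
```

With two hypothetical extractors, `printf '%s\n' "$index_url" | expand_level list_categories | expand_level list_items` would yield the Level 2 item URLs. A same-domain filter would slot in between stages as one more pipe.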
