From ScraperAPI
Guides setup and usage of ScraperAPI's DataPipeline for scheduled, managed scraping projects with webhook or dashboard delivery.
Slash command: /scraperapi:scraperapi-datapipeline
DataPipeline is a managed scraping product. You define a project (what to scrape, how often, where to send results), and ScraperAPI runs it on your schedule without you managing proxies, retries, or infrastructure.
Use DataPipeline when: scraping runs on a fixed schedule, the input list is large (up to 100,000 items), results should flow to a webhook automatically, or you want email notifications on job completion.
Base URL: https://datapipeline.scraperapi.com/api
Auth: ?api_key=YOUR_KEY (query parameter on every request)
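Because the key travels as a query parameter on every request, it can help to build request URLs in one place. The following is a minimal sketch using only the standard library; `pipeline_url` is a hypothetical helper name, not part of any ScraperAPI SDK.

```python
from urllib.parse import urlencode

BASE = "https://datapipeline.scraperapi.com/api"


def pipeline_url(path: str, api_key: str) -> str:
    """Return BASE + path with the api_key query parameter appended.

    Hypothetical helper for illustration; the key is URL-encoded.
    """
    return f"{BASE}/{path.lstrip('/')}?{urlencode({'api_key': api_key})}"


# prints https://datapipeline.scraperapi.com/api/projects?api_key=YOUR_KEY
print(pipeline_url("projects", "YOUR_KEY"))
```

When using `requests`, passing `params={"api_key": ...}` achieves the same result; the helper is only useful where a full URL string is needed.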
Set projectType in the create request to choose what to scrape:
| Type | Input |
|---|---|
| urls | Raw HTML from any URL |
| urls_with_js | Same but with JavaScript rendering |
| google_search | Search queries |
| google_news | Search queries |
| google_jobs | Search queries |
| google_shopping | Search queries |
| google_maps | Search queries |
| amazon_product | ASINs |
| amazon_search | Search queries |
| amazon_offers | ASINs |
| walmart_product | Product IDs |
| walmart_search | Search queries |
| walmart_category | Category IDs |
| walmart_reviews | Product IDs |
| ebay_product | 12-digit product IDs |
| ebay_search | Search queries |
| redfin_listing_for_sale | Listing URLs |
| redfin_listing_for_rent | Listing URLs |
| redfin_listing_search | Search result URLs |
| redfin_agent_details | Agent profile URLs |
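Since each project type expects a particular kind of input, a quick local check before creating a project can catch obviously malformed items. This is a rough sketch under stated assumptions: only the 12-digit eBay rule comes from the table above, the 10-character ASIN pattern is a common convention rather than something this guide documents, and `invalid_items` is a hypothetical helper.

```python
import re

# Assumed input shapes. Only the 12-digit eBay rule comes from the table above;
# the 10-character ASIN pattern is an assumption, not a documented rule.
INPUT_PATTERNS = {
    "amazon_product": re.compile(r"[A-Z0-9]{10}"),
    "amazon_offers": re.compile(r"[A-Z0-9]{10}"),
    "ebay_product": re.compile(r"\d{12}"),
}


def invalid_items(project_type: str, items: list[str]) -> list[str]:
    """Return the items that do not match the assumed shape for project_type.

    Project types without a pattern here are not checked and yield [].
    """
    pattern = INPUT_PATTERNS.get(project_type)
    if pattern is None:
        return []
    return [item for item in items if not pattern.fullmatch(item)]


# prints ['not-an-asin']
print(invalid_items("amazon_product", ["B09V3KXJPB", "not-an-asin"]))
```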
import os, requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]
BASE = "https://datapipeline.scraperapi.com/api"

project = requests.post(
    f"{BASE}/projects",
    params={"api_key": API_KEY},
    json={
        "name": "Weekly Amazon price monitor",
        "projectType": "amazon_product",
        "schedulingEnabled": True,
        "scrapingInterval": "weekly",
        "scheduledAt": "now",
        "projectInput": {
            "type": "list",
            "list": ["B09V3KXJPB", "B08N5WRWNW"]  # ASINs
        },
        "apiParams": {
            "country_code": "us"
        },
        "webhookOutput": {
            "url": "https://yourapp.com/pipeline-results",
            "webhookEncoding": "multipart_form_data_encoding"
        },
        "notificationConfig": {
            "notifyOnSuccess": "with_every_run",
            "notifyOnFailure": "with_every_run"
        }
    }
).json()
print(f"Project created: id={project['id']}")
| Field | Required | Description |
|---|---|---|
| name | No | Human-readable project name |
| projectType | Yes | What to scrape (see table above) |
| schedulingEnabled | No | true to enable recurring schedule |
| scrapingInterval | Yes (if scheduled) | See scheduling options below |
| scheduledAt | No | "now" to run immediately on create |
| projectInput | Yes | Input data (see input methods below) |
| apiParams | No | Standard ScraperAPI parameters |
| webhookOutput | No | Webhook delivery config |
| notificationConfig | No | Email notification settings |
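The required-field rules in the table above can be checked locally before sending a create request. The sketch below encodes only what the table states; `check_project_payload` is a hypothetical helper, and the server may enforce additional rules not covered here.

```python
def check_project_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a create-project payload.

    Encodes only the rules from the field table: projectType and projectInput
    are required, and scrapingInterval is required when scheduling is enabled.
    """
    problems = []
    for field in ("projectType", "projectInput"):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    if payload.get("schedulingEnabled") and "scrapingInterval" not in payload:
        problems.append("scrapingInterval is required when schedulingEnabled is true")
    return problems


payload = {"projectType": "amazon_product", "schedulingEnabled": True}
for problem in check_project_payload(payload):
    print(problem)
```

An empty list means the payload passes these local checks, not that the API is guaranteed to accept it.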
{
  "projectInput": {
    "type": "list",
    "list": ["query one", "query two", "B09V3KXJPB"]
  }
}
Upload a CSV with one URL/query/ASIN per line — no header rows, no commas. Do this through the dashboard when creating a project; the API accepts list inputs only.
{
  "projectInput": {
    "type": "webhook",
    "webhookUrl": "https://yourapp.com/input-items"
  }
}
ScraperAPI polls your webhook URL for the item list when the job starts. One item per line; no commas. Useful for dynamically generated lists (e.g., new ASINs added since the last run).
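On your side, the input webhook only needs to return the current item list as plain text. The sketch below shows one possible shape using the standard library. It assumes the list is fetched with a GET request and served as `text/plain`, neither of which this guide confirms; `render_item_list`, `current_items`, and `InputItemsHandler` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler


def render_item_list(items):
    """Join items one per line, rejecting items that contain a comma or newline."""
    for item in items:
        if "," in item or "\n" in item:
            raise ValueError(f"item not allowed in input list: {item!r}")
    return "\n".join(items)


def current_items():
    """Hypothetical item source; replace with a database query or similar."""
    return ["B09V3KXJPB", "B08N5WRWNW"]


class InputItemsHandler(BaseHTTPRequestHandler):
    """Serves the item list as plain text, one item per line."""

    def do_GET(self):  # assumption: the item list is requested with GET
        body = render_item_list(current_items()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("", 8000), InputItemsHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would start the server; a production deployment would normally sit behind a proper web framework and HTTPS.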
| scrapingInterval | Description |
|---|---|
| "once" | Run a single job immediately |
| "hourly" | Every hour |
| "daily" | Once per day |
| "weekly" | Once per week |
| "monthly" | Once per month |
| "cron" | Custom cron expression (use cron field instead of interval) |
Recurring schedules (hourly, daily, weekly, monthly, cron) require a paid plan.
Set "scheduledAt": "now" to trigger the first run immediately when the project is created.
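The scheduling rules above can be folded into a small helper that produces the scheduling part of a create payload. This is a sketch under assumptions: `schedule_fields` is a hypothetical name, it treats "once" as a non-recurring run with `schedulingEnabled` set to false, and it deliberately leaves the cron expression itself to the caller because this guide does not show the exact shape of that field.

```python
RECURRING = {"hourly", "daily", "weekly", "monthly", "cron"}
INTERVALS = RECURRING | {"once"}


def schedule_fields(interval: str, paid_plan: bool, run_now: bool = True) -> dict:
    """Build the scheduling fields for a create-project payload.

    Raises ValueError for an unknown interval or a recurring interval on a
    free plan. For "cron", the expression must be added by the caller.
    """
    if interval not in INTERVALS:
        raise ValueError(f"unknown scrapingInterval: {interval!r}")
    if interval in RECURRING and not paid_plan:
        raise ValueError(f"{interval!r} schedules require a paid plan")
    fields = {
        # Assumption: one-off runs do not need scheduling enabled.
        "schedulingEnabled": interval in RECURRING,
        "scrapingInterval": interval,
    }
    if run_now:
        fields["scheduledAt"] = "now"
    return fields


print(schedule_fields("weekly", paid_plan=True))
```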
Results are POSTed to your webhook URL as they complete. The webhookEncoding field controls
the format:
{
  "webhookOutput": {
    "url": "https://yourapp.com/results",
    "webhookEncoding": "multipart_form_data_encoding"
  }
}
Omit webhookOutput and results are saved for download in the
DataPipeline dashboard. Results are retained for
30 days then automatically deleted.
Output formats by project type:
urls / urls_with_js → HTML wrapped in JSONL
# List all projects
projects = requests.get(f"{BASE}/projects", params={"api_key": API_KEY}).json()

# Get a single project
project = requests.get(f"{BASE}/projects/525", params={"api_key": API_KEY}).json()

# Update (partial update; only include fields to change)
requests.patch(
    f"{BASE}/projects/525",
    params={"api_key": API_KEY},
    json={
        "scrapingInterval": "daily",
        "apiParams": {"premium": True},
        "notificationConfig": {"notifyOnSuccess": "never"}
    }
)

# Delete / archive (irreversible without support)
requests.delete(f"{BASE}/projects/525", params={"api_key": API_KEY})
Updatable fields: scrapingInterval, scheduledAt, outputFormat, apiParams, notificationConfig.
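Because only the fields listed above can be changed, it may be worth filtering a PATCH body locally before sending it. A minimal sketch, assuming the list of updatable fields above is complete; `build_patch` is a hypothetical helper.

```python
UPDATABLE_FIELDS = {
    "scrapingInterval",
    "scheduledAt",
    "outputFormat",
    "apiParams",
    "notificationConfig",
}


def build_patch(changes: dict) -> dict:
    """Return a copy of changes, or raise ValueError if any field is not updatable."""
    rejected = sorted(set(changes) - UPDATABLE_FIELDS)
    if rejected:
        raise ValueError(f"fields cannot be updated: {', '.join(rejected)}")
    return dict(changes)


print(build_patch({"scrapingInterval": "daily", "apiParams": {"premium": True}}))
```

The returned dictionary can be passed as the `json=` argument of the PATCH request.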
# List jobs for a project
jobs = requests.get(
    f"{BASE}/projects/525/jobs",
    params={"api_key": API_KEY}
).json()

# Cancel a running job
job_id = "YOUR_JOB_ID"  # placeholder: use a job ID from the listing above
requests.delete(
    f"{BASE}/projects/525/jobs/{job_id}",
    params={"api_key": API_KEY}
)
# Running requests within the job finish first; final status becomes "Cancelled"
A new job can only start if no other job for that project is currently running.
{
  "notificationConfig": {
    "notifyOnSuccess": "with_every_run",
    "notifyOnFailure": "with_every_run"
  }
}
Options for both fields: "never", "with_every_run", "daily", "weekly".
apiParams Reference
All standard ScraperAPI parameters are supported inside apiParams:
| Parameter | Purpose |
|---|---|
| country_code | Geotarget (e.g. "us", "gb") |
| render | JavaScript rendering |
| premium | Premium residential proxies |
| ultra_premium | Ultra-premium proxies (mutually exclusive with premium) |
| device_type | "desktop" or "mobile" |
| output_format | "text" or "markdown" for LLM pipelines |
| autoparse | Structured JSON extraction for supported sites |
| keep_headers | Forward custom headers |
| follow_redirect | Control redirect handling |
| wait_for_selector | Wait for CSS selector (requires render: true) |
| screenshot | Capture screenshot (auto-enables rendering) |
| retry_404 | Retry 404 responses |
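Two of the constraints in the table above are easy to get wrong: `premium` and `ultra_premium` cannot be combined, and `wait_for_selector` needs rendering turned on. The sketch below checks just those two rules; `check_api_params` is a hypothetical helper and does not validate anything else about the parameters.

```python
def check_api_params(params: dict) -> list[str]:
    """Return problems for the two apiParams constraints stated in the table."""
    problems = []
    if params.get("premium") and params.get("ultra_premium"):
        problems.append("premium and ultra_premium are mutually exclusive")
    if params.get("wait_for_selector") and not params.get("render"):
        problems.append("wait_for_selector requires render: true")
    return problems


print(check_api_params({"premium": True, "ultra_premium": True}))
print(check_api_params({"render": True, "wait_for_selector": "#price"}))
```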
DataPipeline uses the same underlying credit rates as the Standard API. Cost is the sum of all requests in a job run. Preview the estimated cost before launching a project from the dashboard.
Only successful 200 and 404 responses are charged; failed requests are not.
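As a worked example of the charging rule, the sketch below counts only 200 and 404 responses toward cost. The per-request credit rate is a caller-supplied input (the value 5 is purely illustrative, not an actual ScraperAPI rate), and the function assumes every request in the job is billed at the same rate; `credits_used` is a hypothetical helper.

```python
CHARGED_STATUSES = {200, 404}


def credits_used(status_codes, credits_per_request):
    """Sum credits for a job run, charging only 200 and 404 responses.

    Assumes a single uniform rate for all requests in the job.
    """
    charged = sum(1 for code in status_codes if code in CHARGED_STATUSES)
    return charged * credits_per_request


# Three of the five responses are charged (200, 200, 404); prints 15
print(credits_used([200, 200, 404, 500, 403], credits_per_request=5))
```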
| Limit | Value |
|---|---|
| Max input items | 100,000 per job |
| Direct list input | 500 items |
| Data retention | 30 days |
| Free plan concurrency | 5 connections |
| Free plan scheduling | One-time runs only |
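The two size limits above suggest a simple rule for picking an input method. The sketch below assumes that lists too large for direct input can be supplied through a CSV upload or an input webhook instead, which this guide implies but does not state outright; `choose_input_method` is a hypothetical helper.

```python
MAX_ITEMS_PER_JOB = 100_000
MAX_DIRECT_LIST_ITEMS = 500


def choose_input_method(item_count: int) -> str:
    """Suggest an input method for a job with item_count input items."""
    if item_count > MAX_ITEMS_PER_JOB:
        raise ValueError(
            f"{item_count} items exceeds the {MAX_ITEMS_PER_JOB} per-job limit; split the job"
        )
    if item_count <= MAX_DIRECT_LIST_ITEMS:
        return "list"
    # Assumption: larger inputs go through CSV upload or an input webhook.
    return "csv_or_webhook"


print(choose_input_method(120))     # prints list
print(choose_input_method(25_000))  # prints csv_or_webhook
```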
npx claudepluginhub scraperapi/scraperapi-skills --plugin scraperapi