Help us improve
Share bugs, ideas, or general feedback.
From brightdata-plugin
Use Bright Data's Discover API for intent-ranked semantic web search. Trigger jobs via REST API, CLI, or Python/JS SDKs. Retrieve relevance-scored results with optional content. For RAG and research.
npx claudepluginhub brightdata/skills --plugin brightdata-pluginHow this skill is triggered — by the user, by Claude, or both
Slash command
/brightdata-plugin:discover-apiThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Discover is **intent-ranked semantic web search**. You give it a `query` plus an
Searches the web via Bright Data CLI: `bdata search` for Google/Bing/Yandex SERP results, `bdata discover` for intent-ranked semantic discovery with optional page content. Hands off to `scrape` or `data-feeds` when appropriate.
Executes web searches via Tavily, filters and extracts content inside Python so only curated results enter the context window. Activates on research queries like "search for", "look up", or "find".
Orchestrates deep research using Exa search API for lead generation, literature reviews, competitive analysis, and exhaustive searches.
Share bugs, ideas, or general feedback.
Discover is intent-ranked semantic web search. You give it a query plus an
intent, and it returns results scored by AI relevance — optionally with the full
parsed page content. It is the right primitive when result quality/relevance
matters more than raw keyword rank, and the building block for retrieval (RAG),
research, and knowledge-base pipelines.
Discover vs. the neighbors:
search skill (bdata search).data-feeds.live-research or rag-pipeline (both call this API).task_id.task_id until status is "done" (intermediate: "processing").results[].The CLI and SDKs do the trigger+poll for you; the raw REST flow is shown below for
when you need parameters the wrappers don't expose (notably mode).
| You are… | Use |
|---|---|
| In a terminal, one-off or scripted | CLI: bdata discover |
| Writing Node/TS code | JS SDK: client.discover() — see js-sdk-best-practices |
| Writing Python code | Python SDK: client.discover() — see python-sdk-best-practices |
Need mode (deep/fast/zeroRanking) or include_images | Raw REST (wrappers don't expose these yet) |
bdata discoverSetup gate first:
command -v bdata >/dev/null 2>&1 || echo "CLI missing — see bright-data-best-practices/references/cli-setup.md"
bdata zones >/dev/null 2>&1 || echo "not authenticated — run: bdata login"
# Intent-ranked discovery, JSON
bdata discover "enterprise LLM platforms" \
--intent "vendor pages with pricing" \
--num-results 15 --json --pretty
# With parsed page content in one pass (for RAG / research)
bdata discover "webhook retry best practices" \
--include-content --num-results 10 -o results.json
# Date-bounded
bdata discover "react server components" \
--start-date 2025-01-01 --end-date 2025-12-31 --num-results 20 --json
Results live at .results[]; each has title, link, description,
relevance_score, and content when --include-content. Full CLI flag list:
search skill → references/flags.md.
// JS — see js-sdk-best-practices for all options.
// VERIFIED v1.1.0: discover() returns a WRAPPER { success, data:[...], totalResults, cost, taskId, ... }
const res = await client.discover('Tesla battery tech', { intent: 'EV battery breakthroughs', numResults: 10, includeContent: true });
const rows = res.data; // ← rows are in .data (NOT a bare array, NOT .results)
# Python — see python-sdk-best-practices (confirm whether rows come back directly, under .data, or .results)
out = client.discover(query="Tesla battery tech", intent="EV battery breakthroughs")
mode)# 1) Trigger
task_id=$(curl -s -X POST https://api.brightdata.com/discover \
-H "Authorization: Bearer $BRIGHTDATA_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query":"post-quantum cryptography adoption","intent":"enterprise migration guides","mode":"deep","num_results":20,"include_content":true}' \
| jq -r '.task_id')
# 2) Poll until done
while :; do
resp=$(curl -s "https://api.brightdata.com/discover?task_id=$task_id" -H "Authorization: Bearer $BRIGHTDATA_API_TOKEN")
[ "$(echo "$resp" | jq -r '.status')" = "done" ] && break
sleep 3
done
echo "$resp" | jq '.results'
query is required; everything else is optional. The CLI/SDK expose a subset
(see references/api-reference.md for the exact per-surface matrix).
| Param | Type | Default | Notes |
|---|---|---|---|
query | string | — | required, ≤ 1500 chars |
intent | string | — | goal descriptor, ≤ 3000 chars; strongly recommended — drives ranking |
mode | enum | standard | standard | zeroRanking | deep | fast (REST-only) |
num_results | int | — | 1–20; ignored in zeroRanking |
filter_keywords | string[] | — | exact keywords that must appear |
include_content | bool | false | parsed page/PDF content (PDF ≤ 50 MB, 30s); unsupported in zeroRanking |
include_images | bool | false | image array (REST-only) |
format | enum | json | json | md (SDK accepts only json) |
country | string | US | 2-letter ISO |
city | string | — | SERP city targeting |
language | string | en | 31 languages |
start_date / end_date | string | — | YYYY-MM-DD (REST-only) |
remove_duplicates | bool | true | dedupe results (REST-only) |
| Mode | What it does | Use for |
|---|---|---|
standard (default) | balanced depth + AI ranking | general intent search |
deep | exhaustive, broader search; slower | live-research, comprehensive topic coverage |
fast | optimized for low latency | time-sensitive / interactive |
zeroRanking | no AI ranking, max raw volume; ignores num_results, no include_content | bulk corpus collection for rag-pipeline |
modeis currently REST-only — the CLI and SDKs don't expose it. Fordeepcoverage via the CLI/SDK, approximate with a highnum_results+ a sharpintent; for truedeep/zeroRanking, use the raw REST flow above.
REST + CLI (verified) — rows live under results:
{
"status": "done",
"duration_seconds": 12.4,
"timestamp": "2026-06-08T08:36:55.709Z",
"results": [
{ "link": "https://…", "title": "…", "description": "…",
"relevance_score": 0.87, "content": "…(when --include-content)…" }
]
}
JS SDK (verified v1.1.0) — rows live under data, inside a wrapper:
{ success: true, data: [ {link, title, description, relevance_score, content?} ],
totalResults, cost, taskId, query, intent, durationSeconds, triggerSentAt, dataFetchedAt }
⚠️ Cross-surface gotcha: REST/CLI return rows under .results; the JS SDK
returns them under .data (and wraps everything in {success, ...} — check
success before reading data). Don't assume one shape across surfaces.
relevance_score is a float (snake_case). Higher = more relevant to intent.
content is plain text by default (Markdown when REST format=md). A high
relevance_score does not guarantee good content — pages can be 404 stubs or
nav-only; gate on content length + "not found"/block-page signatures before use.
task_id (REST: status:"ok"). No id → check auth / that Discover is enabled on the account (403 if disabled).status:"done" before reading — never read results while processing.results[] non-empty — if empty, the query/intent is too narrow; loosen and retry. Don't claim success on empty.include_content bodies aren't block pages — grep content for captcha, Just a moment, Access Denied, cf-browser-verification (same list as scrape). Drop poisoned rows.relevance_scores are low or off-topic, sharpen intent (not just query).query with no intent — you lose the whole point (intent ranking). Always give an intent.search.num_results > 20 — capped at 20; for more, run multiple targeted queries and dedup (see live-research).zeroRanking then expecting include_content or num_results to apply — they don't.status:"done".success:false (SDK) / empty results (REST/CLI) as a hard error — it's often transient; retry once with backoff first (see references/api-reference.md → Transient failures).relevance_score or content when a call fails — report the failure instead.references/api-reference.md — full REST endpoint spec, the exact param matrix per surface (REST vs CLI vs JS SDK vs Python SDK), error codes, and limits.live-research — multi-query Discover → dedup → synthesized cited brief.rag-pipeline — Discover (include_content) → chunk → embed → retrieve.search — keyword SERP (bdata search) when you don't need intent ranking.js-sdk-best-practices / python-sdk-best-practices — client.discover() option details.