nimble-crawl-reference

Install

Install the plugin:

npx claudepluginhub anthropics/claude-plugins-official --plugin nimble

Want just this skill? Install it with: npx claudepluginhub u/[userId]/[slug]

Description

Reference for the nimble crawl command. Load when bulk-crawling many pages asynchronously. Contains: the async workflow (create → status → tasks results), all flags, polling guidelines, the critical rule to use task_id (not crawl_id) for results, and a crawl vs map comparison.

Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

nimble crawl — reference

Asynchronous bulk crawling: nimble crawl run starts a crawl job and returns a crawl_id; you then poll for status and retrieve per-page content via individual task_ids.

Tip: For LLM use, prefer nimble map + nimble extract --format markdown (faster, cleaner). Use crawl when you need raw HTML archives of many pages at once.

Table of Contents

1. Create crawl (async)
2. Get crawl status
3. List crawls
4. Cancel crawl
Polling guidelines
Retrieving page content
Crawl vs Map

1. Create crawl (async)

Parameters:

| Parameter | CLI flag | Type | Default | Description |
| --- | --- | --- | --- | --- |
| url | --url | string | required | Starting URL |
| limit | --limit | int | 5000 | Max pages (1–10,000); always set this |
| name | --name | string | | Label for tracking |
| sitemap | --sitemap | string | include | include \| only \| skip |
| include_path | --include-path | string[] | | URL path regex to include (repeatable) |
| exclude_path | --exclude-path | string[] | | URL path regex to exclude (repeatable) |
| max_discovery_depth | --max-discovery-depth | int | 5 | Max link depth from start (1–20) |
| crawl_entire_domain | --crawl-entire-domain | bool | false | Follow sibling/parent paths |
| allow_subdomains | --allow-subdomains | bool | false | Follow links to subdomains |
| allow_external_links | --allow-external-links | bool | false | Follow links to external domains |
| ignore_query_parameters | --ignore-query-parameters | bool | false | Deduplicate query-param variants |
| callback | --callback | string | | Webhook URL for real-time page notifications |
| extract_options | --extract-options | object | | Extraction config applied to each page |

CLI:

nimble crawl run --url "https://docs.example.com" --limit 50 --name "docs-crawl"

Python SDK:

import os
from nimble_python import Nimble

nimble = Nimble(api_key=os.environ["NIMBLE_API_KEY"])

resp = nimble.crawl.run(url="https://docs.example.com", limit=50, name="docs-crawl")
crawl_id = resp.crawl_id
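
The scoping options accept the same call; a sketch of a path-restricted crawl, assuming the SDK keyword names mirror the parameter column of the table above:

resp = nimble.crawl.run(
    url="https://docs.example.com",
    limit=200,
    name="docs-v2-only",
    include_path=["/docs/v2/.*"],   # repeatable flag -> list of regexes (assumed)
    exclude_path=["/docs/v1/.*"],
    max_discovery_depth=3,
    ignore_query_parameters=True,
)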

Response fields: crawl_id, status (initial: queued), url, name, crawl_options, created_at

Crawl status values: queued → running → succeeded / failed / canceled


2. Get crawl status

Parameters:

| Parameter | CLI flag | Type | Description |
| --- | --- | --- | --- |
| id | --id | string | crawl_id (required) |

CLI:

nimble crawl status --id "abc-123"

Python SDK:

crawl = nimble.crawl.status(id=crawl_id)
print(crawl.status, crawl.completed, crawl.total)

Response adds: total, pending, completed, failed, tasks[]

| Field | Description |
| --- | --- |
| tasks[].task_id | Use with nimble tasks results to fetch page content |
| tasks[].status | pending / processing / completed / failed |

CRITICAL: Use tasks[].task_id — NOT crawl_id — to fetch page content. Using crawl_id with tasks returns 404.
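
For example, to pull the per-page task_ids of finished pages out of a status response (a sketch; tasks[] entries are assumed to expose task_id and status as attributes, matching the field table above):

crawl = nimble.crawl.status(id=crawl_id)

# Only completed tasks have content to fetch; collect their task_ids,
# which (not the crawl_id) are what "nimble tasks results" consumes.
done = [t.task_id for t in crawl.tasks if t.status == "completed"]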


3. List crawls

Parameters:

| Parameter | CLI flag | Type | Description |
| --- | --- | --- | --- |
| status | --status | string | Filter: queued \| running \| completed \| failed \| canceled |
| limit | --limit | int | Results per page |
| cursor | --cursor | string | Pagination cursor |

CLI:

nimble crawl list --status running

Python SDK:

result = nimble.crawl.list()
# result.data = list of crawl objects
# result.pagination = { has_next, next_cursor, total }
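
To walk every page of results, a sketch using the documented pagination fields (cursor support on list() is inferred from the flag table, and attribute access on the pagination object is assumed):

# Follow next_cursor until pagination reports no further pages.
result = nimble.crawl.list()
crawls = list(result.data)
while result.pagination.has_next:
    result = nimble.crawl.list(cursor=result.pagination.next_cursor)
    crawls.extend(result.data)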

4. Cancel crawl

Parameters:

| Parameter | CLI flag | Type | Description |
| --- | --- | --- | --- |
| id | --id | string | crawl_id (required) |

CLI:

nimble crawl terminate --id "abc-123"

Python SDK:

nimble.crawl.terminate(id=crawl_id)
# Returns: {"status": "canceled"}
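
Combined with the status filter from section 3, cancellation can be applied in bulk; a sketch (the status keyword on list() and attribute access on crawl objects are assumptions):

# Cancel every crawl that is still running.
for c in nimble.crawl.list(status="running").data:
    nimble.crawl.terminate(id=c.crawl_id)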

Polling guidelines

CRAWL_ID=$(nimble crawl run --url "https://example.com" --limit 100 --name "my-crawl" \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['crawl_id'])")

while true; do
  STATUS=$(nimble crawl status --id "$CRAWL_ID" \
    | python3 -c "import json,sys; print(json.load(sys.stdin).get('status',''))")
  echo "Status: $STATUS"
  [ "$STATUS" = "succeeded" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "canceled" ] && break
  sleep 30
done

| Crawl size | Poll interval |
| --- | --- |
| < 50 pages | every 15–30s |
| 50–500 pages | every 30–60s |
| 500+ pages | every 60–120s |
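
The same loop via the Python SDK, for reference (a sketch; terminal statuses per the lifecycle in section 1):

import time

# Poll until the crawl reaches a terminal status, sleeping per the
# interval table above (30s suits crawls in the 50-500 page range).
while True:
    crawl = nimble.crawl.status(id=crawl_id)
    print(f"{crawl.status}: {crawl.completed}/{crawl.total} pages done")
    if crawl.status in ("succeeded", "failed", "canceled"):
        break
    time.sleep(30)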

Retrieving page content

After the crawl has succeeded, fetch each completed task (see the nimble-tasks reference):

nimble tasks results --task-id "task-456"
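
In Python, the per-task fetch might look like the sketch below; note that nimble.tasks.results and its task_id keyword are assumptions inferred from the CLI command, not confirmed by this reference:

crawl = nimble.crawl.status(id=crawl_id)
pages = {}
for t in crawl.tasks:
    if t.status == "completed":
        # Assumed SDK mirror of "nimble tasks results --task-id ...".
        pages[t.task_id] = nimble.tasks.results(task_id=t.task_id)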

Crawl vs Map

| | map | crawl |
| --- | --- | --- |
| Speed | Seconds (sync) | Minutes (async) |
| Output | URL list with metadata | Full page content per URL |
| Best for | Discover URLs, then selectively extract | Archive all pages at once |
| LLM use | ✅ Combine with extract --format markdown | ⚠️ Returns raw HTML |