From xberg-enterprise
Extracts documents via the cloud API (POST /v1/extract) with JSON, multipart, or URL crawl shapes. Use when the local CLI is missing or you need async/server-side extraction.
How this skill is triggered — by the user, by Claude, or both
Slash command
/xberg-enterprise:offloading-extractionThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
`POST /v1/extract` is the single submit endpoint. It returns `202 Accepted`
POST /v1/extract is the single submit endpoint. It returns 202 Accepted
with job_ids (extraction) and crawl_job_ids (URL crawls) — never the
extraction result inline. Pair every submit with either a poll loop
(tracking-cloud-jobs skill) or a webhook.
kreuzberg CLI is not installed.presigned-uploads instead — the
base64 JSON body is too big.POST https://api.xberg.io/v1/extract
Authorization: Bearer $KREUZBERG_API_KEY
Content-Type: application/json | multipart/form-data
Returns 202 Accepted with ExtractResponse.
curl -X POST https://api.xberg.io/v1/extract \
-H "Authorization: Bearer $KREUZBERG_API_KEY" \
-H "Content-Type: application/json" \
-d @- <<JSON
{
"documents": [
{
"filename": "invoice.pdf",
"mime_type": "application/pdf",
"data": "$(base64 -w0 invoice.pdf)"
}
],
"options": {
"extraction_config": {
"output_format": "markdown",
"ocr": { "backend": "tesseract", "language": "eng" }
}
}
}
JSON
curl -X POST https://api.xberg.io/v1/extract \
-H "Authorization: Bearer $KREUZBERG_API_KEY" \
-F "[email protected];type=application/pdf" \
-F 'options={"extraction_config":{"output_format":"markdown"}};type=application/json'
Add a webhook part as a JSON string:
-F 'webhook={"url":"https://hooks.example.com/x","secret":"shh"};type=application/json'
curl -X POST https://api.xberg.io/v1/extract \
-H "Authorization: Bearer $KREUZBERG_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [{"url": "https://example.com/docs"}],
"crawl_config": {"max_depth": 2, "max_pages": 50, "stay_on_domain": true},
"webhook": {"url": "https://hooks.example.com/x"}
}'
URL crawls return crawl_job_ids instead of (or alongside) job_ids.
{
"job_ids": ["550e8400-e29b-41d4-a716-446655440000"],
"crawl_job_ids": [],
"status": "pending"
}
status is always pending at submit time; the per-job status is
retrieved via GET /v1/jobs/{id}.
options blockShape mirrors the local ExtractionConfig:
{
"extraction_config": {
"output_format": "markdown",
"ocr": { "backend": "tesseract", "language": "eng+deu" },
"extract_tables": true,
"extract_images": false,
"chunking": { "max_chars": 4000, "overlap": 200 }
}
}
Supported output_format values: markdown, text, json, djot,
html. Default is markdown.
webhook block{
"url": "https://hooks.example.com/x",
"secret": "shared-secret-32-bytes-min",
"metadata": { "request_id": "abc123", "user_id": "u_42" }
}
secret is the HMAC key used to sign the webhook payload — see
tracking-cloud-jobs for verification. metadata is echoed back in the
delivered payload, useful for correlating server-side requests.
import { KreuzbergCloud } from "@kreuzberg/cloud";
import { readFile } from "node:fs/promises";
const client = new KreuzbergCloud({ apiKey: process.env.KREUZBERG_API_KEY! });
const data = await readFile("invoice.pdf");
const job = await client.extract({
file: { name: "invoice.pdf", data, mimeType: "application/pdf" },
options: { extractionConfig: { outputFormat: "markdown" } },
});
console.log(job.id, job.status);
For submit + wait in one call:
const result = await client.extractAndWait({
file: { name: "invoice.pdf", data },
});
console.log(result.result?.content);
from pathlib import Path
from xberg_enterprise import KreuzbergCloud
with KreuzbergCloud(api_key=os.environ["KREUZBERG_API_KEY"]) as client:
job = client.extract(file=Path("invoice.pdf"))
print(job.id, job.status)
Submit + wait:
job = client.extract_and_wait(file=Path("invoice.pdf"))
print(job.result.content if job.result else job.status)
JSON: pass multiple entries in documents. Multipart: repeat the file
part. SDKs expose extractBatch / extract_batch helpers that fan out
correctly per platform (parallel HTTP for the async Python client,
sequential for the sync one).
| Status | Cause | Fix |
|---|---|---|
400 | Empty documents and urls | Provide at least one. |
400 | Bad MIME type | Use a real RFC 6838 type, e.g. application/pdf. |
401 | Missing Bearer | Set Authorization header. |
413 | Request body too large | Switch to presigned uploads. |
429 | Quota or rate limit | Backoff; check quota_remaining via /v1/usage. |
After every submit, hand off to the tracking-cloud-jobs skill — cloud
extraction is asynchronous and the result is delivered via either polling
or webhook callback. Never assume a result is ready immediately after the
202 response.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub xberg-io/plugins