Skill

xberg-enterprise

Managed document intelligence API for cloud extraction with webhook delivery, presigned uploads, versioning, and usage tracking. Use instead of the local kreuzberg CLI when files are remote, large, or need async processing.

api-development

Slash command

/xberg-enterprise:kreuzberg-cloud

SKILL.md

348 lines · ~2.7k tokens

Stats

Language: Shell

Parent stars: 5

Maintenance: Excellent

Last commit: Jun 25, 2026

Xberg Enterprise

Xberg Enterprise is the managed extraction API hosted at https://api.xberg.io. It exposes the same Rust extraction engine as the local kreuzberg CLI, with two extras: jobs are asynchronous (webhook or polling delivery) and large files go through presigned uploads instead of in-band base64.

Use this skill when writing code that:

- Hits api.xberg.io directly via HTTP.
- Uses the @kreuzberg/cloud (npm) or kreuzberg-cloud-sdk (PyPI) SDKs.
- Configures webhooks, sandbox keys, or usage queries.

v0.1.0 limitation

The xberg-enterprise plugin v0.1.0 ships skills only — no MCP server. The xberg-enterprise CLI binary that hosts the MCP server lands in plugin v0.2.0. Until then, prefer one of:

- The TypeScript SDK (@kreuzberg/cloud): ESM, tree-shakable, generated from the OpenAPI 3.1 spec.
- The Python SDK (kreuzberg-cloud-sdk): sync + async, from_sandbox() helper for evaluation.
- Raw curl: every example below shows the curl form first.

When cloud vs local

| Situation | Use |
| --- | --- |
| You already have the `kreuzberg` CLI installed and the file is on disk | Local (`kreuzberg` plugin) |
| File is on a remote URL or in S3 / GCS | Cloud |
| Need OCR for languages the local Tesseract install doesn't have | Cloud |
| File is larger than ~50 MB | Cloud (presigned uploads) |
| Want webhook delivery rather than blocking the caller | Cloud |
| Batch of mixed documents with shared options | Either; cloud parallelizes server-side |
| No network access, air-gapped environment | Local |
| Evaluating before committing to install | Cloud sandbox key |

Getting an API key

Three options, in order of preference for production:

1. Production key: sign up at https://xberg.io/cloud and mint a key from the dashboard. Format: sk_live_*.
2. Sandbox key: no signup, 24-hour TTL, 50-page quota, rate-limited to 10 keys per IP per 24 hours. Format: sk_sandbox_*. See the sandbox-keys skill.
3. Local ~/.kreuzberg/cloud.toml file, for shell sessions. Put the key in it as follows:
```
# ~/.kreuzberg/cloud.toml
api_key = "sk_live_..."
```

The plugin's SessionStart hook checks the KREUZBERG_API_KEY environment variable first, then ~/.kreuzberg/cloud.toml, and emits a setup reminder if neither is present.

Authentication

Every request — except POST /v1/sandbox/key — uses a Bearer token:

```
curl https://api.xberg.io/v1/usage \
  -H "Authorization: Bearer $KREUZBERG_API_KEY"
```

Both sk_live_* and sk_sandbox_* go in the same header. The server resolves project context from the key.
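For callers not using an SDK, the same header can be attached with the Python standard library. This is a sketch only: `authed_request` is a hypothetical helper, and the key shown is a placeholder. It builds the request without sending it.

```python
import urllib.request

BASE_URL = "https://api.xberg.io"


def authed_request(path, api_key, method="GET", body=None):
    """Build (but do not send) a request carrying the Bearer token."""
    req = urllib.request.Request(BASE_URL + path, data=body, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


# Placeholder key; sk_live_* and sk_sandbox_* keys use the same header.
req = authed_request("/v1/usage", "sk_sandbox_example")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` sends it.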

Base URL and versioning

- Base: https://api.xberg.io
- Path prefix: /v1/ for all extraction, jobs, sandbox, uploads, and usage endpoints. Health endpoints (/healthz, /readyz) are unversioned.
- The OpenAPI 3.1 spec is published at https://api.xberg.io/openapi.json; the full reference is at https://enterprise.xberg.io.

The 12 endpoints

Twelve operations across seven tag groups:

health (2)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/healthz` | Liveness: returns 200 if the process is up. |
| GET | `/readyz` | Readiness: returns 200 only when downstream deps are healthy. |

Neither requires auth. Use /readyz for uptime monitors and /healthz for load-balancer health checks.

extract (1)

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/v1/extract` | Submit one or more documents (or URLs) for extraction. |

Accepts application/json (base64 documents) or multipart/form-data (binary file parts). Returns 202 Accepted with job_ids (extraction jobs) and crawl_job_ids (URL-crawl jobs). Pair with GET /v1/jobs/{id} to retrieve results — or supply a webhook block to receive them asynchronously. See the offloading-extraction skill.

jobs (1)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/v1/jobs/{id}` | Get the current status and (if terminal) result of a job. |

Accepts both extraction job IDs and crawl job IDs. Response shape varies: extraction jobs return JobResponse, crawl jobs return CrawlJobResponse. See the tracking-cloud-jobs skill.

documents (4)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/v1/documents/{document_id}` | Latest version of a document with its extraction result. |
| GET | `/v1/documents/{document_id}/diff?from={v}&to={v}` | Compute a diff between two versions (synchronous, with an async fallback when the budget is exceeded). |
| GET | `/v1/documents/{document_id}/diff/{diff_job_id}` | Poll the status of an async diff job. |
| GET | `/v1/documents/{document_id}/versions` | List all versions of a document (paginated). |

For applications that re-process the same document over time. Each extraction returns a document_id that's stable across versions.
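As an illustration of the diff endpoint's query parameters, the sketch below builds the request URL. `diff_url` is a hypothetical helper, and both the `doc-123` identifier and the use of integers for versions are assumptions; the source does not state the format of either.

```python
from urllib.parse import quote, urlencode


def diff_url(document_id, from_version, to_version, base="https://api.xberg.io"):
    """Build the GET URL for a diff between two versions of one document."""
    query = urlencode({"from": from_version, "to": to_version})
    return f"{base}/v1/documents/{quote(str(document_id), safe='')}/diff?{query}"


# "doc-123" and the integer versions are placeholders for illustration.
print(diff_url("doc-123", 1, 3))
```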

uploads (2)

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/v1/uploads/presign` | Generate per-file presigned PUT URLs. |
| POST | `/v1/uploads/confirm` | Confirm the uploads and start processing. |

Three-step flow for files larger than ~50 MB: presign → PUT to storage → confirm. See the presigned-uploads skill.
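The choice between in-band submission and the three-step flow can be expressed as a size check. In this sketch `upload_steps` is a hypothetical helper, and treating "~50 MB" as exactly 50 MiB is an assumption; the request and response bodies of the upload endpoints are left to the presigned-uploads skill.

```python
# The text gives "~50 MB" as the point where presigned uploads take over;
# treating it as exactly 50 MiB is an assumption made for this sketch.
PRESIGN_THRESHOLD_BYTES = 50 * 1024 * 1024


def upload_steps(size_bytes):
    """Return the ordered API calls for a file of the given size."""
    if size_bytes <= PRESIGN_THRESHOLD_BYTES:
        return ["POST /v1/extract"]
    return [
        "POST /v1/uploads/presign",
        "PUT <presigned URL>",
        "POST /v1/uploads/confirm",
    ]


print(upload_steps(120 * 1024 * 1024))
```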

sandbox (1)

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/v1/sandbox/key` | Mint an ephemeral sandbox API key. |

24-hour TTL, 50-page quota, 10 keys per IP per 24 hours. No auth required. See the sandbox-keys skill.

usage (1)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/v1/usage` | Per-project usage statistics and remaining quota. |

Accepts optional start and end ISO-8601 query params. Defaults to the current calendar month. See the managing-cloud-usage skill.
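A small sketch of how the optional range might be attached to the request URL. `usage_url` is a hypothetical helper, and sending date-only ISO-8601 values (rather than full timestamps) is an assumption.

```python
from datetime import date
from urllib.parse import urlencode


def usage_url(start=None, end=None, base="https://api.xberg.io"):
    """Build the GET /v1/usage URL; omit both dates for the server default."""
    params = {}
    if start is not None:
        params["start"] = start.isoformat()
    if end is not None:
        params["end"] = end.isoformat()
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/v1/usage{query}"


print(usage_url(date(2026, 5, 1), date(2026, 6, 1)))
```

Calling `usage_url()` with no arguments yields the bare endpoint, which the text says defaults to the current calendar month.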

Key request / response shapes

`ExtractJsonRequest`

```
{
  "documents": [
    {
      "filename": "invoice.pdf",
      "mime_type": "application/pdf",
      "data": "<base64>"
    }
  ],
  "urls": [
    { "url": "https://example.com/docs" }
  ],
  "options": {
    "extraction_config": {
      "output_format": "markdown",
      "ocr": { "backend": "tesseract", "language": "eng" }
    }
  },
  "crawl_config": {
    "max_depth": 2,
    "max_pages": 50,
    "stay_on_domain": true
  },
  "webhook": {
    "url": "https://example.com/webhook",
    "secret": "shared-hmac-secret",
    "metadata": { "request_id": "abc123" }
  }
}
```

Either documents or urls is required (or both). webhook, options, and crawl_config are optional.
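The request body can be assembled programmatically. The sketch below uses only field names that appear in the example above; `build_extract_request` itself is a hypothetical helper, not part of either SDK, and it enforces the "documents or urls" rule on the client side.

```python
import base64
import json


def build_extract_request(documents=(), urls=(), options=None, webhook=None):
    """Assemble an ExtractJsonRequest body.

    documents is an iterable of (filename, mime_type, raw_bytes) tuples.
    """
    if not documents and not urls:
        raise ValueError("provide at least one document or URL")
    body = {}
    if documents:
        body["documents"] = [
            {
                "filename": name,
                "mime_type": mime,
                "data": base64.b64encode(raw).decode("ascii"),
            }
            for name, mime, raw in documents
        ]
    if urls:
        body["urls"] = [{"url": u} for u in urls]
    if options is not None:
        body["options"] = options
    if webhook is not None:
        body["webhook"] = webhook
    return body


payload = build_extract_request(
    documents=[("note.txt", "text/plain", b"Hello world")],
    options={"extraction_config": {"output_format": "markdown"}},
)
print(json.dumps(payload, indent=2))
```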

`ExtractResponse` (202)

```
{
  "job_ids": ["550e8400-e29b-41d4-a716-446655440000"],
  "crawl_job_ids": ["660e9400-f39c-51e5-b827-557766551111"],
  "status": "pending"
}
```

`JobResponse` (200)

```
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "invoice.pdf",
  "status": "completed",
  "created_at": "2025-12-21T10:00:00Z",
  "processing_time_ms": 1234,
  "result": {
    "content": "Invoice total: $1,234.56",
    "mime_type": "text/markdown",
    "tables": [],
    "images": [],
    "metadata": { "title": "Invoice #12345" }
  }
}
```

`JobStatus` enum

```
awaiting_upload | pending | processing | chunking | aggregating
                | completed | partial_success | failed | cancelled
```

Terminal states: completed, partial_success, failed, cancelled. Stop polling when any of those appears.
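A polling loop built on these terminal states might look like the following sketch. `poll_until_terminal` and its `fetch_job` argument are hypothetical: `fetch_job` stands for any callable that performs GET /v1/jobs/{id} and returns the parsed response as a dict. The interval and timeout defaults are illustrative and not taken from the API documentation.

```python
import time

TERMINAL_STATES = {"completed", "partial_success", "failed", "cancelled"}


def poll_until_terminal(fetch_job, job_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_job(job_id) until the job reaches a terminal state.

    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP client and makes it easy to exercise with a fake in tests.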

`UsageResponse` (200)

```
{
  "period_start": "2026-05-01",
  "period_end": "2026-06-01",
  "total_pages": 5432,
  "total_documents": 87,
  "total_failed": 2,
  "quota_limit": 100000,
  "quota_remaining": 94568,
  "by_mime_type": {
    "application/pdf": { "documents": 65, "pages": 3200, "failed": 1 }
  }
}
```
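In the UsageResponse example above, quota_limit minus quota_remaining equals total_pages (100000 - 94568 = 5432), which suggests the quota is counted in pages. The sketch below derives two summary figures from such a response. `quota_summary` is a hypothetical helper, and reading total_failed as a share of total_documents is an assumption.

```python
def quota_summary(usage):
    """Summarise a UsageResponse dict as (quota fraction used, failure rate)."""
    limit = usage["quota_limit"]
    used = limit - usage["quota_remaining"]
    docs = usage["total_documents"]
    used_fraction = used / limit if limit else 0.0
    failure_rate = usage["total_failed"] / docs if docs else 0.0
    return used_fraction, failure_rate


# Values copied from the example response.
sample = {
    "total_pages": 5432,
    "total_documents": 87,
    "total_failed": 2,
    "quota_limit": 100000,
    "quota_remaining": 94568,
}
used, failed = quota_summary(sample)
print(f"quota used: {used:.2%}, failed documents: {failed:.2%}")
```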

Error model

All errors are JSON with at least an error string field. Status codes follow REST conventions:

| Status | Meaning | Typical cause |
| --- | --- | --- |
| `400` | Bad request | Missing required field, malformed body, invalid UUID. |
| `401` | Unauthorized | Missing or invalid `Authorization` header. |
| `404` | Not found | Job / document ID doesn't exist in this project. |
| `429` | Rate limited | Sandbox-key IP throttle or per-key quota. |
| `500` | Server error | Database failure or worker crash; retry with backoff. |
| `503` | Service unavailable | Downstream dependency unhealthy; retry. |

The SDKs surface these as typed exceptions: AuthError, ValidationError, NotFoundError, RateLimitError (carries retry_after), ServerError, TimeoutError, all extending KreuzbergCloudError (Python) / KreuzbergError (TypeScript).
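For raw HTTP callers, the retry guidance in the table can be turned into a small decision function. This is a sketch under stated assumptions: `retry_delay` is a hypothetical helper, retrying on 429 is an assumption drawn from the `retry_after` value the SDKs expose, and the backoff base and cap are illustrative.

```python
import random

# 500 and 503 are marked retryable in the table; 429 is included on the
# assumption that waiting for retry_after is the intended response.
RETRYABLE = {429, 500, 503}


def retry_delay(status, attempt, retry_after=None, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if not retryable."""
    if status not in RETRYABLE:
        return None
    if status == 429 and retry_after is not None:
        return float(retry_after)
    # Exponential backoff with full jitter; base and cap are illustrative.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A `None` result (for example on 400, 401, or 404) signals that the request itself needs fixing and should not be resent unchanged.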

Concrete examples

Sandbox onboarding (no signup)

```
# Mint an ephemeral key.
curl -X POST https://api.xberg.io/v1/sandbox/key
# → { "api_key": "sk_sandbox_...", "expires_at": "...", "pages_remaining": 50 }
```

```
import { KreuzbergCloud } from "@kreuzberg/cloud";
const client = await KreuzbergCloud.fromSandbox();
const result = await client.extractAndWait({
  file: new Blob(["Hello world"], { type: "text/plain" }),
});
console.log(result.result?.content);
```

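```
import asyncio
from xberg_enterprise import AsyncKreuzbergCloud

async def main() -> None:
    # `async with` is only valid inside a coroutine, so wrap the calls in main().
    async with await AsyncKreuzbergCloud.from_sandbox() as client:
        job = await client.extract_and_wait(file=b"hello world")
        print(job.status, job.result and job.result.content)

asyncio.run(main())
```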

Single-file extract → poll

```
# 1. Submit. (-w0 is the GNU coreutils flag that disables line wrapping;
#    other base64 implementations may need a different flag.)
JOB_ID=$(curl -sX POST https://api.xberg.io/v1/extract \
  -H "Authorization: Bearer $KREUZBERG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"filename": "invoice.pdf", "mime_type": "application/pdf",
       "data": "'"$(base64 -w0 invoice.pdf)"'"}
    ],
    "options": {"extraction_config": {"output_format": "markdown"}}
  }' | jq -r '.job_ids[0]')

# 2. Fetch the job; repeat until the status is terminal.
curl -s "https://api.xberg.io/v1/jobs/$JOB_ID" \
  -H "Authorization: Bearer $KREUZBERG_API_KEY"
```

Other skills

- offloading-extraction: full POST /v1/extract workflow with options.
- tracking-cloud-jobs: polling cadence, webhook signatures.
- presigned-uploads: three-step flow for files >50 MB.
- versioning-documents: retrieve a document, list its versions, diff two versions.
- managing-cloud-usage: quota and per-MIME breakdown.
- sandbox-keys: when to recommend sandbox over production keys.

References

- API docs: https://enterprise.xberg.io
- OpenAPI spec: https://api.xberg.io/openapi.json
- TypeScript SDK: https://www.npmjs.com/package/@kreuzberg/cloud
- Python SDK: https://pypi.org/project/kreuzberg-cloud-sdk/
- Pricing and signup: https://xberg.io/cloud
