Processes PDFs via API to extract markdown text and structured JSON data with AI confidence scores and quality flags for human review. Free 2,000 pages/month tier.
npx claudepluginhub joshuarweaver/cascade-ai-ml-agents-misc-2 --plugin sundial-org-awesome-openclaw-skills-4
This skill uses the workspace's default tool permissions.
OCR that never fails silently. Process PDFs and extract structured data with AI-powered confidence scoring that tells you exactly which fields need human review.
DeepRead is a production-grade document processing API that reduces human review from 100% to ~10% through intelligent quality assessment.
Core Features:
Sign up and create an API key:
# Visit the dashboard
https://www.deepread.tech/dashboard
# Or use this direct link
https://www.deepread.tech/dashboard/?utm_source=clawdhub
Save your API key:
export DEEPREAD_API_KEY="sk_live_your_key_here"
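On the Python side, the exported key can be read from the environment once and reused for every request. This is a minimal sketch: `auth_headers` is a hypothetical helper name, while the `DEEPREAD_API_KEY` variable and the `X-API-Key` header come from the examples in this guide.

```python
import os


def auth_headers(env=os.environ):
    """Build the X-API-Key header from the DEEPREAD_API_KEY environment variable."""
    api_key = env.get("DEEPREAD_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("DEEPREAD_API_KEY is not set; export it before calling the API")
    return {"X-API-Key": api_key}
```

Passing the environment as a parameter keeps the helper easy to exercise in tests without touching the real process environment.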
Add to your clawdbot.config.json5:
{
skills: {
entries: {
"deepread": {
enabled: true,
apiKey: "sk_live_your_key_here"
}
}
}
}
Option A: With Webhook (Recommended)
# Upload PDF with webhook notification
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-app.com/webhooks/deepread"
# Returns immediately
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
# Your webhook receives results when processing completes (2-5 minutes)
Option B: Poll for Results
# Upload PDF without webhook
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf"
# Returns immediately
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
# Poll until completed
curl https://api.deepread.tech/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
-H "X-API-Key: $DEEPREAD_API_KEY"
Extract text as clean markdown:
# With webhook (recommended)
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "webhook_url=https://your-app.com/webhook"
# OR poll for completion
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf"
# Then poll
curl https://api.deepread.tech/v1/jobs/JOB_ID \
-H "X-API-Key: $DEEPREAD_API_KEY"
Response when completed:
{
"id": "550e8400-...",
"status": "completed",
"result": {
"text": "# INVOICE\n\n**Vendor:** Acme Corp\n**Total:** $1,250.00..."
}
}
Extract specific fields with confidence scoring:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F 'schema={
"type": "object",
"properties": {
"vendor": {
"type": "string",
"description": "Vendor company name"
},
"total": {
"type": "number",
"description": "Total invoice amount"
},
"invoice_date": {
"type": "string",
"description": "Invoice date in MM/DD/YYYY format"
}
}
}'
Response includes confidence flags:
{
"status": "completed",
"result": {
"text": "# INVOICE\n\n**Vendor:** Acme Corp...",
"data": {
"vendor": {
"value": "Acme Corp",
"hil_flag": false,
"found_on_page": 1
},
"total": {
"value": 1250.00,
"hil_flag": false,
"found_on_page": 1
},
"invoice_date": {
"value": "2024-10-??",
"hil_flag": true,
"reason": "Date partially obscured",
"found_on_page": 1
}
},
"metadata": {
"fields_requiring_review": 1,
"total_fields": 3,
"review_percentage": 33.3
}
}
}
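The `metadata` counts in a response like the one above can be recomputed from the per-field flags, which is handy for sanity checks or for summarizing a filtered subset of fields. How the service itself derives these numbers is not documented here; the sketch below simply reproduces the figures in the sample, and `summarize_review` is a hypothetical helper that assumes every entry in `data` is a flat object carrying `hil_flag`.

```python
def summarize_review(data):
    """Count flagged fields in a result["data"] mapping and compute the review share."""
    total = len(data)
    flagged = [name for name, field in data.items() if field.get("hil_flag")]
    percentage = round(100.0 * len(flagged) / total, 1) if total else 0.0
    return {
        "fields_requiring_review": len(flagged),
        "total_fields": total,
        "review_percentage": percentage,
        "flagged_fields": flagged,
    }


# Field values copied from the sample response.
sample = {
    "vendor": {"value": "Acme Corp", "hil_flag": False, "found_on_page": 1},
    "total": {"value": 1250.00, "hil_flag": False, "found_on_page": 1},
    "invoice_date": {
        "value": "2024-10-??",
        "hil_flag": True,
        "reason": "Date partially obscured",
        "found_on_page": 1,
    },
}
summary = summarize_review(sample)
print(summary)
```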
Extract arrays and nested objects:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F 'schema={
"type": "object",
"properties": {
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}'
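When the request is made from Python rather than curl, the schema has to be serialized to a JSON string before it is sent as a form field. The sketch below builds the non-file form fields; `build_process_fields` is a hypothetical helper, and the commented-out upload assumes the third-party `requests` package and the same `file`, `schema`, and `webhook_url` field names as the curl examples.

```python
import json


def build_process_fields(schema, webhook_url=None):
    """Return the non-file multipart form fields for POST /v1/process."""
    fields = {"schema": json.dumps(schema)}
    if webhook_url:
        fields["webhook_url"] = webhook_url
    return fields


line_item_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "price": {"type": "number"},
                },
            },
        },
    },
}
fields = build_process_fields(line_item_schema)

# Sending the request (requires `requests`, a real key, and a local PDF):
# import requests
# with open("invoice.pdf", "rb") as pdf:
#     response = requests.post(
#         "https://api.deepread.tech/v1/process",
#         headers={"X-API-Key": api_key},
#         files={"file": pdf},
#         data=fields,
#     )
```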
Get per-page OCR results with quality flags:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@contract.pdf" \
-F "include_pages=true"
Response:
{
"result": {
"text": "Combined text from all pages...",
"pages": [
{
"page_number": 1,
"text": "# Contract Agreement\n\n...",
"hil_flag": false
},
{
"page_number": 2,
"text": "Terms and C??diti??s...",
"hil_flag": true,
"reason": "Multiple unrecognized characters"
}
],
"metadata": {
"pages_requiring_review": 1,
"total_pages": 2
}
}
}
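A per-page response like this one can be reduced to a short review list. The following sketch assumes the `result` object has the shape shown above; `pages_needing_review` is a hypothetical helper name.

```python
def pages_needing_review(result):
    """Return (page_number, reason) pairs for pages whose hil_flag is true."""
    return [
        (page["page_number"], page.get("reason", "no reason given"))
        for page in result.get("pages", [])
        if page.get("hil_flag")
    ]


# Page entries copied from the sample response.
sample_result = {
    "text": "Combined text from all pages...",
    "pages": [
        {"page_number": 1, "text": "# Contract Agreement\n\n...", "hil_flag": False},
        {
            "page_number": 2,
            "text": "Terms and C??diti??s...",
            "hil_flag": True,
            "reason": "Multiple unrecognized characters",
        },
    ],
}
flagged_pages = pages_needing_review(sample_result)
print(flagged_pages)
```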
PDF → Convert → Rotate Correction → OCR → Multi-Model Validation → Extract → Done
The pipeline automatically handles:
AI compares extracted text to the original image and sets hil_flag:
hil_flag: false = Clear, confident extraction → Auto-process
hil_flag: true = Uncertain extraction → Human review required
AI flags extractions when:
This is multimodal AI determination, not rule-based.
Create reusable, optimized schemas for specific document types:
# List your blueprints
curl https://api.deepread.tech/v1/blueprints \
-H "X-API-Key: $DEEPREAD_API_KEY"
# Use blueprint instead of inline schema
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "blueprint_id=660e8400-e29b-41d4-a716-446655440001"
Benefits:
How to create blueprints:
# Create a blueprint from training data
curl -X POST https://api.deepread.tech/v1/optimize \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "utility_invoice",
"description": "Optimized for utility invoices",
"document_type": "invoice",
"initial_schema": {
"type": "object",
"properties": {
"vendor": {"type": "string", "description": "Vendor name"},
"total": {"type": "number", "description": "Total amount"}
}
},
"training_documents": ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
"ground_truth_data": [
{"vendor": "Acme Power", "total": 125.50},
{"vendor": "City Electric", "total": 89.25}
],
"target_accuracy": 95.0,
"max_iterations": 5
}'
# Returns: {"job_id": "...", "blueprint_id": "...", "status": "pending"}
# Check optimization status
curl https://api.deepread.tech/v1/blueprints/jobs/JOB_ID \
-H "X-API-Key: $DEEPREAD_API_KEY"
# Use blueprint (once completed)
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "blueprint_id=BLUEPRINT_ID"
Get notified when processing completes instead of polling:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "webhook_url=https://your-app.com/webhooks/deepread"
Your webhook receives this payload when processing completes:
{
"job_id": "550e8400-...",
"status": "completed",
"created_at": "2025-01-27T10:00:00Z",
"completed_at": "2025-01-27T10:02:30Z",
"result": {
"text": "...",
"data": {...}
},
"preview_url": "https://preview.deepread.tech/abc1234"
}
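On the receiving side, a webhook endpoint mainly needs to parse the payload and decide whether the job can be processed automatically. The sketch below assumes the payload arrives as a JSON request body with the fields shown above; the shape of a payload for a failed job is not shown in this guide, so any status other than `completed` is routed to a generic "investigate" path. `handle_webhook` and the action names are hypothetical.

```python
import json


def handle_webhook(body):
    """Parse a webhook body (bytes or str) and decide how to route the job."""
    payload = json.loads(body)
    job_id = payload["job_id"]
    if payload.get("status") != "completed":
        return {"job_id": job_id, "action": "investigate", "review_fields": []}
    data = (payload.get("result") or {}).get("data") or {}
    # Collect the fields the service marked as needing human review.
    review_fields = [
        name
        for name, field in data.items()
        if isinstance(field, dict) and field.get("hil_flag")
    ]
    action = "human_review" if review_fields else "auto_process"
    return {"job_id": job_id, "action": action, "review_fields": review_fields}
```

Keeping the routing logic in a plain function means it can be called from whichever web framework serves the endpoint.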
Benefits:
Share OCR results without authentication:
# Request preview URL
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "include_images=true"
# Get preview URL in response
{
"result": {
"text": "...",
"data": {...}
},
"preview_url": "https://preview.deepread.tech/Xy9aB12"
}
Public Preview Endpoint:
# No authentication required
curl https://api.deepread.tech/v1/preview/Xy9aB12
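In the examples above, the token at the end of `preview_url` is the same one used in the public endpoint path. Assuming that relationship holds in general (it is inferred from the examples, not stated), the endpoint URL can be derived as follows; `preview_api_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def preview_api_url(preview_url, api_base="https://api.deepread.tech/v1"):
    """Map a preview_url to the public preview endpoint for the same token."""
    token = urlparse(preview_url).path.strip("/")
    if not token or "/" in token:
        # Anything other than a single path segment does not match the examples.
        raise ValueError(f"unexpected preview URL: {preview_url!r}")
    return f"{api_base}/preview/{token}"


endpoint = preview_api_url("https://preview.deepread.tech/Xy9aB12")
print(endpoint)
```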
Upgrade: https://www.deepread.tech/dashboard/billing?utm_source=clawdhub
Every response includes quota information:
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 1847
X-RateLimit-Used: 153
X-RateLimit-Reset: 1730419200
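These headers can be turned into a small quota summary on the client. The sketch assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this guide does not state; `parse_quota` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_quota(headers):
    """Convert the X-RateLimit-* headers into integers and a UTC reset time."""
    reset_epoch = int(headers["X-RateLimit-Reset"])
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "used": int(headers["X-RateLimit-Used"]),
        # Assumption: the reset value is seconds since the Unix epoch.
        "resets_at": datetime.fromtimestamp(reset_epoch, tz=timezone.utc),
    }


# Header values copied from the example above.
quota = parse_quota({
    "X-RateLimit-Limit": "2000",
    "X-RateLimit-Remaining": "1847",
    "X-RateLimit-Used": "153",
    "X-RateLimit-Reset": "1730419200",
})
print(quota["remaining"], quota["resets_at"].isoformat())
```

A plain `dict` is case-sensitive, so when reading headers from an HTTP client, use its case-insensitive header mapping or normalize the names first.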
✅ Recommended: Webhook notifications
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-app.com/webhook"
Only use polling if:
✅ Good: Descriptive field descriptions
{
"vendor": {
"type": "string",
"description": "Vendor company name. Usually in header or top-left of invoice."
}
}
❌ Bad: No description
{
"vendor": {"type": "string"}
}
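A quick pre-flight check can catch schema properties that lack a description before the schema is sent. This is a sketch only: `missing_descriptions` is a hypothetical helper, it handles the `object` and `array` shapes used in this guide, and it is not a full JSON Schema walker.

```python
def missing_descriptions(schema, path=""):
    """Return dotted paths of properties that have no non-empty description."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        here = f"{path}.{name}" if path else name
        if not prop.get("description", "").strip():
            missing.append(here)
        # Recurse into nested objects and into the item schema of arrays.
        if prop.get("type") == "object":
            missing.extend(missing_descriptions(prop, here))
        elif prop.get("type") == "array" and isinstance(prop.get("items"), dict):
            missing.extend(missing_descriptions(prop["items"], here + "[]"))
    return missing


example_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string", "description": "Vendor company name"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "description": "Invoice rows",
            "items": {
                "type": "object",
                "properties": {"price": {"type": "number"}},
            },
        },
    },
}
undocumented = missing_descriptions(example_schema)
print(undocumented)
```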
Only if you can't use webhooks, poll every 5-10 seconds:
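import time
import requests

def wait_for_result(job_id, api_key, interval=5, timeout=600):
    """Poll the job endpoint until the job completes, fails, or the timeout expires."""
    # The 600-second default is an arbitrary ceiling above the 2-5 minute processing time.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(
            f"https://api.deepread.tech/v1/jobs/{job_id}",
            headers={"X-API-Key": api_key},
            timeout=30,
        )
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        result = response.json()
        if result["status"] == "completed":
            return result["result"]
        elif result["status"] == "failed":
            raise Exception(f"Job failed: {result.get('error')}")
        time.sleep(interval)  # wait 5-10 seconds between polls
    raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")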
Separate confident fields from uncertain ones:
def process_extraction(data):
    """Split extracted fields into auto-processable values and a review list."""
    confident = {}
    needs_review = []
    for field, field_data in data.items():
        if field_data["hil_flag"]:
            needs_review.append({
                "field": field,
                "value": field_data["value"],
                "reason": field_data.get("reason")
            })
        else:
            confident[field] = field_data["value"]
    # Auto-process confident fields (save_to_database is your application's own function)
    save_to_database(confident)
    # Send uncertain fields to review queue (send_to_review_queue is likewise your own)
    if needs_review:
        send_to_review_queue(needs_review)
quota_exceeded
{"detail": "Monthly page quota exceeded"}
Solution: Upgrade to PRO or wait until next billing cycle.
invalid_schema
{"detail": "Schema must be valid JSON Schema"}
Solution: Ensure schema is valid JSON and includes type and properties.
file_too_large
{"detail": "File size exceeds 50MB limit"}
Solution: Compress PDF or split into smaller files.
failed
{"status": "failed", "error": "PDF could not be processed"}
Common causes:
{
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "Unique invoice ID"
},
"invoice_date": {
"type": "string",
"description": "Invoice date in MM/DD/YYYY format"
},
"vendor": {
"type": "string",
"description": "Vendor company name"
},
"total": {
"type": "number",
"description": "Total amount due including tax"
},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
{
"type": "object",
"properties": {
"merchant": {
"type": "string",
"description": "Store or merchant name"
},
"date": {
"type": "string",
"description": "Transaction date"
},
"total": {
"type": "number",
"description": "Total amount paid"
},
"items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"}
}
}
}
}
}
{
"type": "object",
"properties": {
"parties": {
"type": "array",
"items": {"type": "string"},
"description": "Names of all parties in the contract"
},
"effective_date": {
"type": "string",
"description": "Contract start date"
},
"term_length": {
"type": "string",
"description": "Duration of contract"
},
"termination_clause": {
"type": "string",
"description": "Conditions for termination"
}
}
}
Ready to start? Get your free API key at https://www.deepread.tech/dashboard/?utm_source=clawdhub