By orbruno
Toolkit for document extraction using Docling - convert PDFs and HTML into structured, citation-rich JSONL for AI processing pipelines
npx claudepluginhub orbruno/docling-ccplugin
Initialize Docling extraction project structure with directories, config, and scripts
Generate production-ready Python script for processing documents with Docling
Validate Docling extract quality and metadata completeness
This agent should be used when the user asks "how should I use Docling", "what Docling configuration do I need", "design Docling workflow", "which chunker should I use", "Granite or standard Docling", "best approach for processing documents", or mentions designing document extraction infrastructure with Docling.
This agent should be used when the user asks "debug Docling script", "customize extraction script", "script not working", "modify processing script", "add feature to script", "script error", or mentions troubleshooting or enhancing generated Docling processing scripts.
This skill should be used when the user asks about "Granite model", "Docling for scanned PDFs", "OCR with Docling", "performance optimization Docling", "complex documents", "table extraction", "multi-column layout", "advanced Docling configuration", or mentions handling challenging document processing scenarios.
This skill should be used when the user asks about "Docling chunking", "HybridChunker", "HierarchicalChunker", "structure-aware chunking", "Docling metadata extraction", "export modes", "DOC_CHUNKS vs MARKDOWN", "chunking strategies", or mentions preparing documents for RAG with Docling.
This skill should be used when the user asks "how to use Docling", "what is Docling", "install Docling", "Docling tutorial", "when to use Docling", "Docling vs PyPDF2", "document extraction with Docling", or mentions getting started with Docling for document processing.
Uses power tools
Uses Bash, Write, or Edit tools
Expert guidance and tooling for document extraction using IBM's Docling library.
The Docling Toolkit plugin provides comprehensive support for using Docling to extract structured data from documents. It helps you convert PDFs, HTML, and other document formats into clean, citation-rich JSONL files ready for downstream AI processing.
Docling is an open-source document processing library developed at IBM Research and donated to the LF AI & Data Foundation. It transforms complex documents into structured, machine-readable data with:
Claude will automatically help with Docling when you:
/docling-scaffold-processor - Generate production-ready document processing script
/docling-init-project - Initialize Docling extraction project structure
/docling-validate-extracts - Validate extract quality and metadata completeness
uv package manager
uv add docling
# or
pip install docling
# From the Claude-Plugins directory
claude plugin install ./docling-toolkit --scope user
# Or use absolute path
claude plugin install /Users/orlandobruno/Documents/Dev/Claude-Plugins/docling-toolkit --scope user
claude plugin list
# Should show "docling-toolkit" in the list
# In your project directory
/docling-init-project my-document-extraction
cd my-document-extraction
This creates:
my-document-extraction/
├── README.md
├── config/
│ └── docling-config.yaml
├── data/
│ ├── raw/ # Place your PDFs/HTML here
│ └── processed/
├── extracts/ # Docling output (JSONL)
├── scripts/
│ ├── process_documents.py
│ └── validate_extracts.py
├── logs/
└── .env.example
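The layout above can also be created by hand. A minimal sketch, assuming empty placeholder files (the slash command presumably writes real config and script contents); `scaffold_project` is a hypothetical helper name, not part of the plugin:

```python
from pathlib import Path

# Directory and file names copied from the tree above. File contents are not
# shown in the source, so this sketch only creates empty placeholders.
DIRECTORIES = ["config", "data/raw", "data/processed", "extracts", "scripts", "logs"]
FILES = [
    "README.md",
    "config/docling-config.yaml",
    "scripts/process_documents.py",
    "scripts/validate_extracts.py",
    ".env.example",
]


def scaffold_project(root):
    """Create the project skeleton under `root` and return the paths it covers."""
    root = Path(root)
    created = []
    for rel in DIRECTORIES:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    for rel in FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # existing files keep their contents
        created.append(path)
    return created
```

Calling `scaffold_project("my-document-extraction")` is safe to repeat, since existing directories and files are left in place.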
/docling-scaffold-processor process_documents --input-types pdf,html
This generates a production-ready Python script with:
# Place PDFs in data/raw/ then run:
uv run python scripts/process_documents.py \
--input-dir data/raw \
--output-file extracts/output.jsonl
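As a rough sketch of what a script accepting these two flags might look like: it walks the input directory, converts each file, and writes one JSON object per line. The `DocumentConverter().convert(...)` and `export_to_markdown()` calls follow Docling's documented basic usage; the function names, the `source`/`text` record fields, and the error handling are assumptions for illustration, not the plugin's generated output.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Convert documents to JSONL")
    parser.add_argument("--input-dir", type=Path, required=True)
    parser.add_argument("--output-file", type=Path, required=True)
    return parser.parse_args(argv)


def make_docling_converter():
    """Build one Docling converter and return a path -> Markdown function."""
    # Imported lazily so the rest of the module loads without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # reused for every file in the batch

    def convert(path):
        return converter.convert(path).document.export_to_markdown()

    return convert


def process_directory(input_dir, output_file, convert=None, suffixes=(".pdf", ".html")):
    """Write one JSON record per converted file; return (ok, failed) counts."""
    input_dir, output_file = Path(input_dir), Path(output_file)
    if convert is None:
        convert = make_docling_converter()
    ok = failed = 0
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", encoding="utf-8") as out:
        for path in sorted(input_dir.iterdir()):
            if path.suffix.lower() not in suffixes:
                continue
            try:
                text = convert(path)
            except Exception as exc:  # one bad file should not stop the batch
                print(f"failed: {path.name}: {exc}")
                failed += 1
                continue
            # Hypothetical record schema: field names are illustrative only.
            out.write(json.dumps({"source": path.name, "text": text}) + "\n")
            ok += 1
    return ok, failed


def main(argv=None):
    args = parse_args(argv)
    ok, failed = process_directory(args.input_dir, args.output_file)
    print(f"converted {ok} file(s), {failed} failure(s)")
```

Calling `main()` under an `if __name__ == "__main__":` guard turns this into a command-line script. Passing `convert` as a parameter keeps the batch loop testable with a stub, without loading Docling's models.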
/docling-validate-extracts extracts/output.jsonl
This produces a quality report with:
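A minimal sketch of this kind of validation pass, assuming hypothetical required fields `source` and `text` (the plugin's actual schema and report format may differ). It checks that every line parses as a JSON object and that the required fields are present and non-empty:

```python
import json

# Hypothetical required fields; adjust to the schema your extracts really use.
REQUIRED_FIELDS = ("source", "text")


def validate_jsonl(path, required=REQUIRED_FIELDS):
    """Return counts of total and valid records plus a list of (line, problem)."""
    report = {"total": 0, "valid": 0, "problems": []}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are ignored, not counted
            report["total"] += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                report["problems"].append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                report["problems"].append((lineno, "record is not a JSON object"))
                continue
            missing = [name for name in required if not record.get(name)]
            if missing:
                report["problems"].append(
                    (lineno, "missing or empty: " + ", ".join(missing))
                )
                continue
            report["valid"] += 1
    return report
```

For example, `validate_jsonl("extracts/output.jsonl")` returns a dictionary whose `problems` list pairs each failing line number with a short reason.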
# Initialize project
/docling-init-project research-extraction
# Generate processor
/docling-scaffold-processor extract_papers
# Process PDFs
uv run python scripts/extract_papers.py \
--input-dir data/papers/ \
--output-file extracts/papers.jsonl
# Generate processor with Granite support
/docling-scaffold-processor process_scans --granite
# Process with Granite model for better OCR
uv run python scripts/process_scans.py \
--input-dir data/scanned/ \
--output-file extracts/scanned.jsonl \
--granite
# Generate processor for HTML
/docling-scaffold-processor extract_html --input-types html
# Process HTML files
uv run python scripts/extract_html.py \
--input-dir data/web_content/ \
--output-file extracts/web.jsonl
Docling extracts (JSONL format) work seamlessly with: