From docling-toolkit
Initialize Docling extraction project structure with directories, config, and scripts
Install: `npx claudepluginhub orbruno/docling-ccplugin`
Trigger: slash command
Usage: `/docling-toolkit:init-project [project-name] [--path /custom/path]`
# Initialize Docling Project

Create a complete project structure for Docling document extraction work.

## Task

Initialize a new project directory for document extraction using Docling. The user has requested: "$ARGUMENTS"
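## Steps

1. **Parse arguments**:
   - Extract the project name from the arguments (first positional argument, or default to "docling-extraction")
   - Check for the `--path` flag to specify a custom directory path
   - Validate the project name (alphanumeric characters, hyphens, and underscores only)

2. **Determine project location**:
   - If `--path` is provided, use that directory
   - Otherwise, use `./<project-name>/` in the current directory

3. **Create directory structure**: Use the Bash tool to create the folders: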
```bash
mkdir -p <project-path>/{data/raw,data/processed,extracts,scripts,logs,config}
```
Directory structure:

```
<project-name>/
├── README.md
├── config/
│   └── docling-config.yaml
├── data/
│   ├── raw/              # Original PDFs/HTML
│   └── processed/        # Cleaned/organized
├── extracts/             # Docling JSONL output
├── scripts/              # Processing scripts
├── logs/                 # Processing logs
└── .env.example          # Environment variables template
```
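The plugin carries out the argument parsing, location, and directory steps above through Claude and the Bash tool. For a standalone script, the same logic could look roughly like the Python sketch below. It is an illustration, not the plugin's code: the function names are hypothetical, and it assumes `--path` names the project directory itself rather than a parent directory.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Defaults and rules taken from the steps above.
DEFAULT_NAME = "docling-extraction"
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")  # alphanumeric, hyphens, underscores only
SUBDIRS = ["data/raw", "data/processed", "extracts", "scripts", "logs", "config"]


def parse_args(argv: List[str]) -> Tuple[str, Optional[Path]]:
    """Parse '[project-name] [--path /custom/path]' into (name, custom_path)."""
    positional: List[str] = []
    custom_path: Optional[Path] = None
    i = 0
    while i < len(argv):
        if argv[i] == "--path":
            if i + 1 >= len(argv):
                raise ValueError("--path requires a directory argument")
            custom_path = Path(argv[i + 1])
            i += 2
        else:
            positional.append(argv[i])
            i += 1
    # The first positional argument is the name; otherwise use the default.
    name = positional[0] if positional else DEFAULT_NAME
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid project name: {name!r}")
    return name, custom_path


def resolve_project_path(name: str, custom_path: Optional[Path]) -> Path:
    """Use --path when given, otherwise ./<project-name>/ under the working directory."""
    return custom_path if custom_path is not None else Path.cwd() / name


def create_structure(project: Path) -> None:
    """Same layout as the `mkdir -p` brace expansion, plus .gitkeep files."""
    for sub in SUBDIRS:
        directory = project / sub
        directory.mkdir(parents=True, exist_ok=True)  # like mkdir -p: no error if present
        (directory / ".gitkeep").touch(exist_ok=True)  # keeps empty dirs in version control
```

A script entry point could call `create_structure(resolve_project_path(*parse_args(sys.argv[1:])))`. Because of `exist_ok=True`, running it twice on the same path is harmless.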
4. **Create README.md**: Generate a comprehensive README with:
   - … `data/raw/`

5. **Create config/docling-config.yaml**: Generate a configuration file with:
```yaml
# Docling Configuration
chunker_type: hybrid  # or: hierarchical
export_mode: doc_chunks
use_granite_model: false  # Set true for scanned PDFs

# Metadata fields to extract
metadata_fields:
  - page_number
  - section_title
  - doc_items
  - origin

# Output configuration
output_format: jsonl
output_directory: ./extracts

# Processing options
batch_size: 10  # Process N documents at a time
parallel_workers: 4  # Number of parallel workers
```
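If a processing script should validate these settings before it runs, one option is a small typed mirror of the file. The sketch below assumes the keys and defaults shown above; `DoclingProjectConfig` is a hypothetical name, and the allowed `chunker_type` values come from the YAML comment. It renders YAML by hand to avoid a PyYAML dependency, which only works for this flat schema.

```python
from dataclasses import dataclass, field, fields
from typing import List

ALLOWED_CHUNKERS = {"hybrid", "hierarchical"}  # from the "# or: hierarchical" comment


@dataclass
class DoclingProjectConfig:
    """Hypothetical typed mirror of config/docling-config.yaml with the same defaults."""

    chunker_type: str = "hybrid"
    export_mode: str = "doc_chunks"
    use_granite_model: bool = False
    metadata_fields: List[str] = field(
        default_factory=lambda: ["page_number", "section_title", "doc_items", "origin"]
    )
    output_format: str = "jsonl"
    output_directory: str = "./extracts"
    batch_size: int = 10
    parallel_workers: int = 4

    def __post_init__(self) -> None:
        # Reject values the config comments do not allow.
        if self.chunker_type not in ALLOWED_CHUNKERS:
            raise ValueError(f"chunker_type must be one of {sorted(ALLOWED_CHUNKERS)}")
        if self.batch_size < 1 or self.parallel_workers < 1:
            raise ValueError("batch_size and parallel_workers must be positive")

    def to_yaml(self) -> str:
        # Minimal emitter for this flat schema (scalars plus one list); not a general YAML writer.
        lines: List[str] = []
        for spec in fields(self):
            value = getattr(self, spec.name)
            if isinstance(value, list):
                lines.append(f"{spec.name}:")
                lines.extend(f"  - {item}" for item in value)
            elif isinstance(value, bool):  # checked before the generic case: bool is an int
                lines.append(f"{spec.name}: {'true' if value else 'false'}")
            else:
                lines.append(f"{spec.name}: {value}")
        return "\n".join(lines) + "\n"
```

Writing `DoclingProjectConfig().to_yaml()` to `config/docling-config.yaml` would produce the defaults without comments; keeping the commented template as the file that ships is probably friendlier for users.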
6. **Create .env.example**: Generate an environment template:

```bash
# Docling Cache Directory (optional)
# DOCLING_CACHE_DIR=$HOME/.cache/docling

# Logging Level
# DOCLING_LOG_LEVEL=INFO

# API Keys (if using BAML or other tools)
# GOOGLE_API_KEY=your-key-here
# ANTHROPIC_API_KEY=your-key-here
```
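The template only documents variable names. Whether Docling itself reads `DOCLING_CACHE_DIR` or `DOCLING_LOG_LEVEL` is not stated here, so the sketch below assumes the project's own scripts read them. It uses only the standard library (a package such as python-dotenv would also work) and lets real environment variables override values in `.env`. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments (as in .env.example)."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def docling_settings(env_path: Path = Path(".env")) -> Dict[str, str]:
    """Resolve the optional settings, preferring real environment variables over .env."""
    file_values = load_env_file(env_path)

    def lookup(key: str, default: str) -> str:
        return os.environ.get(key, file_values.get(key, default))

    return {
        # expandvars/expanduser turn "$HOME/..." or "~/..." into a concrete path
        "cache_dir": os.path.expanduser(
            os.path.expandvars(lookup("DOCLING_CACHE_DIR", "~/.cache/docling"))
        ),
        "log_level": lookup("DOCLING_LOG_LEVEL", "INFO").upper(),
    }
```

A script might pass `docling_settings()["log_level"]` to `logging.basicConfig(level=...)`; the API keys stay out of this helper because only the BAML tooling needs them.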
7. **Generate placeholder scripts**:
   - Create `scripts/.gitkeep` to preserve the directory
   - Use `/docling-toolkit:scaffold-processor` to generate scripts

8. **Create .gitignore (optional)**:
```
# Extracted data
extracts/*.jsonl
logs/*.log

# Environment
.env

# Python
__pycache__/
*.py[cod]
.venv/

# OS
.DS_Store
```
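The file-generation steps all write small text files into the new project. The steps do not say what should happen when a file already exists; assuming a re-run should never overwrite a file the user has edited, one cautious approach is to open with exclusive-create mode and skip existing files. `write_if_absent` is a hypothetical helper, shown here with the `.gitignore` contents.

```python
from pathlib import Path

# Contents of the optional .gitignore from the step above.
GITIGNORE = """\
# Extracted data
extracts/*.jsonl
logs/*.log

# Environment
.env

# Python
__pycache__/
*.py[cod]
.venv/

# OS
.DS_Store
"""


def write_if_absent(path: Path, content: str) -> bool:
    """Create path with content, leaving an existing file untouched. Returns True if written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" is exclusive creation: it raises FileExistsError instead of overwriting.
        with path.open("x", encoding="utf-8") as fh:
            fh.write(content)
    except FileExistsError:
        return False
    return True
```

The same helper would apply to `README.md`, `config/docling-config.yaml`, and `.env.example`; reporting which files were skipped lets the success message stay accurate.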
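9. **Display success message**:
   - Show the created structure (`tree` or `ls -R`)
   - `cd <project-name>`
   - Add documents to `data/raw/`
   - Run `/docling-toolkit:scaffold-processor` to create a processing script
   - Validate the output with `/docling-toolkit:validate-extracts`

10. **Offer to create initial scripts (optional)**: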
```
/docling-toolkit:scaffold-processor process_documents
```

README.md template:

````markdown
# {Project Name}

Document extraction project using Docling.

## Directory Structure

- `data/raw/` - Place original PDFs and HTML files here
- `data/processed/` - Cleaned or organized documents
- `extracts/` - Docling output (JSONL format)
- `scripts/` - Processing scripts
- `logs/` - Processing logs
- `config/` - Configuration files

## Workflow

### 1. Add Documents

Place your PDF or HTML documents in `data/raw/`:

```bash
cp /path/to/documents/*.pdf data/raw/
```

```
/docling-toolkit:scaffold-processor process_documents
```

```bash
uv run python scripts/process_documents.py \
  --input-dir data/raw \
  --output-file extracts/output.jsonl
```

```
/docling-toolkit:validate-extracts extracts/output.jsonl
```

```
/baml-toolkit:batch-gemini GenerateProfile \
  extracts/output.jsonl \
  --output profiles.json
```

```python
import json

# Each line of the JSONL file is one extract record.
with open("extracts/output.jsonl", encoding="utf-8") as f:
    for line in f:
        extract = json.loads(line)
        # Process extract
```

Edit `config/docling-config.yaml` to customize:

- … `--granite` flag
- … `logs/` directory
````
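The README's reading loop can be hardened slightly for real extract files. This sketch keeps the same one-record-per-line JSONL model but skips blank lines and reports the failing line number. It makes no assumption about which keys a record contains, since that depends on the processing script; `iter_extracts` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_extracts(path: Path) -> Iterator[Any]:
    """Yield one parsed record per non-blank line of a JSONL extract file."""
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. an extra newline at the end
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so a bad record is easy to find.
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
```

Usage mirrors the README loop: `for extract in iter_extracts(Path("extracts/output.jsonl")): ...`. Because it is a generator, large extract files are streamed rather than loaded into memory at once.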
## Notes
- Create all directories even if empty (use `.gitkeep` files)
- Make README comprehensive but focused
- Configuration should have sensible defaults
- Structure should match Orlando's project organization preferences (documented in his context)
## Success Criteria
- All directories created
- README with complete workflow
- Configuration file with defaults
- User understands next steps
- Project is ready for document processing
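The first success criteria can be checked mechanically, and the success message's `tree` or `ls -R` listing can be approximated in Python when `tree` is not installed. A sketch, assuming the paths from the directory structure above; the helper names are hypothetical, and `.gitignore` is skipped because it is optional.

```python
from pathlib import Path
from typing import List

# Paths named in the directory structure (the optional .gitignore is left out).
EXPECTED_DIRS = ["config", "data/raw", "data/processed", "extracts", "scripts", "logs"]
EXPECTED_FILES = ["README.md", "config/docling-config.yaml", ".env.example"]


def missing_paths(project: Path) -> List[str]:
    """Return expected directories and files that do not exist; empty means complete."""
    missing = [d for d in EXPECTED_DIRS if not (project / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (project / f).is_file()]
    return missing


def tree_lines(root: Path, prefix: str = "") -> List[str]:
    """Render a `tree`-style listing with pathlib: directories first, then files, by name."""
    entries = sorted(root.iterdir(), key=lambda p: (not p.is_dir(), p.name))
    lines: List[str] = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        branch = "└── " if last else "├── "
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"{prefix}{branch}{entry.name}{suffix}")
        if entry.is_dir():
            # Children sit under a vertical bar unless this entry was the last one.
            lines.extend(tree_lines(entry, prefix + ("    " if last else "│   ")))
    return lines
```

Printing `"\n".join(tree_lines(project))` after confirming `missing_paths(project) == []` would cover both the structure display and the "All directories created" check.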