Processes PDFs in production workflows: form analysis, filling, and validation; table extraction; OCR; batch operations; and merging, using Python scripts with error handling.
npx claudepluginhub henkisdabro/wookstar-claude-plugins --plugin documents
This skill uses the workspace's default tool permissions.
Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.
import pdfplumber

# Open the PDF and print the text of the first page
with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
Every script accepts --help.

Complete form workflows including field analysis, dynamic filling, validation rules, multi-page forms, and checkbox/radio handling. See references/forms.md.
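To make the analyze-then-fill idea concrete, here is a minimal sketch of form filling with pypdf. It is not the code behind fill_form.py, which is not shown here; the helper names `check_field_names` and `fill_form` are hypothetical, and the sketch assumes that `data` maps field names to the values to write.

```python
def check_field_names(pdf_fields, data):
    """Compare the PDF's field names with the keys in the input data.

    Returns (unknown, unfilled): keys in data that the PDF does not define,
    and PDF fields that data leaves empty.
    """
    unknown = sorted(set(data) - set(pdf_fields))
    unfilled = sorted(set(pdf_fields) - set(data))
    return unknown, unfilled


def fill_form(src, data, dst):
    """Fill the form in src with data and write the result to dst (hypothetical helper)."""
    # Third-party import kept local so check_field_names stays stdlib-only.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    fields = reader.get_fields() or {}  # get_fields() returns None when there is no form

    unknown, _ = check_field_names(fields, data)
    if unknown:
        raise ValueError(f"Fields not present in the PDF: {unknown}")

    writer = PdfWriter()
    writer.append(reader)  # copies the pages together with the form definition
    for page in writer.pages:
        if "/Annots" in page:  # only pages that carry form widgets
            writer.update_page_form_field_values(page, data)

    with open(dst, "wb") as handle:
        writer.write(handle)
```

Checking names before writing means a typo in the input data fails loudly instead of being silently ignored.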
Complex table extraction including multi-page tables, merged cells, nested tables, custom detection, and CSV/Excel export. See references/tables.md.
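As a rough illustration of table extraction with CSV export, the sketch below collects tables with pdfplumber and writes them with the standard csv module. It is an assumption about how such a step could look, not the logic of extract_tables.py; `clean_row`, `tables_to_csv`, and `extract_tables` are hypothetical names. pdfplumber may return None for cells it cannot fill (merged cells, for example), so those are written as empty strings.

```python
import csv
import io


def clean_row(row):
    """Turn None cells into empty strings and trim surrounding whitespace."""
    return ["" if cell is None else str(cell).strip() for cell in row]


def tables_to_csv(tables, out):
    """Write a list of tables (each a list of rows) to a text stream as CSV."""
    writer = csv.writer(out)
    for index, table in enumerate(tables):
        if index:
            writer.writerow([])  # empty row separates consecutive tables
        for row in table:
            writer.writerow(clean_row(row))


def extract_tables(pdf_path):
    """Collect every table pdfplumber detects, page by page."""
    # Third-party import kept local so the CSV helpers stay stdlib-only.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables
```

A typical call would open the output file with `newline=""` and pass it to `tables_to_csv(extract_tables("report.pdf"), handle)`.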
Scanned PDFs and image-based documents including Tesseract integration, language support, image preprocessing, and confidence scoring. See references/ocr.md.
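The OCR flow with confidence scoring might look roughly like the following. This is a sketch under assumptions, not the contents of references/ocr.md: `ocr_page` and `mean_confidence` are hypothetical helpers, the page is rendered through pdfplumber, and word-level confidences come from pytesseract's `image_to_data`. Tesseract reports a confidence of -1 for entries that are not words, so those are skipped.

```python
def mean_confidence(ocr_data):
    """Average the word-level confidences, ignoring blanks and -1 entries."""
    scores = [
        float(conf)
        for text, conf in zip(ocr_data["text"], ocr_data["conf"])
        if text.strip() and float(conf) >= 0
    ]
    return sum(scores) / len(scores) if scores else None


def ocr_page(pdf_path, page_number=0, lang="eng", resolution=300):
    """Render one page to an image, OCR it, and return (text, mean confidence)."""
    # Third-party imports kept local; both need the packages from the install step,
    # and pytesseract additionally needs the tesseract binary.
    import pdfplumber
    import pytesseract

    with pdfplumber.open(pdf_path) as pdf:
        image = pdf.pages[page_number].to_image(resolution=resolution).original
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )

    text = " ".join(word for word in data["text"] if word.strip())
    return text, mean_confidence(data)
```

A low mean confidence is a reasonable signal to retry with a higher resolution or with image preprocessing before trusting the text.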
| Script | Purpose | Usage |
|---|---|---|
| analyze_form.py | Extract form field info | python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose] |
| fill_form.py | Fill PDF forms with data | python scripts/fill_form.py input.pdf data.json output.pdf [--validate] |
| validate_form.py | Validate form data before filling | python scripts/validate_form.py data.json schema.json |
| extract_tables.py | Extract tables to CSV/Excel | python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv\|excel] |
| extract_text.py | Extract text with formatting | python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting] |
| merge_pdfs.py | Merge multiple PDFs | python scripts/merge_pdfs.py file1.pdf file2.pdf --output merged.pdf |
| split_pdf.py | Split PDF into pages | python scripts/split_pdf.py input.pdf --output-dir pages/ |
| validate_pdf.py | Validate PDF integrity | python scripts/validate_pdf.py input.pdf |
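For batch operations, the scripts in the table can be driven from a small wrapper. The sketch below is one possible approach, with hypothetical helper names (`build_command`, `run_batch`); it assumes the scripts live in a scripts/ directory relative to the working directory and that a non-zero exit code signals failure, which the table does not state.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, *args):
    """Build the argument list for one script call, using the current interpreter."""
    return [sys.executable, str(Path("scripts") / script), *map(str, args)]


def run_batch(pdf_dir, script="validate_pdf.py"):
    """Run one script per PDF in pdf_dir and collect (return code, stderr) per file."""
    results = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        proc = subprocess.run(
            build_command(script, pdf), capture_output=True, text=True
        )
        results[pdf.name] = (proc.returncode, proc.stderr.strip())
    return results
```

Collecting results instead of stopping at the first failure lets one bad file be reported without blocking the rest of the batch.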
All scripts require:
pip install pdfplumber pypdf pillow pytesseract pandas
Optional for OCR:
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: Download from GitHub releases
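Before running the scripts, it can help to confirm that the packages and the tesseract binary are actually available. The check below is a small sketch using only the standard library; `check_environment` is a hypothetical helper, and note that the pillow package is imported under the module name PIL.

```python
import importlib.util
import shutil

# pip package name -> importable module name
REQUIRED_MODULES = {
    "pdfplumber": "pdfplumber",
    "pypdf": "pypdf",
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "pandas": "pandas",
}


def check_environment():
    """Report which Python packages and the tesseract binary can be found."""
    report = {
        package: importlib.util.find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }
    # The OCR engine is a separate executable that must be on PATH.
    report["tesseract (binary)"] = shutil.which("tesseract") is not None
    return report
```

Printing the entries whose value is False gives a quick list of what still needs installing.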
| File | Contents |
|---|---|
| references/forms.md | Complete form processing guide |
| references/tables.md | Advanced table extraction |
| references/ocr.md | Scanned PDF processing |
| references/workflows.md | Common workflows, error handling, performance tips, best practices |
| references/troubleshooting.md | Troubleshooting common issues and getting help |