From lawvable-awesome-legal-skills
Extracts structured data from multiple PDF and DOCX documents into Excel tables with page citations. For tabular reviews, contract comparisons, and document matrices.
npx claudepluginhub joshuarweaver/cascade-business-ops --plugin lawvable-awesome-legal-skills
This skill uses the workspace's default tool permissions.
Extract structured data from multiple documents into an Excel matrix with citations.
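Use AskUserQuestion to collect:
- The folder that contains the documents
- The name of the output Excel file
- The columns to extract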
Example column definitions:
- Parties: Names of all parties to the agreement
- Effective Date: When the agreement becomes effective
- Term: Duration of the agreement
- Governing Law: Jurisdiction for disputes
Use Glob to find all documents:
Glob(pattern: "**/*.pdf", path: "<folder>")
Glob(pattern: "**/*.docx", path: "<folder>")
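For readers who want to reproduce the discovery step outside the Glob tool, here is a minimal Python sketch of the same recursive search. The function name `discover_documents` is hypothetical, and matching extensions case-insensitively is an assumption that goes slightly beyond the two patterns above.

```python
from pathlib import Path


def discover_documents(folder: str) -> list[Path]:
    """Recursively collect .pdf and .docx files under `folder`.

    Results are sorted so that later batching is deterministic.
    Extension matching is case-insensitive (an assumption, not part
    of the original Glob patterns).
    """
    root = Path(folder).expanduser()
    wanted = {".pdf", ".docx"}
    found = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    ]
    return sorted(found)
```

Calling `discover_documents("~/Contracts")` would return every matching file below that folder, including those in subdirectories.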
Launch background agents to process documents concurrently. Each agent receives a batch of document paths and the column definitions, and returns its extractions as JSON.
Launch agents:
Task(
prompt: "<agent_prompt>",
subagent_type: "general-purpose",
run_in_background: true
)
Agent prompt template:
You are processing documents for a tabular review.
DOCUMENTS TO PROCESS:
<list of document paths>
COLUMNS TO EXTRACT:
<column definitions>
For each document:
1. Read the document using the pdf skill (for .pdf) or docx skill (for .docx)
2. Extract the requested information for each column
3. Note the page number (PDF) or section (DOCX) where you found the information
4. Include a brief quote (30-50 chars) showing the source text
Return your results as JSON:
{
"results": [
{
"document": "<filename>",
"path": "<absolute_path>",
"extractions": [
{
"column": "<column_name>",
"value": "<extracted_value>",
"page": <page_number>,
"quote": "<brief_context_quote>"
}
]
}
]
}
If you cannot find information for a column, set value to "Not found" and explain in the quote field.
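Distribution strategy: split the documents into batches, one batch per agent. In the example session below, 15 documents go to 5 agents with 3 documents each.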
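One way to split documents across agents is an even, contiguous partition. The sketch below is only an illustration: the helper name `make_batches` and the default cap of 5 agents are assumptions, with the cap taken from the example session rather than from any stated rule.

```python
def make_batches(paths: list[str], max_agents: int = 5) -> list[list[str]]:
    """Split `paths` into at most `max_agents` contiguous batches.

    Batch sizes differ by at most one document. The default cap of 5
    is an assumption based on the example session.
    """
    if max_agents < 1:
        raise ValueError("max_agents must be at least 1")
    if not paths:
        return []
    n_agents = min(max_agents, len(paths))
    base, extra = divmod(len(paths), n_agents)
    batches: list[list[str]] = []
    start = 0
    for i in range(n_agents):
        # The first `extra` batches take one additional document.
        size = base + (1 if i < extra else 0)
        batches.append(paths[start:start + size])
        start += size
    return batches
```

With 15 paths and the default cap, this yields 5 batches of 3 documents, which matches the example session. Each batch would then be pasted into the agent prompt template as the list of documents to process.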
Wait for all background agents to complete:
TaskOutput(task_id: "<agent_id>", block: true)
Aggregate all results into a single array of document extractions.
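The aggregation step can be sketched as follows, assuming each agent returns exactly the JSON object described in the prompt template, with no surrounding prose. The function name `aggregate_results` is hypothetical. Outputs that cannot be parsed are recorded as errors instead of aborting the run.

```python
import json


def aggregate_results(agent_outputs: list[str]) -> tuple[list[dict], list[str]]:
    """Merge the `results` arrays from all agents into one list.

    Returns (documents, errors). An output that is not valid JSON, or
    that lacks a `results` list, is reported in `errors` and skipped.
    """
    documents: list[dict] = []
    errors: list[str] = []
    for index, raw in enumerate(agent_outputs):
        try:
            results = json.loads(raw)["results"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            errors.append(f"agent {index}: unusable output ({exc!r})")
            continue
        if not isinstance(results, list):
            errors.append(f"agent {index}: 'results' is not a list")
            continue
        documents.extend(results)
    return documents, errors
```

Keeping the error list separate lets the caller mark the affected documents as incomplete while still writing the rows that did succeed.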
Invoke the xlsx skill to create the output file:
Create an Excel workbook at <output_path>:
SHEET 1: "Document Review"
- Header row: Document | <Column1> | <Column2> | ...
- Data rows: One row per document
For each extraction cell:
- Cell value: The extracted text
- Cell hyperlink: file://<document_path>#page=<N> (for PDFs)
- Cell comment: "Page <N>: '<quote>'"
SHEET 2: "Summary"
- Total documents: <count>
- Documents processed: <count>
- Extraction date: <today>
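The per-cell rules above can be expressed as a small helper that turns one extraction into a value, hyperlink, and comment. This is a sketch under stated assumptions: `build_cell` is a hypothetical name, the path is assumed to be an absolute POSIX path, and non-PDF documents get no hyperlink because the spec defines the `#page=` link only for PDFs. Writing the result into the workbook is left to the xlsx skill.

```python
from urllib.parse import quote


def build_cell(path: str, extraction: dict) -> dict:
    """Build the value, hyperlink, and comment for one extraction cell.

    Assumes `path` is an absolute POSIX path. A hyperlink is produced
    only for PDFs with an integer page number.
    """
    page = extraction.get("page")
    snippet = extraction.get("quote", "")
    hyperlink = None
    if path.lower().endswith(".pdf") and isinstance(page, int):
        # Percent-encode spaces and other unsafe characters in the path.
        hyperlink = f"file://{quote(path)}#page={page}"
    if isinstance(page, int):
        comment = f"Page {page}: '{snippet}'"
    elif page:
        # DOCX extractions carry a section label instead of a page number.
        comment = f"{page}: '{snippet}'"
    else:
        comment = snippet
    return {
        "value": extraction.get("value", "Not found"),
        "hyperlink": hyperlink,
        "comment": comment,
    }
```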
Extraction result format:
{
"document": "Contract_ABC.pdf",
"path": "/path/to/Contract_ABC.pdf",
"extractions": [
{
"column": "Parties",
"value": "Acme Corp and Beta Inc",
"page": 1,
"quote": "entered into between Acme Corp and Beta Inc"
},
{
"column": "Effective Date",
"value": "January 15, 2025",
"page": 1,
"quote": "effective as of January 15, 2025"
}
]
}
Cell with citation:
- Hyperlink: file:///path/to/Contract_ABC.pdf#page=1
- Comment: Page 1: "entered into between Acme Corp and Beta Inc"
Color coding (optional):
Error handling:
| Scenario | Action |
|---|---|
| Document unreadable | Log error, mark row as failed, continue |
| Column not found | Set value to "Not found", explain in comment |
| Agent timeout | Collect partial results, note incomplete |
| Missing skill | Prompt user to install required skill |
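The first two rows of the table can be illustrated with a row builder that never raises on a bad document. This is a minimal sketch: the name `build_row` and the `status` field are hypothetical, and a real run would also need to carry the comments and hyperlinks for each cell.

```python
from typing import Optional


def build_row(
    document: str,
    columns: list[str],
    extractions: Optional[list[dict]],
    error: Optional[str] = None,
) -> dict:
    """Build one matrix row, applying the policies from the table.

    An unreadable document yields a row marked as failed; a column with
    no extraction is filled with "Not found". The `status` key is a
    hypothetical addition for tracking failures.
    """
    if error is not None or extractions is None:
        row = {"Document": document, "status": "failed"}
        row.update({column: "" for column in columns})
        return row
    by_column = {e["column"]: e.get("value", "Not found") for e in extractions}
    row = {"Document": document, "status": "ok"}
    for column in columns:
        row[column] = by_column.get(column, "Not found")
    return row
```

Because a failed document still produces a row, the final sheet keeps one row per discovered document, and the Summary sheet can report how many were actually processed.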
User: I want to do a tabular review of my contracts
Claude: [Uses AskUserQuestion]
- What folder contains your documents?
- What should I name the output Excel file?
- What columns do you want to extract?
User: ~/Contracts, review.xlsx, Parties/Date/Term/Governing Law
Claude: [Discovers 15 documents via Glob]
Claude: [Launches 5 background agents, 3 docs each]
Claude: [Collects results via TaskOutput]
Claude: [Creates review.xlsx via xlsx skill]
Output: review.xlsx with 15 rows, 4 columns, hyperlinks and citations