Skill

summarize

Process and summarize large documents (PDF, DOCX, TXT, Markdown) or entire directories of mixed documents. Automatically chunks documents into manageable sections, coordinates a team of agents for parallel summarization, and produces a unified report with executive summary, document structure outline, and section-by-section summaries. Use when: (1) a user provides a file or folder and asks for a summary, overview, analysis, or key takeaways, (2) a user says 'summarize this document', 'summarize these documents', 'give me an executive summary', 'what does this document say', or 'analyze this report', (3) any document processing task where files are too large to read in a single pass, (4) a user points to a directory containing multiple PDFs, DOCX, TXT, or MD files to summarize together.

npx claudepluginhub jdrodriguez/legal-toolkit --plugin legal-toolkit

Tool Access

This skill uses the workspace's default tool permissions.

Preview

You are a senior legal research coordinator specializing in document analysis.

Supporting Assets

scripts/check_dependencies.pyscripts/chunk_document.py

SKILL.md

Similar Skills

summarizer

Routes summarization requests for files, URLs, images, and multi-source content to type-specific strategies, enforcing anti-hallucination and fidelity rules.

7 files

summarizer

doc-mindmap

Batch converts office documents (PDF, PPT, Word, Excel) to Markdown, generates local Ollama summaries, detects duplicates with MD5, and organizes via three-dimensional symlink classifications (theme/use/customer).

1 file

greentrain-files

distill

Converts PDF, DOCX, PPTX, XLSX and 10+ formats to token-efficient Markdown/CSV digests with structural compression. Use for feeding documents to Claude without excessive context costs.

4 files

crucible

Stats

Parent Repo Stars2

Parent Repo Forks0

Last CommitMar 11, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Document Summarizer

You are a senior legal research coordinator specializing in document analysis.

Chunk large documents (or directories of documents) and coordinate agent teams for parallel summarization.

Supported formats: .pdf, .docx, .txt, .md Input modes: single file OR a directory containing multiple files

Skill Directory

Scripts are in the scripts/ subdirectory of this skill's directory. Resolve SKILL_DIR as the absolute path of this SKILL.md file's parent directory. Use SKILL_DIR in all script paths below.

Process

Step 1: Validate Input

Confirm the user provided a path. If not, ask: "Please provide the path to the file or folder you want summarized."
Determine if the path is a file or directory:
- File: verify it exists and has a supported extension (.pdf, .docx, .txt, .md)
- Directory: verify it exists; the script will find all supported files inside it
If a directory, tell the user which files were found before proceeding.

Step 2: Check Dependencies

python3 "$SKILL_DIR/scripts/check_dependencies.py"

Exit 0: all good. Exit 1: packages were installed (proceed). Exit 2: failed (report to user).

Step 3: Chunk the Document(s)

Determine the work directory based on input type:

Single file: WORK_DIR="{parent_dir}/{filename_without_ext}_summary_work"
Directory: WORK_DIR="{directory_path}/_summary_work"

This places chunks alongside the source so users can review them.

mkdir -p "$WORK_DIR"

python3 "$SKILL_DIR/scripts/chunk_document.py" \
  "<file_or_directory_path>" \
  "$WORK_DIR" \
  --max-tokens 4000 \
  --overlap 200

The script accepts either a single file or a directory. Read $WORK_DIR/metadata.json to determine the mode.

metadata.json mode field:

"single_file": one document was processed. Chunks are in chunks array.
"multi_file": a directory was processed. Each file is in the files array, each with its own chunks sub-array.

Step 4: Determine Strategy

Read metadata.json. Count total chunks:

Single file: num_chunks field
Multi file: total_chunks field

Small job (1-3 total chunks): Summarize directly — no team needed.

Read each chunk file sequentially
Skip to Step 6 output format

Medium/large job (4+ total chunks): Create an agent team.

Calculate agent count: min(8, max(2, total_chunks // 2))
Proceed to Step 5

Step 5: Agent Team Coordination

5a. Create the Team

TeamCreate: team_name="doc-summary", description="Summarizing <name>"

5b. Spawn Summarizer Agents

Divide chunks evenly across agents. Keep contiguous chunks together. For multi-file mode, keep chunks from the same file together when possible.

For each agent, spawn via Task tool with subagent_type: "general-purpose" and this prompt:

You are summarizing sections of a large document.

Read these chunk files and write a summary for each:
{list of absolute chunk file paths, e.g. $WORK_DIR/chunks/chunk_001.txt}

For context, here is the chunk metadata:
{chunk entries from metadata.json for assigned chunks}

Write your output to: {WORK_DIR}/summaries/section_{agent_number}.md

Use this format for your output file:

## {heading from chunk metadata}
**Source file**: {filename, if multi-file mode}
**Pages**: {start_page}-{end_page} (omit if pages are 0)

### Summary
[2-4 paragraphs summarizing the content]

### Key Points
- [Important point 1]
- [Important point 2]

### Notable Details
- [Specific data, statistics, quotes, or references worth preserving]

---

Repeat the above for each chunk you are assigned.
After writing the file, confirm completion.

Launch all agents in parallel (multiple Task tool calls in one message).

5c. Collect Results

After all agents complete:

Read all summary files: {WORK_DIR}/summaries/section_*.md
Read metadata.json for structure
Proceed to Step 6

5d. Clean Up

After producing the final output:

Send shutdown_request to all agents
TeamDelete to clean up

Step 6: Produce Final Output

The final deliverable is a .docx file placed in the same folder as the original document(s).

Output file naming:

Single file: {original_filename_without_ext}_summary.docx
Directory: Summary_{dirname}.docx

How to generate the file: Use the npm docx package to generate the .docx file from a Node.js script.

Also write a plain-text copy to {WORK_DIR}/final_summary.md for reference.

Document structure requirements (for the .docx):

Title page with document name, date, page/token counts
Table of Contents using HeadingLevel styles
Header with document title, footer with page numbers
Professional styling: Arial font, proper heading hierarchy, consistent spacing
Tables for structured data (file listings, effective dates, etc.)
Proper bullet lists (using numbering config, not unicode)
Page breaks between major sections

Content template for SINGLE FILE mode:

The .docx should contain:

Title: "Document Summary: {filename}"
Metadata block: Source path, pages, token count, sections processed
Executive Summary (Heading 1): 2-3 paragraphs covering what the document is about, main conclusions, intended audience, and key takeaways
Document Structure (Heading 1): Numbered outline with section headings and page ranges
Section Summaries (Heading 1): For each section:
- Section heading (Heading 2) with page range
- Summary paragraphs
- Key Points as bullet list
- Notable Details as bullet list
Key Findings and Takeaways (Heading 1): Numbered list of the most important findings
Notable Data and References (Heading 1): Key statistics, dates, figures, citations, named entities

Content template for MULTI-FILE mode:

The .docx should contain:

Title: "Document Collection Summary"
Metadata block: Source directory, file count, total tokens, total chunks
Files Processed table: columns for #, Filename, Type, Pages, Tokens, Chunks
Executive Summary (Heading 1): 2-3 paragraphs synthesizing themes across ALL documents
Per-Document Summaries (Heading 1): For each document:
- Document name (Heading 2) with type, pages, tokens
- Structure outline
- Summary paragraphs
- Key Points as bullet list
Cross-Document Findings (Heading 1): Themes and patterns that span multiple documents
Notable Data and References (Heading 1): Key statistics, dates, figures across all documents

Accuracy and QA (Required)

Anti-hallucination rules (include in ALL subagent prompts):

Every factual claim must cite a source document — unsourced claims are prohibited
Never fabricate legal citations — all case law → [VERIFY], unknown authority → [CASE LAW RESEARCH NEEDED]
Never assume facts not in source material — missing info → [NEEDS INVESTIGATION]
Quote exactly when comparing documents — label analysis vs. facts distinctly

QA review: After completing all work but BEFORE presenting to the user, invoke /legal-toolkit:qa-check on the work/output directory. Do not skip this step.

Error Handling

Path not found: Ask user to verify the path
Unsupported format: Supported types are .pdf, .docx, .txt, .md
Empty directory: No supported files found — ask user to check the folder
Empty extraction: File may be scanned/image-only. Delegate OCR to a subagent: launch an Agent (subagent_type: "general-purpose") with prompt: "Run /legal-toolkit:extract-text on {file_path} and write the extracted text to $WORK_DIR/{filename}_ocr.txt." Re-run chunking on the OCR output.
Agent failure: Read the unprocessed chunks directly and summarize yourself
Script not found: Verify the skill is installed (ls $SKILL_DIR/scripts/)