Extract and analyze Excel workbooks (1MB-50MB+) with minimal token usage. Preserves formulas, cell formatting, and complex table structures through local extraction and sheet-based chunking.
/plugin marketplace add diegocconsolini/ClaudeSkillCollection
/plugin install xlsx-smart-extractor@security-compliance-marketplace
Use this agent when:
⚠️ IMPORTANT: Cache Location
Extracted content is stored in a user cache directory, NOT the working directory:
Cache locations by platform:
Linux/macOS: ~/.claude-cache/xlsx/{workbook_name}_{hash}/
Windows: C:\Users\{username}\.claude-cache\xlsx\{workbook_name}_{hash}\
Why cache directory?
Cache contents:
full_workbook.json - Complete workbook data
sheet_{name}.json - Individual sheet files
named_ranges.json - Named ranges and tables
metadata.json - Workbook metadata
manifest.json - Cache manifest
Accessing cached content:
# List all cached workbooks
python scripts/query_xlsx.py list
# Query cached content
python scripts/query_xlsx.py search {cache_key} "search query"
# Find cache location (shown in extraction output)
# Example: ~/.claude-cache/xlsx/ComplianceMatrix_a1b2c3d4/
To extract files to working directory:
# Option 1: Use --output-dir flag during extraction
python3 scripts/extract_xlsx.py workbook.xlsx --output-dir ./extracted
# Option 2: Copy from cache manually
cp -r ~/.claude-cache/xlsx/{cache_key}/* ./extracted_content/
Note: Cache is local and not meant for version control. Keep original Excel files in the repository and extract locally on each development machine (one-time operation).
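To inspect the cache from Python rather than through query_xlsx.py, the sketch below walks the cache root and loads each manifest.json it finds. It is a minimal illustration, not part of the plugin: the function name list_cached_workbooks is hypothetical, the default root is taken from the Linux/macOS path shown above, and nothing is assumed about the fields inside manifest.json.

```python
import json
from pathlib import Path

# Assumed default cache root, based on the Linux/macOS path in this document.
CACHE_ROOT = Path.home() / ".claude-cache" / "xlsx"


def list_cached_workbooks(cache_root=CACHE_ROOT):
    """Return (cache_key, manifest) pairs, one per cache directory.

    manifest is the parsed manifest.json, or None when the file is
    missing or cannot be parsed. Hypothetical helper, not a plugin API.
    """
    root = Path(cache_root)
    if not root.is_dir():
        return []
    entries = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # skip stray files in the cache root
        manifest_path = entry / "manifest.json"
        manifest = None
        if manifest_path.is_file():
            try:
                manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError):
                manifest = None
        entries.append((entry.name, manifest))
    return entries
```

Calling list_cached_workbooks() with no arguments returns an empty list when the cache directory does not exist yet, so it is safe to run before the first extraction.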
# Extract to cache (default)
python3 scripts/extract_xlsx.py /path/to/workbook.xlsx
# Extract and copy to working directory (interactive prompt)
python3 scripts/extract_xlsx.py /path/to/workbook.xlsx
# Will prompt: "Copy files? (y/n)"
# Will ask: "Keep cache? (y/n)"
# Extract and copy to specific directory (no prompts)
python3 scripts/extract_xlsx.py /path/to/workbook.xlsx --output-dir ./extracted
What happens:
Cache key example: ComplianceMatrix_a8f9e2c1
Output files:
full_workbook.json - All sheets with full data
sheets/*.json - Individual sheet data files
formulas.json - All formulas extracted
metadata.json - Workbook metadata
named_ranges.json - Named ranges and tables
manifest.json - Extraction summary
Performance:
python3 scripts/chunk_sheets.py <cache_key>
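The exact rules chunk_sheets.py applies are not reproduced in this document, so the following is only a rough sketch of row-based chunking under stated assumptions: the first row is a header that is repeated in every chunk, and max_rows=500 is an illustrative default, not a confirmed plugin setting. The function name chunk_rows is hypothetical.

```python
def chunk_rows(rows, max_rows=500):
    """Split a table into chunks of at most max_rows data rows.

    rows[0] is treated as the header and repeated at the top of every
    chunk so each chunk can be read on its own. Illustrative only.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    if not data:
        return [[header]]
    return [
        [header] + data[start:start + max_rows]
        for start in range(0, len(data), max_rows)
    ]
```

Repeating the header costs a few tokens per chunk but means a single chunk can be loaded without its neighbors.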
What happens:
Output files:
chunks/index.json - Chunk metadata and locations
chunks/chunk_001.json - Individual chunk data
chunks/chunk_002.json - ...
Statistics:
# Search by keyword
python3 scripts/query_xlsx.py search <cache_key> "password policy"
# Get specific sheet
python3 scripts/query_xlsx.py sheet <cache_key> "Controls"
# Get cell range
python3 scripts/query_xlsx.py range <cache_key> "Sheet1!A1:E10"
# Get workbook summary
python3 scripts/query_xlsx.py summary <cache_key>
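The range command takes references such as "Sheet1!A1:E10". As a minimal sketch of how such a reference breaks down into a sheet name and 1-based row and column bounds, the helper below uses a regular expression. The names parse_range and column_index are hypothetical, not functions from query_xlsx.py, and quoted sheet names (for example 'My Sheet'!A1) are not unquoted.

```python
import re

# Optional "Sheet!" prefix, a start cell, and an optional ":end" cell.
_RANGE_RE = re.compile(
    r"^(?:(?P<sheet>[^!]+)!)?"
    r"(?P<c1>[A-Za-z]{1,3})(?P<r1>[1-9][0-9]*)"
    r"(?::(?P<c2>[A-Za-z]{1,3})(?P<r2>[1-9][0-9]*))?$"
)


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_range(ref):
    """Parse 'Sheet!A1:E10' (or 'A1:E10', or 'B3') into bounds."""
    match = _RANGE_RE.match(ref.strip())
    if match is None:
        raise ValueError(f"not a valid range reference: {ref!r}")
    c1, r1 = column_index(match["c1"]), int(match["r1"])
    c2 = column_index(match["c2"]) if match["c2"] else c1
    r2 = int(match["r2"]) if match["r2"] else r1
    return {
        "sheet": match["sheet"],  # None when no sheet prefix is given
        "min_col": min(c1, c2),
        "min_row": min(r1, r2),
        "max_col": max(c1, c2),
        "max_row": max(r1, r2),
    }
```

A single cell is returned as a one-cell range, and reversed corners such as "E10:A1" are normalized so that the minimum bounds come first.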
What happens:
Results format:
Found 2 result(s) for query: "password policy"
1. Sheet: Controls
Range: A5:E5
Relevance: 100%
Content:
A5: "AC-2"
B5: "Password Policy Implementation"
C5: "Configure password complexity..."
D5: "Evidence.docx"
E5: "Complete"
Tokens: 85
2. Sheet: Evidence
Range: B12:C12
Relevance: 95%
Content:
B12: "Password policy documented"
C12: "2025-10-15"
Tokens: 32
Total tokens: 117 (vs 45,892 full workbook = 392x reduction)
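The reduction figure in the sample output is plain arithmetic on the token counts it reports. The short sketch below reproduces it; the function name reduction_factor is hypothetical, and the numbers are copied from the sample above rather than measured.

```python
def reduction_factor(result_tokens, full_workbook_tokens):
    """Return (total result tokens, full-workbook tokens / total)."""
    total = sum(result_tokens)
    if total == 0:
        raise ValueError("no result tokens to compare against")
    return total, full_workbook_tokens / total


# Per-result counts and full-workbook estimate from the sample output.
total, factor = reduction_factor([85, 32], 45_892)
print(f"Total tokens: {total}")       # Total tokens: 117
print(f"Reduction: {factor:.0f}x")    # Reduction: 392x
```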
Scenario: ISO 27001 compliance tracking spreadsheet (5MB, 12 sheets, 500 controls)
Workflow:
python3 scripts/extract_xlsx.py iso27001_controls.xlsx
python3 scripts/query_xlsx.py search iso27001_controls_a8f9e2 "A.9.2.1"
python3 scripts/query_xlsx.py sheet iso27001_controls_a8f9e2 "Evidence"
Benefits:
Scenario: Revenue projection model (15MB, 8 sheets, complex formulas)
Workflow:
python3 scripts/extract_xlsx.py revenue_model.xlsx
python3 scripts/query_xlsx.py summary revenue_model_f3a8c1
python3 scripts/query_xlsx.py search revenue_model_f3a8c1 "formula:SUM"
python3 scripts/query_xlsx.py range revenue_model_f3a8c1 "Projections!A1:Z50"
Benefits:
Scenario: Security event export (20MB, 50,000 rows, 30 columns)
Workflow:
python3 scripts/extract_xlsx.py security_logs.xlsx
python3 scripts/query_xlsx.py summary security_logs_b9d2e1
python3 scripts/query_xlsx.py search security_logs_b9d2e1 "failed"
python3 scripts/query_xlsx.py range security_logs_b9d2e1 "Logs!A1:F1000"
Benefits:
User message: "I have a compliance matrix in Excel that maps ISO 27001 controls to our implementation evidence. Can you analyze it?"
Response: Extracting and analyzing the compliance matrix Excel file using the xlsx-analyzer plugin.
[Extract workbook] [Query for ISO control structure] [Provide summary of controls, evidence status, completion rates]
User message: "This revenue projection model has 8 sheets and complex formulas. Can you help me understand the calculation logic?"
Response: Extracting the financial model and analyzing its structure and formulas using the xlsx-analyzer plugin.
[Extract workbook] [Get workbook summary] [Extract formula patterns] [Explain calculation flow]
User message: "In this 10MB workbook, I need to find all cells that reference 'password policy' - can you help?"
Response: Searching the workbook for 'password policy' references using the xlsx-analyzer plugin.
[Extract workbook] [Search for keyword] [Return matching cells with sheet names and cell references]
Not supported:
.xls files (use xlrd separately)
.xlsb files (use pyxlsb separately)
.ods files (use odfpy separately)
Cell Values:
Cell Formatting:
Formulas:
Workbook Structure:
Small sheets (< 1000 cells):
Wide tables (> 20 columns):
Long tables (> 500 rows):
Named ranges:
Tokens are estimated as character count / 4 (an approximation):
Actual token usage may vary by model (Claude uses a different tokenizer than GPT).
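The character count / 4 estimate can be sketched in one line. The function name estimate_tokens is hypothetical, and rounding down with integer division is an assumption, since the document does not say how fractions are handled.

```python
def estimate_tokens(text):
    """Approximate token count as character count / 4, rounded down.

    This is a heuristic only; real tokenizers will give different counts.
    """
    return len(text) // 4


cell_text = 'B5: "Password Policy Implementation"'
print(estimate_tokens(cell_text))
```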
1. File Not Found
Error: Excel file not found: /path/to/file.xlsx
Solution: Verify file path and permissions.
2. Corrupted Workbook
Error: Failed to open workbook: zipfile.BadZipFile
Solution: File may be corrupted. Try opening in Excel and re-saving.
3. Password Protected
Error: Workbook is password protected
Solution: openpyxl cannot open password-protected files. Remove protection first.
4. External Data Connections
Warning: Workbook contains external data connections (ignored)
Solution: External connections are not extracted. Only static data is preserved.
5. Unsupported Features
Warning: Pivot tables detected but not fully extracted
Warning: Charts detected but not extracted
Warning: VBA macros detected but not extracted
Solution: These features are noted in metadata but not extracted in detail.
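Errors 1 to 3 above can often be anticipated before calling the extractor. The sketch below is an illustrative pre-flight check using only the standard library, not code from the plugin. It relies on two facts: a valid .xlsx file is a ZIP archive, while legacy .xls files and password-encrypted workbooks are stored in an OLE2 compound file that starts with a fixed 8-byte signature. The function name preflight and its return strings are hypothetical.

```python
import zipfile
from pathlib import Path

# Signature of the OLE2 compound file format, used by legacy .xls files
# and by password-encrypted Office documents.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def preflight(path):
    """Classify a file before extraction (hypothetical helper).

    Returns "not_found", "ole2_container", "not_a_zip", or "ok".
    "ok" only means the container is a ZIP archive, not that the
    workbook inside is valid.
    """
    p = Path(path)
    if not p.is_file():
        return "not_found"
    with p.open("rb") as fh:
        header = fh.read(8)
    if header == OLE2_MAGIC:
        return "ole2_container"  # encrypted workbook or legacy .xls
    if not zipfile.is_zipfile(str(p)):
        return "not_a_zip"  # likely corrupted or not an Excel file
    return "ok"
```

A result of "ole2_container" cannot distinguish an encrypted .xlsx from an old .xls file; either way, openpyxl will not open it directly.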
Large workbook handling:
~/.claude-xlsx-cache/
# Python 3.8+
python3 --version
# Install dependencies
# Quote the version specifiers so the shell does not treat ">" as a redirect
pip3 install "openpyxl>=3.1.0" "pandas>=2.0.0"
# Test openpyxl
python3 -c "import openpyxl; print('openpyxl available')"
# Test pandas
python3 -c "import pandas; print('pandas available')"
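The same checks can be run in one step from Python. This is a small sketch using importlib from the standard library; the function name missing_dependencies is hypothetical, and it only tests whether each package can be found, not whether the installed version meets the minimum listed above.

```python
import importlib.util
import sys


def missing_dependencies(modules=("openpyxl", "pandas"), min_python=(3, 8)):
    """Return a list of unmet requirements (empty when all are met)."""
    problems = []
    if sys.version_info < min_python:
        problems.append("python>=" + ".".join(map(str, min_python)))
    # find_spec returns None when a top-level module cannot be located.
    problems.extend(
        name for name in modules if importlib.util.find_spec(name) is None
    )
    return problems


print(missing_dependencies() or "all dependencies available")
```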
Solution:
pip3 install openpyxl
Solution:
chmod 755 ~/.claude-xlsx-cache/
Possible causes:
Solution: Use --force flag and check for warnings.
Solution: Process sheets one at a time instead of loading entire workbook.