Skill

doc-processor

Generates and parses office documents (PDF, DOCX, XLSX, PPTX, CSV) for reports, tabular data export, and uploaded file processing in Node.js/Python projects.

Javascript

Python

npx claudepluginhub rune-kit/rune --plugin @rune/analytics

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Document format utility. Generates and parses office documents (PDF, DOCX, XLSX, PPTX, CSV). Pure utility — no business logic, just format handling. Other skills call doc-processor when they need to produce or consume structured documents.

SKILL.md

Similar Skills

document-processing

Create, edit, and analyze office documents (PDF, DOCX, PPTX, XLSX) including text/table extraction, merging/splitting, form filling, and data analysis with pdfplumber, pypdf, pandas.

craft-workspace-webconsulting-skills

xlsx-generator

Generates and manipulates Excel XLSX files programmatically with Deno. Populates branded templates via placeholders, creates from JSON specs, extracts data, automates reports.

3 files

jwynia-agent-skills-1

generate-document

Generates PDF, DOCX, HTML, ODT, EPUB, RTF documents from markdown using pandoc. For user requests to create reports, export findings, code reviews, or save analysis as formatted files.

2 files10 tools

development-productivity

Stats

Stars67

Forks18

Last CommitApr 27, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

doc-processor

Purpose

Triggers

Called by docs when export to PDF/DOCX is requested
Called by marketing for generating PDF reports, PPTX presentations
Called by Rune Pro packs for business document generation
/rune doc-processor generate <format> <source> — manual document generation
/rune doc-processor parse <file> — manual document parsing

Calls (outbound)

None — pure L3 utility. Receives content, produces formatted output.

Called By (inbound)

docs (L2): export documentation to PDF/DOCX
marketing (L2): generate PDF reports, PPTX pitch decks
Rune Pro packs: business document generation (invoices, proposals, reports)
User: /rune doc-processor direct invocation

Format Reference

Supported Formats

Format	Generate	Parse	Node.js Library	Python Library
PDF	Yes	Yes (via Read tool)	jsPDF, Puppeteer (HTML→PDF)	reportlab, weasyprint
DOCX	Yes	Yes	docx (officegen)	python-docx
XLSX	Yes	Yes	ExcelJS	openpyxl
PPTX	Yes	Yes	pptxgenjs	python-pptx
CSV	Yes	Yes	Built-in (fs + string ops)	Built-in (csv module)
HTML	Yes	Yes	Built-in	Built-in

Library Selection

Detect project language from context:

If Node.js project → use Node.js libraries
If Python project → use Python libraries
If unclear → default to Node.js (wider ecosystem)
For HTML→PDF → prefer Puppeteer (best fidelity) or weasyprint (Python)

Executable Steps

Generate Mode

Step 1 — Determine Format and Template

Identify:

Target format (PDF, DOCX, XLSX, PPTX, CSV)
Content source (markdown, data object, template + data)
Styling requirements (brand colors, fonts, layout)
Output path

Step 2 — Select Generation Strategy

Source	Target	Strategy
Markdown → PDF	HTML intermediate	Render MD → HTML → Puppeteer → PDF
Markdown → DOCX	Direct conversion	Parse MD → docx library → DOCX
Data → XLSX	Direct write	Map data to sheets/cells → ExcelJS
Slides → PPTX	Template + data	Build slides from content → pptxgenjs
Data → CSV	Direct write	Serialize rows → CSV string → file
Any → HTML	Direct render	Template engine → HTML file

Step 3 — Generate Code

Produce the generation script:

PDF from Markdown:

// Strategy: Markdown → HTML → Puppeteer → PDF
const puppeteer = require('puppeteer');
const { marked } = require('marked');

async function generatePDF(markdownContent, outputPath, options = {}) {
  const html = `
    <!DOCTYPE html>
    <html>
    <head><style>${options.css || defaultCSS}</style></head>
    <body>${marked(markdownContent)}</body>
    </html>
  `;
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(html, { waitUntil: 'networkidle0' });
  await page.pdf({ path: outputPath, format: 'A4', margin: { top: '1in', bottom: '1in', left: '1in', right: '1in' } });
  await browser.close();
}

XLSX from Data:

const ExcelJS = require('exceljs');

async function generateXLSX(data, outputPath, options = {}) {
  const workbook = new ExcelJS.Workbook();
  const sheet = workbook.addWorksheet(options.sheetName || 'Sheet1');
  if (data.length > 0) {
    sheet.columns = Object.keys(data[0]).map(key => ({ header: key, key, width: 20 }));
    data.forEach(row => sheet.addRow(row));
    // Style header row
    sheet.getRow(1).font = { bold: true };
    sheet.getRow(1).fill = { type: 'pattern', pattern: 'solid', fgColor: { argb: 'FFE0E0E0' } };
  }
  await workbook.xlsx.writeFile(outputPath);
}

PPTX from Slides:

const PptxGenJS = require('pptxgenjs');

function generatePPTX(slides, outputPath, options = {}) {
  const pptx = new PptxGenJS();
  pptx.author = options.author || 'Generated by Rune';
  slides.forEach(slide => {
    const s = pptx.addSlide();
    if (slide.title) s.addText(slide.title, { x: 0.5, y: 0.5, fontSize: 28, bold: true });
    if (slide.body) s.addText(slide.body, { x: 0.5, y: 1.5, fontSize: 16 });
    if (slide.bullets) s.addText(slide.bullets.map(b => ({ text: b, options: { bullet: true } })), { x: 0.5, y: 1.5, fontSize: 16 });
  });
  return pptx.writeFile({ fileName: outputPath });
}

Step 4 — Execute and Verify

Run the generation script. Verify:

Output file exists and is non-empty
File can be opened (basic format validation)
Content matches expected structure

Parse Mode

Step 1 — Detect Format

Identify file format from extension and MIME type.

Step 2 — Extract Content

Format	Extraction Strategy
PDF	Use Read tool (Claude can read PDFs natively)
DOCX	docx library → extract text, tables, images
XLSX	ExcelJS → extract sheets, rows, formulas
PPTX	pptxgenjs → extract slides, text, notes
CSV	Built-in parser → structured data

Step 3 — Structure Output

Return parsed content as structured data:

{
  "format": "xlsx",
  "sheets": [
    {
      "name": "Sheet1",
      "headers": ["Name", "Email", "Role"],
      "rows": [["Alice", "alice@co.com", "Engineer"], ...],
      "rowCount": 100
    }
  ]
}

Output Format

Generate Mode Output

Generated document file at specified output path
Verification report: file exists, non-empty, format valid

Document Generated:
- Format: [PDF/DOCX/XLSX/PPTX/CSV]
- Path: [output file path]
- Size: [file size]
- Strategy: [e.g., Markdown → HTML → Puppeteer → PDF]
- Status: verified ✓

Parse Mode Output

Structured JSON returned to calling skill:

{
  "format": "xlsx",
  "metadata": { "author": "...", "created": "..." },
  "content": {
    "sheets": [
      {
        "name": "Sheet1",
        "headers": ["Col1", "Col2"],
        "rows": [["val1", "val2"]],
        "rowCount": 100
      }
    ]
  }
}

Format-specific fields: sheets (XLSX), pages (PDF/DOCX), slides (PPTX), rows (CSV).

Constraints

MUST verify output file exists and is non-empty after generation
MUST handle missing libraries gracefully — suggest npm install / pip install if not found
MUST NOT embed secrets or sensitive data in generated documents
MUST preserve formatting fidelity — generated docs should look professional
Parse mode MUST handle malformed files gracefully — report errors, don't crash
MUST use appropriate library for each format — don't force one library for all formats

Sharp Edges

Failure Mode	Severity	Mitigation
Library not installed in project	HIGH	Check package.json/requirements.txt, suggest install command
PDF generation fails without headless browser	HIGH	Puppeteer needs chromium — suggest alternative (jsPDF) if unavailable
XLSX with formulas not evaluated	MEDIUM	Use ExcelJS formula support, warn if complex formulas
Large file generation runs out of memory	MEDIUM	Stream large datasets instead of loading all at once
Generated file is empty or corrupt	HIGH	Step 4 verification catches this — retry or report

Done When

Generate Mode

Target format and source identified
Generation strategy selected
Code produced and executed
Output file verified (exists, non-empty, valid format)

Parse Mode

File format detected
Content extracted to structured data
Output returned in consistent JSON format

Cost Profile

~1000-3000 tokens input, ~500-2000 tokens output. Sonnet — document processing requires understanding format libraries and generating correct code, but not deep reasoning.