Creates, edits, reads, analyzes, and converts .docx files using docx-js for generation, Python unpack/pack for XML edits, pandoc for extraction, and LibreOffice headless.
npx claudepluginhub joshuarweaver/cascade-code-general-misc-3 --plugin marcelleon-skills-zh
This skill uses the workspace's default tool permissions.
A `.docx` file is essentially a set of XML files inside a ZIP archive.
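Because the package is an ordinary ZIP archive, its parts can be listed with Python's standard library alone. The sketch below builds a tiny stand-in package in memory so that it runs without a real document; `build_minimal_package` and `list_parts` are hypothetical helper names, and the stand-in is not a complete file that Word would open.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of the parts stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()


def build_minimal_package() -> bytes:
    """Build a stand-in package in memory (illustrative only, not Word-valid)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<w:document/>")
    return buffer.getvalue()


parts = list_parts(build_minimal_package())
print(parts)
```

For a file on disk, `zipfile.ZipFile("document.docx").namelist()` gives the same kind of listing.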
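Tasks fall into three categories:

- Create: `docx-js`
- Edit: `unpack -> edit XML -> pack`
- Read and verify: `pandoc`, `unpack`, or render pages to images for inspection

| Task | Recommended approach |
|---|---|
| Read or analyze content | pandoc, or unpack and inspect the XML |
| Create a new DOCX | docx-js + scripts/office/validate.py |
| Edit an existing DOCX | scripts/office/unpack.py -> edit XML -> scripts/office/pack.py |

Convert .doc to .docx:
python scripts/office/soffice.py --headless --convert-to docx document.doc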
# Export text with tracked changes included
pandoc --track-changes=all document.docx -o output.md
# Read the raw XML
python scripts/office/unpack.py document.docx unpacked/
# Render pages to images for visual checks
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
# Accept tracked changes
python scripts/accept_changes.py input.docx output.docx
Install first: npm install -g docx
const fs = require('fs'); // needed for writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
        PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
        TabStopType, TabStopPosition, Column, SectionType,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

// One section with placeholder content
const doc = new Document({ sections: [{ children: [/* content */] }] });
// Serialize the document and write it to disk
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
python scripts/office/validate.py doc.docx
If validation fails, unpack the file, fix the XML, and repack it; do not ignore the failure.
docx-js defaults to A4. For US documents, set US Letter explicitly:
sections: [{
  properties: {
    page: {
      size: { width: 12240, height: 15840 }, // 8.5 x 11 inch (DXA)
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch
    }
  },
  children: []
}]
Landscape: pass the short edge as `width` and the long edge as `height`, then set `orientation: PageOrientation.LANDSCAPE`; the library swaps the two automatically.
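The page sizes and margins above are given in DXA, where 1 inch is 1440 units (a DXA is one twentieth of a point, and a point is 1/72 inch). A small sketch of the arithmetic, with `inches_to_dxa` and `mm_to_dxa` as hypothetical helper names:

```python
DXA_PER_INCH = 1440  # 72 points per inch x 20 DXA per point
MM_PER_INCH = 25.4


def inches_to_dxa(inches: float) -> int:
    """Convert inches to DXA, rounded to the nearest unit."""
    return round(inches * DXA_PER_INCH)


def mm_to_dxa(mm: float) -> int:
    """Convert millimetres to DXA, rounded to the nearest unit."""
    return round(mm / MM_PER_INCH * DXA_PER_INCH)


us_letter = (inches_to_dxa(8.5), inches_to_dxa(11))  # (12240, 15840)
a4 = (mm_to_dxa(210), mm_to_dxa(297))                # (11906, 16838)
one_inch_margin = inches_to_dxa(1)                   # 1440
print(us_letter, a4, one_inch_margin)
```

The US Letter values match the `width` and `height` used in the configuration above.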
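- When overriding built-in heading styles, the `id` must match exactly: `Heading1`, `Heading2`
- Set `outlineLevel` on heading styles (H1=0, H2=1)
- Do not type `•` by hand; use `numbering` with `LevelFormat.BULLET`
- Tables need both `columnWidths` and a `width` on each cell
- The table width should equal the sum of `columnWidths`
- Use `WidthType.DXA`, not percentages (poor Google Docs compatibility)
- `shading` should use `ShadingType.CLEAR`, not `SOLID`
- Use cell `margins` as inner padding to improve readability
- `ImageRun` must specify `type` (png, jpg, svg, etc.) and include `altText`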
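The width rules above reduce to simple arithmetic: the usable width is the page width minus the side margins, and the column widths must add up to exactly that number. A minimal sketch, assuming a US Letter page with 1 inch margins; `content_width` and `split_columns` are hypothetical helpers, and the resulting values are what would be passed as `columnWidths` and per-cell `width` in DXA.

```python
def content_width(page_width: int, left_margin: int, right_margin: int) -> int:
    """Usable width between the side margins, in DXA."""
    return page_width - left_margin - right_margin


def split_columns(total_width: int, weights: list[int]) -> list[int]:
    """Split total_width proportionally to weights, keeping the sum exact."""
    weight_sum = sum(weights)
    widths = [total_width * weight // weight_sum for weight in weights]
    # Integer division can lose a few units; give them to the last column.
    widths[-1] += total_width - sum(widths)
    return widths


table_width = content_width(12240, 1440, 1440)          # 9360
column_widths = split_columns(table_width, [1, 1, 1])   # [3120, 3120, 3120]
uneven = split_columns(10000, [1, 1, 1])                # [3333, 3333, 3334]
print(table_width, column_widths, uneven)
```

Assigning the rounding remainder to one column keeps the sum equal to the table width, which is the property the list asks for.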
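Model these scenarios with the official docx API; do not take shortcuts that fake layout with text.
In particular:

- `PageBreak` must sit inside a `Paragraph`
- Create the `Bookmark` first, then the `InternalHyperlink` that points to it
- With `TableOfContents`, headings must come from `HeadingLevel`
- Do not use `\n` as a substitute for paragraphs

python scripts/office/unpack.py document.docx unpacked/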
The script pretty-prints the XML, merges adjacent runs, and converts smart quotes to entities. Pass `--merge-runs false` to skip run merging when needed.
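Run merging helps because one phrase is often split across several `<w:r>` elements with identical formatting, which makes the XML hard to search and edit. The following is a rough illustration of the idea only, not the unpack script's actual implementation: it assumes runs that contain nothing but an optional `<w:rPr>` and a single `<w:t>`, and it compares formatting shallowly (tag and attributes of each `<w:rPr>` child). All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
RUN, RUN_PROPS, TEXT = f"{{{W}}}r", f"{{{W}}}rPr", f"{{{W}}}t"


def run_format(run):
    """Shallow, comparable summary of a run's formatting properties."""
    props = run.find(RUN_PROPS)
    if props is None:
        return ()
    return tuple((child.tag, tuple(sorted(child.attrib.items()))) for child in props)


def is_simple_text_run(element):
    """True for <w:r> holding only an optional <w:rPr> followed by one <w:t>."""
    if element.tag != RUN:
        return False
    tags = [child.tag for child in element]
    return tags in ([TEXT], [RUN_PROPS, TEXT])


def merge_adjacent_runs(paragraph):
    """Fold each simple run into the preceding one when formatting matches."""
    previous = None
    for element in list(paragraph):
        if not is_simple_text_run(element):
            previous = None  # anything else breaks the chain
            continue
        if previous is not None and run_format(previous) == run_format(element):
            target = previous.find(TEXT)
            target.text = (target.text or "") + (element.find(TEXT).text or "")
            if target.text != target.text.strip():
                target.set(XML_SPACE, "preserve")  # keep edge whitespace
            paragraph.remove(element)
        else:
            previous = element


paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>Hel</w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>lo </w:t></w:r>"
    "<w:r><w:t>world</w:t></w:r>"
    "</w:p>"
)
merge_adjacent_runs(paragraph)
texts = [t.text for t in paragraph.iter(TEXT)]
print(texts)
```

The two bold runs collapse into one, while the unformatted run stays separate because its formatting differs.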
Edit the files under `unpacked/word/`.
Use Claude as the default author name unless the user asks for a different one.
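In WordprocessingML the author name is carried by the `w:author` attribute of tracked-change elements. A minimal sketch of generating a tracked insertion with that default, assuming the caller supplies a change id that is unique within the document; `tracked_insertion` is a hypothetical helper, not part of the skill's scripts.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr


def tracked_insertion(text, change_id, author="Claude", when=None):
    """Return <w:ins> markup wrapping one run of inserted text."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Edge whitespace is only kept when xml:space="preserve" is present.
    space = ' xml:space="preserve"' if text != text.strip() else ""
    return (
        f'<w:ins w:id="{change_id}" w:author={quoteattr(author)} w:date="{stamp}">'
        f"<w:r><w:t{space}>{escape(text)}</w:t></w:r>"
        "</w:ins>"
    )


markup = tracked_insertion(
    "new text ", 1, when=datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
print(markup)
```

Passing `author="Custom Author"` covers the case where the user asks for another name.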
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
| Entity | Character |
|---|---|
| `&#x2018;` | ‘ |
| `&#x2019;` | ’ |
| `&#x201C;` | “ |
| `&#x201D;` | ” |
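The mapping in the table can be applied programmatically when preparing text for a `<w:t>` element. A small sketch, assuming the input is plain text that has not been XML-escaped yet; `to_xml_text` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape

SMART_QUOTE_ENTITIES = {
    "\u2018": "&#x2018;",  # left single quote
    "\u2019": "&#x2019;",  # right single quote / apostrophe
    "\u201C": "&#x201C;",  # left double quote
    "\u201D": "&#x201D;",  # right double quote
}


def to_xml_text(text: str) -> str:
    """Escape &, <, > and replace smart quotes with numeric entities."""
    # Escape first so the ampersands of the entities are not escaped again.
    escaped = escape(text)
    for char, entity in SMART_QUOTE_ENTITIES.items():
        escaped = escaped.replace(char, entity)
    return escaped


converted = to_xml_text("Here\u2019s a quote: \u201CHello\u201D & more")
print(converted)
```

The order matters: running `escape` after the replacement would turn `&#x2019;` into `&amp;#x2019;`.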
Comment boilerplate can be generated with a script:
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"
python scripts/office/pack.py unpacked/ output.docx --original document.docx
Packing validates and auto-repairs by default. `--validate false` skips this, but it should not be used routinely.
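Auto-repair covers:

- `durableId` values out of range
- `<w:t>` missing `xml:space="preserve"` (when the text has leading or trailing whitespace)

Not auto-repaired: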
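
Rules for editing the XML:

- Elements inside `<w:pPr>` must appear in a valid order: `pStyle -> numPr -> spacing -> ind -> jc -> rPr`
- Add `xml:space="preserve"` to text with leading or trailing whitespace
- Mark insertions with `<w:ins>` and deletions with `<w:del>`
- Text inside `<w:del>` must be `<w:delText>`, not `<w:t>`
- When deleting a whole paragraph, also add `<w:del/>` inside `<w:pPr><w:rPr>`; otherwise an empty paragraph remains after the changes are accepted
- To undo another author's insertion, nest your `<w:del>` inside their `<w:ins>`
- To restore text another author deleted, leave their `<w:del>` in place and add your own `<w:ins>`
- `commentRangeStart/End` are children of `<w:p>`; never place them inside `<w:r>`
- Use `--parent` to build reply threads
- Insert the matching `commentReference` in `document.xml` as well

Adding an image requires all four steps:

- Put the image in `word/media/`
- Add a relationship in `word/_rels/document.xml.rels`
- Add the extension mapping in `[Content_Types].xml`
- Reference the matching rId with `r:embed` in `document.xml`

Dependencies:

- `pandoc`: text extraction
- `docx`: `npm install -g docx`
- LibreOffice: conversion to PDF and .docx (`scripts/office/soffice.py` adapts it for sandboxed environments)
- `pdftoppm`: renders pages to images for verification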
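The relationship and content-type steps for images can be sketched with the standard library. This is an illustration under stated assumptions rather than the skill's own tooling: the helper names are hypothetical, the XML is parsed from inline strings instead of the unpacked files, and the next free `rId` is found by simple counting.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
TYPES_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
IMAGE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"


def add_image_relationship(rels_root, media_name):
    """Append an image relationship and return its new rId."""
    existing = {rel.get("Id") for rel in rels_root}
    number = 1
    while f"rId{number}" in existing:
        number += 1
    rid = f"rId{number}"
    ET.SubElement(
        rels_root,
        f"{{{RELS_NS}}}Relationship",
        {"Id": rid, "Type": IMAGE_REL, "Target": f"media/{media_name}"},
    )
    return rid


def ensure_content_type(types_root, extension, mime):
    """Add a <Default> mapping for the extension unless one already exists."""
    for default in types_root.findall(f"{{{TYPES_NS}}}Default"):
        if default.get("Extension", "").lower() == extension.lower():
            return False
    mapping = ET.Element(
        f"{{{TYPES_NS}}}Default", {"Extension": extension, "ContentType": mime}
    )
    types_root.insert(0, mapping)
    return True


rels = ET.fromstring(
    f'<Relationships xmlns="{RELS_NS}">'
    '<Relationship Id="rId1" Type="x" Target="styles.xml"/>'
    "</Relationships>"
)
types = ET.fromstring(
    f'<Types xmlns="{TYPES_NS}">'
    '<Default Extension="xml" ContentType="application/xml"/>'
    "</Types>"
)
rid = add_image_relationship(rels, "image1.png")
added = ensure_content_type(types, "png", "image/png")
print(rid, added)
```

The returned `rid` is the value that the `r:embed` attribute in `document.xml` has to reference; copying the file into `word/media/` remains a separate step.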