Extracts formatting from .docx files into JSON and, via Python scripts, generates new documents with identical styles, tables, and numbering but custom content.
npx claudepluginhub iamzhihuix/happy-claude-skills --plugin open-source-prep
This skill uses the workspace's default tool permissions.
Extract formatting information from existing Word documents (.docx) and use it to generate new documents with identical formatting but different content. This skill enables creating document templates, maintaining consistent formatting across multiple documents, and replicating complex Word document structures.
Use this skill when the user:
Extract formatting information from an existing Word document to create a reusable format configuration.
python scripts/extract_format.py <template.docx> <output.json>
Example:
python scripts/extract_format.py "HY研制任务书.docx" format_template.json
What Gets Extracted:
Output: JSON file containing all format information (see references/format_config_schema.md for details)
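As a rough picture of where that formatting lives: a .docx file is a ZIP archive, and its named styles are defined in word/styles.xml. The minimal sketch below is not the code in scripts/extract_format.py; it uses only the Python standard library to list each style's ID, type, display name and, when set directly on the style, its font size. Listing style IDs this way can also help when a content file refers to a style the template does not define.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used throughout styles.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_styles(docx_path):
    """Return (styleId, type, name, font size in points or None) for each style."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    styles = []
    for style in root.iter(f"{W}style"):
        name_el = style.find(f"{W}name")
        name = name_el.get(f"{W}val") if name_el is not None else None
        # w:sz stores the size in half-points, so divide by 2 to get points
        sz_el = style.find(f"{W}rPr/{W}sz")
        size = int(sz_el.get(f"{W}val")) / 2 if sz_el is not None else None
        styles.append((style.get(f"{W}styleId"), style.get(f"{W}type"), name, size))
    return styles
```

For example, list_styles("HY研制任务书.docx") returns one tuple per style. Sizes inherited from a based-on style or from the document defaults come back as None, because the sketch does not follow style inheritance.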
Create a JSON file with the actual content for the new document. The content must follow the structure defined in references/content_data_schema.md.
Content Structure:
{
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "version": "1.0",
    "date": "2025-01-15"
  },
  "sections": [
    {
      "type": "heading",
      "content": "Section Title",
      "level": 1,
      "number": "1"
    },
    {
      "type": "paragraph",
      "content": "Paragraph text content."
    },
    {
      "type": "table",
      "rows": 2,
      "cells": [
        ["Header 1", "Header 2"],
        ["Data 1", "Data 2"]
      ]
    }
  ]
}
Supported Section Types:
heading - Headings with optional numbering
paragraph - Text paragraphs
table - Tables with configurable rows and columns
page_break - Page breaks
See assets/example_content.json for a complete example.
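Content files can also be produced from Python instead of being written by hand. This sketch assumes only the field names visible in the content structure above (type, content, level, number, rows, cells); the helper names are hypothetical, and references/content_data_schema.md remains the authority on which fields are required. Deriving rows from the cell data keeps the two values from disagreeing.

```python
import json

# Hypothetical helpers mirroring the section shapes in the example content structure


def heading(text, level=1, number=None):
    section = {"type": "heading", "content": text, "level": level}
    if number is not None:
        section["number"] = number  # numbering is optional for headings
    return section


def paragraph(text):
    return {"type": "paragraph", "content": text}


def table(cells):
    # "rows" is computed from the cell data so it always matches
    return {"type": "table", "rows": len(cells), "cells": cells}


def page_break():
    # Assumption: a page break needs no fields besides its type
    return {"type": "page_break"}


def build_content(metadata, sections):
    return {"metadata": metadata, "sections": list(sections)}


def write_content(path, content):
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese titles) readable in the file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(content, f, ensure_ascii=False, indent=2)
```

The resulting file is passed to scripts/generate_document.py like any hand-written content file.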
Generate a new Word document using the extracted format and prepared content.
python scripts/generate_document.py <format.json> <content.json> <output.docx>
Example:
python scripts/generate_document.py format_template.json new_content.json output_document.docx
Result: A new .docx file with the format from the template applied to the new content.
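A quick sanity check on the result, before opening it in Word, is to confirm that the output is a readable Office Open XML package. The hypothetical check_docx helper below uses the standard library only: it verifies that the file is a ZIP archive with intact members and that the core parts [Content_Types].xml, _rels/.rels and word/document.xml are present. Passing it does not prove the formatting is right, only that the package is not obviously broken.

```python
import zipfile

# Core parts expected in every WordprocessingML package
REQUIRED_PARTS = {"[Content_Types].xml", "_rels/.rels", "word/document.xml"}


def check_docx(path):
    """Return a list of problems; an empty list means the basic checks passed."""
    if not zipfile.is_zipfile(path):
        return [f"{path} is not a ZIP archive, so it cannot be a .docx"]
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        bad_member = zf.testzip()  # name of the first corrupt member, or None
    problems = [f"missing part: {part}" for part in sorted(REQUIRED_PARTS - names)]
    if bad_member is not None:
        problems.append(f"corrupt member: {bad_member}")
    return problems
```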
User asks: "I have a research task document. I need to create 5 more documents with the same format but different content."
Extract the format from the template:
python scripts/extract_format.py research_task_template.docx template_format.json
Create content files for each new document (content1.json, content2.json, etc.)
Generate documents:
python scripts/generate_document.py template_format.json content1.json document1.docx
python scripts/generate_document.py template_format.json content2.json document2.docx
# ... repeat for all documents
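Writing the per-document content files for a batch like this can itself be scripted. In this sketch the shared skeleton BASE, and the choice to vary only the title and the first heading, are illustrative assumptions; real documents would fill in their own sections. The files are named content1.json, content2.json, and so on, to match the commands above.

```python
import copy
import json
from pathlib import Path

# Hypothetical skeleton shared by every document in the batch
BASE = {
    "metadata": {"title": "", "author": "Author Name", "version": "1.0"},
    "sections": [
        {"type": "heading", "content": "", "level": 1, "number": "1"},
        {"type": "paragraph", "content": "Paragraph text content."},
    ],
}


def write_content_files(titles, out_dir="."):
    """Write content1.json, content2.json, ... and return their paths."""
    paths = []
    for i, title in enumerate(titles, 1):
        content = copy.deepcopy(BASE)  # deep copy so BASE itself is never modified
        content["metadata"]["title"] = title
        content["sections"][0]["content"] = title
        path = Path(out_dir) / f"content{i}.json"
        path.write_text(json.dumps(content, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```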
Extract format from a company template and generate reports, proposals, or specifications with consistent branding.
# One-time: Extract company template
python scripts/extract_format.py "Company Template.docx" company_format.json
# For each new document:
python scripts/generate_document.py company_format.json new_report.json "Monthly Report.docx"
Create multiple technical documents (specifications, test plans, manuals) with identical formatting.
# Extract from specification template
python scripts/extract_format.py spec_template.docx spec_format.json
# Generate multiple specs
python scripts/generate_document.py spec_format.json product_a_spec.json "Product A Spec.docx"
python scripts/generate_document.py spec_format.json product_b_spec.json "Product B Spec.docx"
The included example template (assets/hy_template_format.json) demonstrates a complete research task document format with:
Use this as a starting point for similar technical documents.
Modify scripts/extract_format.py to extract additional properties not covered by default:
Add new section types in scripts/generate_document.py:
See references/content_data_schema.md for extension guidelines.
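The internals of scripts/generate_document.py are not shown here, so the following is only a hypothetical sketch of one common design for adding section types: a table mapping each section "type" to a handler function, so that a new type means one new function plus one registration. The handlers call real python-docx Document methods (add_heading, add_paragraph, add_table, add_page_break), but unlike the actual script they ignore the extracted format config and heading numbers; the "note" type is invented for illustration.

```python
# Hypothetical dispatch sketch; each handler gets a python-docx-style document and one section dict


def add_heading(doc, section):
    doc.add_heading(section["content"], level=section.get("level", 1))


def add_paragraph(doc, section):
    doc.add_paragraph(section["content"])


def add_table(doc, section):
    cells = section["cells"]
    cols = max((len(row) for row in cells), default=0)
    table = doc.add_table(rows=len(cells), cols=cols)
    for r, row in enumerate(cells):
        for c, text in enumerate(row):
            table.cell(r, c).text = text


def add_page_break(doc, section):
    doc.add_page_break()


HANDLERS = {
    "heading": add_heading,
    "paragraph": add_paragraph,
    "table": add_table,
    "page_break": add_page_break,
}


def register(section_type):
    """Decorator that registers a handler for a new section type."""
    def wrap(func):
        HANDLERS[section_type] = func
        return func
    return wrap


@register("note")  # hypothetical new section type
def add_note(doc, section):
    doc.add_paragraph("Note: " + section["content"])


def render(doc, sections):
    for section in sections:
        handler = HANDLERS.get(section["type"])
        if handler is None:
            raise ValueError(f"unsupported section type: {section['type']!r}")
        handler(doc, section)
```

With python-docx installed, creating doc = docx.Document(), calling render(doc, content["sections"]) and then doc.save("output.docx") would produce a plain document using Word's built-in styles. Raising on unknown types makes a typo in a content file fail loudly instead of silently dropping a section.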
Create a wrapper script to generate multiple documents:
import subprocess
import sys

format_file = "template_format.json"
content_files = ["content1.json", "content2.json", "content3.json"]

for i, content_file in enumerate(content_files, 1):
    output = f"document_{i}.docx"
    # sys.executable runs the script with the same interpreter as this wrapper;
    # check=True stops the batch at the first failed generation
    subprocess.run(
        [sys.executable, "scripts/generate_document.py", format_file, content_file, output],
        check=True,
    )
The scripts require:
python-docx library: pip install python-docx
No additional dependencies are needed for the core functionality.
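To confirm the dependency before running either script, the installed version can be looked up through the standard library. This small sketch queries the distribution name python-docx rather than importing docx, because the library is installed under one name and imported under the other, and an unrelated package could also provide a module called docx.

```python
from importlib import metadata


def python_docx_version():
    """Return the installed python-docx version string, or None if it is missing."""
    try:
        # Look up the distribution name, not the "docx" import name
        return metadata.version("python-docx")
    except metadata.PackageNotFoundError:
        return None
```

A result of None means the package is missing and pip install python-docx is needed.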
Both scripts include built-in help:
python scripts/extract_format.py --help
python scripts/generate_document.py --help
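Read the schema files in references/ (such as format_config_schema.md and content_data_schema.md) for detailed information on file structures and available options.
Use the example files in assets/ (such as hy_template_format.json and example_content.json) as references when creating your own format and content files.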
Missing styles in output: Ensure style IDs in the content data match those in the format config. Check the extracted format file (for example format_template.json) for the available style IDs.
Table formatting issues: Verify table dimensions (rows/columns) match between content data and format config. See format_config_schema.md for table structure.
Font not displaying correctly: Some fonts may not be available on all systems. Check that referenced fonts are installed.
Dependencies missing: Install required Python packages:
pip install python-docx
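For the font problem listed above, a first step is to find out which fonts the template declares. Word records them in word/fontTable.xml inside the .docx; this standard-library sketch (a hypothetical helper, not part of the skill's scripts) returns those names so they can be compared against the fonts installed locally, for example with fc-list on Linux.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by fontTable.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def fonts_in_docx(docx_path):
    """Return the sorted, de-duplicated font names declared in word/fontTable.xml."""
    with zipfile.ZipFile(docx_path) as zf:
        if "word/fontTable.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("word/fontTable.xml"))
    names = {font.get(f"{W}name") for font in root.iter(f"{W}font")}
    return sorted(name for name in names if name)  # skip entries without a name
```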
Test with examples first: Use the included hy_template_format.json and example_content.json to understand the workflow before extracting your own formats.
Start simple: Begin with basic headings and paragraphs, then add tables and complex formatting.
Validate JSON: Use a JSON validator to check content data files before generating documents.
Keep format configs: Store extracted format configurations for reuse across multiple projects.
Version control: Track both format configs and content data in version control for reproducible document generation.
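A generic JSON validator only checks syntax. The sketch below also checks each section against the four supported section types and, on the assumption that a table's rows value counts its cell rows, flags tables where the two disagree or where rows have different lengths. It further assumes headings and paragraphs need a content field, based on the example structure; references/content_data_schema.md has the actual requirements.

```python
import json

SUPPORTED_TYPES = {"heading", "paragraph", "table", "page_break"}


def validate_content(path):
    """Return a list of problems in a content file; empty means these checks passed."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("sections"), list):
        return ["top level must be an object with a 'sections' list"]
    problems = []
    for i, section in enumerate(data["sections"]):
        kind = section.get("type") if isinstance(section, dict) else None
        if kind not in SUPPORTED_TYPES:
            problems.append(f"sections[{i}]: unsupported type {kind!r}")
        elif kind in ("heading", "paragraph") and "content" not in section:
            problems.append(f"sections[{i}]: {kind} has no 'content'")
        elif kind == "table":
            cells = section.get("cells", [])
            if "rows" in section and section["rows"] != len(cells):
                problems.append(f"sections[{i}]: rows={section['rows']} but cells has {len(cells)} rows")
            if len({len(row) for row in cells}) > 1:
                problems.append(f"sections[{i}]: table rows have different lengths")
    return problems
```

Running validate_content("content1.json") before scripts/generate_document.py returns an empty list when these checks pass.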