Use when generating PDFs from markdown with Pandoc - covers differences from Python-Markdown, blank line rules, fix scripts for labels/anchors/metadata, and visual testing workflow
When generating PDFs from markdown with Pandoc - covers differences from Python-Markdown, blank line rules, fix scripts for labels/anchors/metadata, and visual testing workflow
/plugin marketplace add SecurityRonin/ronin-marketplace/plugin install docs-skills@ronin-marketplaceThis skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill documents lessons learned from generating PDF documents from markdown using Pandoc, drawing from experiences with MkDocs HTML generation and applying systematic validation approaches.
| Feature | Python-Markdown (MkDocs) | Pandoc (PDF) |
|---|---|---|
Roman numerals (i., ii.) | ❌ Not supported | ✅ Supported |
| Grid tables | ⚠️ Needs extension | ✅ Native support |
LaTeX commands (\pagebreak) | ❌ Renders as text | ✅ Native support |
| Nested list indent | 4 spaces (strict) | More flexible |
| Footnotes continuation | 4-space indent required | More flexible |
Key Insight: Pandoc is MORE capable than Python-Markdown, but this means markdown that works for PDF might break in MkDocs!
Good News: Some formatting rules work consistently across both renderers!
Blank Line Rules (Universal):
Validation Method:
# Generate both outputs
mkdocs build --clean
./scripts/generate-pdf.sh
# Check MkDocs HTML rendering
grep -A 5 "For complete details, see:" site/soc2-type1/index.html
# Should show: <ul><li>...</li></ul>
# Check Pandoc PDF rendering
pdftotext output/Documentation.pdf - | grep -A 5 "For complete details, see:"
# Should show: • Bullet point
Implication: Fix markdown once, works for both HTML and PDF! This makes maintaining shared source files much easier.
When using same markdown files for both MkDocs (HTML) and Pandoc (PDF):
Option 1: Optimize for MkDocs (Current Approach)
\pagebreak, etc.Option 2: Optimize for Pandoc
Option 3: Separate Sources (Best for large projects)
docs/ for MkDocspdf-source/ for PDFOption 4: Conditional Formatting (Advanced)
./scripts/generate-pdf.sh
Check for errors:
CRITICAL: Actually open and read the PDF!
open output/Documentation.pdf
Checklist:
##)Check specific sections user mentioned:
For example, if user says "these should be bullet points":
**Access Removal:**
- Item one
- Item two
git add output/Documentation.pdf
git commit -m "docs: regenerate PDF with [specific improvements]"
Symptom: Text that should be headers (H2, H3) appears as regular paragraphs in PDF.
Root Cause: Markdown not properly formatted for Pandoc.
Check markdown:
# ✅ CORRECT - Header
## User Identification and Authentication
# ❌ WRONG - Plain text
User Identification and Authentication
Solution: Ensure headers have ## prefix, blank line before and after.
Symptom: Text shows dashes/bullets as characters, not formatted lists.
Root Cause:
Check markdown:
# ✅ CORRECT
**Access Removal:**
- Termination: Immediate revocation
- Role change: Adjusted within 5 days
# ❌ WRONG - No blank line
**Access Removal:**
- Termination: Immediate revocation
Solution:
- or *)Symptom:
[WARNING] Missing character: There is no ├ (U+251C) in font [lmmono10-regular]
Root Cause: Default LaTeX font doesn't support all Unicode characters (box-drawing, emojis, etc.)
Solutions:
Option 1: Change Font
# In pandoc command
--pdf-engine=xelatex
--variable mainfont="DejaVu Sans"
Option 2: Remove Special Characters
# Replace tree diagrams with ASCII
sed -i '' 's/├/+/g' file.md
sed -i '' 's/─/-/g' file.md
Option 3: Accept Warnings
Symptom: Tables overflow page width, text cut off.
Solutions:
Option 1: Rotate Table (Landscape)
\begin{landscape}
| Col 1 | Col 2 | Col 3 |
|-------|-------|-------|
| Data | Data | Data |
\end{landscape}
Option 2: Smaller Font in Table
\small
| Col 1 | Col 2 | Col 3 |
|-------|-------|-------|
| Data | Data | Data |
\normalsize
Option 3: Redesign Table
Symptom: Headers at bottom of page, orphaned content.
Solutions:
Option 1: Manual Page Breaks
\pagebreak
## Next Section
Option 2: Pandoc Variables
--variable pagestyle=headings
--variable geometry:margin=1in
Option 3: LaTeX Penalties
\widowpenalty=10000
\clubpenalty=10000
Symptom: Bold labels followed by lists render as inline text instead of separate formatted list.
Example in PDF:
Technology Changes: - New system implementations - Software upgrades - Infrastructure modifications
Root Cause: Pandoc requires blank line after bold labels (format: **Label:**) before lists.
Check markdown:
# ❌ WRONG - No blank line
**Technology Changes:**
- New system implementations
- Software upgrades
# ✅ CORRECT - Blank line after label
**Technology Changes:**
- New system implementations
- Software upgrades
Solution: Add blank line between bold label and list.
Automated Detection:
# Find all bold labels immediately followed by lists
grep -n '^\*\*[^*]*:\*\*$' file.md | while read line; do
num=$(echo $line | cut -d: -f1)
next=$((num + 1))
nextline=$(sed -n "${next}p" file.md)
if [[ $nextline =~ ^[-*] ]]; then
echo "Line $num: Missing blank line after bold label"
fi
done
Automated Fix: Use fix_pandoc_lists.py script (see Automation section below).
## CharactersSymptom: Headers render as plain text with literal ## characters visible.
Example in PDF:
## Fraud Risk Assessment
Root Cause: Pandoc requires blank line after HTML anchor tags before markdown headers.
Check markdown:
# ❌ WRONG - No blank line after anchor
<a name="fraud-risk"></a>
## Fraud Risk Assessment
# ✅ CORRECT - Blank line after anchor
<a name="fraud-risk"></a>
## Fraud Risk Assessment
Why This Happens: Pandoc treats HTML and markdown as separate contexts. Without blank line, it doesn't recognize the ## as a markdown header.
Solution: Add blank line between HTML anchor and header.
Automated Detection:
# Find anchors immediately followed by headers
grep -n '^<a name=' file.md | while read line; do
num=$(echo $line | cut -d: -f1)
next=$((num + 1))
nextline=$(sed -n "${next}p" file.md)
if [[ $nextline =~ ^## ]]; then
echo "Line $num: Missing blank line after anchor"
fi
done
Automated Fix: Use fix_pandoc_anchors.py script (see Automation section below).
Symptom: Consecutive metadata fields render on single line instead of separate lines.
Example in PDF:
Title: Report Name Author: Your Name Date: January 2025
Root Cause: Pandoc requires blank lines between consecutive paragraphs. Without them, it merges lines into continuous text flow.
Check markdown:
# ❌ WRONG - No blank lines between
**Organization:** Example Corp
**Audit Type:** SOC 2 Type 1
**Scope:** Security (CC1-CC9)
# ✅ CORRECT - Blank lines between each
**Organization:** Example Corp
**Audit Type:** SOC 2 Type 1
**Scope:** Security (CC1-CC9)
Solution: Add blank lines between consecutive bold label lines.
Automated Detection:
# Find consecutive bold label lines
grep -n '^\*\*[^*]*:\*\* ' file.md | \
awk 'NR > 1 && $1 == prev+1 {print "Lines " prev "-" $1 ": Consecutive bold labels"} {prev=$1}'
Automated Fix: Use fix_pandoc_metadata.py script (see Automation section below).
Symptom: Plain text (not bold) ending with colon followed by list renders inline.
Example in PDF:
The security program aligns with: - SOC 2 - ISO 27001 - NIST Framework
Root Cause: Same as Issue 6, but for plain text labels instead of bold.
Check markdown:
# ❌ WRONG - No blank line
The security program aligns with:
- SOC 2 Trust Services Criteria
- ISO 27001 control framework
# ✅ CORRECT - Blank line after plain text label
The security program aligns with:
- SOC 2 Trust Services Criteria
- ISO 27001 control framework
Solution: Add blank line after any text ending with : when followed by list.
Automated Fix: Enhanced fix_pandoc_lists.py handles both bold and plain text labels.
Purpose: Fix bold and plain text labels before lists.
Usage:
python3 fix_pandoc_lists.py
What it fixes:
**Label:** → blank line → listText: → blank line → listExample output:
Processing 03-risk-assessment.md...
Line 186: Added blank line after '**Technology Changes:**'
Line 265: Added blank line after 'The security program aligns with:'
✅ Fixed 03-risk-assessment.md
Script location: Project root directory
Purpose: Fix HTML anchors before headers.
Usage:
python3 fix_pandoc_anchors.py
What it fixes:
<a name="..."></a> → blank line → ## HeaderExample output:
Processing 03-risk-assessment.md...
Line 141: Added blank line after '<a name="fraud-risk"></a>'
✅ Fixed 03-risk-assessment.md
Script location: Project root directory
Purpose: Fix consecutive bold label metadata fields.
Usage:
python3 fix_pandoc_metadata.py
What it fixes:
**Label:** value lines → add blank lines between themExample output:
Processing index.md...
Line 3: Added blank line after '**Organization:** Example Corp'
Line 4: Added blank line after '**Audit Type:** SOC 2 Type 1'
✅ Fixed index.md
Script location: Project root directory
Complete fix workflow:
# Fix all Pandoc formatting issues
python3 fix_pandoc_lists.py # Lists after labels
python3 fix_pandoc_anchors.py # Anchors before headers
python3 fix_pandoc_metadata.py # Consecutive metadata
# Regenerate PDF
./scripts/generate-pdf.sh
# Visual verification
open output/Documentation.pdf
When to run:
pandoc file.md -o output.pdf \
--from markdown \
--to pdf \
--pdf-engine=xelatex
pandoc file.md -o output.pdf \
--from markdown \
--to pdf \
--pdf-engine=xelatex \
--toc \
--toc-depth=3 \
--number-sections
pandoc file.md -o output.pdf \
--from markdown \
--to pdf \
--pdf-engine=xelatex \
--metadata title="Document Title" \
--metadata author="Author Name" \
--metadata date="$(date +%Y-%m-%d)"
pandoc file.md -o output.pdf \
--from markdown \
--to pdf \
--pdf-engine=xelatex \
--template=custom-template.tex
Copy this checklist for each PDF generation:
## PDF Generation Test - [DATE]
### Generation Phase
- [ ] Script runs without errors
- [ ] PDF file created
- [ ] File size reasonable (< 10MB for typical docs)
### Visual Inspection Phase
- [ ] Opened PDF and scrolled through ALL pages
- [ ] Cover page correct
- [ ] TOC complete and accurate
- [ ] All headers styled correctly (no literal `##`)
- [ ] All bullets formatted as lists (not inline)
- [ ] All numbered lists formatted correctly (not inline)
- [ ] Bold/plain labels before lists properly spaced
- [ ] Metadata fields on separate lines (not run together)
- [ ] All tables fit on pages
- [ ] No obviously bad page breaks
- [ ] No missing content
- [ ] Font rendering acceptable
### Specific Checks (from user feedback)
- [ ] [Specific section] renders correctly
- [ ] [Specific formatting] matches intent
- [ ] [Specific issue] is fixed
### Final Validation
- [ ] PDF matches markdown source intent
- [ ] All user-reported issues addressed
- [ ] Ready for commit
**Issues Found:** [List any issues]
**Next Steps:** [What needs fixing]
Create: scripts/test-pdf.sh
#!/bin/bash
# Test PDF generation and basic quality checks
set -e
# Generate PDF
./scripts/generate-pdf.sh
PDF="output/Documentation.pdf"
# Check file exists
if [ ! -f "$PDF" ]; then
echo "❌ PDF not generated"
exit 1
fi
# Check file size (should be between 100KB and 10MB)
SIZE=$(stat -f%z "$PDF" 2>/dev/null || stat -c%s "$PDF")
if [ $SIZE -lt 100000 ]; then
echo "⚠️ WARNING: PDF seems too small ($SIZE bytes)"
elif [ $SIZE -gt 10000000 ]; then
echo "⚠️ WARNING: PDF seems too large ($SIZE bytes)"
else
echo "✅ PDF size OK: $(numfmt --to=iec-i --suffix=B $SIZE)"
fi
# Check page count (using pdfinfo if available)
if command -v pdfinfo &> /dev/null; then
PAGES=$(pdfinfo "$PDF" | grep "Pages:" | awk '{print $2}')
echo "📄 Pages: $PAGES"
if [ $PAGES -lt 50 ]; then
echo "⚠️ WARNING: Expected ~89 pages, got $PAGES"
fi
fi
echo ""
echo "✅ Basic checks passed!"
echo "📋 Next: Open PDF and visually inspect"
echo " open $PDF"
The "Blank Line Rule": Pandoc requires blank lines in these situations:
Quick Check Commands:
# Check for labels before lists (no blank line)
grep -B1 '^[-*] ' file.md | grep ':$' | grep -v '^--$'
# Check for anchors before headers (no blank line)
grep -A1 '^<a name=' file.md | grep '^##'
# Check for consecutive bold labels
grep '^\*\*[^*]*:\*\* ' file.md | uniq -c | grep -v '^ *1 '
When in doubt: Add a blank line. Pandoc almost never complains about too many blank lines.
Project: Large documentation set Files: 15 markdown files Issues Found: 469 formatting problems across 4 categories
Fixes Applied:
Time Investment:
ROI: 3 hours invested, automated solution for future. All issues fixed in 5 minutes.
Status: Production-ready with automation scripts
Master Bash Automated Testing System (Bats) for comprehensive shell script testing. Use when writing tests for shell scripts, CI/CD pipelines, or requiring test-driven development of shell utilities.