Batch-process web pages via headless Playwright browser, extract HTML, convert to markdown using Turndown, and save to timestamped scratchpad file. Use when user asks to "capture these pages as markdown", "save web content", "fetch and convert webpages", or needs clean markdown from HTML. All URLs from one prompt → single file at docs/web-captures/<timestamp>.md.
/plugin marketplace add otrebu/agents
/plugin install knowledge-work@otrebu-dev-tools
Captures web pages with headless Playwright browser automation (which handles JavaScript-rendered content), converts the HTML to clean markdown with the Turndown library, and saves all URLs from a single request into one timestamped file.
Key features:
cd skills/web-to-markdown && pnpm install
When the user provides URLs and asks to "capture these pages as markdown", "save web content", or "fetch and convert webpages":
Output: docs/web-captures/YYYYMMDD_HHMMSS.md containing all pages.
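The YYYYMMDD_HHMMSS file name can be illustrated with a small sketch. The helper names `formatTimestamp` and `captureFilePath` are hypothetical, and the use of local time (rather than UTC) is an assumption; the script's actual naming code is not shown here.

```typescript
// Hypothetical helpers: the real script's naming code is not shown in this document.
// Assumption: the timestamp uses local time, not UTC.
function formatTimestamp(date: Date): string {
  // Zero-pad single-digit values so every field has a fixed width.
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const day = `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
  return `${day}_${time}`;
}

function captureFilePath(date: Date, outputDir: string = "docs/web-captures"): string {
  return `${outputDir}/${formatTimestamp(date)}.md`;
}

// 3 November 2025, 14:30:52 local time (months are zero-based in Date).
console.log(captureFilePath(new Date(2025, 10, 3, 14, 30, 52)));
// prints "docs/web-captures/20251103_143052.md"
```

Because the name is fixed-width and ordered from year down to second, capture files sort chronologically in a plain directory listing.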
cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts <url1> [url2] [url3] ...
That's it! Script handles:
Single URL:
cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts https://example.com/docs
# Output: docs/web-captures/20251103_143052.md
Multiple URLs (Batch):
cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts \
https://example.com/guide \
https://example.com/api \
https://example.com/faq
# Output: docs/web-captures/20251103_143052.md (all 3 pages)
From project root:
pnpm --filter @skills/web-to-markdown tsx scripts/scrape-and-convert.ts <urls...>
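Since every argument is treated as a URL, it may help to check the list before starting a batch. The following is a minimal sketch of such a pre-flight check; `partitionUrls` is a hypothetical helper, and whether the real CLI validates its arguments this way is not stated in this document.

```typescript
// Hypothetical pre-flight check; the real CLI's argument handling is not shown here.
interface PartitionedArgs {
  valid: string[];
  invalid: string[];
}

function partitionUrls(args: string[]): PartitionedArgs {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const arg of args) {
    try {
      // The URL constructor throws on strings it cannot parse as absolute URLs.
      const url = new URL(arg);
      if (url.protocol === "http:" || url.protocol === "https:") {
        valid.push(arg);
      } else {
        invalid.push(arg); // parsed, but not a web page scheme
      }
    } catch {
      invalid.push(arg);
    }
  }
  return { valid, invalid };
}

const checked = partitionUrls(["https://example.com/guide", "not a url", "ftp://example.com/file"]);
console.log(checked.valid);   // [ 'https://example.com/guide' ]
console.log(checked.invalid); // [ 'not a url', 'ftp://example.com/file' ]
```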
# Web Captures - YYYY-MM-DD HH:MM:SS
Generated: YYYYMMDD_HHMMSS
URLs: N
---
## 📄 https://example.com/page1
[Converted markdown content...]
---
## 📄 https://example.com/page2
[Converted markdown content...]
---
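The layout above could be assembled along these lines. This is a sketch only: `CapturedPage` and `renderCaptureFile` are hypothetical names, and the blank lines between blocks are an assumption about spacing that the template does not specify.

```typescript
// Hypothetical types and function; only the layout itself comes from the template above.
interface CapturedPage {
  url: string;
  markdown: string;
}

function renderCaptureFile(pages: CapturedPage[], date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const ymd = [String(date.getFullYear()), pad(date.getMonth() + 1), pad(date.getDate())];
  const hms = [pad(date.getHours()), pad(date.getMinutes()), pad(date.getSeconds())];

  // Header block: title, machine-readable timestamp, page count, separator.
  const blocks: string[] = [
    `# Web Captures - ${ymd.join("-")} ${hms.join(":")}`,
    `Generated: ${ymd.join("")}_${hms.join("")}`,
    `URLs: ${pages.length}`,
    "---",
  ];

  // One section per page, each closed by a separator.
  for (const page of pages) {
    blocks.push(`## 📄 ${page.url}`, page.markdown.trim(), "---");
  }

  // Assumption: blocks are separated by blank lines.
  return blocks.join("\n\n") + "\n";
}

const sample = renderCaptureFile(
  [{ url: "https://example.com/page1", markdown: "Hello" }],
  new Date(2025, 10, 3, 14, 30, 52)
);
console.log(sample);
```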
TypeScript + FP Patterns:
BrowserError, FileError, HtmlConversionError
File Structure:
skills/web-to-markdown/
├── SKILL.md # This file (workflow instructions)
├── package.json # pnpm workspace config
├── tsconfig.json # TypeScript config
├── scripts/
│ ├── scrape-and-convert.ts # Main CLI (Playwright + Turndown)
│ ├── html-to-markdown.ts # Pure conversion function (Turndown wrapper)
│ └── convert-and-append.ts # Legacy CLI (deprecated, kept for reference)
└── tests/
└── html-to-markdown.test.ts # Unit tests
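One way the error types named under TypeScript + FP Patterns (BrowserError, FileError, HtmlConversionError) might fit together with a pure conversion function is a tagged union plus a Result type. Only the three names come from this document; the field shapes, `Result`, `safeConvert`, and `describeError` are assumptions made for illustration, and the converter is passed in as a parameter rather than calling Turndown directly.

```typescript
// Hypothetical shapes: only the three error names appear in the document.
type CaptureError =
  | { kind: "BrowserError"; url: string; message: string }
  | { kind: "FileError"; path: string; message: string }
  | { kind: "HtmlConversionError"; message: string };

type Result<T> = { ok: true; value: T } | { ok: false; error: CaptureError };

// Wraps a converter that may throw, so callers receive a value instead of an exception.
function safeConvert(html: string, convert: (html: string) => string): Result<string> {
  try {
    return { ok: true, value: convert(html) };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, error: { kind: "HtmlConversionError", message } };
  }
}

// The switch covers every member of the union, one case per error kind.
function describeError(error: CaptureError): string {
  switch (error.kind) {
    case "BrowserError":
      return `Browser failed for ${error.url}: ${error.message}`;
    case "FileError":
      return `Could not write ${error.path}: ${error.message}`;
    case "HtmlConversionError":
      return `Conversion failed: ${error.message}`;
  }
}

// Stand-in converter that strips tags; the real script uses Turndown instead.
console.log(safeConvert("<p>hi</p>", (h) => h.replace(/<[^>]+>/g, "")));
```

Keeping failures as values lets a batch run record one failed page and continue with the remaining URLs.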
pnpm exec playwright install chromium
Default timeout: 30 seconds per page
To customize, edit DEFAULT_CONFIG in scripts/scrape-and-convert.ts:
const DEFAULT_CONFIG = {
outputDir: 'docs/web-captures',
timeout: 30000, // milliseconds
};
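If editing the constant in place is inconvenient, the defaults could instead be merged with per-run overrides. This is a sketch under assumptions: `CaptureConfig` and `resolveConfig` are hypothetical and not part of the script as documented; only the two default values come from the snippet above.

```typescript
// Hypothetical override helper; only the default values come from the document.
interface CaptureConfig {
  outputDir: string;
  timeout: number; // milliseconds
}

const DEFAULT_CONFIG: CaptureConfig = {
  outputDir: "docs/web-captures",
  timeout: 30000,
};

function resolveConfig(overrides: Partial<CaptureConfig> = {}): CaptureConfig {
  // Later spreads win, so any supplied override replaces the default.
  const merged: CaptureConfig = { ...DEFAULT_CONFIG, ...overrides };
  if (!isFinite(merged.timeout) || merged.timeout <= 0) {
    throw new Error(`timeout must be a positive number of milliseconds, got ${merged.timeout}`);
  }
  return merged;
}

// Allow slow pages up to 60 seconds while keeping the default output directory.
console.log(resolveConfig({ timeout: 60000 }));
// prints { outputDir: 'docs/web-captures', timeout: 60000 }
```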
| Feature | web-to-markdown | scratchpad-fetch | Jina AI Reader |
|---|---|---|---|
| Transport | Playwright (headless) | curl (HTTP) | Cloud API |
| JavaScript | ✅ Full rendering | ❌ No | ✅ Server-side |
| Conversion | ✅ Turndown | ❌ Raw HTML | ✅ LLM-powered |
| Self-hosted | ✅ Yes | ✅ Yes | ❌ Cloud only |
| Setup | pnpm install | None | API key |
| Speed | Medium (2-5s/page) | Fast (<1s) | Fast (~2s) |
| Visible browser | ❌ No (headless) | N/A | N/A |
"Executable doesn't exist" error:
cd skills/web-to-markdown
pnpm exec playwright install chromium
Pages timing out:
Increase the timeout in DEFAULT_CONFIG (scripts/scrape-and-convert.ts).
Empty markdown output: