Extracts content from blog posts and news articles. Use when user asks to scrape a URL, extract article content, get text from a webpage, discover articles from a blog, parse RSS feeds, or needs LLM-ready content with token counts. Supports single articles, batch processing, and site-wide discovery.
Extracts full article content from any URL for LLM processing. Automatically detects and scrapes blog posts or news articles when users request content from a specific webpage, batch of URLs, or entire site.
/plugin marketplace add tyroneross/blog-content-scraper/plugin install tyroneross-blog-content-scraper@tyroneross/blog-content-scraperThis skill is limited to using the following tools:
Extract blog and news content from any website using the @tyroneross/blog-scraper SDK.
import { extractArticle } from '@tyroneross/blog-scraper';
const article = await extractArticle('https://example.com/blog/post');
// Returns: { title, markdown, text, html, wordCount, readingTime, excerpt, author }
import { scrapeForLLM } from '@tyroneross/blog-scraper/llm';
const { markdown, tokens, chunks, frontmatter } = await scrapeForLLM(url);
// tokens: estimated count for context window management
// chunks: pre-split for RAG applications
import { scrapeWebsite } from '@tyroneross/blog-scraper';
const result = await scrapeWebsite('https://techcrunch.com', {
maxArticles: 10,
extractFullContent: true
});
import { smartScrape } from '@tyroneross/blog-scraper';
const result = await smartScrape(url);
if (result.mode === 'article') {
console.log(result.article.title);
} else {
console.log(result.articles.length, 'articles found');
}
import { scrapeUrls } from '@tyroneross/blog-scraper/batch';
const result = await scrapeUrls(urls, { concurrency: 3 });
import { validateUrl } from '@tyroneross/blog-scraper/validation';
const { isReachable, robotsAllowed, suggestedAction } = await validateUrl(url);
| Property | Description |
|---|---|
title | Article title |
markdown | Formatted Markdown content |
text | Plain text (no formatting) |
html | Raw HTML content |
excerpt | Short summary |
author | Author name if detected |
publishedDate | Publication date |
wordCount | Total words |
readingTime | Estimated minutes to read |
Create a script file and run with:
npx tsx script.ts
| User Request | Function |
|---|---|
| "Extract this article" | extractArticle(url) |
| "Get content for LLM" | scrapeForLLM(url) |
| "Find articles on this site" | scrapeWebsite(url) |
| "Not sure if article or blog" | smartScrape(url) |
| "Process these 5 URLs" | scrapeUrls(urls) |
| "Can I scrape this?" | validateUrl(url) |