Extract article content from a blog or news URL
Extracts article content from blog or news URLs and converts it to Markdown.
/plugin marketplace add tyroneross/blog-content-scraper
/plugin install tyroneross-blog-content-scraper@tyroneross/blog-content-scraper
<url>
Extract content from the URL: $ARGUMENTS
First, determine if the URL looks like a single article or a listing page
Use the appropriate scraper function:
  extractArticle(url) for a single article
  scrapeWebsite(url, { maxArticles: 5 }) for a listing page
  smartScrape(url) to detect the page type automatically
Create a temporary script to run the scraper:
import { smartScrape } from '@tyroneross/blog-scraper';

async function main() {
  // $ARGUMENTS is replaced with the URL passed to the command.
  const url = '$ARGUMENTS';
  console.log('Scraping:', url);

  const result = await smartScrape(url);

  if (result.mode === 'article') {
    // Single article: print metadata, the excerpt, and a markdown preview.
    console.log('\nArticle Extracted:\n');
    console.log('Title:', result.article.title);
    console.log('Words:', result.article.wordCount);
    console.log('Reading time:', result.article.readingTime, 'min');
    console.log('\n--- Excerpt ---');
    console.log(result.article.excerpt);
    console.log('\n--- Full Markdown ---');
    // Cap the preview at 2000 characters; add an ellipsis only when truncated.
    const markdown = result.article.markdown;
    console.log(markdown.length > 2000 ? markdown.substring(0, 2000) + '...' : markdown);
  } else if (result.mode === 'listing') {
    // Listing page: print the first five discovered articles.
    console.log('\nArticles Discovered:', result.articles.length);
    for (const a of result.articles.slice(0, 5)) {
      console.log(`\n- ${a.title}`);
      console.log(`  ${a.url}`);
      if (a.publishedDate) console.log(`  Published: ${a.publishedDate}`);
    }
  } else {
    console.log('Failed:', result.error);
  }
}

main().catch(console.error);
Run with: npx tsx <script-file>
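The first step above (deciding whether a URL looks like a single article or a listing page) can be approximated from the URL shape alone. The following is a minimal sketch under stated assumptions: guessPageKind, PageKind, and LISTING_SEGMENTS are hypothetical names for illustration and are not part of @tyroneross/blog-scraper, and the rules (listing-style path segments, date paths, hyphenated slugs) are guesses about common blog layouts, not the library's detection logic.

```typescript
// Hypothetical helper for illustration; not part of @tyroneross/blog-scraper.
type PageKind = 'article' | 'listing' | 'unknown';

// Assumed set of final path segments that usually denote an index page.
const LISTING_SEGMENTS = new Set([
  'blog', 'news', 'articles', 'posts', 'category', 'tag', 'archive',
]);

function guessPageKind(rawUrl: string): PageKind {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return 'unknown'; // not a valid absolute URL
  }

  // Split the path into non-empty segments, e.g. "/blog/my-post" -> ["blog", "my-post"].
  const segments = parsed.pathname.split('/').filter(Boolean);
  if (segments.length === 0) return 'listing'; // a site root usually lists posts

  const last = (segments[segments.length - 1] ?? '').toLowerCase();
  if (LISTING_SEGMENTS.has(last)) return 'listing';

  // Date-style paths (/2024/05/...) and multi-word hyphenated slugs suggest an article.
  const hasDatePath = /\/\d{4}\/\d{1,2}\//.test(parsed.pathname);
  const looksLikeSlug = last.split('-').length >= 3;
  if (hasDatePath || looksLikeSlug) return 'article';

  return 'unknown';
}
```

When the guess is 'unknown', a reasonable fallback is to call smartScrape(url), which the script above shows reporting an 'article' or 'listing' mode on its own.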
Report results to the user including: