Batch scrape multiple URLs with progress tracking
Scrapes multiple URLs concurrently, reporting progress as each one completes.
/plugin marketplace add tyroneross/blog-content-scraper
/plugin install tyroneross-blog-content-scraper@tyroneross/blog-content-scraper

Usage: <url1> <url2> [url3...]

Process multiple URLs: $ARGUMENTS
Parse the URLs from the arguments (comma or space separated) and use batch processing.
```typescript
import { scrapeUrls } from '@tyroneross/blog-scraper/batch';

async function main() {
  // Parse URLs from arguments (comma or space separated)
  const input = '$ARGUMENTS';
  const urls = input.split(/[,\s]+/).filter(u => u.startsWith('http'));

  if (urls.length === 0) {
    console.log('No valid URLs found. Provide URLs separated by spaces or commas.');
    return;
  }

  console.log(`\nProcessing ${urls.length} URLs...\n`);

  const result = await scrapeUrls(urls, {
    concurrency: 3,
    mode: 'smart',
    onProgress: (p) => {
      console.log(`Progress: ${p.completed}/${p.total} (${p.percentage}%)`);
    }
  });

  console.log('\nResults:');
  console.log(`Successful: ${result.stats.successful}`);
  console.log(`Failed: ${result.stats.failed}`);
  console.log(`Duration: ${result.stats.totalDurationMs}ms`);

  console.log('\nExtracted Articles:');
  for (const article of result.articles.slice(0, 10)) {
    console.log(`\n- ${article.title}`);
    console.log(`  Source: ${article.sourceUrl}`);
  }

  if (result.failed.length > 0) {
    console.log('\nFailed URLs:');
    for (const f of result.failed) {
      console.log(`- ${f.url}: ${f.error}`);
    }
  }
}

main().catch(console.error);
```
Run with: `npx tsx <script-file>`