Extract LLM-ready content with token counts and chunks from a URL
Extracts LLM-ready content from URLs with token counts and chunked segments.
/plugin marketplace add tyroneross/blog-content-scraper
/plugin install tyroneross-blog-content-scraper@tyroneross/blog-content-scraper

Extract LLM-optimized content from: $ARGUMENTS
Use scrapeForLLM to get content formatted for AI/LLM use with token estimation.
import { scrapeForLLM } from '@tyroneross/blog-scraper/llm';

async function main() {
  const url = '$ARGUMENTS';
  console.log('Extracting LLM-ready content from:', url);

  try {
    const output = await scrapeForLLM(url);

    // Summary statistics for the extracted content
    console.log('\n📊 Content Stats:');
    console.log('Title:', output.title);
    console.log('Tokens:', output.tokens);
    console.log('Chunks:', output.chunks.length);
    console.log('Reading Level:', output.metadata.readingLevel);

    // Frontmatter block, ready to paste into a prompt
    console.log('\n📋 Frontmatter (for prompts):');
    console.log(output.frontmatter);

    console.log('\n📄 Content Preview (first 1500 chars):');
    console.log(output.markdown.substring(0, 1500) + '...');

    // Per-chunk token counts, useful when feeding a RAG pipeline
    if (output.chunks.length > 1) {
      console.log('\n🧩 Chunks for RAG:');
      output.chunks.forEach((c, i) => {
        console.log(`  Chunk ${i + 1}: ${c.tokens} tokens`);
      });
    }
  } catch (error) {
    console.error('❌ Extraction failed:', error instanceof Error ? error.message : error);
  }
}

main();
Run with: npx tsx <script-file>
Report to user: the title, token count, reading level, chunk breakdown, frontmatter, and a content preview.
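Downstream, the per-chunk token counts make it easy to pack chunks into batches that fit a model's context window. A minimal sketch, assuming each chunk exposes the `text` and `tokens` fields printed by the script above (the `packChunks` helper and its token budget are illustrative, not part of the library):

```typescript
interface Chunk {
  text: string;
  tokens: number;
}

// Greedily pack chunks into ordered batches whose combined token
// count stays within a budget (e.g. a model's context window).
function packChunks(chunks: Chunk[], budget: number): Chunk[][] {
  const batches: Chunk[][] = [];
  let current: Chunk[] = [];
  let used = 0;

  for (const chunk of chunks) {
    // Start a new batch when the next chunk would overflow the budget.
    if (used + chunk.tokens > budget && current.length > 0) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(chunk);
    used += chunk.tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Greedy packing preserves document order, which matters for RAG prompts; an oversized single chunk still gets its own batch rather than being dropped.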