From milan-jovanovic
Scrapes new articles from Milan Jovanovic's .NET blog (November 2025 onward) using optimized pre-filtering from the listing page, Firecrawl scraping, and Python scripts to target only new or changed content.
npx claudepluginhub melodic-software/claude-code-plugins --plugin milan-jovanovic
Scrape new articles from Milan Jovanovic's .NET blog with **optimized pre-filtering**. Parses dates from listing page to avoid unnecessary per-article scraping.
- --force: Re-scrape all articles (compare content hash to skip unchanged)
- --since YYYY-MM-DD: Custom date filter (default: 2025-11-01)
- --limit N: Limit number of articles (for testing)
- --dry-run: Preview what would be scraped without saving

Invoke the milan-jovanovic:milan-jovanovic-blog skill to load context and access scripts.
Key efficiency optimization: Parse dates from listing page BEFORE scraping individual articles.
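The idea behind the pre-filter can be sketched roughly as follows. This is not the actual check_new_articles.py; the function name `prefilter`, the `(slug, date_text)` input shape, the `skipped` bucket, the ISO date format, and treating `--since` as inclusive are all assumptions made for illustration. Only `to_scrape`, `in_index`, and `no_date` come from the description here.

```python
from datetime import date, datetime


def prefilter(entries, indexed_slugs, since, force=False, date_format="%Y-%m-%d"):
    """Classify listing entries before any per-article request is made.

    entries: iterable of (slug, date_text) pairs already pulled from the
    listing page (hypothetical shape; the real parser may differ).
    """
    result = {"to_scrape": [], "skipped": [], "no_date": []}
    for slug, date_text in entries:
        try:
            published = datetime.strptime(date_text, date_format).date()
        except (TypeError, ValueError):
            # Date missing or unparseable: report it instead of guessing.
            result["no_date"].append(slug)
            continue
        if published < since:
            # Older than the cutoff (cutoff assumed inclusive).
            result["skipped"].append(slug)
            continue
        in_index = slug in indexed_slugs
        if in_index and not force:
            # Already indexed and no re-check requested.
            result["skipped"].append(slug)
            continue
        result["to_scrape"].append({"slug": slug, "in_index": in_index})
    return result
```

With this shape, only the entries that land in `to_scrape` would cost a per-article request; everything else is decided from the single listing scrape.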
Scrape the blog listing page using firecrawl_scrape:
URL: https://www.milanjovanovic.tech/blog
Format: markdown
Save listing content to temp file (e.g., .claude/temp/milan-listing.md)
Run pre-filter script to identify articles needing scraping:
# Normal mode - only new articles
python scripts/core/check_new_articles.py .claude/temp/milan-listing.md --json --since 2025-11-01
# Force mode - include existing for re-check
python scripts/core/check_new_articles.py .claude/temp/milan-listing.md --json --force --since 2025-11-01
Parse JSON output to get to_scrape list. If empty, skip to Step 5 (no scraping needed).
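A minimal sketch of consuming the pre-filter output, assuming the JSON is an object with a `to_scrape` array whose entries carry an `in_index` flag (both named above). The helper name `plan_scrapes` and any other fields, such as `slug`, are hypothetical.

```python
import json


def plan_scrapes(raw_json):
    """Split the pre-filter report into new articles and force-mode re-checks."""
    report = json.loads(raw_json)
    to_scrape = report.get("to_scrape", [])
    # New articles are not yet in the index and are always written after scraping.
    new_articles = [a for a in to_scrape if not a.get("in_index")]
    # Re-checks only appear in force mode and are compared by content hash.
    rechecks = [a for a in to_scrape if a.get("in_index")]
    return new_articles, rechecks
```

If both lists come back empty, no per-article scraping is needed and the remaining scrape steps can be skipped.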
For each article in to_scrape:
For articles with in_index: false (new):
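- Scrape with firecrawl_scrape
- Save to canonical/milanjovanovic-tech/blog/{slug}.md

For articles with in_index: true (force mode re-check):
- Scrape with firecrawl_scrape
- Compare the result against content_hash from pre-filter output, and skip the write if unchanged

After scraping completes: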
python scripts/management/refresh_index.py
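The force-mode comparison against content_hash could look like the sketch below. The hash algorithm used by the real scripts is not stated here, so SHA-256 over the UTF-8 markdown is an assumption, as are the helper names `content_hash` and `needs_rewrite`.

```python
import hashlib


def content_hash(markdown):
    """Hash the scraped markdown (SHA-256 is assumed, not confirmed)."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def needs_rewrite(scraped_markdown, stored_hash):
    """Return True when the file should be written.

    stored_hash is the content_hash value from the pre-filter output,
    or None when the article has no stored hash yet.
    """
    if stored_hash is None:
        return True
    return content_hash(scraped_markdown) != stored_hash
```

The scrape request is still made in force mode; only the write is skipped when the hash matches.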
Report:
The scraper removes these promotional patterns:
Footer patterns (stop processing):
Sponsor patterns (remove section):
Inline patterns (remove):
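How the three pattern classes behave can be sketched as below. The concrete patterns are not listed here, so the function takes them as arguments; `clean_article` is a hypothetical name, and treating a sponsor section as ending at the next markdown heading is an assumption.

```python
import re


def clean_article(markdown, footer_patterns, sponsor_patterns, inline_patterns):
    """Strip promotional content from scraped markdown.

    footer_patterns:  first match stops processing; the rest is dropped.
    sponsor_patterns: a match drops lines until the next heading (assumed boundary).
    inline_patterns:  matched text is removed from within a line.
    """
    kept = []
    in_sponsor = False
    for line in markdown.splitlines():
        if any(p.search(line) for p in footer_patterns):
            break
        if any(p.search(line) for p in sponsor_patterns):
            in_sponsor = True
            continue
        if in_sponsor:
            if line.startswith("#"):
                in_sponsor = False  # next heading ends the sponsor section
            else:
                continue
        for p in inline_patterns:
            line = p.sub("", line)
        kept.append(line)
    return "\n".join(kept)
```

Usage would pass compiled regular expressions, for example `clean_article(text, [re.compile(r"^## Footer Example")], [re.compile(r"^## Sponsor Example")], [re.compile(r"\[ad\]")])`, where all three patterns are placeholders.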
| Scenario | Without Optimization | With Optimization |
|---|---|---|
| No new articles | 10+ firecrawl requests | 1-2 requests |
| 1 new article | 10+ firecrawl requests | 2-3 requests |
| Force (unchanged) | 10+ requests | 10+ requests but skips writes |
Why this matters: Firecrawl has API costs and rate limits. Pre-filtering saves 80-90% of requests when articles haven't changed.
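The 80-90% figure follows from the table if the baseline is taken as 10 requests and the optimized run as 1 or 2. A small sketch of that arithmetic, with `percent_saved` as a hypothetical helper:

```python
def percent_saved(baseline_requests, optimized_requests):
    """Share of requests avoided, as a whole-number percentage."""
    return round(100 * (1 - optimized_requests / baseline_requests))


# Baseline of 10 requests is the lower bound of "10+" in the table.
best_case = percent_saved(10, 1)   # listing scrape only
worst_case = percent_saved(10, 2)  # listing scrape plus one more request
```

Since the baseline is "10+", larger listings would push the savings slightly higher than these values.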
/milan-jovanovic:scrape-posts
/milan-jovanovic:scrape-posts --limit 3 --dry-run
/milan-jovanovic:scrape-posts --force
/milan-jovanovic:scrape-posts --since 2025-12-01
If firecrawl MCP is not connected, the command will fail. Ensure the firecrawl MCP server is configured and running.
If dates on the listing page can't be parsed, the script reports those articles in the no_date category. These articles are skipped unless you provide a specific URL.
If check_new_articles.py shows 0 articles to scrape:
- Articles may already be in the index (use --force to re-check)
- Articles may be older than the date filter (adjust --since)