Add Instagram accounts or web aggregators to sources.yaml (web sources require profiling)
Adds Instagram accounts and web aggregator URLs to sources.yaml with automatic profiling.
```
/plugin marketplace add aniketpanjwani/local_media_tools
/plugin install newsletter-events@local-media-tools
```

Add new event sources to your configuration. Web aggregators require profiling to discover the optimal scraping strategy.
| Input | Type | Workflow |
|---|---|---|
| @handle | Instagram | Simple add (no profiling) |
| http:// or https:// URL | Web Aggregator | Requires profiling with Firecrawl |
| facebook.com/events/* | Facebook Event | Not stored - use /research directly |
```
# Instagram accounts (simple)
/newsletter-events:add-source @localvenue @musicbar

# Web aggregator (will be profiled)
/newsletter-events:add-source https://hudsonvalleyevents.com

# Mix of sources
/newsletter-events:add-source @venue1 https://events.com
```
```python
import re

# Classify each token: Facebook event URL, web URL, or Instagram handle.
sources = {"instagram": [], "web": [], "facebook": []}
for token in user_input.replace(",", " ").replace(" and ", " ").split():
    token = token.strip()
    if not token:
        continue
    if "facebook.com/events/" in token:
        sources["facebook"].append(token)
    elif token.startswith("http://") or token.startswith("https://"):
        sources["web"].append(token)
    elif token.startswith("@") or re.match(r"^[a-zA-Z][a-zA-Z0-9_.]+$", token):
        # Bare words without a scheme are treated as Instagram handles.
        sources["instagram"].append(token.lstrip("@").lower())
```
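The same classification logic, wrapped in a function so it can be exercised directly (the input string below is a hypothetical example):

```python
import re

def classify_sources(user_input):
    """Split a raw argument string into instagram/web/facebook buckets."""
    sources = {"instagram": [], "web": [], "facebook": []}
    for token in user_input.replace(",", " ").replace(" and ", " ").split():
        token = token.strip()
        if not token:
            continue
        if "facebook.com/events/" in token:
            sources["facebook"].append(token)
        elif token.startswith("http://") or token.startswith("https://"):
            sources["web"].append(token)
        elif token.startswith("@") or re.match(r"^[a-zA-Z][a-zA-Z0-9_.]+$", token):
            sources["instagram"].append(token.lstrip("@").lower())
    return sources
```

For example, `classify_sources("@Venue1, https://events.com and facebook.com/events/123")` puts `venue1` under instagram, the URL under web, and the Facebook link under facebook.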
For Facebook URLs, inform the user:

```
Facebook events are not stored in configuration.
Use: /research https://facebook.com/events/123456
```
```python
from pathlib import Path

import yaml

config_path = Path.home() / ".config" / "local-media-tools" / "sources.yaml"
if not config_path.exists():
    print("ERROR: sources.yaml not found. Run /newsletter-events:setup first.")
    # STOP
with open(config_path) as f:
    config = yaml.safe_load(f)
```
For each Instagram handle:

1. Check whether it already exists:

```python
existing = {a["handle"].lower() for a in config["sources"]["instagram"]["accounts"]}
```

2. If new, add it with defaults:

```python
new_account = {
    "handle": handle,
    "name": handle.replace("_", " ").title(),
    "type": "venue",
}
config["sources"]["instagram"]["accounts"].append(new_account)
```
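Both steps can be combined into one helper; a sketch assuming the `config` layout shown above (`add_instagram_handles` is a hypothetical name, not a plugin function):

```python
def add_instagram_handles(config, handles):
    """Append new Instagram accounts to the config, skipping duplicates."""
    accounts = config["sources"]["instagram"]["accounts"]
    existing = {a["handle"].lower() for a in accounts}
    added = []
    for raw in handles:
        handle = raw.lstrip("@").lower()
        if handle in existing:
            continue
        accounts.append({
            "handle": handle,
            "name": handle.replace("_", " ").title(),
            "type": "venue",  # default; the user can edit sources.yaml later
        })
        existing.add(handle)
        added.append(handle)
    return added
```

Returning the list of newly added handles makes it easy to report "Added" vs. skipped entries in the summary table at the end.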
For each web URL, first skip duplicates:

```python
existing_urls = {s["url"].lower() for s in config["sources"]["web_aggregators"]["sources"]}
if url.lower() in existing_urls:
    continue  # skip - already exists
```

Then derive a display name from the domain and build the entry with defaults:

```python
from urllib.parse import urlparse

domain = urlparse(url).netloc.replace("www.", "")
name = domain.split(".")[0].replace("-", " ").title()
new_source = {
    "url": url,
    "name": name,
    "source_type": "listing",
    "max_pages": 50,
}
```
<critical>
The plugin includes a CLI tool for profiling. First get the plugin path, then run the profiler. DO NOT try to import Python modules directly - use the CLI tool.
</critical>
Step 1: Get the plugin directory:

```bash
cat ~/.claude/plugins/installed_plugins.json | jq -r '.plugins["newsletter-events@local-media-tools"][0].installPath'
```

Save the output path as PLUGIN_DIR.

Step 2: Run the profiler:

```bash
cd "$PLUGIN_DIR" && uv run python scripts/profile_source.py "{url}"
```
This returns JSON with `discovery_method`, `event_urls`, and a suggested regex pattern.
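For illustration, a payload shaped like that description can be unpacked as follows (the field names, including `learned_regex`, are assumptions inferred from this document, not a documented schema, and the values are made up):

```python
import json

# Example payload mirroring the profiler's described output.
raw = """{
  "discovery_method": "map",
  "event_urls": ["https://example.com/events/jazz-night"],
  "learned_regex": "/events/[^/]+$"
}"""

profile = json.loads(raw)
discovery_method = profile["discovery_method"]
event_urls = profile["event_urls"]
learned_regex = profile["learned_regex"]
```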
The profiler will:

- Try map_url() first (fast sitemap/link discovery)
- Fall back to crawl_url() if map finds < 5 event URLs
- Check common event paths: /events?/, /calendar/, /shows?/, etc.
- Analyze discovered URLs to generate a regex pattern:
```python
# Examples:
# /events/jazz-night         → r"/events/[^/]+$"
# /event/a-frosty-fest/76214 → r"/event/[^/]+/\d+/?$"
```
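The pattern generation behind those examples can be sketched as a small heuristic (an illustration of the idea, not the profiler's actual algorithm): take the shared leading path segment, match one slug segment, and append a numeric-id group when every sample ends in digits.

```python
import re
from urllib.parse import urlparse

def infer_event_regex(sample_urls):
    """Heuristically derive an event-URL regex from sample event URLs."""
    paths = [urlparse(u).path for u in sample_urls]
    prefixes = {p.strip("/").split("/")[0] for p in paths}
    if len(prefixes) != 1:
        return None  # no shared leading segment; give up
    prefix = prefixes.pop()
    # e.g. /event/a-frosty-fest/76214 -> slug plus trailing numeric id
    if all(re.search(r"/\d+/?$", p) for p in paths):
        return rf"/{prefix}/[^/]+/\d+/?$"
    # e.g. /events/jazz-night -> plain slug
    return rf"/{prefix}/[^/]+$"
```

Run on the sample URLs above, this reproduces the two example patterns.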
```
📊 Source Profile: {name}

URL: {url}
Discovery: {discovery_method}
Found: {len(event_urls)} event URLs
Pattern: {learned_regex}

Samples:
• {event_urls[0]}
• {event_urls[1]}
• {event_urls[2]}

Save this profile? (Y/n)
```
Wait for user confirmation.
```python
from datetime import datetime

new_source["profile"] = {
    "discovery_method": discovery_method,
    "crawl_depth": 2,
    "event_url_regex": learned_regex,
    "sample_event_urls": event_urls[:5],
    "notes": f"Discovered {len(event_urls)} event URLs.",
    "profiled_at": datetime.now().isoformat(),
}

config["sources"]["web_aggregators"]["enabled"] = True
config["sources"]["web_aggregators"]["sources"].append(new_source)
```
If both map and crawl find 0 event URLs:

```
⚠️ Could not discover event URLs.

Options:
1. Add anyway (manually configure later)
2. Skip this source
3. Provide custom event_url_pattern

Choose (1-3):
```
```python
import shutil
from datetime import datetime

backup_path = config_path.with_suffix(f".yaml.{datetime.now():%Y%m%d%H%M%S}.backup")
shutil.copy2(config_path, backup_path)

with open(config_path, "w") as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False, allow_unicode=True)
```
```python
from config.config_schema import AppConfig

try:
    AppConfig.from_yaml(config_path)
except Exception as e:
    shutil.copy2(backup_path, config_path)
    print(f"ERROR: Invalid config. Restored backup. Error: {e}")
    # STOP
```
| Type | Source | Name | Status |
|---|---|---|---|
| Instagram | @localvenue | Local Venue | Added |
| Web | greatnortherncatskills.com | Great Northern Catskills | Added (profiled: map) |
Config saved to ~/.config/local-media-tools/sources.yaml
Backup at sources.yaml.YYYYMMDDHHMMSS.backup
Run /newsletter-events:research to scrape these sources.