Fetches the latest papers and code repos from arXiv, Crossref, Semantic Scholar, GitHub, HuggingFace, Papers with Code, and RSS, and deduplicates them. The assistant expands the search queries, filters for CS relevance, writes Chinese summaries, detects hot sub-fields, and outputs a Markdown+HTML digest (optionally emailed).
Slash command: /auto-paper-collecter:auto-paper-collecter
A self-hosted research-literature radar that runs inside a coding agent. The Python scripts do only the deterministic work (API fetch, dedup, render, email). YOU — the assistant running this skill — do all the judgement work: query expansion, computer-science relevance filtering, Chinese summaries, and hot-topic synthesis. That means no AI API key is needed — whichever model is running this skill (Claude in Claude Code, GPT in Codex, …) is the LLM.
skill/
├── SKILL.md
├── scripts/ common.py · fetch.py · render.py · notify.py (stdlib only)
├── state/ config.json · (queries/candidates/curated/trends/seen .json)
└── digests/ YYYY-MM-DD.md + .html
Run scripts from scripts/: cd skill/scripts && python3 <script>.py
state/config.json
- keywords: up to ~3 topic strings to track.
- domain: the field to constrain relevance to (default: computer science).
- sources: toggle arXiv / Crossref / Semantic Scholar / GitHub / HuggingFace / PapersWithCode / RSS.
- lookback_days: how far back to fetch (dedup stops repeats anyway).
- max_per_source, rss_feeds.
When the user asks to change keywords / sources / field, edit this file and confirm the change back to them.
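A minimal sketch of reading state/config.json with fallbacks, assuming "sources" is a map from source name to a boolean toggle. The default values, the source key spellings, and the helper names (merge_config, load_config, enabled_sources) are hypothetical; only the top-level key names come from the list above.

```python
import json
from pathlib import Path

# Hypothetical defaults. The key names come from the list above; the values and the
# name -> bool shape of "sources" are assumptions, not what the skill ships with.
DEFAULT_CONFIG = {
    "keywords": [],
    "domain": "computer science",
    "sources": {"arxiv": True, "crossref": True, "semantic_scholar": True,
                "github": True, "huggingface": True, "paperswithcode": True, "rss": False},
    "lookback_days": 7,
    "max_per_source": 20,
    "rss_feeds": [],
}


def merge_config(user: dict) -> dict:
    """Overlay user settings on the defaults; a partial "sources" block only flips the toggles it names."""
    cfg = {**DEFAULT_CONFIG, **user}
    cfg["sources"] = {**DEFAULT_CONFIG["sources"], **user.get("sources", {})}
    return cfg


def load_config(path: Path) -> dict:
    """Read state/config.json if it exists, otherwise fall back to the defaults."""
    user = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return merge_config(user)


def enabled_sources(cfg: dict) -> list[str]:
    # Names of the sources whose toggle is on, in config order.
    return [name for name, on in cfg["sources"].items() if on]
```

Merging "sources" separately means a user edit such as {"sources": {"rss": true}} enables RSS without silently switching every other source off.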
Optional env vars (never stored in the repo):
- SEMANTIC_SCHOLAR_KEY (lifts Semantic Scholar rate limits)
- GITHUB_TOKEN (lifts GitHub rate limits)
- SMTP_* / EMAIL_TO (email)
- push channels: TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID, SLACK_WEBHOOK_URL, WECHAT_WEBHOOK (WeCom group bot, 企业微信群机器人), or SERVERCHAN_KEY (Server酱 / ServerChan).
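A sketch of how one might check which delivery channels are usable from the variables listed above. It assumes a channel counts as configured when all of its variables are non-empty and that email needs EMAIL_TO plus at least one SMTP_* setting; how notify.py actually decides is not shown here, and configured_channels is a hypothetical helper. SEMANTIC_SCHOLAR_KEY and GITHUB_TOKEN are left out because they only raise rate limits.

```python
import os
from collections.abc import Mapping

# Channel -> env vars it needs. Names are the ones listed above; the "all must be set"
# rule is an assumption about how a notifier would treat them.
CHANNELS = {
    "telegram": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"),
    "slack": ("SLACK_WEBHOOK_URL",),
    "wecom": ("WECHAT_WEBHOOK",),
    "serverchan": ("SERVERCHAN_KEY",),
}


def configured_channels(env: Mapping[str, str] = os.environ) -> list[str]:
    found = []
    # Email needs a recipient plus at least one non-empty SMTP_* setting.
    if env.get("EMAIL_TO") and any(k.startswith("SMTP_") and v for k, v in env.items()):
        found.append("email")
    for name, keys in CHANNELS.items():
        if all(env.get(k) for k in keys):
            found.append(name)
    return found
```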
Read state/config.json. For each keyword, think of 2–3 related
English search queries (synonyms, full forms, adjacent sub-topics) so recall
isn't limited to the literal term (e.g. C2Rust → ["C2Rust", "C-to-Rust translation", "migrating legacy C code to Rust"]). Write them to
state/queries.json as {"<keyword>": ["q1", "q2", ...], ...}.
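The expansion itself is your judgement, but the file format is mechanical. A minimal sketch of writing it, assuming you want the literal keyword kept as the first query (as in the C2Rust example) and repeats dropped case-insensitively; normalize_queries and write_queries are hypothetical helper names, not part of the skill's scripts.

```python
import json
from pathlib import Path


def normalize_queries(expansions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Put each literal keyword first, then its expansions, dropping blanks and case-insensitive repeats."""
    result = {}
    for keyword, queries in expansions.items():
        seen, ordered = set(), []
        for q in [keyword, *queries]:
            q = q.strip()
            if q and q.lower() not in seen:
                seen.add(q.lower())
                ordered.append(q)
        result[keyword] = ordered
    return result


def write_queries(expansions: dict[str, list[str]], path: Path) -> None:
    # ensure_ascii=False keeps any non-ASCII query readable in the file.
    path.write_text(json.dumps(normalize_queries(expansions), ensure_ascii=False, indent=2),
                    encoding="utf-8")
```

Run from skill/scripts, the target would be Path("../state/queries.json").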
cd skill/scripts && python3 fetch.py
Fetches every enabled source for those queries, drops anything already in
state/seen.json or older than lookback_days, and writes
state/candidates.json. If it reports 0 candidates, tell the user "暂无新文献"
("no new papers yet") and stop (nothing else to do).
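To make the filtering rule concrete, here is a hypothetical re-implementation of what fetch.py is described as doing: drop items already in seen.json, items older than lookback_days, and garbage future dates. The dedup key (lower-cased URL, falling back to title), the ISO "published" field, and dropping undated items are all assumptions; fetch.py's real logic may differ.

```python
from datetime import date, timedelta


def filter_candidates(items, seen_keys, lookback_days, today=None):
    """Keep items that are new, not older than lookback_days, and not future-dated."""
    today = today or date.today()
    cutoff = today - timedelta(days=lookback_days)
    kept, batch_keys = [], set()
    for item in items:
        # Hypothetical dedup key: lower-cased URL, falling back to the title.
        key = (item.get("url") or item.get("title", "")).strip().lower()
        if not key or key in seen_keys or key in batch_keys:
            continue  # shown in an earlier digest, or a duplicate within this batch
        try:
            published = date.fromisoformat(item["published"][:10])
        except (KeyError, TypeError, ValueError):
            continue  # assumption: items without a usable date are dropped
        if published > today or published < cutoff:
            continue  # garbage future date, or outside the lookback window
        batch_keys.add(key)
        kept.append(item)
    return kept
```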
Read state/candidates.json. For each item decide: is it (a) computer-science
and (b) genuinely on-topic for its topic keyword? Drop the rest (medical
"translation", finance "AI", random GitHub star-lists, etc.). For every kept
item write a concise Chinese summary and assemble state/curated.json — a
list of objects:
{"source","topic","title","url","venue","authors","published",
"tldr":"one-sentence core idea (<=60 characters)","method":"brief method description (<=80 characters)",
"contributions":["key contribution 1","key contribution 2"]}
Keep papers first, GitHub repos last (they are a supplementary signal). If a
source gave a tldr already, you may build on it.
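A quick self-check you could run over curated.json before rendering, sketched under two assumptions: GitHub items carry "source": "github", and the length limits count characters (Python's len() counts Chinese characters one each, which matches the 60/80-character limits). check_entry and order_curated are hypothetical helpers.

```python
CURATED_FIELDS = ("source", "topic", "title", "url", "venue", "authors", "published",
                  "tldr", "method", "contributions")


def check_entry(entry: dict) -> list[str]:
    """Return problems with one curated.json entry (an empty list means it looks fine)."""
    problems = [f"missing {f}" for f in CURATED_FIELDS if f not in entry]
    if len(entry.get("tldr", "")) > 60:
        problems.append("tldr longer than 60 characters")
    if len(entry.get("method", "")) > 80:
        problems.append("method longer than 80 characters")
    return problems


def order_curated(entries: list[dict]) -> list[dict]:
    # Papers first, GitHub repos last; sorted() is stable, so order within each group is kept.
    return sorted(entries, key=lambda e: e.get("source") == "github")
```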
GitHub items are repos, not papers; don't over-summarize them. Use the repo description (its
abstract) as the tldr and leave method/contributions empty. fetch.py already keeps only repos with ≥10 stars, ranked by stars, so they tend to be substantive (course / framework / awesome-list), not personal noise.
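A sketch of turning raw repo records into curated entries under the rule above. The raw record keys (name, url, owner, stars, updated, description) and the "GitHub" venue string are assumptions about what fetch.py emits; the ≥10-star filter and star ranking mirror the behaviour the text attributes to fetch.py, repeated here only so the helper is self-contained.

```python
def github_entries(repos: list[dict], topic: str, min_stars: int = 10) -> list[dict]:
    """Turn raw repo records into curated entries: description as tldr, nothing else summarized."""
    # Keep repos with >= min_stars, most-starred first.
    good = sorted((r for r in repos if r.get("stars", 0) >= min_stars),
                  key=lambda r: r["stars"], reverse=True)
    return [{
        "source": "github", "topic": topic,
        "title": r.get("name", ""), "url": r.get("url", ""),
        "venue": "GitHub", "authors": r.get("owner", ""),
        "published": r.get("updated", ""),
        "tldr": r.get("description") or "",
        "method": "", "contributions": [],  # repos are not papers: leave these empty
    } for r in good]
```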
Cluster the kept items into a handful of coarse CS sub-fields (自然语言处理 NLP /
计算机视觉 computer vision / 系统与编译 systems and compilers …; merge aggressively). Write state/trends.json:
{"top": [{"name","delta": <count>, "summary": "<=80-character summary of the direction", "papers": ["title", ...]}, ... up to 3]}.
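The clustering is your call, but once each kept title has a sub-field, assembling trends.json is bookkeeping. A sketch, assuming "delta" means the number of kept items in the cluster (the text only says <count>); build_trends and its arguments are hypothetical.

```python
from collections import defaultdict


def build_trends(assignments: dict[str, str], summaries: dict[str, str], top_n: int = 3) -> dict:
    """assignments maps paper title -> sub-field name chosen by the assistant."""
    clusters = defaultdict(list)
    for title, field in assignments.items():
        clusters[field].append(title)
    # Largest clusters first; sorted() is stable, so ties keep first-seen order.
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_n]
    return {"top": [
        {"name": name, "delta": len(titles), "summary": summaries.get(name, ""), "papers": titles}
        for name, titles in ranked
    ]}
```

When saving it, json.dumps(..., ensure_ascii=False) keeps Chinese sub-field names and summaries readable in the file.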
cd skill/scripts && python3 render.py
Writes digests/YYYY-MM-DD.md + .html from curated.json (+ trends.json)
and records everything shown into seen.json so it won't repeat.
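A sketch of the "record everything shown" step, assuming seen.json is a flat JSON list of lower-cased URLs. render.py's real on-disk format is not shown, so both the format and the record_seen helper are hypothetical.

```python
import json
from pathlib import Path


def record_seen(seen_path: Path, shown: list[dict]) -> int:
    """Add the URL of every item shown in the digest to seen.json; return how many were new."""
    # Assumed format: a flat JSON list of lower-cased URLs.
    seen = set(json.loads(seen_path.read_text(encoding="utf-8"))) if seen_path.exists() else set()
    before = len(seen)
    seen.update(item["url"].strip().lower() for item in shown if item.get("url"))
    seen_path.write_text(json.dumps(sorted(seen), indent=2), encoding="utf-8")
    return len(seen) - before
```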
cd skill/scripts && python3 notify.py # emails the HTML digest if SMTP_* env is set
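For reference, emailing an HTML digest with only the standard library looks roughly like this. The concrete variable names SMTP_HOST / SMTP_PORT / SMTP_USER / SMTP_PASS are hypothetical stand-ins for the SMTP_* family, and STARTTLS on port 587 is an assumed default; this is not notify.py's actual code.

```python
import os
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_digest_email(html_path: Path, sender: str, recipient: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = f"Paper digest {html_path.stem}"  # stem is the YYYY-MM-DD date
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback first, then the HTML digest as the preferred alternative.
    msg.set_content("This digest is HTML; open it in an HTML-capable mail client.")
    msg.add_alternative(html_path.read_text(encoding="utf-8"), subtype="html")
    return msg


def send(msg: EmailMessage) -> None:
    # Hypothetical SMTP_* names; adjust to whatever the real notify.py reads.
    with smtplib.SMTP(os.environ["SMTP_HOST"], int(os.environ.get("SMTP_PORT", "587"))) as s:
        s.starttls()
        s.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        s.send_message(msg)
```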
Tell the user how many papers were kept, the top hot directions, and the digest path. Offer to open the HTML or adjust keywords.
No pip install required. fetch.py already filters out garbage future dates and de-duplicates across runs.
npx claudepluginhub ovohao/auto-paper-collecter
Searches arXiv and bioRxiv for recent papers on a given topic, scores and selects top papers, then produces structured bilingual (Chinese/English) summaries.
Discovers, filters, and deep-reads academic papers via Scholar Inbox API and NotebookLM. Use for browsing today's papers, getting recommendations, rating/collecting, and asking questions about papers.
Searches and monitors arXiv papers by topic, author, or category. Downloads PDFs and summarizes abstracts for research workflows.