From thinking-frameworks-skills
Fetches and keyword-filters preprints from bioRxiv/medRxiv within a date window. Handles pagination, deduplication, and normalization.
npx claudepluginhub lyndonkl/claude --plugin thinking-frameworks-skills
Slash command: /thinking-frameworks-skills:fetch-preprint-recent
Fetch preprints from bioRxiv or medRxiv for a date window, normalize the records, and keyword-filter them.
- [ ] Step 1: Validate inputs (server, from, to, keywords)
- [ ] Step 2: Page through the details endpoint until the cursor is exhausted (cursor + count ≥ total)
- [ ] Step 3: Normalize each record to the canonical shape
- [ ] Step 4: Dedupe within the window (keep latest version per DOI)
- [ ] Step 5: Keyword-filter on lowercased title + abstract
- [ ] Step 6: Return matched records + a summary (total fetched, total matched, pages fetched)
Step 1 — Validate inputs
Required:
- server: one of biorxiv or medrxiv (the API uses these literal strings)
- from: YYYY-MM-DD, inclusive
- to: YYYY-MM-DD, inclusive; must be ≥ from
- keywords: list of strings; may include multi-word phrases; matched case-insensitively

If the window is longer than 31 days, flag it and confirm with the user before fetching. The API accepts wider windows, but you almost never want to dump a month of preprints in one call without a stronger filter.
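A minimal Python sketch of the Step 1 checks. The function name `validate_inputs` and the returned dict are hypothetical. It assumes the 31-day cap counts both endpoints (so 2026-05-01 to 2026-05-31 is exactly 31 days) and that an empty keyword list is a hard error; an over-long window is only flagged, not rejected, so the caller can ask for confirmation.

```python
from datetime import date

VALID_SERVERS = {"biorxiv", "medrxiv"}  # literal path segments the API expects
SOFT_WINDOW_CAP_DAYS = 31


def validate_inputs(server: str, from_date: str, to_date: str,
                    keywords: list[str]) -> dict:
    """Check Step 1 inputs; raise ValueError on hard errors.

    Returns parsed values plus a `needs_confirmation` flag that is True
    when the window exceeds the soft 31-day cap.
    """
    if server not in VALID_SERVERS:
        raise ValueError(f"server must be one of {sorted(VALID_SERVERS)}, got {server!r}")
    try:
        start = date.fromisoformat(from_date)
        end = date.fromisoformat(to_date)
    except ValueError as exc:
        raise ValueError(f"dates must be YYYY-MM-DD: {exc}") from exc
    if end < start:
        raise ValueError("to must be >= from")
    # Lowercase once here so matching later is case-insensitive.
    cleaned = [k.strip().lower() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    span_days = (end - start).days + 1  # both ends inclusive
    return {
        "server": server,
        "from": start,
        "to": end,
        "keywords": cleaned,
        "span_days": span_days,
        "needs_confirmation": span_days > SOFT_WINDOW_CAP_DAYS,
    }
```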
Step 2 — Page through the endpoint
The endpoint is:
https://api.biorxiv.org/details/{server}/{from}/{to}/{cursor}
Start at cursor=0. Each response has this shape:
{
"messages": [{"status": "ok", "interval": "2026-05-04/2026-05-10", "cursor": "0", "count": 100, "total": 327}],
"collection": [ { record }, { record }, ... ]
}
Append the records in collection, then increment cursor by 100 (the page size is fixed) and refetch until cursor + count >= total.

Use WebFetch with the URL. If WebFetch returns malformed JSON or a 5xx, retry once with a 2-second backoff; on a second failure, return partial results with a fetch_errors field listing the failed cursors.
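A minimal sketch of the paging loop, assuming a hypothetical `fetch_text(url)` callable that stands in for WebFetch: it returns the response body and raises on a 5xx. `page_url`, `fetch_all`, and the returned dict are illustrative names, not part of any library. Injecting the fetcher and the sleep function keeps the loop testable without touching the network.

```python
import json
import time
from typing import Callable

PAGE_SIZE = 100  # fixed page size, per the notes above
BASE_URL = "https://api.biorxiv.org/details"


def page_url(server: str, from_date: str, to_date: str, cursor: int) -> str:
    """Build the details URL for one page of a date window."""
    return f"{BASE_URL}/{server}/{from_date}/{to_date}/{cursor}"


def _parse_page(body: str) -> dict:
    """Decode one page; treat a missing messages block as malformed."""
    data = json.loads(body)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or not data.get("messages"):
        raise ValueError("response has no messages block")
    return data


def fetch_all(server: str, from_date: str, to_date: str,
              fetch_text: Callable[[str], str],
              sleep: Callable[[float], None] = time.sleep) -> dict:
    records: list[dict] = []
    fetch_errors: list[int] = []
    pages_fetched = 0
    cursor = 0
    total = None  # unknown until the first page arrives
    while True:
        url = page_url(server, from_date, to_date, cursor)
        page = None
        for attempt in range(2):  # first try plus one retry
            try:
                page = _parse_page(fetch_text(url))
                break
            except Exception:  # a 5xx raised by fetch_text, or malformed JSON
                if attempt == 0:
                    sleep(2)  # 2-second backoff before the single retry
        if page is None:
            fetch_errors.append(cursor)
            if total is None:
                break  # first page failed, so the window size is unknown
            cursor += PAGE_SIZE
            if cursor >= total:
                break
            continue  # skip the failed page and keep collecting partial results
        pages_fetched += 1
        msg = page["messages"][0]
        collection = page.get("collection", [])
        records.extend(collection)
        total = int(msg.get("total", 0))
        count = int(msg.get("count", len(collection)))
        if count == 0 or cursor + count >= total:
            break
        cursor += PAGE_SIZE
    return {"records": records, "total": total,
            "pages_fetched": pages_fetched, "fetch_errors": fetch_errors}
```

Because a failed page after the first is skipped rather than aborting the run, the caller still gets every page that did arrive, with the gaps listed in fetch_errors.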
Step 3 — Normalize each record
The API returns fields like doi, title, authors, author_corresponding, author_corresponding_institution, date, version, type, license, category, jatsxml, abstract, published, server. Reduce to:
{
"id": "10.1101/2026.05.07.123456", // doi
"title": "...",
"authors": ["Smith J", "Doe A", ...], // split the API's `authors` string on `;`
"abstract": "...",
"date": "2026-05-07",
"server": "biorxiv", // or "medrxiv"
"version": 1,
"category": "neuroscience", // bioRxiv subject area
"url": "https://www.biorxiv.org/content/10.1101/2026.05.07.123456v1",
"published_doi": null // populated if the preprint has been published; from `published` field
}
URL pattern: https://www.{server}.org/content/{doi}v{version}. Link directly to the preprint server rather than through a https://doi.org/ redirect; the direct link keeps the abstract page accessible.
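A minimal sketch of the Step 3 reduction. `normalize` is a hypothetical helper that takes the server name from the call (rather than the record's own server field, whose casing this sketch does not rely on). It assumes version may arrive as a string and that an unpublished preprint carries an empty or "NA" published value; adjust that check if the API uses a different placeholder.

```python
def normalize(raw: dict, server: str) -> dict:
    """Reduce one raw API record to the canonical shape."""
    doi = (raw.get("doi") or "").strip()
    version = int(raw.get("version") or 1)  # API value may be a string like "2"
    published = (raw.get("published") or "").strip()
    return {
        "id": doi,
        "title": (raw.get("title") or "").strip(),
        # Split the semicolon-separated author string and drop empty pieces.
        "authors": [a.strip() for a in (raw.get("authors") or "").split(";") if a.strip()],
        "abstract": (raw.get("abstract") or "").strip(),
        "date": raw.get("date", ""),
        "server": server,
        "version": version,
        "category": raw.get("category", ""),
        "url": f"https://www.{server}.org/content/{doi}v{version}",
        # Assumption: "NA" or empty means the preprint is not yet published.
        "published_doi": published if published and published.upper() != "NA" else None,
    }
```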
Step 4 — Dedupe within the window
The same DOI can appear with multiple version values if the authors revised mid-window. Keep the highest version per DOI. Drop the rest.
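The dedupe rule fits in a single pass over the normalized records. This sketch (the name `dedupe_latest` is hypothetical) assumes `version` has already been converted to an int, so comparisons are numeric rather than string-based; each DOI keeps the position where it first appeared.

```python
def dedupe_latest(records: list[dict]) -> list[dict]:
    """Keep only the highest version of each DOI (records keyed by "id")."""
    best: dict[str, dict] = {}
    for rec in records:
        kept = best.get(rec["id"])
        if kept is None or rec["version"] > kept["version"]:
            best[rec["id"]] = rec  # replacing a value keeps the key's original order
    return list(best.values())
```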
Step 5 — Keyword-filter
For each kept record, check whether any keyword (or phrase) appears in lowercase(title + " " + abstract). Match logic:
- "protein language model" (multi-word phrase) → must appear as a contiguous substring.
- "crispr" (single word) → must appear with word boundaries (boundaries keep a short keyword such as "ai" from matching inside "brain").

Track which keyword(s) matched per paper; downstream, paper-relevance-filter will use that signal.
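A minimal sketch of the match logic using Python's `re` module. `match_keywords` is a hypothetical helper. As a design choice not stated above, it also collapses runs of whitespace so a phrase still matches when an abstract contains a line break or double space. The `\b` anchors assume keywords start and end with word characters; a keyword like "c++" would need a different boundary rule.

```python
import re


def match_keywords(title: str, abstract: str, keywords: list[str]) -> list[str]:
    """Return the keywords that hit lowercase(title + " " + abstract)."""
    # Lowercase and collapse whitespace so phrases survive line breaks.
    text = " ".join(f"{title} {abstract}".lower().split())
    matched = []
    for kw in keywords:
        k = " ".join(kw.lower().split())
        if not k:
            continue
        if " " in k:
            hit = k in text  # multi-word phrase: contiguous substring
        else:
            hit = re.search(rf"\b{re.escape(k)}\b", text) is not None  # single word
        if hit:
            matched.append(kw)
    return matched
```

A record is kept when the returned list is non-empty, and that list becomes its matched_keywords field.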
Step 6 — Return
Return a payload like:
{
"server": "biorxiv",
"window": "2026-05-04/2026-05-10",
"fetched_total": 327,
"matched_total": 14,
"pages_fetched": 4,
"fetch_errors": [],
"records": [ {normalized record + "matched_keywords": [...]} , ... ]
}
Cache the raw API JSON (pre-normalization) to the agent's .cache/ directory under {YYYY-WW}-{server}.json so a re-run can skip the network if the user wants to re-synthesize without re-fetching.
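A minimal caching sketch. It assumes `{YYYY-WW}` means the ISO year and ISO week of the window's from date, and that the cached value is the list of raw page responses; `cache_path`, `write_cache`, and `read_cache` are hypothetical helpers. Using the ISO year (not the calendar year) keeps the key consistent for dates near January 1.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def cache_path(cache_dir: Path, from_date: str, server: str) -> Path:
    """Key the cache by ISO year and ISO week of the window start."""
    iso_year, iso_week, _ = date.fromisoformat(from_date).isocalendar()
    return cache_dir / f"{iso_year:04d}-{iso_week:02d}-{server}.json"


def write_cache(cache_dir: Path, from_date: str, server: str, raw_pages: list) -> Path:
    """Store the raw (pre-normalization) page payloads."""
    path = cache_path(cache_dir, from_date, server)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(raw_pages))
    return path


def read_cache(cache_dir: Path, from_date: str, server: str) -> Optional[list]:
    """Return cached pages, or None so the caller knows to hit the network."""
    path = cache_path(cache_dir, from_date, server)
    if not path.exists():
        return None
    return json.loads(path.read_text())
```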
Pattern A — One server, one window: standard call. Use this in a weekly digest.
Pattern B — Multi-week catch-up: call once per week, never one giant 28-day window. The cursor pagination is fine but the keyword filter is more honest at weekly granularity (matches the way papers are released and discussed).
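Splitting a catch-up range into weekly calls is simple date arithmetic. This sketch (`weekly_windows` is a hypothetical name) assumes each chunk starts on the previous chunk's end plus one day and spans 7 inclusive days, with a shorter final chunk; aligning chunks to Monday-to-Sunday calendar weeks would be an equally reasonable choice.

```python
from datetime import date, timedelta


def weekly_windows(from_date: str, to_date: str) -> list[tuple[str, str]]:
    """Split an inclusive date range into consecutive 7-day windows."""
    start = date.fromisoformat(from_date)
    end = date.fromisoformat(to_date)
    windows = []
    while start <= end:
        stop = min(start + timedelta(days=6), end)  # 7 days, both ends inclusive
        windows.append((start.isoformat(), stop.isoformat()))
        start = stop + timedelta(days=1)
    return windows
```

Each (from, to) pair then becomes one standard Pattern A call.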
Pattern C — Preprint-only follow-up of a known paper: if you already have a DOI, do not use this skill. Use a direct WebFetch on https://api.biorxiv.org/details/{server}/{doi} instead.
from/to dates before the call.
10.1101/... string.

| Field | Value | Notes |
|---|---|---|
| Endpoint | api.biorxiv.org/details/{server}/{from}/{to}/{cursor} | Same host serves both bioRxiv and medRxiv; only {server} varies |
| Auth | None | Public API. Be polite — don't hammer. |
| Page size | 100, fixed | Cursor is the offset into the window's results |
| Window cap | 31 days (soft); 7 days is the typical weekly call | Wider windows = thousands of records before keyword filter |
| Rate limit | Not formally documented; ~1 req/sec is safe | Backoff on 5xx |
| URL pattern | https://www.{server}.org/content/{doi}v{version} | Linkable to abstract page |
| Server values | biorxiv, medrxiv | Case-sensitive in path |