Skill

i1

Paper Retrieval Agent: multi-database paper fetching from Semantic Scholar, OpenAlex, and arXiv. Handles rate limiting, deduplication, and PDF URL extraction.
Use when: fetching papers, searching databases, paper retrieval.
Triggers: fetch papers, retrieve papers, database search, Semantic Scholar, OpenAlex, arXiv

From diverga
Install
Run in your terminal:
$ npx claudepluginhub hosungyou/diverga --plugin diverga
Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

⛔ Prerequisites (v8.2: MCP Enforcement)

No prerequisites required for this agent.

Checkpoints During Execution

  • 🔴 SCH_DATABASE_SELECTION → diverga_mark_checkpoint("SCH_DATABASE_SELECTION", decision, rationale)
  • 🔴 SCH_API_KEY_VALIDATION → diverga_mark_checkpoint("SCH_API_KEY_VALIDATION", decision, rationale)

Fallback (MCP unavailable)

Read research/decision-log.yaml (or .research/decision-log.yaml for legacy projects) directly to verify prerequisites. Use conversation history only as a last resort.
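The fallback lookup can be sketched in Python. This is a minimal sketch, not diverga's implementation: `find_decision_log` and `checkpoint_recorded` are hypothetical helpers, and because the decision log's YAML schema is not described here, the checkpoint check is a plain substring search rather than a parse of specific keys.

```python
from pathlib import Path
from typing import Optional


def find_decision_log(project_root: str) -> Optional[Path]:
    """Return the decision log, preferring research/ over the legacy .research/."""
    root = Path(project_root)
    for candidate in (
        root / "research" / "decision-log.yaml",
        root / ".research" / "decision-log.yaml",  # legacy projects
    ):
        if candidate.is_file():
            return candidate
    return None  # no log found: fall back to conversation history


def checkpoint_recorded(log_path: Path, checkpoint_id: str) -> bool:
    """Report whether a checkpoint ID appears anywhere in the log."""
    # Assumption: the YAML schema is unknown, so this is a substring search,
    # not a structured lookup of a specific key.
    return checkpoint_id in log_path.read_text(encoding="utf-8")
```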


I1-PaperRetrievalAgent

Agent ID: I1
Category: I - Systematic Review Automation
Tier: MEDIUM (Sonnet)
Icon: 📄🔍

Overview

Executes multi-database paper retrieval for systematic literature reviews. Queries Semantic Scholar, OpenAlex, and arXiv (open access), with optional Scopus and Web of Science (institutional). Handles rate limiting, deduplication, and PDF URL extraction.

Capabilities

Open Access Databases (No API Key Required)

| Database | API | PDF availability | Rate limit |
|---|---|---|---|
| Semantic Scholar | REST | ~40% open access | 100 req/5 min |
| OpenAlex | REST | ~50% open access | Polite pool (email) |
| arXiv | OAI-PMH | 100% | 3 s delay |

Institutional Databases (API Key Required)

| Database | API key env | Coverage |
|---|---|---|
| Scopus | SCOPUS_API_KEY | Comprehensive metadata |
| Web of Science | WOS_API_KEY | Citation data |

Social Science Databases (Recommended for Social Science Research)

| Database | Access | Coverage | Best for |
|---|---|---|---|
| ERIC | Free API (IES) | 1.9M+ records | Education research, K-12, higher ed |
| PsycINFO | APA subscription | 5M+ records | Psychology, behavioral science |
| SSRN | Open access | 1M+ preprints | Working papers, social science |
| ProQuest Dissertations | Institutional | 5M+ dissertations | Doctoral research, theses |

💡 Social science focus: These databases are essential for education, psychology, and social work research. ERIC and SSRN are freely accessible; PsycINFO and ProQuest require institutional access.

API Key Configuration

| Database | API key env | Coverage | Primary discipline |
|---|---|---|---|
| ERIC | ERIC_API_KEY | Education research | Education |
| PsycINFO (via APA PsycNET) | PSYCINFO_API_KEY | Psychology & behavioral sciences | Psychology |
| SSRN | None (open access) | Social science preprints | Multi-discipline |
| ProQuest | PROQUEST_API_KEY | Dissertations & theses | Multi-discipline |

ERIC API Integration Example

# ERIC API (free, no key required for basic search)
curl "https://api.ies.ed.gov/eric/?search=meta-analysis+education+technology&format=json&rows=50"

ERIC fields: title, author, source, publicationdateyear, description, subject, peerreviewed
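A minimal Python sketch of the same ERIC request, using only the standard library. `build_eric_url` and `extract_records` are hypothetical helpers, and the assumption that results arrive Solr-style under `response.docs` should be checked against a live response before relying on it.

```python
from typing import Dict, List
from urllib.parse import urlencode

ERIC_ENDPOINT = "https://api.ies.ed.gov/eric/"
ERIC_FIELDS = ["title", "author", "source", "publicationdateyear",
               "description", "subject", "peerreviewed"]


def build_eric_url(query: str, rows: int = 50) -> str:
    """Build the same kind of URL as the curl example (spaces become '+')."""
    params = {"search": query, "format": "json", "rows": rows}
    return ERIC_ENDPOINT + "?" + urlencode(params)


def extract_records(payload: Dict) -> List[Dict]:
    """Keep only the listed ERIC fields from a parsed JSON response."""
    # Assumption: records live under payload["response"]["docs"].
    docs = payload.get("response", {}).get("docs", [])
    return [{field: doc.get(field) for field in ERIC_FIELDS} for doc in docs]
```

Fetching and parsing could then use `urllib.request.urlopen(build_eric_url(...))` followed by `json.load`; missing fields come back as `None` rather than raising.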

Database Selection Guide

| Research area | Recommended databases |
|---|---|
| Education | ERIC + Semantic Scholar + OpenAlex |
| Psychology | PsycINFO + Semantic Scholar + OpenAlex |
| Social Work | Semantic Scholar + OpenAlex + SSRN |
| Interdisciplinary | OpenAlex + Semantic Scholar + ERIC + PsycINFO |
| STEM crossover | arXiv + Semantic Scholar + OpenAlex |
| Dissertations | ProQuest + OpenAlex |

Input Schema

Required:
  - query: "string"
  - databases: "list[enum[semantic_scholar, openalex, arxiv, scopus, wos, eric, psycinfo, ssrn, proquest]]"

Optional:
  - year_range: "list[int, int]"
  - max_results_per_db: "int"
  - open_access_only: "boolean"
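The input schema can be enforced with a small dataclass. This is a sketch under stated assumptions: the class name `RetrievalRequest` and the defaults for the optional fields (no year filter, no per-database cap, `open_access_only=False`) are hypothetical, not taken from diverga.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

VALID_DATABASES = {
    "semantic_scholar", "openalex", "arxiv", "scopus", "wos",
    "eric", "psycinfo", "ssrn", "proquest",
}


@dataclass
class RetrievalRequest:
    query: str
    databases: Sequence[str]
    year_range: Optional[Sequence[int]] = None  # [start_year, end_year]
    max_results_per_db: Optional[int] = None    # assumed default: no cap
    open_access_only: bool = False              # assumed default

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query must be a non-empty string")
        if not self.databases:
            raise ValueError("select at least one database")
        unknown = set(self.databases) - VALID_DATABASES
        if unknown:
            raise ValueError(f"unknown databases: {sorted(unknown)}")
        if self.year_range is not None:
            if len(self.year_range) != 2 or self.year_range[0] > self.year_range[1]:
                raise ValueError("year_range must be [start, end] with start <= end")
        if self.max_results_per_db is not None and self.max_results_per_db <= 0:
            raise ValueError("max_results_per_db must be a positive integer")
```

Validating at construction means an invalid request fails before any API call is made.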

Output Schema

main_output:
  databases_queried: "list[string]"
  results:
    semantic_scholar: "int"
    openalex: "int"
    arxiv: "int"
  total_identified: "int"
  after_deduplication: "int"
  duplicates_removed: "int"
  output_file: "string"
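The output counts presumably relate as follows: `total_identified` is the sum of the per-database counts, and `duplicates_removed` is `total_identified - after_deduplication`. A hypothetical helper, `summarize_retrieval`, that assembles the summary under that assumption:

```python
from typing import Dict


def summarize_retrieval(results: Dict[str, int], after_deduplication: int,
                        output_file: str) -> dict:
    """Assemble the main_output fields from per-database counts."""
    total = sum(results.values())
    if not 0 <= after_deduplication <= total:
        raise ValueError("after_deduplication must lie between 0 and total_identified")
    return {
        "databases_queried": list(results),
        "results": dict(results),
        "total_identified": total,
        "after_deduplication": after_deduplication,
        "duplicates_removed": total - after_deduplication,
        "output_file": output_file,
    }
```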

Human Checkpoint Protocol

🔴 SCH_DATABASE_SELECTION (REQUIRED)

Before executing queries, I1 MUST:

  1. PRESENT database options:

    Available databases for your systematic review:
    
    ✅ Open Access (recommended):
    - Semantic Scholar (~40% PDF URLs)
    - OpenAlex (~50% PDF URLs)
    - arXiv (100% PDF access)
    
    🔒 Institutional (requires API keys):
    - Scopus (SCOPUS_API_KEY: {status})
    - Web of Science (WOS_API_KEY: {status})
    
    📚 Social Science:
    - ERIC (free, education research)
    - PsycINFO (PSYCINFO_API_KEY: {status})
    - SSRN (open access, preprints)
    - ProQuest Dissertations (PROQUEST_API_KEY: {status})
    
    Which databases would you like to query?
    
  2. WAIT for explicit user selection

  3. CONFIRM selection before executing
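The `{status}` placeholders in the menu could be filled from environment variables. A sketch, assuming a key counts as available when its variable is set and non-empty; the `set`/`missing` wording and the name `key_status_lines` are hypothetical. Passing `env` explicitly keeps the function testable without touching the real environment.

```python
import os
from typing import List, Mapping, Optional

# Databases in the menu whose {status} depends on an API key.
KEYED_DATABASES = {
    "Scopus": "SCOPUS_API_KEY",
    "Web of Science": "WOS_API_KEY",
    "PsycINFO": "PSYCINFO_API_KEY",
    "ProQuest Dissertations": "PROQUEST_API_KEY",
}


def key_status_lines(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Render menu lines with each key reported as 'set' or 'missing'."""
    env = os.environ if env is None else env
    lines = []
    for name, var in KEYED_DATABASES.items():
        status = "set" if env.get(var) else "missing"  # empty string counts as missing
        lines.append(f"- {name} ({var}: {status})")
    return lines
```

Calling `print("\n".join(key_status_lines()))` would render the keyed entries against the current shell environment.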

🔴 SCH_API_KEY_VALIDATION (REQUIRED)

After database selection, I1 MUST validate API keys:

  1. CHECK environment for required keys:

    • Semantic Scholar: S2_API_KEY (optional but recommended for higher rate limits)
    • OpenAlex: Email for polite pool (optional)
    • arXiv: No key needed
    • Scopus: SCOPUS_API_KEY (required if selected)
    • Web of Science: WOS_API_KEY (required if selected)
    • ERIC: ERIC_API_KEY (optional, basic search is free)
    • PsycINFO: PSYCINFO_API_KEY (required if selected)
    • SSRN: No key needed
    • ProQuest: PROQUEST_API_KEY (required if selected)
  2. IF any selected database requires a missing key:
     → Call AskUserQuestion with the SCH_API_KEY_VALIDATION template
     → WAIT for the user's response
     → If "Provide Key": show setup instructions (export SCOPUS_API_KEY=your_key), then re-validate
     → If "Skip DB": remove it from the selection and re-confirm the remaining databases
     → If "Pause": save state and stop the pipeline

  3. RECORD via MCP: diverga_mark_checkpoint("SCH_API_KEY_VALIDATION", decision, rationale)
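The key check in step 1 and the branch in step 2 reduce to a lookup table plus a filter. In this sketch the table transcribes the checklist above; `missing_required_keys` is a hypothetical name, and a non-empty result is the signal to trigger the SCH_API_KEY_VALIDATION checkpoint.

```python
import os
from typing import Dict, Iterable, Mapping, Optional, Tuple

# (environment variable, required?) per database, transcribed from the checklist.
KEY_REQUIREMENTS: Dict[str, Tuple[Optional[str], bool]] = {
    "semantic_scholar": ("S2_API_KEY", False),  # optional: higher rate limits
    "openalex": (None, False),                  # polite-pool email is optional
    "arxiv": (None, False),
    "scopus": ("SCOPUS_API_KEY", True),
    "wos": ("WOS_API_KEY", True),
    "eric": ("ERIC_API_KEY", False),            # basic search is free
    "psycinfo": ("PSYCINFO_API_KEY", True),
    "ssrn": (None, False),
    "proquest": ("PROQUEST_API_KEY", True),
}


def missing_required_keys(selected: Iterable[str],
                          env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {database: env var} for each selected database missing a required key."""
    env = os.environ if env is None else env
    missing = {}
    for db in selected:
        var, required = KEY_REQUIREMENTS[db]
        if required and var is not None and not env.get(var):
            missing[db] = var
    return missing
```

An empty result means the pipeline can proceed; otherwise step 2 applies to each listed database.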

Execution Commands

# Project path (defaults to the current working directory)
PROJECT_PATH="$(pwd)"

# Paper retrieval (Stage 1)
python scripts/01_fetch_papers.py \
  --project "$PROJECT_PATH" \
  --query "{boolean_query}" \
  --databases semantic_scholar openalex arxiv

# Deduplication (Stage 2)
python scripts/02_deduplicate.py \
  --project "$PROJECT_PATH"
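The scripts' internals are not shown here. As an illustration of what Stage 2 typically involves, here is a minimal deduplication sketch that matches records first by DOI and then by normalized title. It assumes each record is a dict with optional `doi` and `title` keys, strips resolver prefixes because OpenAlex reports DOIs as `https://doi.org/...` URLs, and is not necessarily how `scripts/02_deduplicate.py` works.

```python
import re
import unicodedata
from typing import Dict, List, Optional


def normalize_doi(doi: Optional[str]) -> str:
    """Lowercase a DOI and strip a resolver prefix such as https://doi.org/."""
    doi = (doi or "").strip().lower()
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)


def normalize_title(title: Optional[str]) -> str:
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records: List[Dict]) -> List[Dict]:
    """Keep the first record seen for each DOI or normalized title."""
    seen_dois, seen_titles = set(), set()
    unique = []
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        title = normalize_title(rec.get("title"))
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # duplicate of an earlier record
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique
```

Exact title matching is a deliberately simple fallback: it will merge distinct papers that share a title and miss near-duplicates with small wording differences, so fuzzy matching may be worth considering for real reviews.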

Query Building

I1 transforms natural language research questions into optimized Boolean queries:

Input: "How do AI chatbots improve speaking skills in language learning?"

Output:

Semantic Scholar: (AI OR "artificial intelligence" OR chatbot OR "conversational agent") AND ("language learning" OR "foreign language" OR L2) AND (speaking OR oral OR pronunciation)

OpenAlex: Same query with OpenAlex field mapping

arXiv: cat:cs.CL AND (all:chatbot OR all:conversational) AND all:language
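The generic pattern behind the Semantic Scholar query (synonyms joined with OR inside parentheses, concept groups joined with AND, multi-word terms quoted as phrases) can be sketched as follows. `build_boolean_query` is a hypothetical helper; database-specific syntax, such as arXiv's field prefixes, would still need a per-database adapter, and how each API interprets the operators should be checked against its documentation.

```python
from typing import List


def quote_term(term: str) -> str:
    """Quote multi-word terms so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(concept_groups: List[List[str]]) -> str:
    """OR the synonyms within each concept group, then AND the groups."""
    clauses = [
        "(" + " OR ".join(quote_term(term) for term in group) + ")"
        for group in concept_groups
        if group  # skip empty groups
    ]
    return " AND ".join(clauses)
```

With the three concept groups from the chatbot question (technology, language learning, speaking), this reproduces the Semantic Scholar string above exactly.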

Rate Limiting Strategy

# Semantic Scholar: Exponential backoff
rate_limit = {
    "requests_per_window": 100,
    "window_seconds": 300,
    "backoff_base": 2.0
}

# OpenAlex: Polite pool (send your email as the mailto query parameter)
params = {"mailto": "your-email@example.com"}

# arXiv: Fixed delay
delay_between_requests = 3  # seconds
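One way to honor both the windowed limit and the fixed delay is a sliding-window limiter, since a 1-request, 3-second window is equivalent to arXiv's fixed 3 s delay. This is a sketch, not diverga's code: `SlidingWindowLimiter` is a hypothetical class, and the clock and sleep functions are injectable so its behavior can be tested without actually waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.sent: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until another request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have left the window.
            while self.sent and now - self.sent[0] >= self.window_seconds:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return
            # Sleep until the oldest request in the window expires.
            self.sleep(self.window_seconds - (now - self.sent[0]))


# Limits from the configuration above.
semantic_scholar_limiter = SlidingWindowLimiter(max_requests=100, window_seconds=300)
arxiv_limiter = SlidingWindowLimiter(max_requests=1, window_seconds=3)  # fixed 3 s gap
```

Call `acquire()` immediately before each HTTP request; exponential backoff on 429 responses remains a separate concern layered on top.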

Error Handling

| Error | Action |
|---|---|
| 429 Rate Limit | Exponential backoff, max 5 retries |
| 500 Server Error | Retry after 30 s |
| Timeout | Retry with increased timeout |
| API Key Missing | STOP → trigger 🔴 SCH_API_KEY_VALIDATION checkpoint → AskUserQuestion |
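The error table can be expressed as a retry policy. This sketch makes several assumptions the table leaves open: backoff delays are `2.0 ** attempt` seconds (1, 2, 4, 8, 16, using the `backoff_base` from the rate-limit config), the 5-retry cap applies to every retryable error rather than only to 429s, and "increased timeout" means doubling it. `decide`, `RetryDecision`, and the error labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETRIES = 5      # from the table (stated for 429; assumed for all retries)
BACKOFF_BASE = 2.0   # matches backoff_base in the rate-limit config


@dataclass
class RetryDecision:
    retry: bool
    delay_seconds: float = 0.0
    timeout_seconds: Optional[float] = None
    stop_reason: Optional[str] = None


def decide(error: str, attempt: int, timeout: float) -> RetryDecision:
    """Map an error kind to the table's action; `attempt` counts retries so far."""
    if error == "missing_api_key":
        # Never retry: hand control to the human checkpoint instead.
        return RetryDecision(False, stop_reason="SCH_API_KEY_VALIDATION")
    if attempt >= MAX_RETRIES:
        return RetryDecision(False, stop_reason="max retries exceeded")
    if error == "429":
        return RetryDecision(True, delay_seconds=BACKOFF_BASE ** attempt)  # 1, 2, 4, 8, 16 s
    if error == "500":
        return RetryDecision(True, delay_seconds=30.0)
    if error == "timeout":
        return RetryDecision(True, timeout_seconds=timeout * 2)  # assumed: double it
    return RetryDecision(False, stop_reason=f"unhandled error: {error}")
```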

Auto-Trigger Keywords

| Keywords (EN) | Keywords (KR) | Action |
|---|---|---|
| fetch papers, retrieve papers | 논문 수집 (paper collection), 논문 검색 (paper search) | Activate I1 |
| search databases | 데이터베이스 검색 (database search) | Activate I1 |
| Semantic Scholar, OpenAlex, arXiv | 시맨틱스칼라 (Semantic Scholar) | Activate I1 |
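A naive reading of the keyword table is a case-insensitive substring match. The sketch below only illustrates the table: how the plugin host actually routes requests to I1 is not described here, and `should_activate_i1` is a hypothetical name.

```python
from typing import List

# Trigger keywords transcribed from the table (English and Korean).
TRIGGERS: List[str] = [
    "fetch papers", "retrieve papers", "search databases",
    "semantic scholar", "openalex", "arxiv",
    "논문 수집", "논문 검색", "데이터베이스 검색", "시맨틱스칼라",
]


def should_activate_i1(message: str) -> bool:
    """Return True if any trigger keyword occurs in the message (case-insensitive)."""
    text = message.casefold()
    return any(keyword.casefold() in text for keyword in TRIGGERS)
```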

Integration with B1

I1 can call B1-systematic-literature-scout for advanced search strategy:

Task(
    subagent_type="diverga:b1",
    model="sonnet",
    prompt="""
    Help design search strategy for:
    Research question: {question}

    Generate:
    1. Database-specific Boolean queries
    2. MeSH/thesaurus terms (if applicable)
    3. Grey literature sources
    """
)

Dependencies

requires: ["I0-review-pipeline-orchestrator"]
sequential_next: ["I2-screening-assistant"]
parallel_compatible: ["B1-literature-review-strategist"]

Related Agents

  • I0-review-pipeline-orchestrator: Pipeline coordination
  • I2-screening-assistant: PRISMA screening
  • B1-literature-review-strategist: Search strategy design
Stats
Stars: 1
Forks: 1
Last commit: Mar 19, 2026