Indexes PDF documents with LightRAG, extracts text via PyMuPDF, builds embeddings and knowledge graphs, enables hybrid semantic searches with citations for document Q&A.
`npx claudepluginhub hongsw/plugin-for-claude-research --plugin pdf-research`

This skill uses the workspace's default tool permissions.
LightRAG-based PDF document indexing and semantic search for Claude Code research workflows.
When the user invokes `/pdf-research`, Claude should first run `python pdf_research.py status` to see the current configuration, then index or search as requested:

```bash
# Always run from the scripts directory
cd ~/.claude/skills/pdf-research/scripts

# Check current status
python pdf_research.py status

# Index PDFs (when the user provides a directory)
python pdf_research.py index /path/to/pdfs

# Search (single query)
python pdf_research.py search "user's question" --mode hybrid

# Interactive search session
python pdf_research.py search
```
Before running commands, make sure the Python dependencies are available and the OpenAI API key is set:

```bash
# Activate a Python environment with the dependencies installed
source /path/to/venv/bin/activate  # or use system Python with the deps installed

# Ensure the OpenAI API key is set
export OPENAI_API_KEY=sk-...
```
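A minimal preflight sketch that checks both prerequisites before the script is invoked. `preflight` is a hypothetical helper, not part of `pdf_research.py`; it assumes the usual import names for the packages (`lightrag` for `lightrag-hku`, `fitz` for PyMuPDF, `dotenv` for `python-dotenv`).

```python
import importlib.util
import os

# Assumed import names for the pip packages this skill depends on.
REQUIRED_MODULES = ("lightrag", "fitz", "dotenv")


def preflight(env=None, modules=REQUIRED_MODULES):
    """Return a list of problems; an empty list means the prerequisites look OK."""
    env = os.environ if env is None else env
    problems = []
    # The key only has to be present and non-empty; it is not validated here.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level module cannot be found.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python module: {name}")
    return problems
```

Calling `preflight()` from the activated environment checks the real environment; passing `env` and `modules` explicitly keeps the function easy to test.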
The four subcommands (`index`, `search`, `status`, `config`) at a glance:

- `python pdf_research.py index <path>`
- `python pdf_research.py status`
- `python pdf_research.py search "<question>"`
- `python pdf_research.py config --pdf-dir <path> --storage-dir <path>`

Full usage:

```bash
# Configure defaults (run once)
python pdf_research.py config --pdf-dir /path/to/pdfs --storage-dir ./rag_storage

# Index PDFs
python pdf_research.py index [pdf_dir] [--storage <path>]

# Search (single query)
python pdf_research.py search "query" [--mode hybrid|local|global|naive]

# Search (interactive)
python pdf_research.py search

# Check status
python pdf_research.py status
```
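To make the argument shapes concrete, here is a hypothetical `argparse` layout that accepts the same commands and flags listed above. It is a sketch of the interface only, not the actual source of `pdf_research.py`; how the real script falls back to configured defaults when `pdf_dir` or `--storage` is omitted is not shown.

```python
import argparse

MODES = ("hybrid", "local", "global", "naive")


def build_parser():
    """Hypothetical parser mirroring the documented pdf_research.py CLI."""
    parser = argparse.ArgumentParser(prog="pdf_research.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # index [pdf_dir] [--storage <path>]; both are optional on the command line
    index = sub.add_parser("index", help="index a directory of PDFs")
    index.add_argument("pdf_dir", nargs="?", help="omit to use the configured default")
    index.add_argument("--storage", help="override the storage directory")

    # search ["query"] [--mode ...]; no query means an interactive session
    search = sub.add_parser("search", help="query the index")
    search.add_argument("query", nargs="?", help="omit for an interactive session")
    search.add_argument("--mode", choices=MODES, default="hybrid")

    sub.add_parser("status", help="show configuration and index state")

    # config --pdf-dir <path> --storage-dir <path>
    config = sub.add_parser("config", help="set default directories")
    config.add_argument("--pdf-dir")
    config.add_argument("--storage-dir")
    return parser
```

With `choices=MODES`, an unknown `--mode` value is rejected by `argparse` before any search runs.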
| Mode | Best For | Description |
|---|---|---|
| `hybrid` | General queries | Combined local + global retrieval (default) |
| `local` | Specific facts | Names, numbers, definitions |
| `global` | Summaries | Themes, trends, overviews |
| `naive` | Exact terms | Plain vector search over text chunks, without the knowledge graph |
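When the user does not name a mode, a toy heuristic built from the "Best For" column could suggest one. The cue words below are illustrative assumptions, not part of the skill, and `suggest_mode` is hypothetical; `naive` is deliberately left for explicit requests.

```python
import re

# Illustrative cue words (assumptions), loosely following the "Best For" column.
GLOBAL_CUES = ("summarize", "summary", "overview", "themes", "trends", "overall")
LOCAL_CUES = ("who is", "what is", "define", "definition", "how many", "when did")


def suggest_mode(question: str) -> str:
    """Return 'global', 'local', or 'hybrid' based on simple substring cues."""
    q = question.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "global"  # summary-style questions
    if any(cue in q for cue in LOCAL_CUES) or re.search(r"\d", q):
        return "local"  # fact lookups, or questions that mention numbers
    return "hybrid"  # fall back to the documented default
```

Keyword cues are crude; when in doubt, the default `hybrid` is the safer choice.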
After indexing, `rag_storage/` contains:

| File | Description |
|---|---|
| `config.json` | User configuration |
| `kv_store_full_docs.json` | Full document text |
| `kv_store_text_chunks.json` | Semantic chunks |
| `kv_store_full_entities.json` | Extracted entities |
| `vdb_*.json` | Vector embeddings |
| `graph_*.graphml` | Knowledge graph |
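A small sketch for checking what a storage directory actually holds: it groups files by the patterns in the table and totals their size. `describe_storage` is a hypothetical helper; it reports size in mebibytes (1024 * 1024 bytes), which may not match how the script computes its own storage figure.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns and labels copied from the storage table.
STORAGE_PATTERNS = {
    "config.json": "User configuration",
    "kv_store_full_docs.json": "Full document text",
    "kv_store_text_chunks.json": "Semantic chunks",
    "kv_store_full_entities.json": "Extracted entities",
    "vdb_*.json": "Vector embeddings",
    "graph_*.graphml": "Knowledge graph",
}


def describe_storage(storage_dir):
    """Return ({label: [file names]}, total size in MiB of all regular files)."""
    found = {}
    total_bytes = 0
    for path in sorted(Path(storage_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        # Every regular file counts toward the total, matched or not.
        total_bytes += path.stat().st_size
        for pattern, label in STORAGE_PATTERNS.items():
            if fnmatch(path.name, pattern):
                found.setdefault(label, []).append(path.name)
                break
    return found, total_bytes / (1024 * 1024)
```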
Example session:

```text
User: /pdf-research ~/Documents/papers Please index these.
Claude: [Runs indexing]
Indexing complete!
- Documents: 5
- Chunks: 247
- Storage: 32.5 MB

User: Tell me about strategies for developing AI talent.
Claude: [Runs search]
Based on the indexed documents...
[Detailed response with references]
```
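If a wrapper needs the numbers from the indexing summary, for example to relay them to the user, a sketch like the following parses the `- Key: value` lines. It assumes the output looks like the transcript above (the real format may differ), and `parse_index_summary` is hypothetical.

```python
import re

# Matches "- Key: value" lines such as "- Chunks: 247".
SUMMARY_LINE = re.compile(r"^-\s*(?P<key>[^:]+):\s*(?P<value>.+)$")


def parse_index_summary(text):
    """Return {lowercased key: raw value string} for every '- Key: value' line."""
    stats = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            stats[match.group("key").strip().lower()] = match.group("value").strip()
    return stats
```

Values are kept as strings so units such as `MB` survive; convert them only where a number is actually needed.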
Quick start from a fresh shell:

```bash
export OPENAI_API_KEY=sk-your-key
python pdf_research.py index /path/to/pdfs
```
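Install the dependencies (quote the extras so shells such as zsh do not treat the brackets as a glob pattern):

```bash
pip install "lightrag-hku[api]" pymupdf python-dotenv
```

Minimum versions:

```text
lightrag-hku[api]>=1.4.9
pymupdf>=1.24.0
python-dotenv>=1.0.0
```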
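A sketch for confirming that installed packages meet the minimum versions listed above, using `importlib.metadata` from the standard library. `check_versions` and `at_least` are hypothetical helpers; the version parsing is deliberately simplified (it keeps only the leading digits of each part, so `1.4.9rc1` counts as `1.4.9`), and `packaging.version` is the robust choice if that matters.

```python
from importlib import metadata

# Minimum versions from the requirements list (distribution names, no extras).
MINIMUMS = {"lightrag-hku": "1.4.9", "pymupdf": "1.24.0", "python-dotenv": "1.0.0"}


def version_tuple(version):
    """'1.24.0' -> (1, 24, 0); keeps only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def at_least(installed, minimum):
    """Compare after zero-padding, so '1.24' counts as equal to '1.24.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_versions(minimums=MINIMUMS, get_version=metadata.version):
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for dist, minimum in minimums.items():
        try:
            installed = get_version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} not installed (need >= {minimum})")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{dist} {installed} is older than {minimum}")
    return problems
```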