Builds Wikipedia-style Obsidian vaults from academic PDFs, extracting concepts into linked notes with atomic sentences and citations. Expands existing networks with new papers.
npx claudepluginhub lnilya/effortless-academic-skills --plugin academicskills
This skill uses the workspace's default tool permissions.
This skill transforms a collection of academic PDFs into a rich, interconnected Obsidian knowledge network — a Wikipedia-style web of concept notes, each populated with atomic sentences drawn directly from the literature and linked to one another.
This skill has two modes. Determine which applies before doing anything else:
If the user mentions an existing vault or says they want to add papers to an existing network, use Expand mode (Phase 5). Otherwise, use Build mode.
Complete each phase fully before starting the next. Phases 1 and 3 can use parallel subagents for large paper sets (≥5 papers or ≥10 concepts respectively).
Before starting, ask the user:
If the user has the obsidian:obsidian-cli skill available, use it to write notes directly to their vault. Otherwise, write .md files to the output directory and tell the user where to find them.
pip install pdfplumber --break-system-packages 2>/dev/null
Use the bundled script for each PDF:
python <skill_dir>/scripts/extract_pdf_text.py "<pdf_path>" "<workspace>/_extracted/<paper_slug>.txt"
The script outputs plain text. Store all extracted text in <workspace>/_extracted/.
If a PDF produces very little text (< 200 words), it is likely a scanned/image-only PDF. Note this and skip it, informing the user at the end.
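The 200-word check above can be sketched as follows. This is a minimal illustration, not part of the bundled script; the function names `looks_scanned` and `partition_extracted` are hypothetical.

```python
def looks_scanned(text: str, min_words: int = 200) -> bool:
    """Return True when the extracted text has fewer than min_words words."""
    return len(text.split()) < min_words


def partition_extracted(texts: dict[str, str], min_words: int = 200):
    """Split {pdf_name: extracted_text} into usable papers and skipped PDFs.

    The skipped names are collected so they can be reported to the user
    at the end of the run.
    """
    usable, skipped = {}, []
    for name, text in texts.items():
        if looks_scanned(text, min_words):
            skipped.append(name)
        else:
            usable[name] = text
    return usable, skipped
```

Keeping the skipped names in a separate list makes it easy to include them in the final summary.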
For each paper, read its extracted text and produce a JSON file at <workspace>/_paper_data/<paper_slug>.json.
If there are 5 or more papers, spawn one subagent per paper (or per batch of 3) to do this in parallel. Each subagent receives:
Each JSON file must follow this schema:
{
  "title": "Full paper title as it appears in the document",
  "authors": ["Smith J", "Jones K", "Brown L"],
  "year": 2023,
  "citation_key": "Smith 2023",
  "abstract": "The abstract text...",
  "concepts": [
    "biodiversity",
    "climate change",
    "species richness",
    "habitat fragmentation"
  ],
  "findings": [
    {
      "concept": "biodiversity",
      "statement": "Global biodiversity loss is accelerating at rates 100–1000 times above natural background levels.",
      "related_concepts": ["climate change", "habitat fragmentation"],
      "section": "results"
    },
    {
      "concept": "climate change",
      "statement": "Elevated atmospheric CO₂ concentrations correlate with a 2.3°C increase in mean annual temperature across temperate biomes.",
      "related_concepts": ["biodiversity", "species distribution"],
      "section": "discussion"
    }
  ]
}
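A subagent's output could be checked against this schema before it is used downstream. The sketch below is one possible validator, not something the skill ships; `validate_paper_data` is a hypothetical name, the allowed `section` values are the three named in the extraction guidelines, and accepting a null `year` for undated papers is an assumption.

```python
# Expected top-level fields. Allowing None for "year" (undated papers) is an assumption.
FIELD_TYPES = {
    "title": str,
    "authors": list,
    "year": (int, type(None)),
    "citation_key": str,
    "abstract": str,
    "concepts": list,
    "findings": list,
}
FINDING_FIELDS = ("concept", "statement", "related_concepts", "section")
ALLOWED_SECTIONS = {"results", "discussion", "conclusions"}


def validate_paper_data(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"{field}: unexpected type {type(data[field]).__name__}")

    findings = data.get("findings")
    if not isinstance(findings, list):
        findings = []
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"findings[{i}]: expected an object")
            continue
        for key in FINDING_FIELDS:
            if key not in finding:
                problems.append(f"findings[{i}]: missing {key}")
        section = finding.get("section")
        if section is not None and section not in ALLOWED_SECTIONS:
            problems.append(f"findings[{i}]: unknown section {section!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a failed record be sent back for correction with all issues at once.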
Guidelines for extraction:
Citation key format:
- Smith 2023 (first author surname + space + year)
- Smith ND (no date)
Concepts (list 5–20 per paper):
Findings — this is the most important part:
- related_concepts: other concepts from the paper's concept list that appear meaningfully in the same claim
- section: one of "results", "discussion", or "conclusions"
Read all _paper_data/*.json files. Then:
"concepts" array
Write the concept list to <workspace>/concept_list.md:
# Concept List — Please Review Before Proceeding
I've extracted **N concepts** from **M papers**. Review this list:
- ✏️ Edit names (this becomes the note title)
- ❌ Delete lines you don't want
- ➕ Add new lines for concepts you feel are underrepresented
- The indented lines show which papers discuss each concept
When you're happy, let me know and I'll build the knowledge network.
---
## Core Concepts (discussed in 3+ papers)
- biodiversity (12 papers: Smith 2023, Jones 2022, ...)
- climate change (10 papers)
- species richness (8 papers)
## Supporting Concepts (1–2 papers)
- trophic cascade (2 papers)
- keystone predator (1 paper)
## Concepts I'm Unsure About (possible duplicates — your call)
- biodiversity loss ↔ biodiversity (may overlap)
- CO₂ emissions ↔ carbon dioxide (may overlap)
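One way to produce the core/supporting split shown in the template above is sketched here. It assumes concept names are compared exactly after trimming whitespace and does not attempt duplicate detection; the function names are hypothetical.

```python
from collections import defaultdict


def aggregate_concepts(papers: list[dict]) -> dict[str, list[str]]:
    """Map each concept name to the sorted citation keys of the papers listing it."""
    by_concept = defaultdict(set)
    for paper in papers:
        for concept in paper["concepts"]:
            by_concept[concept.strip()].add(paper["citation_key"])
    return {concept: sorted(keys) for concept, keys in by_concept.items()}


def split_by_coverage(by_concept: dict[str, list[str]], core_threshold: int = 3):
    """Separate concepts discussed in core_threshold+ papers from the rest."""
    core = {c: k for c, k in by_concept.items() if len(k) >= core_threshold}
    supporting = {c: k for c, k in by_concept.items() if len(k) < core_threshold}
    return core, supporting


def render_section(title: str, concepts: dict[str, list[str]]) -> str:
    """Render one section of concept_list.md, most widely discussed first."""
    lines = [f"## {title}"]
    ordered = sorted(concepts.items(), key=lambda item: (-len(item[1]), item[0]))
    for concept, keys in ordered:
        noun = "paper" if len(keys) == 1 else "papers"
        lines.append(f"- {concept} ({len(keys)} {noun}: {', '.join(keys)})")
    return "\n".join(lines)
```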
Present this file to the user and explicitly ask them to review it. Say something like:
"Here's the concept list I extracted across all your papers. Please review it — edit names, remove anything irrelevant, add any gaps you notice — and let me know when you're ready. I won't start building the notes until you confirm."
Do NOT proceed to Phase 3 without explicit user confirmation.
After the user confirms the concept list, parse it to get the final list of approved concepts (ignore any lines that don't look like concept entries).
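Parsing the reviewed file could look like the sketch below. It rests on several assumptions about what "looks like a concept entry": only unindented `- ` lines after the `---` separator count, a trailing `(N papers ...)` annotation is stripped, and unresolved `↔` pairs are skipped so they can be raised with the user instead. The name `parse_concept_list` is hypothetical.

```python
import re

# Trailing "(12 papers: ...)" or "(1 paper)" annotation from the generated list.
_COUNT_SUFFIX = re.compile(r"\s*\(\d+\s+papers?\b[^)]*\)\s*$")


def parse_concept_list(markdown: str) -> list[str]:
    """Return approved concept names in file order, without duplicates."""
    concepts, seen = [], set()
    in_body = False
    for line in markdown.splitlines():
        if line.strip() == "---":
            in_body = True  # everything above the separator is instructions
            continue
        if not in_body or not line.startswith("- "):
            continue  # headings, blank lines, indented detail lines
        if "↔" in line:
            continue  # unresolved overlap pair; needs a user decision
        name = _COUNT_SUFFIX.sub("", line[2:]).strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            concepts.append(name)
    return concepts
```

Lines the user added by hand, which carry no paper count, pass through unchanged.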
For each concept, generate one Obsidian note. If there are 10 or more concepts, spawn subagents in batches of 10–15. Each subagent:
_paper_data/*.json files
<vault_output>/Concepts/
Every concept note follows this exact template:
---
tags: [concept, knowledge-network]
aliases: []
---
# <Concept Name>
<A 2–3 sentence academic definition of this concept, written in your own words. This sets the stage — no citations needed here. Be precise and accurate.>
## <Thematic Section 1>
- <Atomic sentence with [[related concept]] linked inline where it appears.> [[CitationKey]]
- <Atomic sentence linking [[another concept]] where meaningful.> [[CitationKey]]
## <Thematic Section 2>
- <Atomic sentence.> [[CitationKey]]
## Related Concepts
- [[concept-a]] — <5–10 words explaining how it relates to this concept>
- [[concept-b]] — <5–10 words explaining how it relates to this concept>
- [[concept-c]] — <5–10 words explaining how it relates to this concept>
These are Obsidian wikilinks. They must exactly match the concept names in the approved list (Obsidian resolves links case-insensitively, but keep the capitalization consistent):
[[biodiversity]], [[climate change]], [[habitat fragmentation]]
[[Smith 2023]] — always at the end of the atomic sentence, in its own brackets, after the period
Sentence text here. [[CitationKey]]
An atomic sentence is a single, verifiable claim attributed to one source. It must:
[[Habitat fragmentation]] reduces [[species richness]] in old-growth temperate forests. [[Pereira 2022]]
The Related Concepts section must:
- [[concept]] — <relation phrase>

- [[poleward shift]] — most range shifts move poleward specifically
- [[habitat fragmentation]] — reduces corridors needed for range expansion
- [[species richness]] — declines as ranges contract under warming

- [[poleward shift]] — a type of range shift ✗
- [[habitat fragmentation]] — fragmentation of natural habitats ✗

Group atomic sentences into 2–5 thematic sections per note. Choose headings that fit the concept naturally. Good options: Global Trends, Mechanisms, Measurement & Indices, Regional Patterns, Ecological Impacts, Policy Implications, Methodological Considerations, Interventions & Restoration, Drivers, Relationships to Other Factors.
Not every note needs the same sections — let the content guide you.
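The wikilink and citation conventions above can be applied mechanically once a finding's statement is known. The following is a rough sketch under stated assumptions: only the first mention of each concept is linked, the matched text keeps its original capitalization, and longer concept names are linked first so they are not split by shorter ones. `link_concepts` and `atomic_sentence` are hypothetical helpers.

```python
import re

_LINK = re.compile(r"(\[\[.*?\]\])")


def link_concepts(statement: str, concepts: list[str]) -> str:
    """Wrap the first mention of each concept in [[...]], longest names first."""
    linked = statement
    for concept in sorted(concepts, key=len, reverse=True):
        pattern = re.compile(rf"(?<!\w){re.escape(concept)}(?!\w)", re.IGNORECASE)
        parts = _LINK.split(linked)  # alternates plain text and existing links
        for i, part in enumerate(parts):
            if part.startswith("[["):
                continue  # never link inside an existing wikilink
            new_part, count = pattern.subn(lambda m: f"[[{m.group(0)}]]", part, count=1)
            if count:
                parts[i] = new_part
                break
        linked = "".join(parts)
    return linked


def atomic_sentence(statement: str, concepts: list[str], citation_key: str) -> str:
    """Format one bullet: linked sentence, then the citation after the period."""
    text = link_concepts(statement.strip(), concepts)
    if not text.endswith("."):
        text += "."
    return f"- {text} [[{citation_key}]]"
```

Whether a given concept mention is meaningful enough to link remains a judgment call that string matching cannot make; treat the output as a first pass.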
Every concept note should contain at least 3 atomic sentences from at least 2 different papers (when available). If a concept is only discussed in one paper, write what's there and note it with a callout:
> [!note] Limited sources
> This concept is currently documented from a single paper. Consider adding more literature.
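The coverage rule above (at least 3 atomic sentences from at least 2 papers, with a callout for single-paper concepts) can be checked with a few lines. This is a sketch; `coverage_report` is a hypothetical name and the `(text, citation_key)` pair layout is an assumption.

```python
LIMITED_SOURCES_CALLOUT = (
    "> [!note] Limited sources\n"
    "> This concept is currently documented from a single paper. "
    "Consider adding more literature."
)


def coverage_report(sentences: list[tuple[str, str]]) -> dict:
    """Summarise coverage for one note from (sentence_text, citation_key) pairs."""
    papers = {key for _, key in sentences}
    return {
        "sentences": len(sentences),
        "papers": len(papers),
        "meets_target": len(sentences) >= 3 and len(papers) >= 2,
        "needs_callout": len(papers) == 1,
    }
```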
Create one note per paper in <vault_output>/Papers/<CitationKey>.md:
---
tags: [paper]
title: "<Full paper title>"
authors: [<comma-separated author list>]
year: <year>
citation_key: <CitationKey>
---
# [[<CitationKey>.pdf|<Full paper title>]]
> [!abstract]
> <Abstract text>
## Key Concepts
[[concept-a]] | [[concept-b]] | [[concept-c]]
## Main Contributions
- <One atomic sentence summarising each major finding from this paper>
- <Another finding>
The "Key Concepts" line should list every concept from the approved concept list that this paper addresses.
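Filling the paper stub template from a paper's JSON record might look like this. It is a sketch with hypothetical helper names; it assumes each finding's `statement` is reused as a Main Contributions bullet and that concept names are matched against the approved list case-insensitively.

```python
from pathlib import Path


def stub_path(vault_output: str, citation_key: str) -> Path:
    """Location of the stub: <vault_output>/Papers/<CitationKey>.md."""
    return Path(vault_output) / "Papers" / f"{citation_key}.md"


def render_paper_stub(paper: dict, approved_concepts: list[str]) -> str:
    """Render the stub note text for one paper record."""
    key, title = paper["citation_key"], paper["title"]
    approved = {name.lower(): name for name in approved_concepts}
    concepts = []
    for name in paper["concepts"]:
        canonical = approved.get(name.lower())
        if canonical and canonical not in concepts:
            concepts.append(canonical)  # only concepts the user approved
    escaped_title = title.replace('"', '\\"')  # keep the YAML string valid
    abstract = "\n".join(f"> {line}" for line in paper["abstract"].splitlines())
    lines = [
        "---",
        "tags: [paper]",
        f'title: "{escaped_title}"',
        f"authors: [{', '.join(paper['authors'])}]",
        f"year: {paper['year']}",
        f"citation_key: {key}",
        "---",
        f"# [[{key}.pdf|{title}]]",
        "> [!abstract]",
        abstract,
        "## Key Concepts",
        " | ".join(f"[[{c}]]" for c in concepts),
        "## Main Contributions",
        *[f"- {finding['statement']}" for finding in paper["findings"]],
    ]
    return "\n".join(lines) + "\n"
```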
Before declaring completion, verify:
[[CitationKey]] wikilink
If any check fails, fix it before presenting the output.
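One check that lends itself to automation is confirming that every wikilink resolves to a note in the vault. The sketch below covers only that single check and is not the skill's own verification routine. It assumes notes live in `Concepts/` and `Papers/`, that links to `.pdf` files are resolved elsewhere, and that matching is case-insensitive; `find_broken_links` is a hypothetical name.

```python
import re
from pathlib import Path

# Captures the link target, ignoring any "#heading" or "|alias" part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_broken_links(vault: Path) -> dict[str, list[str]]:
    """Map note file names to wikilink targets that match no existing note."""
    notes = [
        path
        for folder in ("Concepts", "Papers")
        for path in sorted((vault / folder).glob("*.md"))
    ]
    known = {path.stem.lower() for path in notes}
    broken = {}
    for note in notes:
        targets = WIKILINK.findall(note.read_text(encoding="utf-8"))
        missing = sorted(
            {
                target.strip()
                for target in targets
                if not target.strip().lower().endswith(".pdf")  # PDFs are not notes
                and target.strip().lower() not in known
            }
        )
        if missing:
            broken[note.name] = missing
    return broken
```

An empty result means every concept and citation link points at an existing note.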
When all notes are written, tell the user:
Example:
✅ Built 47 concept notes and 18 paper stubs across your Obsidian vault.
Most-connected concepts: biodiversity (34 sentences), climate change (28), species richness (21), habitat fragmentation (19), ecosystem function (17)
Skipped PDFs:
jones_2019_scan.pdf (image-only, no extractable text)
Limited coverage: "keystone predator" (only 1 paper, 2 sentences)
For very large collections (30+ papers), the extraction phase can take significant time. Keep the user informed of progress:
Transparency about progress makes the wait feel productive.
Use this phase when the user already has a knowledge network built with this skill and wants to integrate one or more new papers into it.
Always issue this warning first, before doing anything else:
⚠️ Back up your vault before proceeding. Expanding the network will modify existing concept notes in place. If something goes wrong, you'll want to be able to restore them. Copy your Concepts/ and Papers/ folders somewhere safe, then let me know when you're ready.
Wait for the user to confirm they've made a backup before proceeding.
Then ask:
Concepts/ and Papers/ folders)
Run Phase 1 (PDF extraction) on the new paper(s) only, writing to <workspace>/_extracted/ and <workspace>/_paper_data/ as usual.
Read the existing vault: scan all .md files in Concepts/ and collect their note titles as the existing concept list.
From the new paper's extracted data, collect its concepts. Then classify each one:
Present this classification to the user before making any changes:
Here's what I found in the new paper(s):
**Concepts already in your vault** (will be expanded with new sentences):
- biodiversity (matches existing note)
- species richness (matches existing note)
- climate change (matches existing note)
**New concepts not yet in your vault** (new notes will be created):
- thermal tolerance
- microhabitat heterogeneity
**Possible overlaps — your call:**
- "range contraction" ↔ existing "range shift" — treat as same or separate?
Shall I proceed with this plan, or would you like to adjust anything?
Do not modify any files until the user confirms. They may want to merge, rename, or skip certain concepts.
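The classification step can be roughed out as below. The exact-match rule (case-insensitive note title) follows from how existing concepts are collected; the overlap heuristic, which flags any new concept that shares a word with an existing title, is purely an assumption chosen for simplicity and will over-flag. Function names are hypothetical.

```python
from pathlib import Path


def existing_concepts(vault: str) -> list[str]:
    """Collect note titles (file stems) from the vault's Concepts/ folder."""
    return sorted(path.stem for path in (Path(vault) / "Concepts").glob("*.md"))


def classify_concepts(new_concepts: list[str], existing: list[str]) -> dict:
    """Sort a new paper's concepts into existing, new, and possible overlaps."""
    by_lower = {name.lower(): name for name in existing}
    result = {"existing": [], "new": [], "possible_overlap": []}
    for concept in new_concepts:
        key = concept.strip().lower()
        if key in by_lower:
            result["existing"].append(by_lower[key])
            continue
        words = set(key.split())
        # Heuristic (assumption): a shared word suggests a possible overlap.
        similar = [name for name in existing if words & set(name.lower().split())]
        if similar:
            result["possible_overlap"].append((concept, similar))
        else:
            result["new"].append(concept)
    return result
```

Because the overlap bucket is only a prompt for the user's decision, a generous heuristic costs little: false positives are resolved in the confirmation step.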
For each existing concept that the new paper addresses, open the concept note from Concepts/<concept>.md and add new atomic sentences from the paper.
Rules for updating:
For any concepts classified as new, generate a full concept note following the same template and rules as Phase 3. Cross-link these new notes to existing concepts where relevant — both inline in atomic sentences and in the Related Concepts section.
Also check the reverse: for any existing concept that is meaningfully related to a new concept, open that existing note and add the new concept to its Related Concepts section.
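The reverse-link update can be made idempotent so that re-running an expansion does not duplicate entries. A minimal sketch, assuming the section heading is exactly `## Related Concepts` and that entries use the template's `- [[concept]] — <relation>` form; `add_related_concept` is a hypothetical helper.

```python
def add_related_concept(note_text: str, concept: str, relation: str) -> str:
    """Append a Related Concepts entry unless the concept is already listed."""
    entry = f"- [[{concept}]] — {relation}"  # same separator as the note template
    lines = note_text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "## Related Concepts"),
        None,
    )
    if start is None:
        # No such section yet: create it at the end of the note.
        return note_text.rstrip("\n") + f"\n## Related Concepts\n{entry}\n"

    # The section runs until the next second-level heading or the end of the note.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    needle = f"[[{concept.lower()}]]"
    if any(needle in line.lower() for line in lines[start + 1:end]):
        return note_text  # already linked; leave the note untouched

    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1  # place the entry before trailing blank lines
    lines.insert(insert_at, entry)
    return "\n".join(lines) + ("\n" if note_text.endswith("\n") else "")
```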
Create a paper stub for the new paper in Papers/<CitationKey>.md following the Phase 4 template.
When all edits are complete, give the user a clear summary:
✅ Expansion complete.
**New paper added:** Smith 2024 — "Full title here"
**Existing concept notes updated (N):**
- [[biodiversity]] — 3 new sentences added (Results, Drivers sections)
- [[species richness]] — 2 new sentences added (Regional Patterns section)
- [[climate change]] — 1 new sentence added; also added new related concept [[thermal tolerance]]
**New concept notes created (N):**
- [[thermal tolerance]] — 4 sentences from Smith 2024
- [[microhabitat heterogeneity]] — 3 sentences from Smith 2024
**Anything worth noting:**
Use that last section to flag anything that stands out — for example:
Keep this section brief and specific — only mention things that genuinely jump out, not generic observations.