Skill

obsidian-mineru

Extracts and ingests document content (PDF, Office) into an Obsidian vault using MinerU. Supports direct extraction or a Zotero-linked pipeline for literature notes.

documentation

automation

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/obsidian-vault:obsidian-mineru

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this skill when the request is about full-document parsing and source-note ingestion rather than simple literature metadata import.

Supporting Files

agents/openai.yaml

SKILL.md

116 lines · ~1.6k tokens

Stats

Language: Python

Stars: 13

Maintenance: Excellent

Last Commit: May 30, 2026


Obsidian MinerU

Use this skill when the request is about full-document parsing and source-note ingestion rather than simple literature metadata import. Prefer it for full-text extraction, PDF parsing, and source-note import.

Choose the Path

If the request is Zotero-linked literature ingestion, prefer obsidian_pipeline_ingest_item(parse_with_mineru=true) or obsidian_pipeline_parse_with_mineru.
If MinerU output already exists under the pipeline layout, use obsidian_pipeline_rename_mineru_images to normalize images and regenerate images-index.md.
For direct, non-Zotero extraction, run the local MinerU CLI and then import or link the generated Markdown with obsidian_write_file.
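
The routing rules above can be sketched as a small decision function. This is only an illustration: `choose_path` and its two boolean flags are hypothetical names, and the precedence given to existing MinerU output is an assumption. The returned strings are the tool and CLI names listed above.

```python
def choose_path(zotero_linked: bool, mineru_output_exists: bool) -> str:
    """Return the tool to reach for first, following the routing rules above.

    Hypothetical helper: the function and its flags are illustrative only.
    """
    if mineru_output_exists:
        # Output already sits in the pipeline layout: only normalize images.
        return "obsidian_pipeline_rename_mineru_images"
    if zotero_linked:
        # Zotero-linked ingestion goes through the pipeline tools.
        return "obsidian_pipeline_ingest_item"
    # Direct extraction: run the local CLI, then import with obsidian_write_file.
    return "mineru-open-api"
```

For example, `choose_path(zotero_linked=False, mineru_output_exists=False)` selects the local CLI route.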

Direct Extraction Workflow

Check obsidian_pipeline_doctor before pipeline extraction, or mineru-open-api --version before direct CLI extraction.
Parse copied Zotero PDFs into attachments/mineru/<zoteroKey>/paper.md.
Rename extracted images to English semantic filenames such as fig-01-process-flow-diagram.png.
Generate attachments/mineru/<zoteroKey>/images-index.md.
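
A minimal sketch of the naming and layout conventions above, assuming the semantic filename is derived by slugging a caption. The helper names `semantic_image_name` and `mineru_paths` are hypothetical, and the plugin's real slugging rules may differ.

```python
import re
from pathlib import PurePosixPath


def semantic_image_name(index: int, caption: str, ext: str = ".png") -> str:
    """Build a name like fig-01-process-flow-diagram.png from a caption."""
    # Keep ASCII letters and digits only; every other run becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", caption.lower()).strip("-")
    return f"fig-{index:02d}-{slug}{ext}"


def mineru_paths(zotero_key: str) -> dict[str, str]:
    """Return the vault-relative paths used by the layout described above."""
    base = PurePosixPath("attachments/mineru") / zotero_key
    return {
        "markdown": str(base / "paper.md"),
        "images_index": str(base / "images-index.md"),
    }
```

The key passed to `mineru_paths` stands in for `<zoteroKey>`; any concrete value used when trying it out is a placeholder.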

Zotero-Linked Workflow

Import the literature note with obsidian_pipeline_ingest_item.
Pass parse_with_mineru=true, or later call obsidian_pipeline_parse_with_mineru(zotero_key=...).
Expect the literature note to link to the copied PDF, Zotero PDF URI, MinerU Markdown, and image index while preserving user reading work.

Output Expectations

MinerU assets are machine-generated and may be overwritten on re-parse.
Literature notes are stable user workspaces; preserve custom YAML, Reading Notes, and AI Summary. To generate an empty ## AI Summary after parsing, pass write_ai_summary=true to obsidian_pipeline_parse_with_mineru.
The plugin does not generate AI summaries, wiki pages, graphs, or reviews from MinerU output.
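
One way to check the preservation rule above is to compare the protected sections of a literature note before and after a re-parse. The sketch below is an assumption-laden illustration, not the plugin's own logic: `section_body` and `user_work_preserved` are hypothetical helpers, and custom YAML is not checked here.

```python
import re
from typing import Optional

PROTECTED_HEADINGS = ("## Reading Notes", "## AI Summary")


def section_body(note: str, heading: str) -> Optional[str]:
    """Return the text under a level-2 heading, or None if it is absent."""
    # Capture lazily up to the next level-2 heading or the end of the note.
    pattern = rf"^{re.escape(heading)}[ \t]*\n(.*?)(?=^## |\Z)"
    match = re.search(pattern, note, flags=re.MULTILINE | re.DOTALL)
    return match.group(1).strip() if match else None


def user_work_preserved(before: str, after: str) -> bool:
    """True if every non-empty protected section in `before` survives in `after`."""
    for heading in PROTECTED_HEADINGS:
        old = section_body(before, heading)
        # Empty or missing sections carry no user work, so they are skipped.
        if old and section_body(after, heading) != old:
            return False
    return True
```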

Troubleshooting

If task creation succeeds but Markdown download fails, check network routes for mineru.net, mineru.oss-cn-shanghai.aliyuncs.com, cdn-mineru.openxlab.org.cn, and *.openxlab.org.cn.
Treat this skill as optional integration logic. If MinerU is unavailable, fall back to importing existing Markdown or a vault PDF attachment instead of blocking the whole workflow.
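
The route check above can be approximated with a plain TCP probe. This is a sketch under assumptions: port 443 is assumed because the hosts are presumably reached over HTTPS, `check_hosts` is a hypothetical helper, and the wildcard entry is left out because a pattern cannot be resolved directly.

```python
import socket
from typing import Callable

# Hosts named in the troubleshooting note; *.openxlab.org.cn is omitted
# because a wildcard pattern is not a resolvable hostname.
MINERU_HOSTS = (
    "mineru.net",
    "mineru.oss-cn-shanghai.aliyuncs.com",
    "cdn-mineru.openxlab.org.cn",
)


def check_hosts(hosts=MINERU_HOSTS, port: int = 443, timeout: float = 5.0,
                connect: Callable = socket.create_connection) -> dict:
    """Map each host to True if a TCP connection opens, else False."""
    results = {}
    for host in hosts:
        try:
            # Port 443 is an assumption (HTTPS); adjust if your route differs.
            with connect((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # DNS failures, refusals, and timeouts are all OSError subclasses.
            results[host] = False
    return results
```

A successful TCP connection only shows that the route is open; it does not prove that the Markdown download itself will succeed.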

Direct Extraction (without Zotero)

To parse a PDF directly with MinerU without going through the Zotero pipeline:

Confirm MinerU is available: Bash → mineru-open-api --version (or check MINERU_CLI_COMMAND env var).
Run MinerU on a PDF path:

```
mineru-open-api --files "C:\path\to\paper.pdf" --output-dir "C:\vault\mineru-output" --method auto
```

The output directory will contain:
- paper.md — extracted Markdown
- images/ — extracted figures
Use obsidian_read_file on the generated .md to review.
Use obsidian_write_file to copy the content into the vault's literature folder.
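
If the CLI call is driven from a script rather than typed by hand, it might look like the sketch below. It reuses only the flags shown in the command above; `build_mineru_command` and `run_mineru` are hypothetical helpers, and treating MINERU_CLI_COMMAND as the executable name or path is an assumption.

```python
import os
import subprocess
from pathlib import Path


def build_mineru_command(pdf: Path, output_dir: Path) -> list[str]:
    """Assemble the CLI call shown above, honoring MINERU_CLI_COMMAND if set."""
    # Assumption: MINERU_CLI_COMMAND holds the executable name or path.
    cli = os.environ.get("MINERU_CLI_COMMAND", "mineru-open-api")
    return [cli, "--files", str(pdf), "--output-dir", str(output_dir),
            "--method", "auto"]


def run_mineru(pdf: Path, output_dir: Path) -> Path:
    """Run MinerU and return the path where the Markdown is expected."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_mineru_command(pdf, output_dir), check=True)
    # Assumption: the Markdown lands directly in output_dir as <stem>.md.
    return output_dir / f"{pdf.stem}.md"
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain spaces.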

Extract and Ingest

To extract a PDF and immediately create a literature note:

Run MinerU as above.
Read the generated Markdown.
Create the literature note with obsidian_write_file, including frontmatter and links to extracted images.
Use obsidian_pipeline_rename_mineru_images (MCP) to rename extracted images to semantic English slugs.
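
The note content passed to obsidian_write_file could be assembled as in the following sketch. The frontmatter keys (`title`, `source`, `parser`) and the `## Figures` heading are illustrative assumptions, not a schema defined by the plugin, and `build_literature_note` is a hypothetical helper.

```python
import json


def build_literature_note(title: str, source_pdf: str, body: str,
                          images: list[str]) -> str:
    """Compose a note with YAML frontmatter, the body, and image embeds."""
    # json.dumps yields a double-quoted string, which is also a valid YAML
    # scalar, so quotes and backslashes in Windows paths are escaped safely.
    frontmatter = "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(source_pdf)}",
        "parser: mineru",
        "---",
    ])
    # Obsidian embeds use the ![[path]] form; one embed per extracted image.
    embeds = "\n".join(f"![[{image}]]" for image in images)
    return f"{frontmatter}\n\n{body.strip()}\n\n## Figures\n\n{embeds}\n"
```

If the images are renamed afterwards, the embeds need to point at the new semantic filenames.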

Batch Folder Extraction

To process all PDFs in a folder:

```
# Parse each PDF into its own output folder, named after the file.
Get-ChildItem "C:\zotero-exports" -Filter "*.pdf" | ForEach-Object {
    $out = "C:\vault\mineru-batch\$($_.BaseName)"
    mineru-open-api --files $_.FullName --output-dir $out --method auto
}
```

Then ingest each output folder individually following the "Extract and Ingest" workflow above.
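
For environments without PowerShell, the same batch loop might be written in Python. This is a sketch under assumptions: `plan_batch` and `run_batch` are hypothetical helpers, the CLI flags are the ones used in the PowerShell loop, and, unlike `-Filter` on Windows, the glob may be case-sensitive depending on the platform.

```python
import subprocess
from pathlib import Path


def plan_batch(pdf_dir: Path, out_root: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF in pdf_dir with its own output folder under out_root."""
    # sorted() keeps the order stable across runs; the glob is non-recursive.
    return [(pdf, out_root / pdf.stem) for pdf in sorted(pdf_dir.glob("*.pdf"))]


def run_batch(pdf_dir: Path, out_root: Path, cli: str = "mineru-open-api"):
    """Run the CLI once per PDF and yield (pdf, return code) pairs."""
    for pdf, out_dir in plan_batch(pdf_dir, out_root):
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=False so one failing PDF does not abort the rest of the batch.
        result = subprocess.run(
            [cli, "--files", str(pdf), "--output-dir", str(out_dir),
             "--method", "auto"],
            check=False,
        )
        yield pdf, result.returncode
```

Collecting the return codes makes it easy to list the PDFs that need a retry before ingesting the rest.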

Zotero PDF Text Extraction

To extract text from a Zotero-managed PDF without full MinerU parsing:

Get the PDF path from obsidian_zotero_list_pdf_attachments (MCP).
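Save this pypdf script as extract_text.py, then run it from Bash:

```
# extract_text.py: print the first 5000 characters of a PDF's text layer.
import sys
import pypdf

reader = pypdf.PdfReader(sys.argv[1])  # PDF path is the first CLI argument
# Scanned pages may yield no text, so fall back to an empty string.
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text[:5000])
```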

Run: python extract_text.py "C:\Zotero\storage\KEY\paper.pdf"

Figure & Table Analysis

Use when the user asks a specific question about a figure, chart, or table in a parsed paper.

Read attachments/mineru/<zoteroKey>/images-index.md with obsidian_read_file. The index lists every figure with its semantic slug filename and the original caption context, e.g.:
```
- fig-01-process-flow-diagram.png (was: image-a.png)
  Caption context: "Figure 1 Process flow diagram showing…"
```
Identify which figure matches the user's question from the slug name and caption.
Run obsidian_search using the slug filename (e.g. fig-01-process-flow-diagram) as query to locate the surrounding paragraph in paper.md. The search snippet will include the figure's Markdown image tag and adjacent text.
Read that section of paper.md with obsidian_read_file if the search snippet is insufficient.
Answer using the extracted caption and surrounding text only. Do not attempt to decode image binary data — the image files are not readable as text.

Typical budget: 2–3 tool calls (read index → search → answer, or read index → read section → answer).
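
To make the index lookup concrete, the sketch below parses entries in the format of the example above and filters them by keyword. The format is inferred from that single example, so the regular expression is an assumption, and `parse_images_index` and `find_figures` are hypothetical helpers.

```python
import re

# One entry: "- <slug> (was: <original>)" followed by an indented caption line.
ENTRY = re.compile(
    r'^- (?P<slug>\S+) \(was: (?P<original>[^)]+)\)\n'
    r'\s+Caption context: "(?P<caption>[^"]*)"',
    re.MULTILINE,
)


def parse_images_index(text: str) -> list[dict]:
    """Return one dict per figure with slug, original name, and caption."""
    return [match.groupdict() for match in ENTRY.finditer(text)]


def find_figures(entries: list[dict], keyword: str) -> list[dict]:
    """Keep entries whose slug or caption mentions the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in entries
        if needle in entry["slug"].lower() or needle in entry["caption"].lower()
    ]
```

The slug of a matching entry is then the query to pass to obsidian_search.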

Eval Scenarios

Trigger: "Parse Zotero item ITEM1 with MinerU and summarize it." Expected: use obsidian_pipeline_parse_with_mineru(write_ai_summary=true). Must preserve literature-note YAML, ## Reading Notes, and any existing non-empty ## AI Summary.
Trigger: "Rename the figures for this parsed paper." Expected: call obsidian_pipeline_rename_mineru_images, then report renamed files and cleanup candidates. Must treat MinerU assets as machine-generated and avoid editing the literature note except tool-managed links.
Trigger: "What does Figure 2 show?" Expected: read images-index.md, locate the semantic slug/caption, then search/read nearby paper.md context. Must answer from extracted text rather than image bytes.
