Skill

wiki-source-document-ingest

Ingests text-based documents like certificates, contracts, or employment letters into a wiki's sources layer as structured markdown files and integrates facts into wiki pages.

Markdown

Obsidian

documentation

npx claudepluginhub thedavidweng/skills

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

agents/openai.yaml

SKILL.md

Stats

Stars: 7

Forks: 0

Last Commit: Apr 23, 2026

Source Document Ingestion

Take a raw document (certificate, contract, income proof, employment letter) provided as text and systematically integrate it into the wiki as both a source document and structured wiki data.

When to Use

User provides a text document to be preserved in the wiki
Document contains verifiable facts about a person, place, or organization
Facts should become part of the canonical wiki knowledge base
Document is valuable enough to archive in sources/ (not just referenced externally)

When NOT to Use

Document lives externally and should only be linked by path (use vault-cross-reference)
Document is trivial or low-value (use sources/docs/OCR-Only/ or discard)
OCR extraction needed from scanned images (use identity-ocr or obsidian-ocr-cleanup)

Before Starting

Read the target document fully
Read the existing wiki page for the entity (if it exists) with read_file(path=...) using skill_view
Check sources/docs/readme.md to understand the Archive vs OCR-Only policy
Scan for existing similar source documents to follow naming conventions

Step-by-Step Process

Step 1: Decide Source Location

Consult the sources/[category]/ structure. Common locations:

| Document Type | Location | Rationale |
| --- | --- | --- |
| Identity documents (ID, passport, visa) | `sources/identity/` | Follows the existing pattern |
| Employment/income certificates | `sources/docs/Archive/` | High-value official docs, keep forever |
| Academic transcripts | `sources/identity/` or `sources/docs/Archive/` | Identity-adjacent |
| Contracts/agreements | `sources/docs/Archive/` | Legal documents |
| Low-value OCR docs | `sources/docs/OCR-Only/` | Text kept searchable; the document itself is not worth archiving |

Naming: [slug]-[descriptor].md, for example zhang-san-income-cert.md
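The location table and naming rule above could be expressed as a small helper. This is only a sketch: `LOCATIONS`, `slugify`, and `source_path` are hypothetical names, the dictionary keys are illustrative, and academic transcripts are left out because the table allows two locations for them.

```python
import re

# Hypothetical mapping that mirrors the table above; the keys are illustrative.
LOCATIONS = {
    "identity": "sources/identity/",
    "employment": "sources/docs/Archive/",
    "income": "sources/docs/Archive/",
    "contract": "sources/docs/Archive/",
    "ocr-only": "sources/docs/OCR-Only/",
}


def slugify(text: str) -> str:
    """Lowercase the text and join runs of ASCII letters/digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def source_path(doc_type: str, name: str, descriptor: str) -> str:
    """Build a path of the form <location><slug>-<descriptor>.md."""
    return f"{LOCATIONS[doc_type]}{slugify(name)}-{slugify(descriptor)}.md"


print(source_path("income", "Zhang San", "income cert"))
# sources/docs/Archive/zhang-san-income-cert.md
```

Names written in non-ASCII scripts would need a transliteration step first, which this sketch does not attempt.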

Step 2: Create Source Markdown Document

Create the source document with proper frontmatter:

---
layer: source
kind: document
created: YYYY-MM-DD  # Date on document, not today
updated: YYYY-MM-DD  # Today's date
source_type: manual    # or 'ocr', 'import'
source_uri: "Document title - Person name"
tags: [category1, category2]
---

Body:

Original document text (preserve exactly)
Metadata section with wikilinks to related wiki pages
Structured extracted facts (bullet points)

# Document Title

Full original text...

## Metadata

- **Person**: [[person-slug|Person Name]]
- **Organization**: [[org-slug|Org Name]]
- **Document type**: Type
- **Key fact**: Value (with units/dates)
- **Date issued**: YYYY-MM-DD
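As a rough illustration of how the frontmatter and body templates fit together, the sketch below assembles a source document as a string. The function name `render_source_doc` and its parameters are assumptions made for this example, `source_type` is fixed to `manual`, and the values in the usage lines are placeholders rather than real data.

```python
from datetime import date


def render_source_doc(title, original_text, issued, tags, metadata, today=None):
    """Return the full markdown text of a source document.

    `issued` is the date printed on the document (the `created` field);
    `today` fills the `updated` field and defaults to the current date.
    """
    today = today or date.today()
    frontmatter = [
        "---",
        "layer: source",
        "kind: document",
        f"created: {issued.isoformat()}",
        f"updated: {today.isoformat()}",
        "source_type: manual",
        f'source_uri: "{title}"',
        f"tags: [{', '.join(tags)}]",
        "---",
    ]
    # One bullet per metadata field, in the "- **Field**: Value" style.
    meta_lines = [f"- **{key}**: {value}" for key, value in metadata.items()]
    body = [f"# {title}", "", original_text, "", "## Metadata", "", *meta_lines]
    return "\n".join(frontmatter + [""] + body) + "\n"


# Placeholder values for illustration only.
doc = render_source_doc(
    title="Income Certificate - Zhang San",
    original_text="Full original text...",
    issued=date(2024, 1, 15),
    tags=["income", "employment"],
    metadata={"Person": "[[zhang-san|Zhang San]]", "Date issued": "2024-01-15"},
)
print(doc)
```

Keeping the original text as a single untouched string is deliberate: the template asks for it to be preserved exactly.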

Step 3: Update Person/Entity Page

3.1 Add to `source_notes` frontmatter

Add the new source path to the source_notes array (create if missing):

source_notes:
  - "sources/previous/doc.md"
  - "sources/docs/Archive/new-doc.md"
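The "create if missing" behaviour can be sketched as follows. This assumes the page frontmatter has already been parsed into a Python dict (parsing and writing the YAML back are out of scope here), and `add_source_note` is a hypothetical helper name.

```python
def add_source_note(frontmatter: dict, source_path: str) -> dict:
    """Return a copy of the frontmatter with source_path in source_notes.

    Creates the list if it is missing and skips paths already present,
    so running the step twice does not add a duplicate entry.
    """
    updated = dict(frontmatter)
    notes = list(updated.get("source_notes") or [])
    if source_path not in notes:
        notes.append(source_path)
    updated["source_notes"] = notes
    return updated


page = {"source_notes": ["sources/previous/doc.md"]}
page = add_source_note(page, "sources/docs/Archive/new-doc.md")
print(page["source_notes"])
# ['sources/previous/doc.md', 'sources/docs/Archive/new-doc.md']
```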

3.2 Add structured data section

Add a new section (## Employment, ## Income, ## Education, etc.) after existing sections. Format:

## Section Name

Brief narrative sentence.

- **Field name**: Value with [[wikilinks]] (bold for fields)
- **Another field**: Value
- **Source**: Link to source document inline

3.3 Clean up content

Check for:

Duplicate information already present elsewhere on the page
Orphaned leftover text from previous edits
Inconsistent formatting vs existing sections
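Part of the duplicate check can be automated. The sketch below, with the hypothetical helper `duplicate_lines`, only flags lines that repeat verbatim; paraphrased duplicates and orphaned fragments still need a manual read of the page.

```python
from collections import Counter


def duplicate_lines(page_text: str) -> list[str]:
    """Return non-empty lines that occur more than once, in first-seen order."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    counts = Counter(lines)
    seen = []
    for line in lines:
        if counts[line] > 1 and line not in seen:
            seen.append(line)
    return seen


page = (
    "## Employment\n"
    "- **Employer**: [[acme|Acme]]\n"
    "\n"
    "## Income\n"
    "- **Employer**: [[acme|Acme]]\n"
)
print(duplicate_lines(page))
# ['- **Employer**: [[acme|Acme]]']
```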

Step 4: Validate

Run quick checks:

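from hermes_tools import search_files
# Expect at least one match for the new source under sources/
assert search_files(pattern="new-doc.md", path="sources/").get("total_count", 0) > 0
# Expect no match for it under wiki/people/
assert search_files(pattern="new-doc.md", path="wiki/people/").get("total_count", 0) == 0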

Verify the person page renders cleanly with all wikilinks resolved.
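One way to approximate the wikilink check without rendering the page is to extract link targets and compare them against the set of known page slugs. This is a sketch under assumptions: `unresolved_wikilinks` is a hypothetical helper, the caller supplies `known_slugs`, and the pattern covers only the `[[target]]`, `[[target|alias]]`, and `[[target#heading]]` forms.

```python
import re

# Captures the target before any "#heading" or "|alias" part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def unresolved_wikilinks(page_text: str, known_slugs: set[str]) -> list[str]:
    """Return link targets in page_text that are not in known_slugs."""
    targets = [m.group(1).strip() for m in WIKILINK.finditer(page_text)]
    return sorted({t for t in targets if t not in known_slugs})


text = "Works at [[acme-corp|Acme]] with [[zhang-san]] and [[missing-page#History]]."
print(unresolved_wikilinks(text, {"acme-corp", "zhang-san"}))
# ['missing-page']
```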

Pitfalls Learned

Source vs Reference: Ingesting (copying into sources/) is NOT the same as referencing (linking to an external path). Ingest when the user provides document text and expects it to be archived in the wiki.
Archive vs OCR-Only: Use Archive/ for documents that must be provable/re-checkable later. Use OCR-Only/ for low-value content where only the text matters.
Don't break existing content: When inserting new sections, verify that you are not truncating or orphaning existing sentences. Use read_file with context first.
Duplicate data: Avoid repeating bio details already in the page intro. The Employment section should be concise structured data, not a retelling of the whole story.
Date formatting: Use YYYY-MM-DD for dates in metadata. Narrative dates can be Chinese YYYY年M月D日.
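The two date styles can be produced from a single `datetime.date` value. A minimal sketch, with hypothetical helper names and a placeholder date:

```python
from datetime import date


def metadata_date(d: date) -> str:
    """ISO form (YYYY-MM-DD) for frontmatter and metadata fields."""
    return d.isoformat()


def narrative_date_zh(d: date) -> str:
    """Chinese narrative form (YYYY年M月D日), without zero padding."""
    return f"{d.year}年{d.month}月{d.day}日"


issued = date(2024, 3, 5)  # placeholder date for illustration
print(metadata_date(issued))      # 2024-03-05
print(narrative_date_zh(issued))  # 2024年3月5日
```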
