Skill

pdf-processing

From nx

Use when PDF documents need to be indexed into nx store for semantic search

npx claudepluginhub hellblazer/nexus --plugin nx

Tool Access

This skill uses the workspace's default tool permissions.



Stats

Parent Repo Stars: 1

Parent Repo Forks: 0

Last Commit: Apr 3, 2026


PDF Processing Skill

When This Skill Activates

- When PDFs need to be indexed for semantic search
- When importing technical documentation or research papers
- When adding documents to the knowledge base
- When the user wants to query PDF content semantically

Quick Path (Single PDF — No Agent Needed)

For a single PDF, run directly without spawning an agent:

# Index a PDF into T3 (docs__ collection)
nx index pdf /path/to/file.pdf --corpus {corpus-name} --monitor

# Index into knowledge__ collection (for reference material)
nx index pdf /path/to/file.pdf --collection knowledge__{name} --monitor

# Dry-run preview (extract and embed locally without storing)
nx index pdf /path/to/file.pdf --corpus {corpus-name} --dry-run

# Force re-index (bypass staleness check)
nx index pdf /path/to/file.pdf --corpus {corpus-name} --force --monitor

# Verify indexing — use search tool: query="representative query", corpus="docs__{corpus-name}", limit=3

Corpus naming: Use the author-year-short-title pattern. The --corpus flag prepends docs__ automatically, so do NOT include the prefix yourself.
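As an illustration of the naming rule, the sketch below builds a corpus name from an author, a year, and a short title. `corpus_name` is a hypothetical helper, not part of nx, and the lowercase-and-hyphen normalization is an assumption: the convention above only fixes the order of the fields. The helper also drops a leading docs__ if one was typed by mistake.

```shell
# Hypothetical helper (not part of nx): build an author-year-short-title name.
corpus_name() {
  # Join all arguments with spaces, then drop an accidental docs__ prefix,
  # since --corpus adds that prefix on its own.
  raw="$*"
  raw=${raw#docs__}
  # Lowercase, then turn every run of non-alphanumeric characters into one hyphen.
  name=$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
  # Trim a leading or trailing hyphen left over from punctuation.
  name=${name#-}
  name=${name%-}
  printf '%s\n' "$name"
}
```

The result can be passed straight to the flag, for example `nx index pdf /path/to/file.pdf --corpus "$(corpus_name Knuth 1997 TAOCP)" --monitor`.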

Agent Invocation (Batch/Complex Scenarios)

Delegates to the pdf-chromadb-processor agent (haiku) for:

- Multiple PDFs needing batch processing
- PDFs from URLs (download + index)
- Complex corpus organization decisions

## Relay: pdf-chromadb-processor

**Task**: [what needs to be done]
**Bead**: [ID] or 'none'

### Input Artifacts
- Files: [PDF paths or URLs]

### Deliverable
Indexed PDF content in T3 via `nx index pdf`, with chunk counts and searchability verified

### Quality Criteria
- [ ] All PDFs indexed via `nx index pdf`
- [ ] Content searchable via search tool
- [ ] Processing report with chunk counts

For full relay structure and optional fields, see RELAY_TEMPLATE.md.
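To sanity-check a batch before handing it to the agent, it can help to preview the per-file commands the Quality Criteria expect. The sketch below only prints them and runs nothing. `plan_batch` is a hypothetical helper, and it assumes each PDF file name is already in author-year-short-title form so the corpus name can be taken from the file name; the agent itself may organize corpora differently.

```shell
# Hypothetical helper (not part of nx): print one `nx index pdf` command per PDF.
# Assumes each file is named <author-year-short-title>.pdf.
plan_batch() {
  for pdf in "$@"; do
    # Derive the corpus name from the file name, without directory or extension.
    corpus=$(basename "$pdf" .pdf)
    printf 'nx index pdf %s --corpus %s --monitor\n' "$pdf" "$corpus"
  done
}
```

Calling `plan_batch papers/*.pdf` lists the planned commands for review. Paths containing spaces would need quoting before the printed lines are run.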

Success Criteria

- All PDFs indexed via nx index pdf (not manual store_put tool)
- Corpus names follow author-year-short-title convention
- Sample queries return relevant results via search tool
- No sandbox permission issues (all extraction runs inside nx index pdf native pipeline)

Agent-Specific PRODUCE

Outputs generated by the pdf-chromadb-processor agent:

- T3 knowledge: Indexed PDF content via nx index pdf pipeline (Docling extraction, context-safe chunking, Voyage embeddings, atomic storage)
- T2 memory: Indexing status log via memory_put tool: content="content", project="{project}", title="pdf-index-log.md", ttl="30d"
- T1 scratch: Working notes during processing via scratch tool: action="put", content="Processing: {filename} - {N} chunks indexed", tags="pdf-processing"

Session Scratch (T1): Agent uses scratch tool for ephemeral working notes during the session. Flagged items auto-promote to T2 at session end.