```
npx claudepluginhub nicholasglazer/gnosis-mcp
```
One skill that covers every way to get content into gnosis-mcp. Routes
based on $ARGUMENTS:
- `ingest <path>` → local files (default entry point)
- `git <repo>` → git-history ingest
- `crawl <url>` → web-crawl ingest
- `reingest` → full reset + re-ingest from the default path
- `prune <path>` → delete DB chunks whose source is gone

## ingest <path> — local files

Default entry point. Handles `.md`, `.txt`, `.ipynb`, `.toml`, `.csv`, and `.json` (plus `.rst` / `.pdf` if those extras are installed).
```
gnosis-mcp ingest ./docs --embed
```

`--embed` runs the bundled ONNX embedder (requires the `[embeddings]` extra). Without it you get keyword-only search, usually enough for dev-doc corpora (see bench-experiments); but `--embed` costs nothing on a first ingest and enables hybrid search later if you want it. Use `--force` to re-ingest regardless.

The chunk size defaults to 2000 characters (~600 tokens), the peak of the Feb 2026 sweep on a real dev-docs corpus. Override per-ingest or globally:
```
# This invocation only
GNOSIS_MCP_CHUNK_SIZE=1500 gnosis-mcp ingest ./docs --embed

# Persistent (put in shell profile)
export GNOSIS_MCP_CHUNK_SIZE=3000   # long-form blogs / ADRs
```
If you're unsure, run /gnosis:tune to sweep sizes against your own
golden queries.
Files moved, deleted, or renamed? Pick one:
```
# Safest: re-ingest + drop chunks for files that no longer exist
gnosis-mcp ingest ./docs --embed --prune

# Nuclear: drop everything first, then re-ingest
gnosis-mcp ingest ./docs --embed --wipe

# Preview what prune would delete
gnosis-mcp prune ./docs --dry-run
```
By default `--prune` leaves crawled URLs alone (since those don't correspond to local files). Add `--include-crawled` if you want those gone too.
## ingest git <repo> — git commit history

Indexes each file's commit history as a searchable markdown document, letting your agent answer "why does this code exist" queries.
```
gnosis-mcp ingest-git /path/to/repo --since 6m --embed
```
Common flags:

| Flag | Effect |
|---|---|
| `--since 6m` / `--since 2025-01-01` | Window of commits to include |
| `--until 2026-03-01` | Upper bound |
| `--author "alice@"` | Filter by author name or email substring |
| `--max-commits-per-file 20` | Default 10; most recent wins |
| `--include "src/**"` | Glob filter on touched files |
| `--exclude "*.lock,package.json"` | Skip noisy files |
| `--include-merges` | Merge commits are excluded by default |
Each indexed doc's `file_path` is `git-history/<original-path>.md`. Cross-file co-edits generate `git_co_change` edges; source-file references get `git_ref`. Query them via `mcp__gnosis__search_git_history` (or filter `mcp__gnosis__get_related` by `relation_type=git_co_change`).
Re-run ingest-git whenever your history grows past the window you
already indexed.
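The idea behind `git_co_change` edges can be sketched in a few lines: count file pairs that appear in the same commit. This is a hypothetical illustration, not gnosis-mcp's implementation, and it assumes commits have already been reduced to lists of touched files:

```python
from collections import Counter
from itertools import combinations

def co_change_edges(commits: list[list[str]], min_count: int = 1) -> dict[tuple[str, str], int]:
    """Count how often each file pair is touched by the same commit."""
    edges = Counter()
    for files in commits:
        # Sort so each pair has one canonical orientation
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return {pair: n for pair, n in edges.items() if n >= min_count}

history = [
    ["src/db.py", "src/api.py"],
    ["src/db.py", "src/api.py", "README.md"],
    ["README.md"],
]
edges = co_change_edges(history, min_count=2)
# → {("src/api.py", "src/db.py"): 2}
```

Pairs that repeatedly change together are exactly the "if you touch X, also look at Y" relationships an agent can surface.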
## ingest crawl <url> — web crawl

Indexes a documentation website. Requires the `[web]` extra (`pip install 'gnosis-mcp[web]'`).
```
# Preferred — discover URLs from sitemap.xml
gnosis-mcp crawl https://docs.stripe.com --sitemap --embed

# No sitemap? BFS link crawl, one hop deep
gnosis-mcp crawl https://docs.example.com --max-depth 1 --embed

# Subset only
gnosis-mcp crawl https://docs.example.com --sitemap \
  --include "/docs/api/**" --exclude "*.pdf"

# Preview, don't fetch
gnosis-mcp crawl https://docs.example.com --dry-run
```
Other flags:

| Flag | Effect |
|---|---|
| `--max-pages 5000` | Safety cap |
| `--force` | Ignore the ETag / Last-Modified / hash cache |
Behaviour:

- `robots.txt` is honoured; a same-host redirect on `/robots.txt` is treated as disallow (prevents spoofing).
- Crawl cache at `~/.local/share/gnosis-mcp/crawl-cache.json` — subsequent crawls issue conditional GETs and skip unchanged pages.
- Page extraction runs under a timeout (`GNOSIS_MCP_CRAWL_EXTRACT_TIMEOUT_S`).

Vendor docs strategy: crawl them once, commit the indexed SQLite to version control, and you have offline, searchable vendor docs alongside your private docs. No Context7 subscription required.
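The `--include` / `--exclude` patterns above read like shell globs. A sketch of that style of filtering, assuming `fnmatch` semantics (gnosis-mcp's exact matching rules may differ):

```python
from fnmatch import fnmatch

def should_crawl(path: str, include: list[str], exclude: list[str]) -> bool:
    """Keep a URL path if it matches some include glob and no exclude glob.

    An empty include list means "include everything".
    """
    if include and not any(fnmatch(path, pat) for pat in include):
        return False
    return not any(fnmatch(path, pat) for pat in exclude)

paths = ["/docs/api/charges", "/docs/guide/intro", "/docs/api/spec.pdf"]
kept = [p for p in paths if should_crawl(p, include=["/docs/api/*"], exclude=["*.pdf"])]
# → ["/docs/api/charges"]
```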
## ingest reingest — full reset

Drop everything, reinitialise, reindex. Use when:

- the embedding model changed (`GNOSIS_MCP_EMBED_MODEL` or `_EMBED_DIM`) — old vectors are now incompatible
- the schema may be stale after an upgrade (`init-db` is idempotent, so rerunning is safe)

```
gnosis-mcp init-db                        # ensure schema is current
gnosis-mcp ingest ./docs --embed --wipe   # delete everything + reingest
gnosis-mcp stats                          # confirm
```
## ingest prune <path> — dead-chunk cleanup

Standalone prune, independent of re-ingest.
```
# What would go
gnosis-mcp prune ./docs --dry-run

# Delete chunks for files no longer on disk under ./docs
gnosis-mcp prune ./docs

# Also drop crawled URLs (normally spared)
gnosis-mcp prune ./docs --include-crawled
```
Safer than `--wipe` because it only deletes rows whose original `file_path` resolves to a local file under the given root AND is now missing from disk. Crawled URLs, git-history docs (`git-history/*`), and any path outside the root are untouched unless you explicitly opt in.
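The rule above can be expressed as a small decision function. This is a sketch of the documented semantics, not gnosis-mcp's actual code:

```python
from pathlib import Path

def should_prune(file_path: str, root: Path, include_crawled: bool = False) -> bool:
    """Delete a chunk only if its source was a local file under root that is now gone."""
    if file_path.startswith(("http://", "https://")):
        return include_crawled            # crawled URLs are spared by default
    if file_path.startswith("git-history/"):
        return False                      # git-history docs are never pruned here
    p = Path(file_path).resolve()
    try:
        p.relative_to(root.resolve())     # raises ValueError if outside the root
    except ValueError:
        return False
    return not p.exists()
```

The "resolve under root, then check existence" order matters: a path outside the root is never touched, even if it no longer exists.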
Skip the manual re-run loop — the server can watch a folder and re-ingest on file changes.

```
gnosis-mcp serve --watch ./docs --transport streamable-http --rest
```

Mtime polling + debounce: works on every OS, no fsnotify dependency. Ideal for docs-as-code repos where you push a doc and want it searchable by your editor within a few seconds.
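A minimal sketch of mtime polling with a debounce window (illustrative only; the real watcher's scheduling and error handling will differ):

```python
import time
from pathlib import Path

def snapshot(root: Path) -> dict[Path, float]:
    """Map every file under root to its modification time."""
    return {p: p.stat().st_mtime for p in root.rglob("*") if p.is_file()}

def changed_paths(before: dict[Path, float], after: dict[Path, float]) -> set[Path]:
    """Files added, removed, or touched between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}

def watch(root: Path, on_change, poll_s: float = 1.0, debounce_s: float = 2.0) -> None:
    """Poll mtimes; fire on_change once per quiet period (debounce)."""
    last = snapshot(root)
    pending_since = None
    while True:
        time.sleep(poll_s)
        now = snapshot(root)
        if changed_paths(last, now):
            pending_since = time.monotonic()  # reset the quiet-period timer
            last = now
        elif pending_since and time.monotonic() - pending_since >= debounce_s:
            on_change()                       # e.g. re-run the ingest
            pending_since = None
```

Debouncing is why a burst of saves triggers one re-ingest instead of many: the callback only fires after the folder has been quiet for the debounce window.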
Always run `gnosis-mcp stats` (or `/gnosis:status stats`) after a big ingest to confirm document and chunk counts, and embedding coverage if you passed `--embed`:

```
$ gnosis-mcp stats
Documents: 558
Chunks: 1,742
Embeddings: 1,742 / 1,742 (100.0 %)
Last access log entry: 2026-04-18 07:12 UTC
Backend: sqlite
```
Related:

- `/gnosis:tune` — chunk-size sweep on your own corpus
- `/gnosis:status` — connectivity + DB health
- `/gnosis:search` — query the index you just populated
- `GNOSIS_MCP_*` env var