```
npx claudepluginhub nicholasglazer/gnosis-mcp
```
One skill that covers every way to get content into gnosis-mcp. Routes
based on $ARGUMENTS:
- `ingest <path>` → local files (default entry point)
- `git <repo>` → git-history ingest
- `crawl <url>` → web-crawl ingest
- `reingest` → full reset + re-ingest from the default path
- `prune <path>` → delete DB chunks whose source is gone

## ingest <path> — local files

Default entry point. Handles `.md`, `.txt`, `.ipynb`, `.toml`, `.csv`, and `.json` (plus `.rst` / `.pdf` if those extras are installed).
```
gnosis-mcp ingest ./docs --embed
```

`--embed` runs the bundled ONNX embedder (requires the `[embeddings]` extra). Without it you get keyword-only search, usually enough for dev-doc corpora (see bench-experiments); but `--embed` costs nothing on a first ingest and enables hybrid search later if you want it. Use `--force` to re-ingest regardless.

The chunk size defaults to 2000 characters (~600 tokens), the peak of the Feb 2026 sweep on a real dev-docs corpus. Override per-ingest or globally:
```
# This invocation only
GNOSIS_MCP_CHUNK_SIZE=1500 gnosis-mcp ingest ./docs --embed

# Persistent (put in shell profile)
export GNOSIS_MCP_CHUNK_SIZE=3000   # long-form blogs / ADRs
```
If you're unsure, run /gnosis:tune to sweep sizes against your own
golden queries.
Files moved, deleted, or renamed? Pick one:
```
# Safest: re-ingest + drop chunks for files that no longer exist
gnosis-mcp ingest ./docs --embed --prune

# Nuclear: drop everything first, then re-ingest
gnosis-mcp ingest ./docs --embed --wipe

# Preview what prune would delete
gnosis-mcp prune ./docs --dry-run
```
By default `--prune` leaves crawled URLs alone (since those don't correspond to local files). Add `--include-crawled` if you want those gone too.
## ingest git <repo> — git commit history

Indexes each file's commit history as a searchable markdown document, letting your agent answer "why does this code exist" queries.
```
gnosis-mcp ingest-git /path/to/repo --since 6m --embed
```
Common flags:

| Flag | Effect |
|---|---|
| `--since 6m` / `--since 2025-01-01` | Window of commits to include |
| `--until 2026-03-01` | Upper bound |
| `--author "alice@"` | Filter by author name or email substring |
| `--max-commits-per-file 20` | Default 10; most recent wins |
| `--include "src/**"` | Glob filter on touched files |
| `--exclude "*.lock,package.json"` | Skip noisy files |
| `--include-merges` | Merge commits are excluded by default |
Each indexed doc's `file_path` is `git-history/<original-path>.md`. Cross-file co-edits generate `git_co_change` edges; source-file references get `git_ref`. Query them via `mcp__gnosis__search_git_history` (or filter `mcp__gnosis__get_related` by `relation_type=git_co_change`).
Re-run ingest-git whenever your history grows past the window you
already indexed.
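The idea behind `git_co_change` edges can be sketched in a few lines: count file pairs that appear in the same commit. This is a hypothetical illustration, not gnosis-mcp's implementation, and it assumes commits have already been reduced to lists of touched files:

```python
from collections import Counter
from itertools import combinations

def co_change_edges(commits: list[list[str]], min_count: int = 1) -> dict[tuple[str, str], int]:
    """Count how often each file pair is touched by the same commit."""
    edges = Counter()
    for files in commits:
        # Sort so each pair has one canonical orientation
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return {pair: n for pair, n in edges.items() if n >= min_count}

history = [
    ["src/db.py", "src/api.py"],
    ["src/db.py", "src/api.py", "README.md"],
    ["README.md"],
]
edges = co_change_edges(history, min_count=2)
# → {("src/api.py", "src/db.py"): 2}
```

Pairs that repeatedly change together are exactly the "if you touch X, also look at Y" relationships an agent can surface.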
## ingest crawl <url> — web crawl

Indexes a documentation website. Requires the `[web]` extra (`pip install 'gnosis-mcp[web]'`).
```
# Preferred — discover URLs from sitemap.xml
gnosis-mcp crawl https://docs.stripe.com --sitemap --embed

# No sitemap? BFS link crawl, one hop deep
gnosis-mcp crawl https://docs.example.com --max-depth 1 --embed

# Subset only
gnosis-mcp crawl https://docs.example.com --sitemap \
  --include "/docs/api/**" --exclude "*.pdf"

# Preview, don't fetch
gnosis-mcp crawl https://docs.example.com --dry-run
```
Other flags:

| Flag | Effect |
|---|---|
| `--max-pages 5000` | Safety cap |
| `--force` | Ignore the ETag / Last-Modified / hash cache |
Behaviour:

- `robots.txt` is honoured; a same-host redirect on `/robots.txt` is treated as disallow (prevents spoofing).
- Crawl cache at `~/.local/share/gnosis-mcp/crawl-cache.json` — subsequent crawls issue conditional GETs and skip unchanged pages.
- Page extraction runs under a timeout (`GNOSIS_MCP_CRAWL_EXTRACT_TIMEOUT_S`).

Vendor docs strategy: crawl them once, commit the indexed SQLite to version control, and you have offline, searchable vendor docs alongside your private docs. No Context7 subscription required.
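The `--include` / `--exclude` patterns above read like shell globs. A sketch of that style of filtering, assuming `fnmatch` semantics (gnosis-mcp's exact matching rules may differ):

```python
from fnmatch import fnmatch

def should_crawl(path: str, include: list[str], exclude: list[str]) -> bool:
    """Keep a URL path if it matches some include glob and no exclude glob.

    An empty include list means "include everything".
    """
    if include and not any(fnmatch(path, pat) for pat in include):
        return False
    return not any(fnmatch(path, pat) for pat in exclude)

paths = ["/docs/api/charges", "/docs/guide/intro", "/docs/api/spec.pdf"]
kept = [p for p in paths if should_crawl(p, include=["/docs/api/*"], exclude=["*.pdf"])]
# → ["/docs/api/charges"]
```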
## ingest reingest — full reset

Drop everything, reinitialise, reindex. Use when:

- the embedding model changed (`GNOSIS_MCP_EMBED_MODEL` or `_EMBED_DIM`) — old vectors are now incompatible
- the schema may be stale after an upgrade (`init-db` is idempotent, so rerunning is safe)

```
gnosis-mcp init-db                        # ensure schema is current
gnosis-mcp ingest ./docs --embed --wipe   # delete everything + reingest
gnosis-mcp stats                          # confirm
```
## ingest prune <path> — dead-chunk cleanup

Standalone prune, independent of re-ingest.
```
# What would go
gnosis-mcp prune ./docs --dry-run

# Delete chunks for files no longer on disk under ./docs
gnosis-mcp prune ./docs

# Also drop crawled URLs (normally spared)
gnosis-mcp prune ./docs --include-crawled
```
Safer than `--wipe` because it only deletes rows whose original `file_path` resolves to a local file under the given root AND is now missing from disk. Crawled URLs, git-history docs (`git-history/*`), and any path outside the root are untouched unless you explicitly opt in.
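The rule above can be expressed as a small decision function. This is a sketch of the documented semantics, not gnosis-mcp's actual code:

```python
from pathlib import Path

def should_prune(file_path: str, root: Path, include_crawled: bool = False) -> bool:
    """Delete a chunk only if its source was a local file under root that is now gone."""
    if file_path.startswith(("http://", "https://")):
        return include_crawled            # crawled URLs are spared by default
    if file_path.startswith("git-history/"):
        return False                      # git-history docs are never pruned here
    p = Path(file_path).resolve()
    try:
        p.relative_to(root.resolve())     # raises ValueError if outside the root
    except ValueError:
        return False
    return not p.exists()
```

The "resolve under root, then check existence" order matters: a path outside the root is never touched, even if it no longer exists.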
Skip the manual re-run loop — the server can watch a folder and re-ingest on file changes.

```
gnosis-mcp serve --watch ./docs --transport streamable-http --rest
```

Mtime polling + debounce: works on every OS, no fsnotify dependency. Ideal for docs-as-code repos where you push a doc and want it searchable by your editor within a few seconds.
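A minimal sketch of mtime polling with a debounce window (illustrative only; the real watcher's scheduling and error handling will differ):

```python
import time
from pathlib import Path

def snapshot(root: Path) -> dict[Path, float]:
    """Map every file under root to its modification time."""
    return {p: p.stat().st_mtime for p in root.rglob("*") if p.is_file()}

def changed_paths(before: dict[Path, float], after: dict[Path, float]) -> set[Path]:
    """Files added, removed, or touched between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}

def watch(root: Path, on_change, poll_s: float = 1.0, debounce_s: float = 2.0) -> None:
    """Poll mtimes; fire on_change once per quiet period (debounce)."""
    last = snapshot(root)
    pending_since = None
    while True:
        time.sleep(poll_s)
        now = snapshot(root)
        if changed_paths(last, now):
            pending_since = time.monotonic()  # reset the quiet-period timer
            last = now
        elif pending_since and time.monotonic() - pending_since >= debounce_s:
            on_change()                       # e.g. re-run the ingest
            pending_since = None
```

Debouncing is why a burst of saves triggers one re-ingest instead of many: the callback only fires after the folder has been quiet for the debounce window.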
Always run `gnosis-mcp stats` (or `/gnosis:status stats`) after a big ingest to confirm document and chunk counts, and embedding coverage if you passed `--embed`:

```
$ gnosis-mcp stats
Documents: 558
Chunks: 1,742
Embeddings: 1,742 / 1,742 (100.0 %)
Last access log entry: 2026-04-18 07:12 UTC
Backend: sqlite
```
Related:

- `/gnosis:tune` — chunk-size sweep on your own corpus
- `/gnosis:status` — connectivity + DB health
- `/gnosis:search` — query the index you just populated
- `GNOSIS_MCP_*` env var