Guide for writing mcp-vector-search configuration files
Creates `.mcp-vector-search/config.edn` files for semantic document search. Use when users need to set up vector search indexing for documentation or code. Triggers on requests to configure mcp-vector-search or index project files.
/plugin marketplace add hugoduncan/mcp-vector-search
/plugin install config-guide@mcp-vector-search-plugins

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Comprehensive reference for writing .mcp-vector-search/config.edn configuration files.
mcp-vector-search indexes documents using semantic embeddings and provides a search tool via the Model Context Protocol. Configuration controls which files are indexed, how they're processed, and what metadata is extracted.
Create a configuration file at one of these locations (first found is used):
- .mcp-vector-search/config.edn (in project root)
- ~/.mcp-vector-search/config.edn (user home directory)

The server reads the configuration at startup and indexes all specified sources.
{:description "Custom search tool description" ; optional
:watch? true ; optional, enable file watching
:sources [{:path "/docs/**/*.md"
:name "Documentation" ; optional
:ingest :whole-document ; optional, defaults to :whole-document
:watch? true ; optional, overrides global :watch?
:custom-key "custom-value"}]} ; any additional keys become metadata
Top-level keys:
- :description - Custom description for the search tool (optional)
- :watch? - Enable automatic re-indexing when files change (optional, default: false)
- :sources - Array of source configurations (required)

Filesystem sources (:path key):
{:path "/docs/**/*.md"}
Classpath sources (:class-path key):
{:class-path "docs/**/*.md"} ; no leading /
Important: Sources must specify exactly one of :path or :class-path.
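As a sketch of the exactly-one rule, the fragment below pairs two valid sources with a commented-out invalid one. The paths are illustrative. How the server reacts to a source that sets both keys is not described here, so treat the rejected case as an assumption about validation rather than documented behavior.

```clojure
{:sources [{:path "/docs/**/*.md"}        ; valid: filesystem source only
           {:class-path "docs/**/*.md"}   ; valid: classpath source only
           ;; invalid: sets both keys, which violates the exactly-one rule
           ;; {:path "/docs/**/*.md" :class-path "docs/**/*.md"}
           ]}
```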
Single-level glob (*):
{:path "/docs/*.md"} ; matches /docs/README.md
; does NOT match /docs/api/guide.md
Recursive glob (**):
{:path "/docs/**/*.md"} ; matches /docs/README.md
; matches /docs/api/guide.md
Extract metadata from file paths using named regex groups:
{:path "/docs/(?<category>[^/]+)/*.md"}
Syntax: (?<name>pattern)
- name - Metadata key (converted to keyword)
- pattern - Java regular expression

For file /docs/api/functions.md:
{:category "api"}

Multiple captures example:
{:path "/(?<project>[^/]+)/(?<version>v\\d+)/(?<file>.+\\.clj)"}
For file /myapp/v1/core.clj:
{:project "myapp", :version "v1", :file "core.clj"}

;; All markdown files recursively
{:path "/docs/**/*.md"}
;; Single directory level
{:path "/docs/*.md"}
;; Capture directory name
{:path "/docs/(?<category>[^/]+)/*.md"}
;; Multiple captures
{:path "/(?<project>[^/]+)/(?<version>[^/]+)/**/*.clj"}
;; Literal file
{:path "/docs/README.md"}
;; Classpath resource (no leading /)
{:class-path "docs/**/*.md"}
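Because capture patterns are Java regular expressions, their behavior can be checked at a REPL before they go into the config. The sketch below is only an approximation: the glob part `*.md` is hand-translated to the regex `[^/]*\.md`, and the helper name `capture-category` is hypothetical. How mcp-vector-search itself combines globs and regexes is not shown here.

```clojure
;; Assumption: `*.md` is written by hand as `[^/]*\.md`; the server's own
;; glob-to-regex translation may differ.
(def path-re #"/docs/(?<category>[^/]+)/[^/]*\.md")

(defn capture-category
  "Returns {:category ...} when the whole path matches path-re, else nil."
  [path]
  (let [m (re-matcher path-re path)]
    (when (.matches m)
      {:category (.group m "category")})))

(capture-category "/docs/api/functions.md")    ;; => {:category "api"}
(capture-category "/docs/api/v2/functions.md") ;; => nil (extra directory level)
```

The second call returns nil because `[^/]+` and `[^/]*` cannot cross a `/`, which mirrors the single-level behavior of `*`.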
Ingest strategies control how documents are processed, embedded, and stored. Set via the :ingest key.
The :whole-document strategy (the default) embeds and stores the entire file content as a single segment.
{:sources [{:path "/docs/**/*.md"
:ingest :whole-document}]}
Use when: You want to search across complete documents and return full content.
The :namespace-doc strategy is for Clojure source files: it embeds only the namespace docstring but stores the full file content.
{:sources [{:path "/src/**/*.clj"
:ingest :namespace-doc}]}
Requirements:
- The source file must contain an ns form
- The namespace name is added to metadata as :namespace (e.g., {:namespace "my.app.core"})
Use when: You want to search Clojure namespaces by their documentation while still returning the complete source code.
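To make the embed/store split concrete, here is a hypothetical source file. The namespace name reuses the `my.app.core` example above, and the function is invented for illustration. Under the description given, only the ns docstring would be embedded, while the whole file would be stored and returned.

```clojure
(ns my.app.core
  "Core request-handling functions for my.app."   ; this docstring is what gets embedded
  (:require [clojure.string :as str]))

;; The rest of the file, including this function, is stored and returned
;; with search results but does not contribute to the embedding.
(defn normalize-name
  "Trims and lower-cases a name."
  [s]
  (str/lower-case (str/trim s)))
```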
The :file-path strategy embeds the full content but stores only the file path.
{:sources [{:path "/docs/**/*.md"
:ingest :file-path}]}
Use when: You need to minimize memory usage, since only the file path is stored rather than the content.
The :code-analysis strategy analyzes Clojure and Java source files using clj-kondo to extract code elements (vars, namespaces, classes, methods, fields, macros). It creates one searchable segment per code element.
{:sources [{:path "/src/**/*.clj"
:ingest :code-analysis}]}
Configuration options:
{:sources [{:path "/src/**/*.clj"
:ingest :code-analysis
:visibility :public-only ; :all (default) | :public-only
:element-types #{:var :macro}}]} ; optional filter
:visibility - Controls which elements to include:
- :all (default) - Include all elements regardless of visibility
- :public-only - Include only public elements
  - Clojure: excludes vars with ^:private or {:private true} metadata
  - Java: excludes members with private or protected access modifiers

:element-types (optional) - Set of element types to include:
- :var, :macro, :namespace, :class, :method, :field, :constructor
Segment metadata:
- :element-type - Type of code element (var, macro, namespace, class, method, field, constructor)
- :element-name - Qualified name (e.g., "my.ns/my-fn" or "com.example.MyClass.myMethod")
- :language - Source language (clojure or java)
- :namespace - Containing namespace (Clojure) or package (Java)
- :visibility - Access level (public, private, or protected)

Use when: You want to search code by documentation or API discovery, finding functions/classes/methods based on their purpose rather than file names.
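As an illustration of the segment metadata, consider a hypothetical var `my.ns/my-fn`. The metadata keys come from the list above; whether the values are stored as strings or keywords is an assumption here.

```clojure
(ns my.ns)

(defn my-fn
  "Returns the source definitions from a parsed config map."
  [config]
  (:sources config))

;; Plausible segment metadata for my-fn (value types are an assumption):
;; {:element-type "var"
;;  :element-name "my.ns/my-fn"
;;  :language     "clojure"
;;  :namespace    "my.ns"
;;  :visibility   "public"}
```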
Examples:
;; Search all code elements
{:sources [{:path "/src/**/*.clj"
:ingest :code-analysis}]}
;; Search only public API
{:sources [{:path "/src/**/*.clj"
:ingest :code-analysis
:visibility :public-only}]}
;; Search only vars and macros
{:sources [{:path "/src/**/*.clj"
:ingest :code-analysis
:element-types #{:var :macro}}]}
;; Java source code analysis
{:sources [{:path "/src/**/*.java"
:ingest :code-analysis
:visibility :public-only}]}
The :chunked strategy splits documents into smaller segments using LangChain4j's recursive text splitter. This enables better semantic search for large documents.
{:sources [{:path "/docs/**/*.md"
:ingest :chunked
:chunk-size 512
:chunk-overlap 100}]}
Configuration:
- :chunk-size - Maximum characters per chunk (default: 512)
- :chunk-overlap - Characters to overlap between chunks (default: 100)

Note: LangChain4j's recursive paragraph splitter prioritizes semantic boundaries (paragraph breaks) over exact overlap amounts. Adjacent chunks may have less overlap than specified if splitting at a paragraph boundary.
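For a rough sense of how these two settings interact, the sketch below estimates a chunk count under a simplified fixed-window model in which each chunk advances by chunk-size minus chunk-overlap characters. The function name `estimated-chunks` is hypothetical, and because the real splitter prefers paragraph boundaries, actual counts and overlaps can differ.

```clojure
;; Simplified fixed-window estimate, not the splitter's actual algorithm.
(defn estimated-chunks
  "Estimates the chunk count for a document of doc-length characters,
  assuming each chunk advances by (chunk-size - chunk-overlap) characters."
  [doc-length chunk-size chunk-overlap]
  (if (<= doc-length chunk-size)
    1
    (let [stride (- chunk-size chunk-overlap)]
      (long (Math/ceil (/ (double (- doc-length chunk-overlap)) stride))))))

(estimated-chunks 2000 512 100) ;; => 5
(estimated-chunks 400 512 100)  ;; => 1
```

With the defaults, a 2000-character document advances 412 characters per chunk, so windows start at 0, 412, 824, 1236, and 1648.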
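Characteristics:
- Chunks from the same file share a :doc-id, used for batch removal during updates
- Each chunk carries :chunk-index (position), :chunk-count (total chunks), and :chunk-offset (character offset)

Chunk sizing guidance: smaller chunks (for example 384 characters with an overlap of 75) suit fine-grained retrieval in technical docs; larger chunks (for example 1024 characters with an overlap of 200) keep broader context for narrative content.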
Use when: You have large documents and need precise fact-based retrieval where specific information may be buried in lengthy content.
Examples:
;; Fine-grained retrieval for technical docs
{:sources [{:path "/docs/**/*.md"
:ingest :chunked
:chunk-size 384
:chunk-overlap 75}]}
;; Broader context for narrative content
{:sources [{:path "/articles/**/*.md"
:ingest :chunked
:chunk-size 1024
:chunk-overlap 200}]}
;; Compare strategies for different content types
{:sources [
;; Small reference docs - whole document works well
{:path "/api-reference/**/*.md"
:ingest :whole-document}
;; Large guides - chunking improves precision
{:path "/guides/**/*.md"
:ingest :chunked
:chunk-size 512
:chunk-overlap 100}]}
Metadata comes from two sources:
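- Custom keys: any key in a source map other than :path, :class-path, :name, :ingest, and :watch?
- Path captures: named regex groups in the source path

{:sources [{:path "/docs/(?<category>[^/]+)/*.md"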
:project "my-project"
:type "documentation"}]}
For a file /docs/api/functions.md:
{:project "my-project", :type "documentation", :category "api"}

The :name key, if provided, is also added to metadata.
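A small sketch of how :name combines with custom keys and path captures. The resulting map assumes :name is copied into metadata unchanged, which the text implies but does not spell out.

```clojure
;; Source entry with a :name, a custom key, and a path capture
{:sources [{:path "/docs/(?<category>[^/]+)/*.md"
            :name "Documentation"
            :project "my-project"}]}

;; Expected metadata for /docs/api/functions.md (assuming :name is copied as-is):
;; {:name "Documentation", :project "my-project", :category "api"}
```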
System-added metadata:
- :doc-id - File path (used for watch updates/deletes)
- :file-id - File path
- :segment-id - Unique segment identifier

Strategy-specific metadata:
- :namespace-doc adds: :namespace
- :code-analysis adds: :element-type, :element-name, :language, :namespace, :visibility
- :chunked adds: :chunk-index, :chunk-count, :chunk-offset

Optional file watching system for automatic re-indexing when files change.
Configuration:
- Global :watch? true enables watching for all sources
- Per-source :watch? true/false overrides global setting
- Watching applies only to filesystem sources (:path), not classpath sources

{:watch? true ; enable globally
:sources [
{:path "/docs/**/*.md"} ; watched (global setting)
{:path "/src/**/*.clj"
:watch? false} ; not watched (override)
{:path "/notes/**/*.txt"
:watch? true}]} ; watched (explicit)
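Since watching applies only to filesystem sources, a mixed configuration would behave roughly as sketched below. The `lib-docs` classpath prefix is illustrative.

```clojure
{:watch? true
 :sources [{:path "/docs/**/*.md"}            ; watched: filesystem source
           {:class-path "lib-docs/**/*.md"}]} ; not watched: classpath source
```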
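Behavior:
- File modified: segments with the file's :doc-id are removed and the file is re-indexed
- File deleted: segments with the file's :doc-id are removed

{:sources [{:path "/Users/me/docs/**/*.md"}]}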
{:description "Project documentation and code search"
:sources [
{:path "/Users/me/project/docs/**/*.md"
:name "Documentation"
:type "docs"}
{:path "/Users/me/project/src/**/*.clj"
:ingest :namespace-doc
:name "Source Code"
:type "code"}]}
{:sources [{:path "/docs/(?<category>[^/]+)/*.md"
:project "myapp"
:type "documentation"}]}
{:sources [
{:path "/src/**/*.clj"
:ingest :code-analysis
:visibility :public-only
:element-types #{:var :macro}}
{:path "/src/**/*.java"
:ingest :code-analysis
:visibility :public-only}]}
{:sources [
{:path "/guides/**/*.md"
:ingest :chunked
:chunk-size 512
:chunk-overlap 100}]}
{:sources [
;; Filesystem documentation
{:path "/Users/me/docs/**/*.md"
:source "local"}
;; Bundled library documentation from classpath
{:class-path "lib-docs/**/*.md"
:source "library"}
;; Clojure source from classpath
{:class-path "my/app/**/*.clj"
:ingest :namespace-doc
:source "library-code"}]}
{:watch? true
:sources [
{:path "/Users/me/project/docs/**/*.md"}
{:path "/Users/me/project/src/**/*.clj"
:ingest :namespace-doc}]}
{:description "Comprehensive project search"
:watch? true
:sources [
;; API reference - small docs, keep whole
{:path "/docs/api/**/*.md"
:ingest :whole-document
:category "api-reference"}
;; User guides - large docs, chunk them
{:path "/docs/guides/**/*.md"
:ingest :chunked
:chunk-size 512
:chunk-overlap 100
:category "guides"}
;; Public API code
{:path "/src/(?<namespace>[^/]+)/**/*.clj"
:ingest :code-analysis
:visibility :public-only
:category "code"}
;; README files - whole document
{:path "/(?<project>[^/]+)/README.md"
:ingest :whole-document
:category "readme"}]}
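Path specifications:
- Filesystem paths (:path) are absolute (leading /)
- Classpath paths (:class-path) are relative (no leading /)

Ingest strategies: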
- Use :whole-document for most use cases
- Use :namespace-doc for Clojure codebases to search by documentation
- Use :code-analysis when you need fine-grained API discovery
- Use :chunked for large documents (>1000 chars)
- Use :file-path when you need to minimize memory usage

File watching:
- Enable :watch? true for development

Performance:
- The :file-path strategy significantly reduces memory usage