`npx claudepluginhub robertogogoni/cortex-claude`

Claude's Cognitive Layer - Zero-cost dual-model memory with MCP Sampling, HyDE search, write gates, and bi-temporal knowledge
Zero-cost, self-healing memory OS for Claude Code with MCP Sampling, HyDE search, write gates, and bi-temporal knowledge

Claude Code is powerful, but it forgets everything between sessions. Cortex solves this with a dual-model cognitive layer:
| Problem | How Others Solve It | How Cortex Solves It |
|---|---|---|
| Claude forgets everything between sessions | CLAUDE.md files (manual) | Auto-extraction at session end + auto-injection at session start |
| Context lost when window compresses | Nothing (data lost forever) | PreCompact hook saves critical context before compression |
| Memory search is keyword-only | Basic text matching | Hybrid vector search (HNSW + BM25 + Reciprocal Rank Fusion) |
| No reasoning about memories | Store and retrieve | Dual-model: Haiku queries + Sonnet reflects/infers/learns |
| Memory grows without limit | Manual cleanup | LADS framework: auto-consolidation, decay, tier promotion |
| No cost visibility | Hidden API costs | Per-operation costs shown: query ~$0.001, reflect ~$0.01 |
```mermaid
graph TB
    subgraph "Claude Code Session"
        CC[Claude Code] -->|SessionStart| SS[Session Start Hook]
        CC -->|SessionEnd| SE[Session End Hook]
        CC -->|PreCompact| PC[PreCompact Hook]
        CC -->|MCP Tools| MCP[Cortex MCP Server]
    end
    subgraph "Dual-Model Engine"
        MCP -->|Fast queries| H[Haiku Worker<br/>~$0.001/call]
        MCP -->|Deep reasoning| S[Sonnet Thinker<br/>~$0.01/call]
    end
    subgraph "Memory Layers"
        H --> WM[Working Memory<br/>< 24h, max 50]
        H --> ST[Short-Term<br/>1-7 days, max 200]
        H --> LT[Long-Term<br/>Permanent, quality-filtered]
        S --> INS[Insights & Learnings]
    end
    subgraph "Search Engine"
        H --> VS[Vector Search<br/>HNSW + BM25]
        VS --> RRF[Reciprocal Rank Fusion]
    end
    SE -->|Extract| EE[Extraction Engine]
    EE -->|Store| WM
    WM -->|Promote| ST
    ST -->|Promote| LT
```
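
The tier limits in the diagram (working memory under 24 hours with at most 50 entries, short-term 1-7 days with at most 200, long-term permanent and quality-filtered) can be illustrated with a minimal Python sketch. This is not Cortex's actual code: the `Memory` class, the function names, and the `QUALITY_THRESHOLD` value are hypothetical, and dropping low-quality memories after seven days is an assumption about how the quality filter might behave.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Age and capacity limits taken from the architecture diagram.
WORKING_MAX_AGE_H = 24
SHORT_TERM_MAX_AGE_H = 7 * 24
TIER_CAPACITY = {"working": 50, "short_term": 200}

# Assumption: the diagram says "quality-filtered" but gives no threshold.
QUALITY_THRESHOLD = 0.7


@dataclass
class Memory:
    text: str
    age_hours: float
    quality: float  # hypothetical score in the range 0..1


def target_tier(m: Memory) -> Optional[str]:
    """Return the tier a memory belongs in, or None if it should be dropped."""
    if m.age_hours < WORKING_MAX_AGE_H:
        return "working"
    if m.age_hours <= SHORT_TERM_MAX_AGE_H:
        return "short_term"
    # Older than 7 days: only memories that pass the quality filter persist.
    return "long_term" if m.quality >= QUALITY_THRESHOLD else None


def enforce_capacity(memories: List[Memory], capacity: int) -> Tuple[List[Memory], List[Memory]]:
    """Keep the `capacity` newest memories; return (kept, overflow)."""
    ordered = sorted(memories, key=lambda m: m.age_hours)
    return ordered[:capacity], ordered[capacity:]
```

In this sketch, a tier that exceeds its cap hands its oldest entries back as overflow, which a caller could promote or discard.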
Full semantic search with local embeddings - no external API calls for search. New in v3.0: HyDE (Hypothetical Document Embeddings) query expansion for improved recall.
| Feature | Specification |
|---|---|
| Embedding Model | all-MiniLM-L6-v2 (384 dimensions) |
| Index Type | HNSW (Hierarchical Navigable Small World) |
| Text Search | BM25 via SQLite FTS5 |
| Ranking | Reciprocal Rank Fusion (RRF) |
| Vector Count | 672 vectors indexed |
| Query Speed | ~21ms warm, ~500ms cold |
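
HyDE searches with the embedding of a hypothetical answer rather than the embedding of the raw query. The Python sketch below only illustrates that idea and is not Cortex's implementation: `embed` is a toy bag-of-words stand-in for a real model such as all-MiniLM-L6-v2, `generate_hypothetical_answer` is a stub for the LLM call, and every name and document string is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_hypothetical_answer(query):
    """Stub for the LLM call that drafts a plausible answer to the query."""
    canned = {
        "how do we keep users logged in?":
            "we issue a jwt access token and rotate the refresh token on each use",
    }
    # Fall back to the raw query when no hypothetical answer is available.
    return canned.get(query.lower(), query)


def hyde_search(query, documents):
    """Rank documents by similarity to the hypothetical answer, not the raw query."""
    probe = embed(generate_hypothetical_answer(query))
    return sorted(documents, key=lambda d: cosine(probe, embed(d)), reverse=True)


docs = [
    "refresh token rotation for jwt sessions",
    "css grid layout notes",
]
top = hyde_search("How do we keep users logged in?", docs)[0]
```

Here the raw query shares no terms with either document, while the hypothetical answer overlaps with the JWT note, which is the recall improvement HyDE aims for.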
Cortex combines multiple search strategies for best results:
```
     User Query: "JWT authentication patterns"
                         |
         +---------------+---------------+
         |                               |
         v                               v
    +----------+                   +------------+
    |   BM25   |                   |   Vector   |
    |  (FTS5)  |                   |   (HNSW)   |
    +----+-----+                   +-----+------+
         |                               |
         | keyword matches               | semantic similarity
         |                               |
         +---------------+---------------+
                         |
                         v
                +------------------+
                |    RRF Fusion    |
                |  (k=60 default)  |
                +--------+---------+
                         |
                         v
                  Ranked Results
```
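
The fusion step can be sketched in a few lines of Python. This is a generic Reciprocal Rank Fusion implementation using the `k=60` default shown in the diagram, not code taken from Cortex. The function name and the memory IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical memory IDs, not real Cortex data.
bm25_hits = ["mem-jwt-refresh", "mem-oauth-flow", "mem-session-cookie"]
vector_hits = ["mem-token-rotation", "mem-jwt-refresh", "mem-oauth-flow"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

Documents that appear in both lists accumulate two terms, so they outrank a document that tops only one list. In this example `mem-jwt-refresh` comes first even though the vector search ranked it second.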
| Operation | Cold Start | Warm Cache |
|---|---|---|
| Model load | 1,389ms | N/A (cached) |
| Embedding generation | ~200ms | ~21ms |
| HNSW search | ~50ms | ~10ms |
| Hybrid query | ~500ms | ~100ms |
Cortex works at two levels: