From copilot-cli-toolkit
Manages four-tier memory for AI agents: Tier 1 semantic search (Serena+Forgetful), Tier 2 episodic session replay, Tier 3 causal graphs. Enables memory-first checks before code changes per Chesterton's Fence.
Install: `npx claudepluginhub rjmurillo/ai-agents`

This skill uses the workspace's default tool permissions.
Unified memory operations across four tiers for AI agents.
```bash
# Check system health
python3 .claude/skills/memory/scripts/test_memory_health.py

# Search memory (Tier 1)
python3 .claude/skills/memory/scripts/search_memory.py "git hooks"

# Extract episode from session (Tier 2)
python3 .claude/skills/memory/scripts/extract_session_episode.py ".agents/sessions/2026-01-01-session-126.json"

# Update causal graph (Tier 3)
python3 .claude/skills/memory/scripts/update_causal_graph.py
```
| Scenario | Use Memory Router? | Alternative |
|---|---|---|
| Script needs memory | Yes | - |
| Agent needs deep context | No | context-retrieval agent |
| Human at CLI | No | /memory-search command |
| Cross-project semantic search | No | Forgetful MCP directly |
See context-retrieval agent for complete decision tree.
Core Insight: Memory-first architecture implements Chesterton's Fence principle for AI agents.
"Do not remove a fence until you know why it was put up" - G.K. Chesterton
Translation for agents: Do not change code/architecture/protocol until you search memory for why it exists.
Without memory search (removing fence without investigation):
With memory search (Chesterton's Fence investigation):
```bash
search_memory.py "validation logic edge case"
```

When you encounter something you want to change:
| Change Type | Memory Search Required |
|---|---|
| Remove ADR constraint | search_memory.py "[constraint name]" |
| Bypass protocol | search_memory.py "[protocol name] why" |
| Delete >100 lines | search_memory.py "[component] purpose" |
| Refactor complex code | search_memory.py "[component] edge case" |
| Change workflow | search_memory.py "[workflow] rationale" |
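The mapping above can be sketched as a small helper that builds the memory query for a given change type. The function and the mapping keys are illustrative, not part of the skill's scripts:

```python
# Illustrative mapping from change type to the query the table prescribes.
CHANGE_TYPE_QUERIES = {
    "remove_adr_constraint": "{name}",
    "bypass_protocol": "{name} why",
    "delete_large_block": "{name} purpose",
    "refactor_complex_code": "{name} edge case",
    "change_workflow": "{name} rationale",
}

def fence_query(change_type: str, name: str) -> str:
    """Return the search_memory.py query to run before making this change."""
    template = CHANGE_TYPE_QUERIES.get(change_type)
    if template is None:
        raise ValueError(f"unknown change type: {change_type}")
    return template.format(name=name)

print(fence_query("bypass_protocol", "session-protocol"))
# session-protocol why
```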
Tier 1 (Semantic): Facts, patterns, constraints
Tier 2 (Episodic): Past session outcomes
Tier 3 (Causal): Decision patterns
Before changing existing systems, you MUST:
```bash
python3 .claude/skills/memory/scripts/search_memory.py "[topic]"
```

Why BLOCKING: <50% compliance with "check memory first" guidance. Making it BLOCKING achieves 100% compliance (same pattern as session protocol gates).
Verification: Session logs must show memory search BEFORE decisions, not after.
See .agents/analysis/chestertons-fence.md for:
Key takeaway: Memory IS your investigation tool. It contains the "why" that Chesterton's Fence requires you to discover.
This skill implements progressive disclosure principles from Anthropic and claude-mem.ai research through three-layer architecture.
| Layer | Tool | Cost | When to Use |
|---|---|---|---|
| Index | search_memory.py | ~100-500 tokens | Always start here |
| Details | mcp__serena__read_memory | ~500-10K tokens | After index confirms relevance |
| Deep Dive | Follow cross-references | Variable | For complete understanding |
```bash
# Count tokens before retrieval (informed ROI decision)
python3 .claude/skills/memory/scripts/count_memory_tokens.py .serena/memories/memory-index.md
# Output: memory-index.md: 2,450 tokens
```
Caching: SHA-256 hash-based cache in .serena/.token-cache.json provides 10-100x speedup on repeated queries.
See: scripts/README-count-tokens.md
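The hash-based caching scheme can be sketched as follows. The whitespace tokenizer stands in for tiktoken, and the helper names are assumptions for illustration; only the SHA-256-keyed cache file mirrors the `.serena/.token-cache.json` convention:

```python
import hashlib
import json
from pathlib import Path

def count_tokens(text: str) -> int:
    # Stand-in for tiktoken: the real script uses a proper encoder.
    return len(text.split())

def cached_token_count(path: Path, cache_path: Path) -> int:
    """Return a file's token count, reusing the cache when content is unchanged."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    cache = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text())
    entry = cache.get(str(path), {})
    if entry.get("sha256") == digest:
        return entry["tokens"]  # cache hit: skip tokenization entirely
    tokens = count_tokens(data.decode("utf-8"))
    cache[str(path)] = {"sha256": digest, "tokens": tokens}
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache))
    return tokens
```

Because the cache key is the content hash, edits to a memory file invalidate its entry automatically; unchanged files never get re-tokenized.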
```bash
# Pre-commit hook: enforce atomicity thresholds
python3 .claude/skills/memory/scripts/test_memory_size.py .serena/memories --pattern "*.md"
# Exit 0 (pass) or 1 (fail) with decomposition recommendations
```
Thresholds (from memory-size-001-decomposition-thresholds):
See: scripts/README-test-size.md
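The enforcement logic amounts to a per-file token gate. In this sketch the 2,000-token limit is a placeholder (the real thresholds live in memory-size-001), and the token counter is injected so any tokenizer can be used:

```python
from pathlib import Path

MAX_TOKENS = 2000  # placeholder; see memory-size-001 for the actual thresholds

def check_atomicity(memory_dir: str, count_tokens) -> list:
    """Return decomposition recommendations for files over the threshold."""
    violations = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        tokens = count_tokens(path.read_text())
        if tokens > MAX_TOKENS:
            violations.append(f"{path.name}: {tokens} tokens -> split into atomic memories")
    return violations  # empty list maps to exit 0, non-empty to exit 1
```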
Progressive Disclosure: List names → Read details → Deep dive on cross-references. Prevents loading 9,500 tokens when only 1,200 are relevant (87% waste reduction).
Just-in-Time Retrieval: Serena-first with Forgetful augmentation. High precision through lexical search before expensive semantic operations.
Size Enforcement: Atomic memories prevent token waste. One retrievable concept per file.
For full analysis, see: .agents/analysis/context-engineering.md
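Under stated assumptions (stub callables in place of the real index search and `read_memory` tools), the layered flow looks like this, together with the waste-reduction arithmetic from the figures above:

```python
def progressive_retrieve(query, search_index, read_memory, token_budget=2000):
    """Layer 1: cheap index search; Layer 2: read only confirmed-relevant
    memories within budget. Layer 3 (cross-references) is left to the caller."""
    hits = search_index(query)                   # ~100-500 tokens
    loaded, spent = [], 0
    for hit in hits:
        if spent + hit["tokens"] > token_budget:
            break                                # stop before blowing the budget
        loaded.append(read_memory(hit["name"]))  # ~500-10K tokens each
        spent += hit["tokens"]
    return loaded

# Worked example from the text: loading everything costs 9,500 tokens,
# but only 1,200 are relevant.
waste_reduction = 1 - 1200 / 9500
print(f"{waste_reduction:.0%}")
# 87%
```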
Use this skill when the user says:
- "search memory" for semantic search across tiers
- "check memory health" for system status
- "extract episode from session" for session replay
- "update causal graph" for pattern tracking
- "count memory tokens" for budget analysis

| Operation | Script | Key Parameters |
|---|---|---|
| Search facts/patterns | search_memory.py | query, --lexical-only, --max-results |
| Extract episode | extract_session_episode.py | session_log_path, --output-path |
| Update patterns | update_causal_graph.py | --episode-path, --dry-run |
| Health check | test_memory_health.py | --format (json/table) |
| Benchmark performance | measure_memory_performance.py | --serena-only, --format |
| Convert index links | convert_index_table_links.py | --memory-path, --dry-run |
| Cross-reference | invoke_memory_cross_reference.py | --memory-path, --threshold |
| Improve graph density | improve_memory_graph_density.py | --memory-path, --dry-run |
```
What do you need?
│
├─► Current facts, patterns, or rules?
│   └─► TIER 1: search_memory.py
│
├─► What happened in a specific session?
│   └─► TIER 2: Episode JSON in .agents/memory/episodes/
│
├─► Need to store new knowledge?
│   ├─ From completed session? → extract_session_episode.py
│   └─ Factual knowledge? → using-forgetful-memory skill
│
├─► Update decision patterns?
│   └─► TIER 3: update_causal_graph.py
│
└─► Not sure which tier?
    └─► Start with TIER 1 (search_memory.py), escalate if insufficient
```
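The tree above can be expressed as a routing function; the `need` labels here are illustrative shorthand, not an API defined by the skill:

```python
def route(need: str) -> str:
    """Map an information need to the tier/script per the decision tree."""
    routes = {
        "facts": "TIER 1: search_memory.py",
        "session_history": "TIER 2: .agents/memory/episodes/",
        "store_from_session": "extract_session_episode.py",
        "store_fact": "using-forgetful-memory skill",
        "decision_patterns": "TIER 3: update_causal_graph.py",
    }
    # Unknown needs default to Tier 1, escalating only if results are insufficient.
    return routes.get(need, "TIER 1: search_memory.py")
```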
| Anti-Pattern | Do This Instead |
|---|---|
| Skipping memory search | Always search before multi-step reasoning |
| Tier confusion | Follow decision tree explicitly |
| Forgetful dependency | Use --lexical-only fallback |
| Stale causal graph | Run update_causal_graph.py after extractions |
| Incomplete extraction | Only extract from COMPLETED sessions |
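The `--lexical-only` fallback from the table can be sketched as follows; the two search callables wrapping the Forgetful and Serena backends are hypothetical stand-ins:

```python
def search_with_fallback(query, semantic_search, lexical_search):
    """Prefer semantic search (Forgetful); degrade to lexical-only when unavailable."""
    try:
        results = semantic_search(query)
        if results:
            return {"results": results, "mode": "semantic"}
    except ConnectionError:
        pass  # Forgetful MCP unreachable: do not fail the whole search
    return {"results": lexical_search(query), "mode": "lexical-only"}
```

Reporting the mode alongside the results lets callers log when they ran degraded, which matches the verification requirement that "no results" be recorded rather than silently dropped.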
| Document | Content |
|---|---|
| quick-start.md | Common workflows |
| skill-reference.md | Detailed script parameters |
| tier-selection-guide.md | When to use each tier |
| memory-router.md | ADR-037 router architecture |
| reflexion-memory.md | ADR-038 episode/causal schemas |
| troubleshooting.md | Error recovery |
| benchmarking.md | Performance targets |
| agent-integration.md | Multi-agent patterns |
| zettelkasten-memory-agents.md | Atomic memory principle and auto-linking |
| codebase-knowledge-graph.md | GitNexus pattern for structural context via MCP |
| Data | Location |
|---|---|
| Serena memories | .serena/memories/*.md |
| Forgetful memories | HTTP MCP (vector DB) |
| Episodes | .agents/memory/episodes/*.json |
| Causal graph | .agents/memory/causality/causal-graph.json |
| Operation | Verification |
|---|---|
| Search completed | Result count > 0 OR logged "no results" |
| Episode extracted | JSON file in .agents/memory/episodes/ |
| Graph updated | Stats show nodes/edges added |
| Health check | All tiers show "available: true" |
```bash
python3 .claude/skills/memory/scripts/test_memory_health.py --format table
```
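A minimal verification pass over two of the table's checks might look like this; the episodes path follows the data-location table above, and the health-report shape (per-tier `available` flags) is assumed for illustration:

```python
from pathlib import Path

def verify_episode_extracted(episodes_dir: str = ".agents/memory/episodes") -> bool:
    """Episode extracted <=> at least one JSON file exists in the episodes dir."""
    return any(Path(episodes_dir).glob("*.json"))

def verify_health(report: dict) -> bool:
    """Health check passes when every tier reports available: true."""
    return all(tier.get("available") is True for tier in report.values())
```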
1. Determine the memory tier and run the appropriate script.
2. Verify results are non-empty and relevant to the query context.
3. Return structured results to the caller with source attribution.
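The three steps above can be sketched end to end; `run_script` is injected so the routing logic stays testable, and all names here are illustrative rather than the actual ADR-037 router API:

```python
def route_memory_request(query: str, tier: int, run_script) -> dict:
    """Pick the script for the tier, run it, verify, return with attribution."""
    scripts = {
        1: "search_memory.py",
        2: "extract_session_episode.py",
        3: "update_causal_graph.py",
    }
    script = scripts[tier]
    results = run_script(script, query)
    if not results:
        # Verification rule: empty results are reported, never silently dropped.
        return {"source": script, "query": query, "results": [], "note": "no results"}
    return {"source": script, "query": query, "results": results}
```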
| Script | Purpose | Exit Codes |
|---|---|---|
| search_memory.py | Tier 1 semantic search across Serena and Forgetful | 0=success, 1=error |
| count_memory_tokens.py | Token counting with tiktoken caching | 0=success, 1=error |
| test_memory_size.py | Memory atomicity validation | 0=pass, 1=violations |
| test_memory_health.py | System health dashboard | 0=success |
| extract_session_episode.py | Episode extraction from session logs | 0=success, 1=error |
| update_causal_graph.py | Causal graph pattern tracking | 0=success, 1=error |
| measure_memory_performance.py | Serena/Forgetful benchmark | 0=success, 1=error |
| Skill | When to Use Instead |
|---|---|
| using-forgetful-memory | Deep Forgetful operations (create, update, link) |
| curating-memories | Memory maintenance (obsolete, deduplicate) |
| exploring-knowledge-graph | Multi-hop graph traversal |