From magic-powers
Use when designing memory systems for AI agents — tiered memory architecture (in-context, session, long-term, episodic), context window management, memory compression, and retrieval strategies for persistent agent state.
```shell
npx claudepluginhub kienbui1995/magic-powers --plugin magic-powers
```

This skill uses the workspace's default tool permissions.
Agents forget everything between sessions by default. Building memory into agents requires deliberate architecture: choosing what to store, where, for how long, and how to retrieve it efficiently without polluting the context window.
Four tiers, each with different storage, latency, and persistence:
| Tier | Storage | Latency | Lifetime | Use case |
|---|---|---|---|---|
| In-context | Token window (4K-200K) | 0ms | Current session | Active task state, recent tool results, current conversation |
| Session | Redis / Postgres | 1-10ms | One conversation | Conversation history, user preferences in session, task progress |
| Long-term | Vector DB + key-value | 10-100ms | Persistent | User facts, learned patterns, past decisions |
| Episodic | DB + vector embeddings | 50-200ms | Persistent | Past task completions, examples, learned workflows |
```python
class TieredMemory:
    def __init__(self):
        self.in_context = []           # current messages
        self.session = SessionStore()  # Redis
        self.long_term = VectorDB()    # Pinecone/Weaviate/pgvector
        self.episodic = EpisodicDB()   # past task completions

    def recall(self, query: str, tiers=("session", "long_term")) -> list[Memory]:
        results = []
        if "session" in tiers:
            results.extend(self.session.get_relevant(query))
        if "long_term" in tiers:
            results.extend(self.long_term.search(query, top_k=5))
        if "episodic" in tiers:
            results.extend(self.episodic.find_similar_tasks(query, top_k=3))
        return deduplicate(results, key="content")
```
Retrieve memory before the agent starts working — not mid-task. Front-loading relevant memory prevents mid-loop context changes.
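One way to front-load memory is to fold recalled facts into the system message before the first model call. A minimal sketch, assuming recall has already produced a list of memory strings (the `build_initial_context` helper and message shape are illustrative, not part of any particular SDK):

```python
def build_initial_context(system_prompt: str, task: str,
                          memories: list[str]) -> list[dict]:
    """Fold recalled memories into the system message before the
    agent loop starts, so the context never changes mid-task."""
    primed = system_prompt
    if memories:
        memory_block = "\n".join(f"- {m}" for m in memories)
        primed += f"\n\nRelevant memory:\n{memory_block}"
    return [
        {"role": "system", "content": primed},
        {"role": "user", "content": task},
    ]

msgs = build_initial_context(
    "You are a helpful agent.",
    "Draft the Q3 report.",
    ["User prefers bullet-point summaries", "User works at Acme Corp"],
)
```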
The most common practical problem — context fills up in long conversations:
```python
def manage_context_window(messages: list, max_tokens: int = 6000) -> list:
    """Keep context within limits using priority-based pruning."""
    if len(messages) <= 7:
        return messages  # too short to prune
    # Always keep: system prompt + last 5 messages + current user message
    must_keep = [messages[0]] + messages[-6:]
    middle = messages[1:-6]
    current_tokens = count_tokens(must_keep)
    if current_tokens < max_tokens:
        # Re-add middle messages, newest first, until we approach the limit
        for msg in reversed(middle):
            msg_tokens = count_tokens([msg])
            if current_tokens + msg_tokens < max_tokens * 0.85:
                must_keep.insert(1, msg)
                current_tokens += msg_tokens
    else:
        # Compress: summarize the middle section instead of re-adding it
        if middle:
            summary = summarize(middle)
            must_keep.insert(1, {
                "role": "system",
                "content": f"[Summary of earlier conversation: {summary}]",
            })
    return must_keep
```
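The `count_tokens` helper above is assumed. In production you would use the model's real tokenizer (e.g. tiktoken for OpenAI models), but a character-based heuristic is enough to sketch the idea:

```python
def count_tokens(messages: list[dict]) -> int:
    """Rough heuristic: ~4 characters per token, plus a small
    per-message overhead for role and formatting tokens."""
    return sum(len(m.get("content", "")) // 4 + 4 for m in messages)
```

The heuristic overestimates slightly for code-heavy content and underestimates for non-English text; swap in the real tokenizer when the 15% safety margin above is not enough.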
Strategies by situation:
Not everything should be remembered. Use selective storage:
```python
def should_store_long_term(content: str, agent_output: str) -> bool:
    """Store only information that's useful across sessions."""
    store_triggers = [
        "user mentioned their name",
        "user stated a strong preference",
        "user corrected the agent",
        "user shared context about their role/company",
        "important decision was made",
        "user expressed frustration with agent behavior",
    ]
    # Use an LLM to classify against the trigger list
    return llm_classify(content, store_triggers)
```
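The `llm_classify` call is a stand-in for whatever completion API your stack provides. A hedged sketch that builds a yes/no classification prompt, with the model call injected as a parameter so the logic stays testable (`complete` is an assumed interface, not a real library function):

```python
from typing import Callable

def llm_classify(content: str, triggers: list[str],
                 complete: Callable[[str], str]) -> bool:
    """Ask the model a yes/no question: does `content` match any
    storage trigger? `complete` is your LLM completion call."""
    trigger_list = "\n".join(f"- {t}" for t in triggers)
    prompt = (
        "Does the following message contain information worth storing "
        "long-term? Criteria:\n"
        f"{trigger_list}\n\n"
        f"Message: {content}\n"
        "Answer with exactly YES or NO."
    )
    return complete(prompt).strip().upper().startswith("YES")
```

Pinning the answer format to YES/NO keeps parsing trivial; a structured-output or logprob-based classifier is more robust if your provider supports it.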
```python
def store_user_fact(fact: str, user_id: str, confidence: float):
    long_term_db.upsert({
        "user_id": user_id,
        "fact": fact,
        "embedding": embed(fact),
        "confidence": confidence,
        "source": "agent_extraction",
        "created_at": now(),
        "last_accessed": now(),
    })
```
Memory decay: Old, unaccessed memories should decay in confidence. Facts accessed frequently = higher confidence. Implement a cron job that reduces confidence by a small delta each week and prunes below a threshold (e.g., confidence < 0.2).
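The weekly decay pass described above can be sketched as a pure function over the memory rows (the 0.05 delta and 0.2 threshold are assumed values; tune both against your retrieval quality):

```python
import datetime

DECAY_PER_WEEK = 0.05    # assumed confidence delta per stale week
PRUNE_THRESHOLD = 0.2    # assumed floor below which facts are dropped

def decay_memories(memories: list[dict],
                   now: datetime.datetime) -> list[dict]:
    """Weekly cron pass: reduce confidence of facts not accessed
    recently, prune anything that falls below the threshold."""
    kept = []
    for m in memories:
        weeks_stale = (now - m["last_accessed"]).days // 7
        m["confidence"] -= DECAY_PER_WEEK * weeks_stale
        if m["confidence"] >= PRUNE_THRESHOLD:
            kept.append(m)
    return kept
```

Because `last_accessed` is refreshed on every retrieval hit, frequently used facts never accumulate stale weeks and keep their confidence.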
Agents improve by referencing how similar tasks were completed:
```python
def store_completed_task(task_id, input, steps_taken, outcome,
                         quality_score, duration_seconds):
    episodic_db.insert({
        "task_id": task_id,
        "input_embedding": embed(input),
        "input_summary": summarize(input),
        "steps": steps_taken,
        "outcome": outcome,
        "quality_score": quality_score,
        "duration_seconds": duration_seconds,
        "tools_used": [s.tool for s in steps_taken],
    })
```
```python
def recall_similar_tasks(current_input: str, top_k: int = 3) -> list[Episode]:
    query_embedding = embed(current_input)
    similar = episodic_db.search(query_embedding, top_k=top_k)
    # Use these as few-shot examples in the agent's context
    return similar
```
Only store completed tasks with quality_score above a threshold (e.g., > 0.7). Storing low-quality episodes teaches the agent bad patterns.
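The quality gate can be a thin wrapper in front of the episodic store (the 0.7 cutoff is the assumed threshold from above; `store` is any object exposing an `insert()` method):

```python
QUALITY_THRESHOLD = 0.7  # assumed cutoff; tune per task type

def maybe_store_episode(store, episode: dict) -> bool:
    """Gate episodic writes on quality so the agent only learns
    from runs worth imitating. Returns True if stored."""
    if episode.get("quality_score", 0.0) > QUALITY_THRESHOLD:
        store.insert(episode)
        return True
    return False
```

Routing every write through one gate also gives you a single place to log rejection rates, which is a useful early signal that the agent's task quality is drifting.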
When multiple agents share memory:
```python
class SharedAgentMemory:
    """Thread-safe shared memory for multi-agent systems."""

    def write(self, agent_id: str, key: str, value: Any, scope: str = "shared"):
        """scope: 'agent' (private to the writer) or 'shared' (all agents can read)"""
        prefix = agent_id if scope == "agent" else "shared"
        memory_store.set(
            key=f"{prefix}:{key}",
            value=value,
            metadata={"written_by": agent_id, "timestamp": now()},
        )

    def read(self, agent_id: str, key: str) -> Any:
        # Agents can always read the shared scope; private entries are
        # keyed by agent_id, so only the writing agent can read them back
        return memory_store.get(f"shared:{key}") or \
               memory_store.get(f"{agent_id}:{key}")
```
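A dict-backed version makes the scoping rules concrete. This is a sketch only: a real deployment would back the store with Redis and add locking for concurrent writers (the class and key format here are illustrative assumptions):

```python
class InMemorySharedMemory:
    """Dict-backed sketch of agent-private vs. shared scoping."""

    def __init__(self):
        self._store: dict[str, object] = {}

    def write(self, agent_id: str, key: str, value, scope: str = "shared"):
        # Private entries are namespaced by the writing agent's id
        prefix = agent_id if scope == "agent" else "shared"
        self._store[f"{prefix}:{key}"] = value

    def read(self, agent_id: str, key: str):
        # Shared scope wins; otherwise fall back to the caller's own scope
        if f"shared:{key}" in self._store:
            return self._store[f"shared:{key}"]
        return self._store.get(f"{agent_id}:{key}")

mem = InMemorySharedMemory()
mem.write("planner", "goal", "ship v2")                # shared by default
mem.write("coder", "scratch", "draft", scope="agent")  # private to coder
```

Note that `read("planner", "scratch")` returns nothing here: another agent's private entries are invisible because they live under a different key prefix.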
Multi-agent memory patterns: scope entries as agent-private or shared, and tag every write with the authoring agent so conflicting entries can be traced back to their source.

Related skills:
- agentic-ai-patterns for understanding where memory fits in the observe-think-act agent loop
- rag-architecture for vector search patterns in long-term memory retrieval
- llm-observability to track memory hit rates, context window utilization, and retrieval latency
- agentic-security — retrieved memories are external data and should be treated as untrusted if user-supplied
- @ai-engineer uses this when designing stateful agent systems