From wicked-garden
Context window management, token optimization, and memory patterns for efficient multi-agent systems. Use when: "context window", "token optimization", "agent memory", "reduce token usage", "context engineering"
npx claudepluginhub mikeparcewski/wicked-garden --plugin wicked-garden

This skill uses the workspace's default tool permissions.
Techniques for managing context windows, optimizing token usage, and designing efficient memory systems for agentic applications.
Context Window: Maximum tokens an LLM can process in a single request (input + output).
Common Limits:
Token Efficiency Matters:
All agents access common state store.
Use when: Agents need a synchronized view of the world. Pros: Consistency, simple coordination. Cons: Contention, single point of failure.
Each agent maintains its own private state.
Use when: Agents operate independently, no coordination needed. Pros: No contention, parallel execution. Cons: Inconsistency possible, harder to coordinate.
Periodically save state snapshots for recovery.
Use when: Long-running processes, need recovery from failures. Pros: Fault tolerance, replayability. Cons: Storage overhead, consistency complexity.
See refs/compression-techniques.md for implementation details.
Compress old context into summaries to reduce token usage.
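One way to sketch this compression: once the message history exceeds a threshold, fold everything but the most recent turns into a single summary message. Here `summarize` is a placeholder for an LLM summarization call — an assumption for illustration, not a real API.

```python
def compress_history(messages: list[dict], summarize, keep_recent: int = 4) -> list[dict]:
    """Replace all but the most recent turns with one summary message.

    `summarize` is a stand-in for an LLM call that condenses old turns.
    """
    if len(messages) <= keep_recent:
        return messages  # nothing worth compressing yet
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent
```

The history length is now bounded at `keep_recent + 1` messages regardless of conversation length, trading summarization cost for a much smaller context window.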
Only load relevant context based on the current task.
Use JSON/structured formats instead of prose to reduce tokens.
Example:
{"name": "John Smith", ...} (compact)

Load details only when explicitly needed.
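A rough way to see the savings from structured formats is to compare token estimates for a prose description against a compact JSON record. The ~4-characters-per-token heuristic and the sample strings below are assumptions for illustration; real counts depend on the model's tokenizer.

```python
import json

def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

prose = ("The customer's name is John Smith, he is 42 years old, "
         "and his account status is currently active.")
structured = json.dumps({"name": "John Smith", "age": 42, "status": "active"},
                        separators=(",", ":"))  # compact separators, no whitespace

assert approx_tokens(structured) < approx_tokens(prose)
```

For an exact count you would use the provider's tokenizer instead of a character heuristic, but the direction of the savings holds.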
Reference external documents instead of embedding full text.
See refs/selective-loading.md and refs/caching-and-optimization.md for code examples and detailed strategies.
Recent conversation, current task state.
Scope: Current session/task. Size: 1K-10K tokens. Retention: Minutes to hours.
Persistent knowledge, learned facts.
Scope: Cross-session, permanent. Size: Unbounded (stored externally via vector DB). Retention: Days to forever.
Specific past events/experiences.
Scope: Historical episodes. Size: Summaries stored. Retention: Varies by importance.
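The three memory tiers above can be sketched as one container with a bounded short-term buffer, a persistent fact store, and an episode archive. This is a toy in-memory version — the deques and dicts stand in for real backends (e.g. the external vector DB mentioned for long-term memory), and the class and method names are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy three-tier memory; in-memory structures stand in for real stores."""
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))  # recent turns, auto-evicted
    long_term: dict = field(default_factory=dict)    # persistent learned facts
    episodes: list = field(default_factory=list)     # summarized past events

    def remember_turn(self, turn: str) -> None:
        self.short_term.append(turn)  # oldest turn drops off once full

    def learn_fact(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def archive_episode(self, summary: str, importance: int = 1) -> None:
        self.episodes.append({"summary": summary, "importance": importance})
```

The bounded deque enforces the short-term size limit automatically, while the importance score on episodes supports retention that varies by importance, as noted above.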
See refs/compression-techniques.md for implementation patterns.
Be specific about the agent's role and boundaries.
Example:
You are a Python code reviewer specializing in security.
Your job is to identify security vulnerabilities.
You do NOT review style or performance.
Clear, actionable instructions with explicit format.
Bad: "Review this code."
Good: "Review for security: 1) SQL injection 2) Input validation 3) Secrets. Output: JSON with vulnerabilities."
Specify exact output format to reduce tokens.
Show examples for complex tasks.
See refs/selective-loading.md for detailed prompting patterns.
Load context before it's needed (if predictable). Pros: Faster response time. Cons: May load unnecessary data.
Load context only when explicitly needed. Pros: Minimal token usage. Cons: Latency on each request.
Combine both: Always load core context + JIT load task-specific context.
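The hybrid approach can be sketched as a context builder that always includes a core set of keys and pulls in extras only when the current task asks for them. The key names (`system_prompt`, `user_profile`, etc.) and the tag-matching scheme are assumptions for illustration.

```python
CORE_CONTEXT = {"system_prompt", "user_profile"}  # always preloaded (assumed names)

def build_context(task_tags: set[str], store: dict) -> dict:
    """Hybrid loading: eagerly include core keys, JIT-load task-relevant extras."""
    context = {k: store[k] for k in CORE_CONTEXT if k in store}
    for key, value in store.items():
        if key in task_tags and key not in context:
            context[key] = value  # just-in-time load for this task only
    return context
```

Core context pays its token cost every call but avoids latency; everything else is fetched per task, so unrelated documents never enter the window.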
Track input and output tokens separately. Rates vary by model (typically $0.003-0.075 per 1K tokens).
Set hard token limits per agent/session to prevent runaway costs.
Track costs per agent to identify expensive components.
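The three points above — separate input/output tracking, hard caps, and per-agent attribution — can be combined in one small tracker. The rates below are example values within the range quoted above, not any specific provider's pricing, and the class name is an assumption.

```python
class TokenBudget:
    """Track per-agent token spend and enforce a hard cap (illustrative rates)."""

    def __init__(self, max_tokens: int, input_rate: float = 0.003, output_rate: float = 0.015):
        self.max_tokens = max_tokens
        self.input_rate = input_rate    # $ per 1K input tokens (example rate)
        self.output_rate = output_rate  # $ per 1K output tokens (example rate)
        self.usage: dict[str, dict] = {}

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> None:
        u = self.usage.setdefault(agent, {"in": 0, "out": 0})
        u["in"] += input_tokens
        u["out"] += output_tokens
        if u["in"] + u["out"] > self.max_tokens:
            raise RuntimeError(f"{agent} exceeded token budget")  # hard stop on runaway cost

    def cost(self, agent: str) -> float:
        u = self.usage.get(agent, {"in": 0, "out": 0})
        return u["in"] / 1000 * self.input_rate + u["out"] / 1000 * self.output_rate
```

Comparing `cost()` across agents makes the expensive components visible; raising on the cap (rather than silently truncating) forces an explicit decision about whether to continue.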
See refs/cost-calculation-budget.md and refs/cost-optimization-reporting.md for detailed cost strategies.
Sequential Pattern: Pass only output of previous agent, not entire chain.
Hierarchical Pattern: Parent gets summaries from children, children get only relevant task context.
Collaborative Pattern: Shared context (compressed), each agent adds only delta.
Autonomous Pattern: Minimal shared context, isolated context per agent.
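Of the four patterns above, the sequential one is the simplest to sketch: each agent receives only the previous agent's output, never the accumulated chain, so context stays bounded at one hop. The agents here are placeholder callables, an assumption for illustration.

```python
def run_pipeline(task: str, agents: list) -> str:
    """Sequential pattern: pass only the previous agent's output down the chain."""
    payload = task
    for agent in agents:
        payload = agent(payload)  # each agent sees one hop, not the full history
    return payload
```

With real LLM agents each callable would wrap a model call, but the token property is the same: the context given to agent N depends only on agent N-1's output size, not on chain length.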
refs/compression-techniques.md - Conversation summarization, deduplication, entity compression
refs/selective-loading.md - Relevance filtering, time decay, token-budgeted retrieval
refs/caching-and-optimization.md - Prompt caching, semantic caching, batching, cost-aware model selection
refs/cost-calculation-budget.md - Token pricing, cost calculation, budget management
refs/cost-optimization-reporting.md - Cost estimation, optimization strategies, reporting