---
name: agent-skills-context-engineering
description: Comprehensive collection of Agent Skills for context engineering, multi-agent architectures, memory systems, and production agent systems using Claude Code, Cursor, and other AI platforms.
triggers:
- "context engineering for agents"
- "build multi-agent system"
- "install agent skills claude code"
- "context window management"
- "agent memory architecture"
- "optimize agent context"
- "implement BDI mental states"
- "design agent evaluation framework"
---
# Agent Skills for Context Engineering
A comprehensive, open collection of Agent Skills focused on context engineering — the discipline of curating what enters an LLM's context window to maximize agent effectiveness. Covers foundational context mechanics, multi-agent architectures, memory systems, tool design, evaluation, and cognitive modeling.
## What This Project Does
Context engineering is about managing the **holistic set of tokens** that enter a model's attention budget: system prompts, tool definitions, retrieved documents, message history, and tool outputs. This repository provides structured, installable skills that teach AI coding agents these principles across any platform.
Key problems addressed:
- **Lost-in-the-middle**: Models degrade when relevant content is buried in long contexts
- **Context poisoning/distraction**: Irrelevant tokens degrade reasoning quality
- **Attention scarcity**: more tokens do not mean better outcomes; fewer, higher-signal tokens perform better
- **Multi-agent coordination**: How agents hand off context without loss
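As a rough illustration of the attention-budget framing above, here is a minimal sketch (the component texts are placeholders; assumes `tiktoken` is installed) that reports how many tokens each context component consumes:

```python
# Minimal sketch: measure how the attention budget is split across components.
# Component texts are placeholders; assumes tiktoken is installed.
import tiktoken

def context_budget_report(components: dict[str, str], encoding: str = "cl100k_base") -> str:
    enc = tiktoken.get_encoding(encoding)
    counts = {name: len(enc.encode(text)) for name, text in components.items()}
    total = sum(counts.values()) or 1
    lines = [
        f"{name}: {count} tokens ({count / total:.1%})"
        for name, count in sorted(counts.items(), key=lambda kv: -kv[1])
    ]
    return "\n".join(lines)

print(context_budget_report({
    "system_prompt": "You are a coding agent...",
    "tool_definitions": "search_codebase(query, file_pattern) ...",
    "message_history": "USER: fix the login bug\nASSISTANT: ...",
}))
```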
## Installation
### Claude Code (Plugin Marketplace)
```bash
# Register the marketplace
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
# Install individual plugin bundles
/plugin install context-engineering-fundamentals@context-engineering-marketplace
/plugin install agent-architecture@context-engineering-marketplace
/plugin install agent-evaluation@context-engineering-marketplace
/plugin install agent-development@context-engineering-marketplace
/plugin install cognitive-architecture@context-engineering-marketplace
```

### Cursor

Listed in the Cursor Plugin Directory. Install via the Cursor plugin panel or reference `.plugin/plugin.json` directly.

### Manual Installation

Clone the repository and reference skill files directly:

```bash
git clone https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering.git
```

Load skill content from `skills/<skill-name>/SKILL.md` into your agent's system prompt or context.
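For manual setups, a minimal loading sketch (the skill path and prompt are illustrative; uses the `anthropic` SDK in the same way as the examples later in this README):

```python
# Minimal sketch: inject one skill's SKILL.md into the system prompt.
# The skill name and task are illustrative; assumes ANTHROPIC_API_KEY is set.
from pathlib import Path
import anthropic

skill = Path("skills/context-compression/SKILL.md").read_text()
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=f"Follow this skill when relevant:\n\n{skill}",
    messages=[{"role": "user", "content": "Compress this conversation history..."}],
)
print(response.content[0].text)
```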
## Plugin Bundles

| Plugin | Skills Included |
|---|---|
| context-engineering-fundamentals | context-fundamentals, context-degradation, context-compression, context-optimization |
| agent-architecture | multi-agent-patterns, memory-systems, tool-design, filesystem-context, hosted-agents |
| agent-evaluation | evaluation, advanced-evaluation |
| agent-development | project-development |
| cognitive-architecture | bdi-mental-states |
## Repository Structure

```
Agent-Skills-for-Context-Engineering/
├── .plugin/
│   └── plugin.json               # Open Plugins manifest
├── skills/
│   ├── context-fundamentals/     # Context anatomy, token budgets
│   ├── context-degradation/      # Failure modes and diagnostics
│   ├── context-compression/      # Compression and summarization
│   ├── context-optimization/     # Caching, masking, compaction
│   ├── multi-agent-patterns/     # Orchestrator, peer, hierarchical
│   ├── memory-systems/           # Short/long-term, graph memory
│   ├── tool-design/              # Effective tool construction
│   ├── filesystem-context/       # File-based context offloading
│   ├── hosted-agents/            # Sandboxed background agents
│   ├── evaluation/               # Agent evaluation frameworks
│   ├── advanced-evaluation/      # LLM-as-a-Judge techniques
│   ├── project-development/      # LLM project methodology
│   └── bdi-mental-states/        # BDI cognitive architecture
└── examples/
    ├── digital-brain-skill/      # Personal OS for founders
    ├── x-to-book-system/         # Multi-agent X→book pipeline
    ├── llm-as-judge-skills/      # TypeScript evaluation tools
    └── book-sft-pipeline/        # Style transfer fine-tuning
```
## Core Patterns

### Context anatomy and token budgets

```python
# The five components competing for attention budget
context = {
    "system_prompt": "...",        # Role, instructions, constraints
    "tool_definitions": [...],     # Available tools and schemas
    "retrieved_documents": [...],  # RAG results, memory lookups
    "message_history": [...],      # Conversation turns
    "tool_outputs": [...],         # Results from tool calls
}

# Token budget allocation example
TOTAL_BUDGET = 128_000  # tokens
budget = {
    "system_prompt": 2_000,         # 1.6%  - keep tight
    "tool_definitions": 5_000,      # 3.9%  - prune unused tools
    "retrieved_documents": 40_000,  # 31%   - highest ROI
    "message_history": 70_000,      # 55%   - compress aggressively
    "tool_outputs": 11_000,         # 8.5%  - offload to filesystem
}
```
### Diagnosing degradation patterns

```python
# Pattern 1: Lost-in-the-middle
# Critical information placed in the center of a long context
# degrades recall significantly. Always place key info at edges.
def order_context_for_attention(documents: list[str], query: str) -> list[str]:
    """Place most relevant documents first and last."""
    # rank_by_relevance is assumed to be provided elsewhere and to return
    # documents sorted by descending relevance.
    scored = rank_by_relevance(documents, query)
    n = len(scored)
    ordered = [None] * n
    # High relevance → positions 0 and -1
    for i, doc in enumerate(scored):
        if i % 2 == 0:
            ordered[i // 2] = doc          # fill from front
        else:
            ordered[n - 1 - i // 2] = doc  # fill from back
    return ordered

# Pattern 2: Context poisoning
# Contradictory or stale information causes unpredictable behavior
def validate_context_consistency(facts: list[dict]) -> list[dict]:
    """Remove contradicting or outdated facts before injection."""
    seen_keys = {}
    clean = []
    for fact in sorted(facts, key=lambda f: f["timestamp"], reverse=True):
        key = fact["subject"] + fact["predicate"]
        if key not in seen_keys:
            seen_keys[key] = True
            clean.append(fact)
    return clean
```
### Conversation compression

```python
import anthropic

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var

def compress_conversation(
    messages: list[dict],
    keep_last_n: int = 10,
    model: str = "claude-opus-4-5",
) -> list[dict]:
    """
    Compress long conversation history into a summary + recent tail.
    Preserves decisions, outcomes, and key entities.
    """
    if len(messages) <= keep_last_n:
        return messages

    to_compress = messages[:-keep_last_n]
    recent = messages[-keep_last_n:]

    summary_prompt = f"""Summarize this conversation segment.
Preserve: decisions made, key entities, open questions, errors encountered.
Discard: pleasantries, repetition, superseded plans.
Conversation:
{format_messages(to_compress)}
"""

    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": summary_prompt}],
    )

    summary_message = {
        "role": "assistant",
        "content": f"[COMPRESSED HISTORY]\n{response.content[0].text}",
    }
    return [summary_message] + recent

def format_messages(messages: list[dict]) -> str:
    return "\n".join(
        f"{m['role'].upper()}: {m['content']}" for m in messages
    )
```
### Multi-agent orchestration and handoff

```python
import anthropic

# Orchestrator pattern - one agent routes, subagents execute
class OrchestratorAgent:
    def __init__(self, subagents: dict[str, "SubAgent"]):
        self.subagents = subagents
        self.client = anthropic.Anthropic()

    def route(self, task: str) -> str:
        """Determine which subagent handles this task."""
        routing_prompt = f"""Given this task, which specialist should handle it?
Specialists: {list(self.subagents.keys())}
Task: {task}
Reply with only the specialist name."""
        response = self.client.messages.create(
            model="claude-opus-4-5",
            max_tokens=50,
            system="You are a routing agent. Reply with only the specialist name.",
            messages=[{"role": "user", "content": routing_prompt}],
        )
        return response.content[0].text.strip()

    def execute(self, task: str) -> str:
        specialist = self.route(task)
        if specialist not in self.subagents:
            raise ValueError(f"Unknown specialist: {specialist}")
        # Pass minimal context - only what the subagent needs
        return self.subagents[specialist].run(task)

# Context handoff - pass structured summaries, not raw history
def create_handoff_context(completed_work: dict) -> str:
    """Minimal handoff context between agents."""
    return f"""AGENT HANDOFF
Task: {completed_work['task']}
Status: {completed_work['status']}
Key Outputs: {completed_work['outputs']}
Open Questions: {completed_work.get('open_questions', 'None')}
Next Agent Should: {completed_work['next_steps']}
"""
```
### Agent memory (append-only JSONL)

```python
import json
from pathlib import Path
from datetime import datetime

# Append-only JSONL memory - agent-friendly, auditable
class AgentMemory:
    def __init__(self, path: str = "agent_memory.jsonl"):
        self.path = Path(path)
        # Schema declaration as first line
        if not self.path.exists():
            self.path.write_text(
                json.dumps({"_schema": "v1", "fields": ["ts", "type", "key", "value"]}) + "\n"
            )

    def remember(self, memory_type: str, key: str, value: str) -> None:
        entry = {
            "ts": datetime.utcnow().isoformat(),
            "type": memory_type,  # "fact" | "decision" | "entity" | "error"
            "key": key,
            "value": value,
        }
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def recall(self, memory_type: str | None = None, limit: int = 50) -> list[dict]:
        entries = []
        with self.path.open() as f:
            for line in f:
                entry = json.loads(line)
                if "_schema" in entry:
                    continue
                if memory_type is None or entry["type"] == memory_type:
                    entries.append(entry)
        return entries[-limit:]  # most recent N

    def recall_as_context(self, memory_type: str | None = None) -> str:
        entries = self.recall(memory_type)
        if not entries:
            return "No relevant memories."
        lines = [f"[{e['ts']}] {e['type']}/{e['key']}: {e['value']}" for e in entries]
        return "\n".join(lines)

# Usage
memory = AgentMemory()
memory.remember("decision", "database_choice", "PostgreSQL, chosen for JSONB support")
memory.remember("entity", "user_id_format", "UUID v4, stored as TEXT")

# Inject into agent context
context = f"""AGENT MEMORY
{memory.recall_as_context()}
---
"""
```
### Tool design and output offloading

```python
import glob
import json
from pathlib import Path

# Good tool: single responsibility, structured output, error info included
def search_codebase(
    query: str,
    file_pattern: str = "**/*.py",
    max_results: int = 10,
) -> dict:
    """
    Search codebase for relevant code.
    Returns structured results an agent can parse without hallucination.
    Always include metadata - agents need to know WHERE results came from.
    """
    results = []
    for filepath in glob.glob(file_pattern, recursive=True):
        try:
            content = Path(filepath).read_text()
            if query.lower() in content.lower():
                # Find line numbers for precise context
                lines = content.splitlines()
                matches = [
                    {"line": i + 1, "text": line}
                    for i, line in enumerate(lines)
                    if query.lower() in line.lower()
                ]
                results.append({
                    "file": filepath,
                    "match_count": len(matches),
                    "matches": matches[:3],  # Top 3 per file
                })
        except (UnicodeDecodeError, PermissionError):
            pass
    return {
        "query": query,
        "total_files_matched": len(results),
        "results": results[:max_results],
        "truncated": len(results) > max_results,
    }

# Tool output offloading - don't bloat context with large outputs
def run_with_file_output(tool_fn, args: dict, output_path: str) -> str:
    """
    Run a tool and write output to file instead of returning to context.
    Returns a file reference the agent can selectively read.
    """
    result = tool_fn(**args)
    Path(output_path).write_text(json.dumps(result, indent=2))
    return f"[OUTPUT SAVED: {output_path}] - {len(str(result))} chars. Read with read_file('{output_path}')."
```
### LLM-as-a-Judge with position-bias mitigation

```python
import anthropic
from enum import Enum

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY

class JudgeVerdict(Enum):
    A_BETTER = "A"
    B_BETTER = "B"
    TIE = "TIE"

def pairwise_judge(
    prompt: str,
    response_a: str,
    response_b: str,
    criteria: list[str],
    model: str = "claude-opus-4-5",
) -> dict:
    """
    Compare two responses with position bias mitigation.
    Runs the comparison in both orders (A/B and B/A); a verdict is accepted
    only when both orderings agree, otherwise the result is a tie.
    """
    def single_comparison(first: str, second: str) -> str:
        criteria_text = "\n".join(f"- {c}" for c in criteria)
        judge_prompt = f"""Compare these two responses to the prompt below.
Prompt: {prompt}
Response 1:
{first}
Response 2:
{second}
Criteria:
{criteria_text}
Which response better satisfies the criteria?
Reply with exactly one of: RESPONSE_1, RESPONSE_2, TIE
Then on a new line explain in 1-2 sentences."""
        resp = client.messages.create(
            model=model,
            max_tokens=256,
            system="You are an impartial evaluator. Be concise and consistent.",
            messages=[{"role": "user", "content": judge_prompt}],
        )
        return resp.content[0].text.strip()

    # Run both orderings to mitigate position bias
    ab_result = single_comparison(response_a, response_b)
    ba_result = single_comparison(response_b, response_a)

    # Normalize: in ba_result, "RESPONSE_1" means B won
    def normalize(result: str, flipped: bool) -> JudgeVerdict:
        first_line = result.splitlines()[0]
        if "TIE" in first_line:
            return JudgeVerdict.TIE
        if "RESPONSE_1" in first_line:
            return JudgeVerdict.B_BETTER if flipped else JudgeVerdict.A_BETTER
        return JudgeVerdict.A_BETTER if flipped else JudgeVerdict.B_BETTER

    ab_verdict = normalize(ab_result, flipped=False)
    ba_verdict = normalize(ba_result, flipped=True)

    if ab_verdict == ba_verdict:
        final = ab_verdict
        confidence = "high"
    else:
        final = JudgeVerdict.TIE  # Disagreement → tie
        confidence = "low"

    return {
        "verdict": final.value,
        "confidence": confidence,
        "ab_result": ab_result,
        "ba_result": ba_result,
    }
```
### BDI mental states

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Belief:
    subject: str
    predicate: str
    object_: Any
    confidence: float = 1.0
    source: str = "observation"

@dataclass
class Desire:
    goal: str
    priority: float  # 0.0 - 1.0
    conditions: list[str] = field(default_factory=list)

@dataclass
class Intention:
    action_plan: list[str]
    committed_to: str  # which desire this serves
    status: str = "pending"  # pending | active | complete | abandoned

class BDIAgent:
    def __init__(self):
        self.beliefs: list[Belief] = []
        self.desires: list[Desire] = []
        self.intentions: list[Intention] = []

    def perceive(self, rdf_triples: list[tuple]) -> None:
        """Convert RDF context into beliefs."""
        for subject, predicate, obj in rdf_triples:
            self.beliefs.append(Belief(
                subject=subject,
                predicate=predicate,
                object_=obj,
            ))

    def deliberate(self) -> Desire | None:
        """Select highest-priority achievable desire."""
        achievable = [
            d for d in self.desires
            if self._conditions_met(d.conditions)
        ]
        if not achievable:
            return None
        return max(achievable, key=lambda d: d.priority)

    def plan(self, desire: Desire) -> Intention:
        """Generate action plan for a desire."""
        # In production: call LLM to generate plan
        steps = [f"Execute step for: {desire.goal}"]
        intention = Intention(
            action_plan=steps,
            committed_to=desire.goal,
        )
        self.intentions.append(intention)
        return intention

    def _conditions_met(self, conditions: list[str]) -> bool:
        belief_strings = {
            f"{b.subject}:{b.predicate}:{b.object_}" for b in self.beliefs
        }
        return all(c in belief_strings for c in conditions)

    def as_context_block(self) -> str:
        """Serialize mental state for injection into LLM context."""
        beliefs_text = "\n".join(
            f"  - {b.subject} {b.predicate} {b.object_} (conf={b.confidence})"
            for b in self.beliefs[-10:]
        )
        desires_text = "\n".join(
            f"  - [{d.priority:.1f}] {d.goal}" for d in self.desires
        )
        intentions_text = "\n".join(
            f"  - {i.committed_to}: {i.status}" for i in self.intentions
        )
        return f"""BDI MENTAL STATE
Beliefs (recent):
{beliefs_text}
Desires:
{desires_text}
Intentions:
{intentions_text}
"""
```
### Filesystem as context extension

```python
import json
from pathlib import Path
from typing import Any

# Use filesystem as infinite context extension
class FilesystemContext:
    def __init__(self, workspace: str = ".agent_workspace"):
        self.workspace = Path(workspace)
        self.workspace.mkdir(exist_ok=True)

    def offload(self, key: str, data: Any) -> str:
        """Write large data to file, return reference string for context."""
        path = self.workspace / f"{key}.json"
        path.write_text(json.dumps(data, indent=2))
        size = len(json.dumps(data))
        return f"[FILE_REF:{key}] ({size} bytes) → {path}"

    def load(self, key: str) -> Any:
        """Load previously offloaded data."""
        path = self.workspace / f"{key}.json"
        return json.loads(path.read_text())

    def list_available(self) -> str:
        """Let agent discover what context is available."""
        files = list(self.workspace.glob("*.json"))
        if not files:
            return "No context files available."
        lines = []
        for f in files:
            size = f.stat().st_size
            lines.append(f"  - {f.stem}: {size} bytes")
        return "AVAILABLE CONTEXT FILES:\n" + "\n".join(lines)

    def write_plan(self, plan: list[str]) -> str:
        """Persist agent plan so it survives context resets."""
        return self.offload("current_plan", {"steps": plan, "current": 0})

    def tick_plan(self) -> str | None:
        """Advance to next step, return current step or None if done."""
        data = self.load("current_plan")
        idx = data["current"]
        if idx >= len(data["steps"]):
            return None
        data["current"] += 1
        self.offload("current_plan", data)
        return data["steps"][idx]
```
## Skill Activation and Loading

| Skill | Activate When User Says |
|---|---|
| context-fundamentals | "explain context windows", "design agent architecture" |
| context-degradation | "diagnose context problems", "fix lost-in-middle", "debug agent failures" |
| context-compression | "compress context", "summarize conversation", "reduce token usage" |
| context-optimization | "optimize context", "reduce token costs", "implement KV-cache" |
| multi-agent-patterns | "design multi-agent system", "implement supervisor pattern" |
| memory-systems | "implement agent memory", "build knowledge graph", "track entities" |
| tool-design | "design agent tools", "reduce tool complexity", "implement MCP tools" |
| filesystem-context | "offload context to files", "agent scratch pad", "file-based context" |
| hosted-agents | "build background agent", "sandboxed execution", "multiplayer agent" |
| evaluation | "evaluate agent performance", "build test framework", "measure quality" |
| advanced-evaluation | "implement LLM-as-judge", "compare model outputs", "mitigate bias" |
| project-development | "start LLM project", "design batch pipeline", "evaluate task-model fit" |
| bdi-mental-states | "model agent mental states", "implement BDI architecture", "transform RDF to beliefs" |
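Trigger matching can be as simple as scanning the user message for these phrases and loading only the skills that match. A minimal, illustrative sketch (the trigger map and helper name are hypothetical, not part of the repository):

```python
# Illustrative trigger matcher: map trigger phrases to skill names,
# then load only the skills whose phrases appear in the user message.
TRIGGERS = {
    "context-compression": ["compress context", "summarize conversation", "reduce token usage"],
    "memory-systems": ["implement agent memory", "build knowledge graph", "track entities"],
    # ...remaining skills follow the table above
}

def skills_for_message(message: str) -> list[str]:
    lowered = message.lower()
    return [
        skill for skill, phrases in TRIGGERS.items()
        if any(phrase in lowered for phrase in phrases)
    ]

print(skills_for_message("Please compress context before we continue"))
# ['context-compression']
```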
### Progressive skill loading

```python
from pathlib import Path

# Only load full skill content when triggered - saves tokens on every request
class SkillLoader:
    def __init__(self, skills_dir: str = "skills"):
        self.skills_dir = Path(skills_dir)
        self._index = None

    def get_index(self) -> str:
        """Load lightweight index (names + one-line descriptions only)."""
        if self._index:
            return self._index
        skills = []
        for skill_dir in self.skills_dir.iterdir():
            readme = skill_dir / "README.md"
            if readme.exists():
                first_line = readme.read_text().splitlines()[0]
                skills.append(f"- {skill_dir.name}: {first_line}")
        self._index = "\n".join(skills)
        return self._index

    def load_skill(self, skill_name: str) -> str:
        """Load full skill content only when needed."""
        skill_file = self.skills_dir / skill_name / "SKILL.md"
        if not skill_file.exists():
            raise FileNotFoundError(f"Skill not found: {skill_name}")
        return skill_file.read_text()
```
### Token budget enforcement

```python
import tiktoken

def enforce_budget(
    content: str,
    max_tokens: int,
    model: str = "gpt-4o",
    strategy: str = "truncate_middle",
) -> str:
    """
    Ensure content fits within token budget.
    Strategies: truncate_end | truncate_middle
    """
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(content)
    if len(tokens) <= max_tokens:
        return content

    if strategy == "truncate_end":
        return enc.decode(tokens[:max_tokens])

    if strategy == "truncate_middle":
        keep = max_tokens // 2
        start = enc.decode(tokens[:keep])
        end = enc.decode(tokens[-keep:])
        return f"{start}\n\n[... {len(tokens) - max_tokens} tokens truncated ...]\n\n{end}"

    raise ValueError(f"Unknown strategy: {strategy}")
```
## Troubleshooting

- **Cause:** Message history too long, decisions buried in the middle.
  **Fix:** Use `AgentMemory` to extract decisions into persistent JSONL; inject only the decision log at context start.
- **Cause:** Tool output floods the context window.
  **Fix:** Use `FilesystemContext.offload()` and return file references; the agent reads only what it needs.
- **Cause:** Raw message history passed between agents.
  **Fix:** Use `create_handoff_context()` for structured summaries only, never raw history.
- **Cause:** Position bias (the model prefers whichever response appears first).
  **Fix:** Use `pairwise_judge()`, which runs A/B and B/A and resolves disagreements as ties.
- **Cause:** Instructions in the middle of a long system prompt.
  **Fix:** Place critical constraints at the top and bottom of the system prompt; use U-shaped placement.
- **Cause:** No compression strategy; messages accumulate.
  **Fix:** Run `compress_conversation()` every N turns; keep the last 10 messages verbatim and summarize the rest (see the sketch after this list).
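A minimal sketch of that compression cadence, assuming the `compress_conversation()` helper defined earlier (the wrapper function is illustrative):

```python
# Illustrative compression cadence: compress history every N turns,
# keeping the last 10 messages verbatim. Assumes compress_conversation() from above.
COMPRESS_EVERY_N_TURNS = 10

def maybe_compress(messages: list[dict], turn: int) -> list[dict]:
    """Call after each turn; compresses older history on every Nth turn."""
    if turn % COMPRESS_EVERY_N_TURNS == 0:
        return compress_conversation(messages, keep_last_n=10)
    return messages
```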
## Environment Variables

```bash
ANTHROPIC_API_KEY=     # Required for Claude API calls
OPENAI_API_KEY=        # Optional, for OpenAI-based evaluation
AGENT_WORKSPACE_DIR=   # Optional, filesystem context directory (default: .agent_workspace)
AGENT_MEMORY_PATH=     # Optional, JSONL memory file path (default: agent_memory.jsonl)
```
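These can live in a `.env` file or the shell environment. A minimal sketch of reading them with defaults (the config dataclass is illustrative; variable names match the list above):

```python
# Illustrative config loader for the environment variables listed above.
import os
from dataclasses import dataclass

@dataclass
class AgentConfig:
    anthropic_api_key: str
    workspace_dir: str = ".agent_workspace"
    memory_path: str = "agent_memory.jsonl"

config = AgentConfig(
    anthropic_api_key=os.environ["ANTHROPIC_API_KEY"],  # required; raises KeyError if unset
    workspace_dir=os.getenv("AGENT_WORKSPACE_DIR", ".agent_workspace"),
    memory_path=os.getenv("AGENT_MEMORY_PATH", "agent_memory.jsonl"),
)
```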