Optimize Claude Code context usage through monitoring, reduction strategies, progressive disclosure, planning/execution separation, and file-based optimization. Provides task-based operations for context window management, token efficiency, and conversation quality. Use when managing token costs, optimizing context usage, preventing context overflow, or improving multi-turn conversation quality.
/plugin marketplace add adaptationio/Skrillz
/plugin install skrillz@skrillz
context-engineering provides systematic strategies for optimizing Claude Code context window usage. It helps you monitor token consumption, reduce context load, design context-efficient skills, and apply proven optimization patterns.
Purpose: Maximize Claude Code effectiveness while managing token costs and maintaining conversation quality
The 5 Context Optimization Operations:
1. Context Monitoring - track token usage with /context
2. Context Reduction - clear stale content and minimize loaded files
3. Context-Efficient Skill Design - progressive disclosure and lazy loading
4. Planning/Execution Separation - plan broadly, execute in a clean session
5. File-Based Optimization - externalize large data to temp files
Key Benefits:
- 40-95% token savings, depending on strategy
- Lower token costs and better multi-turn conversation quality
- Fewer context overflows and auto-compaction surprises
Context Window Sizes (2025):
- Sonnet 4/4.5: 200k tokens standard, 500k-1M tokens in beta (Tier 4)
- Auto-compaction triggers at ~80% usage (~160k of a 200k window)
Use context-engineering when:
- Managing token costs
- Optimizing context usage in long sessions
- Preventing context overflow
- Improving multi-turn conversation quality
Operation 1: Context Monitoring (/context command)
Purpose: Track token consumption, identify context-heavy elements, and detect optimization opportunities
When to Use This Operation:
Process:
1. Check Current Context Usage
Use /context command in Claude Code to view:
- Total tokens used
- Percentage of context window
- Files loaded
- Recent tool calls
2. Identify Heavy Consumers (see the estimation sketch after the example)
3. Analyze Usage Patterns
4. Document Baseline
5. Set Optimization Goals
Validation Checklist:
Outputs:
Time Estimate: 10-15 minutes
Example:
Context Usage Analysis
======================
Current Usage: 145,000 tokens (72% of 200k window)
Heavy Consumers:
1. CLAUDE.md: 25,000 tokens (17%)
2. Large skill files: 40,000 tokens (28%)
- planning-architect/SKILL.md: 15,000 tokens
- development-workflow/common-patterns.md: 12,000 tokens
- review-multi/scoring-rubric.md: 8,000 tokens
3. Conversation history: 30,000 tokens (21%)
4. Tool call results: 20,000 tokens (14%)
Optimization Opportunities:
- Split large CLAUDE.md (25k → 10k target)
- Use references/ loading instead of full files (40k → 15k)
- Clear old tool results (20k → 5k)
Target: Reduce to ~100k tokens (50% of window, 31% reduction)
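To make step 2 (Identify Heavy Consumers) measurable outside of /context, a rough file-level estimate works. The sketch below assumes the common ~4-characters-per-token heuristic; the file globs are illustrative and should point at whatever your session actually loads.

```bash
# Rank candidate context files by estimated token count (chars / 4 heuristic).
# The globs are illustrative -- substitute the files your session actually loads.
for f in CLAUDE.md .claude/skills/*/SKILL.md; do
  [ -f "$f" ] || continue
  chars=$(wc -c < "$f")
  printf '%8d est. tokens  %s\n' "$((chars / 4))" "$f"
done | sort -rn
```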
Operation 2: Context Reduction
Purpose: Remove stale content, minimize loaded files, and reduce token consumption
When to Use This Operation:
Process:
1. Remove Stale Tool Results
2. Minimize File Loading
3. Optimize Conversation History
4. Reduce CLAUDE.md Size (see the budget-check sketch after the example)
5. Apply Progressive Loading
Validation Checklist:
Outputs:
Time Estimate: 15-30 minutes
Example Reduction:
Before Optimization: 145,000 tokens (72%)
Actions Taken:
1. Cleared 50 old tool results: -15,000 tokens
2. Unloaded 3 large files no longer needed: -18,000 tokens
3. Optimized CLAUDE.md (split to CLAUDE.local.md): -12,000 tokens
4. Used references/ loading instead of full files: -25,000 tokens
After Optimization: 75,000 tokens (37%)
Reduction: 70,000 tokens (48% reduction)
Quality Impact: None - relevant context maintained
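As a lightweight guard for step 4 (Reduce CLAUDE.md Size), the sketch below flags memory files over the ~5,000-token budget using the same chars/4 heuristic; CLAUDE.local.md is the split target mentioned above.

```bash
# Warn when a memory file exceeds the ~5,000-token budget (chars / 4 heuristic).
for f in CLAUDE.md CLAUDE.local.md; do
  [ -f "$f" ] || continue
  tokens=$(( $(wc -c < "$f") / 4 ))
  if [ "$tokens" -gt 5000 ]; then
    echo "$f: ~$tokens tokens -- over budget, split details into a separate file"
  else
    echo "$f: ~$tokens tokens -- within budget"
  fi
done
```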
Operation 3: Context-Efficient Skill Design
Purpose: Design context-efficient skills using progressive disclosure, lazy loading, and token-aware architecture
When to Use This Operation:
Process:
1. Apply Progressive Disclosure
Token Impact: 70-80% reduction vs monolithic (5k vs 20k+ tokens)
2. Design for Lazy Loading
Example: load references/structure-review-guide.md, not the entire review-multi skill
3. Optimize File Sizes
4. Use Token-Efficient Formats
5. Consider Context Budget
Validation Checklist:
Outputs:
Time Estimate: 20-40 minutes (during planning phase)
Example:
Skill Design: api-integration
Token Budget Analysis:
- SKILL.md: 900 lines → ~3,000 tokens
- references/ (3 files):
- api-guide.md: 400 lines → ~1,300 tokens
- auth-patterns.md: 350 lines → ~1,200 tokens
- examples.md: 300 lines → ~1,000 tokens
- README.md: 300 lines → ~1,000 tokens
Total if all loaded: ~7,500 tokens
Typical usage: SKILL.md only → 3,000 tokens (60% savings)
With 1 reference: 3,000 + 1,200 → 4,200 tokens (44% savings)
Progressive Disclosure Impact:
- Monolithic (all in SKILL.md): ~7,500 tokens always loaded
- Progressive (SKILL.md + on-demand refs): 3,000-4,500 tokens typical
- Token Savings: 40-60% depending on usage
Design: ✅ Context-efficient with progressive disclosure
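A minimal sketch of the layout behind this budget; api-integration and its reference filenames are the hypothetical example skill from the analysis above.

```bash
# Scaffold the progressive-disclosure layout from the budget above.
mkdir -p api-integration/references
touch api-integration/SKILL.md                      # overview + operations (~3k tokens, always loaded)
touch api-integration/README.md                     # human-facing docs (not loaded as context by default)
touch api-integration/references/api-guide.md       # ~1.3k tokens, loaded on demand
touch api-integration/references/auth-patterns.md   # ~1.2k tokens, loaded on demand
touch api-integration/references/examples.md        # ~1.0k tokens, loaded on demand
```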
Operation 4: Planning/Execution Separation
Purpose: Keep execution context clean by separating exploratory planning from focused implementation
When to Use This Operation:
Process:
1. Planning Phase (Separate Session)
Characteristics: High context usage, exploratory, broad
2. Execution Phase (Fresh Session)
Characteristics: Clean context, focused, efficient
3. Session Transition (see the sketch after the example)
4. Maintain Clean Execution Context
Validation Checklist:
Outputs:
Time Estimate: Planning decision (0-5 min), session management (as needed)
Example:
Planning Session (Context: 150k tokens, 75% usage):
- Explored 20 files for research
- Analyzed patterns across codebase
- Made architecture decisions
- Created detailed plan
- Output: skill-plan.md (comprehensive)
[End session, save plan]
Execution Session (Context: 30k tokens, 15% usage):
- Load: CLAUDE.md + skill-plan.md + development-workflow
- Context: Clean, focused, 30k tokens
- Implementation: Follow plan, build skill
- Load references as needed (not all at once)
Result: 80% context reduction (150k → 30k)
Quality: Higher (clean context, focused work)
Research Finding: "Separating planning from execution keeps implementation context clean" - confirmed by 2025 best practices
Operation 5: File-Based Optimization
Purpose: Externalize large data to temporary files for on-demand analysis, achieving up to 95% token savings
When to Use This Operation:
Process:
1. Identify Large Data
2. Externalize to Files
# Save large data to a temp file instead of keeping it in conversation context
echo "large data here" > /tmp/large-data.txt
3. Reference File Instead of Content
Large dataset saved to: /tmp/analysis-data.json (50,000 tokens)
To analyze: Read /tmp/analysis-data.json when needed
Token Impact: 50,000 tokens → ~500 tokens (99% reduction)
4. Load On-Demand
5. Clean Up Temp Files
Validation Checklist:
Outputs:
Time Estimate: 10-20 minutes (setup and management)
Example:
Scenario: Analyzing large log file (100,000 tokens)
Before Optimization:
- Full log in conversation context: 100,000 tokens
- Context usage: 50% just for log data
After Optimization:
1. Save log to /tmp/app-log.txt
2. Reference in context: "Log saved to /tmp/app-log.txt (100k tokens)"
3. Read specific sections when needed:
- Read first 50 lines for overview
- Grep for errors
- Read relevant sections on-demand
Token Usage:
- Before: 100,000 tokens in context
- After: ~500 tokens (file reference) + ~2,000 tokens (specific reads)
- Savings: 97,500 tokens (97.5% reduction)
Quality: Maintained - can still analyze log on-demand
Access: Full log available when needed
Research Finding: "File-based approach achieves 95% token savings" - proven in 2025 optimization studies
Practice: Focus on relevant, high-quality context rather than loading everything
Rationale: Every piece should be current, accurate, and directly relevant to task
Application: Before loading file, ask: "Do I need this right now for current task?"
Practice: Design all skills with SKILL.md + references/ pattern
Rationale: 70-80% token reduction vs monolithic design
Application: SKILL.md <1,200 lines, details in references/ loaded on-demand
Practice: Check context usage periodically, especially in long sessions
Rationale: Auto-compaction triggers at ~80%, but proactive monitoring prevents drift
Application: Check /context every 30-60 minutes in active development
Practice: Externalize data >5,000 tokens to files
Rationale: 95% token savings while maintaining accessibility
Application: Save to /tmp/, reference file path, load on-demand
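One way to apply this practice automatically is a small wrapper that externalizes any command output over the threshold; run_with_budget is a hypothetical helper, and the chars/4 token estimate is a heuristic.

```bash
# Run a command; keep small output in context, externalize large output to /tmp.
run_with_budget() {
  local out tokens
  out=$(mktemp /tmp/output.XXXXXX)
  "$@" > "$out"
  tokens=$(( $(wc -c < "$out") / 4 ))
  if [ "$tokens" -gt 5000 ]; then
    echo "Output saved to $out (~$tokens est. tokens) -- read on demand"
  else
    cat "$out" && rm -f "$out"   # small enough to keep inline
  fi
}

# Example: a search that might return a huge result set
run_with_budget grep -rn 'TODO' src/
```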
Practice: Plan in one session, execute in clean session with plan artifacts
Rationale: Keeps execution context focused, prevents exploratory noise
Application: When planning >1 hour, start fresh session for implementation
Practice: Reduce context load before important/complex operations
Rationale: Clean context improves quality and performance
Application: Before complex implementation, clear unnecessary context
Practice: Choose tools that minimize context usage
Rationale: Some tools add more context than others
Application: Prefer Grep and Glob for search and discovery; use Read with a line limit instead of loading full files (see Quick Commands below)
Practice: Keep CLAUDE.md concise (<5,000 tokens), split if needed
Rationale: CLAUDE.md loaded every session, large files waste context
Application: Keep essentials in CLAUDE.md; move detailed knowledge to separate files (e.g., CLAUDE.local.md) loaded on demand
Simple Skill (5k-10k total tokens):
Structure: SKILL.md plus minimal supporting files
Token Breakdown: typical load 3k-4k tokens (SKILL.md only)
Example: format-validator, simple helpers
Medium Skill (15k-30k total tokens):
Structure: SKILL.md + references/ (a few focused files)
Token Breakdown: typical load 4k-6k tokens (SKILL.md + 1 reference)
Progressive Loading: Load SKILL.md (3-5k) + specific reference when needed (+1.5k) = 4.5-6.5k typical
Example: prompt-builder, skill-researcher
Complex Skill (40k-60k total tokens):
Structure: SKILL.md + references/ (multiple detailed files)
Token Breakdown: typical load 6k-10k tokens (SKILL.md + 2 references)
Progressive Loading: Load SKILL.md (4-6k) + 1-2 references as needed (+2-4k) = 6-10k typical
Example: review-multi, testing-validator
Key: Even complex skills only load 6-10k tokens typically (not full 40-60k)
Symptom: Context quickly fills with all skill files
Cause: Reading all files instead of progressive loading
Fix: Load SKILL.md first, load references/ only when needed for specific operations
Prevention: Follow progressive disclosure pattern
Symptom: Unexpected context overflow, performance degradation
Cause: No visibility into token usage
Fix: Check /context regularly, monitor usage patterns
Prevention: Check context every 30-60 min in active sessions
Symptom: Every session starts with 20k-50k tokens used
Cause: Putting everything in CLAUDE.md
Fix: Split CLAUDE.md - essentials only, separate files for detailed knowledge
Prevention: Keep CLAUDE.md <5,000 tokens, use multiple files
Symptom: Context bloated with old exploratory results
Cause: Not clearing old tool calls
Fix: Context editing auto-clears, but can manually manage by starting fresh sessions
Prevention: Fresh session for major transitions (planning → execution)
Symptom: Large data sets consuming 30-50% of context
Cause: Keeping large outputs in conversation
Fix: Save to /tmp/ files, reference file path, load on-demand
Prevention: Any data >5,000 tokens → externalize to file
Symptom: Skills with 2,000+ line SKILL.md files
Cause: Not using references/ for detailed content
Fix: Extract detailed content to references/, keep SKILL.md as overview
Prevention: Design with progressive disclosure from start (use planning-architect)
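As a starting point for that refactor, GNU csplit can cut a monolithic SKILL.md at its H2 headings; this is a rough sketch, not a migration tool, so the resulting files still need review, renaming, and an updated overview.

```bash
# Split a monolithic SKILL.md into per-section files at each "## " heading (GNU csplit).
mkdir -p references
csplit -z -f references/section- SKILL.md '/^## /' '{*}'
ls references/   # each section-NN is a candidate references/ file; keep only the overview in SKILL.md
```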
| Model | Standard | Beta (Tier 4) | Auto-Compact |
|---|---|---|---|
| Sonnet 4/4.5 | 200k tokens | 500k-1M tokens | ~80% (~160k for 200k) |
| Strategy | Token Savings | Application |
|---|---|---|
| Progressive Disclosure | 70-80% | SKILL.md + references/ vs monolithic |
| File-Based Externalization | 95% | Large data >5k tokens to /tmp/ files |
| Context Editing | 29-39% | Auto-clears stale content |
| Lazy Loading | 60-70% | Load references on-demand vs all upfront |
| Optimized CLAUDE.md | Variable | Keep <5k tokens vs 20-50k bloat |
| Skill Complexity | Total Tokens | Typical Load | Progressive Load |
|---|---|---|---|
| Simple | 5k-10k | 3k-4k | SKILL.md only |
| Medium | 15k-30k | 4k-6k | SKILL.md + 1 reference |
| Complex | 40k-60k | 6k-10k | SKILL.md + 2 references |
| Usage % | Status | Action |
|---|---|---|
| <50% | ✅ Healthy | Normal operation |
| 50-70% | ⚠️ Monitor | Check periodically, plan optimization |
| 70-80% | ⚠️ Optimize | Reduce context load soon |
| >80% | ❌ Critical | Immediate optimization needed (auto-compact triggers) |
Is context >70%?
├─ Yes → Reduce immediately (Operation 2)
│ ├─ Clear stale tool results
│ ├─ Unload unnecessary files
│ └─ Start fresh session if needed
│
└─ No → Preventive optimization
├─ Is data >5k tokens? → File-based (Operation 5)
├─ Building skills? → Progressive disclosure (Operation 3)
└─ Long session? → Consider planning/execution split (Operation 4)
Immediate (When context >80%):
1. Check usage: /context command
2. Clear old tool results (context editing helps)
3. Start fresh session with essentials only
4. Load plan docs, not exploration history
Preventive (During development):
1. Design skills with progressive disclosure
2. Use file-based for large data (>5k tokens)
3. Monitor context every 30-60 min
4. Separate planning from execution (for complex work)
# Monitor context usage
/context
# Read specific lines (not full file)
Read file_path --limit 50
# Search without loading (uses Grep, more efficient)
Grep "pattern" path/
# Find files without loading content
Glob "*.py" path/
# Externalize large data
Bash: command > /tmp/output.txt
# Then reference: "See /tmp/output.txt for results"
Quick Estimation:
- Plain English prose: ~4 characters (~0.75 words) per token
- Markdown skill files: ~3-4 tokens per line (e.g., 900 lines ≈ 3,000 tokens)
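When an estimate matters, measure instead of guessing; this one-liner uses the ~4-characters-per-token rule of thumb (actual tokenization varies by content).

```bash
# Rough token estimate for any file: bytes / 4.
estimate_tokens() { echo "~$(( $(wc -c < "$1") / 4 )) tokens -- $1"; }
estimate_tokens CLAUDE.md
```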
context-engineering helps you maximize Claude Code effectiveness through strategic context management, ensuring optimal performance, quality, and cost-efficiency throughout development.