Optimize Claude Code context usage through monitoring, reduction strategies, progressive disclosure, planning/execution separation, and file-based optimization. Provides task-based operations for context window management, token efficiency, and conversation quality. Use when managing token costs, optimizing context usage, preventing context overflow, or improving multi-turn conversation quality.
/plugin marketplace add adaptationio/Skrillz
/plugin install skrillz@skrillz
context-engineering provides systematic strategies for optimizing Claude Code context window usage. It helps you monitor token consumption, reduce context load, design context-efficient skills, and apply proven optimization patterns.
Purpose: Maximize Claude Code effectiveness while managing token costs and maintaining conversation quality
The 5 Context Optimization Operations:
1. Context Monitoring - track token usage with /context
2. Context Reduction - clear stale content and minimize loaded files
3. Context-Efficient Skill Design - progressive disclosure and lazy loading
4. Planning/Execution Separation - plan broadly, execute in a clean session
5. File-Based Optimization - externalize large data to temp files
Key Benefits:
- 40-95% token savings, depending on strategy
- Lower token costs and better multi-turn conversation quality
- Fewer context overflows and auto-compaction surprises
Context Window Sizes (2025):
- Sonnet 4/4.5: 200k tokens standard, 500k-1M tokens in beta (Tier 4)
- Auto-compaction triggers at ~80% usage (~160k of a 200k window)
Use context-engineering when:
- Managing token costs
- Optimizing context usage in long sessions
- Preventing context overflow
- Improving multi-turn conversation quality
Operation 1: Context Monitoring (/context command)
Purpose: Track token consumption, identify context-heavy elements, and detect optimization opportunities
When to Use This Operation:
Process:
1. Check Current Context Usage
Use /context command in Claude Code to view:
- Total tokens used
- Percentage of context window
- Files loaded
- Recent tool calls
2. Identify Heavy Consumers (see the estimation sketch after the example)
3. Analyze Usage Patterns
4. Document Baseline
5. Set Optimization Goals
Validation Checklist:
Outputs:
Time Estimate: 10-15 minutes
Example:
Context Usage Analysis
======================
Current Usage: 145,000 tokens (72% of 200k window)
Heavy Consumers:
1. CLAUDE.md: 25,000 tokens (17%)
2. Large skill files: 40,000 tokens (28%)
- planning-architect/SKILL.md: 15,000 tokens
- development-workflow/common-patterns.md: 12,000 tokens
- review-multi/scoring-rubric.md: 8,000 tokens
3. Conversation history: 30,000 tokens (21%)
4. Tool call results: 20,000 tokens (14%)
Optimization Opportunities:
- Split large CLAUDE.md (25k → 10k target)
- Use references/ loading instead of full files (40k → 15k)
- Clear old tool results (20k → 5k)
Target: Reduce to ~100k tokens (50% of window, 31% reduction)
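To make step 2 (Identify Heavy Consumers) measurable outside of /context, a rough file-level estimate works. The sketch below assumes the common ~4-characters-per-token heuristic; the file globs are illustrative and should point at whatever your session actually loads.

```bash
# Rank candidate context files by estimated token count (chars / 4 heuristic).
# The globs are illustrative -- substitute the files your session actually loads.
for f in CLAUDE.md .claude/skills/*/SKILL.md; do
  [ -f "$f" ] || continue
  chars=$(wc -c < "$f")
  printf '%8d est. tokens  %s\n' "$((chars / 4))" "$f"
done | sort -rn
```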
Operation 2: Context Reduction
Purpose: Remove stale content, minimize loaded files, and reduce token consumption
When to Use This Operation:
Process:
1. Remove Stale Tool Results
2. Minimize File Loading
3. Optimize Conversation History
4. Reduce CLAUDE.md Size (see the budget-check sketch after the example)
5. Apply Progressive Loading
Validation Checklist:
Outputs:
Time Estimate: 15-30 minutes
Example Reduction:
Before Optimization: 145,000 tokens (72%)
Actions Taken:
1. Cleared 50 old tool results: -15,000 tokens
2. Unloaded 3 large files no longer needed: -18,000 tokens
3. Optimized CLAUDE.md (split to CLAUDE.local.md): -12,000 tokens
4. Used references/ loading instead of full files: -25,000 tokens
After Optimization: 75,000 tokens (37%)
Reduction: 70,000 tokens (48% reduction)
Quality Impact: None - relevant context maintained
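As a lightweight guard for step 4 (Reduce CLAUDE.md Size), the sketch below flags memory files over the ~5,000-token budget using the same chars/4 heuristic; CLAUDE.local.md is the split target mentioned above.

```bash
# Warn when a memory file exceeds the ~5,000-token budget (chars / 4 heuristic).
for f in CLAUDE.md CLAUDE.local.md; do
  [ -f "$f" ] || continue
  tokens=$(( $(wc -c < "$f") / 4 ))
  if [ "$tokens" -gt 5000 ]; then
    echo "$f: ~$tokens tokens -- over budget, split details into a separate file"
  else
    echo "$f: ~$tokens tokens -- within budget"
  fi
done
```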
Operation 3: Context-Efficient Skill Design
Purpose: Design context-efficient skills using progressive disclosure, lazy loading, and token-aware architecture
When to Use This Operation:
Process:
1. Apply Progressive Disclosure
Token Impact: 70-80% reduction vs monolithic (5k vs 20k+ tokens)
2. Design for Lazy Loading
Example: load references/structure-review-guide.md, not the entire review-multi skill
3. Optimize File Sizes
4. Use Token-Efficient Formats
5. Consider Context Budget
Validation Checklist:
Outputs:
Time Estimate: 20-40 minutes (during planning phase)
Example:
Skill Design: api-integration
Token Budget Analysis:
- SKILL.md: 900 lines → ~3,000 tokens
- references/ (3 files):
- api-guide.md: 400 lines → ~1,300 tokens
- auth-patterns.md: 350 lines → ~1,200 tokens
- examples.md: 300 lines → ~1,000 tokens
- README.md: 300 lines → ~1,000 tokens
Total if all loaded: ~7,500 tokens
Typical usage: SKILL.md only → 3,000 tokens (60% savings)
With 1 reference: 3,000 + 1,200 → 4,200 tokens (44% savings)
Progressive Disclosure Impact:
- Monolithic (all in SKILL.md): ~7,500 tokens always loaded
- Progressive (SKILL.md + on-demand refs): 3,000-4,500 tokens typical
- Token Savings: 40-60% depending on usage
Design: ✅ Context-efficient with progressive disclosure
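A minimal sketch of the layout behind this budget; api-integration and its reference filenames are the hypothetical example skill from the analysis above.

```bash
# Scaffold the progressive-disclosure layout from the budget above.
mkdir -p api-integration/references
touch api-integration/SKILL.md                      # overview + operations (~3k tokens, always loaded)
touch api-integration/README.md                     # human-facing docs (not loaded as context by default)
touch api-integration/references/api-guide.md       # ~1.3k tokens, loaded on demand
touch api-integration/references/auth-patterns.md   # ~1.2k tokens, loaded on demand
touch api-integration/references/examples.md        # ~1.0k tokens, loaded on demand
```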
Operation 4: Planning/Execution Separation
Purpose: Keep execution context clean by separating exploratory planning from focused implementation
When to Use This Operation:
Process:
1. Planning Phase (Separate Session)
Characteristics: High context usage, exploratory, broad
2. Execution Phase (Fresh Session)
Characteristics: Clean context, focused, efficient
3. Session Transition (see the sketch after the example)
4. Maintain Clean Execution Context
Validation Checklist:
Outputs:
Time Estimate: Planning decision (0-5 min), session management (as needed)
Example:
Planning Session (Context: 150k tokens, 75% usage):
- Explored 20 files for research
- Analyzed patterns across codebase
- Made architecture decisions
- Created detailed plan
- Output: skill-plan.md (comprehensive)
[End session, save plan]
Execution Session (Context: 30k tokens, 15% usage):
- Load: CLAUDE.md + skill-plan.md + development-workflow
- Context: Clean, focused, 30k tokens
- Implementation: Follow plan, build skill
- Load references as needed (not all at once)
Result: 80% context reduction (150k → 30k)
Quality: Higher (clean context, focused work)
Research Finding: "Separating planning from execution keeps implementation context clean" - confirmed by 2025 best practices
Operation 5: File-Based Optimization
Purpose: Externalize large data to temporary files for on-demand analysis, achieving up to 95% token savings
When to Use This Operation:
Process:
1. Identify Large Data
2. Externalize to Files
# Save large data to a temp file instead of keeping it in conversation context
echo "large data here" > /tmp/large-data.txt
3. Reference File Instead of Content
Large dataset saved to: /tmp/analysis-data.json (50,000 tokens)
To analyze: Read /tmp/analysis-data.json when needed
Token Impact: 50,000 tokens → ~500 tokens (99% reduction)
4. Load On-Demand
5. Clean Up Temp Files
Validation Checklist:
Outputs:
Time Estimate: 10-20 minutes (setup and management)
Example:
Scenario: Analyzing large log file (100,000 tokens)
Before Optimization:
- Full log in conversation context: 100,000 tokens
- Context usage: 50% just for log data
After Optimization:
1. Save log to /tmp/app-log.txt
2. Reference in context: "Log saved to /tmp/app-log.txt (100k tokens)"
3. Read specific sections when needed:
- Read first 50 lines for overview
- Grep for errors
- Read relevant sections on-demand
Token Usage:
- Before: 100,000 tokens in context
- After: ~500 tokens (file reference) + ~2,000 tokens (specific reads)
- Savings: 97,500 tokens (97.5% reduction)
Quality: Maintained - can still analyze log on-demand
Access: Full log available when needed
Research Finding: "File-based approach achieves 95% token savings" - proven in 2025 optimization studies
Practice: Focus on relevant, high-quality context rather than loading everything
Rationale: Every piece should be current, accurate, and directly relevant to task
Application: Before loading file, ask: "Do I need this right now for current task?"
Practice: Design all skills with SKILL.md + references/ pattern
Rationale: 70-80% token reduction vs monolithic design
Application: SKILL.md <1,200 lines, details in references/ loaded on-demand
Practice: Check context usage periodically, especially in long sessions
Rationale: Auto-compaction triggers at ~80%, but proactive monitoring prevents drift
Application: Check /context every 30-60 minutes in active development
Practice: Externalize data >5,000 tokens to files
Rationale: 95% token savings while maintaining accessibility
Application: Save to /tmp/, reference file path, load on-demand
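One way to apply this practice automatically is a small wrapper that externalizes any command output over the threshold; run_with_budget is a hypothetical helper, and the chars/4 token estimate is a heuristic.

```bash
# Run a command; keep small output in context, externalize large output to /tmp.
run_with_budget() {
  local out tokens
  out=$(mktemp /tmp/output.XXXXXX)
  "$@" > "$out"
  tokens=$(( $(wc -c < "$out") / 4 ))
  if [ "$tokens" -gt 5000 ]; then
    echo "Output saved to $out (~$tokens est. tokens) -- read on demand"
  else
    cat "$out" && rm -f "$out"   # small enough to keep inline
  fi
}

# Example: a search that might return a huge result set
run_with_budget grep -rn 'TODO' src/
```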
Practice: Plan in one session, execute in clean session with plan artifacts
Rationale: Keeps execution context focused, prevents exploratory noise
Application: When planning >1 hour, start fresh session for implementation
Practice: Reduce context load before important/complex operations
Rationale: Clean context improves quality and performance
Application: Before complex implementation, clear unnecessary context
Practice: Choose tools that minimize context usage
Rationale: Some tools add more context than others
Application: Prefer Grep and Glob for search and discovery; use Read with a line limit instead of loading full files (see Quick Commands below)
Practice: Keep CLAUDE.md concise (<5,000 tokens), split if needed
Rationale: CLAUDE.md loaded every session, large files waste context
Application: Keep essentials in CLAUDE.md; move detailed knowledge to separate files (e.g., CLAUDE.local.md) loaded on demand
Simple Skill (5k-10k total tokens):
Structure: SKILL.md plus minimal supporting files
Token Breakdown: typical load 3k-4k tokens (SKILL.md only)
Example: format-validator, simple helpers
Medium Skill (15k-30k total tokens):
Structure: SKILL.md + references/ (a few focused files)
Token Breakdown: typical load 4k-6k tokens (SKILL.md + 1 reference)
Progressive Loading: Load SKILL.md (3-5k) + specific reference when needed (+1.5k) = 4.5-6.5k typical
Example: prompt-builder, skill-researcher
Complex Skill (40k-60k total tokens):
Structure: SKILL.md + references/ (multiple detailed files)
Token Breakdown: typical load 6k-10k tokens (SKILL.md + 2 references)
Progressive Loading: Load SKILL.md (4-6k) + 1-2 references as needed (+2-4k) = 6-10k typical
Example: review-multi, testing-validator
Key: Even complex skills only load 6-10k tokens typically (not full 40-60k)
Symptom: Context quickly fills with all skill files
Cause: Reading all files instead of progressive loading
Fix: Load SKILL.md first, load references/ only when needed for specific operations
Prevention: Follow progressive disclosure pattern
Symptom: Unexpected context overflow, performance degradation
Cause: No visibility into token usage
Fix: Check /context regularly, monitor usage patterns
Prevention: Check context every 30-60 min in active sessions
Symptom: Every session starts with 20k-50k tokens used
Cause: Putting everything in CLAUDE.md
Fix: Split CLAUDE.md - essentials only, separate files for detailed knowledge
Prevention: Keep CLAUDE.md <5,000 tokens, use multiple files
Symptom: Context bloated with old exploratory results
Cause: Not clearing old tool calls
Fix: Context editing auto-clears, but can manually manage by starting fresh sessions
Prevention: Fresh session for major transitions (planning → execution)
Symptom: Large data sets consuming 30-50% of context
Cause: Keeping large outputs in conversation
Fix: Save to /tmp/ files, reference file path, load on-demand
Prevention: Any data >5,000 tokens → externalize to file
Symptom: Skills with 2,000+ line SKILL.md files
Cause: Not using references/ for detailed content
Fix: Extract detailed content to references/, keep SKILL.md as overview
Prevention: Design with progressive disclosure from start (use planning-architect)
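As a starting point for that refactor, GNU csplit can cut a monolithic SKILL.md at its H2 headings; this is a rough sketch, not a migration tool, so the resulting files still need review, renaming, and an updated overview.

```bash
# Split a monolithic SKILL.md into per-section files at each "## " heading (GNU csplit).
mkdir -p references
csplit -z -f references/section- SKILL.md '/^## /' '{*}'
ls references/   # each section-NN is a candidate references/ file; keep only the overview in SKILL.md
```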
| Model | Standard | Beta (Tier 4) | Auto-Compact |
|---|---|---|---|
| Sonnet 4/4.5 | 200k tokens | 500k-1M tokens | ~80% (~160k for 200k) |
| Strategy | Token Savings | Application |
|---|---|---|
| Progressive Disclosure | 70-80% | SKILL.md + references/ vs monolithic |
| File-Based Externalization | 95% | Large data >5k tokens to /tmp/ files |
| Context Editing | 29-39% | Auto-clears stale content |
| Lazy Loading | 60-70% | Load references on-demand vs all upfront |
| Optimized CLAUDE.md | Variable | Keep <5k tokens vs 20-50k bloat |
| Skill Complexity | Total Tokens | Typical Load | Progressive Load |
|---|---|---|---|
| Simple | 5k-10k | 3k-4k | SKILL.md only |
| Medium | 15k-30k | 4k-6k | SKILL.md + 1 reference |
| Complex | 40k-60k | 6k-10k | SKILL.md + 2 references |
| Usage % | Status | Action |
|---|---|---|
| <50% | ✅ Healthy | Normal operation |
| 50-70% | ⚠️ Monitor | Check periodically, plan optimization |
| 70-80% | ⚠️ Optimize | Reduce context load soon |
| >80% | ❌ Critical | Immediate optimization needed (auto-compact triggers) |
Is context >70%?
├─ Yes → Reduce immediately (Operation 2)
│ ├─ Clear stale tool results
│ ├─ Unload unnecessary files
│ └─ Start fresh session if needed
│
└─ No → Preventive optimization
├─ Is data >5k tokens? → File-based (Operation 5)
├─ Building skills? → Progressive disclosure (Operation 3)
└─ Long session? → Consider planning/execution split (Operation 4)
Immediate (When context >80%):
1. Check usage: /context command
2. Clear old tool results (context editing helps)
3. Start fresh session with essentials only
4. Load plan docs, not exploration history
Preventive (During development):
1. Design skills with progressive disclosure
2. Use file-based for large data (>5k tokens)
3. Monitor context every 30-60 min
4. Separate planning from execution (for complex work)
# Monitor context usage
/context
# Read specific lines (not full file)
Read file_path --limit 50
# Search without loading (uses Grep, more efficient)
Grep "pattern" path/
# Find files without loading content
Glob "*.py" path/
# Externalize large data
Bash: command > /tmp/output.txt
# Then reference: "See /tmp/output.txt for results"
Quick Estimation:
- Plain English prose: ~4 characters (~0.75 words) per token
- Markdown skill files: ~3-4 tokens per line (e.g., 900 lines ≈ 3,000 tokens)
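When an estimate matters, measure instead of guessing; this one-liner uses the ~4-characters-per-token rule of thumb (actual tokenization varies by content).

```bash
# Rough token estimate for any file: bytes / 4.
estimate_tokens() { echo "~$(( $(wc -c < "$1") / 4 )) tokens -- $1"; }
estimate_tokens CLAUDE.md
```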
context-engineering helps you maximize Claude Code effectiveness through strategic context management, ensuring optimal performance, quality, and cost-efficiency throughout development.