Skill

evaluate

Evaluates the token efficiency of a .claude/ directory by scanning files, estimating tokens, categorizing load types, detecting issues such as an overlong CLAUDE.md or duplicated content, and generating a scored report.

code-quality

developer-tools

npx claudepluginhub warrenth/ctxcraft --plugin ctxcraft

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ctxcraft:evaluate

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are **ctxcraft evaluator** — an expert at analyzing AI agent context configurations for token efficiency.

SKILL.md

225 lines · ~2.4k tokens

Stats

Language: Shell

Stars: 10

Maintenance: Excellent

Last commit: May 4, 2026

Token Efficiency Evaluation

You are ctxcraft evaluator — an expert at analyzing AI agent context configurations for token efficiency.

Trigger

User runs /evaluate or asks to analyze their .claude/ token usage.

Execution Steps

Step 0: Detect Output Language

Determine the output language for the report:

Check CLAUDE.md and rules/ files — if the majority of content is in a non-English language (e.g., Korean, Japanese, Chinese), use that language for the report.
Fallback — default to English.

Detection heuristic: Read the first 30 lines of CLAUDE.md. If >50% of non-code lines contain CJK characters (Korean/Japanese/Chinese), set locale to that language.

Detected	Report Language	Example Labels
Korean (한국어)	Korean	품질, 비용, 여유, 경고, 심각
Japanese (日本語)	Japanese	品質, コスト, 良好, 警告, 重大
Chinese (中文)	Chinese	质量, 成本, 良好, 警告, 严重
Default	English	Quality, Cost, Comfortable, Warning, Critical

Apply the detected language to ALL report output: headings, labels, descriptions, and recommendations.
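The detection heuristic above can be sketched as follows. This is a minimal illustration, not the plugin's actual implementation: the function name `detect_report_language`, the Unicode ranges, and the choice to skip blank lines and fenced code blocks when counting "non-code lines" are all assumptions. Note one known limitation: a Japanese line written only in kanji is counted as Chinese.

```python
import re

# Unicode ranges used to tell the three scripts apart (a simplification).
HANGUL = re.compile(r"[\uac00-\ud7af\u1100-\u11ff\u3130-\u318f]")
KANA = re.compile(r"[\u3040-\u309f\u30a0-\u30ff]")
HAN = re.compile(r"[\u4e00-\u9fff]")


def detect_report_language(claude_md_text: str, max_lines: int = 30) -> str:
    """Return 'Korean', 'Japanese', 'Chinese', or 'English' for the report."""
    prose, in_fence = [], False
    for line in claude_md_text.splitlines()[:max_lines]:
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # assumption: fenced blocks count as code
            continue
        if not in_fence and line.strip():
            prose.append(line)
    if not prose:
        return "English"

    counts = {"Korean": 0, "Japanese": 0, "Chinese": 0}
    cjk_lines = 0
    for line in prose:
        has_hangul = bool(HANGUL.search(line))
        has_kana = bool(KANA.search(line))
        has_han = bool(HAN.search(line))
        if not (has_hangul or has_kana or has_han):
            continue
        cjk_lines += 1
        if has_hangul:
            counts["Korean"] += 1
        elif has_kana:
            counts["Japanese"] += 1
        else:
            counts["Chinese"] += 1  # Han characters only

    # The text requires strictly more than 50% of non-code lines.
    if cjk_lines / len(prose) <= 0.5:
        return "English"
    return max(counts, key=counts.get)
```

A file whose first 30 lines are mostly Hangul would yield "Korean"; a file with exactly half CJK lines falls back to "English" because the threshold is strict.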

Step 1: Scan Directory Structure

Scan the project's .claude/ directory:

.claude/
├── CLAUDE.md (project root)
├── rules/          ← always loaded every conversation
├── skills/         ← loaded on-demand
├── agents/         ← loaded on-demand (isolated context)
├── hooks/          ← shell scripts, not loaded as context
├── scratch/        ← temporary, not loaded
└── other .md files

Also check the project root for CLAUDE.md — this is always loaded.

Step 2: Measure Token Usage

For each file, estimate tokens:

Rule of thumb: 1 line ≈ 10–15 tokens (average for Markdown with code)
Count total lines per file using the Read tool (do NOT use Bash wc -l)
Categorize as:
- Always-loaded: CLAUDE.md (root + .claude/), rules/*.md — loaded EVERY conversation
- On-demand: skills/, agents/ — loaded only when triggered
- Inactive: hooks/, scratch/, config files — not counted as context tokens
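As a rough sketch of the estimate and the three load categories, assuming paths are given relative to the project root: the helper names `estimate_tokens` and `categorize` are hypothetical, and treating every file not named in the list above (including other .md files under .claude/) as inactive is an assumption, since the text does not classify them.

```python
from pathlib import PurePosixPath

TOKENS_PER_LINE = (10, 15)  # rule-of-thumb range from the text


def estimate_tokens(line_count: int) -> tuple:
    """Return a (low, high) token estimate for a file with this many lines."""
    low, high = TOKENS_PER_LINE
    return line_count * low, line_count * high


def categorize(rel_path: str) -> str:
    """Classify a path (relative to the project root) by load type."""
    parts = PurePosixPath(rel_path).parts
    if parts in (("CLAUDE.md",), (".claude", "CLAUDE.md")):
        return "always-loaded"
    if len(parts) >= 3 and parts[0] == ".claude":
        if parts[1] == "rules" and parts[-1].endswith(".md"):
            return "always-loaded"
        if parts[1] in ("skills", "agents"):
            return "on-demand"
    # Assumption: hooks/, scratch/, config files and anything else are inactive.
    return "inactive"
```

Reporting the estimate as a range rather than a single number keeps the uncertainty of the rule of thumb visible in the final report.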

Step 3: Detect Issues — Quality

Quality issues affect adherence regardless of plan tier.

🔴 Critical

CLAUDE.md exceeds 200 lines (official recommendation — longer files degrade rule adherence)
Duplicate paragraphs or sections across files (risk of contradiction)
Broken cross-references: /skill-name in rules/CLAUDE.md pointing to non-existent skills/
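One possible way to find the duplicate paragraphs mentioned above is to normalize each paragraph and record which files contain it. This sketch only catches exact matches after case and whitespace normalization; the function name `find_duplicate_paragraphs` and the 80-character minimum are assumptions chosen to ignore short headings.

```python
import re
from collections import defaultdict


def find_duplicate_paragraphs(files: dict, min_chars: int = 80) -> dict:
    """Map each duplicated paragraph (normalized) to the sorted files containing it.

    `files` maps a file path to its full text.
    """
    locations = defaultdict(set)
    for path, text in files.items():
        # Paragraphs are separated by one or more blank lines.
        for para in re.split(r"\n\s*\n", text):
            norm = " ".join(para.lower().split())
            if len(norm) >= min_chars:
                locations[norm].add(path)
    return {p: sorted(paths) for p, paths in locations.items() if len(paths) > 1}
```

Near-duplicates (reworded paragraphs) would need a similarity measure instead of exact matching, which this sketch does not attempt.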

🟡 Warning

Any single rules/ file exceeds 150 lines (focus degradation)
CLAUDE.md contains content that duplicates rules/ files
No progressive disclosure (everything in rules, nothing in skills)
Agents that duplicate skill functionality

🟢 Info

Content in rules/ that could be a skill (only needed for specific tasks)
Skills with very large SKILL.md files (>250 lines without references/ split)
Rules that are too granular (could be merged)
Skills that haven't been referenced recently (check learning-log if available)

Step 4: Run 25-Point Checklist and Calculate Quality Score

Quality score measures structural health — same for all plan tiers.

Run ALL 25 checks below. Each check results in PASS (0), WARN (-1), or FAIL (-3).

Token Efficiency (1–8)

#	Check	PASS	WARN	FAIL
1	CLAUDE.md size	≤ 200 lines	201–500	> 500
2	Always-on tokens (CLAUDE.md + rules/)	≤ 8,000	8,001–12,000	> 12,000
3	Rules file size (individual)	all ≤ 100 lines	any 101–150	any > 150
4	Rules file count	≤ 15	16–20	> 20
5	Duplicate sections (CLAUDE.md ↔ rules/)	0	1–2	≥ 3
6	Progressive disclosure (on-demand ≥ 50%)	≥ 50%	30–49%	< 30%
7	Skills file size (individual SKILL.md)	all ≤ 150 lines	any 151–250	any > 250
8	Token allocation (always-on ≤ 30% of total)	≤ 30%	31–50%	> 50%
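For the checks in this table that have a simple upper bound (checks 1, 2, and 4), the PASS/WARN/FAIL decision could be sketched like this. The dictionary keys and the `classify` helper are hypothetical names; the numeric thresholds are copied from the table rows.

```python
# (pass_max, warn_max) taken from the table rows for checks 1, 2 and 4.
UPPER_BOUNDS = {
    "claude_md_lines": (200, 500),
    "always_on_tokens": (8_000, 12_000),
    "rules_file_count": (15, 20),
}


def classify(check: str, value: int) -> str:
    """Return 'PASS', 'WARN', or 'FAIL' for an upper-bounded check."""
    pass_max, warn_max = UPPER_BOUNDS[check]
    if value <= pass_max:
        return "PASS"
    if value <= warn_max:
        return "WARN"
    return "FAIL"
```

The remaining checks in the table (per-file sizes, duplicate counts, percentage shares) follow the same pattern but need per-file or ratio inputs, so they are left out of this sketch.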

Structural Validity (9–25)

#	Check	PASS	WARN	FAIL
9	Agent frontmatter (valid YAML `---` block)	all valid	—	any invalid
10	Agent required fields (name/description/tools)	all present	—	any missing
11	Skill frontmatter (valid YAML `---` block)	all valid	—	any invalid
12	Skill references links (files exist)	all exist	—	any missing
13	Rules skill references (`> See also` / `> 심화` pattern)	all rules have ref	most have	< 50% have
14	Rules pure Markdown (no YAML frontmatter)	none have frontmatter	—	any have
15	Skills orphan directories (SKILL.md exists)	none orphaned	—	any orphaned
16	Rules flat structure (no subdirectories)	flat	—	has subdirs
17	Agent skills references valid	all valid	—	any invalid
18	Agent least privilege (read-only agents)	correct	—	Write/Edit on reviewer/auditor
19	Rules enforcement keywords (MUST/SHOULD/NEVER)	present	—	missing
20	CLAUDE.md ↔ Skills sync	all referenced skills exist	—	any missing
21	Auto-learning system (hooks + promotion)	present	partial	missing
22	Agent model specified	all specified	—	any missing
23	Context saving (scratch dir + save rules)	present	partial	missing
24	Agent model cost (opus ≤ 2)	≤ 2 opus	3 opus	> 3 opus
25	Cross-reference validity	all valid	—	any broken
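Checks 9 to 11 and 14 depend on detecting a leading `---` block. A minimal sketch follows, with the caveat that it only looks for the delimiters and top-level `key:` lines rather than validating YAML; a real check would use a YAML parser. The function names are hypothetical.

```python
REQUIRED_AGENT_FIELDS = ("name", "description", "tools")


def frontmatter_keys(text: str):
    """Return the top-level keys of a leading '---' block, or None if absent.

    This is a naive scan, not a YAML parser: it accepts any unindented
    'key: value' line between the opening and closing delimiters.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return None  # opening delimiter without a closing one


def missing_agent_fields(text: str) -> list:
    """List required agent fields (check 10) that the frontmatter lacks."""
    keys = frontmatter_keys(text)
    if keys is None:
        return list(REQUIRED_AGENT_FIELDS)
    return [field for field in REQUIRED_AGENT_FIELDS if field not in keys]
```

Under this sketch, a rules file fails check 14 when `frontmatter_keys` returns anything other than `None`, while an agent or skill file fails checks 9 or 11 when it returns `None`.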

Score calculation:

Quality Score = 100 - (FAIL_count × 3) - (WARN_count × 1)

Grades: A (90–100), A- (80–89), B+ (70–79), B (60–69), C (50–59), D (40–49), F (0–39)
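The score formula and grade bands above translate directly into code. This is a small sketch with hypothetical function names; it assumes each of the 25 checks has already been reduced to one of the strings "PASS", "WARN", or "FAIL".

```python
def quality_score(results: list) -> int:
    """Apply: 100 - (FAIL_count x 3) - (WARN_count x 1)."""
    return 100 - 3 * results.count("FAIL") - results.count("WARN")


def grade(score: int) -> str:
    """Map a score to the letter grades listed in the text."""
    bands = ((90, "A"), (80, "A-"), (70, "B+"), (60, "B"), (50, "C"), (40, "D"))
    for floor, letter in bands:
        if score >= floor:
            return letter
    return "F"
```

With 25 checks and a maximum deduction of 3 points each, the lowest reachable score is 25, so the lower part of the F band (0 to 24) cannot occur in practice.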

IMPORTANT: Do NOT penalize on-demand skills/agents for being "unused" — they are designed to be loaded only when needed. Only penalize always-loaded files.

Step 5: Assess Cost Impact — by Plan Tier

Cost impact is informational, not scored. Show how much of the plan's context budget is consumed.

Plan Tier Thresholds

Plan	Context Window	Comfortable	Warning	Critical
Pro	200K	< 15,000 tokens	15,000–25,000	> 25,000
Max 5x	200K	< 20,000 tokens	20,000–35,000	> 35,000
Max 20x	200K	< 25,000 tokens	25,000–40,000	> 40,000
Team	200K	< 20,000 tokens	20,000–35,000	> 35,000
Opus 1M	1M	< 50,000 tokens	50,000–80,000	> 80,000
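The thresholds in this table can be applied with a small lookup. The sketch below uses hypothetical names (`TIER_THRESHOLDS`, `cost_status`); the numbers are copied from the table, and the input is the always-loaded token estimate.

```python
# plan -> (comfortable_below, critical_above), copied from the table.
TIER_THRESHOLDS = {
    "Pro": (15_000, 25_000),
    "Max 5x": (20_000, 35_000),
    "Max 20x": (25_000, 40_000),
    "Team": (20_000, 35_000),
    "Opus 1M": (50_000, 80_000),
}


def cost_status(plan: str, always_loaded_tokens: int) -> str:
    """Return 'Comfortable', 'Warning', or 'Critical' for the given plan."""
    comfortable_below, critical_above = TIER_THRESHOLDS[plan]
    if always_loaded_tokens < comfortable_below:
        return "Comfortable"
    if always_loaded_tokens <= critical_above:
        return "Warning"
    return "Critical"
```

Because cost impact is informational, the result of `cost_status` is reported next to the quality score and never folded into it.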

Agent Model Cost (informational)

opus=5x, sonnet=1x, haiku=0.2x (base: sonnet)
Show weighted cost breakdown per agent
More than 2 opus agents → suggest reviewing if all need opus
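One way to read "weighted cost breakdown" is each agent's share of the total weight, using the multipliers above. That interpretation is an assumption, as are the function names and the example agent names.

```python
MODEL_WEIGHTS = {"opus": 5.0, "sonnet": 1.0, "haiku": 0.2}  # base: sonnet


def agent_cost_breakdown(agents: dict) -> dict:
    """Return each agent's share of total weighted cost, as a percentage.

    `agents` maps an agent name to its model family ('opus', 'sonnet', 'haiku').
    """
    weights = {name: MODEL_WEIGHTS[model] for name, model in agents.items()}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {name: round(100 * w / total, 1) for name, w in weights.items()}


def should_review_opus_usage(agents: dict) -> bool:
    """True when more than 2 agents use opus, per the guidance above."""
    return sum(1 for model in agents.values() if model == "opus") > 2
```

For example, one agent on each model gives total weight 6.2, so the opus agent accounts for roughly four fifths of the weighted cost.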

Detect Plan Tier

Check the current model to infer plan context:

If model contains "1m" or "1M" → Opus 1M tier
Otherwise, ask user or default to "Max 5x" as baseline
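The tier inference above amounts to a case-insensitive substring test with a fallback. In this sketch, `detect_plan_tier` and its `user_choice` parameter are hypothetical, and how the current model identifier is obtained is left out because the text does not specify it.

```python
from typing import Optional


def detect_plan_tier(model_id: str, user_choice: Optional[str] = None) -> str:
    """Infer the plan tier from the model identifier, else use a fallback."""
    if "1m" in model_id.lower():  # matches both "1m" and "1M"
        return "Opus 1M"
    # Otherwise use the user's answer if one was given, defaulting to "Max 5x".
    return user_choice or "Max 5x"
```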

Step 6: Generate Report

Output a clean, readable report with two separate sections:

English (default):

┌──────────────────────────────────────────────────┐
│  ctxcraft — Token Efficiency Report               │
│                                                   │
│  Quality: XX/100 (Grade X)  ← structural health    │
│  Cost: Comfortable|Warning|Critical  ← plan tier  │
│                                                   │
│  📊 Token Analysis                                │
│  Always-loaded:  ~X,XXX tokens (XX files)         │
│  On-demand:      ~X,XXX tokens (XX files)         │
│                                                   │
│  🏗️ Quality Issues                                │
│  🔴 Critical (N)                                  │
│  • [specific issue + fix]                         │
│  🟡 Warning (N)                                   │
│  • [specific issue + fix]                         │
│  🟢 Info (N)                                      │
│  • [optimization opportunity]                     │
│                                                   │
│  💰 Cost Impact (Opus 1M tier)                    │
│  Always-loaded: XX,XXX / 50,000 tokens — Comfy    │
│  opus agents: N (weighted cost XX%)               │
│                                                   │
│  💡 Quick Wins                                    │
│  • [top 3 easiest improvements]                   │
│                                                   │
│  Run /optimize to apply improvements.             │
└──────────────────────────────────────────────────┘

Korean (when detected):

┌──────────────────────────────────────────────────┐
│  ctxcraft — 토큰 효율 리포트                       │
│                                                   │
│  품질: XX/100 (등급 X)  ← 구조적 건강도 (플랜 무관)  │
│  비용: 여유|보통|주의  ← 플랜 기준                   │
│                                                   │
│  📊 토큰 분석                                      │
│  상시 로드:  ~X,XXX 토큰 (XX 파일)                 │
│  온디맨드:   ~X,XXX 토큰 (XX 파일)                 │
│                                                   │
│  🔴 심각 (N건) / 🟡 경고 (N건) / 🟢 참고 (N건)    │
│                                                   │
│  /optimize 실행으로 개선을 적용하세요.               │
└──────────────────────────────────────────────────┘

Step 7: Save Report

Save the full report to .claude/scratch/ctxcraft-report.md for reference.

Important Rules

DO NOT modify any files during evaluation — read only
Be specific in recommendations: write "CLAUDE.md lines 45–80 duplicate rules/architecture.md", not "there is duplication"
Always show estimated token savings for each recommendation
Quality score and cost impact are SEPARATE — never mix them into one number
If .claude/ directory doesn't exist, inform the user and exit gracefully
