Evaluator-Optimizer pattern knowledge for automatic iteration cycles. Implements Anthropic's agent architecture pattern for continuous improvement. Triggers: evaluator-optimizer, iteration pattern, 평가-최적화, 評価最適化, 评估优化
/plugin marketplace add popup-studio-ai/bkit-claude-code
/plugin install bkit@bkit-marketplace
The Evaluator-Optimizer pattern is one of five key agentic patterns identified by Anthropic for building effective AI systems. It creates a feedback loop between generation and evaluation to iteratively improve output quality.
```mermaid
flowchart LR
    subgraph Pattern["Evaluator-Optimizer Pattern"]
        direction LR
        Input["Input"]
        Gen["Generator"]
        Output["Output"]
        Eval["Evaluator"]
        Decision["Decision"]
        Input --> Gen
        Gen --> Output
        Output --> Eval
        Eval --> Decision
        Eval -.->|"Feedback<br/>(if not pass)"| Gen
    end
    style Input fill:#95a5a6,color:#fff
    style Gen fill:#4a90d9,color:#fff
    style Output fill:#50c878,color:#fff
    style Eval fill:#d94a4a,color:#fff
    style Decision fill:#9b59b6,color:#fff
```
The Generator creates the initial output and applies improvements in later iterations; a minimal interface sketch follows the tool list below.
Responsibilities:
- Generate initial implementation
- Apply fixes based on evaluator feedback
- Refactor code based on suggestions
- Create missing components
Tools Used:
- Write: Create new files
- Edit: Modify existing files
- Bash: Run generators, build tools
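As a rough illustration, the Generator's role can be modeled as a single call that takes the original input plus optional evaluator feedback. The `Generator` interface and `Feedback` type below are hypothetical, not part of any existing API.

```typescript
// Hypothetical shape of a generator step: the first call produces the initial
// implementation, later calls apply fixes driven by evaluator feedback.
interface Feedback {
  issues: string[];        // findings reported by the evaluator
  suggestions: string[];   // concrete fixes to apply
}

interface Generator {
  generate(input: string, feedback?: Feedback): Promise<string>;
}
```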
The Evaluator assesses output quality against defined criteria; a sketch of the result shape follows the evaluator types below.
Responsibilities:
- Analyze output against criteria
- Identify gaps and issues
- Score quality metrics
- Generate improvement suggestions
Evaluator Types:
1. Gap Evaluator (gap-detector)
- Compares design vs implementation
- Measures API/model/component match rate
2. Quality Evaluator (code-analyzer)
- Checks code complexity
- Finds security issues
- Detects code smells
3. Functional Evaluator (qa-monitor)
- Validates via log analysis
- Checks error handling
- Verifies expected behavior
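Under the same illustrative assumptions, all three evaluator types can share one result shape. The fields below mirror the feedback format shown later in this section; the interface itself is hypothetical.

```typescript
// Hypothetical evaluator contract shared by gap, quality, and functional evaluators.
type Severity = "critical" | "warning" | "info";

interface Issue {
  type: "gap" | "quality" | "functional";
  severity: Severity;
  location: string;      // e.g. "src/api/auth.ts:45"
  message: string;
  suggestion: string;
}

interface EvaluationResult {
  score: number;         // 0-100
  issues: Issue[];
}

interface Evaluator {
  name: string;          // "gap-detector", "code-analyzer", "qa-monitor"
  evaluate(output: string): Promise<EvaluationResult>;
}
```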
The feedback loop connects evaluator output to generator input.
Feedback Content:
- List of issues found
- Priority ranking (Critical > Warning > Info)
- Specific file:line locations
- Suggested fixes
- Score improvement needed
Feedback Format:
```json
{
  "iteration": 2,
  "score": 78,
  "target": 90,
  "issues": [
    {
      "type": "gap",
      "severity": "critical",
      "location": "src/api/auth.ts:45",
      "message": "Missing error handler for INVALID_TOKEN",
      "suggestion": "Add catch block with proper error response"
    }
  ]
}
```
The stop conditions determine when the loop should end; a sketch of the check follows the threshold defaults below.
Success Conditions:
- All quality thresholds met
- No critical issues remaining
- Score >= target percentage
Failure Conditions:
- Maximum iterations reached
- No improvement for N iterations
- Unfixable issues detected
Configurable Thresholds:
- gap_match_rate: 90% (default)
- quality_score: 80% (default)
- max_iterations: 5 (default)
- no_improvement_limit: 3 (default)
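A minimal sketch of the stop check, assuming the default thresholds above and a loop that records its score history. All names are illustrative.

```typescript
// Sketch of a stop-condition check using the default thresholds listed above.
interface LoopState {
  iteration: number;
  scoreHistory: number[];       // most recent score last
  criticalIssues: number;
}

interface StopConfig {
  targetScore: number;          // e.g. 90
  maxIterations: number;        // default 5
  noImprovementLimit: number;   // default 3
}

function shouldStop(state: LoopState, cfg: StopConfig): "pass" | "fail" | "continue" {
  const score = state.scoreHistory[state.scoreHistory.length - 1] ?? 0;

  // Success: threshold met and no critical issues remaining.
  if (score >= cfg.targetScore && state.criticalIssues === 0) return "pass";

  // Failure: iteration budget exhausted.
  if (state.iteration >= cfg.maxIterations) return "fail";

  // Failure: no improvement over the last N iterations.
  const window = state.scoreHistory.slice(-cfg.noImprovementLimit - 1);
  if (
    window.length > cfg.noImprovementLimit &&
    window.every((s, i) => i === 0 || s <= window[0])
  ) {
    return "fail";
  }

  return "continue";
}
```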
Use Case: Single evaluator, straightforward fixes
Flow:
1. Generate initial output
2. Evaluate with single criteria
3. If fail, apply fix and re-evaluate
4. Repeat until pass or max iterations
Example:
- Code style fixes (linting)
- Type error corrections
- Simple refactoring
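A minimal sketch of this simple loop, assuming hypothetical `generate` and `evaluate` functions supplied by the caller.

```typescript
// Simple single-evaluator loop (hypothetical generate/evaluate functions).
async function simpleLoop(
  input: string,
  generate: (input: string, feedback?: string[]) => Promise<string>,
  evaluate: (output: string) => Promise<{ score: number; issues: string[] }>,
  target = 90,
  maxIterations = 5,
): Promise<string> {
  let output = await generate(input);

  for (let i = 1; i <= maxIterations; i++) {
    const result = await evaluate(output);
    if (result.score >= target) return output;       // pass: stop iterating
    output = await generate(input, result.issues);    // feed issues back to the generator
  }
  return output;                                      // best effort after max iterations
}
```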
Use Case: Complex quality requirements
Flow:
1. Generate initial output
2. Run Evaluator 1 (gap analysis)
3. Run Evaluator 2 (quality check)
4. Run Evaluator 3 (functional test)
5. Aggregate scores and issues
6. Apply fixes prioritized by severity
7. Re-evaluate all
Example:
- Feature implementation
- API development
- Component creation
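A sketch of step 5 (aggregating scores and issues), under the assumption that each evaluator carries a weight. The interface is hypothetical; weights are assumed to sum to 1.0.

```typescript
// Sketch of aggregating multiple evaluators into one weighted score.
interface WeightedEvaluator {
  name: string;   // e.g. "gap-detector"
  weight: number; // e.g. 0.4
  evaluate(output: string): Promise<{ score: number; issues: string[] }>;
}

async function aggregate(output: string, evaluators: WeightedEvaluator[]) {
  const results = await Promise.all(
    evaluators.map(async (e) => {
      const { score, issues } = await e.evaluate(output);
      return { name: e.name, weight: e.weight, score, issues };
    }),
  );
  const score = results.reduce((sum, r) => sum + r.score * r.weight, 0);
  const issues = results.flatMap((r) => r.issues.map((msg) => `[${r.name}] ${msg}`));
  return { score, issues }; // issues are then prioritized by severity before fixing
}
```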
Use Case: Large-scale improvements
Flow:
1. Stage 1: Fix critical issues only
2. Stage 2: Fix warning-level issues
3. Stage 3: Apply optimizations
4. Each stage has its own iteration limit
Example:
- Legacy code modernization
- Security hardening
- Performance optimization
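One way to express the stages is as plain data, each with its own severity scope and iteration budget. The names and limits below are illustrative.

```typescript
// Illustrative stage definitions for the staged improvement pattern.
interface Stage {
  name: string;
  severities: Array<"critical" | "warning" | "info">; // which issues this stage touches
  maxIterations: number;                              // each stage has its own budget
}

const stages: Stage[] = [
  { name: "Fix critical issues", severities: ["critical"], maxIterations: 5 },
  { name: "Fix warnings",        severities: ["warning"],  maxIterations: 3 },
  { name: "Apply optimizations", severities: ["info"],     maxIterations: 2 },
];
```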
```mermaid
flowchart TB
    subgraph PDCA["PDCA + Evaluator-Optimizer"]
        direction TB
        subgraph Cycle["PDCA Cycle"]
            direction LR
            Plan["Plan"]
            Design["Design"]
            Do["Do"]
            Check["Check"]
            Act["Act"]
            Plan --> Design --> Do --> Check
            Check <--> Act
        end
        subgraph EO["Evaluator-Optimizer Loop"]
            Iterate["Iterate until<br/>quality met"]
        end
        Do --> EO
        Check --> EO
        EO --> Iterate
    end
    style Plan fill:#3498db,color:#fff
    style Design fill:#9b59b6,color:#fff
    style Do fill:#27ae60,color:#fff
    style Check fill:#e74c3c,color:#fff
    style Act fill:#f39c12,color:#fff
    style Iterate fill:#1abc9c,color:#fff
```
```yaml
criteria:
  api_endpoints:
    match_rate: 90%
    weight: 30%
  data_models:
    match_rate: 90%
    weight: 30%
  components:
    match_rate: 85%
    weight: 20%
  error_handling:
    coverage: 80%
    weight: 20%
```
```yaml
criteria:
  security:
    critical_issues: 0
    weight: 40%
  complexity:
    max_per_function: 15
    weight: 20%
  duplication:
    max_lines: 10
    weight: 20%
  maintainability:
    score: 70
    weight: 20%
```
```yaml
criteria:
  error_logs:
    count: 0
    weight: 40%
  success_logs:
    coverage: 100%
    weight: 30%
  response_time:
    p95_ms: 500
    weight: 30%
```
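As a worked example of how the weights combine, the composite score below applies the functional-evaluator weights above to hypothetical per-criterion scores.

```typescript
// Worked example: combining per-criterion scores with the weights above.
// The individual scores are hypothetical, not measured values.
const scores  = { error_logs: 100, success_logs: 90, response_time: 80 };
const weights = { error_logs: 0.4, success_logs: 0.3, response_time: 0.3 };

const composite = Object.entries(scores).reduce(
  (sum, [key, score]) => sum + score * weights[key as keyof typeof weights],
  0,
); // 100*0.4 + 90*0.3 + 80*0.3 = 91
```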
BAD:
"Make the code better"
GOOD:
"Reduce function complexity to <= 10"
"Achieve 90% test coverage"
"Fix all security issues with severity >= high"
Order of fixing:
1. Critical security vulnerabilities
2. Functional bugs (broken behavior)
3. Design-implementation gaps
4. Code quality issues
5. Performance optimizations
6. Style improvements
Per iteration, fix:
- Maximum 5 issues
- Only same-severity issues
- Related issues together
Rationale:
- Easier to verify improvements
- Faster feedback cycles
- Reduces regression risk
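A sketch of batch selection that respects both the priority order and the per-iteration limit; the `Issue` type and severity ranking are illustrative.

```typescript
// Select the batch of fixes for one iteration: highest-severity issues first,
// same severity only, at most five per iteration.
type Severity = "critical" | "warning" | "info";
const rank: Record<Severity, number> = { critical: 0, warning: 1, info: 2 };

interface Issue { severity: Severity; location: string; message: string }

function selectBatch(issues: Issue[], maxPerIteration = 5): Issue[] {
  if (issues.length === 0) return [];
  const sorted = [...issues].sort((a, b) => rank[a.severity] - rank[b.severity]);
  const top = sorted[0].severity;                        // most severe level present
  return sorted.filter((i) => i.severity === top).slice(0, maxPerIteration);
}
```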
Maintain iteration log:
- Score history
- Issues fixed per iteration
- Time per iteration
- Files modified
Use for:
- Detecting stuck iterations
- Identifying improvement trends
- Post-mortem analysis
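A minimal sketch of an iteration-log entry matching the fields above; names are illustrative.

```typescript
// Illustrative iteration-log entry; append one per iteration and inspect the
// score trend afterwards to spot stuck loops or regressions.
interface IterationRecord {
  iteration: number;
  score: number;
  issuesFixed: string[];
  durationMs: number;
  filesModified: string[];
}

const iterationLog: IterationRecord[] = [];
```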
Causes:
- Criteria too strict
- Issues require human judgment
- Circular dependencies
Solutions:
- Relax thresholds temporarily
- Mark issues as "requires-human"
- Break circular dependencies manually
Causes:
- Fixes create new issues
- Conflicting criteria
Solutions:
- Apply fixes in isolation
- Prioritize criteria order
- Use staged improvement pattern
Causes:
- Too many files analyzed
- Complex evaluations
- Large codebase
Solutions:
- Scope to specific feature/folder
- Cache evaluation results
- Use incremental evaluation
From Anthropic's Agent Patterns:
1. Prompt Chaining - Sequential processing
2. Routing - Direct to appropriate handler
3. Parallelization - Concurrent processing
4. Orchestrator-Workers - Task delegation
5. Evaluator-Optimizer - Quality iteration ← THIS PATTERN