From agent-patterns
Design and run a self-improving evaluator/optimizer loop. Use this skill when asked to "set up an eval loop", "build an optimizer", "improve this output iteratively", or "create a generate-evaluate-improve cycle" for any agent output.
```
npx claudepluginhub ats-kinoshita-iso/agent-workshop --plugin agent-patterns
```

This skill uses the workspace's default tool permissions.
Design and execute a generate -> evaluate -> improve -> repeat loop that iteratively refines an agent output until a quality threshold is met.
```
[Generator] --> output --> [Evaluator] --> score + feedback
                               |
               score >= threshold? --> YES --> done
                               |
                               NO
                               |
                [Optimizer] --> improved prompt
                               |
                [Generator] --> new output
```
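The control flow above can be sketched as a small Python function. This is a minimal skeleton, not a definitive implementation: `generate`, `evaluate`, and `improve` are hypothetical stand-ins for real model calls, and the threshold and iteration cap are illustrative values.

```python
# Minimal evaluator/optimizer loop skeleton. The generate, evaluate, and
# improve callables are hypothetical stand-ins for real model calls.

THRESHOLD = 4.0      # minimum passing score (out of 5.0); illustrative
MAX_ITERATIONS = 5   # hard stop to bound cost; illustrative

def run_loop(prompt, generate, evaluate, improve):
    """Run generate -> evaluate -> improve until the threshold is met
    or the iteration budget runs out; return the best output seen."""
    best_output, best_score = None, float("-inf")
    for i in range(1, MAX_ITERATIONS + 1):
        output = generate(prompt)
        result = evaluate(output)        # {"score": ..., "issues": [...], ...}
        if result["score"] > best_score:
            best_output, best_score = output, result["score"]
        if result["score"] >= THRESHOLD:
            return best_output, best_score, i    # passed the threshold
        prompt = improve(prompt, result)         # optimizer step
    return best_output, best_score, MAX_ITERATIONS  # best effort
```

Returning the best output seen (rather than the last one) matters when the cap is hit: a later iteration can regress, and the loop should not discard an earlier, higher-scoring result.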
Before starting the loop, specify:

- the quality threshold the output must reach
- the maximum number of iterations
- the evaluation criteria (e.g. /agent-review criteria)

Define the initial generation prompt:
```
System: <system prompt for the generator>
User: <task description>
```
The generator should produce a single, structured output per run. Avoid generating lists of alternatives -- the evaluator handles iteration.
The evaluator scores the generator's output and produces structured feedback:
```json
{
  "score": 3.5,
  "passed": false,
  "issues": [
    "Issue description 1",
    "Issue description 2"
  ],
  "suggestions": [
    "Specific improvement 1",
    "Specific improvement 2"
  ]
}
```
The evaluator must be deterministic about scoring criteria -- define them explicitly before starting the loop.
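One way to make scoring deterministic is to express each criterion as an explicit check function defined before the loop starts. The sketch below assumes a text output and four illustrative criteria (`has_title`, `under_limit`, `no_todo`, `has_example`); real criteria would come from the task at hand.

```python
# Deterministic evaluator sketch: the scoring criteria are explicit check
# functions fixed up front. Criterion names here are illustrative assumptions.

THRESHOLD = 4.0  # minimum passing score (out of 5.0); illustrative

CRITERIA = {
    "has_title":   lambda text: text.strip().startswith("#"),
    "under_limit": lambda text: len(text) <= 2000,
    "no_todo":     lambda text: "TODO" not in text,
    "has_example": lambda text: "```" in text,
}

def evaluate(text):
    """Score out of 5.0, proportional to criteria passed; report failures
    as structured issues and suggestions."""
    passed = [name for name, check in CRITERIA.items() if check(text)]
    failed = [name for name in CRITERIA if name not in passed]
    score = round(5.0 * len(passed) / len(CRITERIA), 1)
    return {
        "score": score,
        "passed": score >= THRESHOLD,
        "issues": [f"Failed criterion: {name}" for name in failed],
        "suggestions": [f"Revise the output to satisfy '{name}'" for name in failed],
    }
```

Because the checks are pure functions of the output, the same output always gets the same score, which keeps the loop's stopping condition stable.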
The optimizer receives the current prompt and evaluator feedback, and produces an improved prompt for the next generator run:
```
Previous prompt: <current generator prompt>
Evaluator score: <score>
Issues found: <list of issues>
Suggestions: <list of suggestions>

Produce an improved generator prompt that addresses these issues.
```
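Filling that template mechanically from the evaluator's JSON keeps the optimizer's input consistent across iterations. A small sketch, with the actual model call omitted (only the prompt construction is shown):

```python
# Build the optimizer's input from the current prompt and the evaluator's
# structured feedback. Sending this to a model is left out of the sketch.

OPTIMIZER_TEMPLATE = """\
Previous prompt: {prompt}
Evaluator score: {score}
Issues found:
{issues}
Suggestions:
{suggestions}

Produce an improved generator prompt that addresses these issues."""

def bullets(items):
    """Render a list as markdown bullets; mark an empty list explicitly."""
    return "\n".join(f"- {item}" for item in items) or "- none"

def build_optimizer_prompt(prompt, result):
    return OPTIMIZER_TEMPLATE.format(
        prompt=prompt,
        score=result["score"],
        issues=bullets(result["issues"]),
        suggestions=bullets(result["suggestions"]),
    )
```

Rendering issues and suggestions as bullets (rather than inline lists) tends to make each point easier for the optimizer to address one by one.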
Execute iterations until the threshold is met or the maximum number of iterations is reached:

```
Iteration 1:
  Generator output: <output>
  Evaluator score: <score> / 5.0
  Passed threshold: Yes/No
  Issues: <issues if any>

Iteration 2 (if needed):
  Improved prompt: <what changed>
  Generator output: <new output>
  Evaluator score: <new score>
  Passed threshold: Yes/No

Final result: <the output that passed, or the best output if max iterations reached>
```
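A driver that records this per-iteration log is a straightforward extension of the loop. As before, `generate`, `evaluate`, and `improve` are hypothetical model-call stand-ins, and the defaults are illustrative:

```python
# Loop driver that records a structured per-iteration log and returns the
# best-scoring output. generate/evaluate/improve are hypothetical stand-ins.

def run_with_log(prompt, generate, evaluate, improve,
                 threshold=4.0, max_iterations=5):
    log = []
    for i in range(1, max_iterations + 1):
        output = generate(prompt)
        result = evaluate(output)
        log.append({
            "iteration": i,
            "output": output,
            "score": result["score"],
            "passed": result["score"] >= threshold,
            "issues": result.get("issues", []),
        })
        if result["score"] >= threshold:
            break                      # threshold met: stop early
        prompt = improve(prompt, result)
    # Final result: the output that passed, or the best one if the cap was hit.
    final = max(log, key=lambda entry: entry["score"])
    return final["output"], log
```

The log doubles as the report shown above: each entry maps directly onto one `Iteration N:` block, and the returned output is the `Final result` line.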
After the loop completes, provide a summary: the final output, the score for each iteration, and any issues that remained unresolved.