From agent-patterns
Evaluate the output quality of an agent or pipeline run. Use this skill when asked to "review this agent output", "score this result", "evaluate agent quality", or "suggest improvements" to an agent's response or pipeline output.
Slash command: `/agent-patterns:agent-review`
Evaluate the quality of an agent or pipeline output against defined criteria. Produce a structured review with a numeric score and specific improvement suggestions.
Before scoring, identify which dimensions apply to this output:
| Dimension | Description | Applicable? |
|---|---|---|
| Correctness | Output matches the expected answer or solves the problem | Always |
| Completeness | All required sub-tasks or fields are addressed | Always |
| Format compliance | Output matches the required format (JSON, markdown, etc.) | If format specified |
| Conciseness | No unnecessary verbosity or repetition | Always |
| Safety | No harmful, biased, or policy-violating content | Always |
| Tool use quality | Tools called correctly with valid arguments | If tools were used |
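The applicability column can be read as a simple filter: four dimensions always apply, and two depend on facts about the task. The sketch below is illustrative only; `ReviewContext`, its two flags, and `applicable_dimensions` are hypothetical names, not part of the skill, and it assumes the reviewer has already decided whether a format was specified and whether tools were used.

```python
from dataclasses import dataclass

# Table rows in order: (dimension, condition flag, or None if always applicable).
DIMENSIONS = [
    ("Correctness", None),
    ("Completeness", None),
    ("Format compliance", "format_specified"),
    ("Conciseness", None),
    ("Safety", None),
    ("Tool use quality", "tools_used"),
]

@dataclass
class ReviewContext:
    format_specified: bool  # hypothetical flag: the task required e.g. JSON or markdown
    tools_used: bool        # hypothetical flag: the agent made at least one tool call

def applicable_dimensions(ctx: ReviewContext) -> list[str]:
    # Keep unconditional dimensions; keep conditional ones only when their flag is set.
    return [name for name, flag in DIMENSIONS if flag is None or getattr(ctx, flag)]

print(applicable_dimensions(ReviewContext(format_specified=True, tools_used=False)))
```

Keeping the dimensions in one ordered list preserves the table's order in the review output.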
Rate each applicable dimension on a scale of 1-5:
- 1 = Failing (major problems)
- 2 = Poor (significant issues)
- 3 = Acceptable (meets minimum bar)
- 4 = Good (minor issues only)
- 5 = Excellent (no issues)
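If ratings are recorded programmatically, one way to keep them on the 1-5 scale is an `IntEnum`, so out-of-range values are rejected while ratings still compare as integers. This is a minimal sketch; `Rating` and `parse_rating` are hypothetical names chosen for illustration.

```python
from enum import IntEnum

class Rating(IntEnum):
    FAILING = 1     # major problems
    POOR = 2        # significant issues
    ACCEPTABLE = 3  # meets minimum bar
    GOOD = 4        # minor issues only
    EXCELLENT = 5   # no issues

def parse_rating(value: int) -> Rating:
    # Rating(value) raises ValueError for anything outside 1-5.
    return Rating(value)
```

Because `IntEnum` members behave like `int`, a check such as `rating < 4` (used to decide whether issues must be listed) works directly.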
For each dimension scored below 4, list concrete issues:
Format:
Issue: <dimension>
Found: "<exact quote from output>"
Problem: <why this is wrong>
Fix: <specific improvement>
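The issue template maps naturally onto a small record type. The sketch below assumes scores are held in a `dict` keyed by dimension name; `Issue`, `render`, and `dimensions_needing_issues` are hypothetical helpers, not part of the skill.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    dimension: str
    found: str    # exact quote from the output under review
    problem: str  # why the quoted text is wrong
    fix: str      # specific improvement

    def render(self) -> str:
        # Reproduces the Issue / Found / Problem / Fix layout of the template.
        return (
            f"Issue: {self.dimension}\n"
            f'Found: "{self.found}"\n'
            f"Problem: {self.problem}\n"
            f"Fix: {self.fix}"
        )

def dimensions_needing_issues(scores: dict[str, int]) -> list[str]:
    # Only dimensions scored below 4 require a list of concrete issues.
    return [dim for dim, score in scores.items() if score < 4]
```

For example, `dimensions_needing_issues({"Correctness": 5, "Conciseness": 3})` returns `["Conciseness"]`, so only Conciseness needs `Issue` entries.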
Calculate the overall score as a weighted average of the applicable dimension scores, then assign the verdict for the band the overall score falls in:
| Score | Verdict |
|---|---|
| 4.5 to 5.0 | EXCELLENT -- ready to use |
| 3.5 to under 4.5 | GOOD -- minor improvements recommended |
| 2.5 to under 3.5 | ACCEPTABLE -- improvements needed before production use |
| 1.5 to under 2.5 | POOR -- significant rework required |
| 1.0 to under 1.5 | FAILING -- output should be discarded and regenerated |
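The scoring step can be sketched as a weighted mean followed by a threshold lookup. The skill does not define the weights, so this sketch assumes equal weights unless a `weights` mapping is passed. It also assumes each verdict applies from its lower bound up to the next band's lower bound, so a score such as 4.45 counts as GOOD. `overall_score` and `verdict` are hypothetical names.

```python
from typing import Optional

def overall_score(scores: dict[str, int], weights: Optional[dict[str, float]] = None) -> float:
    # Weights are not specified by the skill; equal weights are an assumption.
    if not scores:
        raise ValueError("no dimension scores to average")
    if weights is None:
        weights = {dim: 1.0 for dim in scores}
    total_weight = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight

# Lower bound of each band, checked from highest to lowest.
VERDICTS = [
    (4.5, "EXCELLENT -- ready to use"),
    (3.5, "GOOD -- minor improvements recommended"),
    (2.5, "ACCEPTABLE -- improvements needed before production use"),
    (1.5, "POOR -- significant rework required"),
    (1.0, "FAILING -- output should be discarded and regenerated"),
]

def verdict(score: float) -> str:
    for lower_bound, label in VERDICTS:
        if score >= lower_bound:
            return label
    raise ValueError(f"score {score} is below the 1-5 scale")

scores = {"Correctness": 5, "Completeness": 4, "Conciseness": 4, "Safety": 5}
print(verdict(overall_score(scores)))
```

With equal weights these four scores average to 4.5, which lands in the EXCELLENT band. Passing a heavier weight for Correctness, for instance, would let correctness problems pull the verdict down further.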
List 1-3 actionable improvements in priority order:
For each suggestion, include:
`npx claudepluginhub ats-kinoshita-iso/agent-workshop --plugin agent-patterns`
Self-rates agent output on 5 axes (accuracy, completeness, clarity, actionability, conciseness) with concrete evidence per criterion, producing a structured 1-5 scorecard with improvement suggestions.
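Self-evaluates output quality: produces a 1-5 scorecard with specific improvement suggestions across five dimensions (accuracy, completeness, clarity, actionability, conciseness). Suited to a reflection step after producing complex code or designs.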
Triages Copilot Studio agent evaluation scores, diagnoses failure root causes, and suggests actionable fixes using a structured playbook.