From copilot-studio
Analyzes CSV exports from the Copilot Studio Evaluate tab to identify test failures and propose YAML fixes for agent topics.
Slash command
/copilot-studio:analyze-evals
copilot-studio-test
This skill is limited to the following tools:
Analyze evaluation results exported from the Copilot Studio UI as CSV.
Ask the user for the CSV file path if not already provided. The file is typically exported from Copilot Studio's Evaluate tab and named Evaluate <agent name> <date>.csv in their Downloads folder.
Read the CSV file. The in-product evaluation CSV has these columns:
| Column | Meaning |
|---|---|
| question | The test utterance |
| expectedResponse | Expected response (may be empty) |
| actualResponse | What the agent responded |
| testMethodType_1 | Eval method (e.g., GeneralQuality) |
| result_1 | Pass or Fail |
| passingScore_1 | Score threshold (may be empty) |
| explanation_1 | Why it passed/failed (e.g., "Seems relevant; Seems incomplete; Knowledge sources not cited") |
The _1 suffix indicates the first eval method. There may be additional methods (_2, _3, etc.) with the same column pattern.
Focus on failed evaluations (result_1 = Fail, or any result_N = Fail).
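As a minimal sketch of the filtering described above, the following Python reads an export and collects every failed eval method, whatever its numeric suffix. The function name `collect_failures` and the sample rows are hypothetical and made up for illustration; the column names come from the table above, and the case-insensitive match on "Fail" and the "; " separator in explanations are assumptions based on the examples shown here.

```python
import csv
import io
import re

# Matches result_1, result_2, ... so every eval method is checked, not only the first.
RESULT_COLUMN = re.compile(r"^result_(\d+)$")


def collect_failures(csv_file):
    """Return one record per failed eval method for every row in the export."""
    failures = []
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            match = RESULT_COLUMN.match(column or "")
            if not match or (value or "").strip().lower() != "fail":
                continue
            n = match.group(1)  # suffix that ties the columns of one method together
            explanation = row.get(f"explanation_{n}") or ""
            failures.append({
                "question": row.get("question", ""),
                "actualResponse": row.get("actualResponse", ""),
                "method": row.get(f"testMethodType_{n}", ""),
                # Assumption: explanations are ";"-separated phrases, as in the example above.
                "reasons": [part.strip() for part in explanation.split(";") if part.strip()],
            })
    return failures


# Made-up sample rows using the documented column names.
sample = io.StringIO(
    "question,expectedResponse,actualResponse,testMethodType_1,result_1,passingScore_1,explanation_1\n"
    'How do I reset my password?,,Go to Settings.,GeneralQuality,Fail,,"Seems relevant; Seems incomplete"\n'
    "What are your hours?,,We are open 9 to 5.,GeneralQuality,Pass,,Seems relevant\n"
)
for failure in collect_failures(sample):
    print(failure["question"], failure["reasons"])
```

For a real export, pass an open file instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding is an assumption that tolerates a byte-order mark if the export includes one.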
For each failure, use the explanation column to understand the issue:
- SearchAndSummarizeContent nodes.
- SendActivity messages.
- Errors in actualResponse (e.g., GenAIToolPlannerRateLimitReached): these are runtime errors, not authoring issues. Flag them to the user as transient failures to retry.

For each failure, identify the relevant YAML file(s):
Glob: **/agent.mcs.yml
Propose specific YAML changes to fix each failure. Present them to the user as a summary:
Wait for user decision. The user can:
Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running evaluations.
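Two of the triage steps above lend themselves to a small helper: setting aside transient runtime errors before proposing any YAML edits, and locating candidate agent files. The sketch below is illustrative only. The names `KNOWN_RUNTIME_ERRORS`, `is_transient`, and `find_agent_files` are hypothetical, and the only error code taken from this document is `GenAIToolPlannerRateLimitReached`; any others would need to be added from what you actually observe.

```python
from pathlib import Path

# Only this code appears in the skill text; extend the set as other runtime errors show up.
KNOWN_RUNTIME_ERRORS = {"GenAIToolPlannerRateLimitReached"}


def is_transient(actual_response: str) -> bool:
    """True when the agent's response contains a known runtime error code."""
    return any(code in actual_response for code in KNOWN_RUNTIME_ERRORS)


def find_agent_files(root: str) -> list[Path]:
    """Equivalent of the **/agent.mcs.yml glob, searched recursively under root."""
    return sorted(Path(root).rglob("agent.mcs.yml"))


# Made-up responses: the first should be retried, the second needs authoring review.
responses = ["GenAIToolPlannerRateLimitReached", "Go to Settings."]
print([is_transient(r) for r in responses])  # → [True, False]
```

Failures flagged by `is_transient` are reported as retries rather than matched to a YAML file, which keeps proposed edits limited to genuine authoring issues.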
npx claudepluginhub microsoft/skills-for-copilot-studio --plugin copilot-studio