Runs batch test suites against published Copilot Studio agents using the Power CAT Copilot Studio Kit and the Dataverse API. Configures `settings.json` with environment credentials and reports pass/fail results with latencies.
Slash command: `/copilot-studio:run-tests-kit`

`copilot-studio-test`

This skill is limited to the following tools:
Run a batch test suite against a **published** Copilot Studio agent using the [Power CAT Copilot Studio Kit](https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit).
The user must have:
**Phase 1.** Read `tests/settings.json` (relative to the user's project working directory) and check for missing values or placeholder values (any value containing `YOUR_`).
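For illustration, the placeholder check could look like the following minimal Python sketch. It assumes the five dotted keys from the settings template in this skill; `REQUIRED_KEYS`, `load_settings`, and `find_missing_settings` are hypothetical helpers, not part of the Copilot Studio Kit.

```python
import json
from pathlib import Path

# Hypothetical list of the dotted keys this skill expects in tests/settings.json.
REQUIRED_KEYS = [
    "dataverse.environmentUrl",
    "dataverse.tenantId",
    "dataverse.clientId",
    "testRun.agentConfigurationId",
    "testRun.agentTestSetId",
]


def load_settings(path: Path) -> dict:
    """Load settings.json, treating a missing file as an empty configuration."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def find_missing_settings(settings: dict) -> list:
    """Return dotted keys that are absent, empty, or still contain a YOUR_ placeholder."""
    missing = []
    for dotted in REQUIRED_KEYS:
        value = settings
        # Walk the nested sections, e.g. settings["dataverse"]["tenantId"].
        for part in dotted.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if not isinstance(value, str) or not value.strip() or "YOUR_" in value:
            missing.append(dotted)
    return missing
```

For example, `find_missing_settings(load_settings(Path("tests/settings.json")))` returns the keys to ask the user about; a missing file yields all five.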
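If the file doesn't exist, create it from the template (creating the `tests/` directory first, because `cp` will not create it):

```bash
mkdir -p tests
cp "${CLAUDE_SKILL_DIR}/../../tests/settings-example.json" ./tests/settings.json
```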
If values are missing, ask the user for each missing value. Explain where to find each one:

- `dataverse.environmentUrl`: "What is your Dataverse environment URL? Find it in Power Platform admin center or Copilot Studio > Settings > Session Details. It looks like https://orgXXXXXX.crm.dynamics.com"
- `dataverse.tenantId`: "What is your Azure tenant ID? Find it in Azure Portal > Microsoft Entra ID > Overview. It's a GUID like c87f36f7-fc65-453c-9019-0d724f21bc42"
- `dataverse.clientId`: "What is your App Registration client ID? Find it in Azure Portal > App Registrations > your app > Application (client) ID. It's a GUID."
- `testRun.agentConfigurationId`: "What is your agent configuration ID? In Copilot Studio, go to your agent > Tests tab. The ID is a GUID found in the URL or test configuration."
- `testRun.agentTestSetId`: "What is your test set ID? In Copilot Studio, go to your agent > Tests tab > select your test set. The ID is a GUID found in the URL."

Ask for ALL missing values at once (don't ask one at a time).
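The format hints in these questions can also serve as a quick sanity check on the answers. The sketch below is an illustration under stated assumptions: it treats a GUID as the standard 8-4-4-4-12 hexadecimal form and accepts any `https` host ending in `.dynamics.com` as an environment URL (regional hosts such as `crm4.dynamics.com` fit this pattern; sovereign-cloud environments use other domains and would need their own rule). The helper names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_guid(value: str) -> bool:
    return GUID_RE.fullmatch(value.strip()) is not None


def looks_like_dataverse_url(value: str) -> bool:
    """Loose check (assumption): https scheme and a host ending in .dynamics.com."""
    parsed = urlparse(value.strip())
    host = parsed.hostname or ""
    return parsed.scheme == "https" and host.endswith(".dynamics.com")


def is_valid_setting(key: str, value: str) -> bool:
    """Validate one collected answer by the format its question describes."""
    if key == "dataverse.environmentUrl":
        return looks_like_dataverse_url(value)
    # The other four settings are all described as GUIDs.
    return is_guid(value)
```

A value that fails these checks can be re-asked together with any other missing values, keeping to the one-round-of-questions rule.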
Write `tests/settings.json` with the collected values:

```json
{
  "dataverse": {
    "environmentUrl": "<value>",
    "tenantId": "<value>",
    "clientId": "<value>"
  },
  "testRun": {
    "agentConfigurationId": "<value>",
    "agentTestSetId": "<value>"
  }
}
```
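One way to produce this file from the collected answers is sketched below in Python. It assumes the answers are gathered as flat dotted keys (for example `dataverse.tenantId`); `build_settings` and `write_settings` are hypothetical names. The merge step keeps any values that were already configured, so only the answered keys are overwritten.

```python
import json
from pathlib import Path


def build_settings(values: dict) -> dict:
    """Nest flat dotted keys such as 'dataverse.tenantId' into the settings.json shape."""
    settings = {}
    for dotted, value in values.items():
        section, key = dotted.split(".", 1)
        settings.setdefault(section, {})[key] = value
    return settings


def write_settings(path: Path, values: dict) -> None:
    """Merge collected answers into any existing file and write it back pretty-printed."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for section, entries in build_settings(values).items():
        existing.setdefault(section, {}).update(entries)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(existing, indent=2) + "\n", encoding="utf-8")
```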
If all values are already configured and valid, proceed to Phase 2.
**Phase 2.** Ensure `tests/package.json` exists in the user's project. If not, copy it:

```bash
cp "${CLAUDE_SKILL_DIR}/../../tests/package.json" ./tests/package.json
```
Install dependencies if `tests/node_modules/` doesn't exist:

```bash
npm install --prefix tests
```
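Both setup checks can be expressed as a small function that reports which commands are still needed. In this sketch `skill_dir` stands in for `${CLAUDE_SKILL_DIR}` and `setup_commands` is a hypothetical name; the commands are returned rather than executed, leaving it to the caller (for example `subprocess.run(cmd, check=True)`) to run them.

```python
from pathlib import Path


def setup_commands(project_dir: Path, skill_dir: Path) -> list:
    """List the commands still needed before run-tests.js can run (hypothetical helper)."""
    tests_dir = project_dir / "tests"
    commands = []
    if not (tests_dir / "package.json").exists():
        # Mirrors: cp ${CLAUDE_SKILL_DIR}/../../tests/package.json ./tests/package.json
        template = skill_dir / ".." / ".." / "tests" / "package.json"
        commands.append(["cp", str(template), str(tests_dir / "package.json")])
    if not (tests_dir / "node_modules").is_dir():
        # Mirrors: npm install --prefix tests
        commands.append(["npm", "install", "--prefix", str(tests_dir)])
    return commands
```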
Run the test script in the background with a 100-minute timeout (6,000,000 ms):

```bash
node "${CLAUDE_SKILL_DIR}/../../tests/run-tests.js" --config-dir ./tests
```

Use `run_in_background: true` for this command and save the returned task ID.
Wait 10 seconds, then check the background task output (non-blocking check).
Detect the authentication state from the output:

- If the output contains "Using cached token": authentication succeeded automatically. Tell the user: "Authentication successful (cached credentials). Tests are running, this may take several minutes..."
- If the output contains "use a web browser to open the page": extract the URL and device code from the message and present them prominently to the user:

  > **Authentication Required**
  >
  > Open your browser to: https://microsoft.com/devicelogin
  >
  > Enter the code: XXXXXXXXX (extract the actual code from the output)
  >
  > After signing in, the tests will continue automatically.

- If the output contains an error: report the error to the user and stop.
- If the output is empty or incomplete: wait another 10 seconds and check again (retry up to 3 times).
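A minimal Python sketch of this classification and retry logic follows. It assumes the device-code prompt uses the standard Microsoft identity platform wording ("...use a web browser to open the page URL and enter the code CODE to authenticate.") and treats any output containing the word "error" as a failure, which is a simplification. `classify_auth_output` and `wait_for_auth_state` are hypothetical names; `read_output` and `sleep` are passed in so the loop can be exercised without a real background task.

```python
import re

# Standard Microsoft identity platform device-code prompt, for example:
# "To sign in, use a web browser to open the page https://microsoft.com/devicelogin
#  and enter the code ABCD1234 to authenticate."
DEVICE_CODE_RE = re.compile(
    r"use a web browser to open the page (?P<url>https://\S+) "
    r"and enter the code (?P<code>[A-Z0-9]+)"
)


def classify_auth_output(output: str) -> tuple:
    """Map early runner output to 'cached', 'device_code', 'error', or 'pending'."""
    if "Using cached token" in output:
        return "cached", {}
    if "use a web browser to open the page" in output:
        match = DEVICE_CODE_RE.search(output)
        if match:
            return "device_code", {"url": match["url"], "code": match["code"]}
        # Prompt present but worded differently: hand the raw text to the user.
        return "device_code", {"message": output.strip()}
    if "error" in output.lower():  # simplification: any mention of "error"
        return "error", {"message": output.strip()}
    return "pending", {}  # empty or incomplete output


def wait_for_auth_state(read_output, sleep, retries: int = 3, delay: float = 10.0) -> tuple:
    """Re-check pending output up to `retries` more times, `delay` seconds apart."""
    state, info = classify_auth_output(read_output())
    attempts = 0
    while state == "pending" and attempts < retries:
        sleep(delay)
        state, info = classify_auth_output(read_output())
        attempts += 1
    return state, info
```

If the state is still `"pending"` after the retries, the caller can report that the runner produced no recognizable output.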
Wait for the background task to complete (blocking). The script polls every 20 seconds until all tests finish and downloads results as a CSV.
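The polling itself happens inside `run-tests.js`. The sketch below only illustrates the pattern (poll every 20 seconds, give up after 100 minutes) and is not the script's code: `get_run_status` is a hypothetical callable returning the numeric Run Status codes listed at the end of this skill, and treating 3 (Complete), 4 (Not Available), and 6 (Error) as terminal is an assumption.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 20            # the script polls every 20 seconds
TIMEOUT_S = 100 * 60            # 100-minute budget (6,000,000 ms)
TERMINAL_STATUSES = {3, 4, 6}   # Complete, Not Available, Error (assumed terminal)


def poll_until_done(
    get_run_status: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until the run reaches a terminal status or the timeout would be exceeded."""
    deadline = clock() + TIMEOUT_S
    while True:
        status = get_run_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + POLL_INTERVAL_S > deadline:
            raise TimeoutError(f"test run still in status {status} after {TIMEOUT_S} s")
        sleep(POLL_INTERVAL_S)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.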
Read the final output to get the success rate and CSV filename.
Proceed to Phase 3.
**Phase 3.** Get the results with `Glob: tests/test-results-*.csv` and read the most recent CSV file (newest by modification time).
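Picking the newest file by modification time can be sketched in Python as follows; `newest_results_csv` is a hypothetical helper, and the `tests/test-results-*.csv` pattern is taken from the glob above.

```python
from pathlib import Path
from typing import Optional


def newest_results_csv(project_dir: Path) -> Optional[Path]:
    """Return the most recently modified tests/test-results-*.csv, or None if there is none."""
    candidates = list((project_dir / "tests").glob("test-results-*.csv"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```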
Parse the CSV columns:
| Column | Meaning |
|---|---|
| Test Utterance | The user message that was tested |
| Expected Response | What the test expected |
| Response | What the agent actually responded |
| Latency (ms) | Response time |
| Result | Success, Failed, Unknown, Error, or Pending |
| Test Type | Response Match, Topic Match, Generative Answers, Multi-turn, Plan Validation, or Attachments |
| Result Reason | Why the test passed or failed |
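As a rough sketch, the CSV can be parsed with Python's `csv.DictReader` using the column names in this table, keeping rows whose Result is Failed or Error and computing a simple success rate and mean latency. It assumes the Latency (ms) column holds plain numbers (blank cells are skipped); `summarize_results` is a hypothetical name.

```python
import csv
import io
from statistics import mean

FAILURE_RESULTS = {"Failed", "Error"}


def summarize_results(csv_text: str) -> dict:
    """Parse the results CSV and pull out the failures plus basic pass/latency stats."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    failures = [r for r in rows if r.get("Result") in FAILURE_RESULTS]
    passed = sum(1 for r in rows if r.get("Result") == "Success")
    latencies = [
        float(r["Latency (ms)"]) for r in rows if (r.get("Latency (ms)") or "").strip()
    ]
    return {
        "total": len(rows),
        "passed": passed,
        "success_rate": passed / len(rows) if rows else 0.0,
        "mean_latency_ms": mean(latencies) if latencies else None,
        "failures": failures,
    }
```

When reading the file from disk, `Path(csv_path).read_text(encoding="utf-8-sig")` also strips a leading byte-order mark if the export includes one.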
Focus on failed tests (Result = Failed or Error). For each failure, analyze:

- `SendActivity` messages, instructions, or generative answer config.
- `SearchAndSummarizeContent`, and agent instructions.

Proceed to Phase 4 (Propose Fixes).
**Phase 4 (Propose Fixes).** For each failure, identify the relevant YAML file(s):

- `Glob: **/agent.mcs.yml`

Propose specific YAML changes to fix each failure. Present them to the user as a summary:
Wait for the user's decision. The user can:
Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running tests.
Numeric codes for the Result, Test Type, and Run Status fields:

- Result: 1=Success, 2=Failed, 3=Unknown, 4=Error, 5=Pending
- Test Type: 1=Response Match, 2=Topic Match, 3=Attachments, 4=Generative Answers, 5=Multi-turn, 6=Plan Validation
- Run Status: 1=Not Run, 2=Running, 3=Complete, 4=Not Available, 5=Pending, 6=Error
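When working with raw records (presumably the numeric choice values returned by the Dataverse API) rather than the CSV, these codes can be decoded with simple lookup tables. A Python sketch; the dictionary and function names are hypothetical.

```python
RESULT = {1: "Success", 2: "Failed", 3: "Unknown", 4: "Error", 5: "Pending"}
TEST_TYPE = {
    1: "Response Match", 2: "Topic Match", 3: "Attachments",
    4: "Generative Answers", 5: "Multi-turn", 6: "Plan Validation",
}
RUN_STATUS = {
    1: "Not Run", 2: "Running", 3: "Complete",
    4: "Not Available", 5: "Pending", 6: "Error",
}


def decode(mapping: dict, code) -> str:
    """Translate a numeric code (int or numeric string) into its label."""
    try:
        return mapping[int(code)]
    except (KeyError, TypeError, ValueError):
        return f"Unrecognized ({code!r})"
```

Falling back to an "Unrecognized" label rather than raising keeps a report readable if a newer Kit version adds codes not listed here.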