Runs batch test suites against published Copilot Studio agents using Power CAT Copilot Studio Kit via Dataverse API. Configures settings.json with environment details and reports pass/fail results with latencies.
Run a batch test suite against a **published** Copilot Studio agent using the [Power CAT Copilot Studio Kit](https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit).
The user must have:
Read tests/settings.json (relative to the user's project CWD) and check for missing or placeholder values (containing YOUR_).
If the file doesn't exist, create it from the template:
cp ${CLAUDE_SKILL_DIR}/../../tests/settings-example.json ./tests/settings.json
If values are missing, ask the user for each missing value. Explain where to find each one:
- `dataverse.environmentUrl`: "What is your Dataverse environment URL? Find it in Power Platform admin center or Copilot Studio > Settings > Session Details. It looks like https://orgXXXXXX.crm.dynamics.com"
- `dataverse.tenantId`: "What is your Azure tenant ID? Find it in Azure Portal > Microsoft Entra ID > Overview. It's a GUID like c87f36f7-fc65-453c-9019-0d724f21bc42"
- `dataverse.clientId`: "What is your App Registration client ID? Find it in Azure Portal > App Registrations > your app > Application (client) ID. It's a GUID."
- `testRun.agentConfigurationId`: "What is your agent configuration ID? In Copilot Studio, go to your agent > Tests tab. The ID is a GUID found in the URL or test configuration."
- `testRun.agentTestSetId`: "What is your test set ID? In Copilot Studio, go to your agent > Tests tab > select your test set. The ID is a GUID found in the URL."

Ask for ALL missing values at once (don't ask one at a time).
Write tests/settings.json with the collected values:
```json
{
  "dataverse": {
    "environmentUrl": "<value>",
    "tenantId": "<value>",
    "clientId": "<value>"
  },
  "testRun": {
    "agentConfigurationId": "<value>",
    "agentTestSetId": "<value>"
  }
}
```
If all values are already configured and valid, proceed to Phase 2.
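The placeholder check in Phase 1 can be sketched as a small Node helper. This is only a sketch; it assumes that, as in `settings-example.json`, unfilled template values contain the marker `YOUR_`:

```javascript
// Flatten nested settings into dotted paths and flag values that are
// absent, empty, or still contain the "YOUR_" placeholder marker.
function findMissingValues(settings, prefix = "") {
  const missing = [];
  for (const [key, value] of Object.entries(settings)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object") {
      missing.push(...findMissingValues(value, path));
    } else if (!value || String(value).includes("YOUR_")) {
      missing.push(path);
    }
  }
  return missing;
}

// Example input shaped like tests/settings.json:
const example = {
  dataverse: {
    environmentUrl: "https://org123456.crm.dynamics.com",
    tenantId: "YOUR_TENANT_ID",
    clientId: "",
  },
  testRun: { agentConfigurationId: "abc", agentTestSetId: "def" },
};
console.log(findMissingValues(example));
// → [ 'dataverse.tenantId', 'dataverse.clientId' ]
```

An empty array means every value is filled in and Phase 2 can start.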
Ensure tests/package.json exists in the user's project. If not, copy it:
cp ${CLAUDE_SKILL_DIR}/../../tests/package.json ./tests/package.json
Install dependencies if tests/node_modules/ doesn't exist:
npm install --prefix tests
Run the test script in the background with a 100-minute timeout (6000000ms):
node ${CLAUDE_SKILL_DIR}/../../tests/run-tests.js --config-dir ./tests
Use run_in_background: true for this command. Save the returned task ID.
Wait 10 seconds, then check the background task output (non-blocking check).
Detect the authentication state from the output:
If the output contains "Using cached token": Authentication succeeded automatically. Tell the user: "Authentication successful (cached credentials). Tests are running, this may take several minutes..."
If the output contains "use a web browser to open the page": Extract the URL and device code from the message. Present this prominently to the user:
Authentication Required
Open your browser to: https://microsoft.com/devicelogin
Enter the code: XXXXXXXXX (extract the actual code from the output)
After signing in, the tests will continue automatically.
If the output contains an error: Report the error to the user and stop.
If the output is empty or incomplete: Wait another 10 seconds and check again (retry up to 3 times).
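Extracting the URL and device code from the prompt can be sketched with a regex. The message wording is an assumption based on the usual MSAL-style device-code prompt ("open the page ... enter the code ..."); adjust the pattern if the actual output differs:

```javascript
// Hypothetical sketch: pull the verification URL and device code out of a
// device-code prompt. Returns null when the output has no such prompt.
function parseDeviceCodePrompt(output) {
  const match = output.match(
    /open the page (\S+) and enter the code ([A-Z0-9]+)/i
  );
  return match ? { url: match[1], code: match[2] } : null;
}

const sample =
  "To sign in, use a web browser to open the page " +
  "https://microsoft.com/devicelogin and enter the code ABCD1234E to authenticate.";
console.log(parseDeviceCodePrompt(sample));
// → { url: 'https://microsoft.com/devicelogin', code: 'ABCD1234E' }
```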
Wait for the background task to complete (blocking). The script polls every 20 seconds until all tests finish and downloads results as a CSV.
Read the final output to get the success rate and CSV filename.
Proceed to Phase 3.
Get the results: Glob: tests/test-results-*.csv — read the most recent CSV file (newest by modification time).
Parse the CSV columns:
| Column | Meaning |
|---|---|
| Test Utterance | The user message that was tested |
| Expected Response | What the test expected |
| Response | What the agent actually responded |
| Latency (ms) | Response time |
| Result | Success, Failed, Unknown, Error, or Pending |
| Test Type | Response Match, Topic Match, Generative Answers, Multi-turn, Plan Validation, or Attachments |
| Result Reason | Why the test passed or failed |
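Parsing the CSV can be sketched as follows. Agent responses often contain commas, so quoted fields must be handled; this is a minimal splitter under that assumption, not a full RFC 4180 parser:

```javascript
// Split one CSV line into fields, honoring double-quoted fields and
// escaped quotes ("" inside a quoted field).
function splitCsvRow(line) {
  const fields = [];
  let cur = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { cur += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else cur += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(cur); cur = ""; }
    else cur += ch;
  }
  fields.push(cur);
  return fields;
}

// Tally rows by their Result column (column name per the table above).
function summarizeResults(csvText) {
  const [header, ...rows] = csvText.trim().split(/\r?\n/).map(splitCsvRow);
  const resultIdx = header.indexOf("Result");
  const counts = {};
  for (const row of rows) {
    const result = row[resultIdx];
    counts[result] = (counts[result] || 0) + 1;
  }
  return counts;
}

const sample = [
  "Test Utterance,Result",
  '"Hi, there",Success',
  "Refund policy,Failed",
].join("\n");
console.log(summarizeResults(sample));
// → { Success: 1, Failed: 1 }
```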
Focus on failed tests (Result = Failed or Error). For each failure, analyze:
For response or topic match failures, check the topic's SendActivity messages, instructions, or generative answer config. For generative answers failures, check SearchAndSummarizeContent and agent instructions.

Proceed to Phase 4 (Propose Fixes).
For each failure, identify the relevant YAML file(s):
Glob: `**/agent.mcs.yml`

Propose specific YAML changes to fix each failure and present them to the user as a summary.
Wait for the user's decision on which proposed changes to accept.

Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running tests.
Result: 1=Success, 2=Failed, 3=Unknown, 4=Error, 5=Pending
Test Type: 1=Response Match, 2=Topic Match, 3=Attachments, 4=Generative Answers, 5=Multi-turn, 6=Plan Validation
Run Status: 1=Not Run, 2=Running, 3=Complete, 4=Not Available, 5=Pending, 6=Error
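Dataverse choice columns come back as numbers; mapping them to the labels above before reporting can be sketched as:

```javascript
// Lookup tables for the Dataverse choice values listed above.
const RESULT_LABELS = {
  1: "Success", 2: "Failed", 3: "Unknown", 4: "Error", 5: "Pending",
};
const TEST_TYPE_LABELS = {
  1: "Response Match", 2: "Topic Match", 3: "Attachments",
  4: "Generative Answers", 5: "Multi-turn", 6: "Plan Validation",
};
const RUN_STATUS_LABELS = {
  1: "Not Run", 2: "Running", 3: "Complete",
  4: "Not Available", 5: "Pending", 6: "Error",
};

// Fall back to a readable placeholder for unexpected codes.
function label(table, code) {
  return table[code] ?? `Unknown (${code})`;
}

console.log(label(RESULT_LABELS, 2)); // → Failed
```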