From umbraco-mcp-skills
Build LLM eval tests for MCP tool collections. Reads .discover.json and creates eval setup and scenario test files per collection. Use after running '/build-tools'.
```
npx claudepluginhub umbraco/umbraco-mcp-base --plugin umbraco-mcp-skills
```

This skill uses the workspace's default tool permissions.
Generate LLM eval tests for MCP tool collections created by `/build-tools`. This skill reads `.discover.json` and the existing tool files, then builds eval test files in `tests/evals/` one collection at a time.
IMPORTANT: This skill ONLY creates files inside tests/evals/ — the setup file and eval test files. Do NOT create or modify tool files, collection indexes, integration tests, mock handlers, or any other files.
Key difference from integration tests (`/build-tools-tests`): eval tests require `npm run build` first, because they run against `dist/`.

Before running, ensure:

- `/build-tools` has been run (tool collections exist in `src/umbraco-api/tools/`)
- `npm run build` succeeds
- `ANTHROPIC_API_KEY` is set
- `.discover.json` exists (or a single collection was passed via the `/build-evals` form)

This skill orchestrates the following agent — use it for the relevant step:
| Agent | When to use |
|---|---|
| eval-test-creator | Creating eval test files (Step 5) |
- **BUILD BEFORE RUNNING.** Eval tests run against `dist/index.js`. Always run `npm run build` first.
- **ONE COLLECTION AT A TIME.** Complete each collection before starting the next.
- **GROUP TOOLS BY WORKFLOW.** Unlike integration tests (one file per tool), eval tests group related tools into workflow scenarios. A collection with 5 tools might have just 1-2 eval test files.
- **ITERATE ON PROMPTS.** Eval tests are probabilistic. If a test fails, the fix is usually in the prompt — make instructions more explicit, add search steps for IDs, use unique identifiers.
- **VERBOSE DURING DEVELOPMENT.** Always set `verbose: true` when creating or debugging. Disable it after tests pass reliably.
- **RUN COMMANDS SEPARATELY.** Always run build and test as separate Bash calls. Never chain them with `&&`.
Process one collection at a time. Complete each collection fully before starting the next.
Read `.discover.json` from the project root:

```json
{
  "apiName": "Umbraco Forms Management API",
  "swaggerUrl": "https://localhost:44324/umbraco/swagger/forms-management/swagger.json",
  "baseUrl": "https://localhost:44324",
  "collections": ["form", "form-template", "field-type", "folder"]
}
```

If an argument was provided, filter to only that collection. If `.discover.json` doesn't exist, tell the user to run `npx @umbraco-cms/create-umbraco-mcp-server discover` first.
For each collection, verify:
- `src/umbraco-api/tools/{collection}/index.ts` exists — if not, skip and tell the user to run `/build-tools` first
- Eval tests do not already exist for this collection — check for `tests/evals/{collection}-*.test.ts` files, and skip the collection if any are found (evals have already been created for it)
Then ensure the eval setup exists and the project builds:
- If `tests/evals/helpers/e2e-setup.ts` doesn't exist, create it (Step 3)
- If `tests/evals/jest.config.ts` doesn't exist, create it (Step 3)
- Run `npm run build`

Fix any build errors before continuing. Evals run against `dist/index.js` — if it doesn't build, evals can't run.
For each collection, read:
- `src/umbraco-api/tools/{collection}/index.ts` — to get the list of tools and collection metadata

Build a mental inventory of which operations are available. This determines what workflow scenarios to create.
Eval tests use a centralized setup at tests/evals/. Only create these files if they don't already exist.
`tests/evals/jest.config.ts`:

```typescript
import type { Config } from "jest";

const config: Config = {
  preset: "ts-jest/presets/js-with-ts-esm",
  testEnvironment: "node",
  extensionsToTreatAsEsm: [".ts"],
  rootDir: "../..",
  moduleNameMapper: {
    "^(\\.{1,2}/.*)\\.js$": "$1",
    "^@/(.*)$": "<rootDir>/src/$1",
  },
  transform: {
    "^.+\\.tsx?$": [
      "ts-jest",
      {
        useESM: true,
      },
    ],
  },
  testMatch: ["<rootDir>/tests/evals/**/*.test.ts"],
  setupFilesAfterEnv: ["<rootDir>/tests/evals/helpers/e2e-setup.ts"],
  setupFiles: ["<rootDir>/jest.setup.ts"],
  testPathIgnorePatterns: ["/node_modules/"],
  moduleFileExtensions: ["ts", "tsx", "js", "jsx", "json", "node"],
  maxConcurrency: 1,
  maxWorkers: 1,
  testTimeout: 120000,
  slowTestThreshold: 300,
};

export default config;
```
`tests/evals/helpers/e2e-setup.ts`:

Detect API mode: check whether `src/mocks/` exists with handler files. If mocks exist, use `USE_MOCK_API: "true"`. Otherwise, configure for the real API using `.env` credentials.
Mock API setup (if `src/mocks/` exists):

```typescript
import path from "path";
import { configureEvals, ClaudeModels } from "@umbraco-cms/mcp-server-sdk/evals";

configureEvals({
  mcpServerPath: path.resolve(process.cwd(), "dist/index.js"),
  mcpServerName: "{mcp-server-name}",
  serverEnv: {
    USE_MOCK_API: "true",
    DISABLE_MCP_CHAINING: "true",
    UMBRACO_CLIENT_ID: "test-client",
    UMBRACO_CLIENT_SECRET: "test-secret",
    UMBRACO_BASE_URL: "http://localhost:9999",
  },
  defaultModel: ClaudeModels.Haiku,
  defaultMaxTurns: 10,
  defaultMaxBudgetUsd: 0.25,
  defaultTimeoutMs: 60000,
});
```
Real API setup (no mocks):

```typescript
import path from "path";
import { configureEvals, ClaudeModels } from "@umbraco-cms/mcp-server-sdk/evals";

configureEvals({
  mcpServerPath: path.resolve(process.cwd(), "dist/index.js"),
  mcpServerName: "{mcp-server-name}",
  serverEnv: {
    UMBRACO_CLIENT_ID: process.env.UMBRACO_CLIENT_ID || "",
    UMBRACO_CLIENT_SECRET: process.env.UMBRACO_CLIENT_SECRET || "",
    UMBRACO_BASE_URL: process.env.UMBRACO_BASE_URL || "",
    DISABLE_MCP_CHAINING: "true",
  },
  defaultModel: ClaudeModels.Haiku,
  defaultMaxTurns: 10,
  defaultMaxBudgetUsd: 0.25,
  defaultTimeoutMs: 60000,
});
```
Key rules:

- Use `process.cwd()` to resolve `dist/index.js` — the setup runs from the project root via Jest
- Take `mcpServerName` from the package.json `name` field or from `src/index.ts`
- Set `DISABLE_MCP_CHAINING: "true"` to avoid connecting to chained servers during tests
- The setup file is loaded via `setupFilesAfterEnv` — test files do NOT need to import it

Also ensure the `test:evals` script in package.json uses the dedicated config:

```json
"test:evals": "npm run build && node --experimental-vm-modules $(npm root)/jest/bin/jest.js --config tests/evals/jest.config.ts --runInBand --forceExit"
```
Based on the tools available in the collection (from Step 2), design 1-2 workflow scenarios:
- **CRUD lifecycle** — test the full create-read-update-delete cycle
- **Read-only** — test read operations
- **Search** — test search and retrieval
- **Hierarchical** — test hierarchy operations
Aim for 1-2 scenarios per collection. A collection with CRUD tools needs one lifecycle test. A collection with only read tools needs one read-only test.
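The selection rule above could be sketched like this — `pickScenarios` is a hypothetical helper, and it assumes verb-prefixed tool names such as `create-entity` and `list-entities`:

```typescript
// Hypothetical sketch of the scenario-selection rule, assuming
// verb-prefixed tool names (create-*, get-*, list-*, search-*, move-*).
type Workflow = "crud" | "read" | "search" | "hierarchy";

function pickScenarios(toolNames: readonly string[]): Workflow[] {
  const has = (prefix: string) => toolNames.some((t) => t.startsWith(prefix));
  const scenarios: Workflow[] = [];
  if (has("create-") && has("update-") && has("delete-")) {
    scenarios.push("crud"); // full lifecycle is testable in one pass
  } else if (has("get-") || has("list-")) {
    scenarios.push("read"); // read-only collection
  }
  if (has("search-")) scenarios.push("search");
  if (has("move-")) scenarios.push("hierarchy");
  return scenarios.slice(0, 2); // aim for 1-2 scenarios per collection
}
```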
Use the eval-test-creator agent.
Create one file per workflow scenario in `tests/evals/`: `tests/evals/{collection}-{workflow}.test.ts`
| Workflow | File name |
|---|---|
| CRUD lifecycle | {collection}-crud.test.ts |
| Read-only | {collection}-read.test.ts |
| Search | {collection}-search.test.ts |
| Hierarchical | {collection}-hierarchy.test.ts |
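The naming convention in the table reduces to a one-line mapping (illustrative only; `evalTestPath` is a hypothetical name):

```typescript
// Illustrative: map a collection and workflow to the eval test file path,
// following the {collection}-{workflow}.test.ts convention above.
function evalTestPath(
  collection: string,
  workflow: "crud" | "read" | "search" | "hierarchy"
): string {
  return `tests/evals/${collection}-${workflow}.test.ts`;
}
```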
```typescript
import { describe, it } from "@jest/globals";
import {
  runScenarioTest,
  setupConsoleMock,
  getDefaultTimeoutMs,
} from "@umbraco-cms/mcp-server-sdk/evals";

const COLLECTION_TOOLS = [
  "create-entity",
  "list-entities",
  "get-entity",
  "update-entity",
  "delete-entity",
] as const;

describe("{Collection} CRUD Operations", () => {
  setupConsoleMock();
  const timeout = getDefaultTimeoutMs();

  it(
    "should complete full CRUD workflow",
    runScenarioTest({
      prompt: `Complete these tasks in order:
1. Generate a unique identifier using the current timestamp
2. Create a new entity named "Eval Test {timestamp}" with description "Created by eval test"
3. List all entities and confirm the one you created appears in the results
4. Get the entity you created by its ID to verify the details
5. Update the entity name to "Updated Eval Test {timestamp}"
6. Delete the entity you created
7. Say "CRUD workflow completed successfully"`,
      tools: [...COLLECTION_TOOLS],
      requiredTools: ["create-entity", "list-entities", "get-entity", "update-entity", "delete-entity"],
      successPattern: "CRUD workflow completed successfully",
      verbose: true,
    }),
    timeout
  );
});
```
Note: No setup import is needed — `e2e-setup.ts` is loaded automatically via `setupFilesAfterEnv` in `tests/evals/jest.config.ts`.

Compile after creating: `npm run compile`. Fix errors before continuing.
Build first (required — evals test against `dist/`):

```
npm run build
```

Then run the eval tests for this collection:

```
npm run test:evals -- --testPathPattern="{collection}"
```
If a test fails, check:

- Did `dist/index.js` get created? Rebuild if not.
- Hitting the turn limit? Raise `defaultMaxTurns` or simplify the prompt.
- Running over budget? Raise `defaultMaxBudgetUsd`.

Repeat steps 1-6 for the next collection in `.discover.json`.
After all collections have eval tests:

```
npm run build
npm run test:evals
```
Then run /count-mcp-tools to confirm all collections have evals. All collections should show "yes" in the Evals column. If any show "no", note which collections are missing eval tests (may be acceptable for some collections).
Report what was generated:
After running, the eval tests directory should contain:
```
tests/evals/
├── jest.config.ts              # Separate Jest config for evals
├── helpers/
│   └── e2e-setup.ts            # configureEvals setup (loaded via setupFilesAfterEnv)
├── {collection}-crud.test.ts   # CRUD lifecycle test (if applicable)
├── {collection}-read.test.ts   # Read-only test (if applicable)
└── ...
```
After eval tests pass:

- Disable `verbose: true` in test files for cleaner CI output
- See `/mcp-testing` for reference on advanced eval patterns