From umbraco-mcp-skills
Load MCP eval testing patterns using @umbraco-cms/mcp-server-sdk/evals. Use when writing LLM-based acceptance tests for MCP tools.
npx claudepluginhub umbraco/umbraco-mcp-base --plugin umbraco-mcp-skills
This skill uses the workspace's default tool permissions.
This skill loads eval testing patterns for MCP tools using `@umbraco-cms/mcp-server-sdk/evals`. Eval tests verify tools work correctly when driven by an LLM agent.
For integration tests, use /build-tools-tests instead.
Use this skill when:
Eval tests live in `tests/evals/` with a dedicated Jest config. The setup file is loaded automatically via `setupFilesAfterEnv`, so test files do not need to import it.
// tests/evals/helpers/e2e-setup.ts
import path from "path";
import { configureEvals, ClaudeModels } from "@umbraco-cms/mcp-server-sdk/evals";

configureEvals({
  // Evals run against the built server, so build before running them
  mcpServerPath: path.resolve(process.cwd(), "dist/index.js"),
  mcpServerName: "my-mcp-server",
  serverEnv: { USE_MOCK_API: "true" },
  defaultModel: ClaudeModels.Haiku,
  defaultMaxTurns: 10,
  defaultMaxBudgetUsd: 0.25,
  defaultTimeoutMs: 60000,
});
// tests/evals/jest.config.ts
import type { Config } from "jest";

const config: Config = {
  preset: "ts-jest/presets/js-with-ts-esm",
  testEnvironment: "node",
  extensionsToTreatAsEsm: [".ts"],
  rootDir: "../..",
  testMatch: ["<rootDir>/tests/evals/**/*.test.ts"],
  setupFilesAfterEnv: ["<rootDir>/tests/evals/helpers/e2e-setup.ts"],
  maxConcurrency: 1,
  maxWorkers: 1,
  testTimeout: 120000,
};

export default config;
// tests/evals/entity-crud.test.ts
import { describe, it } from "@jest/globals";
import {
  runScenarioTest,
  setupConsoleMock,
  getDefaultTimeoutMs,
} from "@umbraco-cms/mcp-server-sdk/evals";

describe("entity evals", () => {
  setupConsoleMock();

  it(
    "should complete workflow",
    runScenarioTest({
      prompt: `Complete these tasks:
1. Create an item named "Test"
2. Delete the item
3. Say "Workflow completed"`,
      tools: ["create-item", "delete-item"],
      requiredTools: ["create-item", "delete-item"],
      successPattern: "Workflow completed",
    }),
    getDefaultTimeoutMs()
  );
});
| Option | Purpose |
|---|---|
| `prompt` | Step-by-step instructions for the LLM |
| `tools` | Tools available to the LLM agent |
| `requiredTools` | Tools that must be called for the test to pass |
| `successPattern` | String the LLM must output to indicate success |
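To make the pass conditions in the table concrete, here is a minimal sketch of how `requiredTools` and `successPattern` could combine into a verdict. It is not the SDK's implementation: the `ScenarioExpectation`, `AgentRun`, and `Verdict` types and the `checkRun` function are hypothetical, and it assumes `successPattern` is matched as a plain substring of the agent's final message.

```typescript
// Hypothetical shapes for illustration only; the SDK's real types may differ.
interface ScenarioExpectation {
  requiredTools: string[];
  successPattern: string;
}

interface AgentRun {
  toolsCalled: string[]; // tool names the agent invoked, in call order
  finalText: string;     // the agent's final message
}

interface Verdict {
  passed: boolean;
  missingTools: string[];
  patternFound: boolean;
}

// Assumption: a run passes only if every required tool was called at least
// once AND the success phrase appears somewhere in the final message.
function checkRun(expectation: ScenarioExpectation, run: AgentRun): Verdict {
  const missingTools = expectation.requiredTools.filter(
    (tool) => run.toolsCalled.indexOf(tool) === -1
  );
  const patternFound = run.finalText.indexOf(expectation.successPattern) !== -1;
  return {
    passed: missingTools.length === 0 && patternFound,
    missingTools,
    patternFound,
  };
}

// The agent said the phrase but skipped the delete step, so the run fails.
const verdict = checkRun(
  {
    requiredTools: ["create-item", "delete-item"],
    successPattern: "Workflow completed",
  },
  { toolsCalled: ["create-item"], finalText: "Workflow completed" }
);
// verdict.passed is false; verdict.missingTools is ["delete-item"]
```

Under these assumptions, both checks must hold, which is why a prompt should end with an explicit instruction to say the success phrase.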
Group tools that work together in a single eval test to verify the workflow:
it(
  "should create, list, and delete",
  runScenarioTest({
    prompt: `Complete these tasks:
1. Create a form named "Test Form ${Date.now()}"
2. List all forms and confirm the new one appears
3. Delete the form you created
4. Say "CRUD workflow completed"`,
    tools: ["create-form", "list-forms", "delete-form"],
    requiredTools: ["create-form", "list-forms", "delete-form"],
    successPattern: "CRUD workflow completed",
  }),
  getDefaultTimeoutMs()
);
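The prompts above share one shape: a header, numbered steps, and a final step telling the agent to say the success phrase. A small builder can keep the prompt and `successPattern` from drifting apart. The sketch below is one possible way to do that; `buildWorkflowPrompt` is a hypothetical helper, not something the SDK is known to provide.

```typescript
// Hypothetical helper (not part of the SDK): builds a numbered task prompt
// whose last step asks the agent to say the success phrase verbatim.
function buildWorkflowPrompt(steps: string[], successPhrase: string): string {
  const numbered = steps
    .concat(`Say "${successPhrase}"`)
    .map((step, index) => `${index + 1}. ${step}`);
  return ["Complete these tasks:"].concat(numbered).join("\n");
}

// Reuse a single constant for both the prompt and the success check.
const crudSuccessPhrase = "CRUD workflow completed";
const crudPrompt = buildWorkflowPrompt(
  [
    `Create a form named "Test Form ${Date.now()}"`,
    "List all forms and confirm the new one appears",
    "Delete the form you created",
  ],
  crudSuccessPhrase
);
```

`crudPrompt` and `crudSuccessPhrase` could then be passed as the `prompt` and `successPattern` options of `runScenarioTest`.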
# Build first (evals run against dist/)
npm run build
# Run all evals
npm run test:evals
# Run specific eval file
npm run test:evals -- --testPathPattern="entity"
# Verbose mode shows full LLM conversation
E2E_VERBOSITY=verbose npm run test:evals
- Build before running evals (`npm run build`)
- Keep `maxBudgetUsd` low to catch inefficient tool usage
# Verbose mode shows full conversation
E2E_VERBOSITY=verbose npm run test:evals
# Run specific eval file
npm run test:evals -- --testPathPattern="entity"
| Issue | Solution |
|---|---|
| Eval timeout | Increase maxTurns or simplify prompt |
| Wrong tool selected | Improve tool description clarity |
| Missing parameters | Add examples to tool descriptions |
| Tool not found | Check tool name matches exactly |
| Budget exceeded | Simplify workflow or increase maxBudgetUsd |
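For the "Tool not found" row, a pre-flight check can catch name mismatches before any LLM budget is spent. The sketch below is illustrative only: `findUnknownTools` and `normalizeToolName` are hypothetical helpers, and the list of registered names is assumed to be something you maintain or export from your own server code.

```typescript
// Hypothetical pre-flight check, not an SDK API.
interface UnknownTool {
  requested: string;
  suggestion?: string; // a registered name that differs only in case or separators
}

// Treat case, underscores, and whitespace as the likely sources of near-misses.
function normalizeToolName(name: string): string {
  return name.toLowerCase().replace(/[_\s]+/g, "-");
}

// Returns every requested name that has no exact match among registered names.
function findUnknownTools(requested: string[], registered: string[]): UnknownTool[] {
  const unknown: UnknownTool[] = [];
  for (const name of requested) {
    if (registered.indexOf(name) !== -1) continue; // exact match, nothing to report
    const near = registered.filter(
      (candidate) => normalizeToolName(candidate) === normalizeToolName(name)
    );
    unknown.push(
      near.length > 0 ? { requested: name, suggestion: near[0] } : { requested: name }
    );
  }
  return unknown;
}

const problems = findUnknownTools(
  ["create_form", "list-forms"],
  ["create-form", "list-forms", "delete-form"]
);
// problems: [{ requested: "create_form", suggestion: "create-form" }]
```

Running a check like this over the `tools` and `requiredTools` arrays of each scenario turns a vague eval failure into a specific naming error.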