npx claudepluginhub copilotkit/aimock
aimock is a zero-dependency mock infrastructure for AI apps. Fixture-driven. Multi-provider (OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Vertex AI, Ollama, Cohere). Multimedia endpoints (image generation, text-to-speech, audio transcription, video generation). MCP, A2A, AG-UI, and vector DB mocking. Runs a real HTTP server on a real port — works across processes, unlike MSW-style interceptors. WebSocket support for OpenAI Responses/Realtime and Gemini Live APIs. Record-and-replay for all endpoints including multimedia. Chaos testing and Prometheus metrics.
Requests are normalized to a common shape (ChatCompletionRequest) before matching. Fixtures added after start() take effect immediately. Sequential responses use sequenceIndex (match count tracked per fixture).

| Field | Type | Matches Against |
|---|---|---|
| userMessage | string | Substring of last role: "user" message text |
| userMessage | RegExp | Pattern test on last role: "user" message text |
| inputText | string | Substring of embedding input text (concatenated if multiple inputs) |
| inputText | RegExp | Pattern test on embedding input text |
| toolName | string | Exact match on any tool in request's tools[] array (by function.name) |
| toolCallId | string | Exact match on tool_call_id of last role: "tool" message |
| model | string | Exact match on req.model |
| model | RegExp | Pattern test on req.model |
| responseFormat | string | Exact match on req.response_format.type ("json_object", "json_schema") |
| sequenceIndex | number | Matches only when this fixture's match count equals the given index (0-based) |
| endpoint | string | Restrict to endpoint type: "chat", "image", "speech", "transcription", "video", "embedding" |
| predicate | (req: ChatCompletionRequest) => boolean | Custom function — full access to request |
AND logic: all specified fields must match. Empty match {} = catch-all.
Multi-part content (e.g., [{type: "text", text: "hello"}]) is automatically extracted — userMessage matching works regardless of content format.
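A minimal sketch of a combined match (the model name and response text are placeholders):
// All listed fields must match before this fixture fires (AND logic)
mock.addFixture({
  match: { userMessage: /weather/i, model: "gpt-4o", responseFormat: "json_object" },
  response: { content: '{"forecast":"sunny","tempF":72}' },
});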
{
  content: "Hello!"
}
// Preferred: object form (auto-stringified by the fixture loader)
{
  toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }]
}
// Also accepted: JSON string form (backward compatible)
{
  toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }]
}
Both object and string forms are accepted for arguments. The fixture loader auto-stringifies objects via JSON.stringify(). Object form is preferred for readability.
{
  embedding: [0.1, 0.2, 0.3, -0.5, 0.8]
}
The embedding vector is returned for each input in the request. If no embedding fixture matches, deterministic embeddings are auto-generated from the input text hash — you only need fixtures when you want specific vectors.
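A minimal sketch of relying on the auto-generated vectors (no fixture registered; the OpenAI-style response shape and model name are assumptions):
const res = await fetch(`${mock.url}/v1/embeddings`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "text-embedding-3-small", input: "any text" }),
});
const { data } = await res.json();
// data[0].embedding is deterministic: same input text, same vector (assumes the OpenAI response shape)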
// Single image
{
image: {
url: "https://example.com/generated.png"
}
}
// Multiple images
{
images: [{ url: "https://example.com/1.png" }, { b64Json: "iVBOR..." }]
}
Use match: { endpoint: "image" } to prevent cross-matching with chat fixtures.
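A minimal sketch of an endpoint-scoped fixture:
// Only image-generation requests can match this fixture; chat requests never will
mock.addFixture({
  match: { endpoint: "image" },
  response: { images: [{ url: "https://example.com/1.png" }] },
});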
{ audio: "base64-encoded-audio-data" }
// With explicit format (default: mp3)
{ audio: "base64-data", format: "opus" }
// Simple
{ transcription: { text: "Hello world" } }
// Verbose with timestamps
{ transcription: { text: "Hello world", language: "en", duration: 2.5, words: [...], segments: [...] } }
{ video: { id: "vid-1", status: "completed", url: "https://example.com/video.mp4" } }
Video uses async polling — POST /v1/videos creates, GET /v1/videos/{id} checks status.
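A minimal sketch of that polling flow against the mock (request field names are assumptions):
// Create the video job, then poll it by id
const createRes = await fetch(`${mock.url}/v1/videos`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ prompt: "a product demo" }), // body shape is an assumption
});
const created = await createRes.json(); // assumed to carry the job id from the fixture
const job = await (await fetch(`${mock.url}/v1/videos/${created.id}`)).json();
// job should mirror the fixture above: { id: "vid-1", status: "completed", url: "..." }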
{ error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 }
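The same error shape plugs into a regular fixture; a minimal sketch:
// Matching requests are answered with HTTP 429 and this error body
mock.addFixture({
  match: { userMessage: "simulate rate limit" },
  response: { error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 },
});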
The optional chaos field on a fixture enables probabilistic failure injection:
{
chaos?: {
dropRate?: number; // Probability (0-1) of returning a 500 error
malformedRate?: number; // Probability (0-1) of returning malformed JSON
disconnectRate?: number; // Probability (0-1) of disconnecting mid-stream
}
}
Rates are evaluated per-request. When triggered, the chaos failure replaces the normal response.
mock.onMessage("hello", { content: "Hi there!" });
The most common pattern. Fixture 1 triggers the tool call, fixture 2 handles the tool result.
// Step 1: User asks about weather → LLM calls tool
mock.onMessage("weather", {
toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }],
});
// Step 2: Tool result comes back → LLM responds with text
mock.addFixture({
match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
response: { content: "It's 72°F in San Francisco." },
});
Why predicate, not userMessage? After a tool call, the client replays the same conversation with the tool result appended. The user message hasn't changed — userMessage: "weather" would match the SAME fixture again, creating an infinite loop.
// Match specific input text
mock.onEmbedding("search query", {
embedding: [0.1, 0.2, 0.3, 0.4, 0.5],
});
// Match with regex
mock.onEmbedding(/product.*description/, {
embedding: [0.9, -0.1, 0.5, 0.3, 0.2],
});
// onJsonOutput auto-sets responseFormat: "json_object" and stringifies objects
mock.onJsonOutput("extract entities", {
entities: [
{ name: "Acme Corp", type: "company" },
{ name: "Jane Doe", type: "person" },
],
});
// Equivalent manual form:
mock.addFixture({
match: { userMessage: "extract entities", responseFormat: "json_object" },
response: { content: '{"entities":[...]}' },
});
// First call returns tool call, second returns text
mock.on(
{ userMessage: "status", sequenceIndex: 0 },
{ toolCalls: [{ name: "check_status", arguments: {} }] },
);
mock.on({ userMessage: "status", sequenceIndex: 1 }, { content: "All systems operational." });
Match counts are tracked per fixture group and reset with reset() or resetMatchCounts().
mock.onMessage(
"tell me a story",
{ content: "Once upon a time..." },
{
streamingProfile: {
ttft: 200, // 200ms before first token
tps: 30, // 30 tokens per second after that
jitter: 0.1, // ±10% random variance
},
},
);
Common in supervisor/orchestrator patterns where the system prompt changes:
mock.addFixture({
match: {
predicate: (req) => {
const sys = req.messages.find((m) => m.role === "system")?.content ?? "";
return typeof sys === "string" && sys.includes("Flights found: false");
},
},
response: { toolCalls: [{ name: "search_flights", arguments: {} }] },
});
Prevents unmatched requests from returning 404 and crashing the test:
mock.addFixture({
match: { predicate: () => true },
response: { content: "I understand. How can I help?" },
});
Must go at the front so it matches before substring-based fixtures:
mock.prependFixture({
match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
response: { content: "Done!" },
});
mock.onMessage(
"long response",
{ content: "This will be cut short..." },
{
truncateAfterChunks: 3, // Stop after 3 SSE chunks
disconnectAfterMs: 500, // Or disconnect after 500ms
},
);
mock.addFixture({
match: { userMessage: "flaky" },
response: { content: "Sometimes works!" },
chaos: { dropRate: 0.3 },
});
30% of requests matching this fixture will get a 500 error instead of the response. Can also use malformedRate (garbled JSON) or disconnectRate (connection dropped mid-stream).
Server-level chaos applies to ALL requests:
mock.setChaos({ dropRate: 0.1 }); // 10% of all requests fail
mock.clearChaos(); // Remove server-level chaos
mock.nextRequestError(429, { message: "Rate limited", type: "rate_limit_error" });
// Next request gets 429, then fixture auto-removes itself
{
"fixtures": [
{
"match": { "userMessage": "hello" },
"response": { "content": "Hi!" }
},
{
"match": { "userMessage": "weather" },
"response": {
"toolCalls": [
{
"name": "get_weather",
"arguments": { "city": "SF", "units": "fahrenheit" }
}
]
}
},
{
"match": { "inputText": "search query" },
"response": { "embedding": [0.1, 0.2, 0.3] }
},
{
"match": { "userMessage": "status", "sequenceIndex": 0 },
"response": { "content": "First response" }
}
]
}
JSON auto-stringify: In JSON fixture files, arguments and content can be objects — the loader auto-stringifies them with JSON.stringify(). The escaped-string form ("{\"city\":\"SF\"}") still works but objects are preferred for readability.
JSON files cannot use RegExp or predicate — those are code-only features. streamingProfile is supported in JSON fixture files.
Load with mock.loadFixtureFile("./fixtures/greetings.json") or mock.loadFixtureDir("./fixtures/").
All providers share the same fixture pool — write fixtures once, they work for any endpoint.
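A minimal sketch of one fixture exercised through two provider routes (model names are placeholders):
mock.onMessage("hello", { content: "Hi!" });
// OpenAI-style request
await fetch(`${mock.url}/v1/chat/completions`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o-mini", messages: [{ role: "user", content: "hello" }] }),
});
// Anthropic-style request: same fixture, Anthropic wire format on the way back
await fetch(`${mock.url}/v1/messages`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "claude-sonnet", max_tokens: 128, messages: [{ role: "user", content: "hello" }] }),
});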
| Endpoint | Provider | Protocol |
|---|---|---|
| POST /v1/chat/completions | OpenAI | HTTP |
| POST /v1/responses | OpenAI | HTTP + WS |
| POST /v1/messages | Anthropic | HTTP |
| POST /v1/embeddings | OpenAI | HTTP |
| POST /v1beta/models/{model}:{method} | Google Gemini | HTTP |
| POST /model/{modelId}/invoke | AWS Bedrock | HTTP |
| POST /openai/deployments/{id}/chat/completions | Azure OpenAI | HTTP |
| POST /openai/deployments/{id}/embeddings | Azure OpenAI | HTTP |
| GET /health | — | HTTP |
| GET /ready | — | HTTP |
| POST /model/{modelId}/invoke-with-response-stream | AWS Bedrock | HTTP |
| POST /model/{modelId}/converse | AWS Bedrock | HTTP |
| POST /model/{modelId}/converse-stream | AWS Bedrock | HTTP |
| POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:generateContent | Vertex AI | HTTP |
| POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:streamGenerateContent | Vertex AI | HTTP |
| POST /api/chat | Ollama | HTTP |
| POST /api/generate | Ollama | HTTP |
| GET /api/tags | Ollama | HTTP |
| POST /v2/chat | Cohere | HTTP |
| GET /metrics | — | HTTP |
| GET /v1/models | OpenAI-compat | HTTP |
| WS /v1/responses | OpenAI | WebSocket |
| WS /v1/realtime | OpenAI | WebSocket |
| WS /ws/google.ai...BidiGenerateContent | Gemini Live | WebSocket |
| POST /v1/images/generations | OpenAI | HTTP |
| POST /v1beta/models/{model}:predict | Gemini Imagen | HTTP |
| POST /v1/audio/speech | OpenAI | HTTP |
| POST /v1/audio/transcriptions | OpenAI | HTTP |
| POST /v1/videos | OpenAI | HTTP |
| GET /v1/videos/{id} | OpenAI | HTTP |
Fixture responses can include optional override fields to control auto-generated envelope values. These are merged into the provider-specific response format (OpenAI, Claude, Gemini, Responses API).
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | auto-generated | Override response ID (e.g., chatcmpl-custom) |
| created | number | Date.now()/1000 | Override Unix timestamp |
| model | string | echoes request | Override model name in response |
| usage | object | zeroed | Override token counts: { prompt_tokens, completion_tokens, total_tokens }. OpenAI Chat includes usage in response body; Responses API uses response.usage. When omitted, auto-computed from content length |
| finishReason | string | "stop" / "tool_calls" | Override finish reason. Mappings: stop -> end_turn (Claude), STOP (Gemini); tool_calls -> tool_use (Claude), FUNCTION_CALL (Gemini); length -> max_tokens (Claude), MAX_TOKENS (Gemini); content_filter -> SAFETY (Gemini), failed (Responses API) |
| role | string | "assistant" | Override message role |
| systemFingerprint | string | (omitted) | Add system_fingerprint to response |
mock.onMessage("hello", {
content: "Hi!",
model: "gpt-4-turbo-2024-04-09",
usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
systemFingerprint: "fp_abc123",
});
{
"match": { "userMessage": "hello" },
"response": {
"content": "Hi!",
"model": "gpt-4-turbo-2024-04-09",
"usage": { "prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15 },
"systemFingerprint": "fp_abc123"
}
}
These fields map correctly across all provider formats — for example, finishReason: "stop" becomes finish_reason: "stop" in OpenAI, stop_reason: "end_turn" in Claude, and finishReason: "STOP" in Gemini.
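A minimal sketch of overriding the finish reason (the mapping behavior follows the table above):
mock.onMessage("write a novel", {
  content: "Chapter 1...",
  finishReason: "length",
  // Replays as finish_reason: "length" (OpenAI), stop_reason: "max_tokens" (Claude), finishReason: "MAX_TOKENS" (Gemini)
});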
| Feature | OpenAI Chat | OpenAI Responses | Claude | Gemini | Bedrock | Azure | Ollama | Cohere |
|---|---|---|---|---|---|---|---|---|
| Text | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool Calls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Content + Tool Calls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Streaming | SSE | SSE | SSE | SSE | Binary | SSE | NDJSON | SSE |
| Reasoning | Yes | Yes | Yes | Yes | Yes | Yes | -- | -- |
| Web Searches | -- | Yes | -- | -- | -- | -- | -- | -- |
| Response Overrides | Yes | Yes | Yes | Yes | -- | Yes | -- | -- |
Order matters — first match wins. Specific fixtures before general ones. Use prependFixture() to force priority.
arguments accepts both objects and strings — "arguments": {"key":"value"} (preferred, auto-stringified) or "arguments": "{\"key\":\"value\"}" (legacy). The same applies to content fields that contain JSON. The fixture loader detects typeof === "object" and calls JSON.stringify() automatically.
Latency is per-chunk, not total — latency: 100 means 100ms between each SSE chunk, not 100ms total response time. Similarly, truncateAfterChunks and disconnectAfterMs are for simulating stream interruptions (added in v1.3.0).
streamingProfile takes precedence over latency — when both are set on a fixture, streamingProfile controls timing. Use one or the other.
Tool result messages don't change the user message — after a tool call, the client sends the same conversation + tool result. Matching on userMessage will hit the SAME fixture again → infinite loop. Always use predicate checking role === "tool" for tool results.
clearFixtures() preserves the array reference — uses .length = 0, not reassignment. The running server reads the same array object.
Journal records everything — including 404 "no match" responses. Use mock.getLastRequest() to debug mismatches.
All providers share fixtures — a fixture matching "hello" works whether the request comes via /v1/chat/completions (OpenAI), /v1/messages (Anthropic), Gemini, Bedrock, or Azure endpoints.
WebSocket uses the same fixture pool — no special setup needed for WebSocket-based APIs (OpenAI Responses WS, Realtime, Gemini Live).
Embeddings auto-generate if no fixture matches — deterministic vectors are generated from the input text hash. You don't need a catch-all for embedding requests.
Sequential response counts are tracked per fixture — counts reset with reset() or resetMatchCounts(). The count increments after each match of that fixture group (all fixtures sharing the same non-sequenceIndex match fields).
Bedrock uses Anthropic Messages format internally — the adapter normalizes Bedrock requests to ChatCompletionRequest, so the same fixtures work. Bedrock supports both non-streaming (/invoke, /converse) and streaming (/invoke-with-response-stream, /converse-stream) endpoints.
Azure OpenAI routes through the same handlers — /openai/deployments/{id}/chat/completions maps to the completions handler, /openai/deployments/{id}/embeddings maps to the embeddings handler. Fixtures work unchanged.
Ollama defaults to streaming — opposite of OpenAI. Set stream: false explicitly in the request for non-streaming responses.
Ollama tool call arguments is an object, not a JSON string — unlike OpenAI where arguments is a JSON string, Ollama sends and expects a plain object.
Bedrock streaming uses binary Event Stream format — not SSE. The invoke-with-response-stream and converse-stream endpoints use AWS Event Stream binary encoding.
Vertex AI routes to the same handler as consumer Gemini — the same fixtures work for both Vertex AI (/v1/projects/.../models/{m}:generateContent) and consumer Gemini (/v1beta/models/{model}:generateContent).
Cohere requires model field — returns 400 if model is missing from the request body.
Mount additional mock services onto a running LLMock server. All services share one port, one health endpoint, and one request journal.
const llm = new LLMock({ port: 5555 });
llm.mount("/mcp", mcpMock); // MCP tools at /mcp
llm.mount("/a2a", a2aMock); // A2A agents at /a2a
llm.mount("/vector", vectorMock); // Vector DB at /vector
await llm.start();
Any object implementing the Mountable interface (a handleRequest method that returns boolean) can be mounted. Path prefixes are stripped before the service sees the request — /mcp/tools/list arrives as /tools/list.
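A minimal sketch of a custom mountable, assuming a Node-style (req, res) handleRequest signature; only "a handleRequest method that returns boolean" is documented, so the exact shape is an assumption:
const pingMock = {
  handleRequest(req: any, res: any): boolean {
    if (req.url === "/ping") {          // mount prefix is already stripped
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ pong: true }));
      return true;                       // assumed to mean "handled"
    }
    return false;                        // assumed to mean "not mine, keep routing"
  },
};
llm.mount("/ping", pingMock);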
Unified lifecycle for LLMock + mounted services:
import { createMockSuite } from "@copilotkit/aimock";
const suite = createMockSuite({
port: 0,
fixtures: "./fixtures",
services: { "/mcp": mcpMock, "/a2a": a2aMock },
});
await suite.start();
// suite.llm — the LLMock instance
// suite.url — base URL
afterEach(() => suite.reset()); // resets everything
afterAll(() => suite.stop());
The aimock CLI reads a JSON config and serves all services on one port:
aimock --config aimock.json --port 4010
Config format:
{
"llm": {
"fixtures": "./fixtures",
"latency": 0,
"metrics": true
},
"services": {
"/mcp": { "type": "mcp", "tools": "./mcp-tools.json" },
"/a2a": { "type": "a2a", "agents": "./a2a-agents.json" }
}
}
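Once the CLI is running, all services share that single port; a minimal sketch (4010 is the port from the command above):
const health = await fetch("http://localhost:4010/health"); // shared health endpoint
console.log(health.status); // expected 200 once the server is up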
Mock vector database server for testing RAG pipelines. Supports Pinecone, Qdrant, and ChromaDB API formats.
import { VectorMock } from "@copilotkit/aimock";
const vector = new VectorMock();
// Create a collection and register query results
vector.addCollection("docs", { dimension: 1536 });
vector.onQuery("docs", [
{ id: "doc-1", score: 0.95, metadata: { title: "Getting Started" } },
{ id: "doc-2", score: 0.87, metadata: { title: "API Reference" } },
]);
// Upsert vectors
vector.upsert("docs", [
{ id: "v1", values: [0.1, 0.2, ...], metadata: { title: "Intro" } },
]);
// Dynamic query handler
vector.onQuery("docs", (query) => {
return [{ id: "result", score: 1.0, metadata: { topK: query.topK } }];
});
// Standalone or mounted
const url = await vector.start();
// Or: llm.mount("/vector", vector);
| Provider | Endpoints |
|---|---|
| Pinecone | POST /query, POST /vectors/upsert, POST /vectors/delete, GET /describe-index-stats |
| Qdrant | POST /collections/{name}/points/search, PUT /collections/{name}/points, POST /collections/{name}/points/delete |
| ChromaDB | POST /api/v1/collections/{id}/query, POST /api/v1/collections/{id}/add, GET /api/v1/collections, DELETE /api/v1/collections/{id} |
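A minimal sketch of hitting the Pinecone-style route on the standalone server started above; the { matches: [...] } response shape is an assumption mirroring Pinecone's format:
const queryRes = await fetch(`${url}/query`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ vector: [0.1, 0.2, 0.3], topK: 2 }),
});
const result = await queryRes.json();
// Expected to surface the results registered via onQuery above (assumption)
console.log(result.matches?.[0]?.id);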
Built-in mocks for common AI-adjacent services. Registered on the LLMock instance directly — no separate server needed.
// POST /search — matches request `query` field
mock.onSearch("weather", [
{ title: "Weather Report", url: "https://example.com", content: "Sunny today" },
]);
mock.onSearch(/stock\s+price/i, [
{ title: "ACME Stock", url: "https://example.com", content: "$42", score: 0.95 },
]);
// POST /v2/rerank — matches request `query` field
mock.onRerank("machine learning", [
{ index: 0, relevance_score: 0.99 },
{ index: 2, relevance_score: 0.85 },
]);
// POST /v1/moderations — matches request `input` field
mock.onModerate("violent", {
flagged: true,
categories: { violence: true, hate: false },
category_scores: { violence: 0.95, hate: 0.01 },
});
// Catch-all — everything passes
mock.onModerate(/.*/, { flagged: false, categories: {} });
All three services use the same matching logic as chat fixtures: a string pattern does a substring match on the relevant request field, and a RegExp does a pattern test.
When a fixture doesn't match:
- mock.getLastRequest() → check the body.messages array
- mock.getFixtures() returns fixtures in registration order
- userMessage: the match is against the LAST role: "user" message only, and it is a substring match (not exact)
- mock.getRequests() shows all requests, including which fixture matched (or null for 404)

import { LLMock } from "@copilotkit/aimock";
// Setup — port: 0 picks a random available port
const mock = new LLMock({ port: 0 });
mock.loadFixtureDir("./fixtures");
await mock.start();
process.env.OPENAI_BASE_URL = `${mock.url}/v1`;
// Per-test cleanup
afterEach(() => mock.reset()); // clears fixtures AND journal
// Teardown
afterAll(async () => await mock.stop());
const mock = await LLMock.create({ port: 0 }); // creates + starts in one call
| Method | Purpose |
|---|---|
| addFixture(f) | Append fixture (last priority) |
| addFixtures(f[]) | Append multiple |
| prependFixture(f) | Insert at front (highest priority) |
| clearFixtures() | Remove all fixtures |
| getFixtures() | Read current fixture list |
| on(match, response, opts?) | Shorthand for addFixture |
| onMessage(pattern, response, opts?) | Match by user message |
| onEmbedding(pattern, response, opts?) | Match by embedding input text |
| onJsonOutput(pattern, json, opts?) | Match by user message with responseFormat |
| onToolCall(name, response, opts?) | Match by tool name in tools[] |
| onToolResult(id, response, opts?) | Match by tool_call_id |
| nextRequestError(status, body?) | One-shot error, auto-removes |
| loadFixtureFile(path) | Load JSON fixture file |
| loadFixtureDir(path) | Load all JSON files in directory |
| start() | Start server, returns URL |
| stop() | Stop server |
| reset() | Clear fixtures + journal + match counts |
| resetMatchCounts() | Clear sequence match counts only |
| getRequests() | All journal entries |
| getLastRequest() | Most recent journal entry |
| clearRequests() | Clear journal only |
| setChaos(opts) | Set server-level chaos rates |
| clearChaos() | Remove server-level chaos |
| onSearch(pattern, results) | Match search requests by query |
| onRerank(pattern, results) | Match rerank requests by query |
| onModerate(pattern, result) | Match moderation requests by input |
| onImage(pattern, response) | Match image generation by prompt |
| onSpeech(pattern, response) | Match TTS by input text |
| onTranscription(response) | Match audio transcription |
| onVideo(pattern, response) | Match video generation by prompt |
| mount(path, handler) | Mount a Mountable (VectorMock, etc.) |
| url / baseUrl | Server URL (throws if not started) |
| port | Server port number |
Sequential responses use on() with sequenceIndex in the match — there is no dedicated convenience method.
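The tool-oriented shorthands from the table work the same way; a minimal sketch ("call_abc123" is a hypothetical tool_call_id):
mock.onToolCall("get_weather", {
  toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }],
});
mock.onToolResult("call_abc123", { content: "It's 72°F in San Francisco." });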
aimock supports a VCR-style record-and-replay workflow for ALL endpoints including multimedia (image, TTS, transcription, video): unmatched requests are proxied to real provider APIs, and the responses are saved as standard aimock fixture files for deterministic replay. Binary TTS responses are base64-encoded with format derived from Content-Type. Multimedia fixtures automatically include endpoint in their match criteria for correct routing on replay.
# Record mode: proxy unmatched requests to real OpenAI and Anthropic APIs
aimock --record \
--provider-openai https://api.openai.com \
--provider-anthropic https://api.anthropic.com \
-f ./fixtures
# Strict mode: fail on unmatched requests (no proxying, no catch-all 404)
aimock --strict -f ./fixtures
- --record enables proxy-on-miss. Requires at least one --provider-* flag.
- --strict returns a 503 error when no fixture matches AND no proxy is configured (or the proxy attempt fails), instead of silently returning a 404. The proxy is still tried first when --record is set. Use this in CI to prevent unmatched requests from slipping through as silent 404s.
- Supported provider flags: --provider-openai, --provider-anthropic, --provider-gemini, --provider-vertexai, --provider-bedrock, --provider-azure, --provider-ollama, --provider-cohere.
- Provider base URLs may include a path prefix (e.g., https://gateway.company.com/llm/v1 correctly proxies to /llm/v1/chat/completions).
- The proxy does not forward host/content-length/cookie/accept-encoding. Auth headers (Authorization, x-api-key, api-key) are forwarded but stripped from the recorded fixture.
- Recorded fixtures are written to {fixturePath}/recorded/ and use the same JSON format as hand-written fixtures. Nothing special about them.
- When embeddings come back base64-encoded (the default encoding_format in Python's openai SDK), the recorder decodes them into float arrays so fixtures contain readable numeric data instead of opaque base64 strings.
- Proxied requests are logged at warn level so you can see exactly which requests are being forwarded.

const mock = new LLMock({ port: 0 });
await mock.start();
// Enable recording at runtime
mock.enableRecording({
providers: {
openai: "https://api.openai.com",
anthropic: "https://api.anthropic.com",
},
fixturePath: "./fixtures/recorded",
});
// ... run tests that hit real APIs for uncovered cases ...
// Disable recording (back to fixture-only mode)
mock.disableRecording();
A typical record-then-replay workflow:
1. Run aimock with --record and provider URLs. All requests that don't match existing fixtures are proxied and recorded.
2. Recorded fixtures are written to {fixturePath}/recorded/. Edit or reorganize as needed.
3. Re-run with --strict to ensure every request hits a fixture. No network calls escape.