npx claudepluginhub copilotkit/aimock
aimock is a zero-dependency mock infrastructure for AI apps. Fixture-driven. Multi-provider (OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Vertex AI, Ollama, Cohere). Multimedia endpoints (image generation, text-to-speech, audio transcription, video generation). MCP, A2A, AG-UI, and vector DB mocking. Runs a real HTTP server on a real port — works across processes, unlike MSW-style interceptors. WebSocket support for OpenAI Responses/Realtime and Gemini Live APIs. Record-and-replay for all endpoints including multimedia. Chaos testing and Prometheus metrics.
Requests are normalized to a common shape (ChatCompletionRequest) before matching. Fixtures added after start() take effect immediately. Sequential responses use sequenceIndex (match count tracked per fixture).

| Field | Type | Matches Against |
|---|---|---|
| userMessage | string | Substring of last role: "user" message text |
| userMessage | RegExp | Pattern test on last role: "user" message text |
| inputText | string | Substring of embedding input text (concatenated if multiple inputs) |
| inputText | RegExp | Pattern test on embedding input text |
| toolName | string | Exact match on any tool in request's tools[] array (by function.name) |
| toolCallId | string | Exact match on tool_call_id of last role: "tool" message |
| model | string | Exact match on req.model |
| model | RegExp | Pattern test on req.model |
| responseFormat | string | Exact match on req.response_format.type ("json_object", "json_schema") |
| sequenceIndex | number | Matches only when this fixture's match count equals the given index (0-based) |
| endpoint | string | Restrict to endpoint type: "chat", "image", "speech", "transcription", "video", "embedding" |
| predicate | (req: ChatCompletionRequest) => boolean | Custom function — full access to request |
AND logic: all specified fields must match. Empty match {} = catch-all.
Multi-part content (e.g., [{type: "text", text: "hello"}]) is automatically extracted — userMessage matching works regardless of content format.
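A minimal sketch of a combined match (the model name and response text are placeholders):
// All listed fields must match before this fixture fires (AND logic)
mock.addFixture({
  match: { userMessage: /weather/i, model: "gpt-4o", responseFormat: "json_object" },
  response: { content: '{"forecast":"sunny","tempF":72}' },
});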
{
  content: "Hello!"
}
// Preferred: object form (auto-stringified by the fixture loader)
{
  toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }]
}
// Also accepted: JSON string form (backward compatible)
{
  toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }]
}
Both object and string forms are accepted for arguments. The fixture loader auto-stringifies objects via JSON.stringify(). Object form is preferred for readability.
{
  embedding: [0.1, 0.2, 0.3, -0.5, 0.8]
}
The embedding vector is returned for each input in the request. If no embedding fixture matches, deterministic embeddings are auto-generated from the input text hash — you only need fixtures when you want specific vectors.
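A minimal sketch of relying on the auto-generated vectors (no fixture registered; the OpenAI-style response shape and model name are assumptions):
const res = await fetch(`${mock.url}/v1/embeddings`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "text-embedding-3-small", input: "any text" }),
});
const { data } = await res.json();
// data[0].embedding is deterministic: same input text, same vector (assumes the OpenAI response shape)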
// Single image
{
image: {
url: "https://example.com/generated.png"
}
}
// Multiple images
{
images: [{ url: "https://example.com/1.png" }, { b64Json: "iVBOR..." }]
}
Use match: { endpoint: "image" } to prevent cross-matching with chat fixtures.
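A minimal sketch of an endpoint-scoped fixture:
// Only image-generation requests can match this fixture; chat requests never will
mock.addFixture({
  match: { endpoint: "image" },
  response: { images: [{ url: "https://example.com/1.png" }] },
});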
{ audio: "base64-encoded-audio-data" }
// With explicit format (default: mp3)
{ audio: "base64-data", format: "opus" }
// Simple
{ transcription: { text: "Hello world" } }
// Verbose with timestamps
{ transcription: { text: "Hello world", language: "en", duration: 2.5, words: [...], segments: [...] } }
{ video: { id: "vid-1", status: "completed", url: "https://example.com/video.mp4" } }
Video uses async polling — POST /v1/videos creates, GET /v1/videos/{id} checks status.
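A minimal sketch of that polling flow against the mock (request field names are assumptions):
// Create the video job, then poll it by id
const createRes = await fetch(`${mock.url}/v1/videos`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ prompt: "a product demo" }), // body shape is an assumption
});
const created = await createRes.json(); // assumed to carry the job id from the fixture
const job = await (await fetch(`${mock.url}/v1/videos/${created.id}`)).json();
// job should mirror the fixture above: { id: "vid-1", status: "completed", url: "..." }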
{ error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 }
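The same error shape plugs into a regular fixture; a minimal sketch:
// Matching requests are answered with HTTP 429 and this error body
mock.addFixture({
  match: { userMessage: "simulate rate limit" },
  response: { error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 },
});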
The optional chaos field on a fixture enables probabilistic failure injection:
{
chaos?: {
dropRate?: number; // Probability (0-1) of returning a 500 error
malformedRate?: number; // Probability (0-1) of returning malformed JSON
disconnectRate?: number; // Probability (0-1) of disconnecting mid-stream
}
}
Rates are evaluated per-request. When triggered, the chaos failure replaces the normal response.
mock.onMessage("hello", { content: "Hi there!" });
The most common pattern. Fixture 1 triggers the tool call, fixture 2 handles the tool result.
// Step 1: User asks about weather → LLM calls tool
mock.onMessage("weather", {
toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }],
});
// Step 2: Tool result comes back → LLM responds with text
mock.addFixture({
match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
response: { content: "It's 72°F in San Francisco." },
});
Why predicate, not userMessage? After a tool call, the client replays the same conversation with the tool result appended. The user message hasn't changed — userMessage: "weather" would match the SAME fixture again, creating an infinite loop.
// Match specific input text
mock.onEmbedding("search query", {
embedding: [0.1, 0.2, 0.3, 0.4, 0.5],
});
// Match with regex
mock.onEmbedding(/product.*description/, {
embedding: [0.9, -0.1, 0.5, 0.3, 0.2],
});
// onJsonOutput auto-sets responseFormat: "json_object" and stringifies objects
mock.onJsonOutput("extract entities", {
entities: [
{ name: "Acme Corp", type: "company" },
{ name: "Jane Doe", type: "person" },
],
});
// Equivalent manual form:
mock.addFixture({
match: { userMessage: "extract entities", responseFormat: "json_object" },
response: { content: '{"entities":[...]}' },
});
// First call returns tool call, second returns text
mock.on(
{ userMessage: "status", sequenceIndex: 0 },
{ toolCalls: [{ name: "check_status", arguments: {} }] },
);
mock.on({ userMessage: "status", sequenceIndex: 1 }, { content: "All systems operational." });
Match counts are tracked per fixture group and reset with reset() or resetMatchCounts().
mock.onMessage(
"tell me a story",
{ content: "Once upon a time..." },
{
streamingProfile: {
ttft: 200, // 200ms before first token
tps: 30, // 30 tokens per second after that
jitter: 0.1, // ±10% random variance
},
},
);
Common in supervisor/orchestrator patterns where the system prompt changes:
mock.addFixture({
match: {
predicate: (req) => {
const sys = req.messages.find((m) => m.role === "system")?.content ?? "";
return typeof sys === "string" && sys.includes("Flights found: false");
},
},
response: { toolCalls: [{ name: "search_flights", arguments: {} }] },
});
Prevents unmatched requests from returning 404 and crashing the test:
mock.addFixture({
match: { predicate: () => true },
response: { content: "I understand. How can I help?" },
});
Must go at the front so it matches before substring-based fixtures:
mock.prependFixture({
match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
response: { content: "Done!" },
});
mock.onMessage(
"long response",
{ content: "This will be cut short..." },
{
truncateAfterChunks: 3, // Stop after 3 SSE chunks
disconnectAfterMs: 500, // Or disconnect after 500ms
},
);
mock.addFixture({
match: { userMessage: "flaky" },
response: { content: "Sometimes works!" },
chaos: { dropRate: 0.3 },
});
30% of requests matching this fixture will get a 500 error instead of the response. Can also use malformedRate (garbled JSON) or disconnectRate (connection dropped mid-stream).
Server-level chaos applies to ALL requests:
mock.setChaos({ dropRate: 0.1 }); // 10% of all requests fail
mock.clearChaos(); // Remove server-level chaos
mock.nextRequestError(429, { message: "Rate limited", type: "rate_limit_error" });
// Next request gets 429, then fixture auto-removes itself
{
"fixtures": [
{
"match": { "userMessage": "hello" },
"response": { "content": "Hi!" }
},
{
"match": { "userMessage": "weather" },
"response": {
"toolCalls": [
{
"name": "get_weather",
"arguments": { "city": "SF", "units": "fahrenheit" }
}
]
}
},
{
"match": { "inputText": "search query" },
"response": { "embedding": [0.1, 0.2, 0.3] }
},
{
"match": { "userMessage": "status", "sequenceIndex": 0 },
"response": { "content": "First response" }
}
]
}
JSON auto-stringify: In JSON fixture files, arguments and content can be objects — the loader auto-stringifies them with JSON.stringify(). The escaped-string form ("{\"city\":\"SF\"}") still works but objects are preferred for readability.
JSON files cannot use RegExp or predicate — those are code-only features. streamingProfile is supported in JSON fixture files.
Load with mock.loadFixtureFile("./fixtures/greetings.json") or mock.loadFixtureDir("./fixtures/").
All providers share the same fixture pool — write fixtures once, they work for any endpoint.
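A minimal sketch of one fixture exercised through two provider routes (model names are placeholders):
mock.onMessage("hello", { content: "Hi!" });
// OpenAI-style request
await fetch(`${mock.url}/v1/chat/completions`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o-mini", messages: [{ role: "user", content: "hello" }] }),
});
// Anthropic-style request: same fixture, Anthropic wire format on the way back
await fetch(`${mock.url}/v1/messages`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "claude-sonnet", max_tokens: 128, messages: [{ role: "user", content: "hello" }] }),
});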
| Endpoint | Provider | Protocol |
|---|---|---|
| POST /v1/chat/completions | OpenAI | HTTP |
| POST /v1/responses | OpenAI | HTTP + WS |
| POST /v1/messages | Anthropic | HTTP |
| POST /v1/embeddings | OpenAI | HTTP |
| POST /v1beta/models/{model}:{method} | Google Gemini | HTTP |
| POST /model/{modelId}/invoke | AWS Bedrock | HTTP |
| POST /openai/deployments/{id}/chat/completions | Azure OpenAI | HTTP |
| POST /openai/deployments/{id}/embeddings | Azure OpenAI | HTTP |
| GET /health | — | HTTP |
| GET /ready | — | HTTP |
| POST /model/{modelId}/invoke-with-response-stream | AWS Bedrock | HTTP |
| POST /model/{modelId}/converse | AWS Bedrock | HTTP |
| POST /model/{modelId}/converse-stream | AWS Bedrock | HTTP |
| POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:generateContent | Vertex AI | HTTP |
| POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:streamGenerateContent | Vertex AI | HTTP |
| POST /api/chat | Ollama | HTTP |
| POST /api/generate | Ollama | HTTP |
| GET /api/tags | Ollama | HTTP |
| POST /v2/chat | Cohere | HTTP |
| GET /metrics | — | HTTP |
| GET /v1/models | OpenAI-compat | HTTP |
| WS /v1/responses | OpenAI | WebSocket |
| WS /v1/realtime | OpenAI | WebSocket |
| WS /ws/google.ai...BidiGenerateContent | Gemini Live | WebSocket |
| POST /v1/images/generations | OpenAI | HTTP |
| POST /v1beta/models/{model}:predict | Gemini Imagen | HTTP |
| POST /v1/audio/speech | OpenAI | HTTP |
| POST /v1/audio/transcriptions | OpenAI | HTTP |
| POST /v1/videos | OpenAI | HTTP |
| GET /v1/videos/{id} | OpenAI | HTTP |
Fixture responses can include optional override fields to control auto-generated envelope values. These are merged into the provider-specific response format (OpenAI, Claude, Gemini, Responses API).
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | auto-generated | Override response ID (e.g., chatcmpl-custom) |
| created | number | Date.now()/1000 | Override Unix timestamp |
| model | string | echoes request | Override model name in response |
| usage | object | zeroed | Override token counts: { prompt_tokens, completion_tokens, total_tokens }. OpenAI Chat includes usage in response body; Responses API uses response.usage. When omitted, auto-computed from content length |
| finishReason | string | "stop" / "tool_calls" | Override finish reason. Mappings: stop -> end_turn (Claude), STOP (Gemini); tool_calls -> tool_use (Claude), FUNCTION_CALL (Gemini); length -> max_tokens (Claude), MAX_TOKENS (Gemini); content_filter -> SAFETY (Gemini), failed (Responses API) |
| role | string | "assistant" | Override message role |
| systemFingerprint | string | (omitted) | Add system_fingerprint to response |
mock.onMessage("hello", {
content: "Hi!",
model: "gpt-4-turbo-2024-04-09",
usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
systemFingerprint: "fp_abc123",
});
{
"match": { "userMessage": "hello" },
"response": {
"content": "Hi!",
"model": "gpt-4-turbo-2024-04-09",
"usage": { "prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15 },
"systemFingerprint": "fp_abc123"
}
}
These fields map correctly across all provider formats — for example, finishReason: "stop" becomes finish_reason: "stop" in OpenAI, stop_reason: "end_turn" in Claude, and finishReason: "STOP" in Gemini.
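A minimal sketch of overriding the finish reason (the mapping behavior follows the table above):
mock.onMessage("write a novel", {
  content: "Chapter 1...",
  finishReason: "length",
  // Replays as finish_reason: "length" (OpenAI), stop_reason: "max_tokens" (Claude), finishReason: "MAX_TOKENS" (Gemini)
});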
| Feature | OpenAI Chat | OpenAI Responses | Claude | Gemini | Bedrock | Azure | Ollama | Cohere |
|---|---|---|---|---|---|---|---|---|
| Text | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool Calls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Content + Tool Calls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Streaming | SSE | SSE | SSE | SSE | Binary | SSE | NDJSON | SSE |
| Reasoning | Yes | Yes | Yes | Yes | Yes | Yes | -- | -- |
| Web Searches | -- | Yes | -- | -- | -- | -- | -- | -- |
| Response Overrides | Yes | Yes | Yes | Yes | -- | Yes | -- | -- |
Order matters — first match wins. Specific fixtures before general ones. Use prependFixture() to force priority.
arguments accepts both objects and strings — "arguments": {"key":"value"} (preferred, auto-stringified) or "arguments": "{\"key\":\"value\"}" (legacy). The same applies to content fields that contain JSON. The fixture loader detects typeof === "object" and calls JSON.stringify() automatically.
Latency is per-chunk, not total — latency: 100 means 100ms between each SSE chunk, not 100ms total response time. Similarly, truncateAfterChunks and disconnectAfterMs are for simulating stream interruptions (added in v1.3.0).
streamingProfile takes precedence over latency — when both are set on a fixture, streamingProfile controls timing. Use one or the other.
Tool result messages don't change the user message — after a tool call, the client sends the same conversation + tool result. Matching on userMessage will hit the SAME fixture again → infinite loop. Always use predicate checking role === "tool" for tool results.
clearFixtures() preserves the array reference — uses .length = 0, not reassignment. The running server reads the same array object.
Journal records everything — including 404 "no match" responses. Use mock.getLastRequest() to debug mismatches.
All providers share fixtures — a fixture matching "hello" works whether the request comes via /v1/chat/completions (OpenAI), /v1/messages (Anthropic), Gemini, Bedrock, or Azure endpoints.
WebSocket uses the same fixture pool — no special setup needed for WebSocket-based APIs (OpenAI Responses WS, Realtime, Gemini Live).
Embeddings auto-generate if no fixture matches — deterministic vectors are generated from the input text hash. You don't need a catch-all for embedding requests.
Sequential response counts are tracked per fixture — counts reset with reset() or resetMatchCounts(). The count increments after each match of that fixture group (all fixtures sharing the same non-sequenceIndex match fields).
Bedrock uses Anthropic Messages format internally — the adapter normalizes Bedrock requests to ChatCompletionRequest, so the same fixtures work. Bedrock supports both non-streaming (/invoke, /converse) and streaming (/invoke-with-response-stream, /converse-stream) endpoints.
Azure OpenAI routes through the same handlers — /openai/deployments/{id}/chat/completions maps to the completions handler, /openai/deployments/{id}/embeddings maps to the embeddings handler. Fixtures work unchanged.
Ollama defaults to streaming — opposite of OpenAI. Set stream: false explicitly in the request for non-streaming responses.
Ollama tool call arguments is an object, not a JSON string — unlike OpenAI where arguments is a JSON string, Ollama sends and expects a plain object.
Bedrock streaming uses binary Event Stream format — not SSE. The invoke-with-response-stream and converse-stream endpoints use AWS Event Stream binary encoding.
Vertex AI routes to the same handler as consumer Gemini — the same fixtures work for both Vertex AI (/v1/projects/.../models/{m}:generateContent) and consumer Gemini (/v1beta/models/{model}:generateContent).
Cohere requires model field — returns 400 if model is missing from the request body.
Mount additional mock services onto a running LLMock server. All services share one port, one health endpoint, and one request journal.
const llm = new LLMock({ port: 5555 });
llm.mount("/mcp", mcpMock); // MCP tools at /mcp
llm.mount("/a2a", a2aMock); // A2A agents at /a2a
llm.mount("/vector", vectorMock); // Vector DB at /vector
await llm.start();
Any object implementing the Mountable interface (a handleRequest method that returns boolean) can be mounted. Path prefixes are stripped before the service sees the request — /mcp/tools/list arrives as /tools/list.
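A minimal sketch of a custom mountable, assuming a Node-style (req, res) handleRequest signature; only "a handleRequest method that returns boolean" is documented, so the exact shape is an assumption:
const pingMock = {
  handleRequest(req: any, res: any): boolean {
    if (req.url === "/ping") {          // mount prefix is already stripped
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ pong: true }));
      return true;                       // assumed to mean "handled"
    }
    return false;                        // assumed to mean "not mine, keep routing"
  },
};
llm.mount("/ping", pingMock);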
Unified lifecycle for LLMock + mounted services:
import { createMockSuite } from "@copilotkit/aimock";
const suite = createMockSuite({
port: 0,
fixtures: "./fixtures",
services: { "/mcp": mcpMock, "/a2a": a2aMock },
});
await suite.start();
// suite.llm — the LLMock instance
// suite.url — base URL
afterEach(() => suite.reset()); // resets everything
afterAll(() => suite.stop());
The aimock CLI reads a JSON config and serves all services on one port:
aimock --config aimock.json --port 4010
Config format:
{
"llm": {
"fixtures": "./fixtures",
"latency": 0,
"metrics": true
},
"services": {
"/mcp": { "type": "mcp", "tools": "./mcp-tools.json" },
"/a2a": { "type": "a2a", "agents": "./a2a-agents.json" }
}
}
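Once the CLI is running, all services share that single port; a minimal sketch (4010 is the port from the command above):
const health = await fetch("http://localhost:4010/health"); // shared health endpoint
console.log(health.status); // expected 200 once the server is up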
Mock vector database server for testing RAG pipelines. Supports Pinecone, Qdrant, and ChromaDB API formats.
import { VectorMock } from "@copilotkit/aimock";
const vector = new VectorMock();
// Create a collection and register query results
vector.addCollection("docs", { dimension: 1536 });
vector.onQuery("docs", [
{ id: "doc-1", score: 0.95, metadata: { title: "Getting Started" } },
{ id: "doc-2", score: 0.87, metadata: { title: "API Reference" } },
]);
// Upsert vectors
vector.upsert("docs", [
{ id: "v1", values: [0.1, 0.2, ...], metadata: { title: "Intro" } },
]);
// Dynamic query handler
vector.onQuery("docs", (query) => {
return [{ id: "result", score: 1.0, metadata: { topK: query.topK } }];
});
// Standalone or mounted
const url = await vector.start();
// Or: llm.mount("/vector", vector);
| Provider | Endpoints |
|---|---|
| Pinecone | POST /query, POST /vectors/upsert, POST /vectors/delete, GET /describe-index-stats |
| Qdrant | POST /collections/{name}/points/search, PUT /collections/{name}/points, POST /collections/{name}/points/delete |
| ChromaDB | POST /api/v1/collections/{id}/query, POST /api/v1/collections/{id}/add, GET /api/v1/collections, DELETE /api/v1/collections/{id} |
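A minimal sketch of hitting the Pinecone-style route on the standalone server started above; the { matches: [...] } response shape is an assumption mirroring Pinecone's format:
const queryRes = await fetch(`${url}/query`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ vector: [0.1, 0.2, 0.3], topK: 2 }),
});
const result = await queryRes.json();
// Expected to surface the results registered via onQuery above (assumption)
console.log(result.matches?.[0]?.id);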
Built-in mocks for common AI-adjacent services. Registered on the LLMock instance directly — no separate server needed.
// POST /search — matches request `query` field
mock.onSearch("weather", [
{ title: "Weather Report", url: "https://example.com", content: "Sunny today" },
]);
mock.onSearch(/stock\s+price/i, [
{ title: "ACME Stock", url: "https://example.com", content: "$42", score: 0.95 },
]);
// POST /v2/rerank — matches request `query` field
mock.onRerank("machine learning", [
{ index: 0, relevance_score: 0.99 },
{ index: 2, relevance_score: 0.85 },
]);
// POST /v1/moderations — matches request `input` field
mock.onModerate("violent", {
flagged: true,
categories: { violence: true, hate: false },
category_scores: { violence: 0.95, hate: 0.01 },
});
// Catch-all — everything passes
mock.onModerate(/.*/, { flagged: false, categories: {} });
All three services use the same matching logic as chat fixtures: a string pattern does a substring match on the relevant request field, and a RegExp does a pattern test.
When a fixture doesn't match:
- mock.getLastRequest() → check the body.messages array
- mock.getFixtures() returns fixtures in registration order
- userMessage: the match is against the LAST role: "user" message only, and it is a substring match (not exact)
- mock.getRequests() shows all requests, including which fixture matched (or null for 404)

import { LLMock } from "@copilotkit/aimock";
// Setup — port: 0 picks a random available port
const mock = new LLMock({ port: 0 });
mock.loadFixtureDir("./fixtures");
await mock.start();
process.env.OPENAI_BASE_URL = `${mock.url}/v1`;
// Per-test cleanup
afterEach(() => mock.reset()); // clears fixtures AND journal
// Teardown
afterAll(async () => await mock.stop());
const mock = await LLMock.create({ port: 0 }); // creates + starts in one call
| Method | Purpose |
|---|---|
| addFixture(f) | Append fixture (last priority) |
| addFixtures(f[]) | Append multiple |
| prependFixture(f) | Insert at front (highest priority) |
| clearFixtures() | Remove all fixtures |
| getFixtures() | Read current fixture list |
| on(match, response, opts?) | Shorthand for addFixture |
| onMessage(pattern, response, opts?) | Match by user message |
| onEmbedding(pattern, response, opts?) | Match by embedding input text |
| onJsonOutput(pattern, json, opts?) | Match by user message with responseFormat |
| onToolCall(name, response, opts?) | Match by tool name in tools[] |
| onToolResult(id, response, opts?) | Match by tool_call_id |
| nextRequestError(status, body?) | One-shot error, auto-removes |
| loadFixtureFile(path) | Load JSON fixture file |
| loadFixtureDir(path) | Load all JSON files in directory |
| start() | Start server, returns URL |
| stop() | Stop server |
| reset() | Clear fixtures + journal + match counts |
| resetMatchCounts() | Clear sequence match counts only |
| getRequests() | All journal entries |
| getLastRequest() | Most recent journal entry |
| clearRequests() | Clear journal only |
| setChaos(opts) | Set server-level chaos rates |
| clearChaos() | Remove server-level chaos |
| onSearch(pattern, results) | Match search requests by query |
| onRerank(pattern, results) | Match rerank requests by query |
| onModerate(pattern, result) | Match moderation requests by input |
| onImage(pattern, response) | Match image generation by prompt |
| onSpeech(pattern, response) | Match TTS by input text |
| onTranscription(response) | Match audio transcription |
| onVideo(pattern, response) | Match video generation by prompt |
| mount(path, handler) | Mount a Mountable (VectorMock, etc.) |
| url / baseUrl | Server URL (throws if not started) |
| port | Server port number |
Sequential responses use on() with sequenceIndex in the match — there is no dedicated convenience method.
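The tool-oriented shorthands from the table work the same way; a minimal sketch ("call_abc123" is a hypothetical tool_call_id):
mock.onToolCall("get_weather", {
  toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }],
});
mock.onToolResult("call_abc123", { content: "It's 72°F in San Francisco." });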
aimock supports a VCR-style record-and-replay workflow for ALL endpoints including multimedia (image, TTS, transcription, video): unmatched requests are proxied to real provider APIs, and the responses are saved as standard aimock fixture files for deterministic replay. Binary TTS responses are base64-encoded with format derived from Content-Type. Multimedia fixtures automatically include endpoint in their match criteria for correct routing on replay.
# Record mode: proxy unmatched requests to real OpenAI and Anthropic APIs
aimock --record \
--provider-openai https://api.openai.com \
--provider-anthropic https://api.anthropic.com \
-f ./fixtures
# Strict mode: fail on unmatched requests (no proxying, no catch-all 404)
aimock --strict -f ./fixtures
- --record enables proxy-on-miss. Requires at least one --provider-* flag.
- --strict returns a 503 error when no fixture matches AND no proxy is configured (or the proxy attempt fails), instead of silently returning a 404. The proxy is still tried first when --record is set. Use this in CI to prevent unmatched requests from slipping through as silent 404s.
- Supported provider flags: --provider-openai, --provider-anthropic, --provider-gemini, --provider-vertexai, --provider-bedrock, --provider-azure, --provider-ollama, --provider-cohere.
- Provider base URLs may include a path prefix (e.g., https://gateway.company.com/llm/v1 correctly proxies to /llm/v1/chat/completions).
- The proxy does not forward host/content-length/cookie/accept-encoding. Auth headers (Authorization, x-api-key, api-key) are forwarded but stripped from the recorded fixture.
- Recorded fixtures are written to {fixturePath}/recorded/ and use the same JSON format as hand-written fixtures. Nothing special about them.
- When embeddings come back base64-encoded (the default encoding_format in Python's openai SDK), the recorder decodes them into float arrays so fixtures contain readable numeric data instead of opaque base64 strings.
- Proxied requests are logged at warn level so you can see exactly which requests are being forwarded.

const mock = new LLMock({ port: 0 });
await mock.start();
// Enable recording at runtime
mock.enableRecording({
providers: {
openai: "https://api.openai.com",
anthropic: "https://api.anthropic.com",
},
fixturePath: "./fixtures/recorded",
});
// ... run tests that hit real APIs for uncovered cases ...
// Disable recording (back to fixture-only mode)
mock.disableRecording();
A typical record-then-replay workflow:
1. Run aimock with --record and provider URLs. All requests that don't match existing fixtures are proxied and recorded.
2. Recorded fixtures are written to {fixturePath}/recorded/. Edit or reorganize as needed.
3. Re-run with --strict to ensure every request hits a fixture. No network calls escape.