# langchain-py-pack
Build a fast, deterministic local test loop for LangChain 1.0 / LangGraph 1.0 — FakeListChatModel fixtures, pytest config, VCR cassettes with key redaction, warning-filter policy. Use when adding tests to a new chain, fixing a flaky test, or making integration tests reproducible. Trigger with "langchain pytest", "FakeListChatModel", "VCR langchain", "langchain test fixtures", "langchain integration test".
```bash
npx claudepluginhub flight505/skill-forge --plugin langchain-py-pack
```
An engineer writes the most natural assertion possible:
```python
def test_summarize():
    out = chain.invoke({"text": "..."})
    assert out.content == "expected summary"
```
It passes locally against Claude at temperature=0. It fails in CI on the third
run with a one-token delta in the output. That is P05: Anthropic's temperature=0
is not greedy — it still samples. Tests against live Claude are not deterministic,
period.
So the engineer swaps in FakeListChatModel(responses=["expected summary"]) and
the assertion passes. Then the downstream callback that logs cost blows up in CI
with KeyError: 'token_usage' — because FakeListChatModel does not emit
response_metadata["token_usage"] (P43). Production code reads that key, so
either the fake has to synthesize it or the test has to skip the callback.
Meanwhile, the first integration test under VCR records a cassette that ships
Authorization: Bearer sk-ant-api03-... in the repo (P44). PR review catches it;
the reviewer revokes the key; the dev loop is hosed for an afternoon.
And none of this matters if pytest cannot even collect the suite because
import langchain_community emits a DeprecationWarning that -W error promotes
to failure (P45).
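The P45 failure mode is reproducible with the standard library alone, no LangChain import required. A minimal sketch (the warning message is illustrative):

```python
import warnings

def import_legacy_module():
    """Stand-in for `import langchain_community`: warns at import time."""
    warnings.warn("langchain_community is deprecated", DeprecationWarning)

# Under an "error" filter (what pytest's -W error does), the import-time
# warning becomes an exception before a single test has run:
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        import_legacy_module()
        outcome = "collected"
    except DeprecationWarning:
        outcome = "collection aborted"

print(outcome)  # collection aborted
```

The fix is not to weaken the filter globally but to scope an ignore to the offending module, which is exactly what the filterwarnings policy below does.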
This skill installs the four layers that make the whole loop fast and safe:
FakeListChatModel / FakeListLLM with a metadata-emitting subclass (fixes P43);
VCR with filter_headers plus a pre-commit hook (fixes P44); pytest
filterwarnings policy in pyproject.toml (fixes P45); and an env-var-gated
integration marker so the default pytest run never touches live APIs.
Speed targets: unit tests with FakeListChatModel run in < 100ms per
test; VCR-replayed integration tests run in 500ms – 2s per test; live
integration tests (the RUN_INTEGRATION=1 gate) run only in nightly or
manual workflows.
Pin: langchain-core 1.0.x, langgraph 1.0.x, pytest current, vcrpy
current. Pain-catalog anchors: P05, P43, P44, P45.
Requirements:

```bash
pip install "langchain-core>=1.0,<2.0" "langgraph>=1.0,<2.0" pytest vcrpy pytest-recording
```

- Provider API keys in the environment for recording cassettes only (ANTHROPIC_API_KEY, etc.)
- pyproject.toml (PEP 621) for pytest config

## FakeListChatModel

Use FakeListChatModel from langchain_core.language_models.fake for chat
chains and FakeListLLM for legacy completion LLMs. Responses cycle through
the list.
```python
from langchain_core.language_models.fake import FakeListChatModel
from langchain_core.prompts import ChatPromptTemplate

def test_classifier_picks_positive():
    fake = FakeListChatModel(responses=["positive"])
    prompt = ChatPromptTemplate.from_messages([("user", "Classify: {text}")])
    chain = prompt | fake
    out = chain.invoke({"text": "I love it"})
    assert out.content == "positive"
```
This is deterministic, runs in single-digit milliseconds, and has zero provider dependency. Use it for every chain assertion that does not specifically require real model behavior.
## Teach FakeListChatModel to emit response_metadata (P43 fix)

The stock fake emits no response_metadata["token_usage"]. If your chain has a
callback that records cost, the callback crashes under the fake. Subclass and
synthesize the metadata instead of mocking around the callback:
```python
from langchain_core.language_models.fake import FakeListChatModel
from langchain_core.outputs import ChatGeneration, ChatResult
from langchain_core.messages import AIMessage

class FakeChatWithUsage(FakeListChatModel):
    """FakeListChatModel that emits response_metadata['token_usage'] so
    downstream callbacks reading token usage do not crash under test."""

    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
        response = self.responses[self.i % len(self.responses)]
        self.i += 1
        message = AIMessage(
            content=response,
            response_metadata={
                "token_usage": {
                    "input_tokens": 10,
                    "output_tokens": len(response.split()),
                    "total_tokens": 10 + len(response.split()),
                },
                "model_name": "fake-chat",
            },
            usage_metadata={
                "input_tokens": 10,
                "output_tokens": len(response.split()),
                "total_tokens": 10 + len(response.split()),
            },
        )
        return ChatResult(generations=[ChatGeneration(message=message)])
```
Use FakeChatWithUsage whenever a chain's observability / cost path is in the
assertion surface. See Fake Model Fixtures
for agent, retriever, and embedder fakes.
Put fixtures in tests/conftest.py so they are shared across the suite:
```python
# tests/conftest.py
import pytest
from langchain_core.prompts import ChatPromptTemplate
from tests.fakes import FakeChatWithUsage

@pytest.fixture
def fake_chat():
    """Reusable fake chat model. Override responses per-test via
    monkeypatch.setattr(fake_chat, 'responses', [...])."""
    return FakeChatWithUsage(responses=["ok"])

@pytest.fixture
def summarize_chain(fake_chat):
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Summarize the user's text in one line."),
        ("user", "{text}"),
    ])
    return prompt | fake_chat
```
Per-test response override:
```python
def test_summary_shape(summarize_chain, fake_chat):
    fake_chat.responses = ["short summary"]
    out = summarize_chain.invoke({"text": "long input"})
    assert out.content == "short summary"
```
## VCR cassettes with key redaction (P44 fix)

Unit tests should never touch the network. Integration tests do, exactly once —
to record a cassette — and every subsequent run replays from the cassette file.
vcrpy records headers by default, which means Authorization: Bearer sk-...
lands in the fixture unless you filter it.
Configure VCR in tests/conftest.py:
```python
# tests/conftest.py (continued)
import pytest

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": [
            "authorization",
            "x-api-key",
            "anthropic-version",
            "openai-organization",
            "cookie",
        ],
        "filter_query_parameters": ["api_key"],
        # Block accidental re-recording in CI:
        "record_mode": "none",
    }
```
Use pytest-recording:
```python
import pytest

@pytest.mark.vcr  # cassette at tests/cassettes/<test_name>.yaml
@pytest.mark.integration
def test_live_claude_short_answer():
    from langchain_anthropic import ChatAnthropic

    chat = ChatAnthropic(model="claude-sonnet-4-6", temperature=0, timeout=30)
    out = chat.invoke("Say 'ok' and nothing else.")
    assert "ok" in out.content.lower()
```
To record (once, locally, with a real key): pytest --record-mode=once tests/.
Every other run replays — cassettes are committed, real API is never hit again.
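For reference, a committed cassette after filter_headers has done its job looks roughly like this (a sketch of vcrpy's YAML layout; the URI, body, and field ordering are illustrative):

```yaml
interactions:
- request:
    body: '{"model": "claude-sonnet-4-6", "max_tokens": 64, ...}'
    headers:
      accept:
      - application/json
      # authorization / x-api-key were filtered -- never written to disk
    method: POST
    uri: https://api.anthropic.com/v1/messages
  response:
    body:
      string: '{"content": [{"text": "ok", "type": "text"}], ...}'
    status:
      code: 200
      message: OK
version: 1
```

If you see an authorization or x-api-key line anywhere in a cassette diff, the filter was not in place when the cassette was recorded: revoke the key and re-record.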
Pre-commit hook to block key leaks:
```bash
#!/usr/bin/env bash
# .git/hooks/pre-commit (or wire the same check into .pre-commit-config.yaml)
set -e
if git diff --cached --name-only | grep -q '^tests/cassettes/'; then
  if git diff --cached -U0 -- 'tests/cassettes/' | \
     grep -E '(sk-ant-[a-zA-Z0-9_-]+|sk-[a-zA-Z0-9]{20,}|Bearer\s+[a-zA-Z0-9_-]{20,})'; then
    echo "ERROR: API key pattern found in staged cassette." >&2
    exit 1
  fi
fi
```
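The hook's regex can be sanity-checked in isolation before trusting it with real diffs. A stdlib-only sketch using the same pattern (the sample keys are fabricated):

```python
import re

# Same alternation the pre-commit hook greps for:
KEY_PATTERN = re.compile(
    r"(sk-ant-[a-zA-Z0-9_-]+|sk-[a-zA-Z0-9]{20,}|Bearer\s+[a-zA-Z0-9_-]{20,})"
)

# Fabricated sample keys that must trip the gate:
leaks = [
    "authorization: Bearer sk-ant-api03-AAAAAAAAAAAAAAAAAAAA",
    "api_key: sk-0123456789abcdefghijklmn",
]
# Ordinary cassette lines that must pass:
clean = [
    "accept: application/json",
    "uri: https://api.anthropic.com/v1/messages",
]

assert all(KEY_PATTERN.search(line) for line in leaks)
assert not any(KEY_PATTERN.search(line) for line in clean)
```

Note the deliberate looseness: `sk-[a-zA-Z0-9]{20,}` catches generic provider keys, not just Anthropic's prefix, at the cost of occasional false positives on long hex strings.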
See VCR Cassette Hygiene for the full pre-commit config, record-new-episodes flow, shared-cassette patterns, and the PR review checklist.
## Warning-filter policy in pyproject.toml (P45 fix)

langchain_community and some provider SDKs emit DeprecationWarning at import
time. If the suite runs -W error, collection fails before any test does. Set
the policy once in pyproject.toml:
```toml
[tool.pytest.ini_options]
minversion = "8.0"
testpaths = ["tests"]
addopts = [
    "-ra",
    "--strict-markers",
    "--strict-config",
    "-W", "error",
]
markers = [
    "integration: hits real APIs or replays VCR cassettes (set RUN_INTEGRATION=1)",
    "slow: takes > 1s per test",
    "smoke: minimal healthcheck run in CI",
]
filterwarnings = [
    "error",
    "ignore::DeprecationWarning:langchain_community.*",
    "ignore::DeprecationWarning:pydantic.*",
    "ignore::PendingDeprecationWarning:langchain_core.*",
]
```
See Pytest Config for the full skeleton including coverage config and parallel execution notes.
## Env-var-gated integration marker

The default pytest run must never hit real APIs. Gate on RUN_INTEGRATION=1:
```python
# tests/conftest.py (continued)
import os
import pytest

def pytest_collection_modifyitems(config, items):
    if os.getenv("RUN_INTEGRATION") == "1":
        return
    skip_integration = pytest.mark.skip(reason="set RUN_INTEGRATION=1 to run")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```
CI default: `pytest` (unit only). Nightly / manual: `RUN_INTEGRATION=1 pytest -m integration`.
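Wired into GitHub Actions, the nightly half might look like this (a sketch: the workflow name, install command, and secret name are assumptions about your repo):

```yaml
# .github/workflows/nightly.yml -- live integration run, never in PR CI
name: nightly-integration
on:
  schedule:
    - cron: "0 4 * * *"   # nightly
  workflow_dispatch: {}    # allow manual runs
jobs:
  integration:
    runs-on: ubuntu-latest
    env:
      RUN_INTEGRATION: "1"
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[test]"
      - run: pytest -m integration
```

The PR workflow just runs `pytest` with no API key in the environment; the collection hook skips everything marked integration, so a missing key can never fail a PR build.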
## LangGraph: thread_id + state assertions

LangGraph state is scoped to a thread_id. Tests that share a thread_id leak
state between each other. Give every test a fresh thread_id and a fresh
MemorySaver:
```python
import uuid

import pytest
from langgraph.checkpoint.memory import MemorySaver

@pytest.fixture
def graph_config():
    return {"configurable": {"thread_id": str(uuid.uuid4())}}

@pytest.fixture
def checkpointed_graph(fake_chat):
    from my_app.graphs import build_graph
    return build_graph(fake_chat).compile(checkpointer=MemorySaver())

def test_node_emits_plan(checkpointed_graph, graph_config, fake_chat):
    fake_chat.responses = ["step 1\nstep 2\nstep 3"]
    result = checkpointed_graph.invoke({"goal": "deploy"}, graph_config)
    # Assert state shape per node, not just the final output:
    assert result["plan"] == ["step 1", "step 2", "step 3"]
    # Time-travel: inspect every checkpoint for debugging
    history = list(checkpointed_graph.get_state_history(graph_config))
    assert history[-1].values == {"goal": "deploy"}  # initial state
```
Subgraph isolation testing cross-references langchain-langgraph-subgraphs
(pain P21 — parent cannot read child state unless the key is in the parent
schema). See LangGraph Test Patterns
for the subgraph-shared-state test recipe.
Deliverables checklist:

- tests/fakes.py with the FakeChatWithUsage subclass that emits response_metadata
- tests/conftest.py with fake-model fixtures, VCR config, and the RUN_INTEGRATION gate
- pyproject.toml [tool.pytest.ini_options] block with markers and filterwarnings
- tests/cassettes/ committed with filtered headers (no Authorization / x-api-key)
- Pre-commit hook grepping staged cassettes for sk- / sk-ant- / Bearer patterns
- Per-test thread_id and MemorySaver — no cross-test leakage

| Type | Model | Network | Target speed | Determinism | Use case |
|---|---|---|---|---|---|
| Unit | FakeListChatModel / FakeChatWithUsage | none | < 100ms | total | Chain shape, parser, routing logic |
| Integration (VCR) | real model, replayed cassette | replay only | 500ms – 2s | total (once recorded) | End-to-end chain behavior, provider-specific edge cases |
| Integration (live) | real model | live API | 2s – 30s | probabilistic (P05) | Nightly smoke, recording new cassettes, provider regression |
| Smoke | real model, minimal prompt | live API | < 5s | probabilistic | CI healthcheck — 1 test per provider, gated on RUN_INTEGRATION=1 |
| Load | real model | live API | minutes | probabilistic | Throughput / retry-storm reproduction, never in PR CI |
| Error | Cause | Fix |
|---|---|---|
| AssertionError on content despite temperature=0 | Anthropic temperature=0 still samples (P05) | Switch to FakeListChatModel or VCR replay |
| KeyError: 'token_usage' under fake model | FakeListChatModel emits no response_metadata (P43) | Use the FakeChatWithUsage subclass above |
| PR review flags Authorization: Bearer sk-... in cassette | VCR recorded headers by default (P44) | Set filter_headers before recording; re-record; add pre-commit grep hook |
| pytest fails at collection with DeprecationWarning | -W error + SDK import warnings (P45) | Add filterwarnings = ["ignore::DeprecationWarning:langchain_community.*"] |
| vcr.errors.CannotOverwriteExistingCassetteException | Test changed request shape but cassette is stale | pytest --record-mode=new_episodes locally, inspect diff, commit |
| LangGraph test pollutes next test's state | Shared thread_id + shared MemorySaver | Per-test thread_id=uuid.uuid4(), per-test MemorySaver() |
Worked example: the same test at three fidelity levels.

1. Against ChatAnthropic it passes locally but fails 1-in-5 in CI at
   temperature=0 (P05).
2. Swapped to FakeListChatModel it passes deterministically, but the
   cost-logging callback crashes (P43).
3. With FakeChatWithUsage the callback reads
   response_metadata["token_usage"] cleanly, the test is green and runs
   in 40ms.

See Fake Model Fixtures for the full worked example including agent and retriever fakes.
```bash
# 1. Ensure conftest.py has filter_headers configured FIRST
# 2. Record with the real key present in the environment
ANTHROPIC_API_KEY=sk-ant-... pytest --record-mode=once tests/integration/test_summarize.py

# 3. Verify no leak
grep -E 'sk-|Bearer' tests/cassettes/*.yaml && echo "LEAK" || echo "clean"

# 4. Commit cassettes/ — the pre-commit hook runs the same grep as a hard gate
git add tests/cassettes/ && git commit -m "test: record summarize cassette"
```
See VCR Cassette Hygiene for record-new-episodes mode, rerecord-on-mismatch, and the PR review checklist.
When a graph test fails mid-graph, get_state_history(config) returns every
checkpoint — you can replay from any point by passing its config.checkpoint_id
back into graph.invoke. See
LangGraph Test Patterns for the full
time-travel debugging recipe and the subgraph-shared-state test pattern
(cross-ref langchain-langgraph-subgraphs / pain L30).
Resources:

- FakeListChatModel API
- vcrpy documentation
- pytest-recording
- MemorySaver + get_state_history
- filterwarnings
- docs/pain-catalog.md (entries P05, P43, P44, P45)