Tests LangGraph/LangChain agents with unit/integration tests, trajectory evaluation, LangSmith datasets, and A/B comparisons using Python/pytest or JS/TS/vitest scripts.
npx claudepluginhub lubu-labs/langchain-agent-skills --plugin langgraph-skills

This skill uses the workspace's default tool permissions.
Skill files:
- assets/datasets/sample_dataset.json
- assets/examples/README.md
- assets/templates/test_template.py
- references/ab-testing.md
- references/langsmith-evaluation.md
- references/trajectory-evaluation.md
- references/unit-testing-patterns.md
- scripts/compare_agents.js
- scripts/compare_agents.py
- scripts/evaluate_with_langsmith.js
- scripts/evaluate_with_langsmith.py
- scripts/generate_test_cases.js
- scripts/generate_test_cases.py
- scripts/mock_llm_responses.js
- scripts/mock_llm_responses.py
- scripts/run_trajectory_eval.js
- scripts/run_trajectory_eval.py
Practical workflows for validating agent quality with unit tests, trajectory evaluation, LangSmith dataset evaluation, and A/B comparisons.
Use this file for high-level flow. Load references/* for detailed implementation.
Choose the smallest approach that answers your question:
| Goal | Primary method | Load first |
|---|---|---|
| Validate node logic quickly | Unit tests with mocks | references/unit-testing-patterns.md |
| Validate multi-step agent behavior | Trajectory evaluation | references/trajectory-evaluation.md |
| Track quality on datasets over time | LangSmith evaluation | references/langsmith-evaluation.md |
| Compare old vs new agent versions | A/B comparison | references/ab-testing.md |
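For the first row (validating node logic quickly), a minimal pytest sketch looks like the following, assuming a graph built from a TypedDict state with a pure-Python node; all names here are hypothetical, so see references/unit-testing-patterns.md and assets/templates/test_template.py for the skill's own patterns.

```python
# Hypothetical node-level unit test; runs offline, no API key needed.
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # Deterministic stand-in for an LLM call, so the test is repeatable.
    return {"answer": f"echo: {state['question']}"}

def build_graph():
    builder = StateGraph(State)
    builder.add_node("answer", answer_node)
    builder.add_edge(START, "answer")
    builder.add_edge("answer", END)
    return builder.compile()

def test_answer_node_sets_answer():
    result = build_graph().invoke({"question": "hello", "answer": ""})
    assert result["answer"] == "echo: hello"
```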
Recommended order: generate test cases, then trajectory evaluation, then LangSmith evaluation, then A/B comparison.
Run from repo root.
# Python (preferred)
uv run skills/langgraph-testing-evaluation/scripts/generate_test_cases.py my_agent:graph --output tests/ --framework pytest
# JavaScript/TypeScript
node skills/langgraph-testing-evaluation/scripts/generate_test_cases.js ./my-agent.ts:graph --output tests/ --framework vitest
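Both commands point at a module attribute that resolves to a compiled graph. A minimal sketch of what a hypothetical my_agent.py could export so the my_agent:graph target resolves:

```python
# my_agent.py -- hypothetical module exposing a compiled graph as `graph`,
# so it can be targeted as `my_agent:graph`.
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    return {"answer": f"echo: {state['question']}"}

builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")
builder.add_edge("answer", END)
graph = builder.compile()  # module-level attribute the CLI resolves
```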
# Python: LLM-as-judge
uv run skills/langgraph-testing-evaluation/scripts/run_trajectory_eval.py my_agent:run_agent my_dataset --method llm-judge --model openai:o3-mini
# Python: trajectory match
uv run skills/langgraph-testing-evaluation/scripts/run_trajectory_eval.py my_agent:run_agent dataset.json --method match --trajectory-match-mode strict --reference-trajectory reference.json
# JavaScript/TypeScript
node skills/langgraph-testing-evaluation/scripts/run_trajectory_eval.js ./agent.ts:runAgent my_dataset --method llm-judge --model openai:o3-mini --max-concurrency 4
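The my_agent:run_agent target is a callable invoked once per dataset example. Its exact contract lives in references/trajectory-evaluation.md; as an assumption, a sketch that returns the agent's message trajectory might look like this (the returned shape and field names are hypothetical):

```python
# Hypothetical trajectory target; the {"messages": [...]} return shape is an
# assumption, not the script's documented contract.
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

def run_agent(inputs: dict) -> dict:
    question = inputs["question"]
    # A real implementation would invoke a compiled LangGraph agent here;
    # this stub returns a fixed tool-calling trajectory so it runs offline.
    return {
        "messages": [
            HumanMessage(content=question),
            AIMessage(content="", tool_calls=[
                {"name": "search", "args": {"query": question}, "id": "call_1"},
            ]),
            ToolMessage(content="search result", tool_call_id="call_1"),
            AIMessage(content="final answer"),
        ]
    }
```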
# Python
uv run skills/langgraph-testing-evaluation/scripts/evaluate_with_langsmith.py my_agent:run_agent my_dataset --evaluators accuracy,latency --max-concurrency 4
# Python (do not upload experiment results)
uv run skills/langgraph-testing-evaluation/scripts/evaluate_with_langsmith.py my_agent:run_agent my_dataset --evaluators accuracy --no-upload
# JavaScript/TypeScript
node skills/langgraph-testing-evaluation/scripts/evaluate_with_langsmith.js ./agent.ts:runAgent my_dataset --evaluators accuracy,latency --max-concurrency 4
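The my_dataset argument names a LangSmith dataset. If one doesn't exist yet, here is a minimal sketch of bootstrapping it with the LangSmith SDK; the dataset name and example fields are hypothetical, and LANGSMITH_API_KEY must be set in the environment.

```python
# Hypothetical dataset bootstrap for the evaluation commands above.
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY from the environment

dataset = client.create_dataset(dataset_name="my_dataset")
client.create_examples(
    inputs=[{"question": "What is LangGraph?"}],
    outputs=[{"answer": "A library for building stateful, graph-based agents."}],
    dataset_id=dataset.id,
)
```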
# Python
uv run skills/langgraph-testing-evaluation/scripts/compare_agents.py my_agent:v1 my_agent:v2 dataset.json --output comparison_report.json
# JavaScript/TypeScript
node skills/langgraph-testing-evaluation/scripts/compare_agents.js ./v1.ts:run ./v2.ts:run dataset.json --output comparison_report.json
# JavaScript/TypeScript (force local dataset file only)
node skills/langgraph-testing-evaluation/scripts/compare_agents.js ./v1.ts:run ./v2.ts:run dataset.json --no-langsmith
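Here my_agent:v1 and my_agent:v2 (or ./v1.ts:run and ./v2.ts:run) resolve to two runnable agent versions. A hedged sketch of exposing both from one Python module, with placeholder behavior:

```python
# my_agent.py -- hypothetical module exposing two comparable agent versions.
def v1(inputs: dict) -> dict:
    # Baseline version.
    return {"answer": f"v1: {inputs['question']}"}

def v2(inputs: dict) -> dict:
    # Candidate version, e.g. a new prompt, model, or graph wiring.
    return {"answer": f"v2: {inputs['question']}"}
```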
# Python
uv run skills/langgraph-testing-evaluation/scripts/mock_llm_responses.py create --type sequence --output mock_config.json
# JavaScript/TypeScript
node skills/langgraph-testing-evaluation/scripts/mock_llm_responses.js create --type sequence --output mock_config.json
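The generated mock_config.json drives the skill's own mocking helpers. Independent of that format, one common way to get deterministic LLM responses in LangChain/LangGraph tests is langchain_core's built-in fake chat model:

```python
# Deterministic test double using langchain_core's fake chat model
# (separate from the skill's mock_config.json mechanism).
from langchain_core.language_models import FakeListChatModel

fake_llm = FakeListChatModel(responses=["first canned reply", "second canned reply"])

assert fake_llm.invoke("hi").content == "first canned reply"
assert fake_llm.invoke("again").content == "second canned reply"
```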
Dataset format: each example uses `inputs` and `outputs` objects (optional `metadata`); flat `input`/`output` keys are accepted for legacy datasets.

Load references/unit-testing-patterns.md when validating node logic quickly with unit tests and mocked models or tools.
Load references/trajectory-evaluation.md when validating multi-step agent behavior, including trajectory match modes (strict, unordered, subset, superset).
Load references/langsmith-evaluation.md when tracking quality on datasets over time with LangSmith experiments.
Load references/ab-testing.md when comparing old vs new agent versions.

assets/templates/test_template.py: pytest template covering thread_id configuration and compiled_graph.nodes[...] access.
assets/datasets/sample_dataset.json: sample dataset in the `examples: [{ inputs, outputs, metadata }]` format.
assets/examples/README.md
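A minimal sketch of that dataset shape, written from Python; the field names inside inputs and outputs are hypothetical:

```python
# Writes dataset.json in the examples: [{ inputs, outputs, metadata }] shape.
import json

dataset = {
    "examples": [
        {
            "inputs": {"question": "What is LangGraph?"},
            "outputs": {"answer": "A library for building stateful, graph-based agents."},
            "metadata": {"category": "docs"},
        }
    ]
}

with open("dataset.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2)
```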
scripts/generate_test_cases.py / .js: Use for fast test scaffolding.
Inputs: a graph reference such as my_module:graph or my_module.graph (Python), or ./file.ts:graph (JS/TS).
Outputs: pytest or vitest test scaffolding written to the --output directory.
scripts/run_trajectory_eval.py / .js: Use for trajectory scoring with either --method match or --method llm-judge.
Supports: LangSmith dataset names or local dataset files (.json), --reference-trajectory, and match modes strict, unordered, subset, superset.
Local-only mode: --no-langsmith in both the Python and JavaScript scripts (requires a local JSON dataset file).

scripts/evaluate_with_langsmith.py / .js: Use for dataset-based evaluation runs and experiment tracking.
Supports: evaluator selection (--evaluators accuracy,latency,...) and concurrency control (--max-concurrency).
Python-only: --no-upload to run without uploading experiment results.

scripts/compare_agents.py / .js: Use for offline version comparisons. Datasets load from LangSmith or a local file (--no-langsmith to disable remote loading).

scripts/mock_llm_responses.py / .js: Use for deterministic test doubles, e.g. create --type sequence mock configs.
If behavior is deterministic and local: use unit tests with mocks.
If behavior depends on tool sequence/routing: use trajectory evaluation.
If behavior depends on realistic distribution quality: use LangSmith dataset evaluation.
If approving a replacement model/prompt/graph: use A/B comparison.
Start with strict trajectory matching and relax to a looser mode (unordered, subset, or superset) where appropriate.