From langsmith-skills
Creates, manages, and uploads evaluation datasets to LangSmith using CLI and SDK. Handles types like final_response, single_step, trajectory, RAG for LLM testing.
npx claudepluginhub langchain-ai/langsmith-skills

This skill uses the workspace's default tool permissions.
<oneliner>
Builds LangSmith evaluation pipelines: create LLM-as-Judge/custom evaluators, capture agent outputs/trajectories via run functions, run locally with evaluate() or CLI.
```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here  # REQUIRED
LANGSMITH_PROJECT=your-project-name          # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id     # Optional: for org-scoped keys
```
Authentication is REQUIRED: either set the LANGSMITH_API_KEY environment variable, or pass the --api-key flag to CLI commands (preferred):
```bash
langsmith dataset list --api-key $LANGSMITH_API_KEY
```
IMPORTANT: Always check the environment variables or .env file for LANGSMITH_PROJECT before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.
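The project lookup described above can be sketched as a small helper; `resolve_project` is an illustrative name, not part of the SDK:

```python
import os

def resolve_project(env=None):
    """Pick the LangSmith project to query, preferring LANGSMITH_PROJECT."""
    env = os.environ if env is None else env
    # Fall back to LangSmith's default project name when none is configured.
    return env.get("LANGSMITH_PROJECT") or "default"

print(resolve_project({"LANGSMITH_PROJECT": "my-project"}))
```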
Python Dependencies
```bash
pip install langsmith
```
JavaScript Dependencies
```bash
npm install langsmith
```
CLI Tool
```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
Use the `langsmith` CLI to manage datasets and examples.
- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset
- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example
- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

Common flags:

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

IMPORTANT - Safety Prompts: destructive commands prompt for confirmation, and `--yes` skips those prompts. Do not pass `--yes` unless the user explicitly requests it.
<dataset_types_overview> Common evaluation dataset types:

- `final_response` - Evaluate the agent's final answer against a reference response
- `single_step` - Evaluate a single step (for example, one model or tool call) in isolation
- `trajectory` - Evaluate the sequence of tool calls taken against an expected trajectory
- `rag` - Evaluate answers along with the retrieved and cited chunks
</dataset_types_overview>
<creating_datasets>
Datasets are JSON files with an array of examples. Each example has inputs and outputs.
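For instance, a minimal final_response-style dataset file (the field names here are illustrative) might look like:

```json
[
  {"inputs": {"query": "What is AI?"}, "outputs": {"response": "AI is..."}},
  {"inputs": {"query": "Explain RAG"}, "outputs": {"response": "RAG is..."}}
]
```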
Export traces first, then process them into dataset format using code:
```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY
```
<python>
```python
import json
from pathlib import Path

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.read_text().strip().split("\n")]
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"],
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>
<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";
const client = new Client();
// 2. Process traces into dataset examples
const examples: Array<{trace_id?: string, inputs: Record<string, any>, outputs: Record<string, any>}> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));
for (const file of files) {
const lines = readFileSync(join("./traces", file), "utf-8").trim().split("\n");
const runs = lines.map(line => JSON.parse(line));
const root = runs.find(r => r.parent_run_id == null);
if (root?.inputs && root?.outputs) {
examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
}
}
// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>
```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset" --api-key $LANGSMITH_API_KEY
```
<python>
```python
from langsmith import Client

client = Client()
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")
client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>
<typescript>
```typescript
import { Client } from "langsmith";
const client = new Client();
// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
description: "Evaluation dataset",
});
await client.createExamples({
inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>
<dataset_structures>
`final_response`:
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}

`single_step`:
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}

`trajectory`:
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}

`rag`:
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
</dataset_structures>
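To illustrate how a trajectory dataset is consumed, a custom evaluator might compare the agent's actual tool-call sequence against `expected_trajectory`. This is a sketch with a hypothetical helper name, not part of the SDK:

```python
def trajectory_exact_match(actual_trajectory, expected_trajectory):
    """Score 1.0 when the tool-call sequence matches the reference exactly."""
    return 1.0 if list(actual_trajectory) == list(expected_trajectory) else 0.0

print(trajectory_exact_match(["tool_a", "tool_b", "tool_c"], ["tool_a", "tool_b", "tool_c"]))
```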
<script_usage>
```bash
# List all datasets
langsmith dataset list --api-key $LANGSMITH_API_KEY

# Get dataset details
langsmith dataset get "My Dataset" --api-key $LANGSMITH_API_KEY

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation" --api-key $LANGSMITH_API_KEY

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset" --api-key $LANGSMITH_API_KEY

# Export a dataset to a local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100 --api-key $LANGSMITH_API_KEY

# Delete a dataset
langsmith dataset delete "My Dataset" --api-key $LANGSMITH_API_KEY

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10 --api-key $LANGSMITH_API_KEY

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}' --api-key $LANGSMITH_API_KEY

# List experiments
langsmith experiment list --dataset "My Dataset" --api-key $LANGSMITH_API_KEY

# View experiment results
langsmith experiment get "eval-v1" --api-key $LANGSMITH_API_KEY
```
</script_usage>
<example_workflow> Complete workflow from traces to uploaded LangSmith dataset:
```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response" --api-key $LANGSMITH_API_KEY
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory" --api-key $LANGSMITH_API_KEY

# 4. Verify upload
langsmith dataset list --api-key $LANGSMITH_API_KEY
langsmith dataset get "Skills: Final Response" --api-key $LANGSMITH_API_KEY
langsmith example list --dataset "Skills: Final Response" --limit 3 --api-key $LANGSMITH_API_KEY

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response" --api-key $LANGSMITH_API_KEY
```
</example_workflow>
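After uploading, experiments can also be run from the SDK with `evaluate()`. A minimal sketch, assuming a recent `langsmith` SDK; `target` and `contains_answer` are hypothetical stand-ins for your app and evaluator, and the remote call is gated behind a flag so the snippet runs offline:

```python
RUN_REMOTE = False  # flip to True (with LANGSMITH_API_KEY set) to run against LangSmith

def target(inputs):
    """Hypothetical app under test: returns a canned response."""
    return {"response": f"Answer to: {inputs['query']}"}

def contains_answer(outputs, reference_outputs):
    """Custom evaluator: pass when the reference text appears in the response."""
    expected = reference_outputs.get("response", "")
    return {"key": "contains_answer", "score": expected in outputs.get("response", "")}

if RUN_REMOTE:
    from langsmith import evaluate
    # Runs the target over every example in the dataset and applies the evaluator.
    evaluate(target, data="Skills: Final Response", evaluators=[contains_answer])
```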
**Dataset upload fails:**
- Verify LANGSMITH_API_KEY is set
- Check the JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique, or delete the existing dataset first with `langsmith dataset delete`

**Empty dataset after upload:**
- Confirm every example has an `inputs` key
- Verify with `langsmith example list --dataset "Name"`

**Export has no data:**
- Pass the `--full` flag to include inputs/outputs
- Confirm the exported runs have `inputs` and `outputs` populated

**Example count mismatch:**
- Run `langsmith dataset get "Name"` to check the remote count
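The "invalid JSON" and "missing inputs" failures above can be caught before uploading with a quick local check; `validate_dataset` is a sketch, not part of the CLI or SDK:

```python
import json

def validate_dataset(examples):
    """Return a list of problems; an empty list means the file is safe to upload."""
    if not isinstance(examples, list):
        return ["top level must be a JSON array of examples"]
    problems = []
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict) or not isinstance(ex.get("inputs"), dict):
            problems.append(f"example {i}: missing or non-dict 'inputs'")
    return problems

# json.loads also surfaces syntax errors before the CLI ever sees the file.
data = json.loads('[{"inputs": {"query": "hi"}, "outputs": {"response": "hello"}}, {"outputs": {}}]')
print(validate_dataset(data))
```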